
UNIVERSITY OF CÓRDOBA

Department of Analytical Chemistry

Department of Computer Science and Numerical Analysis

DEVELOPMENT OF A LIMS AND A PLATFORM FOR THE AUTOMATION OF CONTINUOUS ANALYTICAL PROCESSES BASED ON OBJECT-ORIENTED TECHNOLOGY. DEVELOPMENT AND USE OF CHEMOMETRIC METHODS FOR THE TREATMENT OF SPECTROSCOPIC DATA

Manuel Urbano Cuadrado

Córdoba, February 2005

DEVELOPMENT OF A LIMS AND A PLATFORM FOR THE AUTOMATION OF CONTINUOUS ANALYTICAL PROCESSES BASED ON OBJECT-ORIENTED TECHNOLOGY. DEVELOPMENT AND USE OF CHEMOMETRIC METHODS FOR THE TREATMENT OF SPECTROSCOPIC DATA

By

Manuel Urbano Cuadrado

The Supervisors,

Signed: María Dolores Luque de Castro, Professor at the Department of Analytical Chemistry, University of Córdoba

Signed: Miguel Ángel Gómez-Nieto, Professor at the Department of Computer Science and Numerical Analysis, University of Córdoba

Signed: Pedro Pérez Juan, Doctor of Science (Chemistry Section)

Work submitted for the degree of Doctor of Science (Chemistry Section)

María Dolores Luque de Castro, Professor at the Department of Analytical Chemistry of the University of Córdoba, Miguel Ángel Gómez-Nieto, Professor at the Department of Computer Science and Numerical Analysis, and Pedro Pérez Juan, Doctor of Science (Chemistry Section), as supervisors of the doctoral thesis presented by the graduate in Chemistry, entitled "Development of a LIMS and a platform for the automation of continuous analytical processes. Development and use of chemometric methods for the treatment of spectroscopic data",

CERTIFY: That the aforementioned doctoral thesis was carried out in the laboratories of the Department of Analytical Chemistry and the Department of Computer Science and Numerical Analysis of the University of Córdoba and that, in their judgement, it meets the requirements for this type of work.

In witness whereof, they issue this certificate in Córdoba on 21 January 2005.

Signed: María Dolores Luque de Castro    Signed: Miguel Ángel Gómez-Nieto

Signed: Pedro Pérez Juan

Acknowledgements

To María Dolores Luque de Castro, Miguel Ángel Gómez-Nieto and Pedro Manuel Pérez-Juan, supervisors of this work, for their constant effort and for placing their knowledge and experience within my reach. Especially to María Dolores Luque de Castro, for offering me the opportunity to carry out this doctoral thesis.

To my many friends and colleagues in the Department of Analytical Chemistry and the Department of Computer Science and Numerical Analysis of the University of Córdoba, who always gave me their sincere help and without whom everything would have taken much longer.

To all my people in Elche (Alicante), for the interest and affection they have always shown me.

To my friends Manolo and Antonio, for making me cry with laughter so many times; you are "endangered species", and thank you for putting up with my absences and excuses.

To my brother Rafa and my sister Paqui, for being there, for everything. To my grandmother Araceli, for all her care.

To my parents, who through their example and hard work are responsible for my being where I am today, and who have always devoted their whole lives to me and my siblings. I am proud of you.

To my whole family, for so many things.

To Noelia, my "rocheta", for all her love.

Contents

Objectives

Introduction

1.- Introduction

2.- Software engineering. Object-oriented modelling: the meaning of UML and the Unified Development Process

3.- The Java programming language

4.- Information as a resource. Oracle. DBMSs based on the object-relational data model

5.- Chemometrics: classical models for designing experiments, classifying products and predicting parameters

6.- Computational chemistry: similarity indices and fingerprints

7.- References

Experimental part

Part I: Computerization of the analytical process and of the management and analysis of the data produced, using the object-oriented paradigm

Chapter 1. Use of object-oriented techniques for the design and development of standard software solutions in automation and data management in analytical chemistry

Chapter 2. An open solution for computer control of flow injection analyses in wine production monitoring

Chapter 3. Trigger-based concurrent control system for automating analytical processes

Chapter 4. Fully automated flow injection analyser for the determination of volatile acidity in wines

Chapter 5. A fully automated method for in real time determination of laccase activity in wines

Chapter 6. JWisWine: a Java web information system for quality control in wineries

Part II: Development of chemometric methods and algorithms for building qualitative and quantitative models in analytical chemistry

Chapter 7. Ultraviolet-visible spectroscopy and pattern recognition methods for differentiation and classification of wines

Chapter 8. Study of spectral analytical data using fingerprints and scaled similarity measurements

Chapter 9. Near infrared reflectance spectroscopy and multivariate analysis in enology: determination or screening of fifteen parameters in different types of wines

Chapter 10. Comparison and joint use of near infrared spectroscopy and Fourier transform mid infrared spectroscopy for the determination of wine parameters

Discussion of the results

Conclusions

Annex: conference communications

Objectives

The latest trends in computing methodology and technology have the right characteristics for building software that overcomes its traditional dependence on the variability of the information that its environment exhibits and requires, on the diversity of available hardware configurations and operating systems, and on the haphazard way in which software products are developed. One of the research groups (Software Engineering, Knowledge and Databases) in which the work reported here was carried out has employed methodologies such as systems engineering and the evolutionary and object-oriented paradigms, together with tools such as Oracle, Java, XML, etc., to develop software solutions in a variety of areas. In recent years this group has also developed algorithms based on similarity calculations for studying chemical compounds and retrieving them from databases.

The other group (Innovations in Continuous and Discontinuous Systems for the Automation of Analytical Processes), in which part of the research presented here was carried out, has developed a large number of continuous analytical methods. These methods have simplified, shortened and automated the steps that precede measurement within the overall analytical process, with advantages such as the elimination of errors and reductions in analysis time and reagent consumption. The group has also used classical chemometric methods for multiparametric determination and for classifying samples from spectral information.

This thesis has had two main general objectives: (1) to bring together part of the achievements of both research groups in order to computerize the analytical process and the management of the information it provides, using the latest computing trends; and (2) to use conventional chemometric methods and compare them with others based on similarity calculations on spectral information. These general objectives were broken down into the following specific objectives:

1) Design and development of a system for automating the analytical process with the following characteristics:

• Open to any type of analysis, regardless of the instrumentation and analytical method used.

• Oriented to the user ("the chemist") who will operate it, so that translating the analytical procedure into "computational commands" requires no special training.

• Operable in any hardware environment, so that it can be used with whatever computing resources are available in analytical laboratories.

• Secure in acquiring the data related to the analysis, so that this information can be checked, validated, analysed and stored for subsequent treatment.

2) Design and development of an information system for managing and analysing the analytical information generated in production or manufacturing processes. The management system must take into account other relevant aspects of the analytical process, such as the instruments with which the analysis is performed, the people involved, the validation of results, etc. The most notable characteristics the system must satisfy are the following:

• Scalable, so that it is open to managing and analysing the analytical information of any production process and to any changes in the information to be monitored.

• Operable in any hardware environment and operating system available in the company.

• Integrated with the automation system developed, so that it "understands" the information supplied by that system for subsequent treatment.

• Oriented to the non-specialist user who will operate it, so that interaction with the system resembles the manual process followed in the company.

• Secure and robust in handling information, so that the stored information is not exposed to loss, theft or malicious manipulation.

• Cost-effective for the company, so that the utilities built into the system reduce the time users spend on the daily tasks of managing analytical information. The system must also provide a set of utilities for analysing the stored information, so that those in charge have sufficient knowledge to make sound decisions.

3) Design of new models and algorithms aimed at finding new chemometric solutions that can be used to analyse spectral information. These solutions will be compared with conventional chemometric methods. The spectral information used will correspond to the ultraviolet, visible, near-infrared and mid-infrared regions, and will be applied in the field of enology. The validation of the methods and algorithms developed will differ depending on whether the aim is quantitative or qualitative. For quantitative methods, parameters such as the coefficient of determination, the calibration and validation errors, the correlation with the reference method used, etc. will be employed. For qualitative methods, the quality of the algorithms will be judged by the degree of sample clustering and the percentage of correct classification in validation.
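The validation figures named in objective 3 can be illustrated with a minimal Python sketch. It is not the code used in the thesis: the function names are hypothetical, and the choice of the root mean square error as the "error" in calibration and validation is an assumption, since the text does not say which error statistic is used.

```python
import math


def coefficient_of_determination(reference, predicted):
    """R^2 between reference-method values and model predictions."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot


def root_mean_square_error(reference, predicted):
    """Assumed error statistic for calibration or validation sets."""
    squared = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def classification_rate(true_classes, assigned_classes):
    """Percentage of validation samples assigned to their true class."""
    hits = sum(1 for t, a in zip(true_classes, assigned_classes) if t == a)
    return 100.0 * hits / len(true_classes)


if __name__ == "__main__":
    reference = [1.0, 2.0, 3.0, 4.0]   # illustrative reference values
    predicted = [1.1, 1.9, 3.1, 3.9]   # illustrative model predictions
    print(coefficient_of_determination(reference, predicted))
    print(root_mean_square_error(reference, predicted))
    print(classification_rate(["red", "white", "red"], ["red", "white", "white"]))
```

The same two quantitative functions would be applied separately to the calibration set and to the validation set, so that the two errors can be compared.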

Introduction

1. Introduction

Information is one of the pillars of today's society. The scale of economic, social and cultural progress derives largely from the qualities of information and its accessibility. The dictionary of the Real Academia Española defines information, among other senses, as "the communication or acquisition of knowledge that extends or refines what is known about a given subject". From a practical standpoint, information can be regarded as the result of a process of data management and analysis. Data are records of facts, observations, values, etc., and constitute the input to the process. Fig. 1 depicts the dynamism implicit in the concept of information. The importance of having quality information available is increasingly evident, since it is now recognized that planning and decision-making in any given entity rest on the information to which it has access or which it has been able to produce.

[Figure: data enter a process of management and analysis, which yields information.]

Fig. 1. Dynamic process of information synthesis.


Several authors describe as a revolution the process society has undergone with the advent of the "information age". This revolution is considered not only to have accelerated the flow of information, but also to have transformed the deep structure of the decisions on which our actions depend [1,2].

When the qualities and quality of information are discussed, reference is made to very generic characteristics that are usually difficult to quantify. Adjectives such as "relevant", "precise" and "complete", and phrases such as "addressed to the right recipient", "available in a suitable time" and "detailed at a level the user can understand" [3,4], are examples of the imprecise language used in this field. Each area of knowledge uses its own definition of the qualities of information, usually a refinement of the generic one. It also uses a set of measurement or evaluation standards to adapt the concept of information to its environment and application. The specialization of this concept in analytical chemistry is set out below.

1.1 Information and analytical chemistry

Obtaining appropriate analytical information is of the utmost importance for characterizing materials, monitoring production processes, detecting fraud, etc. Analytical chemistry is a discipline whose aim is to synthesize (bio)chemical information of interest by applying methods of analysis to an object or system. Malissa defined analytical chemistry as "the science that makes evident, accessible, true and useful the latent (bio)chemical information intrinsic to an object or system". When the quality of the information supplied by the analytical process is poor, the decisions taken can cause economic and social problems. In 1984 Tölg estimated that the economic losses in Germany arising from poor-quality analytical information amounted to 6,000 million marks.

The quality of analytical information can be considered from different points of view. In theoretical terms, quality is the combination of properties that allows one piece of information to be classed as worse than, better than or equal to another. Since data are the pillars that support information, their accuracy and representativeness are the two most important indicators in the theoretical characterization of analytical quality. These indicators correspond to the capital analytical properties (which characterize an analytical result), which are in turn supported by the basic properties of precision, sensitivity and selectivity (which characterize the analytical process). In analytical chemistry, as in any information science, the indicators accuracy and precision are used in calculating uncertainty (and in reducing it).
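As an illustration of how accuracy and precision are commonly quantified, the following Python sketch computes the bias of replicate results against a reference value and their standard deviation. The function name, the sample values and the use of relative figures in percent are assumptions made for this example; the text does not prescribe a particular formula.

```python
import statistics


def accuracy_and_precision(replicates, reference_value):
    """Summarize replicate results against an accepted reference value.

    Accuracy is expressed as bias (mean minus reference) and precision as
    the sample standard deviation of the replicates.
    """
    mean = statistics.mean(replicates)
    bias = mean - reference_value
    std_dev = statistics.stdev(replicates)
    return {
        "mean": mean,
        "bias": bias,
        "relative_bias_percent": 100.0 * bias / reference_value,
        "std_dev": std_dev,
        "rsd_percent": 100.0 * std_dev / mean,
    }


if __name__ == "__main__":
    # Hypothetical replicate determinations of a sample whose accepted value is 10.0
    summary = accuracy_and_precision([10.1, 9.9, 10.0, 10.2, 9.8], 10.0)
    for name, value in summary.items():
        print(f"{name}: {value:.4f}")
```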

From a practical point of view, quality must be geared towards satisfying the user or client, and the so-called productive analytical properties, such as speed and cost, become important. As in the first approach to analytical quality, the truthfulness of the information remains of key importance, except that the degree required is set by the client. This gives rise to the concepts of external quality and internal quality.

The client and the analytical laboratory take part in these two types of quality in different ways. External quality is associated with the client's demands on, and perception of, the characteristics of the analytical information (hence the terms required quality and perceived quality). Internal quality, in turn, is always the product that quality policy and quality management deliver through the laboratory or the organization on which it depends. It is usually called achieved quality.

The result of an analysis is the outcome of the analytical process, also called the chemical measurement process (CMP), which is defined as "the set of operations that separates the system under study (unsampled, unmeasured and untreated) from the results generated and expressed according to the requirements of the analytical problem posed". This definition, given by Chalmers and Bailescu [5], is generic and represents the analytical process as a black box.

Looking at this black box in more detail, one can distinguish the different stages that make up the analytical process, together with a larger number of inputs to the box, since the sample is no longer regarded as the only input. This analysis is depicted in Fig. 2.

[Figure: the analytical process as a sequence of stages: system under study → preliminary operations → detection → data acquisition and treatment → result. Additional inputs: required information, measurement methodology, and tools (instrumentation, reagents, standards and guidelines, COMPUTING).]

Fig. 2. Scheme of the analytical process.

The first stage of the process consists of the preliminary operations that prepare the sample for measurement. The intermediate stage is detection, in which the physico-chemical characteristics of the analytes are revealed. Finally, data acquisition and treatment to generate the analytical result complete the analytical process.

As for the different inputs to the analytical process, the required information, the measurement methodology and the tools must also be taken into account, in addition to the system under study, which provides the sample, as shown in Fig. 2. These inputs act as constraints on the analytical process and therefore affect its analytical properties.

The importance of the required information has already been discussed in connection with analytical quality. It is set by the people or organizations that will use it for various purposes. The measurement methodology is also highly relevant, since it is developed in order to extract information with suitable properties from the system under study. Finally, the different types of tools are the technical and material pillars that support the process and make it possible to carry it out.

The tools are the apparatus and instruments used to prepare the sample and to measure the signal of interest, respectively, together with the reagents, the quality standards of analytical laboratories, and computer applications. This last type of tool is the one used in this thesis, since the research carried out is an attempt both to introduce the most recent computing technology into the different parts of the analytical process and to assess the effects of these latest trends on analytical chemistry.

1.2 Computing and its uses in analytical chemistry

The Real Academia Española defines computing (informática) as "the body of scientific knowledge and techniques that make possible the automatic processing of information by means of computers".

Although humans have always sought ways to replace their own involvement in routine tasks (the development of machines has run parallel to human history), efficient systems for automating the ability to reason and process information only began to emerge in the middle of the twentieth century, with the advent of the first generation of computers [6]. Until then, the lack of suitable technology had prevented the realization of formal models of computation, which had already arisen in Greece in the fourth century BC and continued up to the description of the theoretical machine by the British mathematician Alan Turing in 1936 [7].

Since the first generation, computers have evolved at a dizzying pace, to the point that as many as four generations of computers are recognized [8]. Throughout these four stages, concepts related to technology, architecture, operating systems, programming languages, databases, etc. have constantly appeared and become obsolete. Terms such as "batch jobs", "sequential record access" and "punched cards" now seem remote in the development of computing, whereas concepts such as "multiprogramming", "relational database", "clusters", "Java" and "Internet" are, to varying degrees, close to present-day computing.

Computer programs, made up of a set of data, instructions and arithmetic and logical operations, have found extensive use in every area of society. The current state of many areas of knowledge cannot be understood without these tools, which, moreover, are subject to constant development and research into new uses. Analytical chemistry has not been immune to adopting the advantages that computing offers. Applying analytical methods reveals the chemical information of the systems under study.

As noted above, the analytical process delivers a result after a series of operations on the sample using various tools, computing among them. Modern analytical chemistry, however, has moved beyond the confines of the laboratory and come into contact with the economic and social problems around it, which calls for a new role for analytical information, different from the classical notion of delivering results.

To this end, the analytical process needs to be continued beyond the output of results. This is achieved by linking it to an information process, called an "Automated Information Process" when computers are involved. Fig. 3 shows the new analytical information process (resulting from joining the automated information process to the classical analytical process) and the new place of computing tools in this scenario.

[Figure: the classical analytical process (sample → preliminary operations → detection → data acquisition and treatment → result) linked to the automated information process (data management → data extraction → data analysis → information). The COMPUTING layer above both processes lists the functions it provides: control, acquisition, data entry and manipulation, structured queries, and retrieval of unstructured data.]

Fig. 3. Computing in the analytical information process.


1.2.1 Computing in the classical analytical process

Computing takes part, to varying extents, in all stages of the classical analytical process. Samplers and the apparatus used in the preliminary operations are handled automatically through suitable software [9-17]. In these programs, information flows in one direction only, from the computer to the apparatus, because the apparatus provides no information of interest for solving the analytical problem. Few such programs are reported in the literature, because this stage is the most difficult to automate; it is currently one of the research fronts in analytical chemistry. Its interest lies in the gain in analytical quality obtained by eliminating human participation in this stage, which is one of those that introduce the largest error into the final result.

The use of the computer in the detection stage already involves a two-way flow of information. On the one hand, programs allow instruments to be controlled by selecting and adjusting measurement parameters, such as the wavelength in a spectrophotometer or the voltage in a capillary electrophoresis system or a potentiostat. On the other hand, and as a consequence of the difference between apparatus and instruments (instruments, unlike apparatus, provide data), computing is used in almost all cases to collect the data supplied by the instrument.

Although data acquisition belongs, strictly speaking, to the third stage of the classical analytical process, from a computing standpoint it is closely bound to detection. Thus, although they are conceptually distinct stages, functionally they can be handled by a single software solution [18-28]. There are even studies in which the sample treatment stage and the detection and data acquisition stages are all computer controlled [29,30].

The third stage, data treatment, uses computing in the full sense of its definition, since data are fed into a software application which, by means of an algorithm, generates a result [31-39]. The programs responsible for detection and signal acquisition in instruments may incorporate a data treatment module [40]. The alternative is a data treatment application that is independent of acquisition [41,42].

The first option has some noteworthy advantages. The most important is that acquisition and treatment share a compatible data format, which avoids the time lost in converting from one format to another, when conversion is possible at all. Other advantages include the compatibility of the integrated option with the existing hardware and operating system configuration, the similar structure of the graphical user interfaces, etc.

Data treatment applications that are independent of signal acquisition offer a wider range of treatment methods, because they are built from general data-processing requirements, whereas programs for control, signal acquisition and data treatment are tied to a particular type of instrument.

Data treatment can be divided into two groups according to the purpose of the processing methods: filtering and noise reduction of the analytical signal on the one hand, and analysis of the data to obtain a result on the other. A representative example of the first group is digital signal filtering, for which a computer is essential. Analogue filters, which also play an important role in signal treatment and are usually used in combination with digital filters, will not be considered here.

Digital filters are algorithms developed to remove noise, and they have been applied to many different analytical problems. The analytical signal consists of two parts: a part of interest, which contains the chemical information, and a random part caused by deficiencies in detection, which not only carries no information of interest but also masks the analytical information. The random part is known as noise.

Two types of noise-removal technique are distinguished: time-domain and frequency-domain. The former exploit the random nature of noise and rely on algorithms that, in one way or another, average consecutive points or perform a least-squares fit [43-45]. A distinction is usually made between fixed-window and moving-window filters.
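The two window schemes can be sketched in a few lines of Python. This is an illustrative sketch, not code from the cited works: the function names are hypothetical, the fixed-window filter is interpreted here as averaging non-overlapping blocks of points (boxcar averaging), and the moving-window filter simply shrinks its window at the ends of the signal, which is one of several possible edge treatments.

```python
def boxcar_average(signal, window=5):
    """Fixed-window filter: replace each non-overlapping block of
    `window` consecutive points by its mean (fewer points come out)."""
    if window < 1:
        raise ValueError("window must be a positive integer")
    averaged = []
    for start in range(0, len(signal), window):
        block = signal[start:start + window]
        averaged.append(sum(block) / len(block))
    return averaged


def moving_average(signal, window=5):
    """Moving-window filter: replace each point by the mean of the
    points inside a centred window of odd length `window`."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        # Near the edges the window is truncated to the available points
        segment = signal[max(0, i - half):min(len(signal), i + half + 1)]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


if __name__ == "__main__":
    noisy = [1.0, 1.2, 0.9, 1.1, 1.0, 5.0, 5.2, 4.9, 5.1, 5.0]
    print(boxcar_average(noisy, window=5))
    print(moving_average(noisy, window=3))
```

The moving window keeps one output point per input point, whereas the fixed window trades resolution for a stronger reduction of random noise; replacing the mean by a local polynomial least-squares fit would give the other family of time-domain filters mentioned above.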

Frequency-domain techniques are based on the difference in frequency between the signal of interest and the noise. The various forms of the Fourier transform belong to this type [46-49].
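A minimal sketch of frequency-domain filtering is given below, assuming that the signal of interest lies at low frequencies and the noise at high frequencies. It uses a naive discrete Fourier transform written with the Python standard library so that it is self-contained; practical implementations use a fast Fourier transform. The function names and the hard cutoff are choices made for this example.

```python
import cmath
import math


def dft(values):
    """Naive discrete Fourier transform, O(n^2)."""
    n = len(values)
    return [
        sum(values[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(spectrum):
    """Naive inverse discrete Fourier transform, O(n^2)."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def fourier_low_pass(signal, cutoff):
    """Zero every component whose frequency index exceeds `cutoff`,
    then transform back to the time domain."""
    n = len(signal)
    spectrum = dft(signal)
    for k in range(n):
        # For a real signal, bins k and n - k describe the same frequency
        if min(k, n - k) > cutoff:
            spectrum[k] = 0j
    return [value.real for value in inverse_dft(spectrum)]


if __name__ == "__main__":
    n = 32
    slow = [math.sin(2 * math.pi * t / n) for t in range(n)]        # signal of interest
    fast = [0.3 * math.sin(2 * math.pi * 10 * t / n) for t in range(n)]  # simulated noise
    filtered = fourier_low_pass([s + f for s, f in zip(slow, fast)], cutoff=3)
    print(max(abs(a - b) for a, b in zip(filtered, slow)))
```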

Techniques based on Kalman filtering [50], normally used for the simultaneous determination of several components by kinetic methods, have also been used for noise removal [51,52].
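The noise-removal use of the Kalman filter can be illustrated with a one-dimensional sketch. It assumes a slowly drifting signal (a random-walk state model) observed with additive noise; the variances, the initial uncertainty and the function name are illustrative assumptions, not values or code taken from the cited works, and the multicomponent kinetic application would need a vector state instead.

```python
def kalman_smooth(measurements, process_variance=1e-4, measurement_variance=0.05):
    """Scalar Kalman filter for a slowly varying signal.

    Returns the filtered estimate after each measurement.
    """
    if not measurements:
        return []
    estimate = measurements[0]   # initial state guess
    error_variance = 1.0         # assumed initial uncertainty of that guess
    estimates = []
    for z in measurements:
        # Prediction: the state is assumed unchanged, its uncertainty grows
        error_variance += process_variance
        # Update: weigh the new measurement by the Kalman gain
        gain = error_variance / (error_variance + measurement_variance)
        estimate += gain * (z - estimate)
        error_variance *= 1.0 - gain
        estimates.append(estimate)
    return estimates


if __name__ == "__main__":
    # Hypothetical readings scattered around a true value of 2.0
    readings = [1.0, 3.0] * 50
    print(kalman_smooth(readings)[-1])
```

A larger `process_variance` makes the filter follow changes in the signal more quickly at the cost of passing more noise; a larger `measurement_variance` has the opposite effect.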

The other large group of data treatments comprises high-level processing, whose purpose is to produce an analytical result after processing and analysing the data. The computing tools developed rely on the many statistical methods that analytical chemists use to quantify, classify and even elucidate structural features of analytes.

In 1972 Wold was the first researcher to give the name chemometrics to the "use of statistical, mathematical and other formal-logic methods for the design and selection of chemical experiments and/or for obtaining the maximum relevant chemical information from chemical data" [53]. Today many authors, not only in analytical chemistry but also in pharmaceutical chemistry, food chemistry, etc., regard it as an essential science for supporting analytical methods [54-56].

1.2.2 Computing in the present-day analytical process. The new role of analytical information

As noted above, the task of analytical chemistry today is not confined to delivering results. Fig. 3 shows that results are the input to the information processing system, which is the second part of what this research has called the analytical information process. The results of the analyses are therefore the raw material for synthesizing information.

Everything said about the applicability of computer programs to the preliminary operations, to signal detection and to data acquisition and treatment still holds when computing is relocated within the new analytical information process.

In recent years the concept of the LIMS ("Laboratory Information Management System") has emerged, covering computer systems for managing and analysing analytical data. Its definition can be derived from the many definitions in use for "information system".

Andreu et al. [57] defined an information system as "a formal set of processes which, operating on a collection of data structured according to the needs of the company, gather, process and distribute the information (or part of it) needed for the operations of that company and for the corresponding management and control activities, that is, the decisions required to carry out its activity in accordance with its business strategy".

To define a LIMS, a series of qualifications is added that particularize the definition of an information system given above. First, the company corresponds to a chemical analysis laboratory, so the structured collection of data consists of analytical results, laboratory equipment, analysts, etc. Second, the operations of the company must be qualified differently depending on whether the laboratory is independent or belongs to a larger organization in which it is one more department. Even an independent laboratory may be either a reference or control laboratory belonging to a public body or a private laboratory offering an analysis service, which usually covers the analytical needs of a particular field (clinical, agricultural, environmental, etc.). The operations for which information is needed, and the flow of that information, therefore depend on the type of laboratory.

When a laboratory is part of a larger entity, the information generated follows a vertical path. When the flow is upward, it starts in the laboratory and ends at the technical management of the overall entity, sometimes even reaching general management, but in a highly transformed state and always with features that make it suitable for tactical and strategic use. It is therefore usually a step from data (qualified as "produced") to information (qualified as "elaborated").

When the information flows downward, on the other hand, it takes on an imperative character, calling for the synthesis of new data required by the organization of the company. The process is then reversed, going from information (qualified as "required") to data (qualified as "necessary"). This is, of course, only the flow of information between the laboratory and management; there are others, for example between management and the production department, flowing downward and with a content that depends on the one just described.

When the laboratory is independent and state-run or regional, the flow is usually also vertical, but the destination of the information changes: the public administration takes the place of company management. By contrast, when the laboratory is independent and private, the information follows a horizontal path between laboratory and client. The client usually requires information that needs a lesser degree of transformation after it leaves the classical analytical process, that is, after the result has been generated. The greatest degree of elaboration is needed when the information required is qualitative (a term that here does not mean the binary yes/no answer about the presence of a compound). This is because the client asks whether or not the material under study belongs to a given class (for example, adulterated material, material meeting a quality standard established by law, etc.), and the criterion rests on the evaluation of a series of analytical parameters.

Although the definition of Information System given above makes no reference to computing resources, it is hard not to relate the two. A distinction is normally drawn between the "Total Information System" and the "Automated Information System" because, in the latter, computing is indeed regarded as the main tool.

1.2.3 Types of software applications for the processing system

Two types of applications are distinguished here: the management information system, known as MIS (Management Information System), and the decision support system, known as DSS (Decision Support System). These systems represent different levels of functionality in the programs used (see the information process in Fig. 3). A level lower than that of MISs is also usually considered, covering software applications that perform repetitive, operational and transactional tasks on large amounts of data.

In analytical chemistry, MISs help users manage chemical information, including entry of results, sample coding, equipment control, reagent inventory, etc. [58-66]. Although they overlap to some extent with lower-level systems (since many of these tasks are repetitive), they are programs that support decision-making in management tasks by providing reports on the laboratory workload, the use of an instrument, the analyses performed by a given analyst, etc. They are based on structured queries to the database.
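As an illustration of such a predefined report, the following minimal Java sketch counts the analyses performed by each analyst. It is not taken from the LIMS described in this work: the class `AnalysisRecord`, its fields and the method `analysesPerAnalyst` are hypothetical names, and an in-memory list stands in for the database.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical LIMS record: one analysis performed in the laboratory. */
class AnalysisRecord {
    final String analyst;
    final String instrument;
    final String parameter;

    AnalysisRecord(String analyst, String instrument, String parameter) {
        this.analyst = analyst;
        this.instrument = instrument;
        this.parameter = parameter;
    }
}

class WorkloadReportDemo {
    /** Predefined (structured) report: number of analyses per analyst. */
    static Map<String, Integer> analysesPerAnalyst(List<AnalysisRecord> records) {
        Map<String, Integer> counts = new TreeMap<>();
        for (AnalysisRecord r : records) {
            // Add 1 to the analyst's count, starting from 1 if absent.
            counts.merge(r.analyst, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample records, for illustration only.
        List<AnalysisRecord> records = Arrays.asList(
            new AnalysisRecord("analyst-1", "spectrophotometer", "gluconic acid"),
            new AnalysisRecord("analyst-2", "pH meter", "pH"),
            new AnalysisRecord("analyst-1", "pH meter", "pH"));
        System.out.println(analysesPerAnalyst(records));
    }
}
```

The query criterion (grouping by analyst) is fixed in the code, which is what makes this an MIS-style report rather than an ad hoc analysis.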



As for DSSs, their use in analytical chemistry supports the strategic decisions that the management of a company makes on the basis of the chemical information held in its database. This decision support rests on the analysis of data supplied by queries that do not follow pre-established criteria [67-74]: for example, finding the week of any given year in which a certain parameter showed its maximum values, or finding which zone within a wine designation of origin yields the grapes with the highest gluconic acid concentrations. These systems are key to the goal of modern analytical chemistry of not being limited to the delivery of data.
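The first example above (the week in which a parameter reached its maximum) can be sketched as follows. This is only an illustration under stated assumptions: the class `Measurement`, its fields and the method `weekOfMaximum` are hypothetical, the data are made up, and the selection criterion is passed in by the caller as a `Predicate` to reflect that DSS queries are not fixed in advance.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical stored result: one value of a parameter in a given week and zone. */
class Measurement {
    final int year;
    final int week;
    final String zone;
    final String parameter;
    final double value;

    Measurement(int year, int week, String zone, String parameter, double value) {
        this.year = year;
        this.week = week;
        this.zone = zone;
        this.parameter = parameter;
        this.value = value;
    }
}

class AdHocQueryDemo {
    /**
     * Returns the week of the measurement with the highest value among those
     * that satisfy the caller-supplied criterion, or -1 if none matches.
     */
    static int weekOfMaximum(List<Measurement> data, Predicate<Measurement> criterion) {
        int bestWeek = -1;
        double best = Double.NEGATIVE_INFINITY;
        for (Measurement m : data) {
            if (criterion.test(m) && m.value > best) {
                best = m.value;
                bestWeek = m.week;
            }
        }
        return bestWeek;
    }

    public static void main(String[] args) {
        // Made-up sample values, for illustration only.
        List<Measurement> data = Arrays.asList(
            new Measurement(2004, 36, "zone-A", "gluconic acid", 0.4),
            new Measurement(2004, 38, "zone-B", "gluconic acid", 0.9),
            new Measurement(2004, 40, "zone-A", "pH", 3.5));
        System.out.println(weekOfMaximum(data, m -> m.parameter.equals("gluconic acid")));
    }
}
```

Because the criterion is a parameter, the same method also answers questions restricted to a zone or a year without any change to the query code.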

2. Software engineering. Object-oriented modeling: the meaning of UML and the Unified Development Process

The involvement of computing in the analytical process takes the form of software applications for automation (control of instruments and apparatus) and for data processing and analysis (signal acquisition and filtering, data treatment, and management and analysis of analytical information).

Computer programs, defined earlier in a trivial way as a set of data, instructions, and arithmetic and logical operations, are products that in most cases emerge from a complex development phase. A number of factors distinguish the software product, qualitatively and quantitatively, from products of a classical manufacturing process or from service products. Some of them are briefly discussed below.

First, the constant variability of the environment with which programs interact considerably affects the way software applications are built. The data, and the information to be elaborated from them, are influenced to a greater or lesser degree by changes in the functional requirements. The development of a program must therefore always rest on a design that makes the data structures and the processes independent, as far as possible, of the intrinsic variability of the functional requirements of the information.

Second, the physical and technological support for software solutions is, like the functional requirements, dynamic in nature. Compatibility within the triangle formed by software, hardware and operating system, indispensable for computing, tends to be short-lived because these three components develop and evolve asymmetrically. Once again, the development of computer systems pursues independence, this time with respect to that triangle.

Finally, the intrinsic complexity of maintaining a computer program also makes its development special. Software faults cannot be fixed by a simple "replacement of parts": they stem from the design of the program and materialize in the executable code, so solving them requires analyzing the structure and behavior of the system at both the design and the implementation level. One approach to mitigating this drawback is to design and develop the system modularly, building components that are joined through an interface. Besides making faults easier to locate, this approach increases code reusability because the system is conceived as a set of distinct components.

These factors make it necessary that the way programs are developed be neither haphazard nor specific to one particular application. From this need arises the concept of software engineering which, like any other branch of engineering, uses scientific knowledge to build products or services of interest to society. In the case of software engineering, the knowledge used concerns computing, and the emphasis is on a disciplined way of developing software products that overcome the drawbacks mentioned and deliver an adequate degree of benefit to people.

2.1 Software engineering. The software product development process

Many definitions of Software Engineering have been given. The one given by Boehm in 1976 [75] is adopted here: "It is the practical application of scientific knowledge to the design and construction of computer programs and the associated documentation required to develop, operate and maintain them. It is also known as software development or software production".

A close look at the definition reveals two distinct concepts: the product and the process. Regarding the product, the software solution, it should be noted that computer programs are very recent in history (they began to be developed and used from the 1950s onward); as a result, software engineering is a discipline that is still largely to be developed [76]. The next steps in this development aim to reduce the many divergent approaches that still exist in the field, unifying criteria about the process in order to obtain a quality software product.

The process refers to the methodology used to obtain the product. Process should not be confused with software engineering, since the latter uses, in addition to the methodology, the available technology and automated tools. The methods thus underpin software development by technically defining the steps to follow, which are usually: requirements analysis, design, coding, implementation and testing and, finally, maintenance.

The process is usually studied from two points of view that are common in many software engineering textbooks: first, an overview of the different types of processes (also called models or paradigms), followed by a description of the common phases that make up the development process regardless of its type [77,78]. Both the models and the phases of the process are outlined below, without going into detail, so that object-oriented modeling can then be described together with the qualities that made it suitable for the research reported in this Thesis.

2.1.1 Types of processes (software engineering paradigms)

The development process can be viewed simply as what lies between a required, computer-based functionality (in analytical chemistry, this concerns instrument control and data management and analysis) and a product delivered to meet those functional requirements. Process and delivered product overlap to some extent, because the maintenance phase is carried out on the latter, as can be seen in Fig. 4, which depicts the software development process and its environment. The figure shows how methodology, tools and knowledge of computing take part in the process.

There are different types of processes, each with its own internal structure, but the starting and end points are always the same: the requirements and the product. The most important ones are discussed below.

Fig. 4. The software development process. (Diagram labels: computer-based functional requirements; development process, comprising system analysis, requirements analysis, design, coding, implementation and testing, and maintenance; methodology; tools; computer knowledge; software product.)


The first type is the Linear Sequential Model [79], which regards the process as a linear chain of phases: analysis of the system in which the product will be placed, requirements analysis and specification, design, coding, implementation and testing, and maintenance. This model has been widely criticized, and its effectiveness questioned, because of several weaknesses [80,81]. Notable among them are the impossibility of strictly specifying requirements at an early stage of development, the possible disconnection between the tasks carried out by different team members, and the difficulty of following a linear path in developing the product.

The model known as Rapid Application Development (RAD) [82,83] is a variant of the Linear Sequential Model intended for processes that, while following a linear path, aim to develop a product under extreme time constraints. RAD is based on building software from components. It also uses fourth-generation tools that automatically generate code from the specifications.

The next paradigm to consider is the Prototyping Model [84], based on quick phases of requirements analysis (the clearest requirements are identified) and design (focused on the aspects visible to the end user or client). These two stages lead to the construction of a prototype intended to reveal new functional requirements that were not apparent in a first general analysis and that the developer will now understand more clearly. One drawback of this model is that users want to start working with the prototype and do not understand the inefficiency it usually exhibits. In addition, in building the prototype the developer may use resources (technology, data structures, etc.) that were not the most suitable but were the most readily available or allowed the shortest development time. The software engineer must apply rigorous criteria in selecting suitable resources.


A third paradigm is the Incremental Model [85]. Like the prototyping model and other evolutionary models, it has the advantage of actively involving the client. This is due to its iteration-based structure: each iteration is a linear development subprocess that produces a "software increment" used to refine the requirements. Unlike the prototyping model, the incremental model delivers in each iteration a product that is ready to be operational. Besides being useful for risk evaluation, it adapts well to situations in which the appropriate number of developers is not available.

The Spiral Model [86] is also a process that interacts with the client. It is based on increments of ever greater scope, starting even with a paper sketch in the first iteration and arriving at large-scale systems in the last versions. The model is divided into task regions (between 3 and 6 are distinguished), which form part of every version. The spiral model applies not only to the development phase; it can be regarded as a working procedure that underpins the whole software life cycle (it is the type that gives most weight to the maintenance phase). It also uses prototyping as a means of risk evaluation. This paradigm has drawbacks too, among them the difficulty of convincing clients that the process can be controlled and the skill required for risk evaluation.

2.1.2 Common phases in the software development process

A number of stages are common to all the models discussed above: requirements analysis and specification, system design, coding and implementation of the solution built, testing and, finally, maintenance of the software product. The most relevant ones are discussed below.

Requirements Analysis and Specification establishes the needs to be met regarding the functionality of the software, its performance, its interfaces with the environment and its constraints [87]. Although it may seem an undemanding task, it is of critical importance in the development process. In discussing the quality of analytical information, it was noted that meeting the requirements of the client is the only way to achieve it. A parallel can be drawn with the quality of a software product: even if the other phases of the process have been carried out correctly, if the requirements analysis fails to capture the needs of the client, the quality of the product is poor [88,89].

Five activities are usually distinguished in requirements analysis: problem recognition, evaluation and synthesis, modeling, specification and, finally, review. Modeling is of the utmost importance, since it produces a set of models that describe the requirements of the client, establish the basis for design and, finally, define a set of requirements that can be validated once the software has been built.

Two types of analysis modeling have prevailed over others in recent times: structured analysis [90,91] and object-oriented analysis [92-94]. This section discusses the most relevant features of the former; object-oriented analysis is introduced later.

Structured analysis has four main elements: the data dictionary, the entity-relationship diagram, the data flow diagram and the state transition diagram. The first two belong to data modeling, the third to functional modeling, and the last to modeling the behavior of the system. The data dictionary, a term introduced by DeMarco [90], collects the definitions of the data structures that enter the system, those generated or consumed while the information is processed, and those that form the output of the program.

Data modeling describes the groupings of data present in the information domain of the system and the relationships among them. Entity-relationship diagrams, first proposed by Chen in 1977 [95], are the main tool in data modeling. The intention of this author was to represent the conceptual level of a database.

The flow of information through the system is represented by the data flow diagram. It describes, at a given level of detail, the path followed by the data, the entities that operate on them, the transformations they undergo and the places where they are stored. Data flow diagrams are refined to show a greater level of detail.

Data flow diagrams were originally conceived for discrete information processing systems. The demands imposed by real-time systems (in analytical chemistry, all systems for the automation and control of instruments), however, made it necessary to extend them [96,97]. The extensions take into account the continuous collection or production of information, the control flow required and the parallelism of tasks.

To model one of the most important requirements of a system, its behavior, further extensions to structured analysis were introduced [96,97]. This modeling takes the form of a state transition diagram showing the states of a system and the events that cause it to change from one state to another.
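A state transition diagram can be written down directly as a transition function. The sketch below is a hypothetical example for a laboratory instrument; the states, the events and the rule that disallowed events are ignored are assumptions made for illustration, not part of the systems described in this work.

```java
/** Hypothetical states of an instrument controlled from the computer. */
enum InstrumentState { DISCONNECTED, IDLE, MEASURING }

/** Hypothetical events that may cause a change of state. */
enum InstrumentEvent { CONNECT, START, FINISH, DISCONNECT }

class StateTransitionDemo {
    /** Returns the state reached from state s when event e occurs. */
    static InstrumentState next(InstrumentState s, InstrumentEvent e) {
        switch (s) {
            case DISCONNECTED:
                if (e == InstrumentEvent.CONNECT) return InstrumentState.IDLE;
                break;
            case IDLE:
                if (e == InstrumentEvent.START) return InstrumentState.MEASURING;
                if (e == InstrumentEvent.DISCONNECT) return InstrumentState.DISCONNECTED;
                break;
            case MEASURING:
                if (e == InstrumentEvent.FINISH) return InstrumentState.IDLE;
                break;
        }
        // Assumption: an event that is not allowed in the current state is ignored.
        return s;
    }

    public static void main(String[] args) {
        InstrumentState s = InstrumentState.DISCONNECTED;
        InstrumentEvent[] events = {
            InstrumentEvent.CONNECT, InstrumentEvent.START, InstrumentEvent.FINISH
        };
        for (InstrumentEvent e : events) {
            s = next(s, e);
            System.out.println(e + " -> " + s);
        }
    }
}
```

Each `case` corresponds to one node of the diagram and each `if` to one labeled arrow leaving it.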

The next phase of the process is Design. In 1959, Taylor defined it as "the process of applying various techniques and principles for the purpose of defining a device, a process or a system in sufficient detail to permit its physical realization" [98]. Together with coding and testing, design constitutes the technical phase of the software development process.

From a more practical standpoint, the design phase can be seen as the set of activities that transform the analysis models into design models. Design trends are therefore set by those of analysis: structured design [99-101] and object-oriented design [102,103]. Structured design is discussed briefly here; object-oriented design and its contributions are introduced later, together with object-oriented analysis.

Design is approached systematically, so concepts such as abstraction, refinement, modularity and structural partitioning play a very important role. Modularity can be regarded as a practice accepted in all branches of engineering that makes it easier to adapt the system to change and allows development in parallel. It is based on the independence of each module from the others, and the resulting coupling between modules is of vital importance for the structure of the program.

Design is divided into four parts: data design (which yields the data structures needed to implement the software); architectural design (which defines the structural parts of the software); interface design (which describes how the structural parts communicate with one another and with the environment); and procedural design (which describes the structural elements of the program procedurally) [76]. Many design methods have been described for each activity; going into them, even in minimal detail, is beyond the scope of this introduction.

The Coding and Implementation phase is the physical translation of the design models into the program or prototype, depending on the paradigm and the state of the development process. Many programming languages can be used at these stages and, as in the previous ones, a distinction can be made between structured languages [104,105] and object-oriented languages [106,107].

Testing of the software product, to continue the parallel with company quality programs, corresponds to the quality assurance activities, that is, quality control and the subsequent corrective actions. It constitutes the final evaluation of the analysis, design, coding and implementation stages. Many authors have addressed the testing phase [108-112], dealing with testing principles, the design of test cases, etc.


2.2 Object-oriented modeling

The term object-oriented was introduced in the late 1960s and early 1970s, although the object-oriented paradigm did not come into substantial use in software engineering until the first half of the 1990s [103,113-116]. The concept appeared with the use of object-oriented languages (the first being Smalltalk [106]) and initially did not extend to the application development process.

2.2.1 Basic concepts of object-oriented modeling and programming languages

The real world can be seen as a space full of objects. Every object has a set of properties or attributes (attribute is the more common term) that characterizes it and distinguishes it from the rest. Taking automation in analytical chemistry as an example, an instrument is an object with certain properties, such as make, model and communication port. In addition, an instrument can be connected to and disconnected from the computer, have a series of operating parameters set, etc. In other words, an object performs a series of operations (methods is the term used).

A Class is an abstraction of a set of objects that share the same structure (attributes) and the same behavior (methods). Thus, the class Instrumento would cover objects such as a spectrophotometer, a spectrofluorimeter, a pH meter, etc. The term instance (of the class) is also often used instead of object.
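The relation between class, attributes, methods and instances can be sketched in Java as follows. Only the class name Instrumento comes from the text; the field names, the method names and the sample values are illustrative assumptions.

```java
/** Sketch of the class Instrumento; field and method names are assumptions. */
class Instrumento {
    // Attributes: the structure shared by every instance of the class.
    private final String brand;
    private final String model;
    private final String port;
    private boolean connected = false;

    Instrumento(String brand, String model, String port) {
        this.brand = brand;
        this.model = model;
        this.port = port;
    }

    // Methods: the behavior shared by every instance of the class.
    void connect() { connected = true; }
    void disconnect() { connected = false; }
    boolean isConnected() { return connected; }
    String describe() { return brand + " " + model + " on " + port; }
}

class ClassAndInstanceDemo {
    public static void main(String[] args) {
        // Two objects (instances) of the same class, each with its own attribute values.
        Instrumento phMeter = new Instrumento("BrandA", "PH-1", "COM1");
        Instrumento spectro = new Instrumento("BrandB", "UV-2", "COM2");
        phMeter.connect();
        System.out.println(phMeter.describe() + " connected=" + phMeter.isConnected());
        System.out.println(spectro.describe() + " connected=" + spectro.isConnected());
    }
}
```

Connecting one instance leaves the other untouched, since each object keeps its own state while sharing the definition given by the class.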

The fact that one class can inherit the attributes and methods of another is known as Inheritance, one of the key features of the object-oriented paradigm. A hierarchy is established in which the inheriting class is called the child class or subclass of the parent class or superclass. Besides inheriting the properties and behavior of another class, a class can add new attributes and methods; this is known as Specialization, and it allows classes at the same level of a hierarchy to be told apart.

Thus, a class named InstrumentoOptico inherits the attributes and methods of the class Instrumento, but to specialize its functionality it must add new properties, such as the type of optical instrument (to know whether it corresponds to a molecular or an atomic technique, whether it works by absorption or emission, etc.), and new methods, such as the one used to set the wavelength.
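A minimal Java sketch of this inheritance and specialization follows. The class names Instrumento and InstrumentoOptico come from the text; the fields `model`, `opticalType` and `wavelengthNm` and the method names are assumptions chosen for illustration.

```java
/** Parent class (superclass); its single attribute is an illustrative assumption. */
class Instrumento {
    protected final String model;

    Instrumento(String model) { this.model = model; }

    String describe() { return "Instrument " + model; }
}

/** Child class (subclass): inherits from Instrumento and specializes it. */
class InstrumentoOptico extends Instrumento {
    // New attributes added by specialization.
    private final String opticalType;   // e.g. "molecular absorption"
    private int wavelengthNm;

    InstrumentoOptico(String model, String opticalType) {
        super(model);                    // reuse the initialization of the parent class
        this.opticalType = opticalType;
    }

    // New method added by specialization.
    void setWavelength(int nm) {
        if (nm <= 0) throw new IllegalArgumentException("wavelength must be positive");
        this.wavelengthNm = nm;
    }

    int getWavelength() { return wavelengthNm; }
    String getOpticalType() { return opticalType; }
}

class InheritanceDemo {
    public static void main(String[] args) {
        InstrumentoOptico spectro = new InstrumentoOptico("UV-2", "molecular absorption");
        spectro.setWavelength(540);
        // describe() is inherited; setWavelength() exists only in the subclass.
        System.out.println(spectro.describe() + ", " + spectro.getOpticalType()
            + ", " + spectro.getWavelength() + " nm");
    }
}
```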

To obtain the properties of one object that are needed for the functionality of another, the latter must use messages as the means of communicating with the former. Messages trigger the behavior of the receiving object; the message must indicate the method requested and the parameters required for the operation to be carried out correctly. In this way an object hides its data and its methods, and only objects with permission can reach the information it holds, through the communication just described. This feature is known as Encapsulation.
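In Java, a message corresponds to a method call and the permissions to access modifiers. The following sketch assumes a hypothetical `Controller` object that needs data from an Instrumento; the fields, the method names and the placeholder reading are illustrative only.

```java
/** Receiver object: its data and internal method are hidden behind public methods. */
class Instrumento {
    private final String port = "COM1";   // hidden data
    private int lastReading = 0;          // hidden data

    // Public method: other objects may request it through a message.
    public String getPort() { return port; }

    // Public method whose message carries a parameter.
    public int measure(int repetitions) {
        int sum = 0;
        for (int i = 0; i < repetitions; i++) {
            sum += acquire();
        }
        lastReading = sum;
        return lastReading;
    }

    // Private method: an implementation detail no other class can call.
    private int acquire() { return 7; }   // placeholder value, not a real acquisition
}

/** Sender object: reaches the data of the instrument only through messages. */
class Controller {
    String report(Instrumento ins) {
        // Each call names the method requested and supplies its parameters.
        return "port=" + ins.getPort() + ", reading=" + ins.measure(3);
    }
}

class EncapsulationDemo {
    public static void main(String[] args) {
        System.out.println(new Controller().report(new Instrumento()));
    }
}
```

Writing `ins.port` or `ins.acquire()` inside `Controller` would be rejected by the compiler, which is how the language enforces the hiding described above.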

Polymorphism is another feature of the object-oriented paradigm. It can be defined as the different functionality exhibited by the same methods in different objects belonging to the same class, or even within a single object (in which case it is called method overloading). The first form can be achieved through inheritance: a class declares certain methods but does not implement them itself; the child classes do, adapting them to their own characteristics. In some programming languages (for example, Java [107]) such methods are known as abstract methods.

The other form, method overloading, relies on differences in the parameters of the message sent to the receiving object, either in the type or in the number of the data items. The receiver uses the method appropriate to the purpose required.
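Both forms of polymorphism can be sketched in Java. In the example below, the abstract method `measure` is implemented differently by two hypothetical subclasses, and the method `configure` is overloaded by parameter type and number; all names other than Instrumento are assumptions made for illustration.

```java
/** Parent class declaring an abstract method and three overloaded methods. */
abstract class Instrumento {
    // Abstract method: declared here, implemented by each child class.
    abstract String measure();

    // Overloaded methods: same name, different parameter type or number.
    String configure(int value) { return "int parameter " + value; }
    String configure(double value) { return "double parameter " + value; }
    String configure(int value, String unit) { return value + " " + unit; }
}

class Spectrophotometer extends Instrumento {
    @Override
    String measure() { return "absorbance"; }
}

class PhMeter extends Instrumento {
    @Override
    String measure() { return "pH"; }
}

class PolymorphismDemo {
    public static void main(String[] args) {
        Instrumento[] instruments = { new Spectrophotometer(), new PhMeter() };
        for (Instrumento ins : instruments) {
            // The same message produces a behavior that depends on the receiver.
            System.out.println(ins.measure());
        }
        Instrumento ins = instruments[0];
        // The compiler selects the overload from the argument types and count.
        System.out.println(ins.configure(5));
        System.out.println(ins.configure(5.0));
        System.out.println(ins.configure(540, "nm"));
    }
}
```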


2.2.2 Immediate advantages of object-oriented systems

Although the concept of object was first used in the world of programming languages (Smalltalk), the philosophy and features implicit in it made it worth bringing into the software development process. The most salient features are discussed below.

Object-oriented modeling fits evolutionary paradigms, which are the ones gaining ground in software product development, more efficiently than structured modeling does. Some authors even define a new development paradigm, the Component Assembly Model [117], based on object-oriented modeling. In the first iteration of the evolutionary process, object-oriented analysis and design look for the classes needed in class libraries and, if they are not found there, new ones are built. A new iteration then starts from these components, which, as discussed below, adapt more efficiently to the changes and extensions required in later iterations.

Object-oriented modeling therefore allows software reuse, for which Yourdon [118] reports the following figures: a 70 % reduction in software development cycle time, an 84 % reduction in project cost, and a productivity index of 26.2, compared with the industry norm of 16.9.

The desirable properties of a software design are the same for classical (structured) methods as for object-oriented ones. One of them is the efficiency achieved in the modularity of programs. As seen above, the class concept not only represents real-world objects but also encapsulates attributes and methods within the class. It can be regarded as an effective way of abstracting data and methods, since the class hides the whole data structure and the implementation of the methods.



This approach has an obvious major advantage. Classes supply the data needed by other components (other classes) through the methods that are permitted to do so (hence the distinction between private and public methods), and these act as an interface. The likelihood of side effects drops drastically. Errors are located quickly and are confined to a particular class, so once the class has been modified and the problem removed, the architecture of the program is unaffected. Coupling between components, another of the properties of concern, is kept to a minimum in a well-designed and well-built system of classes. Cohesion is also assured if the amount of data handled by each method is small, which is the usual situation.

Object-oriented systems are also more versatile than those obtained by conventional methods. This follows from the encapsulation of methods, which can be regarded as modules in the conventional sense: each offers a specific representation of the behavior of an object, while other methods provide other behaviors for different functions. The advantage is that all this functionality resides in a single, simple unit called a class.

The reusability of object-oriented systems has so far been considered from an architectural point of view, through the use of classes already designed and built. From the coding point of view, inheritance also supports reuse, since the code implementing inherited attributes and methods does not have to be rewritten. Moreover, introducing modifications is easier because corrections made in the parent classes of the hierarchy propagate automatically.

Polymorphism makes it easier to extend programs without modifying their control structures, which is a considerable advantage over structured modeling. An example illustrates this. Consider a control structure used in a chemical automation program in which, if a time premise is met (the time at which the instrument must perform a given operation), the class Instrumento receives a message to carry out the method ejecutar of the class Accion. The structure is as follows:

/* Loop over the list of instruments that make up an autoanalyzer */

for (int i=0; i
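A minimal sketch of the kind of control structure described above, under stated assumptions. The text names only the classes `Instrumento` and `Accion` and the method `ejecutar`; the subclasses, the fields and the helpers `comprobar` and `paso` are hypothetical. The sketch uses current Java syntax.

```java
import java.util.ArrayList;
import java.util.List;

// Accion and ejecutar are named in the text; everything else is assumed.
abstract class Accion {
    abstract String ejecutar();
}

// Hypothetical subclasses: each one redefines ejecutar.
class AbrirValvula extends Accion {
    @Override
    String ejecutar() { return "valve opened"; }
}

class ArrancarBomba extends Accion {
    @Override
    String ejecutar() { return "pump started"; }
}

class Instrumento {
    private final long tiempo;   // time (s) at which the action is due
    private final Accion accion;

    Instrumento(long tiempo, Accion accion) {
        this.tiempo = tiempo;
        this.accion = accion;
    }

    // Returns the result of the action if the time premise holds, else null.
    String comprobar(long ahora) {
        return (ahora == tiempo) ? accion.ejecutar() : null;
    }
}

class PolimorfismoDemo {
    // Walks the list of instruments of an autoanalyzer at a given time.
    static List<String> paso(List<Instrumento> instrumentos, long ahora) {
        List<String> resultados = new ArrayList<>();
        for (int i = 0; i < instrumentos.size(); i++) {
            String r = instrumentos.get(i).comprobar(ahora);
            if (r != null) {
                resultados.add(r);
            }
        }
        return resultados;
    }

    public static void main(String[] args) {
        List<Instrumento> lista = new ArrayList<>();
        lista.add(new Instrumento(5, new AbrirValvula()));
        lista.add(new Instrumento(5, new ArrancarBomba()));
        lista.add(new Instrumento(9, new AbrirValvula()));
        System.out.println(paso(lista, 5));
    }
}
```

Adding a new kind of action only requires a new subclass of `Accion`; the loop in `paso` stays unchanged, which is the advantage attributed to polymorphism.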


in 1994, and entailed a lack of standardization in program development. Moreover, these methods covered only partial areas, so software engineers rarely found a method that allowed them to model every aspect of an application.

A series of methods then emerged that aimed to widen the field of application of their respective modeling languages rather than restrict it to specific needs. The most prominent were the OOSE method (Object-Oriented Software Engineering) proposed by Jacobson [103], the OMT method (Object Modeling Technique) proposed by Rumbaugh [119], and the Booch method [92]. Although they were regarded as complete methods, each had its own strengths and weaknesses.

Each of the three main modeling methods evolved by adopting ideas and concepts from the other two, so the effort to synthesize a unified language became increasingly necessary. In addition, collaboration among the creators of the three methods led to an overall improvement in object-oriented modeling. Thus, work began in 1994 at Rational Software that culminated in 1997 with the publication of version 1.0 of UML [120-122], which was submitted to the OMG (Object Management Group) for consideration. For this version and the subsequent review, the authors had the collaboration of many organizations in the computing field, including IBM, Hewlett-Packard, Microsoft and Oracle.

2.3.2 Elements of the UML language

UML has four kinds of elements: structural, behavioral, grouping and annotational.

The first structural element is the class, which describes a set of objects that share the same attributes, operations and relationships in the model being developed. Another structural element is the interface, which describes a set (total or partial) of operations of a class or component from the point of view of its external functionality rather than its implementation. A collaboration is a structural element that defines an interaction among elements that work together to achieve a behavior that no individual element could achieve. A use case describes a set of sequences of actions that the system performs and that yield an observable result of interest to a particular actor. A use case is said to be realized by a collaboration. Three further structural elements of UML are the active class, the component and the node. They are similar to the concept of class but introduce particular features that justify their use.

Behavioral elements describe the dynamic part of models developed with UML, which is why they are considered the verbs of the language. They comprise the interaction, which describes a set of messages exchanged among a group of objects to achieve a specific goal, and the state machine, which captures the sequence of states that an object or an interaction goes through in response to events.

Grouping elements are the organizational parts of models. UML uses only the package, which can be regarded as an element that encloses structural elements, behavioral elements and even other grouping elements.

Finally, annotational elements are comments added to the model to make it easier to understand. The note is the basic annotational element for expressing explanations and constraints.

2.3.3 Relationships in the UML language

Four kinds of relationships are used in UML to build models by linking elements to one another: dependency, association, generalization and realization.

A dependency is a relationship between two elements in which a change to one element (the independent one) affects the semantics of the other (the dependent one). An association is a structural relationship that describes the connections between objects. Aggregation is a special kind of association that represents the relationship between a whole and its parts. Generalization (the way inheritance is represented in UML) describes a relationship in which a specialized element (the child) can be substituted for a general element (the parent). Finally, realization is a semantic relationship in which one element specifies a contract that another element guarantees to fulfill.
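These relationships have rough counterparts in Java code. The sketch below shows only one common mapping, with hypothetical class names; UML itself does not prescribe it.

```java
import java.util.ArrayList;
import java.util.List;

// Realization: the interface specifies a contract that Detector fulfills.
interface Medible {
    double medir();
}

// Generalization: Detector is the general (parent) element.
class Detector implements Medible {
    public double medir() { return 0.0; }
}

// The specialized (child) element can be used wherever a Detector is expected.
class Fotometro extends Detector {
    @Override
    public double medir() { return 0.42; }
}

class Informe {
    private final List<Double> valores = new ArrayList<>();

    void anotar(double v) { valores.add(v); }

    List<Double> valores() { return valores; }
}

// Aggregation: an analyzer is a whole made up of detector parts.
class Analizador {
    private final List<Detector> detectores = new ArrayList<>();

    void agregar(Detector d) { detectores.add(d); }

    int numeroDeDetectores() { return detectores.size(); }

    // Dependency: Analizador uses Informe only as a parameter, so a change
    // to Informe may affect this method.
    void volcar(Informe informe) {
        for (Detector d : detectores) {
            informe.anotar(d.medir());
        }
    }
}

// Association: an operator holds a structural link to the analyzer in use.
class Operador {
    private final Analizador analizador;

    Operador(Analizador analizador) { this.analizador = analizador; }

    Analizador analizador() { return analizador; }
}
```

The list inside `Analizador` accepts a `Fotometro` without any change, which is the substitution that generalization describes.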

2.3.4 Diagrams in the UML language

A diagram is the graphical representation of a set of elements and relationships that capture part of the structure, behavior and architecture of the system. Structural aspects are captured in the class diagram and the object diagram. The former describes the classes, interfaces and collaborations, together with the relationships among them. The latter represents a snapshot of the elements and relationships found in the class diagram.

The behavior of the system is modeled with several kinds of diagrams. The use case diagram represents the relationships between the actions of the system and the actors involved. The sequence diagram and the collaboration diagram are used to model the interactions among a group of objects that exchange a series of messages, which is why they are also called interaction diagrams. Sequence diagrams emphasize the time ordering of the messages, whereas collaboration diagrams emphasize the structural organization of the objects that send and receive them.

Other diagrams model the dynamics of the system from the point of view of the flow of control: the state diagram and the activity diagram. Both show the states that a system can go through and the activities and events that trigger the transition from one state to another.

The architecture of the system is modeled by means of the component diagram and the deployment diagram. The former represents the organization of, and dependencies among, a set of components, and is related to class diagrams in that a component comprises one or more classes, interfaces or collaborations. The deployment diagram shows the configuration of the run-time processing nodes and the components that reside on them.

3. The Java programming language

3.1 Development of the Java programming language

Java is a high-level programming language, like Pascal, Basic and C++, which are also available in graphical programming environments such as Visual Basic, Delphi and Visual C++. The first version of Java appeared in the summer of 1995 [123,124] as an attempt to overcome the drawbacks caused by the dependence of code on the microprocessor and operating system of the computer on which it was to run. The small-footprint software embedded in low-cost electronic devices, such as household appliances, calculators and sensors, had to be updated every time the microprocessor of these devices changed. This happens even with high-level languages such as C.

In the early 1990s, Sun Microsystems [125] set itself the goal of developing a new programming language that would adapt to the execution environment without requiring any change to the code. In other words, the goal was full portability of the language to be built. This was the beginning of the development of a new language which, despite the importance that code portability has today, would probably not have reached its current level of popularity had it not been born in what is already known as the “information age”. The “encounter” between Java and the Internet, the main symbol and tool of this age, was therefore decisive.


Gosling, who led the Sun team working on the new language, saw the chance to demonstrate its full portability by running a program inside a Web page, that is, by running a program from any point on the network. For this purpose he needed a browser capable of running Java on the Internet, which led to the creation of the HotJava browser.

From this point on, the features emerged that established Java as the standard language of the network.

3.2 General features of Java

3.2.1 Simplicity

Java is a simple language by design. From languages such as C and C++ it has taken the syntax, design and safety features that have made them some of the most widely used languages in programming today. Java goes beyond these languages by removing the features that are hardest for the programmer, which considerably reduces the number of errors. For example, structure types (struct) and type definitions (typedef) are not present in Java.

Most importantly, Java has removed the explicit pointers found in C and C++ and, with them, pointer arithmetic, one of the most troublesome aspects of those languages. Java manages memory automatically and frees it through the so-called garbage collector [126, 127].

3.2.2 Robustness

Java incorporates a series of measures and checks that prevent unexpected errors, which is why it is considered a robust language (in fact, it was originally going to be called Oak, alluding to its robustness). Checks are performed both at compile time and at run time to detect possible problems.

The bytecodes are also verified. Bytecodes are the code obtained by compiling a Java program; that is, they are not machine code that the hardware can understand directly. Bytecodes are the input to the Java Virtual Machine (JVM), which is discussed later.

Automatic memory control, which eliminates overflow errors, also contributes to the robustness of the language.
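A small sketch of one such run-time check: an out-of-range array index is detected by the JVM and reported as an exception instead of reading or overwriting unrelated memory. The class, method and variable names are illustrative.

```java
class RobustezDemo {
    // Reads a position from an array; the JVM checks the index at run time.
    static String leer(int[] datos, int indice) {
        try {
            return "value " + datos[indice];
        } catch (ArrayIndexOutOfBoundsException e) {
            // The faulty access is intercepted before any memory is touched.
            return "index " + indice + " rejected";
        }
    }

    public static void main(String[] args) {
        int[] absorbancias = {10, 20, 30};
        System.out.println(leer(absorbancias, 1));
        System.out.println(leer(absorbancias, 7));
    }
}
```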

3.2.3 Portability and independence

The portability and architecture independence of Java derive from the fact that the language is both compiled and interpreted; that is, it uses both ways of translating source code (from a high-level language to machine code).

Compiling a Java program yields object code made up of bytecodes, which can be thought of as carrying roughly 80% of the program's translation toward machine code. This code is generic; to adapt it to a given platform (a platform being the combination of a processor and an operating system), the JVM interprets it and supplies the remaining 20% or so.

The JVM, which is responsible for interpreting the bytecodes into machine language, adapts the execution of a given application to the hardware of the platform in question. The JVM is therefore platform-dependent and is the only platform-specific element needed to run a Java program, regardless of whether the program was compiled on another machine with a different operating system or microprocessor [128].

The JVM is available in the Java architecture (Java 2 Platform, Standard Edition) for Solaris 2.x, SunOS 4.1.x, Microsoft Windows (95, 98, 2000, NT, Server and XP), Linux, IRIX, AIX, Apple systems, etc. It has also been built into Web browsers (Microsoft includes it in version 5.0 of Internet Explorer), database management systems, virtual servers, etc.

3.2.4 Distributed language and support for multithreaded programming

Java is designed for developing applications that run on different networked machines. It therefore provides a set of libraries that facilitate interconnection via TCP/IP, so that applications can access information on the network in much the same way as they access the secondary storage of a local machine [129].

An application written in Java can carry out several activities simultaneously. This property gives better interactive performance and facilitates the development of applications for real-time device control [130].
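A minimal sketch of several simultaneous activities inside one application, using `java.lang.Thread`. The "devices" and their "readings" are hypothetical stand-ins, and the code uses current Java syntax rather than the syntax available when this text was written.

```java
import java.util.concurrent.atomic.AtomicInteger;

class MultihiloDemo {
    // Starts one thread per (hypothetical) device and waits for all of them.
    static int ejecutar(int dispositivos, int lecturasPorDispositivo) {
        AtomicInteger total = new AtomicInteger(0); // shared, thread-safe counter
        Thread[] hilos = new Thread[dispositivos];
        for (int i = 0; i < dispositivos; i++) {
            hilos[i] = new Thread(() -> {
                for (int j = 0; j < lecturasPorDispositivo; j++) {
                    total.incrementAndGet(); // stands in for one device reading
                }
            });
            hilos[i].start();
        }
        for (Thread h : hilos) {
            try {
                h.join(); // wait until this activity has finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return total.get();
    }

    public static void main(String[] args) {
        System.out.println(ejecutar(3, 1000));
    }
}
```

The counter is an `AtomicInteger` because a plain `int` updated from several threads could lose increments.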

3.2.5 Object-oriented and free language

First, Java is an object-oriented language, both at the architectural level, since it provides the most important properties of object systems (inheritance, encapsulation and polymorphism), and at the functional level, since data are, with the exception of primitive types, handled as objects.

Second, the various tools for developing Java applications are free of charge. All the platforms are available at http://java.sun.com.

3.3 Java applications

The various Java platforms and application programming interfaces (APIs) offer a wide range of possibilities for building programs of different kinds. For instance, one can develop standalone solutions, which are applications with no special requirements (they do not access a server, use resources on remote machines, etc.) that run on a single computer. The libraries and utilities included in the standard Java platform (Java 2 Platform, Standard Edition, J2SE) are sufficient to develop them.

More complex applications are those based on a multitier architecture. For example, three-tier information systems (three-tiered Enterprise Information Systems) developed with Java technology use the Java 2 Platform, Enterprise Edition (J2EE), and are therefore also called J2EE applications [131,132].

The first tier is the client tier, made up of various components that run on the machines that access a Web server remotely. A J2EE component is a software unit with a given functionality that is assembled with the other components of the J2EE application through their respective classes.

The components of the client tier can be grouped into three blocks: Web clients, applets and application clients. Web clients consist of dynamic Web pages, written in the markup languages HTML or XML (eXtensible Markup Language), and Web browsers, which allow information to be searched for and displayed. Applets are small client applications written in Java that are executed by the JVM of the browser. Finally, application clients are interfaces usually built with the Swing or AWT packages of Java to provide a richer graphical interface.

The second tier is the Web server, which is made up of servlets and JSP (JavaServer Pages) pages. Servlets are Java classes that dynamically process requests to the server and build the responses to the clients. JSP pages offer the same functionality as servlets but differ from them in that they are structured as Web pages.

The three-tier architecture has been revisited by some authors who, looking at the domain of the information handled by the J2EE application, distinguish four tiers and argue that the three-tier architecture refers only to the location of the components. In the new structure proposed for J2EE applications, the server tier is therefore split into the Web tier and the business tier.

The Web tier is made up of the components of the former server tier, that is, servlets and JSP pages. The business tier is made up of Enterprise JavaBeans (EJB), which are the interface between the Web tier (requests and responses between client and server) and the last tier, the information system tier. There are three kinds of EJB: session beans, entity beans and message-driven beans.

The information system tier includes all the structures that support the storage of the company's information; its core is therefore the database. The J2EE components of this tier are those that connect the Web and business tiers to the database.

To conclude, other platforms offered by the Java architecture for developing applications in other fields are worth mentioning. J2ME (Java 2 Platform, Micro Edition) provides the utilities needed to develop software for electronic devices such as mobile phones, calculators, etc. Various APIs are also available for developing Web services, which are applications that exchange data using Java and XML. The JAXB, SAX and JDOM interfaces are useful for binding Java components to XML components [133,134].

4. Information as a resource. Oracle. DBMSs based on the object-relational data model

The proper storage, maintenance and querying of analytical information rely on databases. Automatic data management has always relied on the characteristics of secondary (non-volatile) memory and on a technology that has progressed from file systems to database management systems, the latter based on advanced models for representing information. This section considers the advantages offered by the Oracle database management system, which is the information management tool used to develop and operate the information system that has been designed and built and that is described in this Thesis.

4.1 From flat-file systems to database systems

4.1.1 Evolution

The functional demands placed on information-processing applications have set the course of data access technology. The first systems aimed to meet requirements that were usually related to repetitive administrative tasks, seeking to reduce the time and space required by paper-based data management. When companies came to regard information as their most important business resource, the technology evolved toward the development of databases. Today, databases are the pillar of information systems.

File-based storage systems hold information in files that contain a collection of records with various data fields. Their main limitation when it comes to synthesizing information is that different records cannot be related without complex external applications. In addition, as will be seen later, they have a series of drawbacks that make them unsuitable for the computing environment of today's society.

Classical storage systems, as file structures are known, also underwent an internal evolution. Its most important milestone was the shift from sequential access to direct access to data. To query or supply the data of a given record in a file, it was no longer necessary to read or sort the records one by one, which saved a significant amount of time. Random access to data was provided by files with an indexed structure, supported by the ISAM method (Indexed Sequential Access Method) [135].
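The difference between sequential and direct access can be sketched with `java.io.RandomAccessFile`. The fixed-length record layout (an `int` identifier followed by a `double` value) is an assumption made for this example. The sketch shows direct access by record position only; an ISAM file would add an index that maps keys to positions.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

class AccesoDirectoDemo {
    // Assumed layout: one int (4 bytes) plus one double (8 bytes) per record.
    static final int TAMANO_REGISTRO = 12;

    // Writes n records to a temporary file and reads back the value of record k.
    static double demo(int n, int k) {
        try {
            File f = File.createTempFile("registros", ".dat");
            f.deleteOnExit();
            try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
                for (int i = 0; i < n; i++) {
                    raf.writeInt(i);          // record identifier
                    raf.writeDouble(i * 1.5); // stored value
                }
                // Direct access: jump straight to record k without reading
                // the k records that precede it.
                raf.seek((long) k * TAMANO_REGISTRO);
                raf.readInt();                // skip the identifier field
                return raf.readDouble();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(100, 42));
    }
}
```

With sequential access, reaching record 42 would require reading records 0 to 41 first; here the position is computed from the record length.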

Classical storage systems were widely used in the first generation of computers (the 1950s and early 1960s). These systems provided only a partial solution (for the reasons given above and the drawbacks described in the next section), so the concept of database appeared in the mid-1960s [136]. One of the many definitions of database found in the literature is the following: “A collection of related files that store both an abstract representation of the domain of a real-world problem whose handling is of interest to an organization, and the data corresponding to the information about it. Both the representation and the data are subject to a series of constraints, which are part of the problem domain and whose description is also stored in those files” [137].

Database technology has also evolved, in this case driven by the different models that represent the structure of the database at different levels, from the physical level to the external level, by way of the conceptual and logical levels [138]. Thus there are hierarchical, network, relational, object-relational, object-oriented and knowledge-based databases. The first two models, which rely on the concept of pointer, depended heavily on the physical levels; this remained a drawback and prompted the search for new models and the evolution of the technology.

4.1.2 Advantages and drawbacks of database systems compared with file-based systems

As noted above, the difficulty of representing relationships between data is the main drawback of classical files. Databases arise from the concept of the Data-Oriented System, leaving behind Process-Oriented Systems. The former places the relationships between data in the definition of the structure and behavior, and not in the external applications that handle the data.

Independence between data and processing is vital so that the inherent dynamism of the surrounding environment does not affect them. Databases aim to ensure that adding new information does not modify the processing of the information and, conversely, that changes to the processing do not affect the design of the data. Although this independence is not absolute, databases come close to this goal.

The integrity of information, which is essential for its use, is achieved by databases because they allow a single copy of the information representing a given item of data to be stored. This property rests on another characteristic of database systems, their minimal redundancy, which, where it exists, is always at the physical level. In the maintenance of classical systems, entering or modifying an item of data required access to several files, which could leave the stored information inconsistent.

Information is more readily available because no data file is reserved for the exclusive use of one process; instead, the data can be queried by all applications.

Storage space is reduced in database systems because there is less redundancy. However, this saving in secondary memory only becomes apparent in systems with a large volume of data, because a database system also uses a data dictionary, pointers, indexes, etc., which generally take up considerable space.

Databases perform well in human-machine interaction because data are collected and queried only once. Consequently, querying and updating a database will generally be more efficient than in flat-file systems.

Database management systems (discussed later) guarantee the security and privacy of information. One of the most important features of Oracle (one of the best-known database management systems) is the strong protection provided by its architecture [139].

Despite these advantages, database systems have some drawbacks, usually made worse by companies' lack of knowledge of the characteristics of database management systems. One of them is that deployment is long and costly and, as a consequence, the return on investment is not visible in the short term. Other problems are related to the lack of standardization (although standard tools are now available, especially for relational databases) and to the gap between an advanced theory and a practice that lags behind.

4.2 The Database Management System (DBMS)

The DBMS is the computer system that facilitates the management of databases. It comprises the set of programs, procedures, languages, interfaces, etc., that provide database users with the utilities needed to describe and handle the stored data. It also guarantees the security and privacy of the information held in the database. It can therefore be regarded as the interface between the database and the various users and applications of the information system.

Database users fall into two levels or groups, which can in turn be subdivided. The first group consists of the users who create and maintain the database and develop the programs that access the data. The second group consists of those who use the data in their activities within the company.

4.2.1 Functions and languages of DBMSs

Three main functions of a DBMS are usually distinguished: data definition, data manipulation and data control. The definition function makes it possible to specify the data elements, their relationships and the constraints imposed both by the information domain and by the physical storage structures. The language responsible for this function is the Data Definition Language (DDL), which depends on the DBMS. This language usually also specifies the space reserved for the fields, the data structures and their relationships, and the constraints. This is the physical definition, and in some DBMSs it is specified separately in another language, the Data Storage Definition Language (DSDL).

The manipulation function is carried out with the Data Manipulation Language (DML), which allows the external definition of the data, their querying and their updating, that is, their insertion, modification and deletion. The DML depends on the model used by the DBMS and may be a set of commands embedded in a programming language, called the host language, or a self-contained language that does not need to rely on any other.

Finally, the control function is carried out through a series of interfaces that ensure that users access the database correctly. This function resides in the Data Control Language (DCL).
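In SQL-based systems these three functions are commonly associated with groups of statements. The Java sketch below classifies a statement by its leading keyword. The keyword lists are a simplified assumption made for illustration, not a complete grammar, and the table and user names are invented.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class SublenguajeDemo {
    enum Sublenguaje { DDL, DML, DCL, DESCONOCIDO }

    // Simplified keyword lists (an assumption; real SQL dialects have more).
    private static final List<String> PALABRAS_DDL = Arrays.asList("CREATE", "ALTER", "DROP");
    private static final List<String> PALABRAS_DML = Arrays.asList("SELECT", "INSERT", "UPDATE", "DELETE");
    private static final List<String> PALABRAS_DCL = Arrays.asList("GRANT", "REVOKE");

    // Returns the sublanguage suggested by the first word of the statement.
    static Sublenguaje clasificar(String sentencia) {
        String primera = sentencia.trim().split("\\s+")[0].toUpperCase(Locale.ROOT);
        if (PALABRAS_DDL.contains(primera)) return Sublenguaje.DDL;
        if (PALABRAS_DML.contains(primera)) return Sublenguaje.DML;
        if (PALABRAS_DCL.contains(primera)) return Sublenguaje.DCL;
        return Sublenguaje.DESCONOCIDO;
    }

    public static void main(String[] args) {
        System.out.println(clasificar("CREATE TABLE muestra (id NUMBER)"));
        System.out.println(clasificar("UPDATE muestra SET id = 2 WHERE id = 1"));
        System.out.println(clasificar("GRANT SELECT ON muestra TO analista"));
    }
}
```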

4.2.2 Other components of the DBMS

As noted above, the DBMS consists not only of the languages mentioned but also of a series of programs, data structures, etc., that allow the data to be managed correctly. The Data Dictionary is a set of files that contain information about the data that can be stored (it is a metadatabase). This dictionary stores the logical and physical schemas of the database, as well as its subschemas.

As will be seen later, a database is represented at three levels or models of abstraction: physical, logical and external. The data dictionary therefore holds the definition of these three levels. At some point during processing, the access procedures need to link the three definitions, relying on what is known as the Rule Map, which is also stored in the data dictionary.

Another important component of the DBMS is the Database Manager, a software component that acts as the interface between the stored data and the applications that access them. It guarantees that data are stored and queried correctly, securely and efficiently.

4.2.3 Standardization in DBMSs

The standardization of databases and their management systems has been addressed by various bodies since the 1960s, but progress has been, and remains, slow for both technical and bureaucratic reasons. The aim of standardization is that switching from one commercial DBMS product to another should not require changes to the design of the database in use or to the applications that use the data. In other words, the concept of the Open System is central to these aims.

Two bodies, ISO (International Organization for Standardization) and IEC (International Electrotechnical Commission), have set up a joint committee called JTC1 (Joint Technical Committee 1) for standardization in information technology. One of the many groups of this committee, WG3, is devoted to the pursuit of open database systems. It currently works on four projects: database languages, reference models, remote data access and information resource dictionary systems [138].

CODASYL (Conference on Data Systems Languages) is a group that investigated standardization in data models and their languages and proposed a network data model, also known as CODASYL, together with its description and manipulation languages.

ANSI/X3/SPARC is the study group of the Standards Planning and Requirements Committee (SPARC), which belongs to the American National Standards Institute (ANSI). X3 is the committee that deals with computing matters. In 1972 these groups began to join forces with a view to a potential standardization of DBMSs. As in earlier attempts by other groups, the main concern was that standardizing at the wrong time might hold back progress in the field. For this reason, their first activities were studies of the components or aspects of DBMSs that most needed standardization. In 1977 a report was produced that analyzed the architecture of a DBMS and the 42 interfaces identified.

Work continued for years, with groups devoted specifically to databases being
set up within ANSI/SPARC, until a Reference Model for the standardization of
DBMSs was proposed in 1986 [140]. Both the final report and the interim ones
depart from the work of other bodies, which distinguished only between the
logical structure or level and the physical structure or level: the
ANSI/X3/SPARC architecture introduces a third level between the physical and
logical structures, the so-called conceptual level. Although they use different
terminology, other groups (the GUIDE/SHARE group of IBM users, the Club de
Banco de Datos of INRIA, etc.) have introduced similar three-level
architectures.



4.2.4 The ANSI/X3/SPARC architecture

The ANSI/X3/SPARC architecture proposes three levels of data abstraction. The
key level is the conceptual schema, from which a series of external schemas are
derived; these are the image of the data held by the users and applications
that make use of them. The internal schema, which describes the data from a
physical point of view, is also derived from the conceptual schema. Conversion
from one level to another is performed by means of mapping functions.
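
The three levels can be illustrated with a minimal sketch in Python using the
standard sqlite3 module. This is only an analogy under stated assumptions, not
part of the ANSI/X3/SPARC report: the table `sample`, the view `sample_origin`
and the index `idx_sample_origin` are hypothetical names, with the table
standing for the conceptual schema, the view for an external schema and the
index for an internal (physical) structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the logical description of the data (hypothetical schema).
conn.execute("CREATE TABLE sample (id INTEGER PRIMARY KEY, origin TEXT, ph REAL)")

# Internal level: a physical access structure that queries never name.
conn.execute("CREATE INDEX idx_sample_origin ON sample (origin)")

# External level: the image of the data seen by one application.
conn.execute("CREATE VIEW sample_origin AS SELECT id, origin FROM sample")

conn.executemany(
    "INSERT INTO sample VALUES (?, ?, ?)",
    [(1, "Cordoba", 3.4), (2, "Jerez", 3.1)],
)


def external_rows(connection):
    """Read the data through the external schema only."""
    query = "SELECT id, origin FROM sample_origin ORDER BY id"
    return connection.execute(query).fetchall()


before = external_rows(conn)
# A change at the internal level leaves the external view untouched.
conn.execute("DROP INDEX idx_sample_origin")
after = external_rows(conn)
```

Here the DBMS itself plays the role of the mapping functions: the application
that reads `sample_origin` does not need to change when the index is dropped.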

The architecture, proposed in the 1978 report, is divided into two parts: one
for data definition and another for data manipulation. It distinguishes a
series of functions, performed by both people and programs, a set of logical or
physical interfaces, and a data dictionary (also called the metadata), which is
central to this architecture. The ANSI/SPARC architecture meets the
requirements of independence and scalability among the three levels, so that
changes to the internal, conceptual and external schemas do not affect one
another.

The three-level structure has had a strong impact on current DBMSs, but the
report that proposed it was criticized for its very large number of interfaces
and for the vagueness of the data dictionary concept. As noted above, work
continued until the Reference Model (RM) was published in 1986 [140].

The RM is based on the ANSI architecture but overcomes the drawbacks of the
large number of interfaces and of the data dictionary concept (it answers the
many questions about that concept raised by the first report). Its aim is to
explain the relationship between the different components and levels of the
DBMS, which represents an attempt to lay the foundations for future
standardization.


4.3 Data models

A model is a representation of a system at a given level of detail that
captures its most important aspects. From the database standpoint, the relevant
concept is the Data Model, which can be defined as the set of semantic elements
that allow the database to be described at different levels of abstraction. In
terms of the ANSI architecture, there are internal, conceptual and external
models. Models can therefore be regarded as tools that assist in the
implementation, understanding and use of databases.

This section focuses on conceptual models, since external models are usually
based on conceptual ones and internal models are not standardized and depend on
the vendor. Conceptual models are divided into conceptual models proper and
logical models. The former describe the whole of the information to be handled
by each application from a completely machine-independent perspective. Logical
models also describe the information domain, but from a DBMS-dependent point of
view, using elements that the DBMS supports and that limit the semantic
representation of the problem.

A data model must comprise two submodels: the structural and the dynamic. The
structural submodel describes the static part of the data (entities or objects,
their attributes, their associations and their constraints), while the dynamic
submodel describes the operations that take the database from one state to
another.

4.3.1 Conceptual models. Semantic analysis of the database

Conceptual models capture the semantics of the problem at hand without the
restrictions imposed by the DBMS. They are therefore more flexible and provide
a higher degree of abstraction. The interaction this type of model supports is
that between the real-world information to be represented and the database
designer.

Several conceptual models are described in the literature, among them the
Entity/Relationship (E/R) model, the Infological model and the Semantic Data
Model (SDM) [141,142]. Owing to its versatility, the first is the most widely
used in database design today. It is not usually implemented in DBMSs, although
some include a built-in CASE tool for carrying out the conceptual design and
then translating it automatically into a logical model.

The choice is usually between one conceptual model and another, or between
different logical models, but not between a conceptual model and a logical
model, since their scopes differ. The methodology followed in database design
is first to use a conceptual model to capture as much of the semantics of the
problem as possible and then to design the database with the logical model
supported by the DBMS. In addition, there are usually rules for moving from a
conceptual model to a logical model systematically, with the least possible
loss of information.

The E/R model was proposed by Chen in 1976 and 1977 [143-145], and several
researchers have contributed to its development [146-148]. The key concept
around which the other semantic elements revolve is the Entity, which is any
object of interest to the problem domain. Entities have attributes, each with
its domain, and are associated with other entities through Relationships, which
may also have attributes. Although a description of the E/R model is beyond the
scope of this introduction, its most important terms are: weak and strong
entities and relationships, cardinality of relationships, existence and
identification weakness, hierarchical relationships, etc.

The E/R model, as proposed by its creator, is solely a structural model of the
information. Extensions by Poonen [149] and Shoshani [150] add languages for
retrieving and updating the data stored in its structures. Many researchers
question the applicability of the dynamic part added to the model, given the
use to which the model is put: it serves to design the database from the
conceptual standpoint, not from the logical view of the DBMS, which is what
operates on the data.

4.3.2 Logical models. Database design as a function of the DBMS

A logical model yields a database design that covers only part of the
information of the problem, because the structure and behaviour of the data in
the DBMS impose restrictions on the information domain to be handled.

Many models of this type have been proposed in the database literature,
although the most widespread at present is the Relational Model, because most
commercial DBMSs are based on it. Other logical models that have been widely
used are the network or Codasyl model and the Hierarchical Model.

One reason the relational model is the most widely used is the simplicity users
find in both its static part (relations or tables) and its dynamic part (query
and modification languages). Moreover, this model is independent of the
physical level, unlike the hierarchical and network models, which use the
concept of pointer (with the physical implication of the term) to represent
relationships between objects.

On the other hand, the relational model has been criticized for being
semantically weaker than the hierarchical and Codasyl models, mainly because it
does not distinguish between objects and associations, both of which are
represented as tables. Further criticism arose when the first prototypes
implementing the relational model came onto the market and showed efficiency
problems. As already noted in this introduction, database practice has always
lagged behind theory, and relational systems were no exception. Today one of
the best-known DBMSs, Oracle, supports the relational model (and has done so
since 1980).

In the 1990s, with the rise of the object-oriented paradigm in software
engineering and programming, a new data model was developed, the
Object-Oriented Model, although it is not expected to replace the relational
basis of current DBMSs in either the short or the medium term. The key aspects
of the relational model and its extension to support object-oriented
programming are outlined below.

4.3.3 Statics and dynamics of the relational model

Codd is the creator of the relational model, which he introduced in a series of
publications presenting both its structural and its dynamic aspects [151,152].
The relational model is based on the mathematical theory of relations. Its
basic storage structure is the table or relation, made up of rows or tuples
whose number varies over time.

From the database rather than the mathematical point of view, a relation is
defined as a table, normally identified by a name that is usually unique within
the model. The first row of this table is called the intension of the relation
and consists of m attribute-domain pairs. The remaining rows, which together
are called the extension of the relation, each consist of m attribute-value
pairs, in the same order as the m pairs of the intension. The intension does
not vary over time (it is static), whereas the extension does (it is dynamic).
A relational database is simply a set of relations whose extension varies over
time.

A row or tuple cannot be repeated within a relation at a given instant t, so
there must be an attribute, or a set of attributes, that uniquely identifies
each tuple.
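
The definition of a relation given above can be sketched as a small Python
class. This is an illustrative sketch only: the class `Relation`, its method
`insert` and the example relation `sample` are hypothetical names, Python types
stand in for attribute domains, and the extension is stored in a dictionary
indexed by the key attributes so that no tuple can be repeated.

```python
class Relation:
    """A relation: a fixed intension plus a time-varying extension."""

    def __init__(self, name, intension, key):
        self.name = name
        self.intension = tuple(intension)  # (attribute, domain) pairs
        self.key = tuple(key)              # attributes that identify a tuple
        self.extension = {}                # key values -> tuple of values

    def insert(self, **values):
        """Add a tuple after checking its domains and its key."""
        row = []
        for attribute, domain in self.intension:
            value = values[attribute]
            if not isinstance(value, domain):
                raise TypeError(f"{attribute} must be of type {domain.__name__}")
            row.append(value)
        key_values = tuple(values[a] for a in self.key)
        if key_values in self.extension:
            raise ValueError(f"duplicate key {key_values}")
        self.extension[key_values] = tuple(row)


samples = Relation("sample", [("code", str), ("ph", float)], key=["code"])
samples.insert(code="W-001", ph=3.4)
samples.insert(code="W-002", ph=3.4)

try:
    samples.insert(code="W-001", ph=3.9)  # same key as an existing tuple
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

The intension is fixed when the relation is created, while each call to
`insert` changes only the extension.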

Normalization of relations was proposed by Codd [153] with the aim of
eliminating the inconsistencies and redundancies of a relational system. It
uses a set of operations designed to remove a series of dependencies between
the attributes of a relation. In addition, the relations of a relational
database can be built by means of a set of rules for translating E/R models
into relational models [137].

The dynamics of the relational model is the set of operations that transform
the state of a relation, and hence of a database. These operations are tuple
insertion, deletion, modification and querying. The dynamics of the model is
expressed through relational manipulation languages, of which there are two
types: algebraic and predicative [154,155]. The former give rise to what is
known as Relational Algebra, characterized by changing the state of the
database through operations whose operands and result are relations.
Predicative languages constitute the Relational Calculus, in which changes of
state are specified by predicates that define the final state of the relation
without stating the operations. They are divided into two types: tuple-oriented
and domain-oriented.
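
The difference between the two styles can be sketched in Python for a simple
query. This is a sketch under stated assumptions, not a full relational
language: relations are represented as lists of dictionaries, the helper
functions `select`, `project` and `natural_join` are hypothetical, and the data
are invented for illustration.

```python
def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection: keep some attributes and drop duplicate tuples."""
    result = []
    for row in relation:
        projected = {a: row[a] for a in attributes}
        if projected not in result:
            result.append(projected)
    return result


def natural_join(left, right):
    """Natural join: combine tuples that agree on their common attributes."""
    joined = []
    for l_row in left:
        for r_row in right:
            common = set(l_row) & set(r_row)
            if all(l_row[a] == r_row[a] for a in common):
                joined.append({**l_row, **r_row})
    return joined


samples = [{"code": "W-001", "origin": "Cordoba"},
           {"code": "W-002", "origin": "Jerez"}]
results = [{"code": "W-001", "ph": 3.4},
           {"code": "W-002", "ph": 3.1},
           {"code": "W-001", "ph": 3.5}]

# Algebraic style: a sequence of operations on relations.
algebraic = project(
    select(natural_join(samples, results), lambda row: row["ph"] > 3.3),
    ["origin"],
)

# Predicative (tuple-oriented) style: state the condition the result must meet.
predicative = project(
    [{"origin": s["origin"]} for s in samples
     if any(r["code"] == s["code"] and r["ph"] > 3.3 for r in results)],
    ["origin"],
)
```

Both expressions return the origins of the samples with at least one pH result
above 3.3; only the way of specifying the result differs.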

The manipulation language that has prevailed by far in relational DBMSs is the
standard language SQL (Structured Query Language), which combines algebraic and
predicative features.

4.3.4 The object-oriented data model

The object-oriented data model tends to remove the separation between data and
processes that has always existed in database technology. Object-Oriented DBMSs
(OODBMSs) manage entities that encapsulate both the data and the operations to
be performed on them. Thus, with this model, part of the code that resided on
the application side in conventional systems is stored in the database.

The object model also removes the boundary between the conceptual and logical
levels, since the semantics of the problem are captured with the very model
that the DBMS will implement. The OODBMS uses and benefits from the
characteristics of the object-oriented paradigm, for example encapsulation and
information hiding, which have the advantage that users do not see
implementation details. Moreover, modifying an object does not affect the
objects that interact with it.

Interactions between objects can be of two types: static and dynamic. The
former are based on object inheritance (called generalization in database
technology) and on the aggregation of objects to create complex objects.
Dynamic interactions, in turn, take place through messages that request
services and return results.
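
These interactions can be illustrated with a short Python sketch. It shows
plain in-memory classes rather than an OODBMS, and the class names `Sample`,
`WineSample` and `Batch` are hypothetical examples chosen for illustration.

```python
class Sample:
    """Encapsulates its data and the operations on them."""

    def __init__(self, code, absorbances):
        self.code = code
        self._absorbances = list(absorbances)  # hidden implementation detail

    def mean_absorbance(self):
        return sum(self._absorbances) / len(self._absorbances)

    def describe(self):
        return f"{self.code}: {self.mean_absorbance():.2f}"


class WineSample(Sample):
    """Static interaction by inheritance (generalization)."""

    def __init__(self, code, absorbances, variety):
        super().__init__(code, absorbances)
        self.variety = variety

    def describe(self):
        return f"{super().describe()} ({self.variety})"


class Batch:
    """Static interaction by aggregation: a complex object made of samples."""

    def __init__(self, samples):
        self.samples = list(samples)

    def report(self):
        # Dynamic interaction: each sample receives the 'describe' message.
        return [sample.describe() for sample in self.samples]


batch = Batch([Sample("S-1", [0.2, 0.4]),
               WineSample("W-1", [0.5, 0.7], "Syrah")])
report = batch.report()
```

`Batch` never reads `_absorbances` directly, so the way a sample stores its
data could change without affecting the objects that send it messages.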

OODBMSs have a set of characteristics inherent to the object-oriented paradigm
and others that they have had to take over from DBMSs. The former include
extensibility and the availability of libraries. The general DBMS
characteristics include persistence, compliance with the ANSI architecture,
security, etc.

There are two approaches to applying object methodology to databases (known as
the third generation of databases), which differ in how far they move away from
the relational model. A distinction is thus drawn between pure OODBMSs (which
break completely with the relational model) and extended relational DBMSs
(which build on the relational model and the technology based on it). The
scientific community is split into two camps, one in favour of the evolutionary
approach [156-158] and the other defending the revolutionary approach [159].
The debate is also taking place in the commercial sphere, where most DBMS
vendors are opting for the evolutionary approach so as not to waste the
technological base on which relational databases rest.


4.4 Oracle 

4.4.1 Recent evolution of the Oracle database management system

Oracle Data Server, marketed by Oracle Corporation and more commonly known as
Oracle, is the leader in the DBMS market [139]. Starting with Oracle7, it
incorporated the features that made the product useful for many types of
application: data warehousing systems, decision support systems, operational
and transactional data processing systems, etc. Since Oracle8 it has supported
both the relational and the object-oriented models (evolutionary approach)
[160].

The release of Oracle8 [161,162] in the summer of 1997 brought a series of new
features to adapt the product to the demands of new computing environments.
These included data partitioning, object types and their methods, large object
types (LOB, large object), password management, etc.

In 1999, Oracle8i began to be distributed, bringing a series of improvements
for data warehousing and Web applications. For the former, Oracle8i includes
features that increase the performance of complex query processing, such as
materialized views, automatic query rewrite and function-based indexes. For Web
applications, the new version includes the Java virtual machine, mentioned
earlier, which makes it possible to build both data access applications and
DBMS components in Java. The Internet File System (known by its initials, iFS)
has also been included.

4.4.2 Oracle processing architectures

Oracle distinguishes between the database and the database instance. The first
term has already been defined. The database instance is the set of operating
system processes and memory that the DBMS uses to manage access to the
database. In other words, the database instance must be running for the
database to be accessible [162].

An Oracle instance uses an independent set of threads or processes to support
user connections (through the DBMS or an external application). Oracle supports
a number of sessions connected to the instance in various types of computing
environment.

Client/server applications, also called distributed processing applications,
carry out their tasks through two or more components; a client/server system,
for example, has three: the client, the server and the network.

To support distributed connections, Oracle uses two architectures: dedicated
server and multithreaded server. In the first, a separate server process is
created for each client that connects to the system. In the second, a pool of
server threads is created that efficiently supports connections from a large
number of users.

4.4.3 Java and access to Oracle databases: JDBC and SQLJ

The current importance of the Java programming language has led Oracle to
develop a series of features to accommodate applications built with it.

First, Oracle incorporates the Java virtual machine needed to interpret the
programs. Second, Oracle allows the use of the industry standards JDBC (Java
DataBase Connectivity) and SQLJ (the name refers to the SQL query language and
the Java language).

Finally, Java classes can be stored in the database; that is, not only the data
but also the required functionality can be stored, in the form of Java modules.
In addition, as already mentioned, Oracle includes the object-relational
database model.

5. Chemometrics: classical models for designing experiments, classifying
products and predicting parameters

The use of statistics and mathematics in analytical chemistry, known as
Chemometrics, has two basic objectives. The first is to use the deductive part
of the scientific method (moving from knowledge to data) for the design of
experiments [163-167]. The second is to extract the maximum (bio)chemical
information, hidden a priori, from the systems under study. This information
may be quantitative (multivariate regression systems [168-172]) or qualitative
(differentiation and development of models for classifying objects [173-176]).
This second objective uses the inductive part of the scientific method (moving
from data to knowledge).

Design of experiments is aimed, on the one hand, at selecting the set of runs
needed to develop a given model and, on the other, at studying the variables
that influence an analytical process or the synthesis of a product.

Analytical processes are affected by a series of variables such as pH,
temperature, the flow rate in a dynamic system, etc. The development of a new
analytical method should improve on the analytical characteristics of official,
reference or conventional methods. One of these characteristics is sensitivity,
which determines how small a difference in analyte concentration can be
discerned. The method must therefore be optimized to find out which variables
affect the analytical signal and the size or weight of their influence.

In addition, some of the variables to be optimized may be related to the
productivity-related analytical properties (speed, cost, etc.) and adversely
affect the signal, so quantifying their influence is vital to the development
of the method.

Current analytical techniques are characterized by the large amount of
information they provide. For example, multichannel spectroscopic instruments
yield data for hundreds or even thousands of variables in a few minutes. This
information would be of no use without the multivariate methods proposed in the
literature. Thus, observing possible trends in the data or using them to
determine a given parameter requires multivariate tools for classification or
quantification, respectively.

5.1 Experimental design for the optimization of analytical processes

5.1.1 Experimental factors and response

The experimental variables or conditions that affect a given process or product
are called factors in the terminology of experimental design. Factors are
classified as qualitative (for example, the use of a particular reagent) or
quantitative (for example, the pressure in a supercritical fluid extraction).
The different values that factors can take are called levels.

The characteristics of the product or process to be optimized are called
responses. Each response is generally modelled separately, although the optimum
values for different responses are often in conflict. Multicriteria methods
exist for resolving these conflicts [177-179].

5.1.2 Design of experiments and multivariate analysis

Multivariate analysis is used in the design of experiments because a univariate
approach can lead to errors. Consider an analytical process influenced by two
factors, $x_1$ and $x_2$. With a univariate approach one factor, say $x_2$,
would be held fixed while the values of $x_1$ were varied until the optimum for
this factor was found. Factor $x_2$ would then be studied with $x_1$ fixed at
its optimum value. Once the optimum value of $x_2$ had been obtained, the
optimization would be complete. This method is correct only when the factors do
not interact with each other. Moreover, if the number of factors is very high,
the process is long and cumbersome. The more appropriate procedure is therefore
to resort to sequential or simultaneous methods, which are discussed below.
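
The limitation of the univariate approach can be seen in a small numerical
sketch in Python. The response function below is hypothetical, chosen only
because it contains a strong interaction between the two factors; the grid of
levels from 0 to 8 is likewise an assumption made for illustration.

```python
def response(x1, x2):
    """Hypothetical response surface with a strong x1-x2 interaction."""
    return 10 - (x1 - x2) ** 2 - 0.1 * (x1 + x2 - 8) ** 2


levels = range(0, 9)

# Univariate approach: optimise x1 with x2 fixed at 0,
# then optimise x2 with x1 fixed at the value just found.
best_x1 = max(levels, key=lambda x1: response(x1, 0))
best_x2 = max(levels, key=lambda x2: response(best_x1, x2))
univariate_point = (best_x1, best_x2)

# Reference: the best point over the whole grid of levels.
grid_point = max(
    ((a, b) for a in levels for b in levels),
    key=lambda point: response(*point),
)
```

With this surface the one-factor-at-a-time search stops at (1, 2), whereas the
true optimum on the grid lies at (4, 4), because the best value of each factor
depends on the value of the other.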

5.1.3 Strategies in the design of experiments

The first steps in the design of experiments are the selection of the factors
and the responses. Although the factors that influence the response may be
known beforehand, this information is not normally available at the outset. All
possible factors must therefore be considered and a screening carried out in
order to discard those with no influence.

Before the screening, the experimental domain is defined by setting the extreme
values for each factor. The two-level design is normally used, although other
approaches exist, for example the Doehlert design [180].

After the screening, once the factors, responses and levels have been chosen,
the next step is to select the design methodology from two broad groups:
sequential or simultaneous.

5.1.4 The sequential Simplex method

Sequential design is based on performing a small number of experiments whose
results are used to set the conditions of the next experiment. The most widely
used design of this type is the Simplex method [181,182]. To explain it
briefly, consider again two factors, $x_1$ and $x_2$. The procedure starts with
three experiments located at the vertices of a triangle in the plane defined by
$x_1$ and $x_2$. The experiment giving the worst response indicates that the
search should move in the direction opposite to that vertex of the triangle. A
new triangle is therefore built from the two previous vertices that gave the
best responses and the vertex opposite to the one that gave the worst result.

The process continues until the inclusion of a new experiment no longer
improves the results obtained. The main drawback of this method is that the
quality of the design (response surface) depends on the route taken to reach
the maximum, which introduces an element of chance.
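
The procedure described above can be sketched in Python as a basic fixed-size
simplex. This is a minimal illustration under stated assumptions, not the exact
algorithm of references [181,182]: the function name `simplex_maximise`, the
stopping rule (stop when the reflected vertex is no better than the discarded
one) and the quadratic test response are all hypothetical choices.

```python
def simplex_maximise(response, vertices, max_steps=100):
    """Move a simplex towards the maximum by reflecting its worst vertex."""
    simplex = [tuple(v) for v in vertices]
    for _ in range(max_steps):
        simplex.sort(key=lambda v: response(*v))  # worst vertex first
        worst, kept = simplex[0], simplex[1:]
        # Centroid of the vertices that are kept.
        centroid = [sum(coords) / len(kept) for coords in zip(*kept)]
        # New vertex: mirror image of the worst one through the centroid.
        reflected = tuple(2 * c - w for c, w in zip(centroid, worst))
        if response(*reflected) <= response(*worst):
            break  # the new experiment brings no improvement
        simplex = kept + [reflected]
    return max(simplex, key=lambda v: response(*v))


def quadratic(x1, x2):
    """Hypothetical response with its maximum at x1 = 3, x2 = 2."""
    return -((x1 - 3) ** 2 + (x2 - 2) ** 2)


start = [(0, 0), (1, 0), (0, 1)]
best = simplex_maximise(quadratic, start)
```

Each pass of the loop corresponds to one new experiment, whose conditions are
fixed by the results of the previous ones. With a different starting triangle
the simplex follows a different route, which is the dependence on the path
mentioned in the text.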

5.1.5 Simultaneous methods

Among simultaneous designs, also called factorial designs, two types with
different working philosophies can be distinguished:

a) Designs whose aim is to establish which factors have an influence and how
large it is. The most widely used design of this type is the two-level full
factorial design, in which two levels are considered for each factor. This
method allows the effects of the factors and their interactions to be
calculated. The number of experiments to be carried out is $2^n$, where n is
the number of factors.

A variant of this type of design is the fractional factorial design, in which
only a fraction of the experiments (1/2, 1/4, 1/8, etc.) is carried out.
Although less information is obtained, it is practical when the number of
factors is very high.

When interactions between factors have no influence and the only aim is to find
out which factors are relevant, saturated fractional factorial designs [183] or
Plackett-Burman designs [184] are used. They are useful for factor screening
and have also been used to study the robustness of analytical methods [185].

b) Designs whose importance lies in obtaining the response function at the
optimum point. The response functions for two factors and a single response are
given by the following expressions:

$$y = b_0 + b_1 x_1 + b_2 x_2 + b_{12} x_1 x_2 \qquad (1)$$

$$y = b_0 + b_1 x_1 + b_2 x_2 + b_{11} x_1^2 + b_{22} x_2^2 + b_{12} x_1 x_2 \qquad (2)$$

In these equations the following can be seen:

• An independent term $b_0$, equal to the value of y when $x_1$ and $x_2$ are
zero. Work is most often done with coded values in which -1 and +1 are
assigned to the extremes chosen for each factor, so that 0 corresponds to the
midpoint of the range between -1 and +1. In these cases, $b_0$ is the value
of y at the centroid.

• First-order terms in $x_1$ and $x_2$ in equations (1) and (2), and
second-order terms in equation (2).

• The terms for the interaction between $x_1$ and $x_2$ (the last term of
equations (1) and (2)).

When equation (2) is used, three levels are considered for each factor so that
the quadratic terms of this equation can be taken into account. The most widely
used design of this type is the central composite design, which combines a
two-level factorial design with axial points and several replicates of the
centre point.
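
As an illustration of the two-level full factorial design and of equation (1),
the following Python sketch builds the $2^n$ runs in coded units and estimates
the coefficients for two factors. The function names and the four response
values are hypothetical. Because the columns of a two-level full factorial
design in coded units are orthogonal, each least-squares coefficient reduces to
the average of the responses multiplied by the corresponding column of signs.

```python
from itertools import product


def full_factorial(n_factors):
    """All 2**n runs of a two-level design in coded units (-1, +1)."""
    return list(product((-1, 1), repeat=n_factors))


def fit_interaction_model(design, responses):
    """Coefficients of y = b0 + b1*x1 + b2*x2 + b12*x1*x2 for a 2**2 design."""
    n = len(design)
    b0 = sum(responses) / n
    b1 = sum(x1 * y for (x1, _), y in zip(design, responses)) / n
    b2 = sum(x2 * y for (_, x2), y in zip(design, responses)) / n
    b12 = sum(x1 * x2 * y for (x1, x2), y in zip(design, responses)) / n
    return b0, b1, b2, b12


design = full_factorial(2)            # (-1,-1), (-1,+1), (+1,-1), (+1,+1)
responses = [10.0, 14.0, 20.0, 32.0]  # hypothetical measured responses
b0, b1, b2, b12 = fit_interaction_model(design, responses)
```

In coded units the effect of a factor (the change in response on going from
level -1 to level +1) is twice its coefficient, and $b_0$ is the mean response,
that is, the value predicted at the centroid.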

5.2 Chemometrics in the treatment of spectroscopic information

5.2.1 The need for multivariate analysis in spectroscopy

Spectroscopic techniques have advanced greatly in recent years, giving rise to
new analytical methods. One of the most widely used spectral regions has been
the infrared, which gave rise to NIRS technology (Near InfraRed Spectroscopy).
NIRS has been adopted for quality control in many fields, especially the
agri-food, pharmaceutical, chemical, petrochemical and environmental areas
[186-188]. Techniques based on other regions of the IR spectrum are also
widespread for the same reasons, as are Nuclear Magnetic Resonance, Raman
Spectroscopy, Mass Spectrometry, etc. Part of the research reported in this
Thesis uses absorption data from the ultraviolet, visible, near-infrared and
mid-infrared regions.

Because the sample is analysed with hardly any pretreatment, the spectra
recorded are largely featureless and the bands arise from absorption by a wide
variety of components. The Lambert-Beer law cannot be applied directly to these
spectra because of the large amount of data, their collinearity and,
consequently, their redundancy. A univariate approach makes no sense for this
type of spectrum [189,190]. Furthermore, spectroscopic data are affected by
various types of error, notably the intrinsic noise of the instrument and
errors related to the heterogeneity, texture, stability, etc., of the samples
analysed.

For all these reasons, various chemometric tools are needed, all of them based
on Multivariate Analysis, which covers the different statistical, mathematical
or graphical methods of data analysis that use several variables simultaneously
[191,192]. For spectroscopic techniques, multivariate analysis has two
objectives: 1) the determination of one or more physico-chemical properties
from multiple spectral variables (absorbance values at different wavelengths);
2) the differentiation or classification of groups of samples with common
characteristics on the basis of the spectral information.

5.2.2 Preprocessing of spectral data

One of the effects that interferes most with spectral information is the
scattering of the incident radiation (known in the spectroscopic literature as
the scatter effect) [193-196]. It is a physical phenomenon caused by an
interaction between the radiation and the sample that is unrelated to
absorption and arises instead from scattering due to the size and geometry of
the particles present in the sample. Changes in the refractive index also
contribute to it.

A number of solutions for preprocessing spectroscopic data have been developed to minimise the scatter effect. They include the MSC (Multiplicative Scatter Correction), SNV (Standard Normal Variate), DTR (Detrending) and OSC (Orthogonal Signal Correction) treatments [197,193].
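By way of illustration, the following minimal sketch applies one of these treatments, SNV, to a single spectrum: each absorbance value is centred on the mean of its own spectrum and divided by the standard deviation of that spectrum. The function name `snv`, the use of the sample standard deviation (n - 1) and the absorbance values are assumptions made for the example; they are not taken from the software used in this work.

```python
from statistics import mean, stdev

def snv(spectrum):
    """Standard Normal Variate: centre and scale one spectrum by its own statistics."""
    m = mean(spectrum)
    s = stdev(spectrum)  # sample standard deviation (n - 1): an assumption of this sketch
    if s == 0:
        raise ValueError("constant spectrum: SNV is undefined")
    return [(a - m) / s for a in spectrum]

# Hypothetical absorbances of one sample at five wavelengths.
raw = [0.10, 0.25, 0.40, 0.30, 0.15]
# The same spectrum multiplied by 2 and shifted by 0.5, mimicking a scatter effect.
scattered = [2 * a + 0.5 for a in raw]

corrected_raw = snv(raw)
corrected_scattered = snv(scattered)
```

Under these assumptions both corrected spectra coincide, since SNV removes any positive multiplicative factor and any additive offset that affect the whole spectrum.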

Another type of transformation of spectroscopic data is the use of derivatives, which aims to reduce both peak overlap and baseline variation in the spectrum (the latter increases the signal-to-noise ratio). Numerous derivative methods are also available, notably the Savitzky-Golay method, finite differences and Fourier transforms [198,199,190].
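The effect of a derivative on the baseline can be sketched with the simplest of these methods, finite differences. The function name `first_derivative`, the unit wavelength step and the absorbance values are hypothetical; Savitzky-Golay derivatives, which also smooth the signal, are not shown here.

```python
def first_derivative(spectrum, step=1.0):
    """First derivative by forward finite differences (one point shorter than the input)."""
    return [(b - a) / step for a, b in zip(spectrum, spectrum[1:])]

# Hypothetical spectrum and the same spectrum with a constant baseline offset.
raw = [0.10, 0.25, 0.40, 0.30, 0.15]
shifted = [a + 0.2 for a in raw]

d_raw = first_derivative(raw)
d_shifted = first_derivative(shifted)
```

Both derivative spectra are equal up to rounding, which shows how a constant baseline shift disappears after derivation.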

Normalisation and scaling of the data are also preprocessing methods used in spectroscopy, with the aim of giving all the variables that make up the spectrum the same influence.
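One common way of equalising the influence of the variables is autoscaling, in which each wavelength (column) is centred on its mean and divided by its standard deviation. The sketch below is only one possible scaling, chosen here for illustration; the function name `autoscale` and the data matrix are hypothetical, and the sketch assumes that no column is constant.

```python
from statistics import mean, stdev

def autoscale(X):
    """Centre each column (wavelength) to mean 0 and scale it to unit standard deviation."""
    columns = list(zip(*X))
    means = [mean(c) for c in columns]
    sds = [stdev(c) for c in columns]  # assumes no constant column (sd > 0)
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in X]

# Hypothetical matrix: three samples (rows) measured at two wavelengths (columns).
X = [[0.10, 1.2], [0.20, 1.0], [0.30, 1.7]]
Z = autoscale(X)
```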

5.3 Qualitative analysis

In qualitative analysis, spectroscopic data, whether preprocessed or not, are used to predict a categorical variable such as origin, variety or type of process. In other words, the spectral information is used to classify samples into different groups according to the similarity of their spectra to those of known samples. These methods are known as pattern recognition methods [54].

These methods are usually divided into two groups, supervised and unsupervised, according to the information on the grouping of the data that is available a priori. In unsupervised methods, little or no information is available on the samples analysed. The aim, therefore, is to reveal trends hidden in the data matrix formed by the sample spectra. They are applied in the early stages of a sample classification study, and the discrimination they achieve determines whether the subsequent classification stage will lead to better or worse results. The methods most frequently found in the literature are cluster analysis and principal component analysis.

By contrast, supervised methods do make use of the categories that exist in the sample set, and their aim is to derive rules, from the initial information on the sample types, for the classification of unknown samples.

Supervised methods are divided into discriminant methods and modelling methods. The aim of the former is to discriminate between the existing categories, so that a given sample is always assigned to one of the existing types. The most widely used discriminant methods are linear discriminant analysis (LDA) and its derivative, quadratic discriminant analysis (QDA). Also important are the nearest-neighbours method, better known as KNN (k-nearest neighbour), density methods and those based on discriminant analysis by regression. The last of these relies on a calibration in which the output variables are discrete and take the value of the class to which each sample belongs.

In modelling methods, unlike the previous ones, a volume of space is defined for each class, so that some samples may not belong to any class. The philosophy of these methods is to classify according to similarity within a class rather than by discrimination between classes. Unknown samples are classified on the basis of their distance to a reference that differs with the model used. Different distance measures are employed, including the Euclidean and the Mahalanobis distances [200,201]. The modelling method most widely used to classify samples from spectroscopic data is SIMCA (Soft Independent Modelling of Class Analogies).
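The difference between the two distance measures mentioned above can be sketched for a class described by only two variables. The function name `mahalanobis_2d` and the data are hypothetical, and the covariance matrix is inverted analytically, which is only practical in two dimensions.

```python
from math import dist, sqrt
from statistics import mean

def mahalanobis_2d(point, class_points):
    """Mahalanobis distance from `point` to the centroid of a two-variable class."""
    n = len(class_points)
    mx = mean(p[0] for p in class_points)
    my = mean(p[1] for p in class_points)
    # Elements of the 2 x 2 sample covariance matrix of the class.
    sxx = sum((p[0] - mx) ** 2 for p in class_points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in class_points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in class_points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("singular covariance matrix")
    dx, dy = point[0] - mx, point[1] - my
    # d^T S^-1 d, using the closed-form inverse of a 2 x 2 matrix.
    return sqrt((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)

# Hypothetical class with two strongly correlated variables.
class_points = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2), (5.0, 9.8)]
centroid = (mean(p[0] for p in class_points), mean(p[1] for p in class_points))

along = (4.0, 8.0)    # follows the trend of the class
across = (4.0, 4.0)   # departs from the trend of the class

euclid_along, euclid_across = dist(along, centroid), dist(across, centroid)
maha_along, maha_across = mahalanobis_2d(along, class_points), mahalanobis_2d(across, class_points)
```

In this example both points are equally far from the centroid in Euclidean terms, whereas the Mahalanobis distance, which takes the covariance of the class into account, is much larger for the point that departs from the trend.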

Qualitative analysis also includes several methods based on neural networks. These can be either unsupervised (for example, Kohonen networks) or supervised (for example, the linear learning machine, LLM). The latter is regarded as one of the first supervised classification methods.

The methods used in the research presented in this Thesis are described in more detail below.

5.3.1 Cluster analysis

This group comprises a series of methods that, starting from the calculation of distances between objects or samples, group them according to their similarity or difference. The most common output of these methods is a dendrogram showing the groupings of the objects or samples. The multidimensional space in which the distances are calculated is that formed by the m wavelengths; that is, no variable reduction is carried out [202,173].
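The grouping procedure can be sketched as follows, assuming Euclidean distances in the space of the wavelengths and single linkage (the distance between two groups is the smallest distance between their members). Both choices, the function name `single_linkage` and the spectra are assumptions of the example; the merge history it returns is the information a dendrogram would display.

```python
from itertools import combinations
from math import dist

def single_linkage(samples):
    """Agglomerative clustering; returns the merge history [(group_a, group_b, distance), ...]."""
    def linkage(a, b):
        # Single linkage: smallest Euclidean distance between members of the two groups.
        return min(dist(samples[i], samples[j]) for i in a for j in b)

    clusters = [(i,) for i in range(len(samples))]
    history = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        history.append((a, b, linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return history

# Hypothetical spectra of four samples at three wavelengths.
spectra = [
    [0.10, 0.20, 0.30],
    [0.11, 0.21, 0.29],
    [0.50, 0.60, 0.70],
    [0.52, 0.61, 0.69],
]
history = single_linkage(spectra)
```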

5.3.2 Principal component analysis (PCA)

PCA is defined in the ASTM standard (1990) [203] as the mathematical procedure used to transform a data set into new orthogonal variables, called principal components (PCs), each of which is a linear combination of the original variables (each wavelength, in the case of spectroscopic techniques). In other words, PCA builds new variables, each of which attempts to explain the maximum variation in the data. The number of final variables is smaller than the number of original variables, so PCA, together with some discriminant methods, is termed a variable-reduction method. Various criteria exist for selecting the number of PCs beyond which the model provides no further information, that is, the point at which the variance explained by two consecutive components is practically the same.

Three reasons justify the use of PCA in the treatment of spectroscopic information. The first is the high degree of collinearity in the original data space, which is removed by one of the characteristics of the new variables: they are orthogonal and, therefore, linearly independent. The second is the small final number of variables, which makes trends in the data easier to visualise in this space than in the original one. The third, related to the previous one, is that variable reduction is key to the use of PCs in building models, both quantitative and qualitative, with PCA-based methods. Again, the large number of initial variables would make the calibration or learning stage (in quantitative or qualitative terms, respectively) practically impossible because of the number of samples that would be needed.

As noted above, a PC is a linear combination of the m initial wavelengths, as the following expression shows:

$$\mathrm{PC}_i = a_{i1}\lambda_1 + a_{i2}\lambda_2 + \dots + a_{im}\lambda_m \qquad (3)$$

The first PC is the linear combination of the wavelengths that explains the maximum variation present in the data. The second PC is chosen to explain, again, the maximum variability of the data once the variability explained by the first has been subtracted, subject to the condition of being orthogonal to the first. The procedure continues until there is no difference between the variance explained by two consecutive components.
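The construction of the first PC can be sketched with the power iteration method applied to the covariance matrix of the mean-centred data. This is only one of several ways of computing PCs and is not necessarily the algorithm implemented in the software used in this work; the function name `first_principal_component`, the number of iterations and the data are hypothetical, and the sketch assumes that the starting vector is not orthogonal to the first PC.

```python
from math import sqrt
from statistics import mean

def first_principal_component(X, iterations=200):
    """Coefficients, sample coordinates and explained-variance fraction of the first PC."""
    columns = list(zip(*X))
    means = [mean(c) for c in columns]
    Xc = [[x - mu for x, mu in zip(row, means)] for row in X]  # mean-centred data
    n, m = len(Xc), len(means)
    # Covariance matrix (m x m) of the original variables.
    cov = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(m)] for i in range(m)]
    # Power iteration: repeated multiplication by cov converges to the
    # direction of maximum variance.
    a = [1.0] * m
    for _ in range(iterations):
        a = [sum(cov[i][j] * a[j] for j in range(m)) for i in range(m)]
        norm = sqrt(sum(v * v for v in a))
        a = [v / norm for v in a]
    # Coordinates of the samples along the first PC.
    coordinates = [sum(x * w for x, w in zip(row, a)) for row in Xc]
    explained = sum(s * s for s in coordinates) / (n - 1)
    total = sum(cov[i][i] for i in range(m))
    return a, coordinates, explained / total

# Hypothetical absorbances of four samples at three wavelengths.
X = [
    [0.10, 0.20, 0.05],
    [0.20, 0.41, 0.04],
    [0.30, 0.59, 0.06],
    [0.40, 0.80, 0.05],
]
coefficients, coordinates, fraction = first_principal_component(X)
```

In this example the first two wavelengths vary together and the third hardly varies, so the first PC alone accounts for almost all of the variance and the coefficient of the third wavelength is close to zero.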

The coefficients of the wavelengths for each principal component are called loadings. Plots of the loadings against wavelength are of interest because they reveal the spectral regions that matter most by virtue of their high loading on a given PC.

The scores are the coordinates of the samples on the new axes formed by the PCs (they should not be confused with the eigenvalues, which measure the variance explained by each PC). When the objects or samples are placed in this new space, in the so-called scores plot, groupings of objects may become apparent that cannot be distinguished in the original space.

5.3.3 Linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA)

In this group of methods, the categories or classes present in the learning set are separated by means of a linear combination of variables that minimises the variability among samples of the same class and maximises it between different groups [204,205]. These linear combinations yield a number of linear functions one fewer than the number of classes. Therefore, like PCA, it is also a method that carries out a variable reduction.

The new functions, also called latent variables, are selected by taking into account the directions that achieve the maximum separation between classes. This is the main difference between discriminant analysis and PCA, since in the latter the PCs are obtained by searching for the directions that explain the greatest variability in the data; the difference stems from the fact that one method is supervised and the other is not.

The variables used are assumed to follow a normal distribution and the covariance matrices of the classes are assumed to be equal; the Mahalanobis distance is used as the classification criterion. Assuming equal covariance matrices can lead to errors, since this equality does not always hold, and this gave rise to a variant of LDA in which quadratic functions are generated, hence the name quadratic discriminant analysis.



5.3.4 SIMCA (Soft Independent Modelling of Class Analogies)

SIMCA is a modelling method [206,207] in which a PCA is carried out for each of the classes to be modelled. There are three possible outcomes when a sample is classified with a SIMCA model: the sample does not belong to any of the classes in the modelling system, it belongs to one of them, or it may belong to two or more classes. The last outcome occurs whenever two classes overlap and the sample to be classified lies in the overlapping region.

There are two criteria for assigning a sample to a given class: the distance from the sample to the centre of the model and the distance from the sample to the model. In the first case, the distance reports on the variance explained by the model of a given class when the sample is projected onto the principal component space of that class. In the second, it reports on the variance that the model has not been able to explain, which is termed residual variance.

5.4 Quantitative analysis

In this branch of multivariate analysis, the spectrum of a sample is used to determine its physico-chemical properties; that is, it brings out the information the spectrum contains on the chemical composition of the sample.

The determination of an analyte requires a calibration process (the construction of a function relating the spectrum to the concentration of the compound or the property to be determined). The use of multivariate techniques changes the concept of calibration. Thus, the univariate calibration process, in which an analyte is determined by means of a calibration line that is rebuilt from time to time, differs from the process used in multivariate methods.


The most important difference stems from the definition of multivariate calibration itself, namely the development of a quantitative model for determining a set of properties (y1, y2, …, yq) from a number of predictor variables (x1, x2, …, xp). From a practical standpoint, however, this difference is of minor consequence, and the main difference becomes that the aim of calibration in technologies such as NIRS, NMR, Raman spectroscopy, etc., is to build global and robust equations. The first of these terms refers to the applicability of the equations to practically all samples, whereas the second denotes the maintenance of accuracy and precision over time [208,209].

5.4.1 Linear multivariate regression methods

These methods are used when there is a linear relationship between the spectral data (or their transformation into another, usually reduced, coordinate space) and the concentration of the analytes in the samples. The most widely used linear methods are multiple linear regression (MLR), principal component regression (PCR) and partial least squares regression (PLSR) [54].

In MLR the spectrum is regarded as a function of the chemical composition of the sample. This makes MLR subject to a number of drawbacks that limit its applicability to spectroscopic data. Notable among them are the difficulty of applying it to complex systems such as real samples (the model must also account for interferents and for analytes that are not of interest) and the constraint imposed by the number of samples needed to build the model, which must be at least equal to the number of variables.

For these reasons, MLR is carried out with a small number of wavelengths selected by means of an F test, so that only those wavelengths showing a high correlation between the absorbance value and the analyte to be determined are taken into account. Because of these drawbacks, and because a variable-reduction method is useful for removing the collinearity of spectroscopic data, PCR and PLSR are used instead.

The two methods are similar in that both build a new space of variables that is reduced with respect to the initial space formed by all the wavelengths of the spectrum. They differ in the way the new variables are built: in PCR, which is based on PCA, they are built so as to explain the greatest variability in the spectroscopic data; in PLSR, the variation in the chemical data of the samples used to build the model is taken into account together with the variability of the spectroscopic data. Both PCR and PLSR treat composition as a function of the spectrum, which removes the constraint that the number of samples must exceed the number of variables, even though the number of variables has already been drastically reduced.

5.4.2 Non-linear multivariate regression methods

The analytical literature contains numerous applications of neural networks to quantitative determination. They are of key interest for systems in which the spectral data are not linearly related to the chemical data and, therefore, the methods described above are not suitable [210].

Several types of network are used, although the MLF (Multilayer Feed Forward) type stands out in number of applications [211,212]. The use of RBF (Radial Basis Function) networks has also become widespread recently [213,214]. There are many possibilities for developing network-based determination models, depending on the type of network, the learning algorithm, etc.


5.5 General methodology for the development of determination methods

Two general stages must be considered in the development of a determination method, whether qualitative or quantitative: the calibration or learning stage (the former term is used in quantitative analysis and the latter in qualitative analysis) and the validation stage [54,208,209]. In the first, the equations or models are built (again, the different names distinguish the type of analysis); in the second, the properties required of these equations or models are verified. Although they share common points, the two stages are carried out differently depending on whether the analysis is quantitative or qualitative.

5.5.1 Selection of the learning or calibration set and the validation set

In quantitative analysis the selection of the calibration set is of key importance, since whether or not the aim of building global equations is met depends on the representativeness (one of the two supreme analytical properties) of the samples used.

The number of calibration samples must be high in order to increase the determination capability of the equations. However, obtaining the reference chemical data usually involves slow and costly methods (hence the interest in replacing them by fast methods based on spectroscopy and multivariate analysis), which limits the number of samples used in the calibration stage.

Even more important than a large number of samples is their representativeness. It is thus necessary to cover the chemical variability in the samples and all the sources of variation in the analysis of future samples (variations in the mode and cycles of production, origin of the samples, etc.) [215-217]. Numerous methods have been proposed for selecting the calibration set, most of them based on a principal component analysis followed, generally, by calculation of the Mahalanobis distance.

In qualitative analysis the criteria for selecting the learning set vary slightly, the representativeness of the samples again being the key to the reliability of the models to be built. Since there are several categories, the samples of each type must be present in balanced proportions and cover the maximum variability within each class.

5.5.2 Detection of spectral and chemical outliers

During model development, some samples are usually found to differ from the rest because of anomalies in either the spectral or the chemical information. These anomalies reduce the capability of the models, so a stage devoted to the study of the so-called outliers is carried out [218,219,189,191,192,198].

Spectral anomalies can occur at a single wavelength, at several, or over the whole spectrum. For their detection there are, on the one hand, methods based on the calculation of distances in n-dimensional spaces, such as the Mahalanobis distance or the leverage statistic. On the other hand, there are methods based on the calculation of residuals in the spectroscopic data, which detect spectra with significant variability that the proposed model does not explain. The n-dimensional space used is usually that derived from PCA.

Chemical anomalies occur when some samples show significant differences in their composition data with respect to the rest of the calibration set. The deviating value may be that provided by the reference method or that given by the calibration equation. This type of outlier therefore occurs only in quantitative analysis. Chemical outliers are detected mainly by calculating residuals in the composition data. The T test, which is based on the difference between the reference value and the estimated value together with the dispersion of the data, is the most widely used for detecting chemical outliers.

Once the outliers have been detected, the causes of their deviation are studied, and the reference or spectral analysis is repeated whenever possible.

5.5.3 The calibration or learning stage

Once the influence of the outliers on the calibration or learning set has been removed, the corresponding stage (calibration or learning) is carried out. If the analysis is quantitative, the calibration can be evaluated internally by means of the coefficient of determination and the standard error obtained in the calibration. Other parameters of the correlation between the reference method and the spectroscopic method are the slope and the intercept, but these are applied more in the subsequent validation stage.

In qualitative analysis the classification rules are established, but no statistics indicating their reliability are obtained at this stage; the subsequent validation stage does provide the appropriate parameters for evaluating the rules. In any case, an exploratory analysis (PCA and cluster analysis) carried out before the rules are built gives a general idea of the characteristics of the future models.

5.5.4 The validation stage in quantitative analysis

In the validation stage, the predictive ability of the models is measured by means of different statistical criteria based on the difference between the reference values and the values estimated by the spectroscopic methods.

For this purpose a set of samples, called the validation set, is used which, except in cross-validation, consists of samples that did not take part in the preceding calibration stage.



Cross-validation was described by Stone in 1974 [220] and is very useful when the number of samples for which chemical information is available is small. To overcome this drawback, a small subset of samples is removed from the calibration set; these samples do not take part in the calibration but are used to validate the model. The process is repeated until every sample in the overall calibration set has been used once for validation. The final model is the mean of the submodels built in each iteration.
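The way the calibration set is partitioned can be sketched as follows. The function name `cross_validation_splits`, the interleaved assignment of samples to segments and the set sizes are hypothetical choices; the sketch only produces the index partitions and does not fit any model.

```python
def cross_validation_splits(n_samples, n_segments):
    """Yield (calibration, validation) index lists; each sample is validated exactly once."""
    indices = list(range(n_samples))
    for k in range(n_segments):
        validation = indices[k::n_segments]  # every n_segments-th sample, starting at k
        calibration = [i for i in indices if i not in validation]
        yield calibration, validation

# Hypothetical case: 10 calibration samples divided into 5 segments.
splits = list(cross_validation_splits(10, 5))
```

When the number of segments equals the number of samples, the same function gives the leave-one-out variant, in which a single sample is left out in each iteration.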

The analytical properties accuracy and precision are the indicators used to evaluate the error and the determination capability of the new method. In quantitative methods, the coefficient of determination and the standard error, together with the slope and the intercept of the correlation between the reference method and the spectroscopic method, are the statistical parameters used to study the accuracy and precision of the method.

The Standard Error of Calibration (SEC) is estimated from the calibration samples themselves, so it tends to be over-optimistic compared with the Standard Error of Prediction (SEP); since the latter is computed from samples not used in the calibration stage, it is much more useful than the SEC. When cross-validation is used, the parameter is called the Standard Error of Cross-Validation (SECV) [189,192,209,219].

The SEP can be split into two parts: the random or unexplained error, and the systematic error or bias [221,54], as shown by the well-known formula for the partition of the error variance:

$$\mathrm{SEP}^2 = \mathrm{bias}^2 + \mathrm{SEP(C)}^2 \qquad (4)$$

where the SEP(C) reports on the random errors. Some authors state that the SEP represents the accuracy of an equation, whereas the SEP(C) evaluates its precision [189]. The literature on multivariate calibration does not use these names consistently: some authors and some software packages (for example, The Unscrambler) give the name SEP(C) to what is here called SEP.
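The partition of the prediction error into bias and random error can be checked numerically with the sketch below. It assumes that both the overall error and the bias-corrected error are computed with n in the denominator, in which case the partition holds exactly; some texts use n - 1 for the bias-corrected term, and the identity is then only approximate. The function name `error_partition` and the data are hypothetical.

```python
from math import sqrt
from statistics import mean

def error_partition(reference, predicted):
    """Return (overall error, bias, bias-corrected error) for a validation set."""
    residuals = [p - r for p, r in zip(predicted, reference)]
    n = len(residuals)
    bias = mean(residuals)                                        # systematic error
    overall = sqrt(sum(e * e for e in residuals) / n)             # overall prediction error
    corrected = sqrt(sum((e - bias) ** 2 for e in residuals) / n) # random error
    return overall, bias, corrected

# Hypothetical reference values and spectroscopic estimates for five validation samples.
reference = [12.0, 12.5, 13.1, 11.8, 12.9]
predicted = [12.2, 12.6, 13.0, 12.1, 13.1]

overall, bias, corrected = error_partition(reference, predicted)
```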

Other authors focus their interest in the SEP on the influence of the quality of the reference data [222]. The SEP is then expressed as follows:

$$\mathrm{SEP}^2 = \mathrm{SEL}^2 + \mathrm{SE}_{\mathrm{spec}}^2 + \mathrm{SE}_{\mathrm{model}}^2 \qquad (5)$$

where SEL is the standard error of the reference (laboratory) method, SE_spec is the standard error due to the spectral measurement, and SE_model is the error introduced by the chemometric method. Of the three, and thanks to the improvements achieved in spectral techniques and chemometric methods, the error associated with the reference analytical method is the one that carries most weight in the above formula.

Different ways of evaluating and comparing the coefficient of determination and the SEP have been proposed for the development of quantitative applications based on spectroscopic data, in most cases NIR data.

The slope and the intercept of the correlation line between the spectroscopic method and the reference method are also two important parameters for revealing systematic errors. For a perfect correlation between the methods, the slope must equal one and the intercept must equal zero. This is obviously an ideal case, and a hypothesis test is needed to check whether they are statistically equal to one and to zero. Some organisations have drawn up protocols for validating new methods of analysis against reference or official methods. These protocols usually include a T test at different confidence levels and degrees of freedom. The research reported in this Thesis followed the protocol proposed by the International Office of Vine and Wine (OIV) [223], since the methods developed belong to the oenological field.
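A generic version of this check is sketched below: the values given by the spectroscopic method are regressed on the reference values by ordinary least squares, and t statistics are computed for the hypotheses slope = 1 and intercept = 0, with n - 2 degrees of freedom. The sketch is not the OIV protocol itself; the function name `slope_intercept_tests` and the data are hypothetical, and the critical value of t still has to be looked up for the chosen confidence level.

```python
from math import sqrt
from statistics import mean

def slope_intercept_tests(reference, predicted):
    """Least-squares line predicted = intercept + slope * reference, with t statistics."""
    n = len(reference)
    xm, ym = mean(reference), mean(predicted)
    sxx = sum((x - xm) ** 2 for x in reference)
    sxy = sum((x - xm) * (y - ym) for x, y in zip(reference, predicted))
    slope = sxy / sxx
    intercept = ym - slope * xm
    # Residual standard deviation with n - 2 degrees of freedom
    # (assumes the fit is not perfect, so that s > 0).
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(reference, predicted))
    s = sqrt(sse / (n - 2))
    se_slope = s / sqrt(sxx)
    se_intercept = s * sqrt(1 / n + xm ** 2 / sxx)
    t_slope = (slope - 1) / se_slope     # H0: slope = 1
    t_intercept = intercept / se_intercept  # H0: intercept = 0
    return slope, intercept, t_slope, t_intercept

# Hypothetical reference values and spectroscopic estimates.
reference = [10.0, 11.0, 12.0, 13.0, 14.0]
predicted = [10.1, 10.9, 12.2, 12.9, 14.1]

slope, intercept, t_slope, t_intercept = slope_intercept_tests(reference, predicted)
```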



5.5.5 The validation stage in qualitative analysis

In the validation of a qualitative application, the error is the main indicator of reliability [224]. Here the error is calculated in a much simpler way, as the percentage of samples classified incorrectly. The validation can be external (with samples that did not take part in establishing the classification rules) or be carried out by cross-validation.

For a modelling method, in which samples need not belong to any class, this statistic must be modified to suit the new model. This gives rise to the concepts of false positives and false negatives: the former are samples that are classified into a class to which they do not belong, whereas false negatives are samples that are not classified into their own classes.
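These two counts can be sketched as follows for a modelling method, in which each sample may be accepted by no class, by one class or by several. The function name `class_errors`, the class labels and the assignments are hypothetical.

```python
def class_errors(true_labels, accepted, target):
    """Return (false positives, false negatives) for the class `target`.

    `accepted` holds, for each sample, the set of classes whose model accepts it.
    """
    false_pos = sum(1 for t, a in zip(true_labels, accepted) if target in a and t != target)
    false_neg = sum(1 for t, a in zip(true_labels, accepted) if target not in a and t == target)
    return false_pos, false_neg

# Hypothetical validation set: true class of each sample and classes that accept it.
true_labels = ["A", "A", "A", "B", "B"]
accepted = [{"A"}, set(), {"A", "B"}, {"B"}, {"B"}]

errors_a = class_errors(true_labels, accepted, "A")
errors_b = class_errors(true_labels, accepted, "B")
```

In this example the second sample is rejected by every model and is a false negative for class A, whereas the third sample lies in the region where both classes overlap and is a false positive for class B.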

Several authors have proposed strategies for the development and evaluation of qualitative models [189,207].

6. Computational chemistry: similarity indices and fingerprints

Computational chemistry encompasses a series of mathematical and statistical methods, implemented on the computer by means of algorithms, that are used in various areas of chemistry, most of them theoretical in nature. Notable among these areas are quantum mechanics, structure determination and the study of chemical reactions [225,226].

From its very inception the field of application broadened to cover more practical areas such as spectroscopy, analytical chemistry (regression and pattern recognition methods) and pharmaceutical chemistry (QSAR, Quantitative Structure-Activity Relationships, and combinatorial chemistry) [227-232].

The methods and tools used in computational chemistry are very varied in nature. On the one hand, tools and methods that belong to computer science proper are used, such as databases, fuzzy logic, information theory, hardware, etc.; on the other hand, algorithmic utilities must be developed that implement a series of mathematical and statistical methods for various purposes. Notable among these are the calculation of molecular descriptors, regression methods (this part is chemometric, since PLS regressions are usually used), the calculation of similarities, etc.

6.1 Applications of computational chemistry

Computer-aided drug design is becoming especially relevant nowadays and has made computational chemistry one of the key tools of the pharmaceutical industry [233-236]. Computational methods have raised the role of computing in this sector, moving beyond the notion of databases as mere stores of molecular compounds and taking advantage of data mining techniques. These allow compounds to be retrieved from a database by means of queries whose criterion is the biological activity of a given type of drug. Thus, compounds and their properties can be obtained without the costly process of synthesising and purifying them.

Many of the advances of computational chemistry in pharmacy are due to the development of combinatorial chemistry libraries, one of the ways of building databases in computational chemistry. They are based on storing small fragments or substructures that are used to assemble a compound automatically by following a set of rules [237,238,232].

Information theory, based on the calculation of probabilities, uncertainty and entropy, has also been used for various purposes in chemistry: the identification of substances in thin-layer chromatography, obtaining the maximum information from procedures that combine several analytical methods, the choice of mobile phases in gas chromatography, data analysis, etc. [239-245].

Fuzzy sets allow imprecise analytical situations to be modelled. The definition of fuzzy sets and the operations performed on them were proposed by Zadeh in 1965 [246]. They are useful for solving problems in which an element cannot be assigned to a given set in a crisp, clear-cut way. They have found applications both in chemometrics (pattern identification and regression methods) [247,248] and in the development of expert systems [249].

The calculation of structural and spectral similarity is one of the latest trends in computational chemistry [250-256]. It is discussed at greater length here because it is one of the tools used by the doctoral candidate.

6.2 Similarity indices and fingerprints

6.2.1 Calculation of structural similarity

Similarity calculations have mostly been used for structural purposes, although they have recently also been applied to spectral similarity. Numerous parameters have been proposed for measuring the similarity between two objects, most notably, in computational chemistry, the so-called similarity indices [257]. The most widely used is the Tanimoto index (TI), which calculates the similarity between two bit strings of equal length, A and B, by means of the following expression:

$$T_{A,B} = \frac{c}{a + b - c} \qquad (6)$$

where:

• a is the number of bits equal to 1 in string A.

• b is the number of bits equal to 1 in string B.

• c is the number of bits equal to 1 that are common to strings A and B, that is, that occupy the same position.

The TI ranges between zero and one, so that the higher the value, the more similar the strings. The way the bit strings are built differs according to the area in which the Tanimoto method is applied.
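The calculation of the index can be sketched as follows, with the bit strings written as text strings of the characters "0" and "1". The function name `tanimoto` and the two example strings are hypothetical, and the case in which neither string contains any bit equal to 1 is treated here as undefined.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto index c / (a + b - c) of two bit strings of equal length."""
    if len(bits_a) != len(bits_b):
        raise ValueError("bit strings must have the same length")
    a = bits_a.count("1")  # bits equal to 1 in A
    b = bits_b.count("1")  # bits equal to 1 in B
    c = sum(1 for x, y in zip(bits_a, bits_b) if x == y == "1")  # common bits equal to 1
    if a + b - c == 0:
        raise ValueError("index undefined: neither string contains a 1")
    return c / (a + b - c)

# Hypothetical bit strings: a = 4, b = 4, c = 3, so the index is 3 / 5.
A = "11010010"
B = "10010110"
index_ab = tanimoto(A, B)
index_aa = tanimoto(A, A)
```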

When computing the structural similarity between two compounds, each bit string represents a given compound and is known as a fingerprint. Fingerprints are built from the connection table (graph theory), which in turn derives from the two-dimensional structure of the chemical compound. This makes it possible to transform complex molecular structures into bit strings that are easy to handle by computer. The main application is searching for chemical compounds in databases. Query time is reduced because the comparison is carried out only between the query structure and those compounds whose similarity index is equal to or greater than a threshold entered by the user.
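
The threshold-based screening described above might look as follows. This is a sketch under stated assumptions: the database is modelled as a plain dictionary mapping compound names to `"0"`/`"1"` fingerprint strings, and the names `screen_database`, `_tanimoto` and the `cmpd_*` entries are hypothetical.

```python
def _tanimoto(bits_a, bits_b):
    """Tanimoto index c / (a + b - c) for two equal-length '0'/'1' strings."""
    a = bits_a.count("1")
    b = bits_b.count("1")
    c = sum(1 for x, y in zip(bits_a, bits_b) if x == "1" and y == "1")
    return c / (a + b - c) if (a + b - c) else 0.0


def screen_database(query, database, threshold):
    """Return (name, index) pairs whose index is >= threshold, best first."""
    hits = []
    for name, fingerprint in database.items():
        score = _tanimoto(query, fingerprint)
        if score >= threshold:  # keep only candidates above the user's limit
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical fingerprints, for illustration only.
example_db = {"cmpd_1": "110100", "cmpd_2": "100110", "cmpd_3": "001001"}
example_hits = screen_database("110100", example_db, threshold=0.5)
```

Only the compounds returned by the screen would then go on to the more expensive full structure comparison.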

6.2.2 Spectral similarity calculation

Spectral similarity calculation has mostly been used to identify compounds, either by interpreting the spectrum or by comparing the unknown spectrum with those stored in a spectral database. Several parameters for measuring spectral similarity in infrared spectroscopy and mass spectrometry are described in the literature [258].

In other applications, spectral similarity calculation has been combined with other computational and chemometric techniques for the always difficult task of interpreting the spectrum [259-261] in order to elucidate a molecular structure not stored in the database. To this end, some authors have used both types of similarity calculation (structural and spectral) [255,256]. However, no good relationship between the two types of similarity has been found.

6.2.3 Similarity calculation in analytical chemistry

Similarity calculation is of interest for qualitative analysis in analytical chemistry because it provides a measure of resemblance between two samples. Distance calculation (in different multidimensional spaces), as discussed in the chemometrics section, is used in almost all cases. Apart from networks, similarity methods based on parameters other than distance are unusual.

Only one application [262] of the Tanimoto index to sample classification from gas chromatography data appears in the literature. Studying ways of building fingerprints from spectra for sample classification is therefore of interest; this is a field to be developed in the coming years, together with the search for new similarity indices or parameters.
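
To make the idea of a spectral fingerprint concrete, the sketch below shows one possible, purely illustrative way of turning a spectrum into a bit string. The text does not prescribe any construction; the binarisation rule (a bit is 1 where the signal reaches a given fraction of the maximum) and the names `spectrum_to_fingerprint` and `threshold_fraction` are assumptions.

```python
def spectrum_to_fingerprint(intensities, threshold_fraction=0.5):
    """Binarise a spectrum: '1' where intensity >= fraction of the maximum.

    Illustrative assumption only; other constructions are equally possible.
    """
    if not intensities:
        return ""
    if not 0.0 < threshold_fraction <= 1.0:
        raise ValueError("threshold_fraction must be in (0, 1]")
    peak = max(intensities)
    if peak <= 0:
        return "0" * len(intensities)  # no positive signal at all
    limit = threshold_fraction * peak
    return "".join("1" if value >= limit else "0" for value in intensities)
```

Two samples encoded in this way yield equal-length bit strings, which can then be compared with a similarity index such as Tanimoto's.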

7. References

[1] A. Toffler, “La Tercera Ola”, Plaza & Janes S.A., Barcelona, 1980. 

[2] J. Naisbitt, “Megatrends, Ten New Directions Transforming Our Lives”, 

Warners Books, Nueva York, 1982. 

[3] S. Green, “Information Systems Design”, Thomson Computer Press, 

Londres, 1996. 

[4] I. Luque Ruiz, M.A. Gómez-Nieto, “Ingeniería del Software: Fundamentos 

para el Desarrollo de Sistemas Informáticos”, Servicio de Publicaciones de la 

Universidad de Córdoba, Córdoba, 1999. 

[5] G.E. Bailescu, R.A. Chalmers, “Education and Training in Analytical 

Chemistry”, Ellis Horwood, Chichester, 1982. 

[6] A. Prieto Espinosa, A. Lloris Ruiz, J.C. Torres Cantero, “Introducción a la 

Informática”, Tercera Edición, McGraw-Hill, Madrid, 2002. 

[7] J. Agar, “Turing and the Universal Machine: the Making of the Modern 

Computer”, Icon Books, Cambridge, 2001. 

[8] J.G. Brookshear, “Computer Science: an Overview”, Sixth Edition, Addison- 

Wesley, 2000. 

[9] F. Vogt, M. Karlowatz, M. Jakusch, B. Mizaikoff, Analyst 128(4) (2003) 

397. 

[10] D.L. Pfeil, A. Reed, Int. Lab. 32(2) (2002) 23. 

[11] J.Y.K. Hsieh, L. Lin, W. Fang, B.K. Matuszewski, J. Liq. Chromatogr. 

Relat. Technol. 26 (2003) 895. 

[12] L. Wendler, J. Miller, Am. Lab. (Shelton, Conn) 33(16) (2001) 18, 20, 

22, 24. 

[13] J.Y. Neira, N. Reyes, J.A. Nóbrega, Lab. Rob. Autom. 12(5) (2000) 246. 

[14] A. MacDonald, Am. Lab. (Shelton, Conn) 28(8) (1996) 29. 

[15] R. Raso, H.W. Fehlhaber, Rapid Commun. Mass Spectrom. 9 (1995) 

1400. 

[16] K. Dettmer, L. Stieglitz, Chemosphere 29 (1994) 1789. 


[17] R.C. Luders, L.A. Brunner, J. Chromatogr. Sci. 25(5) (1987) 192. 

[18] A.M. Tabert, J. Griep-Raming, A.J. Guymon, R.G. Cooks, Anal. Chem. 

75 (2003) 5656. 

[19] Y.Q. Xia, J.D. Miller, R. Bakhtiar, R.B. Franklin, D.Q. Liu, Rapid 

Commun. Mass Spectrom. 17 (2003) 1137. 

[20] Z. Karpas, W. Chaim, R. Gdalevsky, B. Tilman, A. Lorber, Anal. Chim. 

Acta 474(1) (2002) 115. 

[21] T.L. Buxton, P. de B. Harrington, Appl. Spectrosc. 57(2) (2003) 223. 

[22] J.R. Johnson, F.Y. Meng, A.J. Forbes, B.J. Cargile, N.L. Kelleher, 

Electrophoresis 23 (2002) 3217. 

[23] C.E. Lenehan, N.W. Barnett, S.W. Lewis, J. Autom. Methods Manage. 

Chem. 24(4) (2002) 99. 

[24] T.R. McJunkin, P.L. Tremblay, J.R. Scott, J. Assoc. Lab. Automat. 7(3) 

(2002) 76. 

[25] H.T. Chueh, J.V. Hatfield, Sens. Actuators, B 83B(1-3) (2002) 262. 

[26] O.O. Soyemi, M.A. Busch, K.W. Busch, J. Chem. Inf. Comput. Sci. 40 

(2000) 1093. 

[27] E. Maire, E. Lelievre, D. Brau, A. Lyons, M. Woodward, V. Fafeur, 

Anal. Biochem. 280(1) (2000) 118. 

[28] A. Krauss, U. Weimar, W. Goepel, Trends Anal. Chem. 18 (1999) 312. 

[29] M.M. González-García, F. Sánchez-Rojas, C. Bosch-Ojeda, A. García de

Torres, J.M. Cano Pavón, Anal. Bioanal. Chem. 375 (2003) 1229. 

[30] E. Becerra, A. Cladera, V. Cerdá, Lab. Rob. Autom. 11(3) (1999) 131. 

[31] G.K. Taylor, Y.B. Kim, A.J. Forbes, F.Y. Meng, R. McCarthy, N.L. 

Kelleher, Anal. Chem. 75 (2003) 4081. 

[32] D. Chelius, T. Zhang, G.H. Wang, R.F. Shen, Anal. Chem. 75 (2003) 

6658. 

[33] J. Gallardo, S. Alegret, R. Muñoz, M. de Román, L. Leija, P.R. 

Hernández, M. del Valle, Anal. Bioanal. Chem. 377(2) (2003) 248. 


[34] F. Dieterle, B. Kieser, G. Gauglitz, Chemom. Intell. Lab. Syst. 65(1) 

(2003) 67. 

[35] P.S. Williams, M.C. Giddings, J.C. Giddings, Anal. Chem. 73 (2001) 

4202. 

[36] P. Courcoux, M.F. Devaux, B. Bouchet, Chemom. Intell. Lab. Syst. 

62(2) (2002) 103. 

[37] K.P. Hinz, M. Greweling, F. Drews, B. Spengler, J. Am. Soc. Mass 

Spectrom. 10 (1999) 648. 

[38] H. Masui, M. Yoshida, J. Chem. Inf. Comput. Sci. 36(2) (1996) 294. 

[39] J. Moore, P. Solanki, R.D. McDowall, Chemom. Intell. Lab. Syst. 31(1) 

(1995) 43. 

[40] WinISI software, Infrasoft International, Port Matilda, PA, EE.UU. 

[41] The Unscrambler, Camo Process AS, Oslo, Noruega. 

[42] Matlab: the Language of Technical Computing, The MathWorks,

http://www.mathworks.com. 

[43] A. Savitzky, M.J.E. Golay, Anal. Chem. 36 (1964) 1627. 

[44] C.G. Enke, T.A. Nieman, Anal. Chem. 48 (1976) 705-A. 

[45] J. Wang, S. Bollo, J.L.L. Paz, E. Sahlin, B. Mukherjee, Anal. Chem. 71 

(1999) 1910. 

[46] R.N. Bracewell, “The Fourier Transform and its Applications”, McGraw- 

Hill, Nueva York, 1986. 

[47] G. Quintas, A. Morales Noe, S. Armenta, S. Garrigues, M. de la Guardia, 

Anal. Chim. Acta 502(2) (2004) 213. 

[48] W.D. Cao, X.Y. Chen, X.R. Yang, E.K. Wang, Electrophoresis 24 (2003) 

3124. 

[49] J.A. McReynolds, P. Edirisinghe, S.A. Shippy, Anal. Chem. 74 (2002) 

5063. 

[50] D. Graupe, “Identification of Systems”, Krieger, Nueva York, 1976. 

[51] R. Ergon, K.H. Esbensen, J. Chemom. 16 (2002) 401. 

[52] T.L. Cecil, R.B. Poe, S.C. Rutan, Anal. Chim. Acta 250(1) (1991) 37. 


[53] S. Wold, Kemisk Tidskr. 3 (1972) 34. 

[54] D.L. Massart, B.G.M. Vandeginste, L.M.C. Buydens, S. De Jong, P.J. Lewi,

J. Smeyers-Verbeke, “Handbook of Chemometrics and Qualimetrics: Parts A 

and B”, Elsevier, Amsterdam, 1998. 

[55] K.H. Esbensen, “Multivariate Data Analysis - in Practice”, Camo Process 

AS, Oslo, 2002. 

[56] J.N. Miller, J.C. Miller, “Estadística y Quimiometría para Química

Analítica”, Prentice Hall, Madrid, 2002. 

[57] R. Andreu, J.E. Ricart, J. Valor, “Estrategias y Sistemas de Información”, 

McGraw-Hill, Madrid, 1996. 

[58] M. Bitton, Spectra Anal. 31(229) (2002) 37. 

[59] T. Wessa, C. Wegner, Spectra Anal. 31(229) (2002) 34. 

[60] V. Kershner, Am. Lab. (Shelton, Conn) 34(17) (2002) 22. 

[61] J.A. Schibler, Am. Lab. (Shelton, Conn). 32(6) (2000) 52, 54-56. 

[62] H.M.J. Goldschmidt, M.J.T. Cox, R.J.E. Grouls, W.A.J.H. van de Laar, 

G.G. van Merode, Lab. Autom. Inf. Manage. 34(1) (1999) 1. 

[63] R. Pavlis, Int. Labmate. 22(4) (1997) 24. 

[64] M. Hinton, P.R. Hinton, S. Williams, Am. Lab. (Shelton, Conn) 28(2) 

(1996) 51. 

[65] M. Pradella, R. Dorizzi, A. Burlina, Chemom. Intell. Lab. Syst. 17(2) 

(1992) 187. 

[66] R. Megargle, Anal. Chem. 61 (1989) 612A-614A, 616A-618A, 620A. 

[67] T. Staab, T. Shiina, D. Miller, J. Assoc. Lab. Autom. 8(6) (2003) 107. 

[68] S. Cross, S. Ahmed, Lab. Update. Dec (2000) 16. 

[69] B.K. Alsberg, R. Goodacre, J.J. Rowland, D.B. Kell, Anal. Chim. Acta 

348(1-3) (1997) 389. 

[70] P. Vankeerberghen, J. Smeyers-Verbeke, D.L. Massart, J. Anal. At. 

Spectrom. 11(2) (1996) 149. 

[71] M. Ivandic, W. Hofmann, W.G. Guder, Clin. Chem. (Washington, DC) 

42 (1996) 1214. 


[72] P. Hubert, P. Chiap, M. Moors, B. Bourguignon, D.L. Massart, J. 

Crommen, J. Chromatogr. A 665 (1994) 87. 

[73] R. Wehrens, P. Van-Hoof, L. Buydens, G. Kateman, M. Vossen, W.H. 

Mulder, T. Bakker, Anal. Chim. Acta 271(1) (1993) 11. 

[74] J. Klaessens, B. Vandeginste, G. Kateman, Anal. Chim. Acta 223(1) 

(1989) 205. 

[75] B.W. Boehm, IEEE Trans. Comput. C-25 (1976) 1226. 

[76] R.S. Pressman, “Software Engineering: a Practitioner’s Approach”, 

McGraw-Hill, Nueva York, 1997. 

[77] D.C. Ince, “Ingeniería del Software”, Addison-Wesley Iberoamericana, 

Buenos Aires, 1993. 

[78] I. Sommerville, “Software Engineering”, Addison-Wesley, Wokingham, 

1996. 

[79] W.W. Royce, “Managing the Development of Large Software Systems: 

Concepts and Techniques”, Proc. WESCON, 1970. 

[80] M. Hanna, Software Magazine, Mayo (1995) 38. 

[81] M. Bradac, D. Perry, L. Votta, IEEE Trans. Softw. Engineer. 20 (1994) 

774. 

[82] J. Kerr, R. Hunter, “Inside RAD”, McGraw-Hill, Nueva York, 1994. 

[83] J. Martin, “Rapid Application Development”, Macmillan International 

Editions, Nueva York, 1991. 

[84] F. Brooks, “The Mythical Man-Month”, Addison-Wesley, Reading, 

1995. 

[85] J. McDermid, P. Rook, “Software Development Process Models”,

Software Engineer’s Reference Book, 15/26-15/28, CRC Press, 1993. 

[86] B. Boehm, Computer 21(5) (1988) 61. 

[87] A. Davis, “Software Requirements: Objects, Functions and States”, 

Prentice Hall, Englewood Cliffs, 1993. 

[88] R. Zultner, Am. Programmer, Febrero (1992) 28. 


[89] Y. Akao (ed.) “Quality Function Deployment: Integrating Customer

Requirements in Product Design”, Productivity Press, Cambridge, 1990. 

[90] T. DeMarco, “Structured Analysis and System Specification”, Prentice 

Hall, Englewood Cliffs, 1979. 

[91] M. Page-Jones, “The Practical Guide to Structured Systems Design”, 

Yourdon Press, Nueva York, 1980. 

[92] G. Booch, “Object Oriented Analysis and Design with Applications”, 

Benjamin Cummings, Redwood City, 1994. 

[93] P. Coad, E. Yourdon, “Object Oriented Analysis”, Prentice Hall, 

Englewood Cliffs, 1991. 

[94] K.S. Rubin, A. Goldberg, Communic. ACM 35(9) (1992) 48. 

[95] P. Chen, “The Entity-Relationship Approach to Logical Database 

Design”, QED Information Sciences, Wellesley, 1977. 

[96] D.J. Hatley, I.A. Pirbhai, “Strategies for Real-Time System 

Specification”, Dorset House, Nueva York, 1987. 

[97] P.T. Ward, S.J. Mellor, “Structured Development for Real-Time 

Systems”, Yourdon Press, Nueva York, 1985. 

[98] E.S. Taylor, “An Interim Report on Engineering Design”, Massachusetts 

Institute of Technology, Cambridge, 1959. 

[99] J. Warnier, “Logical Construction of Programs”, Van Nostrand Reinhold, 

Nueva York, 1974. 

[100] W. Stevens, G. Myers, L. Constantine, IBM Systems J. 13(2) (1974) 15. 

[101] O. Dahl, E. Dijkstra, C. Hoare, “Structured Programming”, Academic 

Press, Nueva York, 1972. 

[102] E. Gamma, R. Helm, R. Johnson, J. Vlissides, “Design Patterns”, 

Addison-Wesley, Reading, 1995. 

[103] I. Jacobson, “Object-Oriented Software Engineering”, Addison-Wesley, 

Reading, 1992. 

[104] C.H. Koebel, D.B. Loveman, R.S. Schreiber, G.L.S. Jr, M.E. Zosel, “The 

High Performance Fortran Handbook”, The Mit Press, Cambridge, 1994. 


[105] S.C. Herbert, “C: The Complete Reference Fourth Edition”. McGraw- 

Hill, Nueva York, 2000. 

[106] A. Goldberg, D. Robson, “Smalltalk-80: the Language”, Addison 

Wesley, Reading, 1989. 

[107] M. Campione, “The Java Tutorial Third Edition”. McGraw-Hill, Nueva 

York, 2002. 

[108] P.G. Frankl, S. Weiss, IEEE Trans. Softw. Engineer. 19 (1993) 770.

[109] S.C. Ntafos, IEEE Trans. Softw. Engineer. 16 (1988) 868. 

[110] P.G. Frankl, E.J. Weyuker, IEEE Trans. Softw. Engineer. 14 (1988) 

1483. 

[111] G. Myers, “The Art of Software Testing”, Wiley, Nueva York, 1979. 

[112] T. McCabe, IEEE Trans. Softw. Engineer. 2 (1976) 308. 

[113] E.V. Berard, “Essays on Object-Oriented Software Engineering”, 


[114] D. A. Taylor, “Object-Oriented Technology: A Manager’s Guide”, 


[115] M. Cashman, ACM Softw. Engineer. Notes 14(6) (1989) 67.

[116] G. Booch, IEEE Trans. Softw. Engineer. SE-12 (1986) 211. 

[117] O. Nierstrasz, S. Gibbs, D. Tsichritzis, Communic. ACM, 20(1) (1992) 

160. 

[118] E. Yourdon, Applic. Develop. Strateg. VI(12) (1994) 1. 

[119] J. Rumbaugh, M. Blaha, W. Premerlani, F. Eddy, W. Lorensen, “Object- 

Oriented Modeling and Design”, Prentice-Hall, Englewood Cliffs, NJ, 1991. 

[120] J. Rumbaugh, I. Jacobson, G. Booch, “The Unified Modeling Language. 

Reference Manual”, Addison-Wesley, Reading, 1999. 

[121] G. Booch, J. Rumbaugh, I. Jacobson, “The Unified Modeling Language. 

User Manual”, Addison-Wesley, Reading, 1999. 

[122] I. Jacobson, G. Booch, J. Rumbaugh, “The Unified Software 

Development Process”, Addison-Wesley, Reading, 1999. 


[123] H. Schildt, P. Naughton, “Java. Manual de Referencia”, McGraw-Hill, 

Madrid, 1997. 

[124] J. Zukowski, “Java 2. J2SE 1.4”, Anaya-Multimedia, Madrid, 2003. 

[125] http://www.sun.com/ 

[126] B. Eckel, “Thinking in Java Third Edition”, Prentice Hall, Englewood 

Cliffs, 2003. 

[127] F.J. Ceballos Sierra, “El Lenguaje de Programación Java”, Ra-ma, 

Madrid, 2001. 

[128] P. Naughton, “The Java Handbook”, McGraw-Hill Osborne Media, 

1996. 

[129] “Java Network Programming, Second Edition”, O’ Reilly, Sebastopol, 

2001. 

[130] P.J. Perrone, S.R. Chaganti, T. Schwenk, “J2EE Developer’s Handbook”, 

Sams Publishing, 2003. 

[131] P. Wainwright, A. Ahmad, M. Link, P. Sarang, “Professional Apache 

2.0.” Wrox, 2001. (http://www.apache.org). 

[132] G. Reese, “Database Programming with JDBC and Java”, O’ Reilly, 

Sebastopol, 2001. 

[133] M. Jasnowski, “Java, XML, and Web Services Bible”, Hungry Minds, Wiley, Nueva York, 2002.

[134] M. Akif, S. Bordead, A. Cioroianu, J. Hart, E. Jung, D. Writz, “Java y 

XML”, Anaya, Madrid, 2001. 

[135] G.H. Gonnet, R. Baeza-Yates, “Handbook of Algorithms and Data 

Structures”, Addison-Wesley, Reading, 1984. 

[136] C.J. Date, “An Introduction to Database Systems”, Sexta edición, 


[137] I. Luque Ruiz, M.A. Gómez-Nieto, E. López Espinosa, G. Cerruela 

García, “Bases de Datos: Desde Chen hasta Codd con Oracle”, Ra-Ma, 

Madrid, 2001.


[138] A. de Miguel, M. Piattini, “Fundamentos y Modelos de Bases de Datos”, 

Ra-Ma, Madrid, 1997. 

[139] G. Koch, K. Loney, “Oracle8i: The Complete Reference, 10th edition”, 

Oracle Press, Osborne-McGraw-Hill, 2000.

[140] Informe del grupo de estudio de bases de datos del ANSI/X3/SPARC, 

“Reference Model for DBMS Standardization”, Sigmod Record 15(1) (1986).

[141] G.W. Hansen, J.V. Hansen, “Database Management and Design”, 2nd 

edition, Prentice-Hall, 1996. 

[142] C. Batini, S. Ceri, S. Navathe, “Diseño Conceptual de Bases de Datos”, 

Addison-Wesley Iberoamericana, Buenos Aires, 1994. 

[143] P.P. Chen, Assoc. Computing Machin. Transac. Database Syst. (ACM 

TODS) 1(1) (1976). 

[144] P.P. Chen, “The Entity/Relationship Model: a Basis for the Enterprise 

View of Data”, Am. Fed. Inf. Process. Soc. Conf. Proc. Vol. 46, 1977. 

[145] P.P. Chen (Ed.), “Entity Relationship Approach to System Analysis and

Design”, North Holland, Amsterdam, 1979. 

[146] R. Elmasri, S. Navathe, “Fundamentals of Database Systems”, 2nd

edition, Benjamin Cummings, Nueva York, 1994. 

[147] T.J. Teorey, “Database Modelling and Design. The Entity Relationship 

Approach”, Morgan Kaufmann Publishers, San Francisco, 1990. 

[148] S. Ferg, “Modelling the Time Dimension in an Entity-Relationship 

Diagram”, Proc. 4th Int. Conf. Entity/Relationship App., EE.UU., 1985. 

[149] G. Poonen, “CLEAR: a Conceptual Language for Entities and 

Relationships”, Proc. Int. Conf. Managem. Data, Italia, 1978. 

[150] A. Shoshani, “CABLE: a Language Based on Entity-Relationship 

Model”, Conf. Database Managem., Israel, 1978. 

[151] E.F. Codd, “Recent Investigations into Relational Data Base Systems”, 

Proc. Int. Fed. Inf. Process., Suecia, 1974. 

[152] E.F. Codd, Communic. Assoc. Computing Machin. (CACM) 13(6) 

(1970). 


[153] E.F. Codd, “Further Normalisation of the Database Relational Model”, 

en Database Systems, Courant Computer Science Symposia Series 6, 

Prentice-Hall, Englewood Cliffs, 1972. 

[154] E.F. Codd, “Relational Completeness of Data Base Sublanguages” en

Database Systems, Courant Computer Science Symposia Series 6, Prentice- 

Hall, Englewood Cliffs, 1972. 

[155] E.F. Codd, “A Data Base Sublanguage Founded on the Relational 

Calculus”, Proc. ACM SIGFIDET Workshop on Data Description, Access 

and Control, EE.UU., 1971. 

[156] M. Stonebraker, R. Agrawal, U. Dayal, E.J. Neuhold, A. Reuter, “DBMS 

Research at a Crossroads: the Vienna Update”, Proc. 19th Int. Conf. Very 

Large Data Bases, Irlanda, 1993. 

[157] E.F. Codd, “The Relational Model for Database Management”, Versión 

2, Addison Wesley, Reading, 1990. 

[158] H. Darwen, C.J. Date, Sigmod Record 24(1) (1995) 39. 

[159] Atkinson et al. “The Object-Oriented Database System Manifesto”, en

Deductive and Object-Oriented Databases, Elsevier, Amsterdam, 1989. 

[160] P. Dorsey, J.R. Hudicka, “Oracle8. Design Using UML Object 

Modeling”, Oracle Press, Osborne-McGraw-Hill, 1999.

[161] S. Bobrowski, “Oracle8i para Windows NT”, Oracle Press, Osborne-McGraw-Hill, 2000.

[162] S. Urman, “Oracle8. Programación PL/SQL”, Oracle Press, Osborne-McGraw-Hill, 1999.

[163] G.E.P. Box, W.G. Hunter, J.S. Hunter, “Statistics for Experimenters, An

Introduction to Design, Data Analysis and Model Building”, Wiley, Nueva 

York, 1978. 

[164] S.N. Deming, S.L. Morgan. “Experimental Design: a Chemometric 

Approach, 2nd edition”, Elsevier, Amsterdam, 1993. 

[165] J. Salafranca, C. Domeno, C. Fernández, C. Nerín, Anal. Chim. Acta 

477(2) (2003) 257. 


[166] A. Cerrato, D. de Santis, M. Moresi, J. Sci. Food Agric. 82 (2002) 1189. 

[167] P.A. Martoglio-Smith, Vib. Spectrosc. 24(1) (2000) 47. 

[168] S. Van Huffel, J. Vandewalle, “The Total Least Squares Problem: 

Computational Aspects and Analysis”, SIAM, Philadelphia, 1991. 

[169] T. Næs, E. Risvik (Ed.), “Multivariate Analysis of Data in Sensory 

Science”, Data Handling in Science and Technology Series, Elsevier, 

Amsterdam, 1996. 

[170] M. Felipe Sotelo, J.M. Andrade, A. Carlosena, D. Prada, Anal. Chem. 75 

(2003) 5254. 

[171] H.W. Tan, S.D. Brown, Anal. Chim. Acta 490(1-2) (2003) 291. 

[172] H.L. Yu, J.F. MacGregor, Chemom. Intell. Lab. Syst. 67 (2003) 125. 

[173] L. Kaufman, P.J. Rousseeuw, “Finding Groups in Data: an Introduction 

to Cluster Analysis”, Wiley, Nueva York, 1990. 

[174] R.G. Brereton (Ed.), “Multivariate Pattern Recognition in 

Chemometrics”, Elsevier, Amsterdam, 1992. 

[175] J.M. Andrade, M.P. Gómez-Carracedo, E. Fernández, A. Elbergali, M. 

Kubista, D. Prada, Analyst 128 (2003) 1193. 

[176] D. Coomans, I. Broeckaert, M. Jonckheer, D.L. Massart, Meth. Inform. 

Med. 22 (1983) 93. 

[177] R.J. Laub, J.H. Purnell, J. Chromatogr. 112 (1975) 71. 

[178] G.K. Bolhuis, C.A.A. Duineveld, J.H. de Boer, P.M.J. Coenegracht, 

“Simultaneous Optimization of Multiple Criteria in Tablet Formulation: Part 

I” Pharmaceutical Technology EUROPE, Junio (1995), 42-49. 

[179] A.K. Smilde, A. Knevelman, P.M.J. Coenegracht, J. Chromatogr. 369 

(1986) 1. 

[180] D.H. Doehlert, Appl. Statist. 19 (1970) 231. 

[181] D.E. Long, Anal. Chim. Acta 46 (1969) 193. 

[182] K.W.C. Burton, G. Nickless, Chemom. Intell. Lab. Syst. 1 (1987) 135. 

[183] D.K. Lin, Technometrics 37 (1995) 213. 

[184] R.L. Plackett, J.P. Burman, Biometrika 33 (1946) 305. 


[185] M.M.W.B. Hendriks, J.H. De Boer, A.K. Smilde, “Robustness of 

Analytical Chemical Methods and Pharmaceutical Technological Products”, 

Elsevier, Amsterdam, 1996. 

[186] A.M.C. Davies, R. Giangiacomo (Eds.), “Near Infrared Spectroscopy: 

Proceedings of the 9th International Conference”, NIR Publications,

Chichester, 2000. 

[187] A.M.C. Davies, R.K. Cho, “Near Infrared Technology in the Agricultural 

and Food Industries”, NIR Publications, Chichester, 2001. 

[188] A.M.C. Davies, R.K. Cho (Eds.), “Near Infrared Spectroscopy: 

Proceedings of the 10th International Conference”, NIR Publications,

Chichester, 2002. 

[189] T. Næs, T. Isaksson, T. Davies, “A User-Friendly Guide to Multivariate

Calibration and Classification”, NIR Publications, Chichester, 2002. 

[190] D.A. Burns, E.W. Ciurczak, “Handbook of Near Infrared Analysis”, 

Marcel Dekker, Nueva York, 1992. 

[191] H. Martens, M. Martens, “Multivariate Analysis of Quality: an 

Introduction”, Wiley, Nueva York, 2000. 

[192] H. Martens, T. Næs, “Multivariate Calibration”, Wiley, Nueva York, 

1989. 

[193] D. Bertrand, E. Dufour, “La Spectroscopie Infrarouge et ses Applications 

Analytiques”, Editions TEC & DOC, París, 2000. 

[194] B.G. Osborne, T. Fearn, “Practical NIR Spectroscopy with Applications 

in Food and Beverage Analysis”, Longman Scientific and Technical, 

Londres, 1993. 

[195] V. Fernández-Cabanás, A. Garrido-Varo, Química Analítica 18 (1999) 

113. 

[196] M.S. Dhanoa, S.J. Lister, R. Sanderson, R.J. Barnes, J. Near Infrared 

Spectrosc. 2 (1994) 43. 


[197] D. Bertrand, “Data Pre-treatment and Original Analysis in 

Spectroscopy”, en Advanced Comet Chemometrics School, Libramont, 

Bélgica, 1993. 

[198] J.S. Shenk, M.O. Westerhaus, “Analysis of Agricultural and Foods 

Products by Near Infrared Reflectance Spectroscopy”, Monograph, 

NIRSystems Inc., Silver Spring, 1995. 

[199] P.C. Williams, K. Norris (Eds.), “Near Infrared Technology in the 

Agricultural and Food Industries”, American Association of Cereal Chemists 

Inc., St. Paul, 1987. 

[200] P.C. Mahalanobis, “On the Generalised Distance in Statistics”, Proc. 

National Institute of Science of India, 12:49-55, 1936.

[201] R. De Maesschalck, D. Jouan-Rimbaud, D.L. Massart, Chemom. Intell. Lab.

Syst. 50 (2000) 1. 

[202] D.L. Massart, L. Kaufman, “The Interpretation of Analytical Chemistry

Data by the Use of Cluster Analysis”, Wiley, Chichester, 1983. 

[203] ASTM, “Standard Definitions of Terms and Symbols Relating to 

Molecular Spectroscopy”, American Society for Testing and Materials, vol. 

14.01, Standard E131-90, West Conshohocken, 1990.

[204] Y. Mallet, D. Coomans, O. de Vel, Chemom. Intell. Lab. Syst. 35 (1996) 

157. 

[205] G. McLachlan, “Discriminant Analysis and Statistical Pattern 

Recognition”, Wiley, Nueva York, 1992. 

[206] S. Wold, Pattern Recogn. 8 (1976) 127. 

[207] D.L. Massart, B.G.M. Vandeginste, S.N. Deming, Y. Michotte, L. 

Kaufman, “Chemometrics: a Textbook”, Elsevier, Amsterdam, 1988. 

[208] J.S. Shenk, M.O. Westerhaus, “Calibration the ISI Way”, en Near 

Infrared Spectroscopy: the Future Waves, A.M.C. Davies, P.C. Williams

(Eds.) NIR Publications, Chichester, 1996. 


[209] J.S. Shenk, M.O. Westerhaus, “Routine Operation, Calibration, 

Development and Network System Management Manual”, NIRSystems Inc., 

Silver Spring, 1995. 

[210] F. Despagne, D.L. Massart, Analyst 123 (1998) 157R. 

[211] B. Wythoff, Chemom. Intell. Lab. Syst. 18 (1993) 115. 

[212] J. Zupan, J. Gasteiger, “Neural Networks for Chemists. An Introduction”, 

VCH, Weinheim, 1993. 

[213] J. Park, I.W. Sandberg, Neural Computat. 3 (1991) 246. 

[214] B. Walczak, D.L. Massart, Anal. Chim. Acta 331 (1996) 187.

[215] J.S. Shenk, M.O. Westerhaus, Crop Sci. 31 (1991) 469. 

[216] T. Isaksson, T. Næs, Appl. Spectrosc. 44 (1990) 1152. 

[217] T. Næs, T. Isaksson, Appl. Spectrosc. 43 (1989) 328. 

[218] V. Barnett, T. Lewis, “Outliers in Statistical Data”, Wiley, Nueva York, 

1994. 

[219] H. Mark, J. Workman, “Statistics in Spectroscopy”, Academic Press

Inc., Nueva York, 1991. 

[220] M. Stone, J. R. Statist. Soc. B 39 (1974) 111. 

[221] W.R. Windham, P.C. Flinn, “Comparison of MLR and PLS Regression 

in NIR Analysis of Quality Components in Diverse Feedstuff Populations”, 

en Near Infrared Spectroscopy. Bridging the Gap between Data Analysis and 

NIR Applications, K.I. Hildrum, T. Isaksson, T. Næs, A. Tandberg (Eds.), 

Ellis Horwood, Chichester, 1992. 

[222] T. Fearn, Anal. Proc. 23 (1986) 123. 

[223] Resolution OENO 6/99. Validation Protocol for a Typical Analytical 

Method Compared to the OIV Reference Method. Office International de la 

Vigne et du Vin. http://www.oiv.int/Database/Images/Client/oeno699uk.doc.

[224] E. Trullols, I. Ruisánchez, F.X. Rius, Trends Anal. Chem. 23 (2004) 137. 

[225] C.J. Cramer, “Essentials of Computational Chemistry: Theories and 

Models, Second Edition”, Wiley, 2004. 


[226] T. Clark, “A Handbook of Computational Chemistry: a Practical Guide 

to Chemical Structure and Energy Calculations”, Wiley, Nueva York, 1985. 

[227] J.W. Robinson (Ed.), “CRC Handbook of Spectroscopy”, Vols. I-III,

CRC Press, Ohio, 1974. 

[228] M.A. Sharaf, D.L. Illman, B.R. Kowalski, “Chemometrics”, Wiley, 

Nueva York, 1986. 

[229] P.C. Jurs, “Chemometrics and Multivariate Analysis in Analytical 

Chemistry”, en Reviews in Computational Chemistry, Vol. 1, Wiley, Nueva

York, 1990. 

[230] H. van de Waterbeemd (Ed.), “QSAR: Chemometric Methods in 

Molecular Design”, VCH, Weinheim, 1995. 

[231] H. Kubinyi, “QSAR: Hansch Analysis and Related Approaches”, VCH, 

Weinheim, 1993. 

[232] J.M. Barnard, G.M. Downs, Persp. Drug Discov. Des. 7-8 (1997) 13.

[233] A. Hillisch, R. Hilgenfeld, “Modern Methods of Drug Discovery”, 

Springer-Verlag, Nueva York, 2003. 

[234] T. Lengauer, R. Mannhold, H. Kubinyi, H. Timmerman, “Bioinformatics

– from Genomes to Drugs”, Wiley, Nueva York, USA, 2002. 

[235] J. Zupan, J. Gasteiger, “Neural Networks in Chemistry and Drug Design: 

an Introduction, Second Edition”, Wiley, Nueva York, 1999. 

[236] M.A. Miller, Nature Rev. Drug Discov. 1 (2002) 220. 

[237] W.A. Warr, J. Chem. Inf. Comput. Sci. 37 (1997) 134. 

[238] B.A. Leland, J. Chem. Inf. Comput. Sci. 37 (1997) 62. 

[239] K. Eckschlager, K. Danzer, “Information Theory in Analytical 

Chemistry”, Wiley, Nueva York, 1994. 

[240] D.L. Massart, J. Chromatogr. 79 (1973) 157. 

[241] F. Dupuis, A. Dijkstra, Anal. Chem. 47 (1975) 379. 

[242] P.J. Slonecker, X. Li, T.H. Ridgway, J.G. Dorsey, Anal. Chem. 68 (1996) 

682. 


[243] P.J. Tandler, J.A. Butcher, H. Tao, P. De B. Harrington, Anal. Chim. 

Acta 312 (1995) 231. 

[244] D.R. Scott, A. Levitski, S.E. Stein, Anal. Chim. Acta 278 (1993) 13. 

[245] L.A. Clark, D. Pregibon, “Tree-Based Models”, en Statistical Models, 

J.M. Chambers, T.J. Hastie (Eds.), S. Chapman and Hall, Nueva York, 1992. 

[246] L.A. Zadeh, Inform. Control 8 (1965) 338. 

[247] M. Otto, H. Bandemer, Chemom. Intell. Lab. Syst. 1 (1986) 71.

[248] T. Blaffert, Anal. Chim. Acta 161 (1984) 135. 

[249] B. Walczak, E. Bauer-Wolf, W. Wegscheider, Microchim. Acta 113 

(1994) 153. 

[250] D.H. Rouvray, A.T. Balaban, “Chemical Applications of Graph Theory. 

Applications of Graph Theory”, R.J. Wilson, L.W. Beineke (Eds.), Academic 

Press, Nueva York, 1979. 

[251] G.M. Downs, P. Willett, “Similarity Searching in Databases of Chemical

Structures” en Reviews in Computational Chemistry, Vol. 7, Wiley, Nueva 

York, 1995. 

[252] H. Scsibrany, M. Karlovits, W. Demuth, F. Mueller, K. Varmuza, 

Chemom. Intell. Lab. Syst. 67 (2003) 95. 

[253] R. Hefferlin, M.T. Matus, J. Chem. Inf. Comput. Sci. 41 (2001) 484.

[254] R.P. Sheridan, M.D. Miller, D.J. Underwood, S.K. Kearsley, J. Chem. Inf. Comput. Sci. 36 (1996) 128.

[255] K. Varmuza, M. Karlovits, W. Demuth, Anal. Chim. Acta 490 (2003) 

313. 

[256] W. Demuth, M. Karlovits, K. Varmuza, Anal. Chim. Acta 516 (2004) 75. 

[257] P. Willett, J.M. Barnard, G. Downs, J. Chem. Inf. Comput. Sci. 38 (1998)

983. 

[258] K. Varmuza, P.N. Penchev, H. Scsibrany, J. Chem. Inf. Comput. Sci.

38 (1998) 420. 

[259] V. Schoonjans, F. Questier, Q. Guo, Y. Van der Heyden, D.L. Massart, J. 

Pharm. Biomed. Anal. 24 (2001) 613. 


[260] F. Ehrentreich, Anal. Chim. Acta 393 (1999) 193. 

[261] W. Werther, K. Varmuza, Fresenius J. Anal. Chem. 344 (1992) 223. 

[262] P.J. Dunlop, C.M. Bignell, J.F. Jackson, D. Brynn Hibbert, Chemom. 

Intell. Lab. Syst. 30 (1995) 59. 

Experimental part

Part I: Computerisation of the analytical process and of the management and analysis of the data produced, using the object-oriented paradigm

Chapter 1

USE OF OBJECT-ORIENTED 

TECHNIQUES FOR THE DESIGN AND 

DEVELOPMENT OF STANDARD 

SOFTWARE SOLUTIONS IN 

AUTOMATION AND DATA 

MANAGEMENT IN ANALYTICAL 

CHEMISTRY 

The content of this chapter has been submitted for publication to the journal Trends in Analytical Chemistry.


USE OF OBJECT-ORIENTED TECHNIQUES FOR THE DESIGN AND 

DEVELOPMENT OF STANDARD SOFTWARE SOLUTIONS IN 

AUTOMATION AND DATA MANAGEMENT IN ANALYTICAL 

CHEMISTRY 

Manuel Urbano Cuadrado, M. D. Luque de Castro, Miguel Ángel Gómez-Nieto 

Abstract 

An object-oriented design of software structures is proposed both for automating analytical processes and for managing the data these processes generate. Two models, one for automation and one for data management, are developed for building the programs used in these two areas. The advantages of these object-oriented models over classical design are discussed. Object-oriented modelling thus ensures the standardisation, scalability, independence, etc., of the programs. Characteristics such as inheritance, polymorphism and encapsulation are used to achieve these goals and also provide a new way of representing analytical concepts in automation and data management.

Keywords: Object-oriented paradigm, Automation, Data management. 


1. Introduction 

The development of two main subareas of analytical chemistry, automation and laboratory information management systems (LIMS), cannot be conceived without the rapid advances in informatics. Firstly, computing has allowed automation of the different steps of the analytical process (sample preparation [1,2], detection [3,4], and data collection and processing [5-7]), thus avoiding human participation in the analysis and the errors caused by manual operations. Regarding information management, approaches of differing complexity and scope have been developed, namely: systems for handling daily laboratory data, Management Information Systems (MIS) [8,9]; systems for supporting decision making in laboratories, Decision Support Systems (DSS) [10,11]; and, at the highest level, systems that entirely replace human skills and expertise, Knowledge Based Systems (KBS) [12,13]. In addition, analytical disciplines such as chemometrics, hyphenated techniques, molecular design, QSOR/QSAR, etc., have also taken advantage of informatics. Thus, computers and modern analytical chemistry are closely interrelated.

Computer programs are built following engineering processes (Software Engineering [14,15]) consisting of a series of steps, namely: specification of requirements (also known as system analysis), system design, codification, implementation and testing. Most of the effort, experience and time is devoted to program analysis and design; these two steps constitute the modelling of a system. The quality of the remaining process and, in turn, of the software developed depends on the modelling stage.

The analytical problems (that is, the number and properties of the target analytes, characteristics of the samples, etc.) and their resolution (the time for obtaining results, the level of accuracy required, etc.) are in continuous change. Updating computational resources to the new scenarios arising from these changes is often an expensive and long process, which frequently makes it necessary to improve the software and hardware involved. This is a consequence of shortcomings of the classical modelling in software engineering. Other problems in analytical laboratories related to deficient modelling are the specificity of software for a given equipment and hardware, the disconnection between software for automation and software for data management, etc.

New efforts in modelling software systems have been directed at the development of programs with a high degree of scalability (the capacity for enlarging the program functionality with minimum effort), independence of both the hardware used and the required information, re-usability, etc. With this aim, computational researchers developed the Object-Oriented Paradigm (OOP) for engineering software [16,17] at the beginning of the 90s, based on the key and intuitive object concept: “an abstraction unit of a real entity”. Although object-oriented programming was introduced by Smalltalk in the 70s, modelling based on objects had not been used for the analysis and design of software solutions until then. In this way, a program is composed of a number of objects related to one another and endowed with a series of properties and actions.

Object-oriented modelling has hardly been used in analytical chemistry [18,19] owing to reasons such as: 1) the drawbacks of product standardisation for commercial brands; 2) the relatively recent development of the paradigm; and, in turn, 3) the non-availability of both a standardised modelling language and tools appropriate for implementation. Nevertheless, the development of both the Unified Modelling Language (UML) [20] and object-oriented programming languages (C++ and, particularly, Java [21]) is increasing the use of OOP.

In this paper, two object models for automation and data management are described with the aim of showing the advantages of these models in analytical chemistry. The development of these models is an attempt to create a useful programming structure, in addition to a new meta-language based on the definition of objects in analytical chemistry.

2. Object model for automation 

Many approaches to automation have been developed in analytical chemistry, but few of them [22,23] deal to some extent with software design. The description of the model presented here deals with structural modelling (the abstraction of the analytical equipment). Therefore, only the dynamic aspects related to actions carried out by analytical hardware are considered. The automation processes in which this model can be used have been described previously [24].

2.1 The analytical hardware: devices, apparatus, instruments and autoanalysers

A given instrumental set-up aimed at providing either a signal, such as a peak height or area, slope, etc., or, directly, the value of the parameter under study is necessary in order to carry out analyses in an automatic or automated way. The use of the word “automatic” or “automated” implies computer control of some of the steps of the analytical process (at least data collection and processing).

Computer control requires two parts: firstly, the software that, in addition to processing data for information output, commands the hardware; and secondly, the hardware itself. For the development of the model, the concept of hardware is divided into two types, namely: the physical part of computers and the analytical equipment (“analytical hardware” is the name proposed here).

The IUPAC distinguishes between devices, apparatus, instruments and analysers. Devices are considered the minimum part of analytical equipment (either apparatus or instruments) capable of performing a unitary physical or electronic operation (e.g. electronic interface, mechanical rotor, etc.). Apparatus are defined as a set of devices assembled to carry out physical, dynamic or chemical operations involving samples and reagents (e.g. an ultrasound digester, a selection valve, etc.). Nevertheless, apparatus cannot provide any information about the samples, in contrast to instruments (e.g. a spectrophotometer, a potentiometer, a chromatograph, etc.). At the highest level, the combination of apparatus and instruments for setting up an analytical method is known as an analyser (e.g. an autoanalyser for pesticides in soils).

The object model to be developed must take into account these IUPAC definitions in order to assure the standardisation level pursued.

2.2 Classes involved in automation: static view of the analytical hardware

From the OOP point of view, a class is a generalisation of a group of objects endowed with both common properties and common behaviour: attributes are properties that characterise the structure of a given object, and methods are operations to be carried out by the objects. Therefore, an object is an instance of a given class, which represents a recognisable real-world entity.

Object-oriented modelling uses a classes diagram for designing the structural aspects of a system [20]. Fig. 1 shows the classes diagram developed here for automation in analytical chemistry. As can be seen in the figure, the class AnalyticalHardware is considered the parent class. The remaining classes descend from AnalyticalHardware and, thus, they inherit its properties and methods (inheritance and specialisation are discussed below). The attributes of the class AnalyticalHardware are identifier, brand, model and communication_port. These attributes take a different value for each object of the AnalyticalHardware class. Any hardware involved in an analytical process has an identifier, comes from a given brand, corresponds to a specific model and, finally, needs a channel for communication with computers. On the other hand, the methods of the class AnalyticalHardware are those in charge of managing the attributes cited above, together with two abstract methods responsible for setting and exiting the remote control necessary for automation. The term abstract is a method modifier meaning that the method is implemented in the lower classes. Any class with abstract methods is considered an abstract class and, therefore, the specialisation of the parent class is mandatory in order to implement these methods.

[Fig. 1 diagram. ANALYTICALHARDWARE: attributes identifier, brand, model and communication_port, all of type String; methods setIdentifier(), setBrand(), setModel() and setCommunicationPort(), returning void; getIdentifier(), getBrand(), getModel() and getCommunicationPort(), returning String; abstract setRemoteControl() : void and abstract exitRemoteControl() : void. Specialised into APPARATUS (branch detailed in Fig. 2) and INSTRUMENT (branch detailed in Fig. 3), the latter adding abstract monitoring() : void.]

Fig. 1. Classes diagram for automation in analytical chemistry (I). Specialisation of the class AnalyticalHardware into classes Apparatus and Instrument.

Inheritance is one of the key characteristics of object-oriented modelling and yields advantages related to reusability of the code and programming time. Yourdon provided the following data [25]: reductions of 70 and 84 % in the time and costs, respectively, required for the development of a software product. Moreover, inheritance helps to support a conceptual view of the problem. The classes descending from a given class inherit the structure and behaviour of their parent, while adding new functionality (new attributes or methods). This is known as specialisation and makes it possible to differentiate classes of the same level. Thus, classes Apparatus and Instrument descend from class AnalyticalHardware. This corresponds exactly to the IUPAC hierarchy and, because the difference between apparatus and instruments is the analytical information provided by the latter, class Instrument has a new method named monitoring, also defined as abstract. This method, which is implemented by classes derived from class Instrument, collects data from instruments in a continuous way.
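The hierarchy of Fig. 1 can be sketched in Java roughly as follows. This is a minimal illustration and not the authors' code: attribute and method names follow the figure, whereas the constructors, the package-private visibility and the leaf class DemoSpectrophotometer (which only records calls in a list) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Parent class of Fig. 1. Attribute and method names follow the figure;
// the constructor is an assumption added for brevity.
abstract class AnalyticalHardware {
    private String identifier;
    private String brand;
    private String model;
    private String communicationPort;

    AnalyticalHardware(String identifier, String brand, String model, String communicationPort) {
        this.identifier = identifier;
        this.brand = brand;
        this.model = model;
        this.communicationPort = communicationPort;
    }

    String getIdentifier() { return identifier; }
    void setIdentifier(String identifier) { this.identifier = identifier; }
    String getBrand() { return brand; }
    void setBrand(String brand) { this.brand = brand; }
    String getModel() { return model; }
    void setModel(String model) { this.model = model; }
    String getCommunicationPort() { return communicationPort; }
    void setCommunicationPort(String port) { this.communicationPort = port; }

    // Abstract: implemented only by the lower classes.
    abstract void setRemoteControl();
    abstract void exitRemoteControl();
}

// Apparatus adds no method here: it cannot provide analytical information.
abstract class Apparatus extends AnalyticalHardware {
    Apparatus(String id, String brand, String model, String port) {
        super(id, brand, model, port);
    }
}

// Instrument specialises the parent with the abstract monitoring() method.
abstract class Instrument extends AnalyticalHardware {
    Instrument(String id, String brand, String model, String port) {
        super(id, brand, model, port);
    }
    abstract void monitoring();
}

// Hypothetical leaf class; a real one would command the device instead of logging.
class DemoSpectrophotometer extends Instrument {
    final List<String> log = new ArrayList<>();

    DemoSpectrophotometer(String id, String port) {
        super(id, "DemoBrand", "DemoModel", port);
    }
    @Override void setRemoteControl() { log.add("remote on"); }
    @Override void exitRemoteControl() { log.add("remote off"); }
    @Override void monitoring() { log.add("monitoring"); }
}
```

Because AnalyticalHardware, Apparatus and Instrument are abstract, only leaf classes such as the hypothetical DemoSpectrophotometer can be instantiated, which mirrors the mandatory specialisation described above.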

Specialisation appears continuously in the proposed model. Thus, Figs. 2 and 3 summarise the classes sub-diagrams for apparatus and instruments, respectively. The Apparatus class is specialised into classes HydrodynamicApparatus and SampleTreatmentApparatus. The former corresponds to apparatus for propelling samples and reagents through the analyser (e.g., autosamplers, peristaltic pumps, selection valves, etc.) and the latter represents apparatus for the treatment of samples and reagents (e.g., thermostats, ultrasound digesters, microwave digesters, high-pressure chambers, etc.). Thus, classes such as MicrowaveDigester, Autosampler, SelectionValve, etc., appear in Fig. 2, where the most outstanding attributes and methods of each class are shown.

The branch of the classes diagram for automation in Fig. 3 is devoted to instruments. Class Instrument is firstly specialised into OpticalInstrument and ElectroanalyticalInstrument. Thus, classes such as Spectrophotometer, Spectrofluorimeter, DiodeArraySpectrometer, PotentiometricInstrument, AmperometricInstrument, etc., are shown in Fig. 3.


Fig. 2. Classes diagram for automation in analytical chemistry (II). Specialisation of the class Apparatus into different classes that represent the apparatus involved in analytical chemistry.

Fig. 3. Classes diagram for automation in analytical chemistry (III). Specialisation of the class Instrument into different classes that represent the instruments involved in analytical chemistry.


2.3 Separation between logical structure and physical implementation: degree of independence

Structural modelling based on the object-oriented paradigm represents the analytical equipment involved in automation by classes, taking into account only a logical point of view. These classes are endowed with a series of properties and operations that are familiar to the analytical chemist. The implementation of the abstract methods of the corresponding classes is carried out by leaf classes, which are the manufacturers’ classes. This is the link between a standard application for monitoring analytical processes and the analytical hardware of an autoanalyser. The hardware is controlled by commands used by the classes in order to implement the functionality required for a given instrument or apparatus. The set of commands is the physical part, which is not treated by the logical model and is often associated with devices (discussed above). The physical part is joined to the logical model by object-oriented programming languages endowed with an interface for implementing methods written in other languages. The physical part is often composed of functions written in languages such as C, Fortran, etc., responsible for commanding the devices that compose instruments and apparatus. The programming language Java is endowed with the Java Native Interface (JNI) [21] in order to develop methods capable of using code written in other languages.
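The separation between the logical model and the physical part can be illustrated with the following sketch. It is only an assumption of how a leaf class might look: the interface CommandChannel stands in for the physical part, the classes SelectionValveSketch and DemoBrandValve and the command strings are invented, and no real device protocol is implied. In a JNI-based implementation, the role of CommandChannel could be played by a method declared with the Java keyword native and provided by a C library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the physical part: sends one raw command to a device.
interface CommandChannel {
    void send(String command);
}

// Logical level: operations of any selection valve (names assumed for illustration).
abstract class SelectionValveSketch {
    abstract void setRemoteControl();
    abstract void exitRemoteControl();
    abstract void switchTo(int position);
}

// Leaf "manufacturer" class: maps logical operations onto device commands.
// The command strings are invented and do not belong to any real valve.
class DemoBrandValve extends SelectionValveSketch {
    private final CommandChannel channel;

    DemoBrandValve(CommandChannel channel) { this.channel = channel; }

    @Override void setRemoteControl() { channel.send("REMOTE ON"); }
    @Override void exitRemoteControl() { channel.send("REMOTE OFF"); }
    @Override void switchTo(int position) { channel.send("POS " + position); }
}

// In-memory channel used here instead of real hardware.
class RecordingChannel implements CommandChannel {
    final List<String> sent = new ArrayList<>();
    @Override public void send(String command) { sent.add(command); }
}
```

With this arrangement the control software only sees SelectionValveSketch; replacing the valve by one of another brand means supplying another leaf class, without touching the logical model.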

There are two ways of introducing the manufacturers’ classes into the model. In the first, the manufacturer delivers the classes corresponding to the hardware it fabricates. These classes must fulfil the requisites (namely, inheritance and implementation of abstract methods) necessary for being included in the model and, therefore, in a standardised system for automation. This way is based on the development of a classes library from which the classes required to compose a given autoanalyser are chosen. It is clear that this way can be considered only from an optimistic point of view, as standard tools are not of interest for most companies. In the second, classes can be developed from the list of commands necessary to build the physical part used by the model (the list, and sometimes even the entire physical part, is often provided by the manufacturers). The authors of this article have developed some manufacturers’ classes, in addition to the rest of the classes of the model, in the Java language (e.g. classes CrisonSampler, Rheodyne7010, Unicam8625Spectrophotometer, etc.). These classes are at the disposal of researchers interested in them.

2.4 The automation model in the analytical process

The system for automating analytical processes proposed by the authors [24] is prepared for the classes diagram described above. This system involves a series of interfaces for the design and control of processes, and it is based on the execution of hardware actions and functions when a series of time and state premises are fulfilled. The aggregation of hardware classes and the relationship between the AnalyticalHardware and HardwareAction classes (the former has a list of HardwareAction objects as an attribute) yield a specific autoanalyser. Thus, the model allows different autoanalysers to be controlled by standard software.

Polymorphism is another important characteristic of object-oriented modelling: any reference to a parent class can hold an object of a descending class. This is of importance for the scalability of the program, since the introduction of a new class in the model does not affect the control code because of the use of references to the parent AnalyticalHardware class.
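A short sketch may make the role of polymorphism in the control code more concrete. The classes PumpSketch, PhotometerSketch and ControlCode are hypothetical, and setRemoteControl() returns a String here (instead of void, as in Fig. 1) only so that the effect of each call can be inspected.

```java
import java.util.ArrayList;
import java.util.List;

// Reduced parent class; setRemoteControl() returns a String only so that
// the effect of each call can be inspected (in Fig. 1 it returns void).
abstract class AnalyticalHardware {
    final String identifier;
    AnalyticalHardware(String identifier) { this.identifier = identifier; }
    abstract String setRemoteControl();
}

class PumpSketch extends AnalyticalHardware {
    PumpSketch(String identifier) { super(identifier); }
    @Override String setRemoteControl() { return identifier + ": pump ready"; }
}

class PhotometerSketch extends AnalyticalHardware {
    PhotometerSketch(String identifier) { super(identifier); }
    @Override String setRemoteControl() { return identifier + ": photometer ready"; }
}

class ControlCode {
    // Written once against the parent type; adding a new subclass to the
    // model requires no change in this method.
    static List<String> startAll(List<? extends AnalyticalHardware> autoanalyser) {
        List<String> report = new ArrayList<>();
        for (AnalyticalHardware hardware : autoanalyser) {
            report.add(hardware.setRemoteControl()); // resolved at run time
        }
        return report;
    }
}
```

The method startAll never names a concrete class, so an autoanalyser composed of other apparatus and instruments is handled by the same code.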

Hybridisation between the programming language Java and the markup language XML assures the coupling of the model to a standard automation system. The definition of an analytical method (consisting of the specification of both the hardware involved and the time and state actions to be carried out) is stored in an XML file. Then, objects of the class AnalyticalHardware, with their lists of HardwareAction objects, are built at execution time from the XML data so that the computer can control the autoanalyser.
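As an illustration of how objects might be built at execution time from an XML method definition, the following sketch uses the standard Java DOM parser. The XML layout (elements method, hardware and action, attributes identifier, name and time) is an assumption made for this example, since the actual file format is not given here; HardwareEntry is a simplified stand-in for an AnalyticalHardware object.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// One timed action of a piece of hardware (fields assumed for illustration).
class HardwareAction {
    final String name;
    final double timeSeconds;
    HardwareAction(String name, double timeSeconds) {
        this.name = name;
        this.timeSeconds = timeSeconds;
    }
}

// Simplified stand-in for an AnalyticalHardware object with its action list.
class HardwareEntry {
    final String identifier;
    final List<HardwareAction> actions = new ArrayList<>();
    HardwareEntry(String identifier) { this.identifier = identifier; }
}

class MethodLoader {
    // Assumed layout:
    // <method>
    //   <hardware identifier="pump-1">
    //     <action name="start" time="0"/>
    //   </hardware>
    // </method>
    static List<HardwareEntry> load(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<HardwareEntry> result = new ArrayList<>();
            NodeList hardwareNodes = doc.getElementsByTagName("hardware");
            for (int i = 0; i < hardwareNodes.getLength(); i++) {
                Element hw = (Element) hardwareNodes.item(i);
                HardwareEntry entry = new HardwareEntry(hw.getAttribute("identifier"));
                NodeList actionNodes = hw.getElementsByTagName("action");
                for (int j = 0; j < actionNodes.getLength(); j++) {
                    Element action = (Element) actionNodes.item(j);
                    entry.actions.add(new HardwareAction(
                            action.getAttribute("name"),
                            Double.parseDouble(action.getAttribute("time"))));
                }
                result.add(entry);
            }
            return result;
        } catch (Exception e) {
            // Parsing and number-format problems are reported as one unchecked error.
            throw new IllegalArgumentException("Invalid method definition", e);
        }
    }
}
```

In a fuller version, the identifier read from the file would select the concrete leaf class to instantiate, so that the same loader serves any autoanalyser.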

3. Object model for data management

Any laboratory for process control carries out a series of analyses aimed at knowing the value of several parameters of interest for monitoring a given process. The number of analyses to be carried out depends on the complexity of the process: the more complex the process, the higher the number of parameters to be monitored. Another factor that influences the amount of information is the workload of the laboratory with respect to the number of samples to be analysed, which depends on the monitoring frequency for each process. Moreover, independently of these factors, the amount of information to manage is large (even if the number of samples and parameters is low) as long as historical management is necessary in order to extract information from data corresponding to long past periods.

Specificity to a given laboratory or process is an undesirable aspect for LIMS use because of the low degree of scalability that the design of the programs allows in this case. The dynamics of analytical information call for systems that require no or minimum reprogramming when conditions such as sample identification and classification, new analytical parameters to monitor, computer resources, etc., change. Therefore, open characteristics are mandatory for avoiding or minimising the cost and time needed when reprogramming is necessary.

The object-oriented model proposed here makes it possible to build data management applications that meet the above requirements (namely: applicability to any process monitoring or reference laboratory, independence of the amount and way of carrying out the measurements, separation between daily data handling and historical data analysis modules, etc.).


[Fig. 4 diagram. Labels recovered from the figure: SELECTED PARAMETER, PARAMETER, MAGNITUDE, MEASUREMENT, DATUM MEASUREMENT, CALCULATED MEASUREMENT, FORMULAE, CALCULUS FORMULAE, CONVERSION FORMULAE, CALIBRATION CURVE, USER and ANALYSER; relationship label < Uses >; multiplicities 1..1, 0..*, 0..1 and 1..*; annotation “Some measurements may not be associated to a calibration curve”.]

Fig. 4. Classes diagram for data management in analytical chemistry (I). Classes for representing the samples, parameters and their relationships, units and users of the analytical information.

3.1 Classes involved in the management of samples and analytical parameters

The class Sample represents the samples in which several properties or parameters have to be determined. Sample identification is the key to the management of laboratory information. The criterion for identification is different for each laboratory. Thus, modelling identification in a way that is independent of the laboratory is the first step in constructing the classes diagram shown in Fig. 4. Class Code represents the possible codes used for identification, with name and data_type attributes. The latter indicates whether the code is numerical, text, date or binary, and is important for the correct functioning of the programs, since data consistency is necessary.

An analytical parameter is considered a property of the material under study that is of interest for its characterisation. The definition of parameter is encapsulated in the Parameter class. This is specialised into two classes (see Fig. 4), namely: DatumParameter and CalculatedParameter. The former represents parameters that are determined directly in samples, and the latter those determined from one or several parameters of the first type by means of a mathematical expression, as shown in Fig. 4.

A result is expressed by a value and the unit in which it is measured. Class Unit is necessary for representing the different units in which the analytical parameters can be expressed. The association between the classes Parameter and Unit yields the new class Magnitude, which represents a given parameter expressed in a specific unit. This class is endowed, among others, with two key attributes used to trigger alerts that warn users, namely minimum_value and maximum_value, which define the normal range of a given magnitude.
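The association between Parameter and Unit, and the use of the normal range to raise alerts, might be sketched as follows. The method isOutOfRange and the constructors are assumptions for illustration; only the class names and the minimum_value and maximum_value attributes come from the model.

```java
// A property of the material under study.
class Parameter {
    final String name;
    Parameter(String name) { this.name = name; }
}

// A unit in which a parameter can be expressed.
class Unit {
    final String symbol;
    Unit(String symbol) { this.symbol = symbol; }
}

// A parameter expressed in a specific unit, with its normal range.
class Magnitude {
    final Parameter parameter;
    final Unit unit;
    final double minimumValue;
    final double maximumValue;

    Magnitude(Parameter parameter, Unit unit, double minimumValue, double maximumValue) {
        if (minimumValue > maximumValue) {
            throw new IllegalArgumentException("minimum_value must not exceed maximum_value");
        }
        this.parameter = parameter;
        this.unit = unit;
        this.minimumValue = minimumValue;
        this.maximumValue = maximumValue;
    }

    // Hypothetical helper: true when a result should trigger an alert.
    boolean isOutOfRange(double value) {
        return value < minimumValue || value > maximumValue;
    }
}
```

For instance, a Magnitude built for a hypothetical parameter "ethanol" in "% v/v" with a range of 10 to 15 would flag a result of 16 and accept a result of 12.5.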

Class SelectedParameter arises from the association between the classes Sample and Parameter and represents a given parameter to be determined in a sample. This class manages all the measurements of the parameter in the target sample through a list of objects of the class Measurement, described below.

3.2 Relationship between measurement and calibration in analytical chemistry

Fig. 5 shows the classes that represent the analytical measurements and the information they provide. The most important class is Measurement, which is specialised into classes DatumMeasurement and CalculatedMeasurement in a way similar to that described for the class Parameter. These classes represent the measurements carried out in the laboratory, either in a direct or an indirect way, and, therefore, they are related to class Magnitude in order to know the parameter and unit to which the attribute value belongs.

The relationship between the classes Magnitude and Formulae is shown in Fig. 5. The class Formulae is an abstraction of the formulas in charge of calculating the value of a given magnitude from one or several values of other magnitudes. The class Formulae is specialised into the classes CalculusFormula and ConversionFormula. These two classes have in common a Magnitude attribute named output_magnitude, as each of them provides one output value. The difference between these formulas is the number of input values. Thus, the ConversionFormula class has a Magnitude attribute, named input_magnitude, that represents the single value accepted by this formula. On the other hand, the CalculusFormula class has a list of Magnitude objects as an attribute, because a calculated measurement comes from one or several data.

[Fig. 5 diagram. Labels recovered from the figure: SELECTED PARAMETER, PARAMETER, MAGNITUDE, MEASUREMENT, DATUM MEASUREMENT, CALCULATED MEASUREMENT, FORMULAE, CALCULUS FORMULAE, CONVERSION FORMULAE, CALIBRATION CURVE, USER and ANALYSER; relationship label < Uses >; multiplicities 1..1, 0..*, 0..1 and 1..*.]

Fig. 5. Classes diagram for data management in analytical chemistry (II). Classes for representing both the measurement processes and analytical calibration.
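The difference between the two kinds of formula can be sketched as follows. The way each expression is stored (as a Java functional interface) and the apply methods are assumptions for this example; Magnitude is reduced to two strings.

```java
import java.util.List;
import java.util.function.DoubleUnaryOperator;
import java.util.function.ToDoubleFunction;

// Reduced Magnitude: a parameter expressed in a unit.
class Magnitude {
    final String parameter;
    final String unit;
    Magnitude(String parameter, String unit) {
        this.parameter = parameter;
        this.unit = unit;
    }
}

// Common part: every formula provides exactly one output value.
abstract class Formulae {
    final Magnitude outputMagnitude;
    Formulae(Magnitude outputMagnitude) { this.outputMagnitude = outputMagnitude; }
}

// One input value, one output value.
class ConversionFormula extends Formulae {
    final Magnitude inputMagnitude;
    private final DoubleUnaryOperator expression;

    ConversionFormula(Magnitude input, Magnitude output, DoubleUnaryOperator expression) {
        super(output);
        this.inputMagnitude = input;
        this.expression = expression;
    }
    double apply(double inputValue) { return expression.applyAsDouble(inputValue); }
}

// One or several input values, one output value.
class CalculusFormula extends Formulae {
    final List<Magnitude> inputMagnitudes;
    private final ToDoubleFunction<double[]> expression;

    CalculusFormula(List<Magnitude> inputs, Magnitude output, ToDoubleFunction<double[]> expression) {
        super(output);
        this.inputMagnitudes = inputs;
        this.expression = expression;
    }
    double apply(double... inputValues) {
        if (inputValues.length != inputMagnitudes.size()) {
            throw new IllegalArgumentException("one value per input magnitude is expected");
        }
        return expression.applyAsDouble(inputValues);
    }
}
```

A conversion from g/l to mg/l would then be a ConversionFormula built with the expression v -> v * 1000.0, while a parameter obtained as the sum of two measured parameters would be a CalculusFormula with two input magnitudes.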

Conversion formulas need to be specialised. The conversion of an instrumental unit (e.g. an absorbance change with time) into a chemical unit (e.g. g l⁻¹) is much more complex than a conversion in which only two chemical units (e.g. mol l⁻¹ and ppm) are involved. This is a consequence of the need to characterise the instrumental response. Thus, class ConversionFormula is specialised into class CalibrationCurve. The new structure involves an attribute named calibration, which is of the Calibration class type. This class represents the calibration process and manages information on the date when this step was carried out, the type of calibration used, the instrument employed, the standards, etc. The new classes CalibrationType, LinearCalibration, LogaritmicCalibration, MeasurementCalibration and Measurer are thus required.
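For the linear case, the conversion of an instrumental signal into a concentration can be sketched with an ordinary least-squares fit of the signals of the standards against their concentrations, followed by inversion of the fitted line. The class name LinearCalibrationSketch and its methods are assumptions; the fitting procedure is the usual textbook one and is not taken from the model described here.

```java
// Minimal univariate linear calibration: signal = intercept + slope * concentration.
class LinearCalibrationSketch {
    final double slope;
    final double intercept;

    // Fits the line by ordinary least squares from the calibration standards.
    LinearCalibrationSketch(double[] concentrations, double[] signals) {
        if (concentrations.length != signals.length || concentrations.length < 2) {
            throw new IllegalArgumentException("at least two paired standards are required");
        }
        int n = concentrations.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += concentrations[i];
            meanY += signals[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0.0;
        double sxx = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = concentrations[i] - meanX;
            sxy += dx * (signals[i] - meanY);
            sxx += dx * dx;
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("standards must differ in concentration");
        }
        this.slope = sxy / sxx;
        this.intercept = meanY - this.slope * meanX;
    }

    // Converts an instrumental signal into a concentration by inverting the line.
    double toConcentration(double signal) {
        return (signal - intercept) / slope;
    }
}
```

In terms of the model, an object of this kind would play the role of a LinearCalibration referenced by a CalibrationCurve, with the standards supplied by MeasurementCalibration objects.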

The class Analyser (see Fig. 5) represents analytical equipment from the point of view of data management. The association between classes Magnitude and Analyser (not shown in Fig. 5) gives rise to a new class named Measurer, which represents the feasibility of measuring an analytical parameter in a specific unit with a given instrument. A Measurer object is an attribute of the Calibration class, as is the date, and both make it possible to search for the proper calibration in the database.

So far, it has been supposed that the type of calibration is univariate, which covers the major part of the analyses carried out in any laboratory (the signal from the instrument is often a value of absorbance or potential, a peak height or area, a slope, etc.). Nevertheless, multivariate methods have been developed in recent years with the aim of obtaining the value of one or several parameters from a large number of variables, which is, in most cases, a spectrum. For the case in which multichannel spectrophotometers are not provided with chemometric software for multivariate calibration, a chemometrics block has been modelled to supply this possibility.

3.3 Classes diagram for multivariate calibration

Multivariate calibration requires a large number of standards which, in turn, provide a huge amount of data. Two steps can be distinguished in the development of a multivariate model, namely training and validation, after which validated equations are obtained. Multivariate calibration is a continuous process in which the calibration set is continuously enlarged in order to build ‘universal’ equations with a wider application field. A complete description of the chemometric modelling would require a long discussion that is outside the scope of the research presented here. Only the brief classes diagram in Fig. 6 is given.

Class MultivariateEquation, which represents an equation in multivariate calibration, is associated with the Magnitude class. The latter is endowed with an attribute named equation, that is, a reference to an object of the class MultivariateEquation. This attribute is null if the parameter associated with the object of the class Magnitude does not require multivariate calibration for its determination. Classes SpectralMeasurement, Spectrum and SpectralDatum, and their relationships, must be taken into account for predicting the value of the given parameter. Class SpectralMeasurement is a specialisation of the DatumMeasurement class and, therefore, it is associated with both a given sample and a specific parameter. This class provides, as a specific attribute, an object of class Spectrum, which represents a spectrum endowed with a list of SpectralDatum objects named list_of_points.

The process of calibration is represented by the different associations between the classes MultivariateEquation, MultivariateCalibration, MultivariateCalibrationType, PLSR, PCR, MLR and MCalibrationMeasurement. Thus, an instance of the class MultivariateCalibration is endowed with a list of MCalibrationMeasurement objects, each of which provides the spectrum corresponding to a standard. In addition to this list, the class MultivariateCalibration is also provided with a MultivariateCalibrationType object (specialised into PLSR, PCR and MLR), whose attributes and methods are the key to developing an instance of the class MultivariateEquation.

[Fig. 6 diagram. Labels recovered from the figure: MAGNITUDE, MULTIVARIATE EQUATION, SPECTRAL MEASUREMENT, SPECTRUM, SPECTRAL DATUM, MULTIVARIATE CALIBRATION and MULTIVARIATE CALIBRATION TYPE (with PLSR, PCR and MLR); relationship label < Uses >; multiplicities 1..1, 0..1, 1..* and *..*.]

Fig. 6. Classes diagram for data management in analytical chemistry (III). Classes for representing multivariate calibration.
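The prediction step that relates Spectrum, SpectralDatum and MultivariateEquation might be sketched as below. It is assumed here that the validated equation has the linear form "intercept plus one coefficient per spectral point", which is the form given directly by MLR and the form to which PCR and PLSR models are usually reduced; the class MultivariateEquationSketch, its fields and the predict method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// One point of a spectrum (field names assumed for illustration).
class SpectralDatum {
    final double wavelength;
    final double signal;
    SpectralDatum(double wavelength, double signal) {
        this.wavelength = wavelength;
        this.signal = signal;
    }
}

// A spectrum as a list of points, as in the list_of_points attribute.
class Spectrum {
    final List<SpectralDatum> listOfPoints = new ArrayList<>();
    void add(double wavelength, double signal) {
        listOfPoints.add(new SpectralDatum(wavelength, signal));
    }
}

// Assumed linear equation: value = intercept + sum(coefficient_i * signal_i).
class MultivariateEquationSketch {
    private final double intercept;
    private final double[] coefficients;

    MultivariateEquationSketch(double intercept, double[] coefficients) {
        this.intercept = intercept;
        this.coefficients = coefficients.clone();
    }

    // Predicts the parameter value from a spectrum with one point per coefficient.
    double predict(Spectrum spectrum) {
        if (spectrum.listOfPoints.size() != coefficients.length) {
            throw new IllegalArgumentException("spectrum and equation differ in number of points");
        }
        double value = intercept;
        for (int i = 0; i < coefficients.length; i++) {
            value += coefficients[i] * spectrum.listOfPoints.get(i).signal;
        }
        return value;
    }
}
```

The coefficients themselves would be produced during training by the chosen MultivariateCalibrationType, a step that is not covered by this sketch.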

3.4 Information from historical data: a tool for supporting decisions

A module for the permanent storage of data of interest is mandatory for efficient handling and analysis of the information. After all the specified parameters have been analysed for a given sample, the corresponding sample data are stored in a historical repository and deleted from the module for daily management. The modelling of the historical repository is shown in Fig. 6. This class diagram is similar to that of the management of daily data.

Class Query is used in order to extract information from historical data. It represents a real-time query against the database aimed at obtaining tabular or graphical information. Thus, this class has a series of attributes and methods in charge of providing a list of data for building either a table or a plot. This class is specialised into classes EvolutionQuery and HistogramQuery, according to the characteristics of the searching criteria. An object of the EvolutionQuery class is aimed at obtaining the evolution of one or several parameters with the value of either a numerical or a date code. The association between classes EvolutionQuery, Statistics and DataList provides statistical parameters of the data, for example: mean, increment, maximum, minimum, standard deviation, etc.

On the other hand, the HistogramQuery class is in charge of obtaining the data necessary for building a frequency histogram. Here, the main searching criterion is a set of intervals for a numerical or date code, or a set of values for a text code.
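For a numerical code, the counting behind a frequency histogram can be sketched as follows. The class HistogramQuerySketch and its single static method are assumptions; in particular, the convention that each interval includes its lower edge and excludes its upper edge, and that values outside all intervals are ignored, is a choice made for this example.

```java
class HistogramQuerySketch {
    // edges: ascending interval limits; interval i covers [edges[i], edges[i + 1]).
    // Values that fall outside every interval are ignored.
    static int[] frequencies(double[] values, double[] edges) {
        if (edges.length < 2) {
            throw new IllegalArgumentException("at least one interval is required");
        }
        int[] counts = new int[edges.length - 1];
        for (double value : values) {
            for (int i = 0; i < counts.length; i++) {
                if (value >= edges[i] && value < edges[i + 1]) {
                    counts[i]++;
                    break;
                }
            }
        }
        return counts;
    }
}
```

The resulting array of counts is the list of data from which either a table or a bar plot could be built.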

3.5 Management of users and analytical equipment

The users of an LIMS can be of three types. Firstly, technicians are users in charge of performing daily tasks in the laboratory, namely: inclusion and codification of new samples and specification of the parameters to be analysed, control of the workload, generation of reports, performance of the analyses carried out manually, validation of results, etc. These users interact with the computer system at an operational level. Secondly, managers are users in charge of extracting information from the data, in tabular or graphical form, after a real-time query against a historical repository using different criteria. This interaction is of the highest level, since the system behaves as a tool for supporting decisions. In addition to this functionality, managers use the system with the aim of evaluating the workload, controlling the use of the equipment, etc. The last type of user is the administrator of the system, responsible for the security, integrity and privacy of the analytical information. The hierarchy of the class User is shown in Fig. 4.

Class Analyser has identifier, brand, model, list_of_magnitudes and acquisition_date, among other attributes. The association with class Magnitude yields the class Measurer, discussed in a previous section. Also, the association between classes Measurement, Analyser and Technician provides a new class named AnalyserUse, which represents the use of an analyser by a technician in order to determine a given parameter. This class makes it possible to control the management and use of the analytical equipment.

4. Conclusions 

The advantages arising from two object-oriented models, for automation and data management, have been discussed in this work.

The automation model is a new way of representing the analytical equipment. This model takes into account, on the one hand, the hierarchy and concepts corresponding to IUPAC definitions and, on the other hand, classifications proposed in the analytical literature. Thus, the encapsulation of attributes and methods in a given class, and the inheritance, specialisation and aggregation processes, provide the mechanisms for specifying the structure of any autoanalyser and the operations it can carry out.

The structure of this model makes possible its coupling with a standard interface for automating analytical processes, taking advantage of both polymorphism and classes libraries in order to enlarge the functionality of the programs without changing the control flow in the code. The implementation of the abstract methods by the manufacturers’ classes and the use of Java for translating the model into a physical system guarantee the independence of the software from the analytical equipment and the computer hardware, respectively.

The model for data management also takes into account the characteristics of object-oriented modelling discussed above. In this case, the proposed model assures the development of an LIMS with a high degree of scalability regarding the production process to be monitored, the information to be stored, the different users that interact with the system, etc.

Acknowledgements 

The Comisión Interministerial de 

Ciencia y Tecnología (CICyT) is 

thanked for financial support (Project 

AGL2000-0321-P4-03).


References 

[1] R.P.R. Rocha, B.F. Reis, E.A.G. 

Zagatto, J.L.F.C. Lima, R.A.S. 

Lapa, J.L.M Santos, Anal. Chim. 

Acta 468 (2002) 119. 

[2] E.P. Borges, P.B. Martelli, B.F. 

Reis, Mikrochim. Acta 135 (2000) 

179. 

[3] K.A. Howell, E.P Achterberg, 

C.B. Braungardt, A.D. Tappin, 

P.J. Worsfold, D.R. Turner, 

Trends Anal. Chem. 22 (2003) 

828. 

[4] B.F. Ni, P.S. Wang, H.L. Nie, 

S.Y. Li, X.F. Liu, W.Z. Tian, J. 

Radioanal. Nucl. Chem. 244 

(2000) 665. 

[5] A. Calmon, L. Dusserre-Bresson, 

V. Bellon-Maurel, P. Feuilloley, 

F. Silvestre, Chemosphere 41 

(2000) 645. 

[6] X.J. Li, H. Zhang, J.A. Ranish, R. Aebersold, Anal. Chem. 75 (2003) 6648.

[7] R. Perez de Alejo, E.M. 

Rodriguez, I. Rodriguez, D. 

Uribazo, E.D. Alvarez, Meas. Sci. Technol. 13 (2002) 95.

[8] M. Bitton, Spectra Anal. 31 

(2002) 37. 

[9] M. Pradella, R. Dorizzi, A. 

Burlina, Chemom. Intell. Lab. 

Syst. 17 (1992) 187. 

[10] Y. Vander-Heyden, P. Vankeerberghen, M. Novic, J. Zupan,

D.L. Massart, Talanta 51 (2000) 

455. 

[11] P. Vankeerberghen, J. Smeyers-Verbeke, 

D.L. Massart, J. 

Anal. At. Spectrom. 11 (1996) 

149. 

[12] M. Praisler, I. Dirinck, J. Van Bocxlaer, A. de Leenheer, D.L. Massart, Talanta 53 (2000) 155.

[13] M. Peris, Crit. Rev. Anal. 

Chem. 26 (1996) 219. 

[14] R.S. Pressman, Software 

Engineering: a Practitioner’s Approach, 

McGraw-Hill, New York, 

1997. 

[15] I. Sommerville, Software Engineering, 

Addison-Wesley Longman 

Inc, 1996. 

[16] R.G. Fichman, C.F. Kemerer, 

Computers 25(10) (1992) 22. 

[17] G. Booch, IEEE Trans. Software Engineer. SE-12(2) (1986) 211.

[18] K. Smith, J. Duckworth, K. 

Harrington, J. Bebel, Am. Lab. 

(Shelton, Conn). 32 (2000) 28. 

[19] R.E. Majoras, W.M. Richardson, 

R.S. Seymour, J. Radioanal. 

Nucl. Chem. 193 (1995) 207. 

[20] I. Jacobson, G. Booch, J. Rumbaugh, The Unified Software Development Process, Addison-Wesley Longman Inc, 1999.

[21] M. Campione, The Java 

Tutorial Third Edition. McGraw- 

Hill, 2002, (http://java.sun.com). 

[22] F. Zenie, Int. Lab. 32(1) 

(2002) 22. 

[23] M. Urbano, M.D. Luque de 

Castro, M.A. Gómez-Nieto, Automation 

of Flow Injection Methods 

in the Winery Industry through a 

Computer Program based on a 

Multilayer Model. Proceedings of 

9th IEEE International Conference 

on Emerging Technologies 

and Factory Automation. Lisbon, 

Portugal, September, 2003. 


[24] M. Urbano, M.D. Luque de Castro, M.A. Gómez-Nieto, Trends Anal. Chem. 23 (2004) 270.

[25] E. Yourdon, Development 

Strategies 6(12) (1994) 1.

Chapter 2

AN OPEN SOLUTION FOR 

COMPUTER CONTROL OF FLOW 

INJECTION ANALYSES IN WINE 

PRODUCTION MONITORING 


The content of this chapter has been submitted for publication to the journal Computers and Electronics in Agriculture and was presented as an oral communication at the 9th International Conference on Emerging Technologies and Factory Automation, held in Lisbon (Portugal), 16-19 September 2003.

Comput. Electron. Agric., submitted for publication. Part I, Chapter 2

AN OPEN SOLUTION FOR COMPUTER CONTROL OF FLOW 

INJECTION ANALYSES IN WINE PRODUCTION MONITORING 

M. Urbano Cuadrado, M.D. Luque de Castro, M.A. Gómez-Nieto

Abstract

A computer program for the design and control of automated methods for wine analysis based on the Flow Injection (FI) technique is proposed. It is organised in layers, ranging from a physically oriented layer to the user and analysis-design layers. The user layer permits control of the equipment, and the design layer allows selection of both the analytical instruments and the actions to be performed by these instruments. Thus, analyses based on FI are carried out in an automated manner. The use of the Java and XML languages makes the proposed system versatile enough to be installed on any platform.

Keywords: Automation, Flow injection, Java, XML, Object-oriented system. 




1. Introduction

The monitoring of wine production is supported by a number of chemical analyses of both must and wine samples. Wineries use official and routine methods to determine most of the enological parameters. These methods suffer from long analysis times and a low (mostly nil) degree of automation. In recent years, new methods have been implemented in enological laboratories with the aim of overcoming these limitations. These methods have often been based on the Flow Injection (FI) technique (Mataix and Luque de Castro, 1998; Mataix and Luque de Castro, 2000; González et al., 2001; González et al., 2002).

Since its invention (Stewart et al., 1974; Ruzicka and Hansen, 1975), FI has been considered an easy-to-automate technique. Despite this general acceptance, few FI methods are automated in the strict sense of the word, that is, working without human intervention and with the capacity for making decisions through a feedback system. The degree of automation involved in a given FI method ranges from potential automation, expressed as “The system described can easily be automated and controlled from a personal computer” (Collins et al., 2001) or “The proposed method is suitable for automatic and continuous analysis” (Li et al., 2001), to a truly automatic approach (Danet et al., 2001). There are intermediate situations in which the label “automated” is used simply because the system functions continuously (Delgado et al., 2001; Okamura et al., 2001; Nitao et al., 2001; Solich et al., 2001), because data are collected by a computer (Zhang and Beck, 2001; Shen et al., 2001) or because one of the FI units works unattended (López et al., 2002).

On the other hand, the use of 

computer programs for the control of 

the analytical equipment is not very 

frequent in wineries due to the fact 

that each analysis is carried out by a 

given hardware manifold. Thus, 

specific software must be used for 

each method.


The work presented here was aimed at the development of a computer system to control the enological analyses based on FI in a configurable and open way that overcomes the limitations reviewed above.

2. Materials and methods 

2.1 Apparatus and instruments used in 

the development and testing of the 

program 

Apparatus and instruments from 

different suppliers have been used, 

namely: a Gilson Minipuls3 peristaltic 

pump (Villiers le Bel, France); a 

Rheodyne 7010 automatic injection 

valve (Elkay, Galway, Ireland) and a 

Rheodyne 5012 automatic selection 

valve (Elkay, Galway, Ireland) 

connected to a Gilson Valvemate111 

valvemate (Villiers le Bel, France); a 

Unicam 8625 UV-Vis spectrophotometer 

(Cambridge, England); a 

Kontron SFM 25 spectrofluorimeter 

(Zurich, Switzerland). 

Several lists of commands 

have been used to control the above 

apparatus and instruments by a 

computer (Gilson, Minipuls 3 

Peristaltic Pump, User’s Guide; 

Gilson ValvemateTM Valve Actuator, 

User’s Guide; Unicam Limited, 8625 

Series UV/Visible Spectrometer, 

User’s Guide; Kontron Instruments, 

SFM 25, User’s Guide). Gilson equipment uses a specific communication channel, the serial GSIOC port (Gilson Serial Input/Output Channel), which works under the RS485 protocol. This port prevents direct communication through RS232C ports, so a 506C interface from Gilson (Gilson, 506C System Interface, User’s Guide) is necessary. This interface makes it possible to control Gilson hardware (up to 64 units through a single RS232C port).

2.2 Semi-automated methods used 

The following methods have been used in the construction and testing of the system: Semiautomatic Flow-Injection Method for the Determination of Volatile Acidity in Wines (González et al., 2001); Determination of Total and Free Sulphur Dioxide in Wine by Pervaporation-Flow Injection (Mataix and Luque de Castro, 1998); Simultaneous Determination of Ethanol and Glycerol in Wines by a Flow Injection-Pervaporation Approach with Parallel Photometric and Fluorimetric Detection (Mataix and Luque de Castro, 2000); and Method for the Simultaneous Determination of Total Polyphenol and Anthocyan Indexes in Red Wines Using a Flow Injection Approach (González et al., 2002).

2.3 Technology used 

For the construction of the system, both the object-oriented and the evolutionary incremental software engineering paradigms have been used. The Unified Modelling Language (UML) (Booch et al., 1999; Rumbaugh et al., 1999; Jacobson et al., 1999) has been used for modelling the program, and the C (Herbert, 2000) and Java (Anderson and Stone, 1999; Campione, 2002; Eckel, 2003) languages have been used in the development step. The eXtensible Markup Language (XML) (Morrison et al., 2000; Goldfarb and Prescod, 2001) has been used as the storage structure. Technologies for joining Java and XML, namely JDOM, JAXB and SAX (Jasnokwski, 2002), have also been used.

2.4 Software architecture for the 

development of flow injection 

analysis 

For the development of the system, physical and logical control of the hardware components is necessary, in such a way that the state of each component can be known (and modified, if necessary) before, during and after a given analytical test.

A layered architecture model 

capable of providing independence for 

working with the different FI 

manifolds and modus operandi has 

been used for software construction. 

As shown in Fig. 1, the system is 

based on a multi-layer model, which 

takes into account the functionality of 

the system at different abstraction 

levels, from the physical or hardware 

aspects to the analyst, who defines the 

analytical method through which the 

analysis is carried out. The layers or 

software components in the proposed 

model are as follows:


[Fig. 1: layer stack, from top to bottom]

Design Layer (Java and XML components): configuration of FI analyses depending on the requirements of the equipment.
Operator Layer (UI: Java interface): input/output of information in the software system.
Logical Layer (Java classes): each Java class represents a hardware element, assuring the logical structure of the analysis.
Communication Layer (C procedures): communication between the logical structure and the hardware drivers.
Drivers Layer (hardware drivers): primary functionality for instrument control.

Fig. 1. Layer model of the approach for the design and control of flow injection analysis.

(a) The drivers layer: hardware level 

This layer comprises commands or functions both to operate the hardware (e.g. to change the position of an injection valve) and to collect primary, physical information (e.g. to check the functioning of a peristaltic pump). It is a supplier-dependent layer, as each supplier defines a given physical communication protocol for the hardware it manufactures.

The drivers layer can be constructed in any language, usually a low- or medium-level language such as C, and it is supplied as a compiled function library. The information on how to deploy and use this library was provided by the hardware supplier.

Appendix 1 shows the basic functions present in this library for the control not only of the peristaltic pump but also of any hardware from Gilson. Two functions are used to send two types of commands to the hardware units: type I or Immediate commands, which use the ICmd function and ask for information about the hardware state; and type B or Buffered commands, which use the BCmd function and send operative instructions to the hardware units. Examples of commands defined for the control of the Minipuls3 [17] are “K>” (B type), which makes the pump work clockwise, and “K” (I type), which retrieves the direction and velocity of the pumped fluid.
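As an illustration of how the upper layers might interpret the result of an Immediate or a Buffered command, the following Java sketch maps the integer return codes to readable messages. The codes and their meanings are those given in the comments of Appendix 1; the names CommandType, GsiocStatus and describe are hypothetical and are not part of the Gilson library or of the program described here.

```java
import java.util.HashMap;
import java.util.Map;

// The two GSIOC command types described in the text.
enum CommandType { IMMEDIATE, BUFFERED }

// Hypothetical helper that decodes the return codes listed in Appendix 1.
final class GsiocStatus {
    private static final Map<Integer, String> IMMEDIATE = new HashMap<>();
    private static final Map<Integer, String> BUFFERED = new HashMap<>();

    static {
        // Return codes of ICmd, copied from the Appendix 1 comments.
        IMMEDIATE.put(0, "OK");
        IMMEDIATE.put(1, "channel error");
        IMMEDIATE.put(4, "undefined command");
        IMMEDIATE.put(6, "bad ID");
        IMMEDIATE.put(8, "response buffer overflow");
        // Return codes of BCmd, copied from the Appendix 1 comments.
        BUFFERED.put(0, "OK");
        BUFFERED.put(1, "channel error");
        BUFFERED.put(2, "unit busy");
        BUFFERED.put(3, "unit buffer overflow");
        BUFFERED.put(6, "bad unit");
    }

    private GsiocStatus() {}

    // Codes not listed for the given command type are reported as unknown.
    static String describe(CommandType type, int code) {
        Map<Integer, String> table = (type == CommandType.IMMEDIATE) ? IMMEDIATE : BUFFERED;
        return table.getOrDefault(code, "unknown code " + code);
    }
}

class GsiocStatusDemo {
    public static void main(String[] args) {
        System.out.println(GsiocStatus.describe(CommandType.BUFFERED, 2));
        System.out.println(GsiocStatus.describe(CommandType.IMMEDIATE, 8));
    }
}
```

Keeping one table per command type reflects the fact that the two functions do not share the same set of codes (for example, code 2 is only defined for buffered commands).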

(b) Base layer: communication with 

the drivers 

This layer connects the drivers layer with the logical level. It implements all the functionality the hardware is able to provide and ensures that the logical structure of the system recognises the hardware elements.

This layer has been constructed using the C language to achieve optimum exploitation of the computing resources and easy integration of its programs with other computer languages. This last characteristic makes it possible to link this layer with the upper one (Logical layer), constructed with Java technology. The modules developed in Java invoke functions written in C through the Java Native Interface (JNI). The software procedures which constitute the Base layer are called from native Java methods through a dynamic library (.dll) where the compiled procedures developed in C are located.

Appendix 2 shows an example of this layer: a C function whose objective is to set the velocity of the Minipuls 3 peristaltic pump. Both the function interface and the function body can be observed in this appendix. The way of using the BCmd function of the Drivers layer can be seen in the function body.

Appendix 1

INTERFACE int (WINAPI DYNAMIC ICmd)( int unit, char const* cmd,
                                     char* rsp, int maxrsp );
// returns 0=OK, 1=channel error, 4=undefined command,
// 6=bad ID, 8=response buffer overflow

INTERFACE int (WINAPI DYNAMIC BCmd)( int unit, char const* cmd,
                                     char* rsp, int maxrsp );
// returns 0=OK, 1=channel error, 2=unit busy, 3=unit
// buffer overflow, 6=bad unit

Appendix 2

#include <jni.h>
#include "gsioc32.h"
#include "GilsonMinipuls3.h"
.
.
.
JNIEXPORT jint JNICALL
Java_instrumental_bomba_GilsonMinipuls3_setVelocity
    (JNIEnv *env, jobject jobj, jstring velocity, jint device)
{
    int result;
    int slave = (int) device;
    /* Response buffer; its size must match the last argument of BCmd. */
    char answer[5];
    /* Convert the Java string into the C string that holds the command. */
    const char *command = (*env)->GetStringUTFChars(env, velocity, 0);
    // Drivers' function.
    result = BCmd(slave, command, answer, 5);
    /* Release the C string obtained from the Java string. */
    (*env)->ReleaseStringUTFChars(env, velocity, command);
    return (jint)result;
}
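For orientation, the Java side of such a native call could look like the sketch below. The method name and parameter types follow from the C signature in Appendix 2 (jstring velocity, jint device); the library name, the explicit loadDriver method and the demo class are assumptions made for illustration. The C function name in Appendix 2 implies that the real class lives in the package instrumental.bomba; the package declaration is omitted here only so that the sketch compiles as a single file.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Sketch of the Java class behind the C function of Appendix 2.
// Without the package line, JNI would look for Java_GilsonMinipuls3_setVelocity
// instead of Java_instrumental_bomba_GilsonMinipuls3_setVelocity.
class GilsonMinipuls3 {
    // Hypothetical library name; the real .dll name is not given in the text.
    static final String LIBRARY = "GilsonMinipuls3";

    // Implemented in C; the parameters mirror (jstring velocity, jint device).
    native int setVelocity(String velocity, int device);

    // Loading is explicit so the class can be inspected without the .dll present.
    static void loadDriver() {
        System.loadLibrary(LIBRARY);
    }
}

class NativeDeclarationDemo {
    // Reports whether GilsonMinipuls3 declares the named method as native.
    static boolean isNative(String name) {
        try {
            Method m = GilsonMinipuls3.class.getDeclaredMethod(name, String.class, int.class);
            return Modifier.isNative(m.getModifiers());
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // No native code is executed here; only the declaration is checked.
        System.out.println("setVelocity is native: " + isNative("setVelocity"));
    }
}
```

Calling setVelocity itself would require loadDriver to succeed first, that is, the compiled dynamic library must be on the library path.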

(c) Logical layer: control level 

The logical layer comprises a group of Java classes, one for each of the different hardware elements. These classes include in their definition a group of methods which make possible the control of the analysis carried out by FI.

The technology used in this layer must provide the system with the ability to execute several tasks simultaneously and with very precise time control. In addition, given the logical character of this level with respect to the chemical problem under study, a multiplatform system is desirable in order to ensure its open configuration.

Under these requirements, the logical layer has been developed using the Java language, which gives the Logical layer a structure that corresponds exactly to the conceptual representation of an FI system. This characteristic of Java is exploited in the construction of the java.instrumental.valve package, which includes the classes representing the apparatus responsible for injecting and selecting samples and reagents in the analyser. Another characteristic of Java is its capacity to support multithreaded or concurrent programs, which enable the simultaneous execution of several synchronised tasks.
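A minimal sketch of this concurrent capability is given below, assuming one task per instrument. It uses the standard java.util.concurrent executor API rather than the thread handling of the program described here, which is not shown in the text; the instrument names and the 50 ms delay that stands in for instrument communication are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class ConcurrentFiDemo {
    // Runs one hypothetical task per instrument in parallel and waits for all
    // of them. The list must not be empty (the pool size must be positive).
    static List<String> runAll(List<String> instruments) {
        ExecutorService pool = Executors.newFixedThreadPool(instruments.size());
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String name : instruments) {
                pending.add(pool.submit(() -> {
                    Thread.sleep(50);          // stands in for real instrument I/O
                    return name + " done";
                }));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> task : pending) {
                // Synchronisation point: block until this task has finished.
                results.add(task.get(5, TimeUnit.SECONDS));
            }
            return results;
        } catch (Exception e) {
            throw new IllegalStateException("instrument task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runAll(Arrays.asList("pump", "valve", "detector")));
    }
}
```

The results are collected in submission order, so the returned list follows the order of the input list even though the tasks run simultaneously.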

An example of this level is the class Unicam8625.java, which represents the Unicam 8625 UV-Visible detector. One of the methods defined is in charge of changing the wavelength and then adjusting the baseline. This method invokes Thread.sleep, which suspends program execution for the interval specified in the input argument. This delay is necessary because, after the wavelength is changed, the spectrophotometer requires time to move the monochromator to the newly selected position before the baseline can be adjusted. Appendix 3 shows the Java code for this method.

(d) Operator layer: user interface 

This level controls the execution of 

the different wine analyses carried out 

from the point of view of interaction 

with the user. From this level, the 

system starts to be “tangible” to the 

analyst, as it allows the input and 

output of information in the software.



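Appendix 3

public void changeWavelength(String port, String wavelength,
                             Integer timesleep)
{
    try {
        // Move the monochromator to the new wavelength.
        setWavelength(port, wavelength);
        try {
            // Wait (in milliseconds) until the monochromator is in position.
            Thread.sleep(timesleep);
        } catch (InterruptedException e) {
            System.out.println("error in Thread.sleep()");
        }
        // Adjust the baseline (zero absorbance) at the new wavelength.
        setZeroAbs(port);
    } catch (Exception e) {
        // Errors raised by the instrument calls are silently ignored.
    }
}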

This level has been built using the java.awt and javax.swing packages, giving rise to a powerful window-based interface. In this layer the user can stop, cancel and restart the execution of a given analysis.

(e) Experiment layer: design of flow 

injection methods 

This level enables the design of an analysis. FI analyses consist of a series of actions to be performed by the instruments involved. In this layer, the number and type of instruments involved and the actions to be carried out by these instruments are defined.

This level was developed using Java and XML. The former allows the building of a user interface for the design, and the latter provides the information storage structure. Thus, .xml files are used to store the definitions of the different analyses. The joint use of Java and XML (thanks to the JAXB, SAX and JDOM programming interfaces) makes it possible to build both the Java classes corresponding to the equipment and the procedure through which the analysis is carried out, taking into account the data stored in the configuration file.
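The idea of reading an analysis definition from XML can be sketched as follows. The layout of the .xml files used by the program is not given in the text, so the element and attribute names below (analysis, instrument, action, type, id, at) are hypothetical. The sketch also swaps in the DOM parser bundled with the JDK (javax.xml.parsers) in place of JDOM or JAXB, so that it runs without external libraries.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class AnalysisDefinitionDemo {
    // Hypothetical definition: which instruments take part and what they do.
    static final String SAMPLE =
        "<analysis name=\"example\">"
      + "<instrument type=\"pump\" id=\"P1\"><action at=\"0\">start</action></instrument>"
      + "<instrument type=\"valve\" id=\"V1\"><action at=\"30\">inject</action></instrument>"
      + "</analysis>";

    // Returns one readable line per action found in the definition.
    static List<String> describe(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> steps = new ArrayList<>();
            NodeList instruments = doc.getElementsByTagName("instrument");
            for (int i = 0; i < instruments.getLength(); i++) {
                Element inst = (Element) instruments.item(i);
                NodeList actions = inst.getElementsByTagName("action");
                for (int j = 0; j < actions.getLength(); j++) {
                    Element act = (Element) actions.item(j);
                    steps.add(act.getAttribute("at") + "s " + inst.getAttribute("id")
                        + " (" + inst.getAttribute("type") + "): " + act.getTextContent());
                }
            }
            return steps;
        } catch (Exception e) {
            throw new IllegalArgumentException("invalid analysis definition", e);
        }
    }

    public static void main(String[] args) {
        for (String step : describe(SAMPLE)) {
            System.out.println(step);
        }
    }
}
```

In a fuller version, each instrument element would be mapped to the corresponding class of the logical layer instead of to a line of text.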



2.5 Software developed 


Fig. 2. Example of the interface for FI analyses design. 

The product is a computer application 

for the control of enological analyses 

based on the FI technique and it is 

being used in a winery of the appellation 

d’origine “Montilla-Moriles”. It 

is formed by two tools: the first is in 

charge of either defining or modifying 

the procedure of the analysis and the 

second is in charge of executing a 

given analysis. 

The design tool is a graphical interface through which the user defines both the instruments and the actions to be performed. Figure 2 shows this interface, in which the hierarchical menu representing the analysis is on the left. The rest of the window is devoted to a framework where sub-frames for data introduction and modification are launched when the user clicks the new element button or the corresponding element on the left, respectively.

The execution tool is a graphical interface created dynamically from the data on the given analysis that are stored in the .xml file. Figure 3 shows the control interface for the determination of volatile acidity, which requires a pump, two injection valves and a photometric detector. The control application loads a sub-interface (a frame) for each type of hardware. In this case, the interface is composed of a peristaltic pump frame, a valve frame and a detector frame. The interface differentiates each hardware element when the number of elements of a given type is higher than one; that is, in the above example, two sub-frames are loaded in the valve frame.
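The grouping rule just described (one frame per hardware type, with sub-frames only when a type has more than one element) can be sketched without any graphical code. The class and method names are hypothetical, and plain strings stand in for the Swing components of the real interface.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ControlLayoutDemo {
    // Maps each hardware type to a frame; a frame gets numbered sub-frames
    // only when the analysis uses more than one element of that type.
    static Map<String, List<String>> layout(List<String> instrumentTypes) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String type : instrumentTypes) {
            counts.merge(type, 1, Integer::sum);
        }
        Map<String, List<String>> frames = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            List<String> subFrames = new ArrayList<>();
            if (entry.getValue() > 1) {
                for (int i = 1; i <= entry.getValue(); i++) {
                    subFrames.add(entry.getKey() + " " + i);
                }
            }
            frames.put(entry.getKey() + " frame", subFrames);
        }
        return frames;
    }

    public static void main(String[] args) {
        // Hardware of the volatile acidity example: a pump, two valves, a detector.
        System.out.println(layout(Arrays.asList("pump", "valve", "valve", "detector")));
    }
}
```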

As can be seen in Figure 3, there are interface elements devoted to the output of information inherent to the given analysis. This information involves several aspects, namely the analysis time elapsed, the activities in which the instruments are involved at present (sample injection, detector monitoring, activated rinsing stream, etc.), results in graphical and numerical formats, anomalous instrument behaviour, etc.

3. Results and discussion 

3.1 Users participation 

(a) Users in the development of the 

software 

The participation of different users (analysts and technicians) in the development of the system has been made possible by the use of the evolutionary incremental paradigm. Thus, the requirements for interface design, instrument control, functionality of the system, etc., have been established by the users in various meetings. A prototype has been installed in the laboratory for a first checking step.

(b) Users in the analytical control 

under the system 

The program makes it possible to carry out different wine analyses without user intervention. The user defines the analysis, and the definition can later be modified if the experimental conditions change. Figure 4 shows the functionality related to the definition and modification of analyses. Four definitions have been created for testing the system: Ethanol_Glycerol.xml, for the simultaneous determination of ethanol and glycerol; VolatileAciditiy.xml, for the determination of volatile acidity; Total_Free_SulphurDioxide.xml, for the determination of total and free sulphur dioxide; and TotalPolyphenol_Anthocyan.xml, for the simultaneous determination of total polyphenol index and total anthocyan index.

Fig. 3. Example of the control interface.

For the execution of the analyses, the users load the corresponding .xml files and the sample codes before clicking the “execute analysis” button.

[Fig. 4: use-case diagram. The actor Chemist is linked to two use cases. Analysis Definition includes: Selection of type and number of hardware; Selection of tasks for each hardware element; Save the definition for the new analysis and ready for run. Update of Analysis includes: Retrieval of analysis by name; Modification of necessary data; Save modifications and ready for run.]

Fig. 4. Functionality of the design layer.

3.2 Improvements and limitations

The following improvements have been achieved using this automation approach:

i. Diverse chemical parameters are determined under the control of a single program. Thus, the user, after a learning period, controls most of the wine analyses carried out.

ii. The program can be modified functionally with few changes in the architecture, owing to the open and scalable structure of the system from the logical level.

iii. The program can be installed on any platform (Windows, Linux, etc.).

The main limitation of the program is the dependence of the drivers and communication layers on both the instruments and the platform used. For this reason, the components of these levels for many instruments and apparatus in the laboratory have been built under both Windows and Linux. Nevertheless, the introduction of a new device in the manifold of any analytical method makes it necessary to develop the drivers (usually provided by suppliers) and the communication components for this instrument.


4. Conclusions

The aim of this work was to introduce an open and configurable tool for the automation of wine production monitoring, intended to overcome the limitations of previous contributions, which have been restricted to a given method.

The development of this program is based on a multi-layer model that gives the control level, which is close to the analytical chemist, maximum independence from the physical tasks required to communicate with the instruments. The main limitation of this model lies in the programming of the low-level layer components for each new type of instrument. On the other hand, the multi-layer model ensures a program with open and configurable characteristics.

This contribution goes beyond the “semi-automated” label attached to many FI methods described for enological analysis. Thus, a key tool for improving both quality control and analysis times is presented.

New research lines are open, such as remote control (through a web application) of the configuration and process monitoring, a system to aid reagent selection, etc.



Acknowledgements

The Comisión Interministerial de Ciencia y Tecnología (CICyT) is thanked for financial support (Project AGL2000-0321-P4-03). Users of the built system are thanked for their help in the development of the system.


References

Anderson, C.J., Stone, L.B., 1999. Manual de Oracle Jdeveloper (Oracle Jdeveloper Handbook). McGraw-Hill/Oracle Press, Madrid.

Booch, G., Rumbaugh, J., Jacobson, 

I., 1999. The Unified Modelling 

Language. User Manual. Addison-Wesley 

Longman Inc, 

USA. 

Campione, M., 2002. The Java Tutorial Third Edition. McGraw-Hill.

Collins, A., Nandakumar, M.P., Csoeregi, E., Mattiasson, B., 1999. Monitoring of alpha-ketoglutarate in a fermentation process using expanded bed enzyme reactors. Biosens. Bioelectron. 16(9-12), 765-771.

Danet, A.F., Cheregi, M., Martinez, 

J., García, J.V., Aboul, H.Y., 

2001. Flow-injection methods 

of analytes for waters. I. 

Inorganic species. Crit. Rev. 

Anal. Chem. 31 (3), 191-222. 

Delgado, F., Fernández-Romero, J.M., Luque de Castro, M.D., 2001. Semi-automated spectrometric method for the determination of pectinesterase activity in natural and processed juices. Anal. Lett., 34 (13), 2277-2284.

Eckel, B. 2003. Thinking in Java 

(Third Edition). Prentice Hall. 

Gilson S.A. LT801121K, 2001. 

Minipuls 3 Peristaltic Pump, 

User’s Guide. www.gilson.com. 

Gilson S.A. LT3331, 2001. ValvemateTM 

Valve Actuator. 

User’s Guide. www.Gilson.com 

Gilson S.A. LT3635, 2001. 506C 

System Interface, User’s Guide. 

www.gilson.com. 

Goldfarb, C., Prescod, P., 2001. XML 

Handbook (Fourth Edition). 

Prentice Hall. 

González, J., Pérez-Juan, P., Luque de 

Castro, M.D., 2001. Semiautomatic 

flow-injection method 

for the determination of 

volatile acidity in wines. J. 

AOAC Int., 84 (6), 1846-1850. 

González, J., Pérez-Juan, P., Luque de Castro, M.D., 2002. Method for the simultaneous determination of total polyphenol and anthocyan indexes in red wines using a flow injection approach. Talanta, 56, 53-59.

Herbert, S., 2000. C: The Complete 

Reference (Fourth Edition). 

McGraw-Hill. 

Jacobson, I., Booch, G., Rumbaugh, 

J., 1999. The Unified Software 

Development Process. Addison- 

Wesley Longman Inc. 

Jasnokwski, M., 2002. Java, XML, 

and Web Services Bible. 

Hungry Minds. 

Kontron Instruments, Spectrofluorimeter 

SFM 25, User’s 

Guide, Zurich, Switzerland. 

Li, B.X., Zhang, Z.J., Liu, W., 2001. 

Flow-injection chemiluminescence 

determination of 

chlortetracycline using on-line 

electrogenerated [Cu(HIO6)2]5 

as the oxidant. Talanta, 55(6), 

1097-1102. 

Lopez, I., Merino, B., Campillo, N., Hernandez, M., 2002. On-line filtration system for determining total chromium and chromium in the soluble fraction of industrial effluents by flow injection flame atomic. Anal. Bioanal. Chem., 373(1-2), 98-102.

Mataix, E., Luque de Castro, M.D., 

1998. Determination of total 

and free sulphur dioxide in wine 

by pervaporation-flow injection. 

Analyst, 123, 1547. 

Mataix, E., Luque de Castro, M.D., 2000. Simultaneous determination of ethanol and glycerol in wines by a flow injection-pervaporation approach with parallel photometric and fluorimetric detection. Talanta, 51, 489.

Morrison, M., 2000. XML Unleashed. 


Nitao, J.K., Birr, B.A., Nair, M.G., 

Herms, D.A., Mattson, M.J., 

2001. Rapid quantification of 

proanthocyanidins (condensed 

tannins) with a continuous flow 

analyser. J. Agric. Food. Chem., 

49(5), 2207-2214. 

Okamura, K., Sugiyama, M., Obata, H., Maruo, M., Nakayama, E., Karatani, H., 2000. Automated determination of vanadium(IV) and (V) in natural waters based on chelating resin separation and catalytic detection with Bindschedler’s green leuco base. Anal. Chim. Acta, 443(1), 143-151.

Rumbaugh, J., Jacobson, I., Booch, G., 1999. The Unified Modelling Language. Reference Manual. Addison-Wesley Longman Inc.

Ruzicka, J., Hansen, E.H., 1975. Anal. 

Chim. Acta 78, 145. 

Shen, H.L., Grung, B., Kvalheim, 

O.M., Eide, I., 2001. Automated 

curve resolution applied to data 

from multi-detection instruments. 

Anal. Chim. Acta., 

446(1), 313-328. 

Solich, F., Ogrocka, E., Schaefer, U., 2001. Application of automated flow-injection analysis to drug liberation studies with the Franz diffusion cell. Pharmazie, 56(10), 787-789.

Stewart, K.K., Beecher, G.R., Hare, 

P.E., 1974. Fed. Proc. Fed. Am. 

Soc. Biochem. 33, 1429. 

Unicam Limited (Division of 

Analytical Technology Inc.). 

9499 230 18011 910916, 1991. 

Unicam 8625 Series UV/Visible 

Spectrometer, UK. 

Zhang, B., Beck, H.P., 2001. A rapid 

and sensitive method for the 

fluorimetric determination of 

phosphate by flow-injection. 

Anal. Lett. 34(15), 2721-2733. 


Chapter 3

TRIGGER-BASED CONCURRENT 

CONTROL SYSTEM FOR 

AUTOMATING ANALYTICAL 

PROCESSES 

The content of this chapter has been published in the journal Trends in Analytical Chemistry, 23 (2004) 370-384.

Trends Anal. Chem., 23 (2004) 370. Part I, Chapter 3

TRIGGER-BASED CONCURRENT CONTROL SYSTEM FOR 

AUTOMATING ANALYTICAL PROCESSES 

Manuel Urbano Cuadrado, M. D. Luque de Castro, Miguel Ángel Gómez-Nieto 

Abstract 

A system for the design and automation of analytical processes based on concurrent actions performed by the equipment involved is presented. The system enables the control of any hardware that acts during a given process. The control of the process is based on time and state parameters; thus, the continuous monitoring of these parameters is necessary in order to achieve this aim. Decisions can be taken without human intervention thanks to a subsystem in charge of the intelligent control of the analytical process. The overall design of the analytical process is divided into two steps: the design of the time-based process and the design of the state-based control (triggers). The latter also manages the exceptions and alerts activated during each experiment. Automation is carried out by a procedure constructed from the design data. Computer technologies such as UML, Java and XML have been used for program construction.

Keywords: Experiment automation, Instrumental control, Concurrent 

processing, Trigger-based control. 




1. Introduction

Automation of chemical or analytical processes pursues the same objectives as in other areas, namely: shortening the time required for carrying out a given process, avoiding human intervention and human errors, and increasing the quality of process control [1-3]. The International Union of Pure and Applied Chemistry (IUPAC) distinguishes between automatic and automated systems [4]. The former perform programmed actions but cannot take decisions by themselves without human intervention, whereas the latter can. This distinction is not clearly made in chemical and analytical publications, where the word ‘automation’ often refers to computer control of some step of the process, usually data collection and treatment [5-9]. Leaving aside the wrong use of the word, computer control has been widely implemented in the chemical field.

Concerning analytical processes, 

where the ultimate objective is 

the measurement of one or several 

154 

components in a sample, the process 

is usually divided into three steps: 

sample treatment –which involves 

physical and chemical operations 

focussed on optimum monitoring–; 

the measurement step –which produces 

bio-chemical information from 

the sample–; data collection and 

handling for obtaining proper information 

on the system under study. 

In most cases, the main shortcoming of the computer programs reported in the literature is the need for a specialised user to take decisions. Few systems in the literature are able to replace human decisions and thus avoid user involvement in the process [10-14]. Although these approaches achieve the objective of avoiding human decisions, they have a series of limitations regarding scalability, openness and so on [13]. Some of these proposals are closed systems, in which the production rules, variables and other constructs that define the control flow of the process must be defined a priori, so that future variations in the instrumentation used, the analytical problems to be solved, etc., affect the system. Thus, a large part of the system must be reprogrammed owing to its low scalability. Regarding the purpose of the works described in the literature, most of them use control code constructs or production rules, but only as simple logical constructs (the inference engine, knowledge propagation, etc. are not defined). If these concepts are not defined, the systems are simply based on triggers in order to avoid overall human intervention. Taking decisions involves knowing the situation of the system from the values of state or time variables, and either acting or not acting on the system depending on these values [15-17].

The research presented here is a program whose function is the control of analytical determination processes. The control structures used are based on the time and state domains of the system; thus, triggers are used for taking decisions on both aspects: time and state. Another, more innovative aspect of the program is that the user can design the control structures based on both the state of the system and time parameters. Thus, the user can define the control flow of the process from a series of state variables, functions and processes. The scalability, portability, versatility and free distribution of the developed system make it applicable to analyses of different nature, in addition to the present application to the analysis of a number of wine components.

The open software developed for the design and automation of analytical processes is described as follows. First, a classification of processes as a function of the type of control (based on time or on the state of the system) is the subject of section 2. Section 3 describes the model proposed for representing time-governed processes, while section 4 describes the model and system for state-governed control. Finally, the part of the system that translates the information introduced by the user in the design of the analytical process into control instructions is described in section 5. A discussion of the results obtained and the applications developed with the proposed system constitutes the last part of the article.

2. System for automation of 

analytical processes by a time 

and state-based model 

A process can be defined as a sequence of related actions carried out over a time interval with the objective of producing a change in a given system. From the point of view of automation and control, processes can be divided into two types:

Type I: Processes dependent on the system state

The actions which constitute the overall process are carried out over time as a function of the intermediate state reached by the system as a result of each previous action. Automation of these processes involves continuous control of the system by inspection and analysis of the variables or parameters which establish its characteristics. Thus, the actions are taken after checking the value or state of the given variables.

Traditionally, process automation has been carried out by systems based on events (actions). A series of rules is defined in these systems taking into account the values of the state variables. With this aim, it is mandatory to construct a monitoring subsystem that checks whether the established rules are fulfilled; when a rule is fulfilled, the program executes the actions associated with it.

Type II: Processes dependent on the time within which the actions are developed

In this group, the actions are carried out independently of the present situation of the system. This kind of process is characterised by actions performed in a sequential or concurrent manner over time: they depend only on the time elapsed since the starting point of the process and are independent of the situation of the system. Thus, a module for control and measurement of the elapsed time is mandatory.

Evidently, a given process can belong to both types I and II; such processes are based on events determined by both the state of the system and the time elapsed from the start of the process. In this case, both state-based and time-based variables will be present in the rules.
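The classification above can be illustrated with a minimal Java sketch. It is not the authors' implementation: the class ControlRule, its factory methods and the use of a single numerical state value are assumptions made only to show how a rule may depend on the state (Type I), on the elapsed time (Type II) or on both.

```java
import java.util.function.BiPredicate;

/** Hypothetical sketch: a control rule that may depend on state, on time, or on both. */
final class ControlRule {
    private final String name;
    // Condition evaluated on (state value, elapsed seconds).
    private final BiPredicate<Double, Integer> condition;

    ControlRule(String name, BiPredicate<Double, Integer> condition) {
        this.name = name;
        this.condition = condition;
    }

    String name() {
        return name;
    }

    /** Returns true when the associated actions should be executed. */
    boolean fires(double stateValue, int elapsedSeconds) {
        return condition.test(stateValue, elapsedSeconds);
    }

    /** Type I: the rule depends only on the system state. */
    static ControlRule stateBased(String name, double threshold) {
        return new ControlRule(name, (state, t) -> state > threshold);
    }

    /** Type II: the rule depends only on the elapsed time. */
    static ControlRule timeBased(String name, int atSecond) {
        return new ControlRule(name, (state, t) -> t == atSecond);
    }

    /** Mixed: a state-based and a time-based variable appear in the same rule. */
    static ControlRule mixed(String name, double threshold, int notBeforeSecond) {
        return new ControlRule(name, (state, t) -> state > threshold && t >= notBeforeSecond);
    }
}
```

A Type I rule must be checked whenever the state may have changed, whereas a Type II rule only needs the elapsed time, which is why the latter is cheaper to monitor.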

Automation of Type I processes is more complex than that of Type II processes, and the control of the given process has a higher computational cost owing, among others, to the following aspects:

• The number of rules for 

action execution is, usually, 

higher. 

• Continuous monitoring of the 

variables or parameters which 

determine the situation of the 

process is mandatory in order 

to use the information thus 

obtained for ordering the 

corresponding action. 

• The module which analyses 

the state of the system is more 

complex. It requires more 

computer resources because a 

higher number of variables 

are involved, and a higher 

possibility of errors, 

inconsistencies and redundancies 

exists. In addition, it 

is mandatory to establish a 

relationship between the 

defined rules when more than 

one must be fulfilled in order 

to determine either the action 

to be executed first or the 

concurrent execution. 

Nevertheless, it is always possible to convert a Type I process into a Type II process. For this, it is only necessary to know the foreseeable states through which the system evolves with time, to delete this information from the rules so that only time-related information is considered, and to monitor the state of the process only at its end. Most processes belong to this group (time-based processes, TBP); that is, the user knows the initial state of the system and wants to know the final state through measurement of the signals provided by one or several detectors: the intermediate states are either of no interest or already known. If an intermediate state is of interest, it is only necessary to consider it as a final state of the process.

Based on the above premises, a general model for the design of analytical processes can be proposed with the structure shown in Fig. 1.

Fig. 1. General model to design analytical processes. [Diagram: between Start and End the system passes through the states S0, S1, S2, ..., SF at times t=t0, t1, t2, ..., tF; the initial action a0 (t=0), the time-based actions a* executed by the hardware at t=0, t1, t2 and tF, and the final action aF (t=tF) are indicated.]

The initial state of the system, S0, is known. Then, the system is subjected to a series of actions executed in either a sequential or concurrent manner over an interval in order to reach a final state, SF, in which the characteristics of the system are monitored again. Automation of this process involves carrying out the necessary actions without human intervention. For this, specific equipment or hardware is in charge of carrying out these time-based actions, a*. The initial action, a0, can be executed by the user, be time-based, or be determined by either the characteristics of the process or its state, S0. When the process reaches its final state (t=tF), a new series of actions is executed by the hardware, and a special action (aF) determines the end of the process.

Under this model, time is the parameter that determines the actions to be carried out by the hardware. Although these actions are executed at a given time, their effect can extend over the duration of the process, and several actions can be executed simultaneously, thus allowing concurrent hardware control.
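As an illustration of this time-indexed model, the following Java sketch stores the actions a* under the time node at which they must start and launches all the actions of a node in parallel. The class TimeSchedule and its methods are hypothetical names introduced here; the sketch only assumes that each hardware action can be expressed as a Runnable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a time-indexed schedule of hardware actions. */
final class TimeSchedule {
    // Time node (seconds from the start of the process) -> actions to launch at that node.
    private final Map<Integer, List<Runnable>> nodes = new TreeMap<>();

    /** Registers an action to be executed at the given second. */
    void at(int second, Runnable action) {
        nodes.computeIfAbsent(second, k -> new ArrayList<>()).add(action);
    }

    /** Number of actions registered at the given second. */
    int actionCountAt(int second) {
        return nodes.getOrDefault(second, Collections.emptyList()).size();
    }

    /** Launches every action of the node in its own thread and waits for all of them. */
    void runNode(int second) {
        List<Thread> threads = new ArrayList<>();
        for (Runnable action : nodes.getOrDefault(second, Collections.emptyList())) {
            Thread thread = new Thread(action);
            threads.add(thread);
            thread.start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}
```

Running each action of a node in a separate thread is one possible way of obtaining the concurrent hardware control mentioned in the text; a thread pool would serve equally well.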

In addition, action aF can determine the start of the process (see Fig. 1) by considering a new initial state of the process. In this way, it is possible to represent, under a time-based model, processes which require monitoring of their evolution. In this case, it is necessary to take as final state the state corresponding to a given point of the evolution, apply the final action (aF) and then restart the process control. From either an intermediate state SX or a final state SF, total automation of the process requires knowing the values of the process variables in order to use this information to take decisions affecting the process. The decisions will be executed if the analysis of the variables dictates the need for control actions in order to act on the process environment (user alarm), halt the process (by stopping the main execution process) or condition the system for a more accurate and error-free result (by eliminating potential interferences).

3. Process design system 

The design and construction of a software system for automation and control of chemical analytical processes has been carried out under the process model proposed above. The main aim has been to supply analytical chemists who have no experience in computing with a tool to design the actions to be carried out by both the hardware and the analytical equipment during a given process.

The system thus developed is based on a layer architecture proposed by the authors [18], which enables control of any hardware and analytical equipment. Figure 2 shows the architecture used by means of a component diagram [19]. The hardware participating in a process is physically controlled by drivers supplied by the manufacturers. Usually, these drivers are functions or compiled libraries developed in a procedural language (e.g. C, Basic, etc.). From these libraries, functions in C language in charge of managing the functionality provided by the hardware are constructed. These functions, which constitute the Communication level, allow the hardware functionality to be modularised, thus making each of the physical actions performed by the hardware independent. A library with the group of defined functions is constructed for each specific hardware, thus allowing each type of hardware to be considered as an object to be managed in each process where the given hardware is used.

Fig. 2. Layer architecture for the design and control of analytical experiments. [Diagram: Hardware; Drivers level (driver libraries, a layer often developed by manufacturers); Communication level (libraries built in C language, with functions f1, f2); Logical level (classes and packages built in Java, each class with properties and methods m1(), m2(), ..., mk()).]

Each of these static libraries is represented by an object class, defined at the Logical level (see Fig. 2). In these object classes:

• The attributes represent each


parameter which takes part in 

functions defined at the 

Communication level (so they 

are defined at the interfaces of 

the drivers provided by the 

hardware manufacturers). 

• The methods correspond to 

each of the functions defined 

at the Communication level 

and they are present in the 

static libraries developed for 

each type of hardware. 

The classes constructed at this 

Logical level can be compiled as 

packages for a better portability of the 

proposed system. 
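The correspondence between the Communication and Logical levels can be sketched as follows. This is only an illustration under stated assumptions: the interface PumpCommunication stands in for the C functions that the real system reaches through native methods, the stop function is hypothetical, and RecordingPumpCommunication is a stand-in that records commands instead of driving hardware.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for the Communication-level functions (native C functions in the real system). */
interface PumpCommunication {
    void setSpeed(String idPort, String speed);

    void stop(String idPort); // hypothetical function, used only for illustration
}

/** Logical-level class: attributes mirror the function parameters, methods mirror the functions. */
final class PumpDevice {
    private final PumpCommunication communication;
    private final String idPort;   // parameter shared by all Communication-level functions
    private String speed = "0";    // last speed sent to the hardware

    PumpDevice(PumpCommunication communication, String idPort) {
        this.communication = communication;
        this.idPort = idPort;
    }

    void setSpeed(String newSpeed) {
        speed = newSpeed;
        communication.setSpeed(idPort, newSpeed);
    }

    void stop() {
        speed = "0";
        communication.stop(idPort);
    }

    String getSpeed() {
        return speed;
    }
}

/** Records the commands instead of talking to real hardware. */
final class RecordingPumpCommunication implements PumpCommunication {
    final List<String> log = new ArrayList<>();

    @Override
    public void setSpeed(String idPort, String speed) {
        log.add("setSpeed " + idPort + " " + speed);
    }

    @Override
    public void stop(String idPort) {
        log.add("stop " + idPort);
    }
}
```

Keeping the Communication level behind an interface also lets a process design be checked without the instrument connected, by swapping in a recording implementation such as the one above.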

The construction of these classes requires specific knowledge of the C and Java languages [20-22]. The latter has been used owing to the availability of Java-based software for both object orientation and portability. The authors have developed a number of open-code packages for managing different types of hardware (namely, peristaltic pumps, injection and selection valves, photometric detectors, etc.) [22-24], which can be used by other researchers for the control of either other hardware or analytical equipment.

The analysis and development of the proposed system for process design have been carried out under the object-orientation paradigm, which allows any 'logical' or 'physical' element forming part of the problem domain to be considered as an object class. From this point of view, the developed system considers a series of objects (object classes) as actors in the development of an automated process.

3.1. The analytical equipment and 

hardware 

The analytical equipment also includes the hardware in charge of preparing the analytical system for monitoring and performing. Although IUPAC nomenclature differentiates an instrument (a device that provides chemical information, such as a detector) from an apparatus (a device used for performing steps which do not provide chemical information), most analytical equipment integrates both types of devices.

As commented before, the Driver, Communication and Logical levels enable control at a low, medium and high abstraction level, respectively. Control of a given piece of hardware or equipment can range from a mechanical operation (e.g. impelling a liquid to a given module for the development of a given step) to setting the values of instrumental parameters (e.g. working potential or wavelength).

3.2. Constants and variables 

Constants in the proposed system are parameters defined by the designer which represent properties of both the control system and the process that remain unchanged throughout the process. They can be numerical or logical, and are used for taking decisions during the development of the process.

Unlike constants, variables are parameters that change along the process. They can also be logical or numerical.

Any number of constants and variables necessary for control can be defined during the process design. The developed system defines by default a number of constants and variables in order to facilitate the process design. Examples of these are as follows:

• Logical constants False and True, which represent these logical values.

• Numerical constants c_ProcessTime and c_ProcessNumber, which represent the process time and the number of times the process is repeated, respectively.

• The variables v_ProcessTime and v_ProcessNumber, which have a zero value at the start of the process and are modified along the process; they represent the elapsed time of the process and the number of processes performed, respectively.

Both constants and variables 

are used in structures to control the 

process, as described below. 
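A minimal Java sketch of such a set of parameters is given below. The class ProcessParameters and its methods are hypothetical; only the names of the default constants and variables are taken from the text. The sketch enforces the distinction made above: constants can be read but never modified, whereas variables can be updated along the process.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical registry of the constants and variables of a process design. */
final class ProcessParameters {
    private final Map<String, Object> constants = new HashMap<>();
    private final Map<String, Object> variables = new HashMap<>();

    /** Creates the default constants and variables described in the text. */
    ProcessParameters(int processTimeSeconds, int processNumber) {
        constants.put("True", Boolean.TRUE);
        constants.put("False", Boolean.FALSE);
        constants.put("c_ProcessTime", processTimeSeconds);
        constants.put("c_ProcessNumber", processNumber);
        variables.put("v_ProcessTime", 0);
        variables.put("v_ProcessNumber", 0);
    }

    /** Adds a user-defined constant (numerical or logical). */
    void defineConstant(String name, Object value) {
        requireUnused(name);
        constants.put(name, value);
    }

    /** Adds a user-defined variable with its initial value. */
    void defineVariable(String name, Object initialValue) {
        requireUnused(name);
        variables.put(name, initialValue);
    }

    /** Reads a constant or a variable. */
    Object get(String name) {
        if (constants.containsKey(name)) {
            return constants.get(name);
        }
        if (variables.containsKey(name)) {
            return variables.get(name);
        }
        throw new IllegalArgumentException("Unknown parameter: " + name);
    }

    /** Updates a variable; constants cannot be modified. */
    void set(String name, Object value) {
        if (!variables.containsKey(name)) {
            throw new IllegalArgumentException("Not a variable: " + name);
        }
        variables.put(name, value);
    }

    private void requireUnused(String name) {
        if (constants.containsKey(name) || variables.containsKey(name)) {
            throw new IllegalArgumentException("Already defined: " + name);
        }
    }
}
```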

3.3. Functions 

Action sequences are repeated throughout the development of a given process. These sequences can be encapsulated as a function, thus simplifying the overall control.

Chart 1

/* Function to adjust the spectral parameters in a spectrofluorimeter */
void setSpectralParameters(String id_port,
                           String wavelength_exc,
                           String wavelength_em)
{
    /* Adjust the excitation wavelength */
    setExcitationWavelength(id_port, wavelength_exc);
    /* Wait a given time to allow the monochromator to reach the position */
    try{
        Thread.sleep(5000);
    }catch(InterruptedException e){}
    /* Adjust the emission wavelength */
    setEmisionWavelength(id_port, wavelength_em);
}

/* Function for enabling the probe sampler to go to the vial which the
   system under study is in */
void goVial(String id_port,
            String position)
{
    /* Raise the probe sampler */
    upVial(id_port);
    /* Wait a given time to allow the probe sampler to go up
       (delay value assumed, as in setSpectralParameters) */
    try{
        Thread.sleep(5000);
    }catch(InterruptedException e){}
    /* The probe sampler goes to the specified vial */
    vial(id_port, position);
    /* Wait a given time to allow the probe sampler to reach the vial
       (delay value assumed) */
    try{
        Thread.sleep(5000);
    }catch(InterruptedException e){}
    /* Lower the probe sampler */
    downVial(id_port);
    /* Wait a given time to allow the probe sampler to go down
       (delay value assumed) */
    try{
        Thread.sleep(5000);
    }catch(InterruptedException e){}
}

These functions or action sequences can be of two types, namely:

a) Those formed by a single action or a sequence of actions performed by one or several hardware elements involved in the process; thus, a control structure similar to that of the analytical process is obtained. Examples are the functions for establishing a preset speed of the peristaltic pump, fixing the temperature in a thermostat, sampling data from a detector, etc. Chart 1 shows examples of this type of function.

b) Those formed by logical or mathematical operations on the variables and/or constants (parameters involved in the process). Examples are starting a repetitive sequence, determining whether the sequence has been carried out the preset number of times, etc. Chart 2 shows examples of this type of function.

3.4. The structure of the process 

control 

The flow or sequence of actions to be carried out along a process controlled only by time (Type II) can be described by the structure represented in Chart 3.

Here, do and while are reserved words that mark the start and end of the process: the actions defined in Process-body are executed until the function isEndProcess() returns true. This happens when the elapsed time is longer than the duration of the process. This function is defined in Chart 4. As can be seen, some of the functions defined in Chart 2 are used in Chart 4.
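The do/while structure and its end condition can be sketched in Java as follows. This is a simplified, hypothetical rendering rather than the authors' code: the chronometer thread is replaced by a simulated clock that advances one second per execution of the process body, and the comparison used in isEndProcess() follows the textual description (elapsed time longer than the duration of the process).

```java
/** Hypothetical sketch of a process governed only by time. */
final class TimedProcess {
    private final int c_ProcessTime;   // duration of the process in seconds (constant)
    private int v_ProcessTime = 0;     // elapsed time in seconds (variable)
    private int simulatedClock = 0;    // stands in for the chronometer thread
    private int bodyExecutions = 0;

    TimedProcess(int processTimeSeconds) {
        this.c_ProcessTime = processTimeSeconds;
    }

    /** Returns the elapsed time in seconds (simulated here). */
    int getElapsedTime() {
        return simulatedClock;
    }

    /** True once the elapsed time is longer than the duration of the process. */
    boolean isEndProcess() {
        v_ProcessTime = getElapsedTime();
        return v_ProcessTime > c_ProcessTime;
    }

    /** Placeholder for the process body; each execution advances the simulated clock by 1 s. */
    void processBody() {
        bodyExecutions++;
        simulatedClock++;
    }

    /** Runs the process and returns how many times the body was executed. */
    int run() {
        do {
            processBody();
        } while (!isEndProcess());
        return bodyExecutions;
    }
}
```

With a process time of 40 s and one body execution per simulated second, the body runs 41 times before the end condition holds, because the condition requires the elapsed time to exceed, not merely reach, the process time.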

The structure of the process body is represented in Chart 5. Under a control model governed by time, once the process has started, control of the analytical equipment is performed by indicating the times, counted from the start of the process, at which the given actions must be executed by each hardware element involved.

For each time value the following can be specified:

1. The action to be performed by the hardware. The methods defined by the classes (Logical level) corresponding to the analytical equipment used will be used. For example, the class representing the Minipuls3 model of peristaltic pumps [23] is endowed with a method

Chart 2

/* Function to reset the process time to zero. It is called if the
   process has to be repeated when the final action is executed */
void resetProcessTime()
{
    v_ProcessTime = 0;
}

/* Function to know whether the system is in a time node whose actions
   (present or absent) have already been analysed. This avoids calling
   the same function twice within the same second */
boolean isAnalysedNode()
{
    return analysed_node;
}

/* Function to acquire the present time from the object which manages
   the time (this object is a Java thread) */
int getElapsedTime()
{
    long time_elapsed;
    time_elapsed = chrono.getTime();     /* presumably in milliseconds */
    time_elapsed = time_elapsed/1000;    /* converted to seconds */
    return (int)time_elapsed;
}

Chart 3

/* Structure corresponding to a single process governed by the time */
do{
    Process-body
}while ( isEndProcess() != True)


Chart 4 

/* Function to know if the end of the process has been 

reached */ 

boolean isEndProcess() 

{ 

v_ProcessTime = getElapsedTime(); 

if(v_ProcessTime


Chart 6 

/* This class represents the peristaltic pump Minipuls3*/ 

class Minipuls3Pump { 

/* Classes code body */ 

native void setSpeed(String id_port, String speed); 

/* Classes code body */ 

} 

Chart 7

/* Function in charge of setting to true the variable is_monitoring,
   which indicates whether the detector is monitoring */
void setMonitoring()
{
    is_monitoring = true;
}

execute an action that communicates the appropriate command to the hardware for this purpose. This action is composed of two functions: the type (a) function that sets the detector monitoring and the type (b) function shown in Chart 7. As can be seen, this function updates the value of the variable that represents the detector state (on/off), which is useful when the process ends or is interrupted: if the detector is monitoring, the program will stop monitoring before turning off the hardware remote control.

As an example, the use of the constants, variables, functions and control structures is illustrated by an experiment in which a peristaltic pump propels the chemical system under study to the detector. A valve inserts a reagent that endows the chemical system with the features to be monitored by the detector. The signal intensity is proportional to the concentration of the target component. Time data and hardware activities are shown in Table I.

Table I. Activities to be carried out by the hardware and their execution times.

TIME (s)   HARDWARE                     ACTIVITY
0          Peristaltic pump             Turn at preset speed
           Injection valve              No injection
           Electroanalytical detector   Set potential
20         Injection valve              Injection
           Electrochemical detector     Start monitoring
40         Peristaltic pump             Stop
           Injection valve              No injection
           Peristaltic pump             Stop monitoring

The control flow for this analysis begins when the setInitialActions() function is invoked. This function orders each hardware element involved in the process to execute its initial action and also calls startChrono(), which initiates the part of the program in charge of time monitoring. Each second, v_ProcessTime is updated and the instrumental list is read in order to verify whether there are actions to be executed. As can be seen in Table I, there are actions to execute at seconds 20 and 40. After the last actions, and as the function isEndProcess() returns true because the value of the variable v_ProcessTime is higher than the value of the constant c_ProcessTime, the endProcess() function is invoked. This function calls two other functions, namely startAgain() and exitRemoteControl(), in charge of verifying whether the process is to be executed again and, if not, of turning off the hardware remote control.
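The control flow just described can be simulated with the short Java sketch below. It is a hypothetical illustration: the class TableOneRun is not part of the described system, the hardware actions are replaced by log entries whose texts are those listed in Table I, and the clock is advanced by the loop itself instead of by a chronometer thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical simulation of the time-governed process of Table I. */
final class TableOneRun {

    /** Time nodes and activities as listed in Table I. */
    static Map<Integer, List<String>> tableOne() {
        Map<Integer, List<String>> nodes = new TreeMap<>();
        nodes.put(0, Arrays.asList(
                "Peristaltic pump: Turn at preset speed",
                "Injection valve: No injection",
                "Electroanalytical detector: Set potential"));
        nodes.put(20, Arrays.asList(
                "Injection valve: Injection",
                "Electrochemical detector: Start monitoring"));
        nodes.put(40, Arrays.asList(
                "Peristaltic pump: Stop",
                "Injection valve: No injection",
                "Peristaltic pump: Stop monitoring"));
        return nodes;
    }

    /** Checks the list once per simulated second until the elapsed time exceeds processTime. */
    static List<String> run(Map<Integer, List<String>> nodes, int processTime) {
        List<String> log = new ArrayList<>();
        int elapsed = 0;
        do {
            for (String action : nodes.getOrDefault(elapsed, Collections.emptyList())) {
                log.add(elapsed + " s: " + action);
            }
            elapsed++;                      // one tick of the simulated chronometer
        } while (elapsed <= processTime);   // ends once elapsed > processTime
        return log;
    }
}
```

Calling TableOneRun.run(TableOneRun.tableOne(), 40) visits the seconds 0 to 40 and records the eight activities of Table I in order of execution.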

3.5. Designing analytical processes 

In this section, time-based control of the analytical process is described; the design of analytical processes based on system-state control structures is described in the next section. In all instances, the configuration of the process is divided into two steps: definition of the hardware and specification of the hardware actions. Because these actions can use constants, variables and functions to control the process, these control elements must also be defined.

Fig. 3. Interface to design time-based processes. Configuration of the hardware involved in the process.

Figure 3 shows a program 

interface that belongs to the subsystem 

in charge of defining the 

analytical process. As can be seen, the 

interface is divided into three areas: 

(a) an upper area where the menu bar 

is placed and which has all the 

functionality related to the process 

design, (b) a left-side area that shows 

all the components and elements 

involved in the process through a 

hierarchical menu, (c) a central 

framework area in charge of data 

input/output concerning the different 

elements and components. 

For example, if a new element (a detector) is added to the process definition, a window corresponding to this hardware component is placed in the central framework. As can be seen in Figure 3, the users can introduce the hardware data: identifier, brand and model, hardware type, communication port, and initial action. When the data have been introduced and validated by the users and the program, respectively, the node is loaded into the configuration tree (hierarchical menu).

Fig. 4. Interface to design time-based processes. Specification of the actions to be developed by the hardware.

After the hardware involved in the analytical process has been defined, the hardware actions can be specified. With this aim, the icon corresponding to the loading of a group of actions per time unit must be selected from the tool bar. A window through which the user introduces the hardware actions at a preset time is then loaded into the framework, as can be seen in Figure 4. The user introduces the time elapsed from the beginning of the process and the actions to be executed by the configured hardware after this interval. In the hierarchical menu that shows the process structure as a function of time, the node corresponding to this time is created or updated.


In time-based actions, the user can add other control elements such as functions, constants, variables and triggers, which are also defined by the user as shown later on. Both the hierarchical menu and a window in the central framework manage the information about the time-dependent actions of the hardware, as can be seen in Figure 4. Once the process is defined, its definition is stored in an .xml file, the structure of which is described in the next section.

4. Control based on the system 

state: triggers 

In automating analytical processes, a 

model based on the time during which 

the hardware executes its actions is 

sometimes not enough. The state of 

the system must also be known at 

each time in order to take decisions 

depending on the given state. These 

decisions both vary the control flow 

of the process and control the 

exceptions produced. 

For this reason, the developed 

software has a state-based control 

consisting of a triggers sub-system. A 

trigger is a logical rule that has the 

general structure shown in Chart 8. 

Depending on the moment at 

which the premises are evaluated, 

there are two types of triggers: 

1. Triggers evaluated at the end of 

the process: their premises are 

evaluated when the process 

governed by the time is executing 

its final action (aF). It is the 

simplest type because the trigger 

does not alter the control flow of 

the process. (see Chart 5). 

2. Triggers evaluated concurrently with the process: their premises are evaluated in parallel with the process and they have the capacity to stop or modify the control flow of the process. Their development and implementation are considerably more difficult because it is necessary both to synchronise access to, and modification of, the status variables and to create a parallel thread of execution. Problems related to concurrent access to a resource (either a variable or a method), possible redundancy and high computational requirements characterise these triggers.



Chart 8 

/* Trigger identifier or name */ 

NameTrigger 

/* General structure of the triggers */ 

IF 

/* Logical expression or premise that has the form: 

(variable | constant) operator (variable | constant) */ 

premise 

THEN 

/* Action executed by the program if the premise is true. A 

trigger action can be a function or a new process that 

has the structure shown in Chart 3 */ 

action; 

Figure 5 shows the class diagram [19] that represents the structure of the triggers sub-system. The TriggersManagement class manages all the triggers defined by the user to control the process through the manageConcurrentTriggers and manageNoConcurrentTriggers methods. The Trigger class has a method in charge of evaluating the defined premises, and only if all the premises are fulfilled (which is equivalent to carrying out the logical AND operation on the group of premises) does TriggersManagement launch the process corresponding to the target trigger.

The Premise class is evaluated from its structure (see Figure 5), which is formed by two measurable nodes that hold the value of a constant or state variable, and an operator node in charge of a logical operation on those nodes. This structure, also shown in Chart 8, represents the system properties through the constants and variables, and the process control through the actions corresponding to the trigger.
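A minimal Java sketch of this premise and trigger structure is shown below. It is not the class hierarchy of Figure 5: the names RelationalOperator, PremiseSketch and TriggerSketch are hypothetical, the nodes are restricted to numerical values supplied by a DoubleSupplier, and the trigger process is reduced to a Runnable. The trigger fires only when all its premises hold, that is, when their logical AND is true.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.DoubleSupplier;

/** Hypothetical operator node: a relational operation on two numerical nodes. */
enum RelationalOperator {
    GREATER {
        @Override
        boolean apply(double left, double right) { return left > right; }
    },
    LESS {
        @Override
        boolean apply(double left, double right) { return left < right; }
    },
    EQUAL {
        @Override
        boolean apply(double left, double right) { return left == right; }
    };

    abstract boolean apply(double left, double right);
}

/** Premise: (variable | constant) operator (variable | constant). */
final class PremiseSketch {
    private final DoubleSupplier left;
    private final RelationalOperator operator;
    private final DoubleSupplier right;

    PremiseSketch(DoubleSupplier left, RelationalOperator operator, DoubleSupplier right) {
        this.left = left;
        this.operator = operator;
        this.right = right;
    }

    boolean evaluate() {
        return operator.apply(left.getAsDouble(), right.getAsDouble());
    }
}

/** Trigger: IF all premises are fulfilled THEN execute the action. */
final class TriggerSketch {
    private final String id;
    private final List<PremiseSketch> premises;
    private final Runnable action;

    TriggerSketch(String id, List<PremiseSketch> premises, Runnable action) {
        if (premises.isEmpty()) {
            throw new IllegalArgumentException("A trigger needs at least one premise");
        }
        this.id = id;
        this.premises = new ArrayList<>(premises);
        this.action = action;
    }

    String id() {
        return id;
    }

    /** Logical AND over the group of premises. */
    boolean evaluate() {
        for (PremiseSketch premise : premises) {
            if (!premise.evaluate()) {
                return false;
            }
        }
        return true;
    }

    /** Executes the action if the premises hold; returns whether it was executed. */
    boolean fireIfNeeded() {
        if (evaluate()) {
            action.run();
            return true;
        }
        return false;
    }
}
```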

Variables involved in the premises are read during the analysis of the triggers and modified by the actions of the main process (governed by time). For this reason, reading and updating must be subject to concurrency control.
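One simple way of obtaining such concurrency control in Java is to guard every access to a shared variable with the same lock, as in the hypothetical sketch below. The class SharedStateVariable and its demo method are assumptions made for illustration; the source does not state which synchronisation mechanism the developed system uses.

```java
/** Hypothetical state variable shared by the main process and the trigger thread. */
final class SharedStateVariable {
    private double value;

    SharedStateVariable(double initialValue) {
        this.value = initialValue;
    }

    /** Read access, e.g. from the thread that analyses the triggers. */
    synchronized double get() {
        return value;
    }

    /** Write access, e.g. from the actions of the main process. */
    synchronized void set(double newValue) {
        value = newValue;
    }

    /** Atomic read-modify-write; without the lock, concurrent updates could be lost. */
    synchronized double add(double delta) {
        value += delta;
        return value;
    }

    /** Two threads update the same variable; returns the final value. */
    static double demo(int updatesPerThread) {
        SharedStateVariable variable = new SharedStateVariable(0.0);
        Runnable updater = () -> {
            for (int i = 0; i < updatesPerThread; i++) {
                variable.add(1.0);
            }
        };
        Thread first = new Thread(updater);
        Thread second = new Thread(updater);
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return variable.get();
    }
}
```

Because all three methods synchronise on the same object, a trigger never observes a half-finished update and no increment is lost when both threads write.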

Constants such as c_Upper_Value and c_Rinsing are defined by the user who designs the analytical process. They represent the upper limit of the measured property and the logical value for the necessity of a rinsing process, respectively. The variables v_Upper_Value and v_Rinsing, updated during the process, represent the value, at time t, of the measured parameter and whether or not a rinsing process is necessary after the process governed by time, respectively.

Fig. 5. Class diagram with the structure of the trigger subsystem. [Diagram: TriggerManagement (Trigger[] triggers; manageConcurrentTriggers(), manageNoConcurrentTriggers()) is associated with 1..n Trigger objects (String id_trigger, int trigger_type, Premise[] premise, TriggerProcess process; boolean evaluate()). Each Trigger has one TriggerProcess (Action[] actions, Function[] functions; void executeProcess()) and 1..n Premise objects (int premise_type, LeftNode left_side, RigthNode rigth_node, OperatorNode operator; boolean evaluate()). Each Premise has one LeftNode, one OperatorNode (int operator; int getOperator()) and one RigthNode. LeftNode and RigthNode declare abstract Object evaluate() and void resetValue(), and are specialised as BooleanLeftNode, DoubleLeftNode, BooleanRightNode and DoubleRightNode.]

Actions associated with a trigger can be of two types:

• A process governed by time such as that described in section 3 (see Chart 3), so that an analytical process is designed as a group of steps that execute a list of actions either in series or in parallel.

• Functions subject to the same considerations as those described in section 3. These functions must be previously defined in order to encapsulate the hardware actions and/or update or calculation operations. Two groups of functions can be emphasized:

Functions that encapsulate a 

sequence of actions to be 



Chart 9 

/* Function that stops the main process */ 

void stopProcess() 

{ 

for (h=0; h


Fig. 6. Interface to design triggers. Premises and the trigger process.

the logical sequence in the design of an analytical process is that the user first defines the constants, variables, functions and control flow of the process from a time-governed point of view (as described in section 3). Then, the user defines the trigger subsystem, which uses the elements defined previously. The general sequence of steps to define a trigger is as follows:

• The first step consists of 

identifying the trigger with a 

name and specifying if the trigger 

is concurrent or not. Figure 6 

shows the definition of a trigger 

named ValueControlTrigger characterised 

as a concurrent trigger. 

• In the following step, the user defines one by one all the premises that constitute the trigger. For that, state constants or variables are chosen from the group of constants and variables defined in a previous step. Users can select either a constant and a variable, or two variables; then, both nodes are related by a relational operator. Figure 6 shows the choice of the v_Upper_Value variable and the c_Upper_Value constant from the interface element Parameters. The relational operator “>” is selected from the interface element Operator.

• The last step is the definition of 

the process associated to the 

trigger. If the trigger process is a 

new group of hardware actions, 

this process is defined in a similar 

way as the one described for the 

main process (see Chart 3). The 

functions are selected from the 

group of functions defined previously 

(see Figure 6). 
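The three definition steps can be pictured as a small Java data model. What follows is only a minimal sketch, not the code of the program described here: the class names TriggerSketch, Premise and Trigger, the use of numeric state values, and the rule that a trigger fires only when all of its premises hold are assumptions made for illustration. Only the names ValueControlTrigger, v_Upper_Value and c_Upper_Value are taken from the text.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class TriggerSketch {

    /** One premise: two state parameters related by a relational operator. */
    static final class Premise {
        final String leftNode;   // name of a variable or constant
        final String operator;   // ">", "<" or "=="
        final String rightNode;  // name of a variable or constant

        Premise(String leftNode, String operator, String rightNode) {
            this.leftNode = leftNode;
            this.operator = operator;
            this.rightNode = rightNode;
        }

        /** Evaluates the premise against the current state (name -> value). */
        boolean holds(Map<String, Double> state) {
            double left = state.get(leftNode);
            double right = state.get(rightNode);
            switch (operator) {
                case ">":  return left > right;
                case "<":  return left < right;
                case "==": return left == right;
                default:   throw new IllegalArgumentException("Unknown operator: " + operator);
            }
        }
    }

    /** A named trigger, flagged as concurrent or not, with its premises. */
    static final class Trigger {
        final String name;
        final boolean concurrent;
        final List<Premise> premises;

        Trigger(String name, boolean concurrent, List<Premise> premises) {
            this.name = name;
            this.concurrent = concurrent;
            this.premises = premises;
        }

        /** Assumption: the trigger fires only when every premise holds. */
        boolean fires(Map<String, Double> state) {
            for (Premise p : premises) {
                if (!p.holds(state)) {
                    return false;
                }
            }
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, Double> state = new HashMap<>();
        state.put("c_Upper_Value", 1.0);   // hypothetical preset limit
        state.put("v_Upper_Value", 1.2);   // hypothetical current reading

        List<Premise> premises = new ArrayList<>();
        premises.add(new Premise("v_Upper_Value", ">", "c_Upper_Value"));
        Trigger trigger = new Trigger("ValueControlTrigger", true, premises);

        System.out.println(trigger.name + " fires: " + trigger.fires(state));
    }
}
```

Keeping the premise as plain data (two node names and an operator) mirrors the way the interface lets the user pick parameters and an operator rather than write code.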

4.2. An example of state-based 

control 

The industrial monitoring of free and total sulphur dioxide in wines from the Montilla-Moriles appellation d’origine is described as an example of state-based control.

The analyser for this parameter consists of the following hardware elements, which are amenable to physical and logical control: an automated sampler, a peristaltic pump, two injection valves and a UV-VIS detector. These elements are configured through the design interface (see Figures 3, 4 and 6), which allows identifiers, characteristics and serial ports to be entered. It is also necessary to specify the number of steps to be carried out in the analysis. In this way, n evaluations can be carried out, which provides continuous control of this parameter. Then, the parameters, actions and functions necessary to build the time-governed process are defined. The user (the analyst) selects the actions and functions defined previously and assigns a time value to each.
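The idea of assigning a time value to each action can be illustrated with a short Java sketch. It is a hedged illustration rather than the actual implementation: the class TimedProcessSketch, the hardware identifiers, the time values and the choice of seconds as the time unit are all hypothetical. Actions that share the same time value end up in the same step, which corresponds to the parallel execution mentioned in the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public final class TimedProcessSketch {

    /** A hardware action paired with the time (assumed to be in seconds) at which it starts. */
    static final class TimedAction {
        final int time;
        final String hardwareId;
        final String activity;

        TimedAction(int time, String hardwareId, String activity) {
            this.time = time;
            this.hardwareId = hardwareId;
            this.activity = activity;
        }
    }

    /** Groups actions by start time; the TreeMap keeps the times in ascending order. */
    static TreeMap<Integer, List<TimedAction>> schedule(List<TimedAction> actions) {
        TreeMap<Integer, List<TimedAction>> byTime = new TreeMap<>();
        for (TimedAction action : actions) {
            byTime.computeIfAbsent(action.time, t -> new ArrayList<>()).add(action);
        }
        return byTime;
    }

    public static void main(String[] args) {
        // Hypothetical actions, entered in arbitrary order.
        List<TimedAction> actions = new ArrayList<>();
        actions.add(new TimedAction(80, "detector", "setZero"));
        actions.add(new TimedAction(60, "valve1", "setFilling"));
        actions.add(new TimedAction(60, "valve2", "setInjection"));

        for (Map.Entry<Integer, List<TimedAction>> step : schedule(actions).entrySet()) {
            StringBuilder line = new StringBuilder("t=" + step.getKey() + " s:");
            for (TimedAction action : step.getValue()) {
                line.append(' ').append(action.hardwareId).append('.').append(action.activity);
            }
            System.out.println(line);
        }
    }
}
```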

Cleaning actions are also necessary throughout the analytical process in order to obtain accurate, reproducible and error-free results. These cleaning actions are more important when the process is executed n consecutive times because of the risk of cross-contamination.

In order to control cross-contamination, a concurrent trigger is defined. This trigger involves a rinsing process that is set off when a signal higher than a preset constant value is detected (possibly as a result of contamination). To this end, two premises, a process and an updating function are used.

The first premise is formed by the v_Upper_Value variable and the c_Upper_Value constant, which have been described previously. These state parameters ensure that the rinsing step is not triggered until the limit value is exceeded. The second premise is formed by the v_PreviousRinsing variable and the c_PreviousRinsing constant. Both parameters are defined in a logical domain and make it possible to check whether a rinsing process has already been executed as a previous process; in that case, the rinsing process is not repeated, because the problem is not related to cross-contamination.

The rinsing process is composed of a set of actions to be executed by the hardware elements. The purpose of these actions is to propel a rinsing solution through the analyser in order to clean the problematic points (mainly dead volumes in valves, tubes, etc.).

Finally, setPreviousRinsing( ) is the function in the trigger that assigns true to v_PreviousRinsing, thus making the control of the second premise possible. Moreover, this trigger has alert functions.
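The behaviour of this rinsing trigger can be sketched in a few lines of Java. This is an illustrative sketch under stated assumptions, not the program's code: the class RinsingTriggerSketch and its method names (other than setPreviousRinsing) are hypothetical, the rinsing process is reduced to a counter, and c_PreviousRinsing is assumed to be false with the second premise written as an equality test, since the text does not give its value or operator.

```java
public final class RinsingTriggerSketch {

    private final double cUpperValue;               // c_Upper_Value: preset upper limit
    private final boolean cPreviousRinsing = false; // c_PreviousRinsing: assumed to be false
    private double vUpperValue;                     // v_Upper_Value: last measured value
    private boolean vPreviousRinsing = false;       // v_PreviousRinsing: set once a rinse has run
    private int rinseCount = 0;                     // stands in for the hardware rinsing actions

    public RinsingTriggerSketch(double cUpperValue) {
        this.cUpperValue = cUpperValue;
    }

    /** Updates the state with a new reading and reports whether rinsing was triggered. */
    public boolean onSignal(double signal) {
        vUpperValue = signal;
        boolean premise1 = vUpperValue > cUpperValue;            // limit exceeded
        boolean premise2 = vPreviousRinsing == cPreviousRinsing; // no rinse executed yet
        if (premise1 && premise2) {
            rinse();
            setPreviousRinsing();
            return true;
        }
        return false;
    }

    /** Placeholder for the set of hardware actions that propel the rinsing solution. */
    private void rinse() {
        rinseCount++;
    }

    /** Assigns true to v_PreviousRinsing, as described in the text. */
    private void setPreviousRinsing() {
        vPreviousRinsing = true;
    }

    public int getRinseCount() {
        return rinseCount;
    }

    public static void main(String[] args) {
        RinsingTriggerSketch trigger = new RinsingTriggerSketch(1.0); // hypothetical limit
        System.out.println(trigger.onSignal(0.4)); // below the limit
        System.out.println(trigger.onSignal(1.3)); // above the limit, first rinse
        System.out.println(trigger.onSignal(1.4)); // still high, rinse not repeated
    }
}
```

The sketch does not cover when v_PreviousRinsing is reset or how alerts are raised, because the text does not specify either.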

5. Automation of analytical processes 

Automation of analytical processes involves transforming the information introduced by the user through the design sub-program into a procedure consisting of a group of actions carried out by the hardware at preset times. In addition to this automatic system for experiment control, this information must also cover the construction of the control structures based on triggers. With these state-based structures, the program can make decisions without human intervention according to the result of the analysis of these parameters. Thus, the program is an automated system for the control of analytical processes.

Experiment
    Hardware (1..n): id_hardware (1), model (1), id_port (1), type (1), other_data (0..n)
    Action (1..n): id_hardware (1), activity (1), value (0..1), time (1)
    Constant (0..n): id_constant (1), type (1), value (1)
    Variable (0..n): id_variable (1), type (1), initial_value (1)
    Function (0..n): id_function (1), parameter (0..n)
    Trigger (0..n): id_trigger (1), trigger_type (1), Premise (1..n), Trigger Process (1)
        Premise: left_node (1), operator_node (1), right_node (1)
            left_node: constant (0..1), variable (0..1)
            right_node: constant (0..1), variable (0..1)
        Trigger Process: Function (0..n), Action (0..n)
            Function: id_function (1), parameter_values (0..n)
            Action: id_hardware (1), activity (1), value (0..1), time (1)

Fig. 7. XML structure where the process configuration is stored.

Before considering how the information is transformed into the control process, the storage structure for the information is described. The developed program uses the markup language XML (eXtensible Markup Language) [26,27], whose tags convey the meaning of the elements. The information corresponding to the configuration of the analytical process is saved in an .xml file once the user has entered the necessary data. This information can subsequently be modified if necessary. The hierarchical structure of the definition of an experiment is shown in Figure 7. The top element is ConfigurationProcess. It is formed by the Hardware, Action, Variable, Constant, Function and Trigger elements (defined in previous sections), each with its characteristic properties.

The construction of the control process is carried out in two steps: a first step in which the .xml file is loaded, and a second step, derived from the first, in which the executable code interprets all the information obtained in the previous reading and executes the corresponding statements. These steps are considered in detail below.

Step 1: Reading of the file containing 

the definition of the analytical 

process 

The combination of Java and XML through technologies such as JAXB, SAX and JDOM [28] facilitates the construction and reading of the XML file from a Java structure. With JAXB, all the elements of the storage structure become Java classes.

A Java class was built during the development of the program to manage all the classes derived from the XML file and to connect with the subsequent step. This class, called ManageXml.java, loads the file through the method unmarshal(File) from ConfigurationProcess.java. The latter class comes from the main element of the XML structure and is referenced within ManageXml.java. Once the file has been loaded, the ManageXml methods return the objects necessary to build the control process, namely the lists of Hardware, Action, Trigger, etc. objects.
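The loading step can be illustrated with a short, self-contained Java sketch. Note the substitution: the program described here uses JAXB-generated classes, whereas this sketch uses the DOM parser bundled with the JDK (javax.xml.parsers) so that it runs without extra libraries. The class ConfigLoaderSketch, its Action holder and the sample identifiers pump1 and valve1 are hypothetical; the element names follow the structure of Figure 7.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public final class ConfigLoaderSketch {

    /** Plain holder for one Action element of the configuration file. */
    static final class Action {
        final String idHardware;
        final String activity;
        final int time;

        Action(String idHardware, String activity, int time) {
            this.idHardware = idHardware;
            this.activity = activity;
            this.time = time;
        }
    }

    /** Parses the XML text and returns every Action element found, in document order. */
    static List<Action> loadActions(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getDocumentElement().getElementsByTagName("Action");
            List<Action> actions = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element element = (Element) nodes.item(i);
                actions.add(new Action(
                        childText(element, "id_hardware"),
                        childText(element, "activity"),
                        Integer.parseInt(childText(element, "time"))));
            }
            return actions;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read the configuration", e);
        }
    }

    /** Returns the trimmed text of the first descendant element with the given tag name. */
    private static String childText(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        // Hypothetical configuration fragment.
        String xml = "<ConfigurationProcess>"
                + "<Action><id_hardware>pump1</id_hardware><activity>turnClockwise</activity>"
                + "<value>05.70</value><time>60</time></Action>"
                + "<Action><id_hardware>valve1</id_hardware><activity>setFilling</activity>"
                + "<time>60</time></Action>"
                + "</ConfigurationProcess>";
        for (Action action : loadActions(xml)) {
            System.out.println(action.time + " " + action.idHardware + " " + action.activity);
        }
    }
}
```

With JAXB the same result is obtained without hand-written traversal, because the generated classes already expose the lists of elements; the DOM version simply makes the traversal explicit.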

Step 2: The code executes the XML information

The objects containing the XML information, returned by the ManageXml methods, are used by MainInterface.java to build and execute the control process. This class has lists that are used by the different control structures, both time-based and state-based.

The loading of the trigger information into the code is similar to the loading of the instrument information. An empty list of the trigger class is filled with the information contained in the group of trigger elements in the .xml file. The trigger subsystem works with this list to carry out the control process based on the system state during the analytical process.

6. Discussion 

Depending on the complexity and characteristics of the analytical process carried out by the instrumentation, the design and automation of this process can be developed from a model that only takes into account a time-action domain. Under this model, the final state of the process can be used to make decisions about either new actions or a new execution of the process.

In this way, the automation of analytical processes is simplified from the point of view of the chemist user. This model is not sufficient in some experiments, where the system state must be known before and after each action in order to make decisions without human intervention depending on the value of the state parameters. Continuous monitoring of the system state increases the complexity of implementing the proposed model.

In this work, a software system for the design of analytical processes and their subsequent control based on both time and system state is presented. This system is an innovation in analytical control because the user defines the control structures for each analysis. Thus, the system overcomes the limitations of previous approaches, because the number of premises and their predicates are defined and modified by the user for each analytical process. In addition, the trigger process is also defined by the users, who specify both a series of actions carried out by the hardware (which can be executed concurrently) and functions in charge of calculations, variable updates and queries, etc.

Thanks to the characteristics discussed above, together with the layered architecture of the system, our contribution is able to design and control any analytical process independently of the hardware units to be controlled, the parameters analysed, and the control flow of hardware actions and alerts.

A process carried out in real time together with a trigger system that analyses state-based parameters in parallel is difficult to build. Moreover, a trigger process can stop the main process and execute other hardware actions in another thread. The high likelihood of redundancies and exceptions requires exhaustive control and development.

The latest technologies and standards (UML, Java, XML, etc.) have been used to develop the software. They make it possible to build an open system that is both applicable to different analytical processes and independent of the platform used. The configuration of the processes is stored in an XML structure, which is later interpreted by the control subsystem, built in Java, in order to generate an ad hoc application. This application is in charge of the full automation of the analysis, including decision-making.

The developed system constitutes a valuable tool for quality control in laboratories where a large number of analyses are carried out in order to control the production process. The manifold for each analysis can be designed. The program is being used successfully in a winery in the Montilla-Moriles appellation d’origine.

New research lines are open to couple the system to the web with the aim of developing a distributed system in charge of the overall monitoring of industrial processes. The authors make both the developed system and its code available to interested researchers.





AGL2000-0321-P4-03). 


[1] Kormos, F., Tarsiche, I., Reghini, D., Mihalcioiu, F. Lab. Robotics Automation. 11 (1999) 147.
[2] Hilliard, L.A., Lynch, T., Ligthtowlers, D., Greenwood, P. Lab. Robotics Automation. 34 (1999) 57.
[3] Moscosa-Santillan, M., Bondoux, C., Porte, C., Delacroix, A., Besselievre, R., Cortial, S. Lab. Robotics Automation. 11 (1999) 197.
[4] Valcárcel, M., Luque de Castro, M.D. Automatic Methods of Analysis. Elsevier, Amsterdam, 1988.
[5] McJunkin, T.R., Tremblay, P.L., Scott, J.R. J. Assoc. Lab. Automation. 7 (2002) 76.
[6] Pfeil, D.L., Reed, A. Int. Lab. 32 (2002) 23.
[7] Roussis, S.G. Anal. Chem. 73 (2001) 3611.
[8] Haber, E., Muñoz-Guerra, J.A., Soriano, C., Carreras, D. J. Chromatogr., B 755 (2001) 17.
[9] Rodrigues, P.G., Rodrigues, J.A., Barros, A.A., Lapa, R.A.S. J. Agric. Food Chem. 50 (2002) 3647.
[10] Bonastre, A., Ors, R., Peris, M. Chemometr. Intell. Lab. Syst. 50 (2000) 235.
[11] Du, H., Stillman, M.J. Anal. Chim. Acta. 354 (1997) 77.
[12] Chow, C.W.K., Davey, D.E., Mulcachy, D.E. Lab. Automat. Inform. Manage. 33 (1997) 17.
[13] Peris, M., Ors, R., Bonastre, A., Gil, P. Lab. Automat. Inform. Manage. 33 (1997) 49.
[14] Ruisanchez, I., Lozano, J., Larrenchi, M.S., Rius, F.X. Anal. Chim. Acta 348 (1997) 113.
[15] Bhatikar, S.R., Mahajan, R.L. IEEE Trans. Semicond. Manuf. 15, Issue 1 (2002) 71.
[16] Sabharwal, J., Jianhua, C. Proc. Twenty-Eighth Southeastern Symp. Syst. Theory (1996) 514-518.
[17] Kurtz, M.J., Henson, M.A. Proc. Am. Control Conf. 4 (1995) 2667-2671.
[18] Urbano, M., Luque de Castro, M.D., Gómez-Nieto, M.A. Automation of Flow Injection Methods in the Winery Industry through a Computer Program based on a Multilayer Model. Proceedings of 9th IEEE International Conference on Emerging Technologies and Factory Automation. Lisbon, Portugal, September, 2003.
[19] Rumbaugh, J., Jacobson, I., Booch, G. The Unified Modeling Language. Reference Manual. Addison-Wesley Longman Inc, USA, 1999.
[20] Herbert, S.C. C: The Complete Reference Fourth Edition. McGraw-Hill. 2000.
[21] Eckel, B. Thinking in Java Third Edition. Prentice Hall. 2003.
[22] Campione, M. The Java Tutorial Third Edition. McGraw-Hill. 2002 (http://java.sun.com).
[23] Gilson S.A. LT801121K. Minipuls 3 Peristaltic Pump, User’s Guide. 2001. (www.gilson.com).
[24] Gilson S.A. LT3331. Valvemate™ Valve Actuator. User’s Guide. 2001. (www.Gilson.com).
[25] Unicam Limited (Division of Analytical Technology Inc.). 9499 230 18011 910916. Unicam 8625 Series UV/Visible Spectrometer, UK. 1991.
[26] Goldfarb, C., Prescod, P. XML Handbook Fourth Edition. Prentice Hall. 2001.
[27] Morrison, M. et al. XML Unleashed. 2000.
[28] Jasnokwski, M. Java, XML, and Web Services Bible. Hungry Minds. 2002.

Chapter 4

FULLY AUTOMATED FLOW INJECTION ANALYSER FOR THE DETERMINATION OF VOLATILE ACIDITY IN WINES

Journal of Wine Research, submitted for publication. Part I, Chapter 4

M. Urbano Cuadrado, M.D. Luque de Castro, M.A. Gómez-Nieto

Abstract

A fully automated method based on both flow injection (FI) and analytical pervaporation for the determination of volatile acidity in wine is presented here. Both the hardware units that constitute the analyser and data collection are controlled by a computer. The method is based on pervaporation of the volatile compounds present in wine into an indicator solution and monitoring of the absorbance change, which is correlated with the volatile acidity content. The control software was developed with the Java and C programming languages and XML. The analysis frequency and the precision of the method were 10 samples per hour and 0.035 g l⁻¹, respectively. The method was validated against the Mathieu method, the standard method in wineries, with which it correlates (r = 0.95).

Keywords: Automation, Flow injection, Pervaporation, Volatile acidity.



Volatile acidity is constituted by the fatty acids present in wine that belong to the acetic series (Flanzy, 2000; Ribereau-Gayon et al., 2000). Its importance in enological chemistry is due to the information it provides about possible bacteriological contamination. Official and usual methods are based on distillation of the volatile fraction of wine and titration of the distillate (Berezin et al., 1995). These methods are limited by the time they require and by operational errors.

Approaches based on flow injection (FI) (Valcárcel et al., 1987) have been proposed in order to overcome these shortcomings. The method proposed by Tubino and Barros (Tubino et al., 1991) couples FI with gas diffusion and conductimetric detection. The main limitation of this approach is the non-linearity of the calibration curve (Barros et al., 1992). On the other hand, Su et al. (1998) used both a gas-diffusion channel and a gas-permeable membrane as separation module, and a bulk acoustic wave impedance sensor. The main drawback of this method is the time required for sample pretreatment (20 min boiling for CO2 removal, oxidation of SO2 and filtration).

Other papers dealing with volatile acidity focus on the joint use of FI and analytical pervaporation (Mataix et al., 1999; González-Rodríguez et al., 2001). The latter is a membrane-based technique for the removal of volatile analytes or their volatile derivatives from the sample matrix, and can be regarded as the integration of evaporation and gas diffusion in a single module. These approaches, which use a photometric detector, enable the determination of volatile acidity with the analytical features (namely precision, selectivity and sensitivity) required in wineries. In addition, these methods can be easily automated thanks to their continuous operation. Nevertheless, no ‘automated methods’ in the strict sense of the word ‘automation’ are being applied in wineries to determine the volatile acidity in wine.

This work is aimed both at going beyond the semi-automated character of the above approaches and at providing a tool to monitor the volatile acidity in wine in a fully automated way, thus making more frequent monitoring of this parameter easier. The method proposed by González-Rodríguez et al. (2001) and the software platform for automation proposed by the authors (Urbano et al., 2003) were used as starting points for the development of this work.

2. Experimental 

2.1 Reagents and solutions 

The reagents used for calibration were glacial acetic acid (Merck, Darmstadt, Germany) and absolute ethanol of analytical reagent grade (Panreac, Barcelona, Spain). Bromocresol purple (Sigma-Aldrich, Steinheim, Germany) (5 × 10⁻⁴ M) in a potassium dihydrogen phosphate (Panreac, Barcelona, Spain) (0.010 M) buffer of pH 6.3 constituted the acceptor solution in the pervaporation unit.

2.2 Samples 

Different wines, including young and aged, sweet and dry wines, from the Montilla-Moriles appellation d’origine were used for the automation study. Volatile acidity in these samples was determined by both the proposed method and the Mathieu method.

2.3 Apparatus 

The configuration of the analyser is shown in Fig. 1. It is composed of a 15-position Crison Sampler15A auto-sampler (Barcelona, Spain); a four-channel Gilson Minipuls3 peristaltic pump (Villiers le Bel, France); two Rheodyne 7010 automatic injection valves (Elkay, Galway, Ireland), one used to inject the sample into the donor stream and the other as a selection valve; a Unicam 8625 UV-visible spectrophotometer (Cambridge, England) equipped with a Helma 138-QS flow-cell (Jamaica, NY); and a Pentium II computer to control all the units described above.

A Selecta Tectrom polyethylene glycol bath (Barcelona, Spain) is used to keep the temperature constant in the pervaporation module, which has been described elsewhere (Mataix et al., 1999; González-Rodríguez et al., 2001). PTFE pervaporation membranes of 47 mm diameter and 1.5 mm thickness (Trace, Braunschweig, Germany) were used in this module.

Fig. 1. Configuration of the FI autoanalyser. Labelled elements: auto-sampler, peristaltic pump, donor stream, acceptor stream, injection valve 1, injection valve 2, pervaporation module (with membrane), thermostat, reaction coil, detector, waste.

2.4 Software used for the analyser 

control 

All the automated units that compose the analyser are controlled by the software developed by the authors. The software is an interactive interface, developed in Java and supported on a communication layer developed in C, that enables the input and output of the information concerning each analysis. Concerning detection, absorbance versus time is the conventional transient analytical signal in FI. Both the area and the height of the peak are provided by the algorithms of the program. These values are related to the volatile acidity.

The automated procedure described in section 2.5 is stored in a file in eXtensible Markup Language (XML) format. This file is composed of data on the different units and the actions to be carried out by these units. The file structure is shown in the following appendix:

Appendix

 

 

 

 

crisonSampler15A  com3  15  true
gilsonMinipuls3  34
gilsonValvemate111R7010  36
gilsonValvemate111R7010  35
unicam8625  com2

goVial
turnClockwise  35.00
setInjection
setFilling
setWavelength  0430
turnClockwise  05.70  60
setFilling  60
setInjection  60
setZero  80
monitoringAbsorbance  090
setInjection  260
turnClockwise  28.00  300
noMonitoring  360
stop  360

 

 

2.5 Procedures 

The results obtained by the automated FI method were compared with those provided by the Mathieu procedure, which is used in the winery that provided the samples.

Mathieu procedure. Ten ml of the target wine is distilled until 6 ml of distillate is collected; then, 6 ml of water is added to the distillation flask. The process is repeated three more times so that 24 ml of distillate is finally in the collection flask. Approximately 10/11 of the total acetic acid present in the wine is collected in this way. The solution is then titrated with 0.1 M sodium hydroxide after addition of 2 drops of an ethanol solution of phenolphthalein. After this, the solution is acidified with hydrochloric acid and titrated with 0.05 M iodine for sulphur dioxide correction in the distillate.

Proposed procedure. Two ml of sample from an autosampler vial is inserted via injection valve 1 into a water carrier stream (flow-rate 0.8 ml min⁻¹). This carrier leads the sample plug to the lower chamber of the pervaporation module and then to waste 1. Meanwhile, injection valve 2 is in the filling position, so the content of its loop is static for enrichment in the pervaporated species. During this time, the acceptor stream of indicator solution (flow-rate 0.8 ml min⁻¹) flows to the detector, thus establishing the baseline. Four minutes after injection, injection valve 2 is switched to the injection position and the acceptor stream drives the content of the loop to the detector for monitoring, at 590 nm, the change of the indicator caused by the analyte. A calibration graph was run for interpolation of the data from the samples.


3.1 Optimisation 

The optimisation of the method was carried out by the authors of the semi-automated approach (González-Rodríguez et al., 2001). The variables studied were pH, temperature and flow-rate. The optimum pH of the acceptor solution was 6.3; changes of 0.1 units produced a significant change in the analytical signal (13%). Concerning temperature, changes of 1 ºC also introduced error (6%). Thus, polyethylene glycol was used in the thermostat bath to keep the temperature constant in the pervaporation module. The optimum flow-rate was set at 0.8 ml min⁻¹. Changes of 0.1 ml min⁻¹ involved a change in the analytical signal of about 7%.

Table I. Statistical parameters obtained with the different signals.

                              Peak height    Peak area          Peak area
                                             (Algorithm 1*)     (Algorithm 2**)
Number of outliers            3-5 of 18      2-3 of 18          2-3 of 18
Correlation coefficient       0.994-0.996    0.997-0.999        0.997-0.999
Relative standard deviation   0.025-0.040    0.010-0.030        0.010-0.030

* without taking into account baseline re-establishment
** taking into account baseline re-establishment

3.2 Comparison of the results obtained by the height and area of the FI peak

Although the estimation of the volatile acidity from the peak area is better than that from the peak height, the difference is not significant. The comparison between the two signals (peak height and peak area, the latter calculated by two algorithms, with and without baseline re-establishment), based on the number of outliers, the correlation coefficient and the relative standard deviation, is shown in Table I. The use of the peak area yields slightly better results, independently of the algorithm used.
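The two kinds of signal compared here can be illustrated with a brief Java sketch. The actual algorithms of the program are not given in the text, so everything below is an assumption: the class PeakSignalSketch, the use of the trapezoidal rule for the area, the choice of the first reading as the baseline for the height, and the reading of "baseline re-establishment" as subtraction of a straight baseline drawn from the first to the last point. The absorbance values are invented for the example.

```java
public final class PeakSignalSketch {

    /** Peak height: maximum reading minus the first reading, taken here as the baseline. */
    static double peakHeight(double[] absorbance) {
        double max = absorbance[0];
        for (double a : absorbance) {
            max = Math.max(max, a);
        }
        return max - absorbance[0];
    }

    /** Raw peak area by the trapezoidal rule, with no baseline correction. */
    static double rawArea(double[] time, double[] absorbance) {
        double area = 0.0;
        for (int i = 1; i < time.length; i++) {
            area += 0.5 * (absorbance[i] + absorbance[i - 1]) * (time[i] - time[i - 1]);
        }
        return area;
    }

    /** Area above a straight baseline drawn from the first to the last reading. */
    static double baselineCorrectedArea(double[] time, double[] absorbance) {
        int last = time.length - 1;
        double baselineArea = 0.5 * (absorbance[0] + absorbance[last]) * (time[last] - time[0]);
        return rawArea(time, absorbance) - baselineArea;
    }

    public static void main(String[] args) {
        // Hypothetical transient signal: a flat baseline of 0.1 with a single peak.
        double[] time = {0, 1, 2, 3, 4};
        double[] absorbance = {0.1, 0.1, 0.5, 0.1, 0.1};
        System.out.println("height = " + peakHeight(absorbance));
        System.out.println("raw area = " + rawArea(time, absorbance));
        System.out.println("corrected area = " + baselineCorrectedArea(time, absorbance));
    }
}
```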

3.3 Characterisation of the method 

Calibration curve and correlation 


Standards of different concentrations were used. They were prepared by dilution of glacial acetic acid in bidistilled water. For long-aged wines (ethanol content about 20%), the large amount of ethanol pervaporated with the acid compounds changes the spectrum of the indicator, so the values obtained do not correspond to the real values. In this case, an addition of 15% ethanol to the standards was mandatory.

The calibration curve was built using the following concentrations of acetic acid: 0.20, 0.30, 0.40, 0.50, 0.60, 0.70 and 0.80 g l⁻¹. The correlation coefficient was 0.999.
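Building a calibration line and interpolating a sample on it can be sketched as follows. This is a generic ordinary least-squares example, not the program's own routine: the class CalibrationSketch is hypothetical, and the signal values are synthetic (generated from an invented line, signal = 2.0 × concentration + 0.05), since the measured signals are not reported. Only the seven standard concentrations come from the text.

```java
public final class CalibrationSketch {

    /** Ordinary least-squares fit of signal = slope * concentration + intercept. */
    static double[] fit(double[] concentration, double[] signal) {
        int n = concentration.length;
        double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
        for (int i = 0; i < n; i++) {
            sumX += concentration[i];
            sumY += signal[i];
            sumXY += concentration[i] * signal[i];
            sumXX += concentration[i] * concentration[i];
        }
        double slope = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
        double intercept = (sumY - slope * sumX) / n;
        return new double[] {slope, intercept};
    }

    /** Interpolates the concentration that corresponds to a measured signal. */
    static double concentrationFor(double signal, double[] line) {
        return (signal - line[1]) / line[0];
    }

    public static void main(String[] args) {
        // Standard concentrations from the text (g/l of acetic acid).
        double[] concentration = {0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80};
        // Synthetic signals from an invented line, used only to exercise the code.
        double[] signal = new double[concentration.length];
        for (int i = 0; i < concentration.length; i++) {
            signal[i] = 2.0 * concentration[i] + 0.05;
        }
        double[] line = fit(concentration, signal);
        System.out.println("slope = " + line[0] + ", intercept = " + line[1]);
        System.out.println("sample at signal 1.05 -> " + concentrationFor(1.05, line) + " g/l");
    }
}
```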

Precision: repeatability and reproducibility

The precision of the automated method was studied in terms of repeatability and reproducibility. The standard deviation of the values resulting from applying the same method to the same sample in a short interval of time (repeatability) was 0.035 g l⁻¹. On the other hand, the standard deviation of the values resulting from applying the same method on different days (between-day reproducibility) was 0.51 g l⁻¹ in a series of six days.

Comparison between the results obtained by the automated method and the Mathieu method: validation of the former

Different wines from the Montilla-Moriles appellation d’origine were used to compare both methods. No significant difference was found between the two procedures. The correlation of the Mathieu method versus the automated method is characterised by a coefficient of 0.975. The regression graph is shown in Fig. 2.

The analysis frequency is 10 samples per hour; thus, the time required for the determination is shorter than that of the Mathieu method. In addition, no human intervention is required.

Advantages of the analyser for the determination of volatile acidity

The proposed analyser makes it possible to determine volatile acidity in a fully automated manner. A large number of samples can be analysed without human intervention, with an analysis time of 6 minutes per sample. Thus, the quality control of wine can be improved thanks to more frequent monitoring of this parameter. The cost of implementing the autoanalyser in wineries is not high because the instruments and apparatus used, as well as the control software, are not expensive.

Fig. 2. Correlation graph of the Mathieu method versus the automated method (x-axis: Mathieu method; y-axis: automated method; regression line y = 0.9653x + 0.0335, r² = 0.975).


In this work, the advantages of using an automated method for the determination of volatile acidity have been presented. The system makes human intervention unnecessary and reduces the analysis time. The autoanalyser can be implemented in any winery thanks to its low cost and ease of use. In addition, no specialised users are required to carry out the analyses.

On the other hand, the accuracy and precision of the automated measurements are similar to those of the Mathieu method, commonly used in wineries. Thus, the analyser allows the quality control in the winery to be improved through more frequent analysis of the target parameter.

A research line is thus open for the automation of continuous methods and the integration of these systems with an information system.


Barros F.G., Tubino M. (1992) Conductimetric and Spectrophotometric Determination of the Volatile Acidity of Wines by Flow Injection. Analyst, 117, 917-919.
Berezin O.Y., Tur’yan Y.I., Kuselman I., Shenhar A. (1995) Alternative Methods for Titratable Acidity Determination. Talanta, 42, 507-517.
Flanzy C. (2000) Enología: Fundamentos Científicos y Tecnológicos. AMV-Mundi Prensa, Madrid.
González-Rodríguez J., Perez-Juan P., Luque de Castro M.D. (2001) Semiautomatic Flow-Injection Method for the Determination of Volatile Acidity in Wines. Journal of Association Official of Analytical Chemistry, 84, 1846-1850.
Mataix E., Luque de Castro M.D. (1999) Sequential Determination of Total and Volatile Acidity in Wines Based on a Flow Injection-Pervaporation Approach. Analytica Chimica Acta, 381, 23-30.
Official Journal of European Community, L 272, October 1990.
Ribereau-Gayon P., Glories Y., Maujean A., Dubourdieu D. (2000) Handbook of Enology. Vol 2: The Chemistry of Wine. Stabilization and Treatments. John Wiley & Sons, Ltd.
Su X.L., Nie L.H., Yao S.Z. (1998) Flow-Injection Analysis Based on a Membrane Separation Module and a Bulk Acoustic Wave Impedance Sensor - Determination of the Volatile Acidity of Fermentation Products. Fresenius' Journal of Analytical Chemistry, 360, 272-274.
Tubino M., Barros F.G. (1991) Conductimetric and Colorimetric Determination of Volatile Acidity of Vinegar by Flow Injection. Journal of Association Official of Analytical Chemistry, 74, 346-350.
Urbano M., Luque de Castro M.D., Gómez-Nieto M.A. (2003) Automation of Flow Injection Methods in the Winery Industry through a Computer Program based on a Multilayer Model. Proceedings of 9th IEEE International Conference on Emerging Technologies and Factory Automation, 530-536.
Valcárcel M., Luque de Castro M.D. (1987) Flow Injection Analysis: Principles and Applications. Ellis Horwood, Chichester.

Chapter 5

A FULLY AUTOMATED METHOD FOR IN REAL TIME DETERMINATION OF LACCASE ACTIVITY IN WINES

Analytica Chimica Acta, submitted for publication. Part I, Chapter 5

Manuel Urbano Cuadrado, Pedro M. Pérez-Juan, María D. Luque de Castro

Abstract

An automated method for measuring laccase activity in musts and wines is presented. The method is based on the Flow Injection technique in stopped-flow mode at the detector, monitoring the absorbance change caused by oxidation of the substrate (syringaldazine). All the units that form the autoanalyser are controlled by a computer program for automating continuous processes developed by the authors. The aim was to provide wineries with a new tool for evaluating the degree of infection by Botrytis cinerea that overcomes the disadvantages of measuring gluconic acid. In addition, the relationship between gluconic acid content and laccase activity was studied and no relationship was found; thus, this acid is not an appropriate marker of infection by Botrytis. The method was characterised: the limits of detection and quantification were 0.2 and 0.6 U ml⁻¹, respectively, and the linear range was 0.6–24.0 U ml⁻¹. This range is sufficient for applying the method to wine; if required, it can be extended by using a higher dilution factor. Repeatability and reproducibility, expressed as relative standard deviation, were 4.0 and 6.5 %, respectively. Each analysis takes 12 minutes in total. The method can be used routinely at grape reception in wineries.

Keywords: Automation, Laccase, Wine, Flow injection.



The chemical composition of wine is very complex. The final product must contain its major and minor components within concentration ranges that assure both the properties sought for a given kind of wine and the absence of stability problems during long-term storage [1,2]. This implies monitoring a number of chemical parameters throughout the winemaking process, that is, from grape reception until just before the wine goes to market. The analytical tools used to monitor wine production therefore play an important role in this field. Techniques such as spectrophotometry, densitometry, titration and chromatography are widely employed in winery laboratories.

Most analytical methods recognised by the international wine community are manual or have a low degree of automation. Although these methods are accurate and rugged, they are often very time-consuming, so monitoring wine production in real time, particularly at grape reception or even before harvest, is unfeasible. Obtaining data on the raw material quickly, which helps in making a good must and avoiding economic losses, is therefore highly desirable.

Oenology, like other areas, has benefited from automation in analytical chemistry. Over the last two decades, numerous automated methods, most of them based on continuous techniques (Flow Injection (FI) being the commonest), have been developed for monitoring several parameters in wine [3-8]. In addition, the joint use of multivariate analysis and spectroscopic techniques with multichannel detection (Near Infrared Spectroscopy, Fourier Transform Mid Infrared Spectroscopy, etc.) has enabled the determination of various enological parameters in a few minutes [9-13]. Nevertheless, these methods are not applicable to minor wine components.

As a starting point, healthy grapes are the key to making a good wine. Grape quality diminishes when infections of different kinds occur. The infections can be caused either by fungi (such as Botrytis cinerea, Penicillium, Aspergillus and Mucor) or by bacteria (Acetobacter and Gluconobacter). Although the presence of Botrytis cinerea can lead to “noble rot”, a rot sought in the production of special wines such as the French Sauternes or the Hungarian Tokay, this fungus often causes problems in the winemaking process. When this occurs, a disease known as “common rot” or “grey rot” develops [14,15].

Common rot involves an increase in the oxidase activity of enzymes such as laccase and tyrosinase, which cause colour changes in wines; the formation of glycerol and some polysaccharides; a decrease in titratable acidity; and an increase in volatile acidity (if Botrytis cinerea is accompanied by acetic bacteria) [16,17]. Although the oxidase activity and the bacterial contamination can be reduced by adding sulphur dioxide and filtering out polysaccharides, the sensory properties are affected as a result. In addition, the presence of gluconic acid also affects stability during long-term storage.

Nowadays, the gluconic acid content is the only parameter employed in most wineries for monitoring the degree of infection in grapes. The gluconic acid concentration in must or wine is considered characteristic of the source of the acid, that is, of whether it was produced by fungi or by bacteria [18,14]. This criterion is only approximate. Moreover, gluconic acid is determined manually by means of an expensive enzymatic test.

Laccase activity is a good parameter for evaluating infection by Botrytis specifically, so it could be very useful for controlling problems related to colour change and increased dry extract. A manual photometric method for determining laccase activity in wines, which is easy to automate, was developed by Dubourdieu et al. [19]; nevertheless, despite its simplicity, the determination of laccase activity in wine using this reaction has not been automated. An FI method based on fluorimetric detection has been described, but not applied to real samples [20]. Although the sensitivity of that method is very high, its complexity and cost are higher than those of the photometric method.

The aim of the research presented here was to develop an FI method for the real-time determination of laccase activity in wine. This general objective was divided into three more specific goals: 1) to use the photometric reaction mentioned above; 2) to develop a rugged method suitable for routine use; 3) to automate the method in order to shorten the time needed to make decisions about the state of the grapes, thus improving the quality of the final product.


2.1 Apparatus and instruments 

The experimental set-up proposed for laccase determination is outlined in Fig. 1. The autoanalyser consists of the following elements: a Crison Sampler15A 15-position autosampler (Barcelona, Spain) equipped with a stirrer; a four-channel Gilson Minipuls3 peristaltic pump (Villiers le Bel, France); two Rheodyne 7010 automatic injection valves (Elkay, Galway, Ireland), used for injecting the sample into the carrier 1 stream and the substrate solution into the carrier 2 stream; a Unicam 8625 UV-visible spectrophotometer (Cambridge, England) equipped with a Hellma 138-QS flow cell (Jamaica, NY); and a Pentium II computer for controlling all the units described above.

A Selecta Tectrom polyethylene glycol bath (Barcelona, Spain) was used to keep the temperature of both the carrier and substrate solutions constant.

2.2 Reagents and solutions 

Buffer solutions (0.100 M) of pH 6.5, 5.5 and 4.5 were prepared from anhydrous potassium dihydrogen phosphate (Merck, Darmstadt, Germany), anhydrous sodium acetate (Panreac, Barcelona, Spain) and anhydrous sodium citrate (Panreac, Barcelona, Spain), respectively.

Carrier 1 and carrier 2 consisted of distilled water and a distilled water–ethanol mixture, respectively. The composition of carrier 2 was chosen to avoid the air bubbles that form when aqueous and organic phases mix at the confluence point (see Fig. 1). A 1:3 water–ethanol ratio assured the absence of air bubbles.

[Figure 1: schematic of the flow manifold, showing the remote control (computer), autosampler, peristaltic pump, carrier 1 and carrier 2 streams, substrate reagent, injection valves 1 and 2, confluence point, detector and waste.]

Fig. 1. Autoanalyser for the determination of laccase in musts.

Substrate solutions (0.05, 0.10 and 0.15 %) were prepared by dissolving syringaldazine (Sigma, Steinheim, Germany) in a distilled water–ethanol mixture (25–75 %), for the reasons given above. Sonication was necessary for complete dissolution of the substrate.

Laccase from Trametes versicolor (Fluka Chemie, Steinheim, Germany) was used for preparing a concentrated solution of laccase in sodium acetate buffer (0.100 M) of pH 5.5. Standards were prepared from this solution.

Polyvinylpolypyrrolidone (PVPP) (Sigma, Steinheim, Germany) was the resin used for removing anthocyanins and tannins.

Enzymatic tests (Boehringer Mannheim, R-Biopharm, Darmstadt, Germany) based on ultraviolet monitoring were used for quantifying the content of gluconic acid.

2.3 Samples and preparation 

Forty musts and wines from the appellation of origin “La Mancha”, comprising red, rosé and white musts and wines (20, 10 and 10 samples, respectively) from different grape varieties, were employed to study the applicability of the method to real samples. Laccase standards for characterising the method were prepared using a pool of the different musts and wines. Standard additions of laccase were carried out because no infections by Botrytis occurred in the 2004 harvest (favourable climate).

Samples were prepared by mixing equal volumes of must or wine and 0.1 M sodium acetate buffer of pH 5.5; thus, samples were diluted 1:1 (v/v), that is, twofold.

2.4 Procedures 

Flow injection method for the determination of laccase in wines. Ten millilitres of sample are added to a vial containing 0.5 g of PVPP. The mixture is stirred for 10 min to remove anthocyanins and tannins, and then the sample (0.300 ml) and the substrate (1 ml) solutions are injected into carriers 1 and 2, respectively. A filter at the end of the FI tube inserted in the vial prevents solid resin particles from entering the FI tubes. The sample is injected five seconds after the substrate. Together, the difference in injection times and the ratio of the injection volumes ensure that, at the confluence point, the substrate merges with the centre of the sample. When the sample-substrate plug reaches the detector, the flow is stopped. Ten seconds after stopping the flow, the absorbance is monitored at 700 nm for 30 seconds. The slope of the absorbance vs time plot reflects the progress of the enzymatic reaction, so the rate of increase of absorbance is related to the laccase activity.
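The slope of the absorbance vs time trace can be obtained in several ways; the text does not state which one was used. The following is a minimal sketch that assumes an ordinary least-squares fit over the 30 s monitoring window. The class name, the 5 s sampling interval and the readings are hypothetical values chosen only for illustration.

```java
import java.util.Locale;

/** Least-squares slope of an absorbance-vs-time trace (illustrative sketch only). */
final class KineticSlopeSketch {

    /** Returns the ordinary least-squares slope of y against x. */
    static double slope(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two or more paired points");
        }
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;

        double sxy = 0.0; // sum of cross-deviations
        double sxx = 0.0; // sum of squared deviations in x
        for (int i = 0; i < x.length; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("all x values are identical");
        }
        return sxy / sxx;
    }

    public static void main(String[] args) {
        // Hypothetical readings: one point every 5 s, in milli-absorbance units.
        double[] timeS = {0, 5, 10, 15, 20, 25, 30};
        double[] absorbanceMilliAU = {100, 110, 120, 130, 140, 150, 160};
        System.out.printf(Locale.ROOT, "slope = %.2f mA/s%n", slope(timeS, absorbanceMilliAU));
    }
}
```

A fit over the whole window is less sensitive to noise in single readings than a two-point difference, which is why it is used in this sketch.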

Procedure for the determination of gluconic acid in wines with the enzymatic kit. This procedure is described in the instructions supplied with the enzymatic kit. It is based on absorbance measurements (at 340, 344 or 365 nm) taken at 5 min (after mixing the sample with two of the three reagents and before adding the third reagent, which starts the enzymatic reaction) and at 25 min.

2.5 Software used for the analyser 

control 

All the automatic units of the autoanalyser are controlled by software developed by the authors [3]. The software is an interactive interface, developed in Java and supported by a communication layer developed in C, that enables the input and output of information for each analysis.

The automated procedure described in section 2.4 is stored in an eXtensible Markup Language (XML) file. This file contains data on the different units and the actions to be carried out by them.
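The structure of the procedure file is not given in the text. As a rough illustration of the idea, the sketch below reads a made-up XML description of the steps in section 2.4 using the standard Java DOM parser. Every element and attribute name (procedure, step, unit, action, waitSeconds, durationSeconds, wavelengthNm) is an assumption introduced here, not the authors' format.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Reads a hypothetical XML procedure file; the schema is invented for this sketch. */
final class ProcedureFileSketch {

    // Hypothetical content mirroring section 2.4: substrate first, sample 5 s later,
    // stop the flow, then monitor at 700 nm for 30 s after a 10 s wait.
    static final String EXAMPLE =
          "<procedure name=\"laccase\">"
        + "<step unit=\"valve2\" action=\"inject\" waitSeconds=\"0\"/>"
        + "<step unit=\"valve1\" action=\"inject\" waitSeconds=\"5\"/>"
        + "<step unit=\"pump\" action=\"stop\"/>"
        + "<step unit=\"detector\" action=\"monitor\" waitSeconds=\"10\""
        + " durationSeconds=\"30\" wavelengthNm=\"700\"/>"
        + "</procedure>";

    /** Returns one "unit:action" string per step element, in document order. */
    static List<String> readSteps(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList steps = doc.getElementsByTagName("step");
            List<String> actions = new ArrayList<>();
            for (int i = 0; i < steps.getLength(); i++) {
                Element step = (Element) steps.item(i);
                actions.add(step.getAttribute("unit") + ":" + step.getAttribute("action"));
            }
            return actions;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse procedure file", e);
        }
    }

    public static void main(String[] args) {
        for (String action : readSteps(EXAMPLE)) {
            System.out.println(action);
        }
    }
}
```

Keeping the procedure as data rather than code means a new method can be defined by editing the file, without recompiling the control software.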


3.1 Optimisation of the enzymatic 

reaction 

The variables affecting the enzymatic reaction are pH, temperature and substrate concentration. A full factorial design was built for a screening study of these variables. The number of experiments is given by the expression $2^n + 3$, where n is the number of variables and the term 3 accounts for the three centre points. Thus, with n = 3, 11 experiments were carried out in random order. The upper and lower values given to each factor were selected from the available data and the experience gathered in preliminary experiments. These values were 4.5 and 6.5 for the pH; 20.0 and 40.0 ºC for the temperature; and 0.05 and 0.15 % for the substrate concentration.
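The run list of such a design can be enumerated mechanically. The sketch below builds the 2^3 corner runs plus three centre points from the factor limits quoted above and shuffles them. Taking the centre points as the midpoints of each range is an assumption (the text does not state the centre temperature), and the class name and random seed are arbitrary.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Random;

/** Two-level full factorial design with centre points (illustrative sketch only). */
final class ScreeningDesignSketch {

    /** Builds 2^n corner runs followed by the requested number of centre points. */
    static List<double[]> build(double[] low, double[] high, int centrePoints) {
        int n = low.length;
        List<double[]> runs = new ArrayList<>();
        for (int mask = 0; mask < (1 << n); mask++) {
            double[] run = new double[n];
            for (int f = 0; f < n; f++) {
                // Bit f of the mask selects the low or high level of factor f.
                run[f] = ((mask >> f) & 1) == 0 ? low[f] : high[f];
            }
            runs.add(run);
        }
        for (int c = 0; c < centrePoints; c++) {
            double[] centre = new double[n];
            for (int f = 0; f < n; f++) {
                centre[f] = (low[f] + high[f]) / 2.0; // assumed midpoint
            }
            runs.add(centre);
        }
        return runs;
    }

    public static void main(String[] args) {
        double[] low = {4.5, 20.0, 0.05};  // pH, temperature (ºC), substrate (%)
        double[] high = {6.5, 40.0, 0.15};
        List<double[]> runs = build(low, high, 3);
        Collections.shuffle(runs, new Random(42)); // randomise the run order
        for (double[] r : runs) {
            System.out.printf(Locale.ROOT, "pH %.1f, %.1f ºC, %.2f %%%n", r[0], r[1], r[2]);
        }
    }
}
```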

At the 95% confidence level, the screening showed that only pH is an influential factor, with a negative effect. Neither temperature nor substrate concentration affects the signal, and no interaction between factors was observed. The lack of influence of temperature facilitates implementation of the method in the wine industry. The pH study showed that laccase activity decreases drastically at pH values above 6.0 and is optimal at pH 5.0.

3.2 Optimisation of the removal of anthocyanins and tannins by PVPP

Several studies have been carried out to demonstrate the influence of tannins and anthocyanins on laccase activity. These two groups of compounds present in musts are retained on the PVPP resin. The retention capacity of this material was tested as a function of the quantity of resin. The enzyme activity was set at 2.0 U ml⁻¹ and the matrix employed was the pool prepared from different musts.

Figure 2 shows the slope of the A/t plot vs the grams of resin. Amounts of resin between 0.1 and 0.8 g were employed in this study. From 0.5 g upwards, the slope was constant. An amount of 0.5 g of resin was then used to remove tannins and anthocyanins from a red wine, as this wine contains higher amounts of tannins than the pool used for the study.

3.2 Analytical characterisation of the 

FI method 


Limits of detection (LOD) and quantification (LOQ)

Table 1 shows the analytical characteristics of the proposed method. The limit of detection (LOD) and the limit of quantification (LOQ) were calculated using the expressions $C_{LOD} = C_b + 3 S_b$ and $C_{LOQ} = C_b + 10 S_b$, respectively, where $C_b$ and $S_b$ are the concentration of the blank and the standard deviation of the blank signal, respectively. The values obtained were 0.1 and 0.3 U ml⁻¹ for LOD and LOQ, respectively. Taking into account the 1:1 dilution of the samples, these limits correspond to 0.2 and 0.6 U ml⁻¹ in the original samples.
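The arithmetic behind these figures can be sketched as follows. The limits are computed as the blank value plus 3 or 10 standard deviations, and the result is multiplied by the dilution factor (2 for equal volumes of sample and buffer) to express it in the original sample. The class and method names are hypothetical; only the reported limits of 0.1 and 0.3 U ml⁻¹ come from the text.

```java
import java.util.Locale;

/** LOD/LOQ expressions and dilution correction (illustrative sketch only). */
final class DetectionLimitsSketch {

    /** C = C_b + k * S_b, with k = 3 for the LOD and k = 10 for the LOQ. */
    static double limit(double blankConcentration, double blankSd, double k) {
        return blankConcentration + k * blankSd;
    }

    /** Converts a value found in the diluted solution to the original sample. */
    static double inOriginalSample(double measured, double dilutionFactor) {
        return measured * dilutionFactor;
    }

    public static void main(String[] args) {
        double lodMeasured = 0.1;    // U/ml, reported in the text
        double loqMeasured = 0.3;    // U/ml, reported in the text
        double dilutionFactor = 2.0; // equal volumes of sample and buffer
        System.out.printf(Locale.ROOT, "LOD in sample = %.1f U/ml%n",
                inOriginalSample(lodMeasured, dilutionFactor));
        System.out.printf(Locale.ROOT, "LOQ in sample = %.1f U/ml%n",
                inOriginalSample(loqMeasured, dilutionFactor));
    }
}
```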

Because no samples of must infected by Botrytis cinerea were available during this research, data on the relationship between laccase activity and the degree of infection were taken from the literature [19]. The grape variety that showed the lowest activity under light infection was Cabernet-Sauvignon, with a value of 0.8 U ml⁻¹. Thus, the LOD and LOQ obtained with the automated method are appropriate for evaluating the degree of infection by Botrytis.

[Figure 2: plot of the slope of the absorbance vs time trace (mA s⁻¹, axis from 0 to 8) against the amount of resin (g, axis from 0 to 0.8).]

Fig. 2. Slope (mA s⁻¹) vs resin quantity (g). The enzyme activity was set at 2.0 U ml⁻¹.

Calibration curve and regression 


A calibration curve was run using standard solutions prepared following the sample preparation procedure described in the Experimental section. The matrix employed was the pool of musts not affected by fungi. The standards covered a concentration range of 0.0–18.0 U ml⁻¹ (namely, 0.3, 1.0, 3.0, 6.0, 9.0, 12.0, 15.0 and 18.0 U ml⁻¹). A linear response was found between 0.3 and 12.0 U ml⁻¹. Thus, the effective linear range was 0.6–24.0 U ml⁻¹ in the original samples, since sample preparation involves a 1:1 dilution of musts. A higher dilution extends the linear range for laccase determination. The regression coefficient was 0.988.
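As an illustration of how such a calibration is used, the sketch below inverts the calibration equation reported in Table 1 (y = 3.05 x + 1.18) and rejects results outside the linear interval of 0.3–12.0 U ml⁻¹. It assumes that y is the measured slope signal and that x is the activity in the measured (diluted) solution; the text does not state this explicitly, so both are assumptions, as are the class and method names.

```java
import java.util.Locale;
import java.util.OptionalDouble;

/** Inverts a linear calibration and checks the linear range (illustrative sketch only). */
final class CalibrationSketch {

    static final double SLOPE = 3.05;      // from Table 1
    static final double INTERCEPT = 1.18;  // from Table 1
    static final double LINEAR_MIN = 0.3;  // U/ml in the measured solution (assumed basis)
    static final double LINEAR_MAX = 12.0; // U/ml in the measured solution (assumed basis)

    /** Returns the activity for a signal, or empty if it falls outside the linear range. */
    static OptionalDouble activity(double signal) {
        double x = (signal - INTERCEPT) / SLOPE;
        if (x < LINEAR_MIN || x > LINEAR_MAX) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(x);
    }

    public static void main(String[] args) {
        double signal = 7.28; // hypothetical reading
        OptionalDouble result = activity(signal);
        if (result.isPresent()) {
            System.out.printf(Locale.ROOT, "activity = %.2f U/ml%n", result.getAsDouble());
        } else {
            System.out.println("outside the linear range: dilute further and repeat");
        }
    }
}
```

Returning an empty value instead of an extrapolated number makes the out-of-range case explicit, which matches the remark that a higher dilution is needed for more active samples.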

The relationship between laccase activity and infection by Botrytis depends on grape variety. The range of laccase activity found in musts is 0.8–60.0 U ml⁻¹, which is wider than the linear range of the proposed method. This does not restrict the applicability of the method because: 1) for a given amount of enzyme added, the sample-buffer dilution ratio does not affect the activity of the biocatalyst for ratios equal to or higher than 1:1; and 2) the higher activity values correspond to rot levels that can be detected by visual inspection, and the effect on wine levels off for activities above 20.0 U ml⁻¹.

Table 1. Analytical characteristics of the method proposed (1:1 sample dilution).

Limit of detection (LOD)         0.2 U ml⁻¹
Limit of quantification (LOQ)    0.6 U ml⁻¹
Linear range                     0.6–24 U ml⁻¹
Calibration equation             y = 3.05 x + 1.18
Regression coefficient (r)       0.988
Repeatability                    4 %
Reproducibility (six days)       6.5 %

Precision of the method 

The precision of the proposed method was studied in terms of repeatability and reproducibility. The parameter used was the standard deviation for the same red wine spiked with 2.0 U ml⁻¹ of laccase. The repeatability, that is, the standard deviation obtained by applying the method to samples within a short time interval, was 0.08 U ml⁻¹, which corresponds to 4.0 % in relative terms. The reproducibility, that is, the standard deviation obtained by applying the method to samples on different days, was 0.13 U ml⁻¹, or 6.5 % in relative terms, for a six-day study.

These precision values show the ruggedness of the method, which can be used as a routine tool for evaluating potential infections by Botrytis.
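The relative values quoted above follow from dividing each standard deviation by the spiked level of 2.0 U ml⁻¹. The sketch below shows that calculation together with a sample standard deviation; the class and method names are hypothetical, and using the spiked level as the reference is an assumption that is consistent with the reported percentages.

```java
import java.util.Locale;

/** Standard deviation and relative standard deviation (illustrative sketch only). */
final class PrecisionSketch {

    /** Sample standard deviation (n - 1 in the denominator). */
    static double sampleSd(double[] values) {
        if (values.length < 2) {
            throw new IllegalArgumentException("need two or more values");
        }
        double mean = 0.0;
        for (double v : values) {
            mean += v;
        }
        mean /= values.length;
        double sumSquares = 0.0;
        for (double v : values) {
            sumSquares += (v - mean) * (v - mean);
        }
        return Math.sqrt(sumSquares / (values.length - 1));
    }

    /** Relative standard deviation as a percentage of a reference level. */
    static double relativeSdPercent(double sd, double reference) {
        return 100.0 * sd / reference;
    }

    public static void main(String[] args) {
        double spikedLevel = 2.0; // U/ml, from the text
        System.out.printf(Locale.ROOT, "repeatability = %.1f %%%n",
                relativeSdPercent(0.08, spikedLevel));
        System.out.printf(Locale.ROOT, "reproducibility = %.1f %%%n",
                relativeSdPercent(0.13, spikedLevel));
    }
}
```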


3.3 Application of the method to real samples. Relationship between the laccase activity and the content of gluconic acid

Gluconic acid and laccase activity were determined in forty samples by the enzymatic test and the proposed automated method, respectively. The gluconic acid content ranged between 0.40 and 1.20 g l⁻¹. Nevertheless, no laccase activity was detected in the samples. Although previous research [18,14] stated that a gluconic acid content of 1–2 g l⁻¹ indicates low levels of rot, this does not imply infection by Botrytis, since no oxidase activity was found; another disease could therefore be responsible for the gluconic acid (namely, Gluconobacter bacteria). In addition, climate conditions in the 2004 harvest, to which the samples in this study belong, did not favour infection by Botrytis.

All forty samples were spiked with laccase, and the activities measured agreed with the amount of biocatalyst added.


A method for the full automation of the monitoring of Botrytis infection is presented here with the aim of overcoming the limitations of measuring the parameter currently used for this purpose (namely, gluconic acid). The biochemical parameter involved is the oxidase activity of laccase.

The shorter analysis time (12 min for laccase activity versus 30 min for gluconic acid), the total automation (no user intervention in the analysis) and the low cost of the proposed method make it an excellent alternative for detecting Botrytis disease. Flow injection provides an appropriate tool for replacing the expensive, manual and time-consuming enzymatic kit method.

A key aspect of this research, from an enological point of view, is the absence of a relationship between gluconic acid content and laccase activity. This is due to the existence of several sources of gluconic acid in wine (namely, fungi and bacteria), in contrast to a single source of laccase (fungi). In fact, the presence of gluconic acid in the absence of laccase has been demonstrated in this work. This means that musts with a high level of gluconic acid may still lead to wines without problems related to colour change and increased dry extract.

The proposed method can be implemented as a routine method at grape reception to improve quality control.





AGL2000-0321-P4-03). 


[1] C. Flanzy, Enología: Fundamentos 

Científicos y Tecnológicos (Enology: 

Scientific and Technological 

Fundamentals), AMV-Mundi 

Prensa, Madrid, 2000. 

[2] P. Ribereau-Gayon, Y. Glories, A. Maujean, D. Dubourdieu, Handbook of Enology. Vol. 2: The Chemistry of Wine. Stabilization and Treatments, John Wiley & Sons, Ltd., 2000.


[3] M. Urbano, M.D. Luque de Castro, M.A. Gómez-Nieto, [...] Methods in the Winery Industry through a Computer Program based on a Multilayer Model. In Proceedings of 9th IEEE International Conference on Emerging Technologies and Factory Automation, 530-536, Lisbon, 2003.

[4] J. González, P. Pérez-Juan, M.D. 

Luque de Castro, Talanta, 56 

(2002) 53. 

[5] E. Mataix, M.D. Luque de Castro, 

Talanta, 51 (2000) 489. 

[6] X.L. Su, L.H. Nie, S.Z. Yao, Fres. 

J. Anal. Chem., 360 (1998) 272. 

[7] O.Y. Berezin, Y.I. Tur’yan, I. 

Kuselman, A. Shenhar, Talanta, 

42 (1995) 507. 

[8] F.G. Barros, M. Tubino, Analyst, 

117 (1992) 917. 

[9] M. Urbano-Cuadrado, M.D. Luque 

de Castro, P.M. Pérez-Juan, J.


García Olmo, M.A. Gómez-Nieto, 

Anal. Chim. Acta, 527 (2004) 81. 

[10] Y. Li, C. Brown, J. Near Infrared 

Spectrosc. 7 (1999) 101. 

[11] C.M. García-Jares, B. Medina, 

Fres. J. Anal. Chem. 357 (1997) 

86. 

[12] R. Eberl, J. Near Infrared 

Spectrosc. 6 (1998) 133. 

[13] M. Gishen, R.G. Dambergs, A. 

Kambouris, M. Kwiatkowski, 

W.U. Cynkar, P.B. Høj, I.L. 

Francis, Proceedings of the 9th

International Conference on Near 

Infrared Spectroscopy 1999, 917. 

[14] L. Pérez, M.J. Valcárcel, P. 

González, B. Domecq, Am. J. 

Enol. Vitic., 42 (1991) 58. 

[15] J. Suárez, B. Iñigo, Microbial 

Alterations in Wines. Yeast and 

Molds. In Enological Microbiology, 

331-347, Mundi-Prensa, 

Madrid, 1990. 

[16] A. Joyeux, S. Lafon-Lafourcade, 

P. Ribereau-Gayon, Appl. Environ. 

Microbiol., 48 (1984) 153. 

[17] A. Joyeux, S. Lafon-Lafourcade, 

P. Ribereau-Gayon, Sci. Aliment., 

4 (1984) 247. 

[18] S. Chauvet, PhD Thesis, 

University of Bordeaux II, 1981. 

[19] D. Dubourdieu, C. Grassin, C. Deruche, P. Ribereau-Gayon, Connaissance Vigne Vin, 18 (1984) 237.

[20] H. Huang, R. Cai, Y. Du, Z. Lin, 

Y. Zeng, Anal. Chim. Acta, 318 

(1995) 63. 


Chapter 6

JWISWINE: A JAVA-WEB INFORMATION SYSTEM FOR QUALITY CONTROL IN WINERIES

The content of this chapter has been submitted to the journal Computers and Electronics in Agriculture for publication and was presented as a poster at the 5th International Conference on Enterprise Information Systems, held in Angers (France) from 22 to 26 April 2003.


JWISWINE: A JAVA-WEB INFORMATION SYSTEM FOR QUALITY 

CONTROL IN WINERIES 

M. Urbano-Cuadrado, M.D. Luque de Castro, P. M. Pérez-Juan, M. A. Gómez- 

Nieto 

Abstract 

A system for the overall management of the information related to 

analytical processes and quality control in wineries is presented. It enables the 

integration of semi-automated and automated analytical processes. It has been 

developed in Java using the database management system Oracle 9i and can be 

executed both as a stand-alone program and through a standard Internet browser. 

It has been developed under the evolutionary incremental paradigm in order to 

take into account users’ requirements and using UML object oriented technology 

to represent the complexity of the processes and the large amount of analytical 

information generated in wine production. Thus, a decision support system, 

JWisWine, was built for monitoring wine production. 

Keywords: Decision support system, Quality control, Wineries, Object-oriented 

paradigm. 




Wine composition and production are 

both extremely complex (Flanzy, 

2000, Ribereau-Gayon et al., 2000). 

Chemical monitoring and control of 

the overall production process is 

needed to provide knowledge of the 

quality of the raw material (grape), 

the intermediate and final product. 

Figure 1 shows a detailed activity 

diagram (Booch et al., 1999) of the 

overall process. Analytical monitoring 

of the enological parameters —in 

parallel with the process— determines 

the start and end of each step, in 

addition to the quality achieved. The 

availability of analytical information 

at the precise time and in the 

appropriate format is crucial to this 

operation. 

Advances in the automation of quality control require: a) implementation of a system for the management, organisation, handling and treatment of the information generated throughout the production process, in order to provide the technical manager with enough knowledge for making decisions (McGrawth et al., 1998, Muller et al., 1999); b) automation of analytical methods using instrumentation coupled to the information system (Ilyukhin et al., 2001).

Three common steps are involved in the analysis of the chemical parameters: sampling and possible sample treatment; instrumental measurement; and collection and processing of the instrument outputs (Figure 2). An average of 75 and 25 samples per day (in fermentation and non-fermentation periods, respectively, with 5–20 analyses per sample) are analysed in the laboratories in which JWisWine has been implemented. The amount of data generated as a consequence of the variety and number of analyses justifies the implementation of an information management system with the following aims:

1. To make possible the appropriate 

management of historical data. 

This includes the efficient storage 

of a huge amount of data 

corresponding to previous harvests, 

which can be used when 

and as required.


[Figure 1: activity diagram. Labels: grape, new wine, vintage wine, ageing, market. Stages and steps:
RECEPTION OF RAW MATERIAL: transport and discharge.
PRE-FERMENTATION: crush; press and must selection (white wines); must clarification (white wines); chemical correction (pH, sugars, sulphur dioxide, N).
FERMENTATION: alcohol fermentation; maceration (red wines); press and wine-must selection (red wines).
NEW WINE MATURATION: malolactic fermentation; physical-chemical equilibrium; spontaneous clarification; storage, wine selection and blending.
AGEING (yes/no decision): wooden casks; bottles.
FINAL PROCESS: clarification; filtration; physical and chemical stabilization; packaging.
Analytical controls shown alongside the stages: density, reducing sugars, pH, gluconic acid and pesticide residues; pH, sulphur dioxide and readily assimilable nitrogen; colour and total phenols (red wines); alcohol content, pH, titratable acidity, L-malic acid, sulphur dioxide (free and total), colour, total phenols and iron; sensory analysis; sensory and physicochemical control vs specification of final product.]

Fig. 1. Activity diagram of the wine production process.


[Figure 2: sequence diagram with the objects t:Technician, s:Sample, c:Calibrate and m:Measurement. Messages: prepare the sample for measurement; check calibration; make calibration; correct values; make analytical measurement of sample; analyse results; take decisions; store analytical result. Notes: the technician may be replaced by the autoanalyser in the initial stages; results (measurements) are stored in the database; tasks can be repeated as a function of decisions.]

Fig. 2. Stages involved in the development of analytical measurements.

2. To have access to additional 

information about analysers, analytical 

parameters, etc. This aspect 

endows the system with the 

traceability capacity, which allows, 

for example, correlating the 

decision about selling a given 

wine batch with the evolution of a 

given parameter (e.g. volatile 

acidity, ethanol, etc.). 

3. To assess the quality, accuracy 

and reliability of the information, 

which require analytical instruments 

and software components 

for their controls, thus 

providing the users with the 

results in the appropriate format. 

The work presented here focuses on management of the information. It constitutes a first step towards automation of the analytical methods in wineries (Valcárcel and Luque de Castro, 1987, Urbano et al., 2003). Thus, a Decision Support System (DSS), JWisWine, has been built using Java and Oracle 9i as tools, while the system was designed following both the object-oriented and the evolutionary incremental paradigms.


For the construction of JWisWine 

both the object-oriented and the 

evolutionary incremental software 

engineering paradigms have been 

used. 

2.1 Modelling of the System Domain 

In order to model the domain under study, the following classes and subclasses have been established: class Parameter, for the parameters to be determined in wine; subclass DatumParameter, which includes the parameters determined directly in the sample (e.g. pH measurement); and subclass CalculatedParameter, which comprises the parameters determined indirectly through an algorithm that relates DatumParameter values by means of a mathematical equation (e.g. dissolved solids in wine), as shown in the class diagram (Rumbaugh et al., 1999) in Figure 3.

Class Unit represents the scale in which a parameter is measured (e.g. ethanol concentration expressed in percentage, mg mL⁻¹, etc.). Together with the class Parameter, which represents the property that characterises the material under study (e.g. ethanol), it yields the new class Magnitude, which represents a parameter measured in a given unit (e.g. volatile acidity measured in absorbance units).
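The relationship between the three classes can be pictured with a small stand-alone sketch: a Magnitude simply pairs one Parameter with one Unit, so the same parameter can be expressed in several units. This is not the JWisWine code; the fields, the describe method and the wrapper class are hypothetical simplifications introduced for illustration.

```java
/** Parameter + Unit = Magnitude (illustrative sketch, not the JWisWine classes). */
final class MagnitudeSketch {

    /** Scale in which a value is expressed, e.g. "%" or "mg/mL". */
    static final class Unit {
        final String symbol;

        Unit(String symbol) {
            this.symbol = symbol;
        }
    }

    /** Property that characterises the material under study, e.g. ethanol. */
    static final class Parameter {
        final String name;

        Parameter(String name) {
            this.name = name;
        }
    }

    /** A parameter measured in a given unit. */
    static final class Magnitude {
        final Parameter parameter;
        final Unit unit;

        Magnitude(Parameter parameter, Unit unit) {
            this.parameter = parameter;
            this.unit = unit;
        }

        String describe() {
            return parameter.name + " [" + unit.symbol + "]";
        }
    }

    public static void main(String[] args) {
        Parameter ethanol = new Parameter("ethanol");
        // The same parameter gives rise to two different magnitudes.
        Magnitude byPercentage = new Magnitude(ethanol, new Unit("%"));
        Magnitude byMass = new Magnitude(ethanol, new Unit("mg/mL"));
        System.out.println(byPercentage.describe());
        System.out.println(byMass.describe());
    }
}
```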

The advantages of using the 

object-oriented paradigm (Booch, 

1994) for modelling the problem are 

summarized in Appendix 1, where 

part of the Java code (Eckel, 2002, 

Anderson and Stone, 1999) corresponding 

to the definition of the 

previously mentioned classes is 

presented. For instance: 




/* Definition of the class Parameter */

public abstract class Parameter 

extends java.lang.Object 

implements PersistentSQL 

/* It implements the interface PersistentSQL with the 

methods to access to the database */ 

/* Definition of some of the attributes of the class */

protected java.lang.String alias 

/* It contains the alias of the parameter, used in the 

formulas */ 

protected Magnitude default 

/* It contains a reference to the default magnitude */ 

protected java.util.Vector magnitudes 

/* It contains all the magnitudes in which a value of
this parameter can be expressed */

protected java.lang.String id_parameter 

/* It contains the name of the parameter */ 

/* Other class attributes */ 

cache, connection, debugParameter, stateSQL, 

deletedMagnitudes, ninstances, changedname, objcache, 

idChanged 

/* The constructors definition */ 

Parameter() 

/* Constructor without arguments */ 

Parameter(java.lang.String n, java.lang.String alias, 

Unit u) 

/* It creates a parameter with name 'n', alias 'alias'
and the unit 'u' as its default unit */

/* Definition of some of the class methods */ 

Magnitude addMagnitude(Unit u) 

/* It allows the parameter to be expressed in the unit
'u', creating a new magnitude associated with the parameter */

Magnitude[] getMagnitudes() 

/* This method returns an array with all the magnitudes 

defined in the system for this parameter */ 

Magnitude getDefaultMagnitude() 

/* It locates and returns the default magnitude associated 

to this parameter */ 

void setNormalInterval(Unit u, double min, double max) 

/* This method fixes the maximum and minimum values for 

measurements expressed in the magnitude that associates the 

unit 'u' and the parameter */ 

void setNormalInterval(Magnitude m, double min, double 

max) 

/* This method overloads the previous one and sets the
normal interval for the magnitude indicated in 'm' */

void loadSample(Sample s) 

/* This method relates the parameter with the sample 's' */ 

abstract Measurement createMeasurement() 

/* It creates and returns a new measurement */ 

boolean isMeasurementNormal(Measurement m) 

/* This method checks if the indicated value in the 

measurement is between the maximum and minimum values of 

the parameter */ 

/* Other class methods */ 

addToCache, storeParameter, addMagnitude, 

deleteParameter, disablePersistentCache, 

deleteMagnitude, enablePersistentCache, 

equalsParameter, finalizeParameter, getAlias, 

getStoreException, getConnection, getSQLDeleted, 

getSQLState, getFromCache, getMagnitudes, 

getIdParameter, getNewSQL, getParameter, 

getParameters, getSQL, hashCode, isAliasDefined, 

isIdDefined, isMagnitudeDefined, 

isPersistentCacheEnabled, rebuild, retrieve, 

removeFromCache, setAlias, setConnection, setSQLState, 

setDefaultMagnitude, setIdParameter, toString, 

unloadSample, isSampleValidable, isMeasurementNormal 

/* Definition of the class DatumParameter */

public class DatumParameter 

extends Parameter 

/* It inherits the attributes and methods from the class 

Parameter */ 


Measurement createMeasurement() 

/* This method provides an implementation to the superclass 

method */ 

/* Definition of the class CalculatedParameter */

public class CalculatedParameter 

extends Parameter 

/* It inherits the attributes and methods from the class 

Parameter */ 

/* Definition of the class attributes */ 

protected CalculusFormulae calculusFormulae 

/* It stores a reference to the calculation formulae that

evaluates this parameter in its default units */


void loadSample(Sample s) 

/* This method overrides the superclass method */

Measurement createMeasurement() 

/* This method provides an implementation to the superclass 

method */ 



CalculusFormulae getCalculusFormulae() 

/* This method returns the calculation formulae used

to evaluate the CalculatedParameter */

DatumParameter[] getDatumParameter()

/* This method consults the calculation formulae of the 

CalculatedParameter and returns all the parameters involved 

in the formulae */ 

/* Definition of the class Magnitude */

public abstract class Magnitude 

extends java.lang.Object 

implements PersistentSQL 

/* It implements the interface PersistentSQL with the

methods to access the database */

/* Definition of some of the class attributes */ 

protected TranslationFormulae translationformulae 

/* It contains a reference to the conversion formulae that 

calculates this magnitude */ 

protected java.lang.String name 

/* It contains the name of the magnitude */ 

protected Parameter parameter 

/* It contains the parameter associated to the magnitude */ 

protected Unit unit 

/* It contains the unit in which the parameter can be 

expressed */ 

protected double maximumValue 

/* It contains the maximum value for evaluation of the

associated parameter in the associated units */

protected double minimumValue

/* It contains the minimum value for evaluation of the

associated parameter in the associated units */

• The attribute magnitudes can only be accessed from the class Parameter, because only the instances (objects) of this class have permission to manage that property.

• The inheritance property allows both the DatumParameter and CalculatedParameter classes to inherit the Parameter class structure. These classes also include their own characteristic attributes and methods, thus encapsulating their definition.

• Different constructor methods can be defined in order to facilitate system development (see the constructor methods for the Parameter class).

• The overriding mechanism is illustrated by the method createMeasurement of the Parameter class, which is redefined in the CalculatedParameter and DatumParameter classes to adapt it to their particular behaviour.

Fig. 3. Class diagram for parameters and samples. [Classes shown: Sample, SelectedParameter, Parameter (specialised into DatumParameter and CalculatedParameter, the latter linked to CalculusFormulae), Magnitude (the different units in which each parameter may be measured), Unit, Measurement (class detailed in Figure 4) and User (specialised into Technician, Manager and Administration).]
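The inheritance and overriding described in these points can be sketched as follows. This is a minimal illustration and not the JWisWine source: the class names carry a Sketch suffix, createMeasurement returns a String instead of a Measurement object, and the formula is held as plain text. All of these are simplifying assumptions.

```java
// Illustrative stand-ins for Parameter, DatumParameter and CalculatedParameter.
abstract class ParameterSketch {
    protected final String idParameter; // name of the parameter
    protected final String alias;       // alias used in the formulas

    // Constructor without arguments (delegates to the overloaded one).
    ParameterSketch() {
        this("unnamed", "");
    }

    // Overloaded constructor: same name, different argument list.
    ParameterSketch(String idParameter, String alias) {
        this.idParameter = idParameter;
        this.alias = alias;
    }

    // Abstract method: every subclass must provide its own implementation.
    abstract String createMeasurement();

    String getAlias() {
        return alias;
    }
}

class DatumParameterSketch extends ParameterSketch {
    DatumParameterSketch(String idParameter, String alias) {
        super(idParameter, alias);
    }

    @Override
    String createMeasurement() {
        return "DatumMeasurement for " + idParameter;
    }
}

class CalculatedParameterSketch extends ParameterSketch {
    private final String formula; // hypothetical stand-in for CalculusFormulae

    CalculatedParameterSketch(String idParameter, String alias, String formula) {
        super(idParameter, alias);
        this.formula = formula;
    }

    @Override
    String createMeasurement() {
        return "CalculatedMeasurement for " + idParameter + " using " + formula;
    }
}
```

A caller that holds a ParameterSketch reference does not need to know the concrete subclass: the call to createMeasurement is dispatched at run time to the overriding method.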

2.1.1 Information domain in winery laboratories

The information domain can be 

represented by three main blocks, 

namely: 



1. The samples and the analytical 

parameters to be monitored. 

2. The analytical measurements performed 

along the wine production 

process. 

3. The quality assessment and the 

control of the integrity and 

accuracy of information. 

Figure 3 shows, through a class diagram (Rumbaugh et al., 1999), the model used to represent the samples and the analytical parameters to be monitored. In this diagram, SelectedParameter, an association between the classes Sample and Parameter, represents the set of analytical parameters to be monitored in a given sample at a given time.

The association between 

Parameter and Unit classes (see 

Figure 3) makes it possible to express 

a given analytical parameter in 

different units through the Magnitude 

class, as required. In this way, the range of normal values of the parameter can be expressed in each of those units.
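The idea of keeping one normal interval per unit can be sketched as below. This is a hedged simplification and not the JWisWine implementation: the class name NormalIntervalSketch is hypothetical, units are identified by plain strings rather than Unit objects, and each map entry plays the role of a Magnitude.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model: one normal interval per unit (per magnitude).
class NormalIntervalSketch {
    // Key: unit name; value: {minimum, maximum} expressed in that unit.
    private final Map<String, double[]> intervals = new HashMap<>();

    // Fixes the minimum and maximum values for measurements in the given unit.
    void setNormalInterval(String unit, double min, double max) {
        if (min > max) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        intervals.put(unit, new double[] {min, max});
    }

    // True when the value lies inside the interval defined for its unit.
    boolean isMeasurementNormal(String unit, double value) {
        double[] range = intervals.get(unit);
        if (range == null) {
            throw new IllegalArgumentException("no magnitude defined for unit " + unit);
        }
        return value >= range[0] && value <= range[1];
    }
}
```

Because the interval is stored per unit, the same parameter can be validated in whichever unit a measurement arrives in, without converting the value first.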

Figure 4 shows some of the classes that represent the information concerning the analytical measurements made during wine production. The class Measurement represents the measurements over time and is specialised for a given parameter through DatumMeasurement and CalculatedMeasurement.

The model also enables 

plotting the calibration curves with 

the data of instrument checking by 

association between classes CalibrationCurve 

and Analyser. In 

addition, the data obtained in the 

analytical measurements can be 

transformed into chemical values or 

magnitudes defined for the selected 

parameter by association between the 

CalibrationCurve and CalculatedMeasurement classes.

Finally, Figure 3 shows the 

most important classes and associations 

that constitute the third block 

focused on the control of security, 

integrity and quality of the 

information. As can be seen in the 

figure, the system takes into account 

the importance of the users in the 

security and quality of the production 

process. Class User represents the different users involved in the information system through the Technician, Administration and Manager subclasses.

2.2. Modelling of the function domain

Three types of users are recognised by the system: Technicians, Managers and Administration. Technician users are responsible for the daily analyses in the winery laboratory, namely: general information management (parameters, units, magnitudes and so on); inclusion of new samples and of the parameters to be monitored in these samples; validation of results; generation of reports; and the basic functions related to the management of the information generated in the wine analysis processes.

Fig. 4. Class diagram for measurements. [Classes shown: Measurement (specialised into DatumMeasurement and CalculatedMeasurement), CalculusFormulae, CalibrationCurve (calibration curves for each parameter measured with the analyser) and Analyser, together with SelectedParameter, DatumParameter and User, previously defined in Figure 3.]

In addition to all the 

privileges that correspond to Technician, 

Manager users are responsible 

for management of the daily activities 

in the laboratory, the evaluation of the 

workload, the testing of both the 



results and the analytical equipment, 

and the analysis of the results as a 

function of both origin and time. In 

this way, the Manager can develop 

the reports that support the decisions 

to be made in the winery. This type of user can query the system with search criteria that are not pre-established, and the system provides real-time responses in the appropriate format (either tabular or graphical). For

example, the evolution with time of 

the measurements of the analytical 

parameters in a sample, comparison of 

these results between samples of the 

same or different origin and year, etc., 

can be requested. 

Finally, the functionality assigned to Administration users is similar to that of administrators in conventional information systems.

They are responsible for the 

management of users of the system 

and the privileges they have. Also, 

they must maintain the security, 

integrity and privacy of the information 

involved in the system. 

2.3. Technology used in the development of the system

JWisWine is a system developed in Java (Campione, 2002; Anderson and Stone, 1999) under a three-layer structure, as shown in the deployment diagram (Rumbaugh et al., 1999) of Figure 5. The information is managed using the Oracle 9i DBMS (Dorsey and Hudicka, 1999; Loney and Koch, 2002) under an object-relational model that implements the class model in the database server, which provides the data service to the users. From their terminals, which provide the calculation service, users access the information through the Web server (Apache is used for this service; Wainwright et al., 2001) and by both JDBC and SQLJ procedures (Morisseau-Leroy et al., 2001). The Web server communicates with the database server, transmits the user's request and returns the result, which is presented through a Java interface.


3.1. Software developed 

JWisWine is a DSS (Decision Support System) composed of two tools. The first is a client application through which users load samples, enter chemical data, generate daily reports, etc.; this tool is thus in charge of managing the data generated daily in the winery. The other is a web application through which managers use information extracted from historical data to make decisions.

Fig. 5. Deployment diagram of the system architecture. [Nodes shown: Database Server, Web Server, User Terminal, Control Server and Analytical Equipment. Components shown: database objects, JSP pages, JWisWine Java package, JWisWine user interface, browser, analysers domain and control application. Links are labelled TCP/IP or "Any Protocol".]

3.2. Users participation 

3.2.1.- Users in the development of JWisWine

The participation of different users (managers and technicians) in the development of the system was organised following the evolutionary incremental paradigm. This allowed the dynamic specification of the system requirements in order to take into account the objectives pursued by companies.


3.2.2.- Users in the functionality of JWisWine

The system users have overall control 

and monitoring of the data whether 

the data are acquired automatically or 

entered manually (e.g. if the data from 

the analyser are collected by the 

computer through a digital interface 

or the data are manually introduced by 

the user, respectively). In addition, 

each type of user has a different 

access to the system as a function of 

the assigned privileges and all 

activities in the system are monitored. 

3.3. Evolution of JWisWine functionality

3.3.1.- Operational sub-system 

The aim of this sub-system was to 

endow wineries with a tool able to 

assist the management of the analytical 

monitoring in the following 

aspects: 

• To establish the appropriate 

information flow in the overall 

system in order to unify and 

classify the working procedures. 

• To know in real time the workload of the winery, as the system has previously stored the types of samples and the parameters to be analysed in each sample.

• To visualize, modify, verify and 

validate the daily results from any 

workstation in the Local Area 

Network. 

• To move —after validation— the 

results and all data of interest 

from a given sample to the 

historical repository. Warning 

messages advise users about the 

existence of outliers. 

• To store all data related to both

instrument use and technician

activity.

3.3.2.- Information analysis sub-system

The aim of this sub-system is to 

extract information from the data 

stored in the historical repository. The 

most remarkable tasks of this subsystem 

are as follows: 

• To follow the evolution of a given parameter according to preset search criteria. The information can be obtained through a standard browser and presented in graphs or tables.

• To obtain series of statistical data, such as the change or standard deviation of any parameter as a function of time.

• To group samples according to the values of one or more parameters, including the possibility of applying PCA (principal components analysis).

• To introduce control samples in order to extract information on the functioning of the analytical equipment (namely, Shewhart charts). Traceability of historical data to both the equipment and the regression data used is also available.
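As an illustration of how control-sample results can be turned into a control chart, the sketch below estimates a centre line and control limits from a series of control values. The source does not state which limits JWisWine applies; the mean plus or minus three standard deviations used here is the conventional choice and is an assumption, as is the class name ControlChartSketch.

```java
// Sketch: centre line and 3-sigma limits estimated from control-sample results.
class ControlChartSketch {
    final double centre; // mean of the control values
    final double lower;  // centre - 3 * standard deviation
    final double upper;  // centre + 3 * standard deviation

    ControlChartSketch(double[] controlValues) {
        if (controlValues.length < 2) {
            throw new IllegalArgumentException("need at least two control values");
        }
        double sum = 0;
        for (double v : controlValues) {
            sum += v;
        }
        double mean = sum / controlValues.length;
        double squares = 0;
        for (double v : controlValues) {
            squares += (v - mean) * (v - mean);
        }
        // Sample standard deviation (n - 1 in the denominator).
        double sd = Math.sqrt(squares / (controlValues.length - 1));
        centre = mean;
        lower = mean - 3 * sd;
        upper = mean + 3 * sd;
    }

    // True when a new control result falls inside the control limits.
    boolean inControl(double value) {
        return value >= lower && value <= upper;
    }
}
```

A result outside the limits would be a signal to check the analyser before further sample results are validated.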


Firstly, JWisWine is a valuable tool for managing historical data. Users are able to extract information structured in different ways depending on requirements. Access to the data of interest is achieved in real time, overcoming the limitations of manual searches. Wine quality is improved because of the greater knowledge of the winemaking process.

On the other hand, daily handling of data is also improved by the ease of use of the interactive interfaces. The entry and validation of data, the generation of reports and other operational tasks are carried out by simple procedures and protocols that are familiar to users, who only require a short learning period.

JWisWine can be modified 

functionally with few changes in the 

architecture and components of the 

system due to the open, scalable and 

independent characteristics of both the 

supporting model and the technology 

used. 


The Spanish Comisión Interministerial de Ciencia y Tecnología (CICyT) is thanked for financial support (Project AGL2000-0321-P4-03).


Anderson, J.C., Stone, B.L., 1999. Manual de Oracle JDeveloper (Oracle JDeveloper Manual). McGraw-Hill/Oracle Press, Madrid.

Booch, G., Rumbaugh, J., Jacobson, I., 1999. The Unified Modeling Language. User Manual. Addison-Wesley Longman Inc.

Booch, G., 1994. The Object-Oriented Analysis and Design with Applications. Addison-Wesley.

Campione, M., 2002. The Java Tutorial Third Edition. McGraw-Hill, (http://java.sun.com).

Dorsey, P., Hudicka, J.R., 1999. Oracle8 Database Design Using UML Object Modelling. McGraw-Hill Osborne Media.

Eckel, B., 2003. Thinking in Java (Third Edition). Prentice Hall.

Flanzy, C., 2000. Enología: Fundamentos Científicos y Tecnológicos (Enology: Scientific and Technological Fundamentals). AMV-Mundi Prensa, Madrid.

Ilyukhin, S.V., Haley, T.A., Singh, R.K., 2001. A survey of automation practices in the food industry. Food Control, 12 (5), 285-296.

Jacobson, I., Booch, G., Rumbaugh, J., 1999. The Unified Software Development Process. Addison-Wesley Longman Inc.

Loney, K., Koch, G., 2002. Oracle 9i The Complete Reference. Oracle Press.

McGrath, M.J., O'Connor, J.F., Cummins, S., 1998. Implementing a process control strategy for the food processing industry. J. Food Engineer., 35 (3), 313-321.

Morisseau-Leroy, J.N., Solomon, M., Momplaisir, G., 2001. Oracle9i SQLJ Programming. Osborne-Oracle Press.

Muller, E., Bassin, M., Troyon, J.P., Nowak, P., 1999. Implementation of rapid result management systems in the metals industry. Lab. Autom. Inform. Manag., 34 (1), 31-40.

Murali, N.S., Secher Bo, J.M., Rydahl, P., Andreasen Finn, M., 1999. Application of information technology in plant protection in Denmark: from vision to reality. Comput. Electron. Agric., 22, 109-115.

Ribereau-Gayon, P., Glories, Y., Maujean, A., Dubourdieu, D., 2000. Handbook of Enology. Vol. 2: The Chemistry of Wine. Stabilization and Treatments. John Wiley & Sons, Ltd.

Urbano, M., Luque de Castro, M.D., Gómez-Nieto, M.A., 2003. Automation of flow injection methods in the winery industry through a computer program based on a multilayer model. Proceedings of 9th IEEE International Conference on Emerging Technologies and Factory Automation, 530-536, Lisbon.

Rumbaugh, J., Jacobson, I., Booch, G., 1999. The Unified Modeling Language. Reference Manual. Addison-Wesley Longman Inc.

Valcárcel, M., Luque de Castro, M.D., 1987. Flow Injection Analysis: Principles and Applications. Ellis Horwood, Chichester.

Van Asseldonk, M.A.P.M., Huirne, R.B.M., Dijkhuizen, A.A., Beulens, A.J.M., Udink ten Cate, A.J., 1999. Information needs and information technology on dairy farms. Comput. Electron. Agric., 22, 97-107.

Wainwright, P., Ahmad, A., Link, M., Sarang, P., 2001. Professional Apache 2.0. Wrox, (http://www.apache.org).

Warner, S.A., Sywilok, J.W., 1995. Implementing information technology. Anal. Chem., 67 (5), 173-179.

Part II: Development of chemometric methods and algorithms for building qualitative and quantitative models in analytical chemistry

Chapter 7

ULTRAVIOLET-VISIBLE SPECTROSCOPY AND PATTERN RECOGNITION METHODS FOR DIFFERENTIATION AND CLASSIFICATION OF WINES

The content of this chapter has been submitted to the journal Food Chemistry for publication.


Manuel Urbano Cuadrado, María D. Luque de Castro, Pedro M. Pérez Juan, Juan 

García Olmo, Miguel A. Gómez-Nieto 

Abstract 

The feasibility of developing a qualitative model based on cheap and simple instrumentation to differentiate and classify wines from the appellation d’origine “La Mancha” (Spain) has been studied. The criteria for discrimination were origin, grape variety and ageing process. Ultraviolet-visible spectroscopy (UV-Vis) was used for the development of an inexpensive and simple screening approach. Once the spectra were collected, an exploratory data analysis was carried out in order both to show trends hidden in the data matrix formed by the sample spectra and to study the characteristics of the models thus developed. Principal Components Analysis (PCA) and Soft Independent Modelling of Class Analogy (SIMCA) were used for the exploratory analysis and the development of classification models, respectively. The ultraviolet region has been used for the first time for the discrimination of types of wines. This region is of great importance for the differentiation of wines according to both origin within the same appellation d’origine and ageing process. The sum of false positives and false negatives, the criterion used for evaluating errors, was not higher than 25%, and the classification of wines according to origin zone yielded the best value (10%). Sample preparation was not necessary.

Keywords: Ultraviolet-visible spectroscopy; Wine classification; Pattern 

recognition methods. 




Quality control of foods is often based 

on modern and sophisticated instruments 

that involve high costs and 

require well-trained analysts (Flurer, 

2003; Wang, Geil, Kolling, & Padua, 

2003; Encinar, Sliwka-Kaszynska, 

Polatajko, Vacchina, & Szpunar, 

2003; De Villiers, Alberts, Lynen, 

Crouch, & Sandra, 2003). However, 

the results from these instruments are often not obtained in time, despite being supported by modern techniques. This is owing to the need for steps prior to measurement that are aimed at improving the accuracy and precision of the results.

Quantification of the components either involved in fraudulent wine production or constituting a key family in wine from a given appellation d’origine is, in most cases, a long and expensive step (Legin, Rudnitskaya, Lvova, Vlasov, Di Natale, & D'Amico, 2003; Cullere, Aznar, Cacho, & Ferreira, 2003; López, Aznar, Cacho, & Ferreira, 2002). A present trend in analytical chemistry is the development of methodologies able to provide “fitness for purpose” results, which weigh the importance of time against the accuracy achieved. These aims are often supported by qualitative rather than quantitative results (Trullols, Ruisánchez, & Rius, 2004).

Chemometric techniques allow 

construction of models to 

characterise target samples within 

previously defined and validated 

groups. Multivariate analysis is a powerful tool that makes it possible to extract qualitative and quantitative chemical information from large data sets. Thus, qualitative techniques such as cluster analysis (CA), principal components analysis (PCA), linear discriminant analysis (LDA), soft independent modelling of class analogy (SIMCA), etc., are aimed at improving the quality of food

products (Kos, Lohninger, & Krska, 

2003; Yu & MacGregor, 2003; Tura, 

Prenzler, Bedgood, Antolovich, & 

Robards, 2004). 

The wine industry has employed the above-mentioned tools with several objectives, namely: wine


classification based on either grape 

variety or climate factors; assessment 

of the authenticity of wine; study of 

different brownings; etc. The datasets 

used were of different nature, namely: 

the concentration of phenolic compounds 

(De la Presa & Noble, 1995; 

García-Parrilla, González, Heredia, & 

Troncoso, 1997), the composition of 

volatile compounds (García-Jares, 

García-Martín, Marino, & Torrijos, 

1995; Guth, 1997), amino acid content (Etiévant, Schlich, & Bouvier, 1988), concentration of metal ions

(Baxter, Crews, Dennis, Goodall, & 

Anderson, 1997), isotopic determination 

(Martin, Guillou, & Martin, 

1998), etc. 

Spectroscopy has been widely 

used for the development of classification 

models. Thus, Near Infrared 

Spectroscopy (NIRS), Fourier Transform 

Infrared (FT-IR), Mass Spectrometry 

(MS), etc., have been used for 

the differentiation and classification 

of samples in various areas (Downey, 

McIntyre, & Davies, 2002; Reeves & 

Zapf, 1998; Pérez-Pavón, Del Nogal- 

Sánchez, García-Pinto, Fernández 

Laespada, Moreno-Cordero, & 

Guerrero-Peña, 2003). Although quantitative methods based on these techniques have been developed in the enological area, there are few approaches regarding their qualitative use. Visible and NIR regions

have been used to discriminate 

between white wines of two varietal 

origins (Cozzolino, Smyth, & Gishen, 

2003). The visible region has also 

been used in an indirect way for 

certification of rosé wines from the appellation d’origine “Rioja” (Spain)

(Meléndez, Sánchez, Íñiguez, Sarabia, 

& Ortiz, 2001). The authors employed 

six parameters obtained from the 

absorption spectrum in the visible 

range for classification. The ultraviolet 

zone has not been employed for 

differentiation and classification of 

wines. 

The aim of this work was to study the use of both the ultraviolet and visible zones in order to obtain spectra for the classification of wines based on criteria more restrictive than those mentioned above. Criteria such as ageing process and origin within an appellation d’origine have been studied. Varietal discrimination has also been evaluated. Thus, the approach reported here is aimed at studying the feasibility of developing a cheap and simple model, using only the ultraviolet and visible zones, for the differentiation and classification of wines without the help of trained specialists.


2.1 Samples 

Different samples were used in the study: red and white wines; young and aged wines; wines from different zones within the “La Mancha” appellation d’origine (“Quintanar de la Orden”, “Fuente de Pedro Naharro”, “Mota del Cuervo”, “Corral de Almaguer” and “Villacañas”); and wines from different grape varieties (“Cencibel” and “Cabernet Sauvignon”). In total, 120 samples were employed in the calibration and validation steps.

2.2 Apparatus and procedure 

The instrument employed for spectra collection was an Agilent 8453E UV-visible Spectroscopy System (Agilent Technologies, Waldbronn, Germany). The spectra were collected using UV-VIS ChemStation Rev. A.06.03 (Hewlett-Packard, USA). The absorbance spectra were recorded in duplicate. The working range was 300 – 800 nm.

The instrument was equipped with a quartz cell with a pathlength of 0.1 cm. This short pathlength made it possible to obtain absorbance values within the range for which the spectrophotometer manual specifies appropriate accuracy and precision.

2.3 Chemometric software used for data processing and statistical techniques employed

Duplicate spectra from two aliquots of each sample were averaged. The Unscrambler 7.5 (Camo Process AS, Oslo, Norway) was used for data processing. PCA and SIMCA (Vandeginste, Massart, Buydens, De Jong, Lewi, & Smeyers-Verbeke, 1998; Wold, 1976; Esbensen, 2002) were the multivariate pattern recognition methods used.


Principal components analysis (PCA) 

Principal components analysis has been extensively used for the visualisation of hidden trends in a data matrix M consisting of n objects defined by m variables. In this work, the objects were the sample spectra and the variables were the wavelengths. This can also be seen as an m-dimensional space in which each wavelength defines one dimension. In order to show trends or different data structures hidden in the M matrix, a new c-dimensional space can be built from the original m-dimensional one. The new dimensions, the principal components (PCs), are built so that they capture the maximum variance of the data and are mutually orthogonal. The number of PCs is much lower than the number of original variables, mainly in spectral analysis, because each PC is a linear combination of the original variables, which removes the collinearity between them. Objects plotted in the new space (the score plot) often show trends that, although they still have to be interpreted and explained, constitute a first step in subsequent modelling for sample classification.
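The construction of a principal component can be illustrated with a short sketch. The source does not say which algorithm The Unscrambler applies, so the code below uses plain power iteration on the mean-centred data as a stand-in; the class and method names are hypothetical, and only the first PC is extracted.

```java
// Minimal PCA sketch (first component only); names are illustrative.
class PcaSketch {

    // Returns a mean-centred copy of x (n objects by m variables).
    static double[][] centre(double[][] x) {
        int n = x.length, m = x[0].length;
        double[][] c = new double[n][m];
        for (int j = 0; j < m; j++) {
            double mean = 0;
            for (int i = 0; i < n; i++) mean += x[i][j];
            mean /= n;
            for (int i = 0; i < n; i++) c[i][j] = x[i][j] - mean;
        }
        return c;
    }

    // Scores t = Xc * p: the coordinate of each object along loading vector p.
    static double[] scores(double[][] xc, double[] p) {
        double[] t = new double[xc.length];
        for (int i = 0; i < xc.length; i++)
            for (int j = 0; j < p.length; j++) t[i] += xc[i][j] * p[j];
        return t;
    }

    // First loading vector (unit length) by power iteration on Xc^T Xc.
    static double[] firstLoading(double[][] xc, int iterations) {
        int n = xc.length, m = xc[0].length;
        // Start from the centred row with the largest norm.
        double[] p = new double[m];
        double best = 0;
        for (double[] row : xc) {
            double ss = 0;
            for (double v : row) ss += v * v;
            if (ss > best) { best = ss; p = row.clone(); }
        }
        if (best == 0) throw new IllegalArgumentException("data have no variance");
        normalise(p);
        for (int it = 0; it < iterations; it++) {
            double[] t = scores(xc, p);
            double[] next = new double[m]; // next = Xc^T t
            for (int i = 0; i < n; i++)
                for (int j = 0; j < m; j++) next[j] += xc[i][j] * t[i];
            normalise(next);
            p = next;
        }
        return p;
    }

    // Scales v in place to unit Euclidean length.
    private static void normalise(double[] v) {
        double ss = 0;
        for (double a : v) ss += a * a;
        double norm = Math.sqrt(ss);
        for (int j = 0; j < v.length; j++) v[j] /= norm;
    }
}
```

With spectra as rows and wavelengths as columns, the returned vector holds one loading per wavelength, and scores gives the position of each sample along the first PC. The sign of a PC is arbitrary, so a score plot may appear mirrored between implementations.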

Soft independent modelling of class analogy (SIMCA)

SIMCA, which is a supervised pattern recognition technique in contrast to the unsupervised PCA, is considered a key chemometric approach for classification. This technique makes it possible to classify samples into already existing groups, assigning new objects to the class to which they show the largest similarity. SIMCA is strongly based on PCA, because each class is defined by an independent PCA, taking into account the optimal number of PCs for each class, which is endowed with a specific data structure.

The SIMCA classification 

process consists of two stages, 

namely: the training stage, in which 

the individual models of the data 

classes are developed, and the 

classification stage, in which new 

samples are classified within the 

established class models. 
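The classification stage can be written down compactly. The expressions below follow one common textbook formulation of SIMCA, in which a new object is compared with a class model through its residual after projection onto that class's PCs; the exact statistics computed by the software used in this work may differ, and the symbols (mean spectrum, loading matrix, number of PCs and number of training objects of class q) are notation introduced here for illustration.

```latex
% Class q: mean \bar{x}_q, loadings P_q (m x A_q), n_q training objects,
% m variables (wavelengths), A_q principal components.
\begin{align}
  e_i &= (x_i - \bar{x}_q) - P_q P_q^{\mathsf{T}} (x_i - \bar{x}_q)
      && \text{residual of object } x_i \\
  s_i &= \sqrt{\frac{e_i^{\mathsf{T}} e_i}{m - A_q}}
      && \text{distance of } x_i \text{ to model } q \\
  s_{0,q} &= \sqrt{\frac{\sum_{k=1}^{n_q} e_k^{\mathsf{T}} e_k}{(n_q - A_q - 1)(m - A_q)}}
      && \text{typical residual of class } q \\
  \frac{s_i^2}{s_{0,q}^2} &\le F_{\mathrm{crit}}
      && \text{membership test at the chosen significance level}
\end{align}
```

An object may satisfy the test for one class, for several classes or for none, which is why SIMCA results are evaluated in terms of both false positives and false negatives.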




3.1 Exploratory analysis 


Fig. 1. Variance explained in the PCA for white wines. 

Visualising trends or groups in white wines

PCA was applied to the matrix formed by the UV-Vis spectra corresponding to samples of white wines. The maximum number of PCs was set at 10. Fig. 1 shows the data variance explained by the PCs. As can be seen in the figure, the first two PCs explain almost all the data variance.

The first criterion was to differentiate wines from different zones within the same appellation d’origine, namely: “Quintanar de la Orden” (identified by A), “Fuente de Pedro Naharro” (B), “Mota del Cuervo” (C) and “Corral de Almaguer” (D). Fig. 2-A shows the plot of samples in the two-dimensional space formed by the first two PCs. As can be seen, there are four incipient groups corresponding to the four zones. Thus, UV-Vis spectra can be used for the differentiation of wines as a function of wine origin within the same appellation d’origine.

Fig. 2. Plot of white wine samples in the space formed by the first two PCs: (A) original spectra data. (B) first derivative spectra data.

Fig. 3. Loadings plot for the first two components in white wines.

The first derivative is often used as a mathematical preprocessing step for UV-Vis spectra in order to enhance the differences between them. This treatment was carried out with the aim of improving the discrimination achieved above. Fig. 2-B shows the more marked differentiation between groups. The clusters corresponding to C and D were very close, but the difference between them was clearer after re-scaling. Discrimination was possible because the concentration of the families of compounds that absorb UV-Vis radiation differs between zones.
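A first-derivative preprocessing step can be sketched with finite differences, as below. The source does not specify how the derivative was computed (chemometric packages often use smoothing derivative filters instead), so this simple scheme and the name DerivativeSketch are assumptions made for illustration.

```java
// Sketch: first derivative of a spectrum by finite differences.
class DerivativeSketch {

    // Central differences inside the range, one-sided differences at the two ends.
    static double[] firstDerivative(double[] wavelengths, double[] absorbance) {
        int n = absorbance.length;
        if (n < 2 || wavelengths.length != n) {
            throw new IllegalArgumentException("need two equally long arrays with at least two points");
        }
        double[] d = new double[n];
        d[0] = (absorbance[1] - absorbance[0]) / (wavelengths[1] - wavelengths[0]);
        d[n - 1] = (absorbance[n - 1] - absorbance[n - 2])
                 / (wavelengths[n - 1] - wavelengths[n - 2]);
        for (int i = 1; i < n - 1; i++) {
            d[i] = (absorbance[i + 1] - absorbance[i - 1])
                 / (wavelengths[i + 1] - wavelengths[i - 1]);
        }
        return d;
    }
}
```

A constant baseline offset between two spectra disappears on differentiation, which is one reason why derivative spectra tend to emphasise differences in band shape rather than in overall absorbance level.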

In order to identify these compounds, the loadings for the first two PCs were plotted as shown in Fig. 3. Loading vectors can be considered as the bridge between the space of the initial variables and the PC space. As commented before, each PC is a linear combination of all the initial variables; that is, of all the wavelengths. The coefficients of this combination are called loadings, and each wavelength has a loading. The higher the absolute value of a loading, the higher the influence of the corresponding wavelength in the explanation of the data variance. Fig. 3 shows that the key wavelengths for the discrimination of groups were in the range 300-400 nm; that is, the ultraviolet region. Compounds present in wine that absorb in this region are esters of hydroxycinnamic acids (Flanzy, 2000). Thus, two conclusions can be drawn from this exploratory analysis. The first is that the proportion of these families differs between zones within an appellation d’origine. The second is that this discrimination cannot be made visually, because the differences lie in the ultraviolet region.

Fig. 4. PCA for red wines according to origin zone.

Visualising trends or groups in red wines

The first criterion used for the differentiation was also origin zone. Three zones were taken into account, namely: “Fuente de Pedro Naharro” (J), “Quintanar de la Orden” (K) and “Villacañas” (L). Fig. 4 shows the PCA applied to the first derivative spectra of red wines. The major differences were found between wines from “Quintanar de la Orden” and those from “Villacañas”. Wines from “Fuente de Pedro Naharro” were located between the other two zones. As can be seen, the differentiation was worse than that achieved for white wines.

Other criterion used to 

differentiate red wines was the grape 

variety used for obtaining the two 

groups of wine. The varieties employed 

were “Cabernet Sauvignon” 

(S) and “Cencibel” (T). The score plot 

is shown in Fig. 5-A. Two trends can 

be observed: T samples, which are the 

most abundant, provided a swarm of 

points that comprises a large zone and 

S samples were located in the belowright 

part of the plot. Loadings from 

the visible region, as can be seen in 

Fig. 5-B, were higher than those 

obtained from white wines. Obviously, this is due to the red colour.

The ageing process of wines 

was also studied. Aged (Y) and non-aged wines (Z) were considered. The former gathered on the right side and the latter on the left side of Fig. 5-C.

A higher dispersion was 

observed in the PCAs for red wines as 

compared with that obtained for white 

wines, which may be due to the higher

complexity of the composition of red 

wines. 

3.2 Development of classification 

models 

Model for classifying white wines 

according to origin of the grape 

A model for the classification of three 

types of wines according to zone of 

the grape was developed. The classes, 

following the nomenclature employed 

in section 3.1.2, were named A 

(“Quintanar de la Orden”), B (“Fuente de Pedro Naharro”) and C (“Mota del Cuervo”). Independent principal components

analyses were carried out for 

each class. The first derivative spectra 

were employed. The prediction capacity of the model was studied by external validation using both samples of these classes that had not been used in the training stage and samples from “Corral de Almaguer” (D), a zone that was not modelled because of the small number of samples. The significance level was set at 5%.

Fig. 5. (A) PCA plot and (B) loadings plot for the first two components of red wines according to grape variety. (C) PCA for red wines according to the ageing process.

Coomans plots (Esbensen, 

2002) were used for evaluating the 

results from the classification. These 

plots provide the orthogonal distance 

from all new objects to two selected 

classes at the same time. The critical 

cut-off class membership limits are 

also obtained from those plots. If an 

object belongs to a model (class) it 

should fall within the membership 

limit, which is on the left of the 

vertical line or below the horizontal 

line in Fig. 6-A. 
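The membership rule read from a Coomans plot can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure implemented in the commercial software: the function name `coomans_region` and its arguments are hypothetical, and the distances and critical limits are assumed to have been computed beforehand from the two class models.

```python
def coomans_region(dist_a, dist_b, limit_a, limit_b):
    """Return the region of a Coomans plot in which an object falls.

    dist_a, dist_b   -- distances from the object to class models A and B
    limit_a, limit_b -- critical membership limits of A and B
                        (hypothetical inputs, computed elsewhere)
    """
    in_a = dist_a <= limit_a   # left of the vertical line
    in_b = dist_b <= limit_b   # below the horizontal line
    if in_a and in_b:
        return "both"
    if in_a:
        return "A"
    if in_b:
        return "B"
    return "none"
```

An object returned as "both" would count as a false positive for one of the two classes, and an object returned as "none" is not assigned to either class.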

As can be seen in Fig. 6-A, 

the validation samples corresponding 

to origin zones for grape A and B 

were all within the limit of each 

membership. Nevertheless, B samples were placed in the class C membership zone in the Coomans plot for the classes A and C shown in Fig. 6-B; thus, too many false positives were obtained in the classification table (50% of B samples were also assigned to class C). This is owing to the proximity of both groups (B and C), the greater dispersion of class B and, mainly, the existence of two C samples within the swarm of points corresponding to zone B, as shown in the PCA in Fig. 2-B. Better results were achieved when these two C samples were not used in the modelling of class C; that is, when they were considered as outliers. Almost all the validation samples corresponding to zone C were classified correctly in class C, and none of the validation samples corresponding to zone D was classified in any class, as expected. The sum of the percentages of false positives and false negatives was 10% and the number of samples in the validation set was 32.

In order to explain the classification 

data, the concept of 

Variable Discrimination Power (Esbensen, 

2002) was used; this gives 

information about the power of a 

variable to discriminate between any 

two models. A value close to 1 indicates 

no discrimination power at all, 

while a high value, e.g. greater than 3, 

indicates good discrimination for a 

given variable. 
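For reference, the discrimination power is commonly defined in the SIMCA literature by the expression below. The notation is an assumption, since no explicit formula is given here: $S_k(r,q)$ denotes the residual standard deviation of variable $k$ when the objects of class $r$ are fitted to the model of class $q$.

$$D_k(r,q) = \sqrt{\frac{S_k^2(r,q) + S_k^2(q,r)}{S_k^2(r,r) + S_k^2(q,q)}}$$

Under this definition, a variable whose residuals are the same whether the objects are fitted to their own model or to the other model gives a value of 1, which agrees with the interpretation stated above.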

The discrimination power 

plots for comparison between classes 

A-B and comparison between classes 

C-B are shown in Fig. 6-C and Fig. 6- 

D, respectively. In the former, the 

discrimination values were much 

higher in the ultraviolet than in the 

visible region. For this reason, the 

classification of wines from 

“Quintanar de la Orden” (A) and 

wines from “Fuente de Pedro 

Naharro” (B) was based on the differences in the concentration of esters of hydroxycinnamic acids, which absorb in the ultraviolet region.

Distinction between wines 

from “Quintanar de la Orden” (A) and 

“Mota del Cuervo” (C) was explained 

by the same effect. 

The discrimination values in Fig. 6-D were lower than those shown in Fig. 6-C, which explains why classes B and C were close to each other as compared with their distance from A. The visible zone between 620 and 640 nm provided a high discrimination power for classes B and C. Phenolic compounds are responsible for the slightly green colour present in white wines; thus, the proportion of these compounds is different between B and C wines.

Fig. 6. (A) Coomans plot for the classes A and B. (B) Coomans plot for the classes B and C. (C) Discrimination power plot for classes A and B. (D) Discrimination power plot for classes B and C.

Model for classifying red wines 

according to origin of the grape 

Classes corresponding to wines from 

“Quintanar de la Orden” (K) and 

wines from “Villacañas” (L) were 

developed using independent principal 

component analysis. External 

validation was carried out using 

samples that had not been used for model development.

The Distance versus Leverage 

plot (Si vs Hi plot) (Esbensen, 2002) 

was used in the evaluation of the 

external validation. It shows the limits used in the classification, both for the distance to the model (Si) and for the leverage (Hi). The objects within these limits have a high probability of belonging to the class at the chosen

significance level. They summarise 

the information contained in the 

model. 
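The decision rule behind the Si vs Hi plot can be sketched as follows. This is a minimal sketch, assuming that an object is accepted by a class only when both its distance to the model and its leverage lie within the limits; the function names and the way the limits are supplied are hypothetical and not taken from the software used.

```python
def within_class_limits(s_i, h_i, s_limit, h_limit):
    """True if an object lies inside both class membership limits."""
    return s_i <= s_limit and h_i <= h_limit


def rejected_objects(objects, s_limit, h_limit):
    """Indices of the (s_i, h_i) pairs that fall outside the limits.

    If every pair in `objects` truly belongs to the class, the returned
    indices correspond to false negatives.
    """
    return [k for k, (s_i, h_i) in enumerate(objects)
            if not within_class_limits(s_i, h_i, s_limit, h_limit)]
```

Applying `rejected_objects` to the validation samples of a class lists its false negatives, whereas applying `within_class_limits` to samples of another class reveals false positives.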


Fig. 7 shows the Si vs Hi plot for the validation set of the model composed of classes K and L at the 5% significance level. For class K (Fig. 7-A) almost all the validation objects were within the leverage limit. As can be seen in the figure, two samples were false negatives. They came from two zones located on the border of zone K. Fig. 7-B shows the

classification achieved for class L. 

Four samples of the K group were 

plotted within the class L membership 

limit; thus, these samples were false 

positives. The classification of L 

samples was successful. The sum of 

false positives and false negatives was 

20% and the number of samples for 

the validation set was 30. 
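The 20% figure is consistent with counting the two false negatives of class K and the four false positives of class L against the 30 validation samples. The sketch below makes this arithmetic explicit; it assumes that the error is normalised by the size of the validation set, which is not stated explicitly, and the function name is hypothetical.

```python
def classification_error(false_positives, false_negatives, n_validation):
    """Percentage of validation samples that are false positives or negatives."""
    return 100.0 * (false_positives + false_negatives) / n_validation


# Counts reported for the K/L model: 4 false positives, 2 false negatives.
error_kl = classification_error(4, 2, 30)   # 20.0
```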

On the other hand, the zone between 500 and 560 nm provided the highest discrimination power for the origin of red wines, in contrast to the ultraviolet region for white wines. In this region, the absorbing compounds are anthocyanins.


Fig. 7. (A) Si vs Hi plot for the class K. (B) Si vs Hi plot for the class L. (C) Si vs Hi plot for the class Y. (D) Si vs Hi plot for the class Z. (E) Discrimination power plot for classification according to grape variety. (F) Discrimination power plot for classification according to the ageing process.

Model for classifying red wines according to grape variety


Classes corresponding to red wines from “Cabernet Sauvignon” grape (S) and red wines from “Cencibel” grape (T) were modelled. Class models were then externally validated. Fig. 7-C and Fig. 7-D show the Si vs Hi plots of the validation set for the classes S and T, respectively. The error (sum of false positives and false negatives) was 25%. The validation set was composed of 30 samples and the significance level was set at 5%. The larger data dispersion, as shown in the PCA in Fig. 5-A under the variety criterion, led to a higher error.


Model for classifying red wines according to the ageing process

In this case, the model developed was aimed at classifying red wines into aged and non-aged wines. Thus, two groups of wine were taken in order to model the classes, namely: Y class, non-aged wines, and Z class, aged wines. The sum of false positives and false negatives was 25%. The validation set was composed of 30 samples and the significance level was set at 5%.

The discrimination power of 

the variables of the last two models 

was lower than that of the first model, as can be seen in Fig. 7-E and Fig. 7-F; this explains the higher number of false positives and false negatives. The latter figure shows the highest discrimination values in the ultraviolet region, which means that the quantitative and qualitative evolution of the phenolic fraction during the ageing process yields differences in ultraviolet-absorbing compounds.


Differentiation and classification of 

wine under various criteria have been 

achieved in the approach presented in 

this work, using cheap and simple 

instrumentation. The error in the 

prediction depends on the type of 

wine to which the method is applied 

(namely, white or red wine). 

The rate of correct classification of wines from different origins within the same appellation d’origine is around 90%. In addition, the rate of correct classification of wines according to grape variety and ageing process is better than 75%. The model thus developed supplies a simple, inexpensive screening tool for wines from the appellation d’origine “La Mancha” based on the data provided by a diode-array UV-VIS spectrophotometer. Another advantage of the model is the short time required for analysis: about ten minutes were enough for classification into a given class.

The ultraviolet region has 

been used for the first time for 

differentiation of wines. This region is 

the key for discrimination of wines 

according to the zone within the same 

appellation d’origine. In addition, the

joint use of the ultraviolet and visible 

regions enables a better classification 

of wine according to the ageing 

process, as compared with the single 

use of one of the spectrum zones. 

The phenolic compounds 

from the secondary metabolism of 

plants are excellent discriminators

for wines. This characteristic 

is due to the variability of those 

compounds as a function of soils, 

grape variety, production conditions, 

etc. 






AGL2000-0321-P4-03). 


Baxter, J. M., Crews, M. E., Dennis, 

J., Goodall, I., & Anderson, D. 

(1997). The determination of the 

authenticity of wine from its trace 

elements composition. Food 

Chemistry, 60, 443-450. 

Cozzolino, D., Smyth, H. E., & 

Gishen, M. (2003). Feasibility 

study on the use of visible and 

near-infrared spectroscopy together 

with chemometrics to 

discriminate between commercial 

white wines of different varietal 

origins. Journal of Agricultural and Food Chemistry, 51, 7703-7708.

Cullere, L., Aznar, M., Cacho, J., & 

Ferreira, V. (2003). Fast fractionation 

of complex organic 

extracts by normal-phase chromatography 

on a solid-phase extraction 

polymeric sorbent. Optimization 

of a method to 

fractionate wine flavor extracts. Journal of Chromatography A, 1017, 17-26.

De la Presa, C., & Noble, A. C. 

(1995). Descriptive analysis of 

three white wines varieties from 

Penedes. American Journal of Enology and Viticulture, 46, 5-9.

De Villiers, A., Alberts, F., Lynen, 

F., Crouch, A., & Sandra, P. 

(2003). Evaluation of liquid 

chromatography and capillary 

electrophoresis for the elucidation 

of the artificial colorants brilliant 

blue and azorubine in red wines. 

Chromatographia, 57, 393-397. 

Downey, G., McIntyre, P., & Davies, 

A. N. (2002). Detecting and 

quantifying sunflower oil adulteration 

in extra virgin olive oils 

from the Eastern Mediterranean by visible and near infrared spectroscopy. Journal of Agricultural and Food Chemistry, 50, 5520-5525.

Encinar, J. R., Sliwka-Kaszynska, M., 

Polatajko, A., Vacchina, V., & 

Szpunar, J. (2003). Methodological advances for selenium

speciation analysis in yeast. 

Analytica Chimica Acta, 500, 

171-183. 

Esbensen, K. H. (2002). Multivariate 

Data Analysis – in Practice, Oslo: 

Camo Process AS. 

Etiévant, P., Schlich, P., & Bouvier, J. 

C. (1998). Varietal and geographic 

classification of French 

red wines in terms of elements, 

amino acids and aromatic 

alcohols. Journal of the Science of Food and Agriculture, 48, 25-41.

Flanzy, C. (2000). Enología: Fundamentos 

Científicos y Tecnológicos 

(Enology: Scientific and Technological 

Fundamentals). Madrid: 

AMV-Mundi Prensa. 

Flurer, C. L. (2003). Analysis of 

antibiotics by capillary electrophoresis. 

Electrophoresis, 24, 

4116-4127. 

García-Jares, C. M., García-Martín, 

M. S., Marino, N., & Torrijos, C. 

(1995). GC-MS identification of 

volatile components of Galician 

(northwestern Spain) white wines. Application to differentiate Rias Baixas wines from wines produced in nearby geographical regions. Journal of the Science of Food and Agriculture, 69, 175-184.

García-Parrilla, M. C., González, G. 

A., Heredia, F. J., & Troncoso, A. 

M. (1997). Differentiation of wine 

vinegars based on phenolic composition. 

Journal of Agricultural and Food Chemistry, 45, 3487-3492.

Guth, H. (1997). Quantification and 

sensory studies of character 

impact odorants of different white 

wine varieties. Journal of Agricultural and Food Chemistry, 45, 3027-3032.

Kos, G., Lohninger, H., & Krska, R. 

(2003). Development of a method 

for the determination of fusarium 

fungi on corn using mid-infra-red 

spectroscopy with attenuated total 

reflection and chemometrics. 

Analytical Chemistry, 75, 1211- 

1217. 

Legin, A., Rudnitskaya, A., Lvova, L., 

Vlasov, Y., Di Natale, C., & 

D'Amico, A. (2003). Evaluation 

of Italian wine by an electronic 

tongue: recognition, quantitative 

analysis and correlation with 

human sensory perception. 

Analytica Chimica Acta, 484, 33- 

44. 

Lopez, R., Aznar, M., Cacho, J., & 

Ferreira, V. (2002). Determination 

of minor and trace volatile 

compounds in wine by solid-phase extraction and gas chromatography with mass spectrometric detection. Journal of Chromatography A, 966, 167-177.

Martin, G. J., Guillou, C., & Martin, 

M. L. (1998). Natural factors of 

isotope fractionation and the 

characterization of wines. Journal of Agricultural and Food Chemistry, 36, 316-322.

Meléndez, M. E., Sánchez, M. S., 

Íñiguez, M., Sarabia, L. A., & 

Ortiz, M. C. (2001). Psychophysical parameters of colour and the chemometric characterisation of wines of certified denomination of origin ‘Rioja’. Analytica Chimica Acta, 446, 159-169.

Pérez-Pavón, J. L., Del Nogal- 

Sánchez, M., García-Pinto, C., 

Fernández Laespada, M. E., 

Moreno-Cordero, B., & Guerrero- 

Peña, A. (2003). A method for the 

detection of hydrocarbon pollution 

in soils by headspace mass 

spectrometry and pattern recognition 

techniques. Analytical 

Chemistry, 75, 2034-2041. 

Reeves, J. B., & Zapf, C. M. (1998). 

Mid-infrared diffuse reflectance 

spectroscopy for discriminant 

analysis of food ingredients. Journal of Agricultural and Food Chemistry, 46, 3614-3622.

Trullols, E., Ruisánchez, I., & Rius, F. 

X. (2004). Validation of 

qualitative analytical methods. 

Trends in Analytical Chemistry, 

23, 137-145. 

Tura, D., Prenzler, P. D., Bedgood, D. 

R., Antolovich, M., & Robards, 

K. (2004). Varietal and processing 

effects on the volatile 

profile of Australian olive oils. 

Food Chemistry, 84, 341-349. 

Vandeginste, B. G. M., Massart, D. L., Buydens, L. M. C., De Jong, S.,

Lewi, P. J., & Smeyers-Verbeke, 

J. (1998). Handbook of Chemometrics 

and Qualimetrics: Parts A 

and B. Amsterdam: Elsevier. 

Wang, J. F., Geil, P. H., Kolling, D. 

R. J., & Padua, G. W. (2003). 

Analysis of zein by matrix-assisted 

desorption/ionization mass 

spectrometry. Journal of Agricultural and Food Chemistry, 51, 5849-5854.

Wold, S. (1976). Pattern Recognition, 

8, 127-139. 

Yu, H. L., & MacGregor, J. F. (2003). 

Multivariate image analysis and 

regression for prediction of 

coating content and distribution in 

the production of snack foods. 

Chemometrics and Intelligent 

Laboratory Systems, 67, 125-144. 


Chapter 8

STUDY OF SPECTRAL ANALYTICAL 

DATA USING FINGERPRINTS AND 

SCALED SIMILARITY 

MEASUREMENTS 

The content of this chapter has been accepted for publication in the journal Analytical and Bioanalytical Chemistry.

Anal. Bioanal. Chem. XX (2005) XX (Part II, Chapter 8)

STUDY OF SPECTRAL ANALYTICAL DATA USING FINGERPRINTS AND SCALED SIMILARITY MEASUREMENTS

M. Urbano Cuadrado, M. D. Luque de Castro, M. A. Gómez-Nieto

Abstract

A new chemoinformatic model for enlarging the differences between 

spectra has been developed. Then, it has been applied to the differentiation of 

wines according to various criteria —namely, grape origin and variety, ageing 

process—. The model is based on generation of fingerprints from normalised 

spectra, using empirical parameters and a set of 120 samples. After generation of 

the fingerprints, similarity matrixes based on the Tanimoto similarity index 

between the fingerprints of the samples were built. The calculation of the 

Tanimoto index was modified in order to adapt the index to the characteristics of 

the analytical measurements. Thus, scaling factors taking into account a pattern fingerprint generated from a group of samples with common characteristics were used. In addition, a modified expression for calculating the Tanimoto index was

employed. Principal Components Analysis (PCA) and Soft Independent 

Modelling of Class Analogy (SIMCA) were applied to the similarity matrixes. 

The results obtained are discussed as a function of the normalisation method 

employed, the empirical factor used in generation of the fingerprints, selection of 

samples for building the pattern fingerprint, etc. Finally, the results for the differentiation of wines are compared with those obtained by applying PCA to the spectra not processed by the proposed model.

Keywords: Similarity calculation, Fingerprints, Attenuated total reflection mid 

infrared, Wine differentiation. 



Introduction 

Spectroscopic techniques have been 

widely used for the development of 

analytical methods, so they constitute 

one of the macro-areas that support 

instrumental analysis. The different 

spectroscopic techniques provide 

solutions to a wide range of analytical 

problems as a function of the type of 

information extracted from the 

samples. Although Mid Infrared 

Spectroscopy (MIRS), Near Infrared 

Spectroscopy (NIRS), Nuclear 

Magnetic Resonance (NMR) and 

Mass Spectrometry (MS) are 

techniques traditionally used in 

structural determination of organic 

compounds [1-3], their use for 

quantitative aspects has recently 

increased significantly [4-6]. In the 

last few years, NIRS has become a 

key technique for the determination of 

components in the agrifood field 

[7,8]. 

Qualitative analysis [9] is a 

trend in Analytical Chemistry aimed 

at shortening the time required for a 

given analysis, achieving objectives 

close to users and industry requirements without the use of sophisticated instrumentation. Thus, approaches focused on obtaining information about polluted/non-polluted materials, fraudulent production processes, clinical yes/no responses, etc. [10-12] have been developed in recent years. For this, multivariate

analysis, and, specifically, pattern 

recognition techniques, have been 

employed for sample differentiation 

and classification [13,14]. 

In order to obtain maximum 

information from datasets, it is 

mandatory to both characterise the 

objects by means of a series of 

variables and apply multivariate 

analysis. The number of variables for 

developing pattern recognition methods 

depends on the data available 

from each sample. Thus, the use of variables that are difficult to obtain owing to measurement costs, low sample quantity, long analysis time, etc. limits the number of variables [15,16]. On the other hand, a minimum number of variables is necessary for supporting the models. Spectroscopic techniques are useful as they fulfil the requirement of obtaining large datasets.

A spectrum is a set of data — 

namely, absorbances, emission intensities, 

relative intensities of 

fragments in a mass spectrum, etc.— 

at given values of the variables — 

wavelengths, mass/charge ratios, distances, 

etc.—. The spectrum is usually 

collected in a short time, so it is 

possible to obtain the values of 

hundreds of variables in real time in a 

non-destructive fashion (most times) and, usually, at affordable acquisition and maintenance costs.

Advances in chemometrics constitute a powerful tool for overcoming the limitations inherent to inexpensive instruments in order to obtain maximum information. The

development of a new chemoinformatic 

method —based on similarity 

calculation, used in structural 

aspects most times [17,18]— in order 

to improve the information provided 

by spectroscopic techniques is presented 

in this paper. The aim of this 

method is to obtain qualitative 

information —differentiating samples 

of wine according to their zone and 

grape origins in a preliminary study— 

from very similar MIR spectra. This 

similarity is due to the high influence 

on the spectrum of the common 

components of wine, masking the 

contribution to the spectrum of the 

specific components. The method makes it possible to enlarge the differences between samples by means of the digitalisation of spectra and the calculation of similarity matrixes using fingerprints of samples.

Proposed model 

After data preprocessing aimed at 

detecting and removing spectral 

outliers, a procedure for transformation 

of spectral data into binary 

fingerprints was carried out. The 

fingerprints were then used for 

calculating similarity between the 

analysed samples. This process consists 

of the following steps. 

Normalisation of the spectral data 

The spectrum from a given sample can be considered as a variable array e composed of n elements, where e(i) represents the absorbance value at the wavelength i. A matrix E with dimensions n x e is defined for the sample set, where n is now the number of samples and e is the number of variables. The element E(i,j) represents the absorbance value for the sample i at the wavelength j.

The matrix E is transformed into a normalised matrix $\bar{E}$. From the types of normalisation proposed in the literature [19], three of them are employed in this work, namely:

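Standard:

$$\forall n,\quad \bar{E}(i,j) = \frac{E(i,j) - \min\big(E(n,j)\big)}{\max\big(E(n,j)\big) - \min\big(E(n,j)\big)} \tag{1}$$

Logarithmic:

$$\forall n,\quad \bar{E}(i,j) = \frac{\log\big(E(i,j)+1\big) - \min\big(\log(E(n,j)+1)\big)}{\max\big(\log(E(n,j)+1)\big) - \min\big(\log(E(n,j)+1)\big)} \tag{2}$$

Tangential:

$$\forall n,\quad \bar{E}(i,j) = \tanh\big(\log(E(i,j)+1)\big) \tag{3}$$

Here $\bar{E}$ denotes the normalised matrix, and the minima and maxima are taken over all samples n at the wavelength j.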

The three types normalise the 

absorbance values of the matrix E 

within the range [0,1], but equation 

(3) is independent of the distribution 

of values in contrast to equations (1) 

and (2). 
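The three normalisation types can be sketched in plain Python as shown below. This is an illustrative sketch rather than the authors' C implementation: the function names are hypothetical, the spectra are assumed to be stored as a list of rows (one row per sample) with non-negative absorbances, the logarithm is assumed to be the natural logarithm, and a wavelength with constant absorbance is arbitrarily mapped to 0.

```python
import math


def normalise_standard(E):
    """Equation (1): min-max scaling of every column (wavelength) of E."""
    columns = list(zip(*E))
    lo = [min(col) for col in columns]
    hi = [max(col) for col in columns]
    # Assumption: a constant column (hi == lo) is mapped to 0.0.
    return [[(x - l) / (h - l) if h > l else 0.0
             for x, l, h in zip(row, lo, hi)]
            for row in E]


def normalise_logarithmic(E):
    """Equation (2): min-max scaling applied to log(E + 1)."""
    logged = [[math.log(x + 1.0) for x in row] for row in E]
    return normalise_standard(logged)


def normalise_tangential(E):
    """Equation (3): tanh(log(E + 1)), independent of the other samples."""
    return [[math.tanh(math.log(x + 1.0)) for x in row] for row in E]
```

Only the first two functions need the whole sample set, because they use the column minima and maxima; the tangential function transforms each absorbance value on its own.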

Generation of the fingerprints 

After normalising the data, the matrix $\bar{E}$, consisting of values within the range [0,1], is used for building a new matrix F, the fingerprint matrix, as follows:

A threshold value U within the range [0,1] is selected for the construction of the fingerprint.

An element F(i,j) = 1 if, and only if, $\bar{E}(i,j) \geq U$, and F(i,j) = 0 otherwise.

Thus, the threshold value U determines the significance of the spectral value at every wavelength. All the elements of the matrix F are equal to 1 when U=0 and, on the other hand, most elements F(i,j) are equal to zero when U=1.
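The thresholding step can be sketched as follows. This is a minimal illustration with a hypothetical function name; the normalised spectra are assumed to be stored as a list of rows with values in [0,1].

```python
def build_fingerprints(E_norm, U):
    """Binary fingerprint matrix F: F(i,j) = 1 if E_norm(i,j) >= U, else 0."""
    return [[1 if value >= U else 0 for value in row] for row in E_norm]


# With U = 0 every bit is set; with U = 1 only the column maxima survive.
example = build_fingerprints([[0.0, 0.5, 1.0]], 0.5)   # [[0, 1, 1]]
```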

Generation of the similarity matrix 

Transformation of the spectra into fingerprints allows the use of the similarity indexes described in the literature [20,21]. In this work, the Tanimoto index has been used, which is defined as follows:

$$T_{A,B} = \frac{c}{a + b - c} \tag{4}$$

where: c represents the number of bits 

set to 1 common in the fingerprints A 

and B; a and b represent the number 

of bits set to 1 in the fingerprints A 

and B, respectively. 

Calculation of similarity 

measurements using the Tanimoto 

index is applied to the matrix F, thus 

building the similarity matrix S. This 

matrix is symmetrical and its 

dimension is n x n, where n is the 

number of samples (objects). The 

elements S(i,i) are equal to 1 and each 

element S(i,j) represents the similarity 

value between sample i and sample j 

obtained from the application of 

equation (4) to the fingerprint matrix 

F. 
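Equation (4) and the construction of the matrix S can be sketched as shown below. The function names are hypothetical, the fingerprints are assumed to be lists of 0/1 integers of equal length, and the similarity of two fingerprints with no bits set is assumed to be 1, a case not covered in the text.

```python
def tanimoto(A, B):
    """Tanimoto index of two binary fingerprints, equation (4)."""
    a = sum(A)                               # bits set to 1 in A
    b = sum(B)                               # bits set to 1 in B
    c = sum(x & y for x, y in zip(A, B))     # bits set to 1 in both
    denominator = a + b - c
    # Assumption: two empty fingerprints are treated as identical.
    return c / denominator if denominator else 1.0


def similarity_matrix(F):
    """Symmetrical n x n matrix S with S(i,j) = T between samples i and j."""
    n = len(F)
    return [[tanimoto(F[i], F[j]) for j in range(n)] for i in range(n)]
```

For the fingerprints [1, 1, 0, 0] and [1, 0, 1, 0], a = b = 2 and c = 1, so the index is 1/(2 + 2 - 1) = 1/3.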

Scaled similarity matrix 

For the calculation of the similarity measurements by equation (4), all the fingerprint bits have the same influence or loading in the similarity calculation. This assumption is correct in applications such as structural similarity calculation [17,18]; nevertheless, it is incorrect when the fingerprints are built from the processing of different analytical measurements at different working conditions.

It can be considered that each 

sample is defined by the following 

equation: 

$$M_i = a_{1i} X_{1i} + a_{2i} X_{2i} + \cdots + a_{Ni} X_{Ni} \tag{5}$$

where: i represents the sample i; N is 

the number of analytical measurements; 

Xji is the analytical value 

(the absorbance value in this case); 

and aji is a coefficient that takes into 

account the influence of the analytical 

value j on sample i. 


The similarity calculation 

must be modified if the coefficients aji 

show different values for different 

samples or groups of samples. This 

modification is carried out through a 

scaling process using a pattern 

fingerprint P, which has a size equal to that of the fingerprints that constitute the matrix F. The bits set to 1 in P represent the most significant bits among those set to 1 in the matrix F; that is, the analytical variables that provide more information from the sample under study.

Thus, a new calculation of the 

similarity matrix is proposed by the 

application of a Tanimoto index 

scaled as follows: 

$$T_S = \frac{c_s}{a_s + b_s - c_s} \tag{6}$$

where:

$$a_s = a_a + a_p \times W, \qquad b_s = b_b + b_p \times W, \qquad c_s = c_c + c_p \times W \tag{7}$$

The $a_a$, $b_b$ and $c_c$ values represent the number of bits set to 1 in the fingerprints A, B, and common in A and B, respectively, and not set to 1 in the pattern fingerprint P.

The $a_p$, $b_p$ and $c_p$ values represent the number of bits set to 1 in the fingerprints A, B, and common in A and B, respectively, and set to 1 in the pattern fingerprint P.

W is a scaling factor that gives more weight in the similarity calculation to those bits set to 1 in the samples A and B and also set to 1 in the pattern P.

In this way, equation (7) makes it possible to weight the different fingerprint bits in the similarity calculation, thus considering different levels of significance of the analytical variables.
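The scaled index of equations (6) and (7) can be sketched as follows. The function name is hypothetical, all fingerprints are assumed to be lists of 0/1 integers of equal length, and the value returned for an empty denominator is an assumption not covered in the text.

```python
def scaled_tanimoto(A, B, P, W):
    """Tanimoto index scaled by a pattern fingerprint P and a factor W."""
    # Bits set to 1 in A, in B and in both that are also set in P.
    a_p = sum(x & p for x, p in zip(A, P))
    b_p = sum(y & p for y, p in zip(B, P))
    c_p = sum(x & y & p for x, y, p in zip(A, B, P))
    # Remaining bits set to 1, i.e. those not set in P.
    a_a = sum(A) - a_p
    b_b = sum(B) - b_p
    c_c = sum(x & y for x, y in zip(A, B)) - c_p
    # Equation (7): bits present in the pattern are weighted by W.
    a_s = a_a + a_p * W
    b_s = b_b + b_p * W
    c_s = c_c + c_p * W
    denominator = a_s + b_s - c_s
    # Assumption: an empty denominator is treated as full similarity.
    return c_s / denominator if denominator else 1.0
```

With A = [1, 1, 0, 0], B = [1, 0, 1, 0], P = [1, 0, 0, 0] and W = 2, the scaled counts are a_s = b_s = 3 and c_s = 2, giving 2/(3 + 3 - 2) = 0.5; with W = 1 the expression reduces to the unscaled index of equation (4), which is 1/3 for the same pair.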

Generation of the pattern fingerprint 

The number and position of the bits set to 1 in the pattern fingerprint are the key for application of equation (7), and they clearly depend on the problem under study. This dependence makes it necessary to adapt the proposed model to the analytical technique, number of analytical variables, characteristics of the samples, data dispersion, etc.

In this paper, a method based 

on frequency measurements of the 

bits of the matrix F has been used. 

This method consists of the following 

steps: 

A group of samples is selected 

for building pattern P. The 

group of samples determines the 

characteristics of the pattern. It 

is possible to build several 

patterns according to the group 

of samples selected. 

The matrix F is analysed by 

columns and, then, a frequency 

array is generated in the 

following way: 

$$f(j) = \frac{\sum_{i=1}^{n} F(i,j)}{n} \tag{8}$$

where the values i determine the 

samples to be considered for 

generating the pattern, and n is 

the number of samples. 

A pattern fingerprint P is built 

from the frequency array by the 

consideration of a threshold 

frequency value V. Thus, a 

pattern P is generated with 

elements P(i)=1 (fingerprint 

bits) if the corresponding 

element of the frequency array 

f(i) has a value equal to or 

higher than the threshold value 

V. All the bits of the pattern 

fingerprint P are set to 1 when 

the threshold value is V=0. 

Thus, the higher the values for 

V, the lower the number of bits 

set to 1 in the pattern 

fingerprint. 
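The steps above can be sketched as follows. The function names are hypothetical, the argument `rows` (the indices of the samples selected for the pattern) is an illustrative way of expressing the selected group, and the frequency is taken as the fraction of selected samples with the bit set, in line with equation (8).

```python
def frequency_array(F, rows):
    """Fraction of the selected samples with each bit set, equation (8)."""
    n = len(rows)
    n_bits = len(F[0])
    return [sum(F[i][j] for i in rows) / n for j in range(n_bits)]


def pattern_fingerprint(F, rows, V):
    """Pattern P with P(j) = 1 where the frequency f(j) is at least V."""
    return [1 if f >= V else 0 for f in frequency_array(F, rows)]
```

For F = [[1, 1, 0], [1, 0, 0], [1, 1, 1]] and all three samples selected, the frequencies are 1, 2/3 and 1/3, so a threshold V = 0.5 yields the pattern [1, 1, 0], whereas V = 0 sets every bit.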

Experimental 

Samples 

120 samples of wine were used. From 

these, 60 were of red wine and 60 of 

white wine. The grape variety was 

known for all red wines (25 of 

“Cencibel” and 35 of “Cabernet 

Sauvignon”), but not for white wines. 

Only 15 red wines were aged and 25 were non-aged; the age of the rest of the red wines was unknown. The origin was known for 40 samples of white wine (15 samples from “Quintanar de la Orden” and 25 samples from “Fuente de Pedro Naharro”, both within the “La Mancha” appellation d’origine). The samples were used as such, because filtering, dilution, preconcentration, removal of interferents, etc., were not required.

Apparatus and methods 

The instrument employed for MIR spectra collection was an FT-MIR Nicolet Magna-IR 550 Series II (Nicolet Instrument Corp., Madison, Wisconsin, USA), capable of making measurements at 4 cm⁻¹ resolution in the spectral range covering 4000-400 cm⁻¹. The instrument was furnished with an infrared attenuated total reflection (ATR) solid, liquid and mellow sample cell with a zinc selenide crystal (Spectra Tech., Stamford, CT, USA) for Nicolet spectrometers. The ATR cell consists of the following: a base accessory unit containing the mirrors that direct the IR beam to and from the sampling plate; a sampling plate containing the ZnSe ATR crystal; and a precision single or dual readout controller. The number of reflections of the ATR module was 15. Other characteristics of the instrument were a transmission range between 20000 and 650 cm⁻¹, a refractive index (at 1000 cm⁻¹) of 2.4, a density of 5.27 g cm⁻³ and a volume of 3.5 ml. Before recording the spectra, the samples were thermostated at 24 °C. The absorbance (log(1/R)) spectra were collected in duplicate.

Chemometric software for data 

processing and statistical techniques 

used 

The software employed for spectra 

normalisation, building of the 

fingerprints and generation of the 

similarity matrixes was developed by the authors in the C programming language. The Unscrambler 7.8

(Camo Process AS, Oslo, Norway) 

was used for principal components 

analysis (PCA). The chemometric 

procedure consists of the following 

steps: 

PCA for visualisation of spectral 

outliers 

PCA was carried out for the detection of sample spectra that behave as outliers and can adversely influence the building of the fingerprints. Once the samples were in the new space defined by principal components, the leverage value (a measure of how far an object lies from the majority) and the residual variance were computed using The Unscrambler 7.8 software. The outliers were examined in order to decide whether they provided any useful information or had to be removed.
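As an illustration of the leverage concept, the sketch below uses one common definition for mean-centred PCA scores, in which the leverage of a sample is 1/n plus the sum over components of its squared score divided by the sum of squared scores of that component. The exact expression used by the commercial software is not given here, so this definition, the function name and the data layout (one row of scores per sample, every component with non-zero variance) are assumptions.

```python
def leverages(scores):
    """Leverage of each sample from its PCA scores.

    scores -- list of rows, one row per sample and one column per
              principal component (assumed mean-centred, non-zero variance)
    """
    n = len(scores)
    n_components = len(scores[0])
    # Sum of squared scores of every component over all the samples.
    column_ss = [sum(row[a] ** 2 for row in scores)
                 for a in range(n_components)]
    return [1.0 / n + sum(row[a] ** 2 / column_ss[a]
                          for a in range(n_components))
            for row in scores]
```

Samples whose scores lie far from the origin of the component space obtain the highest values, which is why they stand out on the leverage axis of a residual variance versus leverage plot.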

Normalisation, building the fingerprints 

and generation of similarity 

matrixes 

Similarity matrixes were obtained 

from the sample spectra after data 

normalisation, building the fingerprints 

and calculation of the Tanimoto 

indexes for the samples. This 

procedure is based on the model 

described in section 2. 

Analysis of the similarity matrixes 

PCA and SIMCA were applied to the 

similarity matrixes for differentiation 

and classification of samples, respectively, 

according to different criteria. 

Results and discussion 

Sample spectra and spectral outliers 

The zone of the MIR spectra used was the 800-3000 cm⁻¹ region; the 400-800 and 3000-4000 cm⁻¹ zones were excluded from this study owing to the irreproducibility caused by the high absorbance values. The MIR absorption band corresponding to the fundamental stretching vibration of the –OH group at 3300-3500 cm⁻¹ is responsible for this irreproducibility. The 800-3000 cm⁻¹ spectra are shown in Fig. 1-A. As can be seen in this figure, the spectra show a high degree of similarity, with the exception of the 1150-1300 and 2300-2400 cm⁻¹ zones. For this reason, the proposed model is needed to enlarge the dissimilarities in order to differentiate and classify wine samples from spectroscopic data under various criteria.

Analysis of spectral outliers 

was carried out to avoid the influence 

of anomalous spectra on building the 

fingerprints, and then, in the generation 

of the similarity matrixes. 

Fig. 1. (A) MIR 400-3000 cm⁻¹ spectra for all the samples of wine. (B) Residual variance vs leverage plot for the PCA applied to all the sample spectra.

Thus, PCA was applied to the spectra 

and the residual variance versus 

leverage, shown in Fig. 1-B, was 

obtained. Five samples behave as outliers, taking into account both the leverage value for each sample and the sample dispersion. In addition, one sample was also considered an outlier because of its high residual variance, in spite of being close to the swarm of samples.

Study of normalisation methods 

The normalised spectra of a given 

sample randomly selected and 

resulting from standard, logarithmic 

and tangential methods are shown in 

Fig. 2-A, 2-B and 2-C, respectively. 

The first two yield spectra without significant differences, and the same holds for the fingerprints built using several threshold values U (0.2, 0.4, 0.6 and 0.8), also shown in Fig. 2.

Nevertheless, the tangential method yields a normalised spectrum very different from those obtained by the first two methods. The tangential method only reduces the absorbance values, as it relies on the hyperbolic tangent and not on the minimum and maximum values of each variable, as can be seen in Fig. 2-C. The threshold values U used for building the fingerprints were therefore also reduced (0.1, 0.2, 0.3 and 0.4) so as not to generate fingerprints with too many elements set to 0, that is, almost empty fingerprints.
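The effect of the normalisation method and of the threshold U on the fingerprint can be sketched as follows. The exact expressions of the model belong to section 2 and are not reproduced here, so everything in this block is an assumption: the standard method is taken to be a min-max scaling of each variable over the sample set, the tangential method a plain hyperbolic tangent of the absorbance, and a bit is set to 1 when the normalised value reaches U. All function names are hypothetical.

```python
import numpy as np

def minmax_normalise(spectra):
    """Assumed 'standard' normalisation: scale each variable to [0, 1]."""
    spectra = np.asarray(spectra, dtype=float)
    lo = spectra.min(axis=0)
    hi = spectra.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # avoid division by zero
    return (spectra - lo) / span

def tanh_normalise(spectra):
    """Assumed 'tangential' normalisation: hyperbolic tangent of the absorbance."""
    return np.tanh(np.asarray(spectra, dtype=float))

def fingerprint(normalised, u):
    """Binary fingerprint: bit = 1 where the normalised value is >= U (assumption)."""
    return (np.asarray(normalised) >= u).astype(np.uint8)

def density(fp):
    """Fraction of bits set to 1 in each fingerprint."""
    return fp.mean(axis=-1)
```

Under these assumptions, raising U can only lower the fingerprint density, which is the behaviour described for the threshold value in the text.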

The results from the different 

normalisation methods are shown in a 

general way in Fig. 3, where both the 

frequency spectra for all the samples 

—a pattern of the entire sample 

population— and the fingerprints built 

with threshold values for the three 

methods can be seen. Very similar 

frequency spectra were obtained again 

for the standard and logarithmic 

methods, as can be seen in Figs. 2-A 

and 2-B. The tangential method yields 

only minimum and maximum frequencies 

for almost all the variables 

and, for this reason, the tangential 

method tends to equalise the 

differences between samples (see Fig. 

2-C). This result is just the opposite to 

that pursued; that is, to enlarge the 

differences between samples. Thus, 

the standard normalisation was used 

for subsequent studies. 

Fig. 2. Normalised spectra corresponding to sample T166 resulting from the normalisation methods: (A) standard; (B) logarithmic; (C) tangential. Corresponding fingerprints built using the different threshold values U are added to each spectrum.

Fig. 3. Frequency spectra from all the samples, obtained using the different normalisation methods: (A) standard; (B) logarithmic; (C) tangential. Corresponding fingerprints built using the different threshold values U are added to each spectrum.

Threshold value for the generation of fingerprints. Averaged Tanimoto index

The selection of the threshold value U is a key aspect in building the fingerprint as it determines its density.

The higher the threshold value, the lower the density. Fingerprints with high density yield high similarity values, even for very different samples. Just the opposite occurs with low-density fingerprints: very low similarity values are obtained, even for very similar samples.

The selection of the threshold value U depends on the distribution of the normalised frequencies in the sample set. For this reason, this selection was carried out empirically. Medium threshold values (0.4, 0.6) gave medium-density fingerprints and, with them, the best results. This can be seen in the different fingerprints shown in Figs. 2 and 3.

From the results discussed above, a new way of calculating the similarity index was proposed, which also takes into account the fingerprint bits set to 0. The Tanimoto index calculated by expressions (4) and (6) was modified in such a way that the bits set to 0 in common in the fingerprints were considered instead of the bits set to 1. Finally, the values of the original Tanimoto index and the modified Tanimoto index were averaged for the generation of the similarity matrixes.
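The averaging described above can be sketched in a few lines. The example assumes the usual binary form of the Tanimoto index, c / (a + b - c), where a and b are the numbers of bits of a given value in each fingerprint and c the number of such bits in common; expressions (4) and (6) of the model are not reproduced here, and the pattern filtering is left out. The function names are hypothetical.

```python
def tanimoto(fp_a, fp_b, bit=1):
    """Tanimoto index over the positions whose value equals `bit`."""
    a = sum(1 for x in fp_a if x == bit)
    b = sum(1 for x in fp_b if x == bit)
    c = sum(1 for x, y in zip(fp_a, fp_b) if x == bit and y == bit)
    union = a + b - c
    # Two fingerprints with no bit of this value are treated as identical
    return c / union if union else 1.0

def averaged_tanimoto(fp_a, fp_b):
    """Mean of the index on bits set to 1 and the index on bits set to 0."""
    return 0.5 * (tanimoto(fp_a, fp_b, 1) + tanimoto(fp_a, fp_b, 0))

fp1 = [1, 1, 0, 0, 1, 0, 0, 0]
fp2 = [1, 0, 0, 0, 1, 1, 0, 0]
print(tanimoto(fp1, fp2, 1))        # index on the bits set to 1
print(tanimoto(fp1, fp2, 0))        # modified index on the bits set to 0
print(averaged_tanimoto(fp1, fp2))  # value used to fill the similarity matrix
```

For these two fingerprints the index on the 1 bits is 2/4 and the index on the 0 bits is 4/6, so the averaged value lies between them.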

A comparison of the two ways of calculating the Tanimoto index is given in Fig. 4, where the score plots for the PCA applied to the similarity matrixes generated from the fingerprints are summarised. Different threshold values U were used for building the fingerprints. Of the threshold values U tested (0.2, 0.4, 0.6 and 0.8), the last left a large part of the data variance unexplained and was not employed: the variance explained was no higher than 60%, versus values higher than 80% for U = 0.2, 0.4 and 0.6 (the data variances explained by the first two factors are given in the lower part of the score plots of Fig. 4). Trends are visible in Fig. 4. For example, in Fig. 4-A the samples of white wines (the first letter of the code is b) form a horizontal group in the upper zone, and the samples of red wines (the first letter of the code is t) constitute another group that extends from the lower side to the end of the group of white samples. The best discrimination is achieved for the threshold value U=0.4 and the averaged Tanimoto index, as can be seen in Fig. 4.

Fig. 4. Score plots obtained applying PCA to the similarity matrixes obtained by: (A) U=0.2 and normal Tanimoto index; (B) U=0.2 and averaged Tanimoto index; (C) U=0.4 and normal Tanimoto index; (D) U=0.4 and averaged Tanimoto index; (E) U=0.6 and normal Tanimoto index; (F) U=0.6 and averaged Tanimoto index.

Application of the model to the 

differentiation of wines 

The model proposed in this paper has 

been applied to the differentiation of 

types of wines. Thus, the origin and 

variety of the grape, and ageing 

process were the criteria used to 

distinguish between wines using MIR 

spectra. The results were compared with those obtained by applying PCA to the spectra. For the reasons discussed above, the similarity matrixes were built using a threshold value U=0.4 and the averaged Tanimoto index. First, and with the aim of checking the model, it was applied to an obvious case: the differentiation of red and white wines.

Figure 5 shows the score plot obtained by applying PCA to the spectra (the data variance explained by the first two factors was 82%). White wines are on the right part of the plot and red wines on the left. A clearer separation of the two types of wines is achieved by applying PCA to the similarity matrix using a threshold value U=0.4. Thus, the sensitivity of the model to distinguish white and red wines is high, in spite of the initial similarity of the spectra and the low selectivity of the MIR instrument as compared with that of sophisticated instruments such as mass spectrometers.

Fig. 5. Score plot obtained applying PCA to MIR spectra.

Study of the influence of the pattern 

As noted in the description of the model, a pattern is a fingerprint used for filtering (scaling) the calculation of similarity. The ultimate objective

was to increase and decrease the 

similarity values for similar and 

different samples, respectively. Two 

factors influence the efficiency of the 

pattern for enlarging the differences, 

namely: a) the selection of the 

samples used for building the pattern 

fingerprint; and b) the frequency 

threshold value V employed in this 

process. 

Figure 6 shows both the frequency distribution of the normalised values and the corresponding pattern fingerprints generated from different threshold values V for all the samples, including both white and red wines, and also for white wines and red wines separately. A significant variation in the frequency distribution for the variables (wavelengths in this case), depending on the samples selected for generation of the pattern, can be observed. This behaviour affects the fingerprints generated: for the same frequency threshold value V, different sample selections yield different fingerprints, as can be observed in Fig. 6. Thus, the proposed method based on building fingerprints from spectral information allows wines to be differentiated.

In order to deal with more interesting problems, pattern fingerprints of wines from different grape varieties and ageing processes were built. The frequency distributions and the fingerprints generated from them are shown in Fig. 7, which covers wines from the “Cencibel” and “Cabernet Sauvignon” varieties, both aged and non-aged. Low threshold values V yield high-density pattern fingerprints that are not appropriate for sample characterisation. By contrast, the fingerprints generated using a threshold value V=0.6 are appropriate for characterisation.

The low number of samples 

employed for building pattern 

fingerprints yielded a high number of 

consecutive variables with similar 

values in the frequency distributions 

shown in Fig. 7. For this reason, 

fingerprints with high density were 

obtained. Thus, the higher the number 

of samples, the higher the quality of 

the pattern; and a large number of 

samples is mandatory in order to 

develop robust models. In spite of this shortcoming, differentiation of samples was achieved, as discussed in the following section.

On the other hand, parameter W of expression (7) also influenced the results. The values used were 5, 7, 9, 11 and 13. The best differentiation was achieved with medium values (namely, W=9 and W=11).

Fig. 6. Frequency distributions and their fingerprints for building patterns using: (A) all the samples (white and red wines); (B) white wines; (C) red wines.

Fig. 7. Frequency distributions and their fingerprints for the building of patterns: (A) “Cabernet Sauvignon”, non-aged wines; (B) “Cabernet Sauvignon”, aged wines; (C) “Cencibel”, non-aged wines; (D) “Cencibel”, aged wines.

Differentiation of red wines according to grape variety and ageing process

Reference data on grape variety and ageing process were known for most of the red wines. PCA was applied to the similarity matrixes obtained applying a threshold value V=0.6 (selected from differences in the fingerprints shown in Fig. 7) and using the averaged Tanimoto index. Figure 8 (A and B) shows the score plots resulting from PCA for spectra and similarity matrixes, respectively (the data variance explained was higher than 80%). Wines from the “Cencibel” variety were named X and wines from “Cabernet Sauvignon” were named Y. A clear grouping is obtained using the proposed model (Fig. 8-B); that is, Y samples on the right and X samples on the left. This separation is not achieved in Fig. 8-A.

Fig. 8. Score plots resulting from PCA applied to MIR spectra (A, C, E) and to similarity matrixes (B, D, F): (A) spectra of non-aged and aged wines from “Cencibel” (labelled X) and “Cabernet Sauvignon” (labelled Y) grapes; (B) similarity matrix between samples of non-aged and aged wines from “Cencibel” grapes (labelled X) and samples of non-aged and aged wines from “Cabernet Sauvignon” grapes (labelled Y); (C) spectra of non-aged (labelled S) and aged (labelled T) wines from “Cencibel” grapes; (D) similarity matrix between samples of non-aged (labelled S) and aged (labelled T) wines from “Cencibel” grapes; (E) spectra of wines from “Quintanar de la Orden” (labelled A) and “Fuente de Pedro Naharro” (labelled B) grape origins; (F) similarity matrix between samples of wines from “Quintanar de la Orden” (labelled A) and “Fuente de Pedro Naharro” (labelled B) grape origins.

Similar results were obtained in differentiating wines according to the ageing process. Wines from the “Cencibel” grape were employed. Aged wines were named T and non-aged wines S. The former are grouped on the right and the latter on the left in Fig. 8-D, which corresponds to the score plot from the similarity matrix. Figure 8-C, corresponding to PCA applied to spectra without processing, shows only slight trends.

Differentiation of white wines 

according to grape origin 

The model was used for the differentiation of white wines according to grape origin; thus, wines from “Quintanar de la Orden” were named A and wines from “Fuente de Pedro Naharro” were named B. These zones are within the appellation d’origine “La Mancha”. This is therefore a second-order differentiation, which requires a higher degree of discrimination than the differentiation of wines from different appellations d’origine.

PCA was applied both to the spectra without the proposed processing and to the similarity matrix. Score plots for the classical method and the similarity calculation are shown in Figs. 8-E and 8-F, respectively (the data variance explained was higher than 80%). Better results are obtained using the model: wines of the same origin are grouped in Fig. 8-F, which is not achieved using classical processing.

SIMCA methods for prediction 

Classification of samples in their respective classes has been tested. SIMCA, a supervised pattern recognition method based on developing a separate PCA model for each given class, has been used. Thus, SIMCA uses PCA for modelling classes of objects, taking into account the different groups a priori.

Table 1 shows the results obtained from the prediction of the SIMCA models. The classification error was calculated as the ratio between the sum of false positives and false negatives and the number of samples. Firstly, two classes for white and red wines were modelled using data from spectra and similarity matrixes. PCA and outlier removal were applied to each group. The training set was composed of 85 samples. For validation, a test set consisting of 30 samples not used in the training step was used. The classification errors were 20% and 10% for sample spectra and similarity matrixes, respectively.
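The error figures can be checked with a short sketch. It assumes one particular reading of the counts in Table 1: a sample may be accepted by both class models, a false negative is a sample not assigned to its own class, and a false positive is a sample assigned to the other class. The function name and the tuple layout are hypothetical; under this reading the sketch gives 6/30 = 20% for the spectra and 3/30 = 10% for the similarity matrixes.

```python
def simca_classification_error(counts):
    """Classification error = (false positives + false negatives) / samples.

    counts: one tuple per class,
        (n_samples, n_assigned_to_own_class, n_assigned_to_other_class).
    """
    total = sum(n for n, _, _ in counts)
    false_neg = sum(n - own for n, own, _ in counts)   # missed by own class model
    false_pos = sum(other for _, _, other in counts)   # accepted by the other model
    return (false_pos + false_neg) / total

# Type of wine, test set of 30 samples (white = class A, red = class B)
spectra = [(14, 13, 1), (16, 15, 3)]
similarity = [(14, 14, 1), (16, 16, 2)]
print(f"{simca_classification_error(spectra):.0%}")
print(f"{simca_classification_error(similarity):.0%}")
```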

Classes for wines from different grape varieties and origins, in addition to the ageing process, were also developed. In this case, test sets consisting of samples not employed in the training of any class were not used because of the low number of samples for which these parameters were known. Despite this, using the samples of a given class to validate the other classes, developed with their respective training sets, can be considered an independent test. The high data dispersion for samples of the same type obtained when PCA is applied to spectra without the proposed processing can be observed in Figs. 8-A, 8-C and 8-E. This fact made it impossible to obtain correct classifications. The classes developed from the similarity matrixes yielded the values summarised in Table 1. The classification errors obtained were 9% and 5% for the grape variety and ageing process criteria, respectively. The classification according to origin yielded an error of 22% owing to the high data dispersion.

Table 1. Results from SIMCA prediction

Classification criterion and data used | Type | Number of samples | Classified in class A | Classified in class B | Error
Type of wine. Sample spectra | White (A) | 14 | 13 | 1 | 20%
Type of wine. Sample spectra | Red (B) | 16 | 3 | 15 |
Type of wine. Similarity matrixes | White (A) | 14 | 14 | 1 | 10%
Type of wine. Similarity matrixes | Red (B) | 16 | 2 | 16 |
Grape variety. Similarity matrixes | Cencibel (A) | 23 | 23 | 2 | 9%
Grape variety. Similarity matrixes | Caber.-Sauv. (B) | 32 | 3 | 32 |
Ageing process. Similarity matrixes | Non-aged (A) | 23 | 23 | 0 | 5%
Ageing process. Similarity matrixes | Aged (B) | 14 | 2 | 14 |
Origin. Similarity matrixes | Quint. de la Orden (A) | 15 | 15 | 3 | 22%
Origin. Similarity matrixes | F. de Pedro Naharro (B) | 21 | 5 | 21 |

Conclusions 

A new chemoinformatic method based on both building fingerprints of wines from MIR spectra and similarity calculation has been presented in this work. The use of this method makes it possible to enlarge the differences between very similar spectra. Thus, the similarity matrixes show sample groupings according to origin of grape, variety of grape, and ageing process.

The empirical parameters used for building the fingerprints, that is, for digitalisation of MIR spectra, influence the differentiation results obtained. Pattern fingerprints generated for filtering the calculation of the Tanimoto index yield the best results, but the low number of samples used for building patterns of varieties, origins and ageing process also influences the quality of the pattern fingerprints. Thus, this study can only be considered exploratory, for testing the characteristics of the method proposed.

The calculation of the Tanimoto index has also been modified in order to enlarge differentiation.

An advantage of the model is the possibility of building a library of wine fingerprints, which would not require a high storage capacity owing to the small size of the fingerprints (1 kilobyte per wine).





AGL2000-0321-P4-03). 


[1] Blanco M, Valdes D, Bayod MS, Fernández Mari F, Llorente I (2004) Anal Chim Acta 502:221-227
[2] Murphy BT, MacKinnon SL, Yan XJ, Hammond GB, Vaisberg AJ, Neto CC (2003) J Agric Food Chem 51:3541-3545
[3] Emmett MR (2003) J Chromatogr, A 1013:203-213
[4] Cran MJ, Bigger SW (2003) Appl Spectrosc 57:928-932
[5] Bruch MD, Fatunmbi HO (2003) J Chromatogr, A 1021:61-70
[6] Chen H, He MY, Pei J, He HF (2003) Anal Chem 75:6531-6535
[7] Albanell E, Caja G, Duch X, Rovai M, Salama AAK, Casals R (2003) J AOAC Int 86:746-752
[8] Buening Pfaue H (2003) Food Chem 82:107-115
[9] Trullols E, Ruisánchez I, Rius FX (2004) Trends Anal Chem 23:137-145
[10] Pérez-Pavón JL, Del Nogal-Sánchez M, García-Pinto C, Fernández Laespada ME, Moreno-Cordero B, Guerrero-Peña A (2003) Anal Chem 75:2034-2041
[11] Arvanitoyannis IS, Katsota MN, Psarra EP, Soufleros EH, Kallithraka S (1999) Trends Food Sci Techn 10:321-336
[12] Lasch P, Schmitt J, Beekes M, Udelhoven T, Eiden M, Fabian H, Petrich W, Naumann D (2003) Anal Chem 75:6673-6678
[13] Esbensen KH (2002) Multivariate Data Analysis - in Practice. Camo Process AS, Oslo
[14] Vandeginste BGM, Massart DL, Buydens S, De Jong S, Lewi PJ, Smeyers-Verbeke J (1998) Handbook of Chemometrics and Qualimetrics: Parts A and B. Elsevier, Amsterdam
[15] García-Jares CM, García-Martín MS, Marino N, Torrijos C (1995) J Sci Food Agric 69:175-184
[16] La S, Cho JH, Kim JH, Kim KR (2003) Anal Chim Acta 486:171-182
[17] Varmuza K, Karlovits M, Demuth W (2003) Anal Chim Acta 490:313-324
[18] Scsibrany H, Karlovits M, Demuth W, Mueller F, Varmuza K (2003) Chemom Intell Lab Syst 67:95-108
[19] Mazzatorta P, Benfenati E (2002) J Chem Inf Comput Sci 42:1250-1255
[20] Willett P, Barnard JM, Downs G (1998) J Chem Inf Comput Sci 38:983-996
[21] Rouvray DH, Balaban AT (1979) Chemical Applications of Graph Theory. In: Applications of Graph Theory. Wilson RJ, Beineke LW (Eds.). Academic Press

Chapter 9

NEAR INFRARED REFLECTANCE SPECTROSCOPY AND MULTIVARIATE ANALYSIS IN ENOLOGY: DETERMINATION OR SCREENING OF FIFTEEN PARAMETERS IN DIFFERENT TYPES OF WINES

The content of this chapter has been published in the journal Analytica Chimica Acta, 527 (2004) 81-88.

Urbano-Cuadrado M., Luque de Castro M.D., Pérez-Juan P.M., García-Olmo J., Gómez-Nieto M.A.

Abstract 

A study of the feasibility of near infrared reflectance spectroscopy (NIRS) for analytical monitoring in wineries is presented, in which equations for the determination or screening of the commonest enological parameters are proposed. The training and validation sets used to develop general NIR equations were built with samples (180) from different appellations d’origine, different wine types, etc. In the calibration step (partial least squares regression and cross validation were used for multivariate calibration), major components such as ethanol, volumic mass, total acidity, pH, glycerol, colour, tonality and total polyphenol index were accurately determined by the proposed equations as compared with the reference data obtained by the official and standard methods: determination coefficients (R²) were higher than 0.800 (and higher than 0.900 in most cases) and standard error of cross validation (SECV) values were close to those of the reference methods. The proposed method also offers screening capability for components such as volatile acidity (R² = 0.481), organic acids (R² = 0.432 for malic acid, R² = 0.544 for tartaric acid and R² = 0.541 for gluconic acid; lactic acid is the exception, being accurately determined with R² = 0.860 and SECV = 0.35 g l⁻¹), reducing sugars (R² = 0.705) and total sulphur dioxide (R² = 0.615). In the validation of the equations, the correlation between the reference and NIRS methods was tested, and slope and bias values not statistically different from 1 and 0, respectively, were obtained for most parameters.

Keywords: Near infrared spectroscopy, Wineries, Multivariate calibration.



The main objective of an analytical chemist is the development of methods able to extract physical, chemical and biological information from a target system. In agrifood industries, as in other areas, new methodologies should match as closely as possible both the time and operational requirements and the clients' aims regarding product characterisation. Multi-parameter approaches (those that permit the determination of more than one parameter in a single analysis), such as those based on chromatography, electrophoresis [1-3] or atomic emission spectrometry with multichannel detection [4,5], have been widely employed.

In this sense, the use of various spectral regions and chemometrics is often aimed at obtaining multiparametric information, as is the case with approaches based on Fourier transform infrared [6,7]. Near Infrared Reflectance Spectroscopy (NIRS) is also a very useful technique in this area as it enables multiparametric determination with minimal or no sample pretreatment [8,9]. The efficiency of NIRS, of particular significance in food analysis, has been tested against some official and standard methods in wineries and other spirit industries. Thus, key parameters in brewing processes (namely, ethanol and sugar: fructose, glucose or residual sugars) can be determined by both off-line and on-line approaches [10-12]. The possibility of multidetermination compensates for both the time-consuming calibration step and the lower or similar accuracy of these methods as compared with the reference methods. The accuracy can be improved by both highly reproducible reference data and homogeneous training and validation sets. The applicability of NIRS to wine parameters other than ethanol and sugars (namely, soluble solids, pH, colour and methanol) has also been studied [13].

A method for the determination of 15 enological parameters (alcoholic degree, volumic mass, total acidity, pH, volatile acidity, glycerol, total polyphenol index, reducing sugars, lactic, malic, tartaric and gluconic acids, colour, tonality, total sulphur dioxide and free sulphur dioxide) by NIRS has been developed. The objective was to check the applicability of this technique for either determination or screening of key parameters in a variety of both appellations d’origine and types of wines (red, rosé and white wines).


2.1. Samples and sample preparation 

Different wines were used in the present study, including red, rosé and white wines (98, 12 and 70 samples, respectively); young and aged wines (70 and 30 percent, respectively); wines from different appellations d’origine (80 from “La Mancha”, 25 from “Valdepeñas”, 17 from “Alicante”, 15 from “Jumilla”, 18 from “Navarra” and 25 from “Madrid”) and grape varieties (27 from “Cencibel”, 30 from “Cabernet Sauvignon”, 15 from “Cencibel-Cabernet Sauvignon”, 20 from “Merlot”, 15 from “Garnacha”, 30 from “Bobal” and 43 from “Airén”). Thus, the number of samples employed in the calibration and validation steps was 180 for all the parameters except organic acids (155 samples), colour (98 samples) and tonality (98 samples); the last two parameters were obtained only for red wines. These samples were used as such, because filtering, dilution, removal of interferents, etc. were not required.

2.2. Apparatus and methods 

The instrument employed for spectra 

collection was a Foss-NIRSystems 

6500 System II spectrophotometer 

(Foss-NIRSystems Inc., Silver Spring, 

MD, USA) equipped with a transport 

module. The samples were analysed 

by folded transmission using a ring 

cup with a 0.1 mm pathlength. A 

diffuse reflecting gold surface placed 

at the bottom of the cup reflected the 

radiation back through the sample to 

the reflectance detector. The spectra were collected using WinISI software 1.50 (Infrasoft International, Port Matilda, PA, USA). Before recording the spectra, the samples were thermostated at 24 ºC. The reflectance (log(1/R)) spectra were collected in duplicate.

On the other hand, samples were analysed in duplicate by the reference methods shown in Table 1, and the standard error of the laboratory (SEL) was estimated from these duplicates using the following equation:

$$\mathrm{SEL} = \sqrt{\frac{\sum_{i=1}^{n} (y_{i1} - y_{i2})^{2}}{n}} \tag{1}$$

where $n$ is the number of samples and $y_{i1}$ and $y_{i2}$ are the values obtained for replicates 1 and 2, respectively, of sample $i$.
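The estimation of SEL from duplicates can be illustrated with a few lines of code. This is a sketch that assumes equation (1) is the square root of the mean squared difference between the two replicates; the function name and the numerical values are hypothetical and are not data from this work.

```python
import math

def standard_error_of_laboratory(rep1, rep2):
    """SEL from duplicate reference analyses: sqrt(sum((y1 - y2)^2) / n)."""
    if len(rep1) != len(rep2) or not rep1:
        raise ValueError("both replicate lists must be non-empty and of equal length")
    n = len(rep1)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(rep1, rep2)) / n)

# Hypothetical duplicate determinations of alcoholic degree (% v/v)
first = [12.0, 11.5, 13.2]
second = [12.2, 11.4, 13.0]
print(round(standard_error_of_laboratory(first, second), 3))
```

A SEL obtained in this way gives the scale against which the SECV of each NIRS equation can be judged.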

2.3. Chemometric software used for 

data processing and statistical 

techniques used 

WinISI software 1.50 (Infrasoft 

International, Port Matilda, PA, USA) 

was also used for data processing. The 

chemometric procedure consists of the 

following steps: 

a) Root Mean Square (RMS) calculus 

The RMS [14] was used for the study 

of similarity between spectra 

corresponding to aliquots of the same 

sample. The following equation was 

used: 

$$\mathrm{RMS}(j) = 10^{6} \times \sqrt{\frac{\sum_{i=1}^{n} (y_{ij} - \bar{y}_{i})^{2}}{n}} \tag{2}$$

where $n$ is the number of wavelengths, $y_{ij}$ is the log(1/R) for sub-sample $j$ at wavelength $\lambda_i$ and $\bar{y}_i$ is the log(1/R) for the averaged spectrum of the sample at $\lambda_i$. The factor $10^{6}$ is introduced in the calculation of RMS to avoid working with very low values. As two spectra were collected per sample, their RMS values were equal. Thus, a single RMS value was considered per sample and compared with the RMS cut-off, which was calculated from the individual RMS values of the set of samples using the MEAN and STD parameters and equations (3) and (4), where $N$ and $n$ are the number of sub-samples (replicates) and the number of wavelengths, respectively.

Table 1. Enological parameters and reference methods

Parameter | Reference method
Alcoholic degree | Distillation and aerometry
Volumic mass | Aerometry
Total acidity | Titration with NaOH up to pH = 7.0
pH | Potentiometry
Volatile acidity | Distillation, vapour dragging and titration with NaOH
Glycerol | Enzymatic reaction
Total polyphenol index | Folin-Ciocalteu reagent in alkaline medium
Reducing sugars | Reduction of Cu²⁺ in boiling alkaline medium
Lactic acid | High pressure liquid chromatography
Malic acid | High pressure liquid chromatography
Tartaric acid | High pressure liquid chromatography
Gluconic acid | Enzymatic kits
Colour | Absorbance sum at 420, 520 and 620 nm
Tonality | Absorbance ratio at 420 nm and 520 nm
Total sulphur dioxide | Hydrolysis with NaOH and iodometry in acid medium
Free sulphur dioxide | Iodometry in acid medium

$$\mathrm{MEAN} = \sqrt{\frac{\sum_{j=1}^{N} (\mathrm{RMS}_{j})^{2}}{N}} = \sqrt{\frac{\sum_{i=1}^{n}\sum_{j=1}^{N} (y_{ij} - \bar{y}_{i})^{2}}{nN}} \tag{3}$$

$$\mathrm{STD} = \sqrt{\frac{\sum_{j=1}^{N} (\mathrm{RMS}_{j})^{2}}{N-1}} = \sqrt{\frac{\sum_{i=1}^{n}\sum_{j=1}^{N} (y_{ij} - \bar{y}_{i})^{2}}{n(N-1)}} \tag{4}$$

Taking into account equations (3) and (4), the relationship between the values of MEAN and STD is defined by the following equation:

$$\mathrm{STD} = \sqrt{\frac{N}{N-1}} \times \mathrm{MEAN} \tag{5}$$

Equation (5) can be transformed into equation (6), as 2 replicates per sample were obtained in this work:

$$\mathrm{STD} = \sqrt{2} \times \mathrm{MEAN} \tag{6}$$


The expression for calculating 

STD is a variance of the error that 

$$\mathrm{STD}_{\mathrm{limit}} = 1.036 \times \sqrt{\frac{\sum_{k=1}^{m} \mathrm{STD}_{k}^{2}}{m}}$$

where m is the number of samples.

Finally, the RMS cut-off is obtained using the STD limit and equation (5). For samples with an RMS lower than the RMS cut-off, the average of the two spectra was obtained; for samples with an RMS higher than the RMS cut-off, a third spectrum was collected, and the two most similar spectra were used for recalculation of the RMS. In this way, RMS values lower than the RMS cut-off were obtained in all instances.
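The whole RMS procedure can be sketched as follows. The block rests on several assumptions that should be checked against the original equations: RMS, MEAN, STD and the STD limit are taken in their square-root forms, and, for two replicates, the RMS cut-off is taken as the STD limit divided by the square root of 2 (since both sub-sample RMS values are equal, MEAN equals RMS). The function names are hypothetical and the code is independent of WinISI.

```python
import numpy as np

def rms_per_replicate(replicates):
    """RMS (x 10^6) of each replicate spectrum against the sample mean spectrum.

    replicates: array of shape (N replicates, n wavelengths) of log(1/R) values.
    """
    r = np.asarray(replicates, dtype=float)
    mean_spectrum = r.mean(axis=0)
    return 1e6 * np.sqrt(np.mean((r - mean_spectrum) ** 2, axis=1))

def std_from_rms(rms_values):
    """STD of one sample from the RMS values of its N replicates (assumed form)."""
    rms_values = np.asarray(rms_values, dtype=float)
    return np.sqrt(np.sum(rms_values ** 2) / (len(rms_values) - 1))

def rms_cutoff(std_per_sample):
    """RMS cut-off for duplicate spectra from the STD values of all samples."""
    std = np.asarray(std_per_sample, dtype=float)
    std_limit = 1.036 * np.sqrt(np.mean(std ** 2))
    return std_limit / np.sqrt(2.0)   # two replicates: STD = sqrt(2) x RMS

# Hypothetical duplicate spectrum with three wavelengths
sample = [[0.5000, 0.6000, 0.7000],
          [0.5002, 0.6002, 0.6998]]
rms = rms_per_replicate(sample)
print(rms, std_from_rms(rms), rms_cutoff([std_from_rms(rms)]))
```

With duplicates, both replicates deviate from the mean spectrum by the same amount, so their RMS values coincide, as stated in the text.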

b) Principal Components Analysis 

(PCA) for visualisation of spectral 

outliers 

PCA was required for the reduction of the number of variables showing collinearity, thus representing the samples in a new, reduced p-dimensional space (p3.0 were considered outliers. Because of the existence of clusters of samples, PCA and H were computed for each cluster. These outliers were examined to decide whether they provided useful information or had to be removed.

c) Selection of the calibration and 

validation sets 

Once the outliers without useful information 

had been removed, the 

calibration and validation sets were 

defined. Both sets were independent; 

thus, the validation set was only used 

for testing the equations. The training and validation sets contained 85% and 15% of the samples, respectively. The selection of the

validation set was carried out by


calculating both PCA and subsequent 

H distance. The criterion used was 

samples more separated between them 

with the highest number of neighbours 

(H



Table 2. Reference data

                                   Calibration set                           Validation set
Parameter                          N    Range            Mean    Std. Dev.   N    Range            Mean    Std. Dev.   SEL
Alcoholic degree (%v/v)            150  9.58 – 15.15     12.14   1.24        25   10.13 – 14.96    12.26   1.37        0.19
Volumic mass (kg l⁻¹)              150  989.5 – 999.3    992.9   2.1         25   990.4 – 999.4    993.5   2.4         0.4
Total acidity (meq l⁻¹)            150  3.55 – 8.72      5.42    0.92        25   4.15 – 8.69      5.70    1.09        0.35
pH (pH units)                      150  3.26 – 4.04      3.65    0.15        25   3.30 – 4.03      3.63    0.17        0.02
Volatile acidity (g l⁻¹)           150  0.14 – 0.87      0.42    0.15        25   0.19 – 0.82      0.45    0.20        0.08
Glycerol (g l⁻¹)                   150  1.95 – 12.38     6.29    2.47        25   2.57 – 14.56     6.74    2.88        0.43
Total polyphenol index             150  5.0 – 131.0      35.3    25.4        25   6.0 – 92.0       36.6    25.8        3.2
Reducing sugars (g l⁻¹)            150  0.65 – 9.78      2.19    1.24        25   0.85 – 14.3      2.86    3.39        0.12
Lactic acid (g l⁻¹)                130  0.06 – 5.32      1.36    1.10        20   0.22 – 5.09      1.33    1.09        0.22
Malic acid (g l⁻¹)                 130  0.03 – 1.83      0.77    0.49        20   0.19 – 1.79      0.80    0.50        0.10
Tartaric acid (g l⁻¹)              130  1.54 – 4.64      2.59    0.44        20   1.76 – 4.20      2.63    0.55        0.25
Gluconic acid (g l⁻¹)              130  0.06 – 1.80      0.63    0.48        20   0.06 – 1.85      0.73    0.65        0.16
Colour. Only red wines             80   3.80 – 21.40     10.59   3.77        14   8.10 – 16.10     11.25   2.77        1.48
Tonality. Only red wines           80   0.440 – 0.950    0.627   0.120       14   0.430 – 0.840    0.602   0.104       0.045
Total sulphur dioxide (mg l⁻¹)     150  16.0 – 149.0     59.9    35.4        25   19.0 – 204.0     69.3    43.0        13.3
Free sulphur dioxide (mg l⁻¹)      150  8.0 – 24.0       16.45   4.7         25   8.0 – 59.0       20.9    10.8        1.4

the table, in addition to the standard error of the laboratory (SEL). As can be seen, the range of reference values encompasses the characteristic values for a wide diversity of wines. An exception is reducing sugars, whose values lay only within the interval for dry and semi-dry wines, not that for sweet wines.

3.2. Spectral similarity 

After calculating the individual RMS value for each sample, the RMS cut-off value obtained was 3200. Figure 1 shows this statistical parameter versus the sample number (not the sample identifier). Four samples had an RMS value higher than the limit, that is, they did not fulfil the spectral similarity control. A third spectrum was therefore collected for each of these samples, the RMS values were recalculated using the two closest spectra, and the remaining spectrum was deleted. This means that one of the three spectra collected per sample was anomalous owing to operational errors. In this way, the four samples reached an RMS value lower than the upper limit.

3.3. Spectral outliers 

After studying and controlling the spectral similarity within samples, the similarity between samples was studied with the aim of detecting spectral outliers with respect to the sample population. For this, the averaged spectrum of each sample was considered (see Fig. 2). Significant differences between spectra can be observed in the regions 400-1000 nm and 1800-2000 nm. The figure also shows that one spectrum differs significantly from the rest.

PCA was applied to the spectra. The samples plotted in the three-dimensional space formed by the first three principal components are shown in Fig. 3. Two groups can be distinguished: white wine samples lie in a lower plane and red wine samples form an upper cluster. Rosé wines (a small group) are located between the other two groups, next to the plane corresponding to white wines. The spectrum far from the rest belongs to sample T075 in Fig. 2.

In view of the above, PCA and the H (Mahalanobis) distance were computed for each of the two clear clusters in Fig. 3 (white and rosé wines were considered jointly). Three spectra of red wines and one spectrum of white wine behaved as outliers.

Fig. 1. RMS values versus sample number (vertical axis: RMS values, with the cut-off at 3200; horizontal axis: samples).

Fig. 2. NIR spectra.

Fig. 3. Samples in the space determined by the first three components.

Ten and twelve principal components were used for the H distance calculations. The number of components was fixed at the point where the increment of explained variance fell below 0.25%. The cumulative explained variance of each model was close to 100%.
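The stopping rule can be sketched as follows, under the assumption that components are added in order until the next one would contribute less than 0.25% of explained variance. The function name and the variance profile in the example are hypothetical.

```python
def n_components(explained_variance_pct, min_increment=0.25):
    """Number of leading components whose individual contribution (%) reaches the limit."""
    kept = 0
    for increment in explained_variance_pct:
        if increment < min_increment:
            break  # the next component adds too little variance: stop here
        kept += 1
    return kept


# Hypothetical per-component explained variance (%), in decreasing order
profile = [70.0, 20.0, 9.5, 0.3, 0.15, 0.05]
k = n_components(profile)
print(k, sum(profile[:k]))  # number of components kept and their cumulative variance
```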

3.4. Equations development 

3.4.1. Influence of spectra preprocessing 

The results obtained, based on the statistical parameters described below, were similar regardless of the mathematical preprocessing employed.

3.4.2. Equations calibration 

Cross validation was used for calibrating the equations. The minimum value of SECV determined the number of PLS factors in each equation, thus avoiding overfitting. The values of R² and SECV indicated the precision achieved in calibration; the analytical quality of the equations is studied in the subsequent validation step. The criteria proposed by Shenk et al. [15], based on the values of R² and SECV, were employed in this section. Thus, R² values higher than 0.90 indicate excellent precision, as do SECV values lower than 1.5*SEL. R² values between 0.70 and 0.90 mean good precision, as do SECV values between 2 and 3*SEL. On the other hand, R² values lower than 0.70 indicate that the equation can only be used for screening purposes, which enables distinction between low, medium and high values of the measured parameter. If the R² value is lower than 0.50, the equation only discriminates between high and low values.
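The R² criteria above can be written as a small decision function. This is only a sketch: the function name and the wording of the returned labels are hypothetical, and the handling of values falling exactly on a boundary is an assumption.

```python
def precision_from_r2(r2):
    """Map a calibration R2 value onto the categories described in the text."""
    if r2 > 0.90:
        return "excellent precision"
    if r2 >= 0.70:
        return "good precision"
    if r2 >= 0.50:
        return "screening: low, medium and high values"
    return "screening: low and high values only"


# R2 values taken from Table 3
for name, r2 in [("alcoholic degree", 0.986), ("total acidity", 0.845),
                 ("total sulphur dioxide", 0.615), ("volatile acidity", 0.481)]:
    print(name, "->", precision_from_r2(r2))
```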

Table 3 shows the results obtained in the calibration of the equations: the number of samples used after outlier removal (using the Student test), the number of PLS factors, the mean, minimum and maximum values, SECV and R². The best results were achieved for the determination of alcoholic degree and volumic mass, with SECV values (0.19 %v/v and 0.33 kg l⁻¹, respectively) very close to those of the standard methods. The R² values for the correlation between the reference and NIRS methods were 0.986 and 0.980 for these two parameters.

There are three parameters in wine related to acidity: total acidity, pH and volatile acidity. The first two were determined with good precision (the SECV values were 0.38 meq l⁻¹ and 0.05 pH units, respectively). According to its R² value (0.481), the model for volatile acidity can be used only as a screening methodology to distinguish between low and high values. The SECV value was 0.09 g l⁻¹.

Glycerol and total polyphenol index (t.p.i.) showed both good R² values (0.936 and 0.975, respectively) and good SECV values (0.59 g l⁻¹ and 4.50, respectively). Thus, good precision was also achieved.

The results for reducing sugars (R² and SECV were 0.705 and 0.27 g l⁻¹, respectively) lay between the values corresponding to determination and those corresponding to screening.

The applicability of NIRS to organic acids (lactic, malic, tartaric and gluconic) was limited to


Table 3. Mean, minimum, maximum, SECV and R² in the calibration step

Parameter                          N    PLS factors  MEAN    MIN     MAX     SECV   R²
Alcoholic degree (%v/v)            140  7            12.07   9.69    15.18   0.19   0.986
Volumic mass (kg l⁻¹)              141  14           992.8   989.5   997.1   0.33   0.980
Total acidity (meq l⁻¹)            139  14           5.41    3.77    8.69    0.38   0.845
pH (pH units)                      143  10           3.64    3.28    4.03    0.05   0.905
Volatile acidity (g l⁻¹)           143  10           0.40    0.17    0.82    0.09   0.481
Glycerol (g l⁻¹)                   139  12           6.27    2.43    12.51   0.59   0.936
Total polyphenol index             138  11           32.31   6.17    83.21   4.50   0.975
Reducing sugars (g l⁻¹)            140  12           1.95    0.81    3.07    0.27   0.705
Lactic acid (g l⁻¹)                125  12           1.38    0.11    5.41    0.35   0.860
Malic acid (g l⁻¹)                 128  7            0.73    0.08    1.72    0.34   0.452
Tartaric acid (g l⁻¹)              126  8            2.59    1.32    3.54    0.23   0.544
Gluconic acid (g l⁻¹)              128  8            0.64    0.04    1.75    0.31   0.541
Colour. Only red wines             75   7            10.29   4.18    22.8    1.52   0.820
Tonality. Only red wines           74   7            0.60    0.42    0.95    0.05   0.781
Total sulphur dioxide (mg l⁻¹)     141  12           53.2    17.0    138.2   21.5   0.615

screening methodologies owing to the low concentration of these compounds. The exception was lactic acid, with values of 0.860 and 0.35 g l⁻¹ for R² and SECV, respectively. The errors involved in the cross validation for malic, tartaric and gluconic acids were high (SECV values of 0.34, 0.23 and 0.31 g l⁻¹, respectively).

With respect to the colour and tonality parameters, the results obtained were acceptable. The standard methods for these parameters are based on absorbance measurements at established wavelengths in the visible range. The spectral region was 400-2500 nm; thus, the visible spectrum was also taken into account and, for this reason, a good correlation between the reference and NIRS methods was achieved. The region at wavelengths longer than the UV-VIS range added noise to the spectrum, which affected the R² and SECV values (0.820 and 1.52 for colour, and 0.781 and 0.049 for tonality).

Sulphur dioxide present in wine is divided into two fractions: the free fraction and the combined fraction, that is, sulphur dioxide bonded to diverse organic components. The model was not appropriate for the determination of the free fraction, as the sensitivity of NIRS is not high enough; the values of this parameter are therefore not shown in Table 3. The table does show that the model yielded acceptable statistical values (R² and SECV were 0.615 and 21.5 mg l⁻¹, respectively) for distinguishing low, medium and high values of total sulphur dioxide, that is, the sum of free and bonded sulphur dioxide. These results can be explained by both the higher concentration of combined sulphur dioxide and the suitability of NIRS for organic compounds.

3.4.3. Equations validation

The equations were tested with the validation set, which consisted of samples not used for calibration. Table 4 shows the number of samples used after removing outliers (Student test), the mean, minimum and maximum values, SEP, r², slope and bias. These statistical parameters were used for evaluating the analytical quality of the equations.

The slope and bias values were useful for detecting systematic errors and for studying the correlation between the reference and NIRS methods. They were tested to determine whether they were statistically equal to 1 and 0, respectively. With this aim, the criteria proposed by the OIV [17] were used at a significance level of 0.5 %. The range of non-significance is also shown in the slope and bias columns. Only the determination of volatile acidity (bias) and of tartaric and gluconic acids (slope) yielded values outside the limits. These values are marked with an asterisk in Table 4.

Although almost all slope and bias values were within the non-significance range, this range was wider for parameters with r² and SEP values which only enable screening (namely, volatile acidity, tartaric and gluconic acids and total sulphur dioxide). The slopes of the correlations were always lower than 1, except for volumic mass. This means that the NIRS values are systematically higher than those obtained by the reference methods, taking into account that the NIRS values correspond to the abscissa axis in the correlation plots.

On the other hand, almost all the SEP values in the external validation were within the limit value (SEP = 1.5*SECV). Only the validation of volumic mass and tartaric acid yielded SEP values slightly above the limit; thus, the equations developed were robust.

Table 4. Correlation between the reference and NIR methods in the validation step (non-significance ranges in parentheses; * value outside the range)

Parameter                          N    MEAN    MIN     MAX      SEP    r²     SLOPE                     BIAS
Alcoholic degree (%v/v)            24   12.27   10.08   15.36    0.24   0.978  0.971 (0.969 – 1.031)     0.04 (-0.10 – 0.10)
Volumic mass (kg l⁻¹)              24   993.3   989.5   998.5    0.54   0.917  1.001 (0.973 – 1.027)     -0.09 (-0.30 – 0.30)
Total acidity (meq l⁻¹)            24   5.68    4.12    8.65     0.48   0.812  0.986 (0.789 – 1.211)     0.09 (-0.21 – 0.21)
pH (pH units)                      24   3.62    3.22    3.91     0.07   0.819  0.989 (0.916 – 1.084)     -0.02 (-0.03 – 0.03)
Volatile acidity (g l⁻¹)           24   0.44    0.25    0.72     0.14   0.345  0.730 (0.582 – 1.418)     0.12* (-0.08 – 0.08)
Glycerol (g l⁻¹)                   24   6.31    2.89    10.67    0.72   0.845  0.871 (0.710 – 1.290)     -0.29 (-0.32 – 0.32)
Total polyphenol index             24   32.14   5.54    67.14    6.70   0.919  0.918 (0.894 – 1.106)     -2.15 (-2.83 – 2.83)
Reducing sugars (g l⁻¹)            23   1.86    1.37    2.65     0.33   0.712  0.983 (0.929 – 1.071)     0.12 (-0.14 – 0.14)
Lactic acid (g l⁻¹)                19   1.40    0.03    2.87     0.41   0.814  0.941 (0.847 – 1.143)     -0.05 (-0.20 – 0.20)
Malic acid (g l⁻¹)                 19   0.73    0.33    1.17     0.36   0.441  0.910 (0.682 – 1.318)     0.03 (-0.16 – 0.16)
Tartaric acid (g l⁻¹)              19   2.55    2.19    3.61     0.39   0.428  0.675* (0.754 – 1.246)    -0.09 (-0.17 – 0.17)
Gluconic acid (g l⁻¹)              19   0.72    0.07    1.82     0.38   0.498  0.701* (0.809 – 1.191)    0.13 (-0.15 – 0.15)
Colour. Only red wines             14   10.57   7.90    13.71    1.83   0.705  0.807 (0.725 – 1.275)     -0.91 (-1.07 – 1.07)
Tonality. Only red wines           14   0.65    0.45    0.83     0.06   0.729  0.95 (0.925 – 1.075)      0.06 (-0.08 – 0.08)
Total sulphur dioxide (mg l⁻¹)     23   63.16   23.24   112.11   23.5   0.569  0.845 (0.698 – 1.302)     1.62 (-8.24 – 8.24)
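The two validation checks discussed above (slope and bias inside their non-significance ranges, and SEP not exceeding 1.5*SECV) can be sketched as follows. The helper names are hypothetical; the numerical values are taken from Tables 3 and 4 for three of the parameters.

```python
def within_range(value, low, high):
    """True when a slope or bias value lies inside its non-significance range."""
    return low <= value <= high


def sep_within_limit(sep, secv, factor=1.5):
    """True when SEP does not exceed factor * SECV (the limit quoted in the text)."""
    return sep <= factor * secv


# name: (slope, slope range, bias, bias range, SEP, SECV)
rows = {
    "alcoholic degree": (0.971, (0.969, 1.031), 0.04, (-0.10, 0.10), 0.24, 0.19),
    "volumic mass": (1.001, (0.973, 1.027), -0.09, (-0.30, 0.30), 0.54, 0.33),
    "tartaric acid": (0.675, (0.754, 1.246), -0.09, (-0.17, 0.17), 0.39, 0.23),
}

# (slope ok, bias ok, SEP ok) for each parameter
report = {
    name: (within_range(slope, *s_rng), within_range(bias, *b_rng),
           sep_within_limit(sep, secv))
    for name, (slope, s_rng, bias, b_rng, sep, secv) in rows.items()
}
print(report)
```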

3.4.4. Comparison with the results 

obtained by other authors 

The determination of alcoholic 

degree, volumic mass, pH and 

glycerol yielded R 2 , SECV and SEP 

values close to the values in the 

literature [11-13]. The error obtained 

in the determination of sugars and 

colour was higher [11,13], but a 


and types of wines were used for 

establishing the equations with the 

aim of obtaining a general approach. 

The determination of organic acids, 


volatile acidity, total acidity, t.p.i. and 

total sulphur dioxide had not been 

reported previously. 


The applicability of NIRS to the 

evaluation of 16 enological parameters 

in wine has been studied in this 

work. The results have been compared with the values obtained by other authors, when available, and quite similar values were obtained in spite of the fact that the calibration and validation sets were more heterogeneous than those involved in previous approaches [11-13]. Thus, the

calibration and validation of the 

equations were carried out with a 


and types of wines. The final 

equations were developed for the 

determination of 15 parameters. 

Thus, the most remarkable 

aspects of this work are the evaluation 

of the applicability of NIRS to the 

quantitative analysis in a wide variety 

of wines and the high number of 

enological parameters which can be 

determined.


Acknowledgement 




AGL2000-0321-P4-03). 


[1] Q.J. Wang, H. Yu, H. Li, F. Ding, P.G. He, Y.Z. Fang, Food Chem. 83 (2003) 311.

[2] M. Bonoli, M. Montanucci, T.G. Toschi, G. Lercker, J. Chromatogr. A 1011 (2003) 163.

[4] G. Álvarez Llamas, M.D.R. Fernández de la Campa, A. Sanz Medel, J. Anal. Atom. Spectrom. 18 (2003) 460.

[5] F.M. Pennebaker, M.B. Denton, Appl. Spectrosc. 55 (2001) 504.

[6] A. K. Kupina, A. Shrikhande, J. Amer. J. Enol. Viticult. 54 (2003) 131.

[7] M. Dubernet, M. Dubernet, Revue Française d’Oenologie, 81 (2000) 10.

[8] E. Albanell, G. Caja, X. Such, M. Rovai, A.A.K. Salama, R. Casals, J. AOAC Internat. 86 (2003) 746.

[9] F.E. Barton, D.E. Akin, W.H. Morrison, A. Ulrich, D.D. Archibald, J. Agric. Food Chem. 50 (2002) 7576.

[10] Y. Li, C. Brown, J. Near Infrared Spectrosc. 7 (1999) 101.

[11] C.M. García-Jares, B. Medina, Fresen. J. Anal. Chem. 357 (1997) 86.

[12] R. Eberl, J. Near Infrared Spectrosc. 6 (1998) 133.

[13] M. Gishen, R.G. Dambergs, A. Kambouris, M. Kwiatkowski, W.U. Cynkar, P.B. Høj, I.L. Francis, Proceedings of the 9th International Conference on Near Infrared Spectroscopy 1999, 917.

[14] J.S. Shenk, M.O. Westerhaus, Routine Operation, Calibration, Development, and Network System Management Manual, NIRSystems Inc., Silver Spring, MD, USA, 1995.

[15] J.S. Shenk, M.O. Westerhaus, Calibration the ISI way. In Near Infrared Spectroscopy: the Future Waves, NIR Publications, p. 198-202, Chichester, 1996.

[16] B.G.M. Vandeginsten, D.L. Massart, S. Buydens, S. De Jong, P.J. Lewi, J. Smeyers-Verbeke, Handbook of Chemometrics and Qualimetrics: Part B, Elsevier, Amsterdam, 1998.

[17] Resolution OENO 6/99. Validation Protocol for a Typical Analytical Method Compared to the OIV Reference Method. Office International de la Vigne et du Vin. http://www.oiv.int/Database/Images/Client/oeno699uk.doc


Chapter 10

COMPARISON AND JOINT USE OF 

NEAR INFRARED SPECTROSCOPY 

AND FOURIER TRANSFORM MID 

INFRARED SPECTROSCOPY FOR 

THE DETERMINATION OF WINE 

PARAMETERS 

The content of this chapter has been accepted for publication in the journal Talanta.

Talanta. XX (2005) XX Part II, Ch. 10

COMPARISON AND JOINT USE OF NEAR INFRARED 

SPECTROSCOPY AND FOURIER TRANSFORM MID INFRARED 

SPECTROSCOPY FOR THE DETERMINATION OF WINE 

PARAMETERS 

M. Urbano Cuadrado, M. D. Luque de Castro, P. M. Pérez Juan, M. A. Gómez- 

Nieto 

Abstract 

A study of the statistical characteristics of the multidetermination of several enological parameters (namely, alcoholic degree, volumic mass, total acidity, glycerol, total polyphenol index, lactic acid and total sulphur dioxide), depending on the spectroscopic zone employed, was carried out. The two techniques used were Near Infrared Spectroscopy (NIRS) and Fourier Transform Mid Infrared Spectroscopy (FT-MIRS). The combination of these two regions (sum of their spectra) was also studied. NIRS yielded better results, but the use of both zones improved the determination of glycerol and total sulphur dioxide. The training and validation sets used for developing general equations were built with samples from different appellations d’origine, different wine types, etc. Partial least squares regression was used for multivariate calibration, using systematic cross validation in the calibration stage and external validation in the testing stage. Sample preparation was not required.

Keywords: Near infrared spectroscopy; Attenuated total reflection mid infrared; 

Wineries; Multivariate calibration. 




One of the latest trends in analytical chemistry is aimed at shortening the time required for a given analysis, so that the information sought is available quickly. This goal has so far been pursued in different ways; namely, the reduction and simplification of the steps involved in sample preparation [1,2], the recent advances in automation and instrumentation [3-5], the development of qualitative and screening methodologies [6,7], the greater use of chemometrics thanks to advances in computers [8,9], etc.

In the food area, and specifically in the wine industry, these approaches have also been used in order to obtain fast information about the production process. Several methods based on Flow Injection (FI) have been developed in order to automate and reduce the complexity of some official methods in wine laboratories. Thus, parameters such as ethanol, glycerol, total and free sulphur dioxide, etc. can be determined by FI methods [10-11].

Chemometrics in wine production and laboratories is widely referenced in the literature from several points of view. Thus, supervised and non-supervised pattern recognition techniques have been used to distinguish different varieties, geographical areas, elaboration processes, etc. The variables used for differentiation can be classified into non-spectroscopic and spectroscopic variables [12-14]. Concerning the latter, the spectra obtained, generally in a few seconds, by spectroscopic techniques (namely, Near Infrared Reflectance Spectroscopy (NIRS), Fourier Transform Mid Infrared Spectroscopy (FT-MIRS), Ultraviolet-Visible Spectroscopy (UV-VIS), Nuclear Magnetic Resonance (NMR), etc.) constitute large data sets which contain hidden information. Multivariate regression methods, such as Principal Component Regression (PCR) and Partial Least Squares Regression (PLSR), are standards in chemometrics and have been used for developing equations for the determination of quantitative parameters in the wine and other food industries using the data provided by the spectroscopic techniques cited above. Approaches based on NIRS are the most used for the determination of different analytes; namely, ethanol, glycerol, sugars, etc. [15,16]. FT-IR applications for wine analyses have also been developed [17,18].

Here, the applicability of both NIRS and FT-MIRS to the determination of several enological parameters (alcoholic degree, volumic mass, total acidity, glycerol, total polyphenol index, lactic acid and total sulphur dioxide) is studied in wines from several appellations d’origine and different grape varieties. The aim of this study is the development of equations independent of the type of wine. In addition, the joint use of spectral regions of shorter wavelength for improving the statistical parameters of the calibrations is discussed for the first time.


2.1. Samples and sample preparation 

Different wines were used in the present study: red, rosé and white wines; young and aged wines; wines from different appellations d’origine (“La Mancha”, “Valdepeñas”, “Jumilla”, “Navarra”, “Alicante” and “Madrid”) and grape varieties (“Cencibel”, “Cabernet Sauvignon”, “Cencibel-Cabernet Sauvignon”, “Merlot”, “Garnacha Tintorera” and “Syrah”). The number of samples employed in the calibration and validation steps was 180. The samples were used as such, because filtering, dilution, preconcentration, removal of interferents, etc. were not required.

2.2. Apparatus and methods 

The instrument employed for NIR spectra collection was a Foss-NIRSystems 6500 System II spectrophotometer (Foss-NIRSystems Inc., Silver Spring, MD, USA) equipped with a transport module and capable of making measurements at 2 nm resolution in the spectral range 400-2500 nm. The samples were analysed by folded transmission using a ring cup with a 0.1 mm pathlength. A diffuse reflecting gold surface placed at the bottom of the cup reflected the radiation back through the sample to the reflectance detector. The spectra were collected using WinISI software 1.50 (Infrasoft International, Port Matilda, PA, USA). Before recording the spectra, the samples were thermostated at 24 °C. The reflectance spectra (log 1/R) were collected in duplicate.

The instrument employed for MIR spectra collection was an FT-MIR Nicolet Magna-IR 550 Series II (Nicolet Instrument Corp., Madison, Wisconsin, USA), capable of making measurements at 4 cm⁻¹ resolution in the spectral range 4000-400 cm⁻¹. The instrument was furnished with an infrared attenuated total reflection (ATR) solid, liquid and mellow sample cell with a zinc selenide crystal (Spectra Tech., Stamford, CT, USA) for Nicolet spectrometers. The ATR cell consists of a base accessory unit containing the mirrors that direct the IR beam to and from the sampling plate; a sampling plate containing the ZnSe ATR crystal; and a precision single or dual readout controller. Other characteristics were a transmission range between 20000 and 650 cm⁻¹, a refractive index (at 1000 cm⁻¹) of 2.4, a density of 5.27 g cm⁻³ and a volume of 3.5 ml. Before recording the spectra, the samples were thermostated at 24 °C. The reflectance spectra (log 1/R) were collected in duplicate.

On the other hand, the samples were also analysed in duplicate by the reference methods, and the standard error of the laboratory (SEL) was estimated from the duplicates. The reference methods used are shown in Table 1.

Table 1. Enological parameters and reference methods.

Parameter                 Reference method
Alcoholic degree          Distillation and aerometry
Volumic mass              Aerometry
Total acidity             Titration with NaOH up to pH = 7.0
Glycerol                  Enzymatic reaction
Total polyphenol index    Folin-Ciocalteu reagent in alkaline medium
Lactic acid               High pressure liquid chromatography
Free sulphur dioxide      Iodometry in acid medium

2.3. Chemometric software for data processing and statistical techniques used

The Unscrambler 7.8 (Camo Process AS, Oslo, Norway) was used for data processing. The chemometric procedure consists of the following steps:

2.3.1. Principal Components Analysis (PCA) for visualisation of spectral outliers

PCA [19] was required to reduce the number of variables showing collinearity, thus representing the samples in a new, reduced n-dimensional space. Once the samples were in the new space defined by the principal components, the leverage value (a measure of how far an object lies from the majority) was computed. Sample spectra with a leverage higher than 0.5 were considered outliers. The outliers were examined to decide whether they provided useful information or had to be removed.
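A minimal sketch of the leverage rule follows. It assumes mean-centred PCA scores are already available and uses a common definition of leverage for centred models, h_i = 1/n + Σ_a t_ia² / Σ_k t_ka²; whether the software used here computes leverage in exactly this way is an assumption. The function names and the toy scores are hypothetical.

```python
def leverages(scores):
    """Leverage of each sample from mean-centred PCA scores (rows = samples)."""
    n, n_comp = len(scores), len(scores[0])
    # Sum of squared scores of every component over all samples
    sum_sq = [sum(row[a] ** 2 for row in scores) for a in range(n_comp)]
    return [
        1.0 / n + sum(row[a] ** 2 / sum_sq[a] for a in range(n_comp))
        for row in scores
    ]


def leverage_outliers(scores, limit=0.5):
    """Indices of the samples whose leverage exceeds the limit (0.5 in the text)."""
    return [i for i, h in enumerate(leverages(scores)) if h > limit]


# Toy one-component scores: nine similar samples and one distant sample
toy_scores = [[-1.0]] * 9 + [[9.0]]
print(leverages(toy_scores)[-1], leverage_outliers(toy_scores))
```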

2.3.2. Spectra preprocessing 

Different treatments were applied to the spectra in the calibration and validation steps, namely: data centring, scatter correction, first derivative, and combinations of them.
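The treatments listed above can be illustrated with a short sketch. The text does not specify the scatter correction or the derivative algorithm, so the standard normal variate (SNV) transform and a simple finite difference are used here purely as assumptions; the function names are hypothetical, and the 2 nm step matches the resolution of the NIR instrument.

```python
import math


def mean_centre(spectra):
    """Subtract the mean spectrum (column means) from every spectrum."""
    means = [sum(col) / len(spectra) for col in zip(*spectra)]
    return [[v - m for v, m in zip(row, means)] for row in spectra]


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and SD."""
    mean = sum(spectrum) / len(spectrum)
    sd = math.sqrt(sum((v - mean) ** 2 for v in spectrum) / (len(spectrum) - 1))
    return [(v - mean) / sd for v in spectrum]


def first_derivative(spectrum, step=2.0):
    """Finite-difference first derivative for equally spaced wavelengths (step in nm)."""
    return [(b - a) / step for a, b in zip(spectrum, spectrum[1:])]


# Treatments can be chained, e.g. scatter correction followed by the derivative
print(first_derivative(snv([1.0, 2.0, 3.0])))
```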

Table 2. Reference data.

                               Calibration set                     Validation set
Parameter                      N    Range            Mean    SD    N    Range            Mean    SD     SEL
Alcoholic degree (%V/V)        150  9.58 – 15.15     12.14   1.24  25   10.13 – 14.96    12.26   1.37   0.19
Volumic mass (Kg/L)            150  989.5 – 999.3    992.9   2.1   25   990.4 – 999.4    993.5   2.4    0.4
Total acidity (meq/L)          150  3.55 – 8.72      5.42    0.92  25   4.15 – 8.69      5.70    1.09   0.35
Glycerol (g/L)                 150  1.95 – 12.38     6.29    2.47  25   2.57 – 14.56     6.74    2.88   0.43
Total polyphenol index         150  5.0 – 131.0      35.3    25.4  25   6.0 – 92.0       36.6    25.8   3.2
Lactic acid (g/L)              130  0.06 – 5.32      1.36    1.10  20   0.22 – 5.09      1.33    1.09   0.22
Free sulphur dioxide (mg/L)    150  8.0 – 24.0       16.45   4.7   25   8.0 – 59.0       20.9    10.8   1.4

2.3.4. Calibration step: cross validation

In this step, Partial Least Squares Regression (PLSR) [19,20] was used for developing the equations. The number of calibration groups for cross validation and the maximum number of PLS factors were set at 5 and 16, respectively. The latter is based on the following rule: one PLS factor per 10 samples of the training set, plus 2. In addition, possible outliers in the cross-validation predictions were studied by means of the Student's t statistic, with the limit set at 2.50. Statistical parameters such as the Standard Error of Cross Validation (SECV) and the Determination Coefficient (R²) were employed.
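The cross-validation step can be sketched with NumPy (assumed to be available). The PLS1 routine below is a generic NIPALS implementation, the samples are assigned to the five groups systematically (every fifth sample), and the root mean square of the cross-validation residuals is used as a stand-in for SECV. All of these choices, the function names and the synthetic data are assumptions for illustration; they are not the algorithm of the software used in this work.

```python
import numpy as np


def pls1_fit(X, y, n_factors):
    """Fit a PLS1 model by NIPALS; return the regression vector and the intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_factors):
        w = E.T @ f
        w = w / np.linalg.norm(w)      # weight vector
        t = E @ w                      # scores
        tt = t @ t
        p = E.T @ t / tt               # X loadings
        c = (f @ t) / tt               # y loading
        E = E - np.outer(t, p)         # deflate X
        f = f - c * t                  # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)
    return b, y_mean - x_mean @ b


def secv_by_factors(X, y, max_factors=16, n_groups=5):
    """Cross-validation error for 1..max_factors PLS factors."""
    groups = np.arange(len(y)) % n_groups   # systematic assignment to groups
    secv = []
    for a in range(1, max_factors + 1):
        residuals = np.empty(len(y))
        for g in range(n_groups):
            test = groups == g
            b, b0 = pls1_fit(X[~test], y[~test], a)
            residuals[test] = y[test] - (X[test] @ b + b0)
        secv.append(np.sqrt(np.mean(residuals ** 2)))
    return np.array(secv)


# Synthetic example: 60 "spectra" of 40 variables driven by 3 latent factors
rng = np.random.default_rng(0)
T = rng.normal(size=(60, 3))
X = T @ rng.normal(size=(3, 40)) + 0.01 * rng.normal(size=(60, 40))
y = T @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=60)
secv = secv_by_factors(X, y, max_factors=6)
best = int(np.argmin(secv)) + 1            # number of factors with minimum SECV
print(best, secv.round(4))
```

Choosing the number of factors at the minimum of this curve is the rule used in the calibration step to limit overfitting.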

2.3.5. Validation step

The validation set was introduced into the model; thus, statistical parameters such as the Standard Error of Prediction (SEP) and r² were obtained. The procedure described above was applied to the spectral data from the NIR and MIR zones and to a combination of both.

3.1. Reference data

Table 2 shows information about the reference data. The range, mean, standard deviation (SD) and number of samples for the calibration and validation sets are summarised in the table, in addition to the standard error of the laboratory (SEL). As can be seen, the range of reference values encompassed the characteristic values for a high diversity of wines.

Fig. 1. (A) NIR spectra. (B) MIR spectra. (C) Combination of NIR and MIR spectra.

3.2. Sample spectra 

Spectra were obtained over the whole measurement range provided by the NIR instrument, that is, 400-2500 nm. Nevertheless, only the 800-3000 cm⁻¹ region of the MIR spectra was used; the 400-800 and 3000-4000 cm⁻¹ zones were not employed in this study owing to the high irreproducibility caused by the high absorbance values. NIR and MIR spectra are shown in Fig. 1-A and Fig. 1-B, respectively. The largest differences in the NIR spectra were within 400-1100 and 1910-1960 nm, while the MIR spectra presented the largest differences in the 1000-1500 and 2300-2400 cm⁻¹ regions. The combination of the NIR and MIR spectra is shown in Fig. 1-C.

In the NIR zone, the absorption bands with the highest intensity (namely, 1820-2030 and 1370-1600 nm) are second and first harmonics, respectively, of the MIR absorption band corresponding to the stretching vibration of the –OH group. The fundamental vibration band corresponding to this group is at 3300-3500 cm⁻¹, which was not used owing to the high irreproducibility, as mentioned before.

Fig. 2. Samples in the space determined by the first two principal components for the NIR region.
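Because the NIR bands are quoted in nm and the MIR bands in cm⁻¹, a short conversion helps to relate them. The sketch below uses λ(nm) = 10⁷ / ν(cm⁻¹) and, as a simplifying assumption, an ideal harmonic oscillator, for which the first overtone appears at half the wavelength of the fundamental (anharmonicity shifts real bands to somewhat longer wavelengths). The function name is hypothetical.

```python
def wavenumber_to_nm(wavenumber_cm1):
    """Convert a wavenumber in cm^-1 into a wavelength in nm."""
    return 1.0e7 / wavenumber_cm1


# Fundamental -OH stretching band quoted in the text: 3300-3500 cm^-1
fundamental_nm = (wavenumber_to_nm(3500), wavenumber_to_nm(3300))

# First overtone under the ideal harmonic assumption: half the wavelength
first_overtone_nm = tuple(w / 2 for w in fundamental_nm)

print(fundamental_nm, first_overtone_nm)
```

Under this approximation the first overtone falls at roughly 1430-1515 nm, inside the 1370-1600 nm band quoted above.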

3.3. Spectral Outliers 

PCA was applied to the three data matrices corresponding to the sample spectra for NIR, MIR and their combination. The leverage values were calculated in order to detect spectral outliers. Four and three spectra were outliers for NIR and MIR, respectively. These spectral outliers were not common to the two regions, which may be due to operational errors. With respect to the combination of the two zones, only two spectra were identified as outliers. The samples plotted in the two-dimensional space formed by the first two principal components for the NIR region are shown in Fig. 2. The outliers are within circles. Two groups can also be distinguished: white wines are on the left side and red wines on the right. Rosé wines (a small group) are located between the other two groups, next to the white wines.

Table 3. Results obtained with the proposed equations.

                                          Calibration equation           Validation equation
Parameter                      Spectrum   N    MEAN    SECV   R²         r²      SEP
Alcoholic degree (%V/V)        NIR        140  12.07   0.19   0.986      0.978   0.24
                               MIR        140  12.11   0.23   0.972      0.961   0.29
                               NIR-MIR    138  12.07   0.20   0.980      0.953   0.35
Volumic mass (Kg/L)            NIR        141  992.8   0.33   0.980      0.917   0.54
                               MIR        139  992.7   0.42   0.961      0.912   0.60
                               NIR-MIR    142  992.8   0.40   0.967      0.901   0.63
Total acidity (meq/L)          NIR        139  5.41    0.38   0.845      0.812   0.48
                               MIR        140  5.52    0.45   0.837      0.795   0.54
                               NIR-MIR    139  5.47    0.43   0.841      0.814   0.49
Glycerol (g/L)                 NIR        139  6.27    0.59   0.936      0.845   0.72
                               MIR        140  6.33    0.55   0.940      0.813   0.68
                               NIR-MIR    140  6.33    0.50   0.954      0.926   0.57
Total polyphenol index         NIR        138  32.31   4.50   0.975      0.919   6.70
                               MIR        139  32.10   4.70   0.937      0.892   7.13
                               NIR-MIR    141  32.05   4.83   0.914      0.890   7.24
Lactic acid (g/L)              NIR        125  1.38    0.35   0.860      0.814   0.41
                               MIR        123  1.36    0.42   0.825      0.790   0.55
                               NIR-MIR    122  1.35    0.37   0.871      0.811   0.52
Total sulphur dioxide (mg/L)   NIR        141  53.2    21.5   0.615      0.569   23.5
                               MIR        140  52.3    24.3   0.573      0.520   27.0
                               NIR-MIR    138  52.7    19.0   0.765      0.670   22.7

3.4. Equations Development

3.4.1. Influence of the spectra preprocessing

Accuracy and precision, based on SECV and R² values, were similar regardless of the mathematical preprocessing employed.

3.4.2. Equations accuracy

Table 3 shows the statistical results obtained in the calibration and external validation stages (SECV and R²). Table 4 summarises the criteria proposed by Shenk et al. [21] to compare the statistical results from calibrations and the subsequent validations. NIRS provided the best statistical results for alcoholic degree and volumic mass, with SECV values (0.19 % V/V and 0.33 Kg/L, respectively) very close to those of the standard methods. The R² values for the correlation between the reference and NIRS methods were 0.986 and 0.980, respectively, for these parameters. Neither the use of the MIR region nor the combination of both zones improved the results.

Total acidity was determined with good precision by NIRS (the SECV and R² values were 0.38 meq/L and 0.845, respectively). The determination of this parameter by FT-MIR gave statistical results similar to those of NIRS (0.45 meq/L and 0.837), but did not surpass the efficiency of the NIR region. The determination of the total polyphenol index also yielded similar values by the two procedures proposed, as can be seen in Table 3.

Glycerol and total sulphur dioxide were determined more accurately by the combination of the MIR and NIR regions than by either NIR or MIR separately. The improvement achieved for total sulphur dioxide is worth emphasising: the R² value rose from 0.615 by NIR to 0.765 with the combination of the two zones studied. This parameter can thus be determined quantitatively, according to the criteria shown in Table 4.

Table 4. Criteria for the evaluation of the results.

R²                  Interpretation                                          SEP                Interpretation
R² ≥ 0.90           Excellent precision                                     SEP = 1-1.5 SEL    Excellent precision
R² = 0.70-0.89      Good precision                                          SEP = 2-3 SEL      Good precision
R² = 0.50-0.69      Good separation between low, medium and high values     SEP = 4 SEL        Medium precision
R² = 0.30-0.49      Correct separation between low and high values          SEP = 5 SEL        Low precision
R² = 0.05-0.29      Better than no analysis
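The R² bands of Table 4 can be read as a simple lookup. The following is a minimal sketch, not part of the original work: the function name `classify_r2` is hypothetical, and treating each band as open-ended upwards (so that values falling between two printed bands, such as 0.695, are assigned to the lower band) is an assumption.

```python
def classify_r2(r2: float) -> str:
    """Return the Table 4 interpretation for a coefficient of determination."""
    if r2 >= 0.90:
        return "excellent precision"
    if r2 >= 0.70:
        return "good precision"
    if r2 >= 0.50:
        return "good separation between low, medium and high values"
    if r2 >= 0.30:
        return "correct separation between low and high values"
    if r2 >= 0.05:
        return "better than no analysis"
    return "not covered by Table 4"


# Calibration R² values for total sulphur dioxide, copied from Table 3.
for label, r2 in [("NIR", 0.615), ("MIR", 0.573), ("NIR-MIR", 0.765)]:
    print(f"{label}: R2 = {r2:.3f} -> {classify_r2(r2)}")
```

With these values only the combined NIR-MIR equation reaches the "good precision" band, which is the distinction the text draws between screening and quantitative determination.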

The applicability of NIRS to the most representative organic acids (namely malic, tartaric and gluconic) was limited to screening methodologies owing to the low concentration of these compounds. For this reason, these parameters are not shown in the tables. Only lactic acid could be determined, with R² values higher than 0.800 and SECV values that were acceptable, according to the criteria shown in Table 4, with respect to the SEL value in Table 2.

3.4.3. External validation

The equations were tested with the validation set (samples not used for the development of the equations). Almost all the SEP values in the external validation were within the limit value (SEP = 1.5 × SECV). Only the validation of volumic mass yielded an SEP value slightly above the limit; thus, the equations developed were robust.
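The robustness criterion can be checked directly from the tabulated errors. The sketch below is illustrative only: the helper `within_limit` is a hypothetical name, and the SECV and SEP values are those of four NIR equations in Table 3.

```python
LIMIT_FACTOR = 1.5  # criterion quoted in the text: SEP <= 1.5 * SECV

# (parameter, SECV, SEP) for four NIR equations, copied from Table 3.
nir_results = [
    ("Alcoholic degree", 0.19, 0.24),
    ("Volumic mass", 0.33, 0.54),
    ("Total acidity", 0.38, 0.48),
    ("Glycerol", 0.59, 0.72),
]


def within_limit(secv: float, sep: float, factor: float = LIMIT_FACTOR) -> bool:
    """True when the prediction error stays within factor times the SECV."""
    return sep <= factor * secv


for name, secv, sep in nir_results:
    status = "within limit" if within_limit(secv, sep) else "above limit"
    print(f"{name}: SEP/SECV = {sep / secv:.2f} ({status})")
```

Among these four rows, only volumic mass exceeds the limit.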

3.4.4. Comparison between the spectral zones employed

The determinations carried out using the NIR region generally yielded better statistical results than those using the MIR region. This behaviour is due to the low signal/noise ratio of the MIR spectra, caused both by instrumental and technical characteristics and by the high intensity of the absorption bands corresponding to the stretching vibration of the –OH group. Although NIRS was the more efficient technique, no drastic differences were found.

The combination of the two spectral zones improved the determination of glycerol and total sulphur dioxide. This combination is particularly useful for the latter, as it provides a quantitative method rather than the screening method obtained when the spectral regions are used separately. The larger dataset obtained by combining the two spectral zones (2100 variables) surpasses the results obtained from the MIR region alone.
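One plausible way to combine the two zones is to append the MIR variables of each sample to its NIR variables, so that a single calibration sees both blocks. This is an assumption about the data layout, not a description of the software actually used; the function name and the toy values below are hypothetical, and real spectra would hold far more variables (2100 in total according to the text).

```python
def combine_zones(nir_spectrum, mir_spectrum):
    """Concatenate the NIR and MIR variables of one sample into a single row."""
    return list(nir_spectrum) + list(mir_spectrum)


# Toy spectra for one sample (hypothetical absorbance values).
nir = [0.12, 0.15, 0.11]
mir = [0.40, 0.38, 0.35, 0.31]

combined = combine_zones(nir, mir)
print(len(combined), combined)
```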


4. Conclusions

The applicability of spectroscopic techniques in the near and mid infrared zones to the determination of wine parameters has been studied in this work. The spectroscopic methods developed enable the multidetermination of alcoholic degree, volumic mass, total acidity, total polyphenol index, glycerol and total sulphur dioxide, overcoming the limitations of standard and reference methods regarding time, reagent consumption, operational errors, etc.

The simplicity of the methods developed is similar for both spectral regions. The calibration stage (the development of the equations) is the limiting step in terms of time. After this step, the parameters are determined in a single analysis that takes 2 min. In addition, the equations were developed with the aim of covering a wide range of wines, for which wines from different appellations d’origine and grape varieties were used.

The study carried out shows that NIRS results were better than those obtained by FT-MIR owing to the low signal/noise ratio of the latter. In addition, the combination of both spectral zones has been studied for the first time. For two parameters, namely glycerol and total sulphur dioxide, the larger number of variables (wavelengths) available when the zones are combined overcomes the noise of the MIR spectra. The improvements achieved are the key to the quantitative determination of the latter. The equations obtained for each zone of the spectrum separately can only be used for screening.

5. Acknowledgements 




AGL2000-0321-P4-03). 

6. References 

[1] M. Valcárcel and M. D. Luque de Castro, Flow Injection Analysis: Principles and Applications, Ellis Horwood, Chichester, 1987.
[2] M. D. Luque de Castro and J. L. Luque García, Acceleration and Automation of Solid Sample Treatment, Elsevier, Amsterdam, 2002.
[3] L. Yang, V. Colombini, P. Maxwell, Z. Mester and R. E. Sturgeon, J. Chromatogr. A 1011 (2003) 135.
[4] M. Bonoli, M. Montanucci, T. G. Toschi and G. Lercker, J. Chromatogr. A 1011 (2003) 163.
[5] G. Alvarez Llamas, M. D. R. Fernández de la Campa and A. Sanz Medel, J. Anal. Atom. Spectrom. 18 (2003) 460.
[6] E. Trullols, I. Ruisánchez and F. X. Rius, Trends Anal. Chem. 23 (2004) 137.
[7] R. Muñoz-Olivas, Trends Anal. Chem. 23 (2004) 203.
[8] G. Kos, H. Lohninger and R. Krska, Anal. Chem. 75 (2003) 1211.
[9] J. L. Pérez-Pavón, M. Del Nogal-Sánchez, C. García-Pinto, M. E. Fernández Laespada, B. Moreno-Cordero and A. Guerrero-Peña, Anal. Chem. 75 (2003) 2034.
[10] E. Mataix and M. D. Luque de Castro, Analyst 123 (1998) 1547.
[11] E. Mataix and M. D. Luque de Castro, Talanta 51 (2000) 489.
[12] M. C. García-Parrilla, G. A. González, F. J. Heredia and A. M. Troncoso, J. Agric. Food Chem. 45 (1997) 3487.
[13] G. J. Martin, C. Guillou and M. L. Martin, J. Agric. Food Chem. 36 (1998) 316.
[14] M. Urbano-Cuadrado, M. D. Luque de Castro, P. M. Pérez-Juan, J. García-Olmo and M. A. Gómez-Nieto, Anal. Chim. Acta 527 (2004) 81.
[15] C. M. García-Jares and B. Medina, Fresenius' J. Anal. Chem. 357 (1997) 86.
[16] R. Eberl, J. Near Infrar. Spectrosc. 6 (1998) 133.
[17] A. K. Kupina and A. Shrikhande, Amer. J. Enol. Viticult. 54 (2003) 131.
[18] M. Dubernet and Mathieu Dubernet, Rev. Franç. D’Oenolog. 81 (2000) 10.
[19] B. G. M. Vandeginsten, D. L. Massart, S. Buydens, S. De Jong, P. J. Lewi and J. Smeyers-Verbeke, Handbook of Chemometrics and Qualimetrics: Part B, Elsevier, Amsterdam, 1998.
[20] K. H. Esbensen, Multivariate Data Analysis - in Practice, Camo Process AS, Oslo, 2002.
[21] J. S. Shenk and M. O. Westerhaus, Calibration the ISI way, in Near Infrared Spectroscopy: The Future Waves, NIR Publications, Chichester, 1996, p. 198.

Discussion of the results

The regulations in force at the University of Córdoba on the preparation of doctoral thesis reports, in the modality that allows articles published or in press to be included as such, require a section for the joint discussion of the results, which may or may not be provided depending on the homogeneity of the thesis topic.

This Report covers research on the applicability of new computing tools in analytical chemistry, a general topic that can be divided into two parts: 1) the computerisation of the analytical process and the management and analysis of the data it provides, using the object-oriented software engineering paradigm; 2) the comparison of conventional chemometric methods with others proposed in this Report, which are based on computing the similarity between test samples from so-called fingerprints built from the spectroscopic data. In addition, the possibility of improving the determination of oenological parameters by increasing the spectral information was studied, for which two regions were used: NIR and MIR.

This section has therefore been divided into two blocks: computerisation of the analytical process and data, and development and use of chemometric methods for the treatment of spectroscopic data.

1. Computerisation of the analytical process and data

The software applications designed and built in this research are the outcome of processes based on the object-oriented and spiral-development paradigms. These methodologies have yielded programs that are familiar to the analytical chemist, open to functional changes, and independent of the analyses carried out and of the data obtained.

This first part covers two strands that differ in their scope of application: the automation of the analytical process, and the management and analysis of the data.

1.1 Automation of the analytical process

1.1.1 Structural model for automation in analytical chemistry

The proposed structural model uses IUPAC nomenclature, distinguishing between device, apparatus, instrument and autoanalyser. The class hierarchy developed uses generalisation and aggregation relationships, together with the specialisation process, so that it also reflects the classifications proposed in Instrumental Analysis for instruments and in Analytical Separation Techniques for apparatus.

The concept of device centres on the physical aspects of implementing apparatus and instruments; it is not a logical entity from the standpoint of analytical automation.
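The generalisation and aggregation relationships described above can be illustrated with a few classes. This is a minimal sketch under stated assumptions: all class names and the `add` method are hypothetical and are not taken from the thesis software; peristaltic pumps and spectrophotometers are used only because the text lists them among the supported equipment.

```python
class Apparatus:
    """Root of the apparatus branch of the hierarchy."""

    def __init__(self, name: str) -> None:
        self.name = name


class Instrument:
    """Root of the instrument branch of the hierarchy."""

    def __init__(self, name: str) -> None:
        self.name = name


# Specialisation: concrete classes inherit from the general ones.
class PeristalticPump(Apparatus):
    pass


class Spectrophotometer(Instrument):
    pass


class Autoanalyser:
    """Aggregation: an autoanalyser is composed of apparatus and instruments."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.components: list = []

    def add(self, component) -> None:
        if not isinstance(component, (Apparatus, Instrument)):
            raise TypeError("component must be an Apparatus or an Instrument")
        self.components.append(component)


analyser = Autoanalyser("continuous-flow analyser")
analyser.add(PeristalticPump("pump 1"))
analyser.add(Spectrophotometer("detector 1"))
print([type(c).__name__ for c in analyser.components])
```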

1.1.2 Dynamic model for automation in analytical chemistry

The research covered in this Report proposes a model for representing an analytical process that can execute actions according to time parameters (that is, an instrument or apparatus performs a given action at a preselected moment) and according to state parameters (that is, a given action is performed depending on the value of certain variables, instrumental or otherwise, such as the absorbance value, the number of samples analysed or the temperature of a thermostat). Two modules have been proposed for this type of control: one for time control and another for the control of state variables.

The control structures used to manage the triggers cannot be regarded as an inference engine and, in addition, lack uncertainty calculation. On the positive side, the proposed dynamic model allows the construction of a system that is open to any type of rule and close to the user.
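The two kinds of trigger (time-based and state-based) can be sketched as follows. This is an illustrative sketch only: the class names `TimeTrigger` and `StateTrigger`, the polling loop and the example actions are assumptions and do not reproduce the control modules of the platform.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TimeTrigger:
    """Fires once the elapsed time reaches a preselected moment."""
    at_seconds: float
    action: Callable[[], None]
    fired: bool = False

    def check(self, elapsed: float) -> None:
        if not self.fired and elapsed >= self.at_seconds:
            self.action()
            self.fired = True


@dataclass
class StateTrigger:
    """Fires once a condition on the state variables becomes true."""
    condition: Callable[[Dict[str, float]], bool]
    action: Callable[[], None]
    fired: bool = False

    def check(self, state: Dict[str, float]) -> None:
        if not self.fired and self.condition(state):
            self.action()
            self.fired = True


log: List[str] = []
time_trigger = TimeTrigger(30.0, lambda: log.append("switch valve"))
state_trigger = StateTrigger(
    lambda s: s["absorbance"] > 0.5,
    lambda: log.append("stop pump"),
)

# Simulated run: (elapsed seconds, state variables) at each polling step.
steps = [
    (10.0, {"absorbance": 0.1}),
    (30.0, {"absorbance": 0.3}),
    (40.0, {"absorbance": 0.7}),
]
for elapsed, state in steps:
    time_trigger.check(elapsed)
    state_trigger.check(state)

print(log)
```

Each trigger is a plain rule evaluated at every step, which matches the remark that these control structures are not an inference engine.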

1.1.3 Architectural model for automation in analytical chemistry

A layered model has been developed that spans from the physical aspect of communication between the instrument and the computer to the analytical aspect of designing the configuration of the methods. Five layers were designed: drivers, communication, logic, control and design. The first two are the ones that limit the full scalability and independence of the automation system, since they must either be supplied by manufacturers or, failing that, be developed for the different operating systems. If the latter is achieved, the system can be regarded as a standard automation system.

In this thesis the first two layers have been developed for a total of 10 instruments and apparatus (including peristaltic pumps, autosamplers, spectrophotometers, spectrofluorimeters, automatic valves, pH meters, etc.).

The last layer, the design layer, is an application for specifying control systems in automation. It is close to the language of the analytical chemist and does not require the user to have any knowledge of algorithms and data structures, unlike other commonly used environments described in the literature.

1.1.4 Tests of the automation system

The system was tested through the full automation of the method for determining volatile acidity in wines proposed by one of the research groups in which the thesis was carried out. Having a fully automated system brought the advantages of convenience for the user and reduced costs. In addition, quantitative improvements in accuracy and precision were obtained by eliminating human participation in the analysis.

Another test application for the automation system was the development of the method for measuring the oxidase activity of laccase in wines. The versatility provided by the system was confirmed in the study of the variables affecting the analytical signal (design of experiments for optimisation) and in the characterisation of the method. Both the instrumental configuration and the data acquisition and treatment differed from those of the volatile acidity method.

1.2 Automation of the management and analysis of analytical data

A system has been developed for the management and analysis of the data generated in analytical laboratories belonging to companies that produce or process products whose quality is based on the value of certain parameters. The model and technology used to build the system give it a configuration that is open to any production process, allow remote access to the information and make it independent of the platform on which it is installed.

The system developed supports decision-making in the oenological sector and comprises two tools: 1) an application for managing the analytical data from winemaking, together with the information concerning the laboratory; 2) a Web application through which technical staff extract information from the historical data corresponding to previous production cycles or the current one. The first application is called the operational subsystem and the second the analysis subsystem.

The operational application establishes a unified workflow in the laboratory, since a series of sequential steps must be carried out: sample identification, parameter loading, entry and validation of results, and entry of the data into the historical record. It also makes it possible to know the laboratory workload in real time, manage the instruments automatically, etc.

The information analysis application provides the evolution, in graphical or tabular format, of one or more oenological parameters according to given search criteria. A statistical study of the data can also be carried out (for example, the increase in a parameter or the standard deviation per day).
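The per-day statistics mentioned above (increase of a parameter and standard deviation per day) can be sketched as follows. The records, dates and variable names are hypothetical and serve only to show the calculation; they are not data from the LIMS.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical historical records: (day, value of one oenological parameter).
records = [
    ("2004-09-01", 3.40), ("2004-09-01", 3.44),
    ("2004-09-02", 3.50), ("2004-09-02", 3.58),
    ("2004-09-03", 3.61), ("2004-09-03", 3.59),
]

# Group the values by day.
by_day = defaultdict(list)
for day, value in records:
    by_day[day].append(value)

days = sorted(by_day)
daily_mean = {day: mean(by_day[day]) for day in days}
daily_std = {day: stdev(by_day[day]) for day in days}

# Increase of the daily mean with respect to the previous day.
daily_increase = {
    day: daily_mean[day] - daily_mean[prev]
    for prev, day in zip(days, days[1:])
}

for day in days:
    print(day, round(daily_mean[day], 3), round(daily_std[day], 3),
          round(daily_increase.get(day, 0.0), 3))
```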

2. Chemometric methods for the treatment of spectroscopic data

In the second part of the thesis, the ultraviolet, visible, near-infrared and mid-infrared spectral zones were used, on the one hand, for the differentiation and classification of samples and, on the other, for the determination of chemical parameters, through the application of several chemometric methods. The field of application of the proposed methods was oenology.

The categorical variables used in the differentiation and classification of wine samples were the zone of origin of the samples within a single appellation of origin, the grape variety used and the ageing process.

The equations for the determination of quantitative parameters covered most of the oenological parameters quantified in wine. The determination of 16 quantitative properties by spectroscopy with multichannel detection was studied, encompassing all the routine parameters in wines from different appellations of origin.

2.1 Qualitative analysis

2.1.1 Ultraviolet-visible spectroscopy and classical pattern recognition methods

Principal component analysis (PCA) and soft independent modelling of class analogies (SIMCA) were used for exploratory analysis and for the development of classification rules, respectively. The ultraviolet-visible spectra obtained made it possible to discriminate white and red wines of different origin within a single appellation of origin.

A second-order discrimination was thus achieved. This required spectroscopic data providing more chemical information, which were obtained from the 300-400 nm ultraviolet zone, not normally used for these purposes. This region was key to differentiating the origin of the wines, the main discriminating compounds being the esters of hydroxycinnamic acids, which absorb in this region. The errors in the validation set for the models developed were 10% and 20% for white and red wines, respectively.

The models proposed for the classification of red wines according to grape variety and ageing process showed a 25% prediction error. In these models the visible region has a greater discriminating capacity than that shown for the first criterion, which indicates that the content of anthocyanins produced by the two grape varieties, and their evolution during the ageing process, are significantly different. Once again, the ultraviolet region showed a high differentiation capacity.

2.1.2 Mid-infrared spectroscopy and preprocessing methods based on fingerprints and similarity calculation

A method has been proposed for the treatment of the mid-infrared spectra obtained for different types of wine. The low selectivity of this spectral zone was responsible for the failure to discriminate wines according to the criteria of origin, variety and ageing.

The method builds a fingerprint for each sample in two steps: normalisation (to obtain absorbance values in the range 0-1) and assignment of one or zero to each variable (wavelength considered) according to whether or not it exceeds a threshold value, which was studied at 0.2, 0.4, 0.6 and 0.8.

Once the fingerprints had been built, a similarity matrix was generated (based on the Tanimoto index), which holds, for each pair of samples, a value reflecting their resemblance. PCA was applied to the similarity matrix to visualise possible trends in the grouping of samples. The discrimination achieved was greater than that obtained with the MIR spectra without the proposed treatment, with threshold values of 0.4 and 0.6 giving the best results.
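The fingerprint construction and the Tanimoto-based similarity matrix can be sketched as follows. This is a minimal sketch under stated assumptions: min-max scaling is assumed for the normalisation step (the text only states that absorbances are brought into the range 0-1), the function names are hypothetical, and the three toy spectra are invented. The subsequent PCA on the similarity matrix is not shown.

```python
def normalise(spectrum):
    """Min-max scale absorbances to the range 0-1 (assumed normalisation)."""
    low, high = min(spectrum), max(spectrum)
    if high == low:
        return [0.0 for _ in spectrum]
    return [(a - low) / (high - low) for a in spectrum]


def fingerprint(spectrum, threshold):
    """Assign 1 to each variable whose normalised value exceeds the threshold."""
    return [1 if a > threshold else 0 for a in normalise(spectrum)]


def tanimoto(fp_a, fp_b):
    """Tanimoto index: bits set in both fingerprints / bits set in either."""
    both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)
    return both / either if either else 1.0


def similarity_matrix(spectra, threshold):
    """Pairwise Tanimoto similarities between the fingerprints of all samples."""
    fps = [fingerprint(s, threshold) for s in spectra]
    return [[tanimoto(a, b) for b in fps] for a in fps]


# Hypothetical absorbances at five wavenumbers for three samples.
spectra = [
    [0.10, 0.50, 0.90, 0.30, 0.20],
    [0.12, 0.55, 0.85, 0.35, 0.15],
    [0.80, 0.20, 0.10, 0.70, 0.90],
]
matrix = similarity_matrix(spectra, threshold=0.4)
for row in matrix:
    print([round(v, 2) for v in row])
```

Repeating the call with thresholds of 0.2, 0.4, 0.6 and 0.8 reproduces the kind of threshold study described in the text.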

Obtaining a reference fingerprint and modifying the calculation of the similarity matrix, taking into account both the fingerprint and a new method of counting the significant bits, provided better discrimination between groups of samples.

A SIMCA analysis was performed to study the predictive capacity of the models; the lowest error values were obtained for the classification of samples by variety and by whether or not they had undergone an ageing process (9 and 5%, respectively). The error in the classification of samples of different origin was 22%.

The quality of the reference fingerprints and the predictive capacity of the SIMCA models built require a larger number of samples, so the errors correspond to a first exploratory study of the capacity of the proposed method.

2.2 Quantitative analysis

2.2.1 NIR spectroscopy and multivariate calibration

The potential of NIR spectroscopy in the oenological field was studied in order to extend the applications reported in the literature. A first objective was thus the development of general-purpose PLS equations for different types of wine, including white, rosé and red wines, of different grape varieties, from different appellations of origin, etc.

A second objective was to extend the number of parameters studied to 16 (namely alcoholic degree, volumic mass, total acidity, pH, volatile acidity, glycerol, total polyphenol index (TPI), reducing sugars, colour, tonality, total and free sulphur dioxide, and tartaric, lactic, malic and gluconic acids).

The corresponding calibration (by cross-validation) and validation stages were carried out, yielding a series of statistical parameters that were analysed according to different criteria and standards. Alcoholic degree, volumic mass, total acidity, pH, glycerol, colour, lactic acid, tonality and TPI were thus determined accurately according to the R² and SECV (standard error of cross-validation) values. Near-infrared spectroscopy can only be used for screening methods for total sulphur dioxide, reducing sugars, volatile acidity and the remaining organic acids. Free sulphur dioxide cannot be determined by NIRS.

Comparison of the error given by the SECV with that obtained in the validation of the equations developed, the SEP (standard error of prediction), led to the conclusion that most of the equations developed are robust (except those for volumic mass and tartaric acid, which showed differences slightly above the limit allowed by the criterion SEP ≤ 1.5 × SECV). It was also verified that the slope and the bias of the correlation between the reference method and the NIRS method do not differ statistically from 1 and 0, respectively.

2.2.2 Study of the separate and joint use of visible-NIR spectroscopy and FT-MIR spectroscopy in the determination of oenological parameters

The visible region and both zones of the infrared spectrum (400-2500 nm for the visible and NIR, and 3000-800 cm⁻¹ for the MIR) were used for the determination of several parameters in wines of different variety, origin and type. The parameters determined were alcoholic degree, volumic mass, glycerol, total acidity, total polyphenol index, lactic acid and total sulphur dioxide.

In most cases NIR spectroscopy gave better statistical results than FT-MIR spectroscopy (although no significant differences were found). One cause of this behaviour lies in the nature of the absorption in the two zones: the high intensity of the absorption corresponding to the stretching vibration of the –OH group gives rise to a low signal/noise ratio.

The combination of both regions improved the determination of two parameters: total sulphur dioxide and glycerol. For total sulphur dioxide, the statistical parameters obtained allow the equation to be used for quantitative determination and not only for screening.

Conclusions

The most notable conclusions of the research carried out in the doctoral thesis and set out in this Report are presented below.

A) Computational tools have been developed for the automation of the analytical process and the analysis of the information that this process generates. The software solutions built are open with respect to the analytical problem, highly scalable for future extensions, and largely independent of instruments and apparatus, operating systems and computer hardware. These tools are the following:

A.1) A platform for the automation of analytical processes, consisting of two subsystems: 1) a subsystem that allows the analytical chemist to design automated methods (structure and behaviour of the autoanalyser) and store them in XML format, and 2) a subsystem that, from the data of the design phase, enables the analyses to be executed in a fully automated way. The automation of two continuous methods for the determination of oenological parameters served as the test cases for evaluating the platform.

A.2) A LIMS (Laboratory Information Management System) for the management and analysis of the data in an analytical laboratory, consisting of two subsystems with markedly specialised functionality: 1) a subsystem for managing daily work in the analytical laboratory, which allows the organisation and planning of samples, parameters to be measured, instruments and reagents, users and alerts, and 2) a subsystem for analysing the historical data stored by the organisation (or laboratory), which allows queries according to pre-established or ad hoc criteria in order to extract information that supports decision-making.

B) New chemometric methods have been proposed for the analysis of the large amount of spectral information provided by current equipment. Ultraviolet-visible, near-infrared and Fourier-transform mid-infrared spectroscopy were used. The following conclusions have been drawn in the oenological field, at both the qualitative and the quantitative level:

B.1) Wine classification models have been developed from ultraviolet and visible spectra. These models allow wines to be classified according to the zone of origin within an appellation of origin (second-order classification), the type of grape used and the production process.

B.2) A computational method has been developed for the analysis of spectroscopic information, based on the construction of fingerprints, the calculation of similarities and the scalability of the similarity measures to increase the level of characterisation.

B.3) The above method has been applied to the classification of wines using the mid-infrared spectrum according to the three criteria cited in point B.1. Compared with the use of the untreated data, significant improvements in sample classification were obtained by applying SIMCA modelling to the similarity matrices based on the spectral fingerprints and on the scalability method for calculating spectral similarity.

B.4) PLS equations have been built for the determination and screening of 15 oenological parameters using the mid-infrared spectrum. These are global equations, developed using wines of different grape types and appellations of origin.

B.5) The results obtained with the NIR and MIR zones have been compared, and the possibility of using the combination of both regions to improve the statistics of the quantitative determinations has been studied. The NIR zone offered a better determination capacity, which was improved for two analytical parameters by the joint use of the NIR and MIR spectra.
