JIST
Vol. 51, No. 4
July/August 2007
Journal of Imaging Science and Technology
imaging.org
Society for Imaging Science and Technology
Editorial Staff
Melville Sahyun, editor
sahyun@infionline.net
Donna Smith, production manager
dsmith@imaging.org

Editorial Board
Philip Laplante, associate editor
Michael Lee, associate editor
Nathan Moroney, associate editor
Mitchell Rosen, color science editor
David S. Weiss, associate editor
David R. Whitcomb, associate editor
JIST papers are available for purchase at www.imaging.org and through ProQuest. They are indexed in INSPEC, Chemical Abstracts, Imaging Abstracts, COMPENDEX, and ISI: Science Citation Index.

Orders for subscriptions or single copies, claims for missing numbers, and notices of change of address should be sent to IS&T via one of the means listed below.

IS&T is not responsible for the accuracy of statements made by authors and does not necessarily subscribe to their views.
Copyright ©2007, Society for Imaging Science and Technology. Copying of materials in this journal for internal or personal use, or the internal or personal use of specific clients, beyond the fair use provisions granted by the US Copyright Law is authorized by IS&T subject to payment of copying fees. The Transactional Reporting Service base fee for this journal should be paid directly to the Copyright Clearance Center (CCC), Customer Service, 508/750-8400, 222 Rosewood Dr., Danvers, MA 01923, or online at www.copyright.com. Other copying for republication, resale, advertising or promotion, or any form of systematic or multiple reproduction of any material in this journal is prohibited except with permission of the publisher.

Library of Congress Catalog Card No. 59-52172

Printed in the USA.
Society for Imaging Science and Technology
7003 Kilworth Lane
Springfield, VA 22151
www.imaging.org
info@imaging.org
703/642-9090
703/642-9094 fax
Manuscripts should be sent to the postal address above as described at right. E-mail PDF and other files as requested to dsmith@imaging.org.
Guide for Authors

Scope: The Journal of Imaging Science and Technology (JIST) is dedicated to the advancement of imaging science knowledge, the practical applications of such knowledge, and how imaging science relates to other fields of study. The pages of this journal are open to reports of new theoretical or experimental results, and to comprehensive reviews. Only original manuscripts that have been neither previously published nor submitted for publication elsewhere should be submitted. Prior publication does not refer to conference abstracts, paper summaries, or non-reviewed proceedings, but it is expected that Journal articles will expand in scope the presentation of such preliminary communications. Please include keywords on your title and abstract page.
Editorial Process/Submission of Papers for Review: All submitted manuscripts are subject to peer review. (If a manuscript appears better suited to publication in the Journal of Electronic Imaging, published jointly by IS&T and SPIE, the editor will make this recommendation.) To expedite the peer review process, please recommend two or three competent, independent reviewers. The editorial staff will take these under consideration but is not obligated to use them.
Manuscript Guidelines: Please follow these guidelines when preparing accepted manuscripts for submission.

• Manuscripts should be double-spaced, single-column, and numbered. It is the responsibility of the author to prepare a succinct, well-written paper composed in proper English. JIST generally follows the guidelines found in the AIP Style Manual, available from the American Institute of Physics.
• Documents may be created in Microsoft Word, WordPerfect, or LaTeX/REVTeX.
• Manuscripts must contain a title page that lists the paper title, full name(s) of the author(s), and complete affiliation/address for each author. Include an abstract that summarizes objectives, methodology, results, and their significance; 150 words maximum. Provide at least four key words.
• Figures should conform to the standards set forth at www.aip.org/epub/submitgraph.html.
• Equations should be numbered sequentially with Arabic numerals in parentheses at the right margin. Be sure to define symbols that might be confused (such as ell/one, nu/vee, omega/w).
• For symbols, units, and abbreviations, use SI units (and their standard abbreviations) and metric numbers. Symbols, acronyms, etc., should be defined on their first occurrence.
• Illustrations: Number all figures, graphs, etc. consecutively and provide captions. Figures should be created in such a way that they remain legible when reduced, usually to single column width (3.3 inches/8.4 cm); see also www.aip.org/epub/submitgraph.html for guidance. Illustrations must be submitted as .tif or .eps files at full size and 600 dpi; grayscale and color images should be at 300 dpi. JIST does not accept .gif or .jpeg files. Original hardcopy graphics may be sent for processing by AIP, the production house for JIST. (See note below on color and supplemental illustrations.)
• References should be numbered sequentially as citations appear in the text, formatted as superscripts, and listed at the end of the document using the following formats:
• Journal articles: Author(s) [first/middle name/initial(s), last name], "title of article (optional)," journal name (in italics), ISSN number (e.g., for a JIST citation, ISSN: 1062-3701), volume (bold): first page number, year (in parentheses).
• Books: Author(s) [first/middle name/initial(s), last name], title (in italics), (publisher, city, and year in parentheses) page reference. Conference proceedings are normally cited in the Book format, including publisher and city of publication (Springfield, VA, for all IS&T conferences), which is often different from the conference venue.
• Examples
1. H. P. Le, Progress and trends in ink-jet printing technology, J. Imaging Sci. Technol. 42, 46 (1998).
2. E. M. Williams, The Physics and Technology of Xerographic Processes (John Wiley and Sons, New York, 1984) p. 30.
3. Gary K. Starkweather, "Printing technologies for images, gray scale, and color," Proc. SPIE 1458: 120 (1991).
4. Linda T. Creagh, "Applications in commercial printing for hot melt ink-jets," Proc. IS&T's 10th Int'l. Congress on Adv. in Non-Impact Printing Technologies (IS&T, Springfield, VA, 1994) pp. 446–448.
5. ISO 13655-1996 Graphic technology: Spectral measurement and colorimetric computation for graphic arts images (ISO, Geneva), www.iso.org.
6. Society for Imaging Science and Technology website, www.imaging.org, accessed October 2003.
Reproduction of Color: Authors who wish to have color figures published in the print journal will incur color printing charges. The cost for reproducing color illustrations is $490 per page; color is not available to those given page waivers, nor can color page charges be negotiated or waived. Authors may also choose to have their figures appear in color online and in grayscale in the printed journal. There is no additional charge for this; however, those who choose this option are responsible for ensuring that the captions and descriptions in the text are readable in both color and black-and-white, as the same file will be used in the online and print versions of the journal. Only figures saved as TIFF/TIF or EPS files will be accepted for posting. Color illustrations may also be submitted as supplemental material for posting on the IS&T website for a flat fee of $100 for up to five files.
Website Posting of Supplemental Materials: Authors may also submit additional (supplemental) materials related to their articles for posting on the IS&T website. Examples of such materials are charts, graphs, illustrations, or movies that further explain the science or technology discussed in the paper. Supplemental materials will be posted for a flat fee of $100 for up to five files. For each additional file, a $25 fee will be charged. Fees must be received before supplemental materials will be posted. As a matter of editorial policy, appendices are normally treated as supplemental material.
Submission of Accepted Manuscripts: Author(s) will receive notification of acceptance (or rejection) and reviewers' reports. Those whose manuscripts have been accepted for publication will receive correspondence informing them of the issue for which the paper is tentatively scheduled, links to copyright and page charge forms, and detailed instructions for submitting accepted manuscripts. A duly signed transfer of copyright agreement form is required for publication in this journal. No claim is made to original US Government works.
Page charges: Page charges for the Journal are $80 per printed page. Such payment is not a condition for publication, and in some circumstances page charges are waived. Requests for waivers must be made in writing to the managing editor prior to acceptance of the paper and at the time of submission.
Manuscript submissions: Manuscripts should be submitted both electronically and as hardcopy. To submit electronically, send a single PDF file attached to an e-mail message/cover letter to jist@imaging.org. To submit hardcopy, mail two single-spaced, single-sided copies of the manuscript to IS&T. With both types of submission, include a cover letter that states the paper title; lists all authors, with complete contact information for each (affiliation, full address, phone, fax, and e-mail); identifies the corresponding author; and notes any special requests. Unless otherwise stated, submission of a manuscript will be understood to mean that the paper has been neither copyrighted, classified, nor published, and is not being considered for publication elsewhere. Authors of papers published in the Journal of Imaging Science and Technology are jointly responsible for their content. Credit for the content and responsibility for errors or fraud are borne equally by all authors.
JOURNAL OF IMAGING SCIENCE AND TECHNOLOGY (ISSN: 1062-3701) is published bimonthly by The Society for Imaging Science and Technology, 7003 Kilworth Lane, Springfield, VA 22151. Periodicals postage paid at Springfield, VA, and at additional mailing offices. Printed in Virginia, USA.

Society members may receive this journal as part of their membership. Forty-five dollars ($45.00) of membership dues are allocated to this subscription. IS&T members may refuse this subscription by written request. Domestic institution and individual nonmember subscriptions are $195/year or $50/single copy. The foreign subscription rate is $205/year. For online version information, contact IS&T.

POSTMASTER: Send address changes to JOURNAL OF IMAGING SCIENCE AND TECHNOLOGY, 7003 Kilworth Lane, Springfield, VA 22151.
Feature Article
283 Improved Calibration of Optical Characteristics of Paper by an Adapted Paper-MTF Model
    Safer Mourad

General Papers
293 Gloss Granularity of Electrophotographic Prints
    J. S. Arney, Ling Ye, Eric Maggard, and Brian Renstrom
299 Forensic Examination of Laser Printers and Photocopiers Using Digital Image Analysis to Assess Print Characteristics
    J. S. Tchan
310 Moiré Analysis for Assessment of Line Registration Quality
    Nathir A. Rawashdeh, Daniel L. Lau, Kevin D. Donohue, and Shaun T. Love
317 Analysis of the Influence of Vertical Disparities Arising in Toed-in Stereoscopic Cameras
    Robert S. Allison
328 Improved B-Spline Contour Fitting Using Genetic Algorithm for the Segmentation of Dental Computerized Tomography Image Sequences
    Xiaoling Wu, Hui Gao, Hoon Heo, Oksam Chae, Jinsung Cho, Sungyoung Lee, and Young-Koo Lee
337 Colorimetric Characterization Model for Plasma Display Panel
    Seo Young Choi, Ming Ronnier Luo, Peter Andrew Rhodes, Eun Gi Heo, and Im Su Choi
348 Real-Time Color Matching Between Camera and LCD Based on 16-bit Lookup Table Design in Mobile Phone
    Chang-Hwan Son, Cheol-Hee Lee, Kil-Houm Park, and Yeong-Ho Ha
360 Solving Under-Determined Models in Linear Spectral Unmixing of Satellite Images: Mix-Unmix Concept (Advance Report)
    Thomas G. Ngigi and Ryutaro Tateishi
IS&T BOARD OF DIRECTORS

President
Eric G. Hanson
Department Manager, Hewlett Packard Company

Immediate Past President
James R. Milch
Director, Research & Innovation Labs., Carestream Health, Inc.

Executive Vice President
Rita Hofmann
Chemist, R&D Manager, Ilford Imaging Switzerland GmbH
368 Color Shift Model-Based Segmentation and Fusion for Digital Autofocusing
    Vivek Maik, Dohee Cho, Jeongho Shin, Donghwan Har, and Joonki Paik
380 Error Spreading Control in Image Steganographic Embedding Schemes Using Unequal Error Protection
    Ching-Nung Yang, Guo-Jau Chen, Tse-Shih Chen, and Rastislav Lukac
386 In Situ X-ray Investigation of the Formation of Metallic Silver Phases During the Thermal Decomposition of Silver Behenate and Thermal Development of Photothermographic Films
    B. B. Bokhonov, M. R. Sharafutdinov, B. P. Tolochko, L. P. Burleva, and D. R. Whitcomb
Conference Vice President
Robert R. Buckley
Research Fellow, Xerox Corporation

Publication Vice President
Franziska Frey
Assist. Prof., School of Print Media, Rochester Institute of Technology

Secretary
Ramon Borrell
Technology Strategy Director, Hewlett Packard Company

Treasurer
Peter D. Burns
Principal Scientist, Carestream Health, Inc.

Vice Presidents
Choon-Woo Kim, Inha University
Laura Kitzmann, Marketing Dev. & Comm. Manager, Sensient Imaging Technologies, Inc.
Michael A. Kriss, MAK Consultants
Ross N. Mills, CTO & Chairman, imaging Technology international
IS&T Conference Calendar
For details and a complete listing of conferences, visit www.imaging.org

Digital Fabrication Processes Conference
September 16–20, 2007
Anchorage, Alaska
General chair: Ross Mills

NIP23: The 23rd International Congress on Digital Printing Technologies
September 16–20, 2007
Anchorage, Alaska
General chair: Ramon Borrell

IS&T/SID's Fifteenth Color Imaging Conference, cosponsored by SID
November 5–9, 2007
Albuquerque, New Mexico
General chairs: Jan Morovic and Charles Poynton

Electronic Imaging: IS&T/SPIE 20th Annual Symposium
January 26–31, 2008
San Jose, California
General chairs: Nitin Sampat

CGIV 2008: The Fourth European Conference on Color in Graphics, Image and Vision
June 10–13, 2008
Terrassa, Spain
General chair: Jaume Pujol

Archiving 2008
June 24–27, 2008
Bern, Switzerland
General chair: Rudolf Gschwind
Jin Mizuguchi, Professor, Yokohama National Univ.
David Weiss, Scientist Fellow, Eastman Kodak Company

Chapter Directors
Franziska Frey – Rochester
Patrick Herzog – Europe
Takashi Kitamura – Japan

Executive Director
Suzanne E. Grinnan, IS&T Executive Director
Journal of Imaging Science and Technology® 51(4): 283–292, 2007.
© Society for Imaging Science and Technology 2007

Improved Calibration of Optical Characteristics of Paper by an Adapted Paper-MTF Model

Safer Mourad
Empa, Swiss Federal Laboratory for Materials Testing and Research, Laboratory for Media Technology, Dübendorf, Switzerland
E-mail: safer.mourad@empa.ch
Abstract. The calibration of color printers is highly influenced by optical scattering. Light scattered at the microscopic level within printed papers induces a blurring phenomenon that affects the linearity of the tone reproduction curve. The induced nonlinearity is known as optical dot gain. Engeldrum and Pridham analyzed its impact on printing, using Oittinen's light scattering model. They determined the scattering and absorption coefficients based on spectral measurements of solid patches only. Their calibration achieves good independence of any printing irregularities. However, the microscopic knife-edge measurements of Arney et al. showed that the model overestimates the influence of the absorption coefficient. Unlike Oittinen's model, we directly approach the laterally scattered light fluxes. This is achieved by an extended three-dimensional Kubelka-Munk model. We describe how to determine our coefficients using measurements of mere solid patches, which allows us to decouple the optical dot gain from other printing influences. Our improved model successfully corrects the observed overestimation and is able to predict Arney's microscopic measurements. © 2007 Society for Imaging Science and Technology.

DOI: 10.2352/J.ImagingSci.Technol.(2007)51:4(283)
INTRODUCTION
The appearance of halftone images is determined by optical scattering in paper, which greatly affects the calibration of color printers. Optical scattering is the reason for the nonlinearity of the tone reproduction curve known as optical dot gain. This paper describes an improved and easy characterization method for optical dot gain.

Engeldrum and Pridham[1] analyzed the effects of optical dot gain on printing using Oittinen's scattering model,[2] which estimates the lateral extent of the scattering effect based on the Kubelka-Munk analysis of mere vertical radiant fluxes originally proposed for uniform paints.[3] Engeldrum and Pridham estimated the values of the classical Kubelka-Munk fitting coefficients of light absorption and scattering by spectral measurements of printed solid patches, i.e., fulltone single-color patches printed with full area coverage. They applied the coefficients obtained to Oittinen's model and predicted the optical lateral point spread function. Using simple solid patches offers the significant advantage of attaining calibration independent of any halftone printing irregularities.
IS&T Member. Received Dec. 23, 2005; accepted for publication Jan. 28, 2007. 1062-3701/2007/51(4)/283/10/$20.00.

However, Arney et al. compared microscopic knife-edge measurements of different paper substrates with the numerical results of Oittinen and Engeldrum's model and observed an overestimated influence of the absorption coefficient on the width of the predicted spread function.[4,5]

In contrast to Oittinen's model, our approach intrinsically accounts for scattered lateral light fluxes.[6,7] Our concept extends the classical Kubelka-Munk model to three-dimensional space and analyzes the balances of diffuse light fluxes across the six faces of an elementary paper volume cube. Since the lateral fluxes are explicitly considered, the extended model improves the discrimination between the scattering and absorption coefficients. After a brief review of the related background, we introduce the model used as a mathematical tool and demonstrate how to determine its fitting coefficients without using any microscopic device. We then present a few application results and compare them with the findings of Arney et al.[5]
BACKGROUND
Nowadays, color printers are calibrated based on measurements of color patches that encompass the whole color gamut. Usually the patches are arranged within standardized color wedges and printed according to a known input set of device-dependent sampling points. Together with the measured colors, the values of the sampling points constitute the printer's tone transfer function. Common color management systems require these transfer functions in a tabulated form called the calibration profile. In practice, the tone transfer function of printers is highly nonlinear and needs to be measured at a large number of printed color sampling points, typically at least one thousand. The measured color response deviations from linearity are referred to as dot gain and are induced by two distinct effects. The first effect is called mechanical or physical dot gain and arises from the nonlinear response of the reproduction process. It leads, e.g., to differences between the intended dot size and the dot size actually printed. The second effect is the optical dot gain,[8] also called the Yule-Nielsen effect,[9] and is generally caused by light scattered within the paper substrate, which microscopically blurs the printed halftone. Both dot gains have similar influences on the printed halftone images and hence on the tone reproduction curve. Therefore, it is hard to characterize their distinct impact by means of mere macroscopic reflectance measurements of reproduced color wedges. However, controlling and improving the image quality of color prints ultimately requires an analytical understanding of both effects in detail. In this publication we consider only the optical dot gain.
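The effect on the tone reproduction curve can be made concrete with the classical Murray-Davies and Yule-Nielsen relations, which are not the author's model but a common first-order description of optical dot gain; the ink and paper reflectances and the n-value below are illustrative assumptions:

```python
import numpy as np

def murray_davies(a, r_paper=1.0, r_ink=0.05):
    """Ideal halftone reflectance (no optical gain) at dot-area coverage a."""
    return a * r_ink + (1.0 - a) * r_paper

def yule_nielsen(a, r_paper=1.0, r_ink=0.05, n=2.0):
    """Yule-Nielsen-corrected reflectance; n > 1 models optical dot gain."""
    return (a * r_ink**(1.0 / n) + (1.0 - a) * r_paper**(1.0 / n))**n

a = np.linspace(0.0, 1.0, 11)      # nominal dot-area coverage
r_ideal = murray_davies(a)
r_yn = yule_nielsen(a)

# Effective dot area inferred by inverting Murray-Davies; the excess
# over the nominal coverage is the optical dot gain, which vanishes
# for bare paper (a = 0) and solid ink (a = 1) and peaks in midtones.
a_eff = (1.0 - r_yn) / (1.0 - 0.05)
optical_gain = a_eff - a
```

The midtone patches come out darker than the ideal prediction, which is why the measured tone transfer function deviates from linearity even for a geometrically perfect halftone.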
In order to assess the optical effect, several authors have proposed and used direct microanalytical measurements.[5,10–13] These approaches have a microscopic device in common, which projects a focused image of either a knife edge,[10] an isolated illumination point,[12] or a sinusoidal pattern[11] on top of the paper. Using a microspectrophotometer allows the visible optical effect of the scattered light to be characterized by spectral measurements of the spatial distribution of the reflected light. However, use of such a microscopic device is not always affordable, especially from the perspective of printer manufacturers or graphic arts bureaux. We propose to simplify these efforts by introducing a calibration technique based on our light scattering and color halftone model.[6,7]
The analysis of the optical properties of paper can be considered as a special application of light scattering in turbid media.[14] From a fundamental point of view, light scattering can be derived from Maxwell's equations; see, for instance, Ishimaru.[15] Concerning this approach, however, the same author also states: "... its drawback is the mathematical complexities involved, and its usefulness is limited." [Ref. 16, p. 2210]. On the other hand, transport theory directly models the transport of radiant power through turbid media. Because of its experimental conformity, transport theory is preferred in a large number of applications. The pragmatic success is thereby emphasized and the approximating character of the solutions is accepted, which is especially the case for the simple and popular variants like the Kubelka-Munk two-flux theory; see, e.g., Ref. 17. In this tradition our color prediction model follows an engineering approach for printing applications, particularly in connection with the control and calibration of color halftone printers, where "engineering" stands for a balance between simplicity of use and accuracy of prediction.
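For reference, the classical two-flux theory relates the absorption-to-scattering ratio K/S to the reflectance of an opaque layer in closed form. The sketch below shows the standard one-dimensional relations only (not the paper's three-dimensional extension):

```python
import math

def km_reflectance_inf(K, S):
    """Kubelka-Munk reflectance of an opaque (infinitely thick) layer.
    K: absorption coefficient, S: scattering coefficient (same units)."""
    q = K / S
    return 1.0 + q - math.sqrt(q * q + 2.0 * q)

def km_KS_from_R(R):
    """Inverse relation: the K/S ratio recovered from a measured
    reflectance R of an opaque layer."""
    return (1.0 - R) ** 2 / (2.0 * R)
```

This inverse is what makes calibration from simple reflectance measurements attractive: a single spectral measurement of a solid patch yields a K/S value per wavelength.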
Other halftone color prediction models are a continuing subject of past and current investigations. Emmel,[18,19] for example, presents a recent survey and introduces a novel mathematical framework for spectral predictions of color halftone prints. The framework uses a global analytical approach based on matrix algebra that unifies most of the classical color prediction models. It is an efficient and intuitive model employing overall probabilities of photons entering and emerging through particular inking levels. However, the probabilistic description[20] used is "taken throughout the full sample area,"[19] which leads to a halftone-independent model. This makes it necessary to recalibrate the model's coefficients for every different halftone technique used for printing.
As an alternative to Emmel’s approach, our proposed<br />
halftone prediction model 7 is particularly intended to meet<br />
the requirements <strong>of</strong> a halftone-dependent characterization <strong>of</strong><br />
digital printing devices. In order to be adaptable to arbitrary<br />
Figure 1. Diagram <strong>of</strong> an upper paper section with a light path scattered<br />
between the entry point x,y and the exit point x,y.<br />
halftone schemes we chose a numerical convolution approach<br />
using a separate optical modulation transfer function<br />
(MTF). The MTF model is founded on a three-dimensional<br />
extension <strong>of</strong> the Kubelka-Munk approach derived by analyzing<br />
multivariate partial differential equations with common<br />
computer algebra systems. 21 The derived extension approximates<br />
the scattered lateral light within semi-isotropic substrates.<br />
Although the one-dimensional Kubelka-Munk<br />
theory has methodological weaknesses, 17,22 the inaccuracy <strong>of</strong><br />
the predictions <strong>for</strong> the applications considered is limited.<br />
Moreover, it can be expected that the three-dimensional extension<br />
also increases the prediction accuracy. Like the underlying<br />
theory, the current approach relates the light propagation<br />
characteristics to a few substrate-dependent scattering<br />
and absorption coefficients. Specular and internal reflections<br />
at the interfaces and transmittances are considered as<br />
boundary conditions. Furthermore, the scattering concept<br />
can easily be extended to brightened fluorescent media as<br />
shown in Ref. 7.<br />
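Since the derivation builds on the classical Kubelka-Munk relations, a minimal numerical sketch may be helpful. The function below evaluates the standard one-dimensional KM reflectance of a sheet of thickness D over a backing of reflectance R_g (the textbook form, not the three-dimensional extension of Ref. 7; all parameter values are illustrative only):

```python
import math

def km_reflectance(K, S, D, R_g):
    """Classical one-dimensional Kubelka-Munk reflectance of a layer of
    thickness D (absorption K, scattering S) over a backing R_g."""
    a = (K + S) / S
    b = math.sqrt(a * a - 1.0)
    coth = 1.0 / math.tanh(b * S * D)
    return (1.0 - R_g * (a - b * coth)) / (a - R_g + b * coth)

# Illustrative coefficients: reflectance over a black vs. a white backing.
R_black = km_reflectance(K=1.0, S=10.0, D=0.1, R_g=0.0)
R_white = km_reflectance(K=1.0, S=10.0, D=0.1, R_g=0.9)
```

A convenient self-test of the formula is that for R_g equal to the infinite-thickness reflectance a − b it returns a − b again.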
MTF-BASED SPATIAL-SPECTRAL HALFTONE<br />
PREDICTION MODEL<br />
In this section we describe the model used <strong>for</strong> light propagation<br />
in printed paper without elaborating the mathematical<br />
derivation. We are especially interested in an optical halftone model as a function of several easily predictable parameters, such as scattering or partial reflection coefficients.
The basis of the proposed approach7 is to relate the reflected (reemitted) image and the local impact of the spread light to what is known as the point spread function (PSF) of paper.10,23 The PSF models the scattered light intensity by the probability density h_R(x − x′, y − y′) of a photon that enters the substrate at location (x′, y′) to exit at (x, y); see Figure 1. Let τ(x, y) be the inner transmittance of the print layer at location (x, y), where τ(x, y) = τ_ink if (x, y) is covered by ink and τ(x, y) = 1 otherwise; then the light reflected at point (x, y) of a halftone print is given by23

R(x, y) = τ(x, y) ∬ h_R(x − x′, y − y′) τ(x′, y′) dx′ dy′.   (1)
For simplicity, we ignore <strong>for</strong> the moment any specular or<br />
partial internal reflections at the paper interfaces. Usually,<br />
the computational ef<strong>for</strong>t <strong>of</strong> calculating the convolution integral<br />
is reduced by applying a two-dimensional Fourier trans<strong>for</strong>m<br />
to Eq. (1). Application <strong>of</strong> the commonly known convolution<br />
theorem replaces the integral operation by simple<br />
multiplication in the Fourier domain, yielding<br />
284 J. Imaging Sci. Technol. 51(4)/Jul.-Aug. 2007
Mourad: Improved calibration <strong>of</strong> optical characteristics <strong>of</strong> paper by an adapted paper-MTF model<br />
Figure 2. Scheme of the paper bulk with the considered inner partial reflections and the illuminating flux i₀.

R(x, y) = τ(x, y) F⁻¹[H_R(ν, ω) · F{τ(x, y)}],   (2)

where F denotes the Fourier transform and F⁻¹ its inverse [see the Appendix section and Bracewell24 for details]. The variables (ν, ω) are the lateral frequencies in the Fourier domain and H_R(ν, ω) is the MTF of paper, i.e., the Fourier transform of the PSF. F{τ(x, y)} is the Fourier transform of the inner transmittance of the print layer.
We now extend Eq. (2) for practical purposes and incorporate additional surface refractive corrections in a similar way to Saunderson.25 To begin with, we supplement the specularly reflected fraction ρ_ap(x, y) at the upper-side ink-air interface. (We use the following subscripts: ap: air to paper, pa: paper to air, and pb: paper to backing.) Secondly, we distinguish between the incident transmitted fraction τ_ap(x, y) and the emerging fraction τ_pa(x, y) because of different illuminating and viewing geometries, yielding

R(x, y) = ρ_ap(x, y) + τ_pa(x, y) F⁻¹[H_R(ν, ω) · F{τ_ap(x, y)}].   (3)
In Eq. (3), the multiplication with the outward transmittance τ_pa(x, y) represents the passage of the emerged light through the ink layer before being captured by a sensor. On the other hand, the multiplication of the MTF with the Fourier-transformed inward transmittance F{τ_ap(x, y)} accounts for the spreading effect induced by the scattered light. Its propagation through the inner paper bulk is constrained at the bottom by the inner partial reflection ρ_pb and at the top interface by ρ_pa; see Figure 2. We distinguish between both reflectances because the scattering bulk usually faces different media during measurements, as will be seen later.
Note that Eq. (3) considers the halftone structures only through the two inner transmittance factors τ_ap(x, y) and τ_pa(x, y); i.e., we assume the MTF H_R(ν, ω) to be independent of the local halftone structure. Consequently, we consider the situation depicted in Fig. 2, and the inner partial reflectances ρ_pa and ρ_pb are chosen halftone-independently.
Finally, with respect to multi-ink prints, an additional<br />
model is required <strong>for</strong> the calculation <strong>of</strong> the spectral inner<br />
transmittance <strong>of</strong> the overprinted ink layers. As such, Beer-<br />
Lambert’s multiplication <strong>of</strong> transmittances is a frequently<br />
chosen and simple approach. 26 More complex alternatives<br />
may be considered when dealing with fluorescent inks 19 or<br />
with ink penetration effects. 27<br />
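As an illustration of the Beer-Lambert option, the inner transmittance of an overprint is simply the wavelength-wise product of the individual ink transmittances. The three-sample spectra below are made-up values, not measured data:

```python
import numpy as np

# Hypothetical spectral inner transmittances sampled at three wavelengths.
tau_cyan    = np.array([0.9, 0.7, 0.2])
tau_magenta = np.array([0.3, 0.8, 0.9])

# Beer-Lambert multiplication of transmittances for the overprint.
tau_overprint = tau_cyan * tau_magenta
```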
In microscopic image analyses, Eq. (3) proves very useful. In this section we outline how to determine a mathematical expression for the MTF H_R(ν, ω). The bulk of paper usually consists of a fiber network which deflects passing light rays in arbitrary directions.28 This behavior is
responsible <strong>for</strong> the scattering properties <strong>of</strong> the paper and,<br />
there<strong>for</strong>e, <strong>for</strong> its MTF. As mentioned earlier, the phenomenological<br />
Kubelka-Munk (KM) approach 3,29 is successfully<br />
used in application fields involving familiar light propagation<br />
problems. However, the KM approach considers only<br />
two diffuse radiant fluxes, one in the direction <strong>of</strong> incidence<br />
and the other in the opposite direction. In other words, the<br />
spatial distribution <strong>of</strong> the re-emitted light cannot be accounted<br />
<strong>for</strong> by using the results <strong>of</strong> the traditional KM<br />
theory. There<strong>for</strong>e, we approached the MTF by extending the<br />
KM theory to a three-dimensional space 7 in a similar way to<br />
the two-dimensional extension proposed by Berg. 30<br />
For an infinitesimal volume cube <strong>of</strong> the substrate, six<br />
light fluxes along and opposed to the coordinate axes x, y,<br />
and z are considered. The fluxes are non-negative by definition and are specified for −∞ < x, y < ∞ and 0 ≤ z ≤ D, where D is the thickness of the paper sheet. The study is confined to temporal steady-state analyses of samples with a given absorption α ≥ 0 that are illuminated with a light source of finite power. Hence the fluxes are expected to decay greatly in both lateral directions as √(x² + y²) → ∞. The correctness of this assumption was already shown by the measurements of Yule and others (see, e.g., Ref. 10). Now, let us consider the intensity of the downward incident light flux along the vertical propagation direction z and call it i. The theory of KM is based on the assumption that the fractional amount of light lost by absorption between z and z + dz is given by α dz, where the absorption density α corresponds to the absorption coefficient K of the original KM theory. Also scattering decreases the considered intensity but, in our case, we consider different kinds of scattering densities: σ_b and σ_l, where σ_b governs the back-scattered intensity and σ_l the intensity scattered laterally to the initial direction. Accordingly, after passing dz the flux i is reduced by εi dz, where the coefficients obey

ε = α + 4σ_l + σ_b.   (4)
We call the modeled scattering behavior semi-isotropic, where isotropic stands for the symmetry related to x, y, z and semi indicates σ_l ≠ σ_b. The absorbed light is lost to the system,
but back-scattered light from the upward propagating<br />
flux is added to i. A quarter <strong>of</strong> the laterally scattered light<br />
from each lateral flux is also added to i. Applying the same<br />
reasoning to each of the remaining five fluxes, we obtained a system of six coupled linear partial differential equations7 (PDEs) that is similar in type to the seven-flux model of Yoon et al.31 [Remark: According to the design of these PDEs, the incident light flux primarily propagates along the coordinate axes after an initial scattering event. However, in our applications, the resulting rotational asymmetry amounts numerically to at most about one percent.7 We disregard this asymmetry in favor of the obvious application advantages presented below.]
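The flux bookkeeping described above can be checked mechanically. The sketch below builds the 6 x 6 scattering gain matrix for the flux ordering [+x, −x, +y, −y, +z, −z] (our own illustrative encoding, not code from Ref. 7) and verifies that, without absorption, everything removed from one flux reappears in the other five:

```python
import numpy as np

def gain_matrix(sig_l, sig_b):
    """Scattering gains between the six fluxes: each flux feeds sigma_b
    into its opposite and sigma_l into each of the four perpendicular
    directions (semi-isotropic model)."""
    opposite = {0: 1, 1: 0, 2: 3, 3: 2, 4: 5, 5: 4}
    C = np.zeros((6, 6))
    for j in range(6):          # source flux
        for i in range(6):      # receiving flux
            if i != j:
                C[i, j] = sig_b if i == opposite[j] else sig_l
    return C

alpha, sig_l, sig_b = 0.0, 2.0, 5.0
eps = alpha + 4 * sig_l + sig_b          # Eq. (4): total attenuation density
C = gain_matrix(sig_l, sig_b)
```

Each column of C sums to ε − α, i.e., with α = 0 the local scattering events conserve energy.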
In order to derive the paper's MTF, we used the generalized two-dimensional Fourier transform. This allows the PDE system to be reduced to a pair of equations similar to the original ordinary KM differential equations in z, but with added dependencies on the spatial frequencies ν and ω.7 For the PDE system at hand, the generalized two-dimensional Fourier transform is appropriate, since the fluxes are defined on the real plane ℝ² and decay greatly as √(x² + y²) → ∞.24,32,33 Solving the system of transformed PDEs at z = D yields the spectral reflectance MTF
H_R(ν, ω) = F{h_R(x, y)} = A(ν, ω) / B(ν, ω),   (5.1)

where

A(ν, ω) = [a_12 + (a_21 − c) ρ_pb] e^{cD} − [a_12 + (a_21 + c) ρ_pb] e^{−cD},   (5.2)

B(ν, ω) = [−(a_21 + c) + a_12 (ρ_pa + ρ_pb) + (a_21 − c) ρ_pa ρ_pb] e^{cD} + [(a_21 − c) + a_12 (ρ_pa + ρ_pb) + (a_21 + c) ρ_pa ρ_pb] e^{−cD}.   (5.3)

The coefficients a_12 = a_12(ν, ω), a_21 = a_21(ν, ω), and c = c(ν, ω) depend on the lateral frequencies (ν, ω) and are given by Eqs. (6.1)-(6.6):

c = √(a_21² − a_12²),   (6.1)

a_12 = s_3 / s_1,   (6.2)

a_21 = s_2 / s_1,   (6.3)

where
s_1 = (ε − σ_b)² (ε − 2σ_l + σ_b)(ε + 2σ_l + σ_b) + 4π² (2ε − σ_b)² (ν² + ω²) + 16π⁴ ν² ω²,   (6.4)

s_2 = (ε − σ_b)² [(ε − 2σ_l + σ_b) ε (ε + 2σ_l + σ_b) − 4σ_l² ε] + 4π² [(ε − σ_b) ε (ε + σ_b) − 2σ_l² ε] (ν² + ω²) + 16π⁴ ε ν² ω²,   (6.5)

s_3 = (ε − σ_b)² [(ε − 2σ_l + σ_b) σ_b (ε + 2σ_l + σ_b) − 4σ_l² ε] + 4π² [(ε − σ_b) σ_b (ε + σ_b) − 2σ_l² ε] (ν² + ω²) + 16π⁴ σ_b ν² ω².   (6.6)
Likewise, the microspectral transmittance distribution is modeled by

T(x, y) = τ_pa(x, y) F⁻¹[H_T(ν, ω) · F{τ_bp(x, y)}],   (7)

with the spectral transmittance MTF H_T(ν, ω)

H_T(ν, ω) = F{h_T(x, y)} = −2c(ν, ω) / B(ν, ω).   (8)

Here, τ_bp(x, y) describes the fraction of light transmitted into the substrate through the bottom layer. Equation (7) implies that the optical spreading has no direct observable effect on transmittance measurements of single-side, upward-oriented printed paper sheets. This is consistent with microscopic transmittance images published by Koopipat et al.13
The two spatial <strong>for</strong>mulas, Eqs. (3) and (7), are the foundation<br />
used in predicting the spectral reflectance <strong>of</strong> arbitrary<br />
halftone prints presented and discussed in the Model Application<br />
and Model Discussion sections below. The following<br />
section considers the calibration <strong>of</strong> the optical parameters.<br />
MODEL CALIBRATION<br />
Our next objective is to determine the optical dot gain <strong>for</strong><br />
arbitrary halftones and dithering frequencies from a few<br />
macroscopic spectral measurements. More precisely, we need<br />
to calculate the isolated optical dot gain in order to distinguish<br />
between the different effects leading to printing<br />
nonlinearities. In order to meet this requirement, Eq. (3) describes the spectral reflectance R(x, y) as a function of the bulk parameters D, α, σ_l, and σ_b together with the surface refractive coefficients ρ_ap, τ_ap, ρ_pa, τ_pa, and ρ_pb. Unfortunately, only the paper thickness, D, is directly measurable
with common instruments. There<strong>for</strong>e, we determine the remaining<br />
parameters in such a way as to best match the calculated<br />
results to the measured spectra <strong>of</strong> a small set <strong>of</strong> test<br />
patches. In order to avoid any printing irregularity and to<br />
increase the accuracy <strong>of</strong> the parameter estimation, the test<br />
patches are chosen to be as unambiguous as possible. In<br />
particular, we chose only solid patches <strong>of</strong> the primary<br />
colors—in our case cyan, magenta, yellow, and black prints,<br />
abbreviated to CMYK—plus a sample <strong>of</strong> paper-white (W).<br />
For this work, we consider only single-sided prints. As well as avoiding the printing irregularities, the uniformity of the solid patches also reduces the matrix multiplication of Eq. (3) to a simple scalar multiplication. This is because F{τ(x, y)} of a uniform patch is different from zero only at zero frequency, (ν, ω) = (0, 0). Hence, for a solid patch, Eqs. (3) and (7) reduce to

R_solid = ρ_ap + τ_pa H_R(0, 0) τ_ap,   (9)

T_solid = τ_pa H_T(0, 0) τ_bp.   (10)
In other words, the <strong>for</strong>m <strong>of</strong> the PSF h R x,y is not involved<br />
in the calibration process.<br />
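This collapse of Eq. (3) to the scalar Eq. (9) is easy to confirm numerically: the FFT of a constant field is nonzero only in the DC bin, so the full pipeline and the scalar product agree to machine precision. The MTF and coefficient values below are illustrative assumptions:

```python
import numpy as np

n = 128
f = np.fft.fftfreq(n)
fx, fy = np.meshgrid(f, f, indexing="ij")
H = np.exp(-(fx**2 + fy**2) / 0.02)          # assumed MTF with H[0, 0] = 1

rho_ap, tau_ap, tau_pa = 0.02, 0.75, 0.70    # illustrative coefficients
tau_field = np.full((n, n), tau_ap)          # uniform solid patch

# Eq. (3) evaluated numerically for the uniform patch ...
R_map = rho_ap + tau_pa * np.real(np.fft.ifft2(H * np.fft.fft2(tau_field)))
# ... equals the scalar Eq. (9): R_solid = rho_ap + tau_pa * H(0,0) * tau_ap
R_solid = rho_ap + tau_pa * H[0, 0] * tau_ap
```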
Traditionally, the KM theory scattering and absorption<br />
coefficients K and S are determined using two distinct reflectance<br />
measurements: one measurement over a black<br />
backing and another over a white backing, both backings <strong>of</strong><br />
known reflectances. However, in our case, more measurements<br />
are required because we need to determine not only<br />
the scattering and absorption coefficients but also the inner<br />
transmittance <strong>of</strong> the primary inks used in addition to the<br />
286 J. <strong>Imaging</strong> Sci. Technol. 514/Jul.-Aug. 2007
Mourad: Improved calibration <strong>of</strong> optical characteristics <strong>of</strong> paper by an adapted paper-MTF model<br />
Figure 3. Considered configurations <strong>of</strong> the calibration measurements <strong>of</strong> a paper sheet printed on one side<br />
with a solid patch. Upper left: Spectral reflectance on a known black backing. Upper right: Spectral reflectance<br />
on a known white backing. Lower left: Regular spectral transmittance. Lower right: Flipped spectral<br />
transmittance.<br />
refractive coefficients. There<strong>for</strong>e, we propose measuring the<br />
spectral reflectance and transmittance <strong>of</strong> each <strong>of</strong> the five<br />
solid test set patches (CMYK/W) in each <strong>of</strong> the four configurations<br />
depicted in Figure 3. These comprise the usual reflectance with a black and a white backing of known reflectances R_b, in addition to the transmittance of the samples in both a regular configuration and a flipped-over configuration, yielding a total of nineteen different spectra.
The spectral measurements were carried out using a<br />
standard R/T spectrophotometer (Gretag-Macbeth<br />
SpectroScan T). Its geometry of illumination (45°, collimated annular) and of viewing (0°) contradicts, strictly speaking, the general KM assumption of diffuse illumination and measurement. Nevertheless, according to Kubelka,34 as we
examine highly scattering paper substrates, we expect the<br />
geometric discrepancies at most to scale the determined values<br />
<strong>of</strong> the scattering and absorption coefficients by a constant<br />
factor. In our application we disregard its effect as long<br />
as the geometry <strong>of</strong> the measuring equipment is kept unchanged<br />
between calibration and prediction. The geometry<br />
<strong>of</strong> each measurement configuration also affects the assumed<br />
surface refractive corrections that are mutually connected by<br />
functional relations. We make these relations explicit by<br />
first-order approximations and introduce additional common<br />
measurement coefficients. 35 The considered surface refractive<br />
coefficients are illustrated in Figure 4. For simplicity,<br />
we assume the individual inks to have equal refractive indices,<br />
which may, however, deviate from the bulk refractive<br />
index <strong>of</strong> the unprinted paper. <strong>Additional</strong>ly, we disregard any<br />
wavelength dependency <strong>of</strong> the refractive indices. In order to<br />
list the refractive coefficients <strong>for</strong> each <strong>of</strong> the printed and<br />
unprinted cases, we hereafter identify each coefficient with<br />
one <strong>of</strong> the subscripts I or , respectively. (With regard to<br />
halftone patches, we vary these coefficients in accordance<br />
with the ink coverage).<br />
We begin with the situation of measuring the reflectance of the samples and consider the recorded fraction ρ_ap of the specular reflection ρ_s:

ρ_ap,I = K_s ρ_s,I,   ρ_ap,∅ = K_s ρ_s,∅,   (11)

where K_s amounts to the fraction captured by the instrument's field of view.36 Next, the incident transmission τ_ap is determined by the transmitted fraction τ_s = 1 − ρ_s in addition to what is known as the internal transmittance τ of the upper-side layer [Ref. 26, p. 30]; thus we approximate

τ_ap,I = (1 − ρ_s,I) τ,   τ_ap,∅ = 1 − ρ_s,∅.   (12)
Similarly, we approximate the coefficient for the outward transmittance through the upper interface:

τ_pa,I = (1 − ρ_i,I) τ,   τ_pa,∅ = 1 − ρ_i,∅.   (13)

In Eq. (13), 1 − ρ_i represents the fraction of the light transmitted diffusely from inside the scattering substrate. It thus differs from the fraction 1 − ρ_s used in Eq. (12), which accounts for the collimated incidence. Accordingly, the amount of internal light reflected diffusely at the upper interface is
Figure 4. Refraction and transmission coefficients at the interfaces <strong>of</strong> the<br />
paper sheet.<br />
J. <strong>Imaging</strong> Sci. Technol. 514/Jul.-Aug. 2007 287
Mourad: Improved calibration <strong>of</strong> optical characteristics <strong>of</strong> paper by an adapted paper-MTF model<br />
Figure 5. Spectral calibration performance for the nonbrightened double-calandered APCO paper. The solid lines plot the spectral predictions and the crosses mark the corresponding spectral measurements. The paper has a thickness of about 99 μm.
ρ_pa,I = ρ_i,I,   ρ_pa,∅ = ρ_i,∅.   (14)

We continue at the bottom of the paper and consider the internal fraction of light reflected diffusely at the bottom interface. Similar to Eq. (14), it is determined by the internal reflection ρ_i plus the transmitted part through the bottom layer that is reflected back into the substrate at the backing:

ρ_pb,I = ρ_i,I + (1 − ρ_i,I)(1 − ρ_s,I) τ² R_b,   ρ_pb,∅ = ρ_i,∅ + (1 − ρ_i,∅)(1 − ρ_s,∅) R_b.   (15)

Here, τ is squared because of the double light passage according to Beer-Lambert's law.26 Finally, for the case of measuring the transmittance of the samples, the part transmitted from beneath the sample into the scattering substrate is approximated to

τ_bp,I = (1 − ρ_s,I) τ,   τ_bp,∅ = 1 − ρ_s,∅.   (16)

These relations reduce the multitude of refractive model parameters to K_s, ρ_s, and ρ_i for the printed and unprinted cases, in addition to τ(λ) of each primary ink. Together with the scattering and absorption coefficients, six scalar coefficients and seven spectral coefficients are obtained that may be estimated using the nineteen available spectral measurements. However, in order to simplify the optimization process further, we limit the model to the isotropic case, assuming both scattering coefficients to be equal; hence

σ_b = σ_l = σ.   (17)

As already pointed out, we estimate the phenomenological model parameters as the set of values which minimizes the deviation between the nineteen measured spectra and their calculated predictions. We obtain the parameter estimates by using common least squares minimization routines such as lsqcurvefit of MATLAB. This routine solves nonlinear data-fitting problems in the least squares sense.37 In our case, the routine finds the parameters X sought that minimize the error to the measured spectra S_meas(λ):

min_X ½ ‖S_solid(λ, X) − S_meas(λ)‖₂² = ½ Σ_i [R_solid,i(λ, X) − R_meas,i(λ)]² + ½ Σ_j [T_solid,j(λ, X) − T_meas,j(λ)]²,   (18)

where R_solid(λ, X) and T_solid(λ, X) are the spectral reflectance and transmittance of the samples calculated with the parameters fixed to X according to Eqs. (9) and (10), respectively. The routine used finds the coefficients so that the solution is always bounded within an appropriately chosen range for each parameter of X.
CALIBRATION RESULTS<br />
An example <strong>of</strong> the obtained calibration results is illustrated<br />
in Figure 5.

Figure 6. Fitted estimates of the spectral absorption and scattering coefficients underlying the data of Fig. 5.

Figure 7. Fitted estimates of the spectral internal transmittances τ_C(λ), τ_M(λ), τ_Y(λ), and τ_K(λ) underlying the data of Fig. 5.

Figure 8. From bottom to top: the dependency of the optical dot gain of the APCO paper on increasing screen frequencies. The simulated halftone technique is a conventional, circular dot screen, illustrated below the chart.

The depicted spectra were obtained from a representative calibration of a non-brightened, double-calandered APCO paper38 printed by a common, four-color xerographic desktop printer. The average calibration accuracy achieved is ΔE*₉₄ ≈ 1, which is on the order of the colorimetric discrimination performance of the human eye. Figure 6 shows the fitted spectral model coefficients of absorption and scattering obtained. In this experiment, the estimates for the inner reflectance and transmittance coefficients of the printed interfaces are ρ_i = 10%, ρ_s = 1.25%, and K_s = 35%. These scalar values, and particularly ρ_i, differ from usual reports in related research.36,39 In our opinion, this discrepancy is mainly due to the different underlying approaches of accounting for the interactions between the partial reflections and the scattered differential light fluxes. Finally, Figure 7 plots the estimated internal transmittances of the separate CMYK inks.

MODEL APPLICATIONS
Given a calibrated parameter set of a particular paper-ink combination, such as those derived in the previous section, the spectral prediction Eqs. (3) and (7) allow the expected microimages of arbitrary halftone prints on that paper to be simulated. In this section, after recalling the basics of dot gain calculations, we apply the proposed approach and analyze optical dot gain predictions for idealized halftone prints.
Common dot gain calculations of arbitrary halftones are based on the model of Murray-Davies (MD); see Wyble et al.40 for a critical review. Originally, the MD model was proposed in a densitometric form due to limitations of the instruments. We use its spectral form, which reads

R(λ) = (1 − a_t) R_g(λ) + a_t R_t(λ),   (19)
where R(λ) is the predicted reflectance spectrum, R_g(λ) is the reflectance spectrum of the bare substrate, and R_t(λ) is the reflectance spectrum of the color at full area coverage. The linear interpolation variable a_t commonly refers to the theoretical area coverage of the predicted halftone print, i.e., the dot area of the binary image actually sent to the printer. Usually, the MD model overestimates the measured spectral reflectances R_meas(λ) since it disregards the combined optical and mechanical dot gain effects as introduced in the Background section. The overestimation gives rise to the introduction of the effective area coverage a_eff, which refers to an estimated value that best fits the calculated reflectance R(λ) to the measured spectrum R_meas(λ). For a single wavelength, typically chosen at minimum reflectance, a_eff is given by using Eq. (19):

a_eff = (R_meas − R_g) / (R_t − R_g),   (20)

with suppressed wavelength notation for simplicity. This allows the dot gain to be defined as

Δ = a_eff − a_t.   (21)
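Equations (19)-(21) amount to a few lines of arithmetic; the reflectance values below are invented for illustration:

```python
def effective_coverage(R_meas, R_g, R_t):
    """Eq. (20): effective area coverage at a single wavelength."""
    return (R_meas - R_g) / (R_t - R_g)

def dot_gain(R_meas, R_g, R_t, a_t):
    """Eq. (21): dot gain = effective minus theoretical coverage."""
    return effective_coverage(R_meas, R_g, R_t) - a_t

# Bare paper R_g = 0.85, solid ink R_t = 0.05, nominal coverage 50%.
# Murray-Davies predicts 0.5*0.85 + 0.5*0.05 = 0.45; a darker measured
# value of 0.40 therefore implies a positive dot gain.
gain = dot_gain(R_meas=0.40, R_g=0.85, R_t=0.05, a_t=0.50)
```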
Use <strong>of</strong> Eq. (3) allows estimation <strong>of</strong> the optical dot-gain<br />
o <strong>of</strong> any given binary halftone image Ix,y with a dot area<br />
equal to a s . In this case o is given by<br />
o = a eff − a s = Rx,y − R g<br />
R t − R g<br />
− a s . 22<br />
In Eq. (22), Rx,y is the spatial average <strong>of</strong> the reflectance<br />
image predicted using Eq. (3) and replaces the measured<br />
reflectance R meas <strong>of</strong> Eq. (20). An illustrative example is seen<br />
in Figure 8 which depicts optical dot-gain predictions <strong>for</strong> the<br />
APCO paper 38 against the area coverage a s <strong>for</strong> various screen<br />
J. <strong>Imaging</strong> Sci. Technol. 514/Jul.-Aug. 2007 289
Mourad: Improved calibration <strong>of</strong> optical characteristics <strong>of</strong> paper by an adapted paper-MTF model<br />
Figure 9. Calculated optical dot gain curves at medium area coverage against the screen ruling with respect to two typically used halftone screens. The plots were predicted using the data of the APCO paper.

Figure 10. Calculated optical dot gain curves for two different types of office paper plotted against screen ruling. The conventional circular dot screen was used. The dashes and the solid line plot the obtained predictions for the ColorCopy paper and the APCO paper, respectively.

Figure 11. Comparison of our LSF predictions with normalized measurement points of a different but apparently similar paper type published by Arney et al.5 The dots depict Arney's measurements. The dashes and the solid line plot the obtained predictions for the ColorCopy and the APCO paper, respectively.
frequencies (so-called ruling). The type and magnitude of the optical dot gain curves shown are close to analyses presented, e.g., by Gustavson8 or by Yang [Ref. 41, p. 201], which apply a simple Gaussian-like function to approximate the shape of the paper PSF.
Clearly, <strong>for</strong> calculating the halftone reflectances, the<br />
simplifying assumption underlying Eq. (9) no longer holds<br />
true and the coefficients become spatially dependent. Hence<br />
we numerically evaluate the spatial <strong>for</strong>mula, Eq. (3), by using<br />
the standard inverse two-dimensional fast Fourier trans<strong>for</strong>m<br />
(iFFT). 24 Thereby we sample the coefficients <strong>of</strong> concern<br />
at spatial high resolution frequency grids , with a<br />
sampling frequency exceeding the Nyquist rate <strong>of</strong> the halftone<br />
structures, i.e., the coefficients are filled out at each<br />
x i ,y j according to the simulated binary image Ix i ,y j with<br />
the calibrated values <strong>of</strong> section Calibration Results.<br />
Another application example is given in Figure 9, which<br />
illustrates the influence <strong>of</strong> different types <strong>of</strong> halftone screens<br />
on the magnitude <strong>of</strong> the optical dot gain at medium area<br />
coverage <strong>of</strong> a s =50%. With the proposed simulation, it is<br />
also convenient to explore the effect <strong>of</strong> changing the type <strong>of</strong><br />
paper on the optical dot gain. For the case of the conventional circular dot halftone screen, Figure 10 shows the optical dot gain prediction for the APCO paper compared with results obtained with a similarly calibrated parameter set for the ColorCopy office paper.42 These graphs demonstrate in particular the impact of the optical dot gain's variations around the commonly used ruling of 60 lpcm (150 lpi). In this region, any slight shift of the screen ruling induces a non-negligible error arising solely from the optical dot gain.
MODEL COMPARISON AND DISCUSSION
To explore the performance of our microscopic predictions, we consider results of edge-trace measurements published by Arney et al.5 For this purpose, we compare our prediction results with their published data of a measured line spread function (LSF) and the corresponding MTF of an office copy paper of a type similar to those analyzed in our laboratory. Figure 11 depicts a direct comparison of Arney's measurements and our prediction of a microscopic line edge observation derived by using Eq. (3). The comparison of the MTF, the normalized modulus of the digital Fourier transform of the same data, is illustrated in Figure 12. Both figures demonstrate good agreement between our predictions and the microdensitometric measurements. Note that the model parameters were calibrated using spectral measurements of solid patches only.
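The LSF-to-MTF step used in this comparison is easy to reproduce numerically: differentiate the edge trace to obtain the LSF, then take the normalized modulus of its discrete Fourier transform. The sketch below uses a synthetic knife edge blurred by an assumed two-sided exponential spread (d = 0.1 mm) rather than measured data; the half-magnitude frequency, whose reciprocal is Arney's k_p metric, is read off at the end.

```python
import numpy as np

dx = 0.01                                  # mm per sample (assumed)
x = (np.arange(2048) - 1024) * dx
d = 0.1                                    # assumed exponential spread length, mm
# Synthetic knife-edge trace: a step convolved with a two-sided exponential LSF
edge = np.where(x < 0, 0.5 * np.exp(x / d), 1.0 - 0.5 * np.exp(-x / d))

lsf = np.gradient(edge, dx)                # line spread function = derivative of the edge
mtf = np.abs(np.fft.rfft(lsf))
mtf /= mtf[0]                              # normalized modulus, MTF(0) = 1
freqs = np.fft.rfftfreq(lsf.size, d=dx)    # spatial frequency in cycles/mm

f_half = freqs[np.argmin(np.abs(mtf - 0.5))]   # frequency where the MTF falls to 1/2
k_p = 1.0 / f_half                             # Arney's scalar width metric, in mm
```

For this analytic LSF the MTF is 1/(1 + (2πfd)²), so the half-magnitude frequency should land near 1/(2πd) ≈ 1.59 cycles/mm, a useful sanity check on the pipeline.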
A further comparison arises from Arney's discussion of the influence of Kubelka-Munk's absorption coefficient K on Oittinen and Engeldrum's model. Oittinen,2 as well as Engeldrum and Pridham,1 suggested a simplified relationship between the Kubelka-Munk theory and the MTF of paper. According to their approach, the MTF of paper approximates the vertical derivative of the classical KM function.3 It depends on the thickness of the sample D, the scattering coefficient S, and the absorption coefficient K. However, with edge-trace measurements Arney et al. showed that K has only a small influence on the width of the MTF of the paper compared to S and D.4 Figure 13 illustrates this finding and compares the results of our model with those obtained by Oittinen and Engeldrum's model. In particular,
the figure plots the scalar measure k_p proposed by Arney, which is equal to the inverse of the frequency at which the normalized MTF loses half its magnitude. The plot demonstrates the involvement of Oittinen and Engeldrum's absorption coefficient K shown by Arney5 and the more realistic influence of the absorption coefficient of our model.
290 J. Imaging Sci. Technol. 51(4)/Jul.-Aug. 2007
Mourad: Improved calibration of optical characteristics of paper by an adapted paper-MTF model
Figure 12. Comparison of the modulus of the digital Fourier transform of the data presented in Fig. 11.
Figure 13. Comparing the involvement of the absorption coefficient K of Oittinen and Engeldrum's model with that of the absorption coefficient of our model obtained for the ColorCopy paper. The comparison is carried out using Arney's data and his proposed k_p scalar metric.
CONCLUSIONS
Our prediction model offers a new way of minimizing the effort of characterizing and calibrating color halftone printers and extends the computational framework for controlling color printers online. The model computes high resolution spectral color images of arbitrary halftone prints on common office paper and newsprint types of paper. Light scattering effects are accounted for by relating the appearance characteristics to a few substrate-dependent fitting coefficients. The main advantage of the approach is that it uncouples the calibration of the scattering and absorption coefficients of the paper from printing irregularities without using a microscopic device. We base the calibration on reflectance measurements of solid patches because they are affected much less by mechanical dot gain than halftone patches are. Thereby, the calibration achieves good independence from the printing process and requires merely a common spectrophotometer for reference measurements. Comparing published microspectral knife-edge measurements of Arney et al.5 with corresponding simulations of our model shows good agreement. Moreover, the calibrated model agrees well with the experimental insight of Arney et al.4 that the optical absorption coefficient of paper has an insignificant effect on the MTF width of paper.
ACKNOWLEDGMENTS
The author would like to thank Professor R. D. Hersch, École Polytechnique Fédérale de Lausanne (EPFL), Switzerland, for the continuing and supportive discussions. The constructive discussions with K. Simon, M. Vöge, and P. Zolliker are also gratefully acknowledged. Part of the investigation was financed by the Swiss Innovation Promotion Agency (grant KTI/CTI 6498.1 ENS-ET).
Appendix
We used the following form of the two-dimensional Fourier transform F(ν, μ) of a two-dimensional function f(x, y),
F(ν, μ) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) e^{−2πi(νx + μy)} dx dy,   (23)
and the inverse two-dimensional Fourier transform,
f(x, y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} F(ν, μ) e^{2πi(νx + μy)} dν dμ;   (24)
see Ref. 24 for details.
REFERENCES
1 P. G. Engeldrum and B. Pridham, "Application of turbid medium theory to paper spread function measurements", Proc. TAGA 1, 339–352 (1995).
2 P. Oittinen, "Limits of microscopic print quality", in Advances in Printing Science and Technology (Pentech, London, 1982), Vol. 16, pp. 121–138.
3 P. Kubelka and F. Munk, "Ein Beitrag zur Optik der Farbanstriche", Z. Tech. Phys. (Leipzig) 12, 593–601 (1931).
4 J. S. Arney, E. Pray, and K. Ito, "Kubelka-Munk theory and the Yule-Nielsen effect on halftones", J. Imaging Sci. Technol. 43, 365–370 (1999).
5 J. S. Arney, J. Chauvin, J. Nauman, and P. G. Anderson, "Kubelka-Munk theory and the MTF of paper", J. Imaging Sci. Technol. 47, 339–345 (2003).
6 S. Mourad, P. Emmel, K. Simon, and R. D. Hersch, "Prediction of monochrome reflectance spectra with an extended Kubelka-Munk model", Proc. IS&T/SID Tenth Color Imaging Conference (IS&T, Springfield, VA, 2002), pp. 298–304.
7 S. Mourad, Color Prediction for Electrophotographic Prints on Common Office Paper, Ph.D. Thesis, École Polytechnique Fédérale de Lausanne, Switzerland, 2003, http://diwww.epfl.ch/w3lsp/publications/colour.
8 S. Gustavson, "Color gamut of halftone reproduction", J. Imaging Sci. Technol. 41, 283–290 (1997).
9 J. A. C. Yule and W. J. Nielsen, "The penetration of light into paper and its effect on halftone reproduction", Proc. TAGA, 65–76 (1951).
10 J. A. C. Yule, D. J. Howe, and J. H. Altman, "The effect of the spread-function of paper on halftone reproduction", Tappi J. 50, 337–344 (1967).
11 S. Inoue, N. Tsumura, and Y. Miyake, "Analyzing CTF of print by MTF of paper", J. Imaging Sci. Technol. 42, 572–576 (1998).
12 S. Gustavson, Dot Gain in Colour Halftones, Ph.D. Thesis, Dept. of Electrical Engineering, Linköping University, Sweden, 1997.
13 C. Koopipat, N. Tsumura, and Y. Miyake, "Effect of ink spread and optical dot gain on the MTF of ink jet image", J. Imaging Sci. Technol. 46, 321–325 (2002).
14 B. Philips-Invernizzi, D. Dupont, and C. Cazé, "Bibliographical review for reflectance of diffusing media", Opt. Eng. (Bellingham) 40, 1082–1092 (2001).
15 A. Ishimaru, Wave Propagation and Scattering in Random Media (Academic Press, New York, 1978), Vols. I & II.
16 A. Ishimaru, "Diffusion of light in turbid material", Appl. Opt. 28, 2210–2215 (1989).
17 B. Hapke, "Kubelka-Munk theory: What's wrong with it", in Theory of Reflectance and Emittance Spectroscopy (Cambridge University Press, Cambridge, UK, 1993), Chap. 11.
18 P. Emmel, Modèles de prédiction couleur appliqués à l'impression jet d'encre, Ph.D. Thesis, École Polytechnique Fédérale de Lausanne, 1998, http://diwww.epfl.ch/w3lsp/publications/colour.
19 P. Emmel, "Physical models for color prediction", in Digital Color Imaging Handbook (The Electrical Engineering and Applied Signal Processing Series) (CRC Press, Boca Raton, FL, 2003), pp. 173–238.
20 J. S. Arney, "A probability description of the Yule-Nielsen effect, I", J. Imaging Sci. Technol. 41, 633–636 (1997).
21 Wolfram Research, Inc., MATHEMATICA®, http://www.wolfram.com.
22 L. Yang and S. J. Miklavcic, "Revised Kubelka-Munk theory. III. A general theory of light propagation in scattering and absorptive media", J. Opt. Soc. Am. A 22, 1866–1873 (2005).
23 F. R. Ruckdeschel and O. G. Hauser, "Yule-Nielsen effect in printing: a physical analysis", Appl. Opt. 17, 3376–3383 (1978).
24 R. N. Bracewell, The Fourier Transform and its Applications, 3rd ed. (McGraw-Hill, New York, 2000).
25 J. L. Saunderson, "Calculation of the color of pigmented plastics", J. Opt. Soc. Am. 32, 727–736 (1942).
26 G. Wyszecki and W. S. Stiles, Color Science, 2nd ed. (John Wiley & Sons, Inc., New York, 1982).
27 L. Yang, R. Lenz, and B. Kruse, "Light scattering and ink penetration effects on tone reproduction", J. Opt. Soc. Am. A 18, 360–366 (2001).
28 G. Kortüm, Reflectance Spectroscopy: Principles, Methods, Applications (Springer-Verlag, Berlin-Heidelberg, 1969).
29 P. Kubelka, "New contributions to the optics of intensely light-scattering materials. Part II: Nonhomogeneous layers", J. Opt. Soc. Am. 44, 330–335 (1954).
30 F. Berg, Isotrope Lichtstreuung in Papier – Neue Überlegungen zur Kubelka-Munk-Theorie, Ph.D. Thesis, Technische Hochschule Darmstadt, 1997.
31 G. Yoon, A. J. Welch, M. Motamedi, and M. C. J. Van Gemert, "Development and application of three-dimensional light distribution model for laser irradiated tissue", IEEE J. Quantum Electron. 23, 1721–1733 (1987).
32 M. J. Lighthill, Introduction to Fourier Analysis and Generalised Functions (Cambridge University Press, Cambridge, UK, 1980).
33 D. G. Duffy, Transform Methods for Solving Partial Differential Equations (CRC Press, Boca Raton, FL, 1994).
34 P. Kubelka, "New contributions to the optics of intensely light-scattering materials. Part I", J. Opt. Soc. Am. 38, 448–457 (1948).
35 D. B. Judd and G. Wyszecki, Color in Business, Science and Industry, 3rd ed. (John Wiley & Sons, Inc., New York, 1975).
36 F. R. Clapper and J. A. C. Yule, "The effect of multiple internal reflections on the densities of half-tone prints on paper", J. Opt. Soc. Am. 43, 600–603 (1953).
37 MathWorks, MATLAB Optimization Toolbox; see especially lsqcurvefit and fmincon, http://www.mathworks.com.
38 ISO 2846-1, Graphic technology — Colour and transparency of ink sets for four-colour printing, 1st ed. (ISO, Geneva, 1997). The nonbrightened APCO II/II paper is specified in Annex A.
39 D. B. Judd, "Fresnel reflection of diffusely incident light", J. Res. Natl. Bur. Stand. 29, RP1504 (1942).
40 D. R. Wyble and R. S. Berns, "A critical review of spectral models applied to binary color printing", Color Res. Appl. 25, 4–19 (2000).
41 L. Yang, S. Gooran, and B. Kruse, "Simulation of optical dot gain in multichromatic tone reproduction", J. Imaging Sci. Technol. 45, 198–204 (2001).
42 Neusiedler, Color Copy paper, http://www.neusiedler.com (2000). Brightened, highly opaque office paper.
Journal of Imaging Science and Technology® 51(4): 293–298, 2007.
© Society for Imaging Science and Technology 2007
Gloss Granularity of Electrophotographic Prints
J. S. Arney and Ling Ye
Rochester Institute of Technology, Rochester, New York 14623
E-mail: arney@cis.rit.edu
Eric Maggard and Brian Renstrom
Hewlett-Packard Co., Boise, Idaho 83714
Abstract. The random variation in gloss often observed in images produced in electrophotographic printers has been examined by an analytical technique that combines the capabilities of a microdensitometer with a goniophotometer. The technique is called microgoniophotometry and measures both the spatial and the angular distribution of the specular component of reflected light. The analysis provides information about the spatial variation of specularly reflected light at all angles through which the specular light is reflected, not just at the equal/opposite angle at which gloss is traditionally measured. The results of this analysis have led to an optical model of the random spatial variation in gloss. The results indicate that dry toner is typically not completely fused and can be described as a surface composed of two distinct regions. These two regions differ in the extent of fusing that has occurred, as manifested by their differences in specular reflectance characteristics. The difference in reflectance is manifested primarily in their different angular distributions of specular light and also in their spatial frequency. © 2007 Society for Imaging Science and Technology.
DOI: 10.2352/J.ImagingSci.Technol.(2007)51:4(293)
INTRODUCTION
A bidirectional reflectance distribution function (BRDF) is a useful way to characterize the angular distribution of specular light reflected from materials.1–8 Moreover, one would expect the BRDF to be a necessary part of a complete instrumental characterization of visual attributes of gloss.9 In addition to the angular distribution of the specular light, the spatial distribution of the specular light may also play a role in visual gloss.10,11 As illustrated in Figure 1, gloss in electrophotographic prints is not always spatially uniform. Indeed, spatial variations in gloss take many forms. Artifacts such as streaking and banding are often observed in high gloss prints, and differential gloss involves differences in gloss between bordering regions of different color. The current report focuses on gloss granularity, which is the random gloss variation across a printed surface. Gloss granularity is illustrated in Fig. 1, with samples A and B showing different degrees of gloss granularity.
Granularity analysis is an analytical technique that evolved during the 20th century to characterize silver halide photographic film.12 The typical microdensitometer was an optical microscope with a fixed aperture and an electronic light detector. The film sample was scanned under the microscope, and a trace of irradiance versus location was recorded. This technique is called microdensitometry. Currently, a microdensitometry scan may be performed more easily by a software routine applied to a digital image captured with a camera and appropriate microscope optics.13,14 Several reports have been published on the application of microdensitometry techniques to the analysis of gloss granularity.10,11,15 All of these techniques involve detection of light at the specular angle (equal/opposite angle) while scanning across the surface of the sample. The current work extends this analytical technique to a measurement of the entire BRDF (goniophotometry) scanned spatially across the surface of a printed sample (microdensitometry). This analytical technique is called microgoniophotometry.
THE MICROGONIOPHOTOMETER
The microgoniophotometer has been described in detail in previous reports and is summarized in Figure 2.1,2,16–18 The print sample is wrapped around a cylinder, which presents all sample angles from −90° to +90° to the camera. The sample is illuminated with a linear light source placed at an angle of 20° from the camera. This places a bright specular line at the half angle, θ = 10°, between the camera and the source. Two images captured with this system are illustrated in Figure 3.
As illustrated in Fig. 3, the specular component of the reflected light maintains its polarization and is observed only
IS&T Member
Received Jan. 3, 2007; accepted for publication Feb. 3, 2007.
Figure 1. Examples of (A) rough and (B) smooth gloss granularity in electrophotographic prints produced by two different printers using different toners and fusing conditions.
Arney et al.: Gloss granularity of electrophotographic prints
Figure 5. BRDF (μ vs θ) and BGDF (σ vs θ) generated from Fig. 4. Curves are normalized to 1.00 at the peak value in order to make a comparison.
Figure 2. Schematic illustration of the microgoniophotometer. A linear polarizer is placed in front of the line light source, and another polarizer, called the analyzer, is in front of the camera.
Figure 3. Images captured with the analyzer in front of the camera parallel to and perpendicular to the polarization direction of the light source polarizer.
Figure 6. Microfacets of the surface are randomly oriented at different tilt angles. If the facet tilt results in an equal/opposite angle between the camera and the light source, then light enters the camera. Otherwise the specular light misses the camera. A piece of shattered automobile window glass is a macroscopic illustration of bilevel gloss granularity.
in the image with parallel polarizers. Both the crossed and the parallel polarizers capture the same amount of diffuse, randomly polarized light. The difference image, (A-B) in Figure 4, shows only the specular light.
The horizontal location of each column in the difference image (A-B) corresponds to a tilt angle, θ, on the print sample. A plot of the mean value versus tilt angle, μ vs θ, is a bidirectional reflectance distribution function, BRDF. A plot of the standard deviation versus tilt angle, σ vs θ, is a bidirectional granularity distribution function, BGDF. (See Figure 5.) It is the granularity of the specular light at each angle on the BRDF.
Figure 4. The difference image (A-B) shows only the specularly reflected light. The mean, μ, and the standard deviation, σ, of the specular light are determined at each column in the image.
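The column-wise reduction of the difference image to a BRDF and a BGDF can be sketched as below. The difference image here is synthetic (an assumed Gaussian BRDF shape with Poisson noise standing in for real camera data), so the image size, noise model, and column-to-angle mapping are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical difference image (A - B): rows run along the cylinder axis and
# columns map linearly to the facet tilt angle theta in [-90, 90] degrees.
h, w = 200, 181
theta = np.linspace(-90, 90, w)
true_brdf = np.exp(-theta**2 / (2 * 12.0**2))             # assumed BRDF shape
diff = rng.poisson(200 * true_brdf, size=(h, w)) / 200.0  # noisy specular signal

mu = diff.mean(axis=0)     # BRDF: mean specular signal in each column (tilt angle)
sigma = diff.std(axis=0)   # BGDF: rms granularity in each column
```

Plotting mu and sigma against theta reproduces the two curves of Fig. 5 for this synthetic sample.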
A FACET MODEL OF SPECULAR GRANULARITY
Johansson, Béland, and MacGregor have introduced a model of specular reflection called the microfacet model,10,11 and the microfacet model has been applied to the problem of synthetic scene generation in computer graphics.19 The microfacet model assumes that the surface reflecting the specular light can be described as a set of small facets, each at a randomly tilted angle, as illustrated schematically in Figure 6. The only facets that will deliver light to the camera are those facets tilted exactly to produce an equal/opposite
Figure 7. Example for a sample of solid black toner printed by a typical electrophotographic printer. The solid line is Eq. (3), and the points are from experimental measurements of μ and σ² over the range −50° ≤ θ ≤ 50°.
angle between the source and the camera. Otherwise the light misses the camera. The result would be expected to be the bilevel image of specular glints, as illustrated in Fig. 6.
The line light source used in the microgoniophotometer is assumed to be infinite in the direction collinear with the cylinder axis, so that a facet tilt in the orthogonal direction, φ, always directs light to the camera. Therefore, the BRDF measured with the microgoniophotometer should be a direct measure of the random distribution of facet tilt angles in the θ direction. By normalizing the area under the BRDF, μ vs θ, to unity, the probability density function, P(θ), for the random tilt angles, θ, can be formed as shown in Eq. (1). The value of P(θ) at each angle, θ, is a measure of the fraction of the surface that contains facets at exactly angle θ:
K = ∫_{−90°}^{90°} μ(θ) dθ and P(θ) = μ(θ)/K.   (1)
Each facet that is at the correct specular angle delivers light at irradiance I to the camera. All other facets produce an irradiance of I = 0. The result is irradiance I at the facet location projected onto the camera sensor plane. This bilevel set of facets should produce an average value and a standard deviation given by Eqs. (2) and (3). Note from Eq. (1) that the area under the BRDF (μ vs θ) is an experimental measure of the irradiance, I = K:
μ(θ) = P(θ) · I, where I = K,   (2)
σ²(θ) = P(θ) · (1 − P(θ)) · I².   (3)
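The bilevel statistics of Eqs. (1)–(3) can be checked numerically. The sketch below normalizes an assumed Gaussian-shaped BRDF (its width and amplitude are placeholders, not measured values) to obtain P(θ), then forms the predicted mean and variance:

```python
import numpy as np

theta = np.linspace(-90, 90, 361)
dtheta = theta[1] - theta[0]
mu = 0.8 * np.exp(-theta**2 / (2 * 10.0**2))   # assumed measured BRDF, arbitrary units

K = mu.sum() * dtheta                          # Eq. (1): area under the BRDF
P = mu / K                                     # probability density of facet tilt angles
I = K                                          # irradiance recovered from the area, I = K
mu_pred = P * I                                # Eq. (2): reproduces the measured BRDF
sigma2 = P * (1 - P) * I**2                    # Eq. (3): predicted bilevel variance
```

By construction P integrates to one and Eq. (2) returns the input BRDF exactly, so the interesting output is sigma2, the variance the simple facet model predicts at each tilt angle.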
In order to test the facet model quantitatively, experimental measurements of σ² versus θ were carried out for twenty samples of solid black (single toner) produced by different printers with different toners and different fusing conditions on different substrates. Values of P(θ) were calculated from μ(θ) with Eq. (1), and the data were plotted as σ² versus P(θ)·(1 − P(θ)). Figure 7 shows an example for a typical solid black toner printed by laser EP. The measured values of σ² were much lower than predicted, and the data do not show the linearity of Eq. (3). Thus the facet model illustrated in Fig. 6 does not provide a complete, quantitative rationale for the measured data.
Figure 8. The blurring effect of the camera pixels projected onto the surface facets.
AN EXPANDED FACET MODEL
It is not surprising that the experimentally measured values of σ² are lower than predicted. Equation (3) treats the facets as if they were measured with infinite resolution. However, there is no reason to expect the surface facets to be large relative to the size of the camera pixels projected onto the surface. Indeed, if the camera pixels are larger than the facet size, the camera will blur the image through a convolution with the effective aperture of the camera pixels. This is illustrated in Figure 8. The effect can be described quantitatively by modifying Eq. (3) with a blurring factor, k, as shown in Eq. (4):
σ²(θ) = P(θ) · (1 − P(θ)) · I² · k².   (4)
The nonlinearity observed in Fig. 7 requires additional modification of the facet model. Figure 9 suggests a modification based on the microstructure of the facets. Visual inspection of the printed samples in specular light indicates that the samples have a variety of different microstructures. Moreover, visual inspection of many samples suggests that the microstructures may be described as a population of two types of surfaces: one with well-fused toner and the other with more poorly fused toner. This model is illustrated schematically in Figure 10.
These two regions would be expected to contribute to the overall measured BRDF and granularity of the sample. This is described in Eqs. (5)–(7), where P_a(θ) and P_b(θ) are the probability density functions for the distribution of surface tilt angles in the two regions illustrated in Fig. 10, σ_a and σ_b are the rms granularity characteristics of the two regions, and F is the fraction of the surface that is region (a). Note that Eq. (7) reduces to Eq. (3) for P_a = P_b:
P(θ) = F · P_a(θ) + (1 − F) · P_b(θ),   (5)
σ²(θ) = F · σ_a²(θ) · I² + (1 − F) · σ_b²(θ) · I²,   (6)
Figure 11. P(θ), the normalized BRDF, versus angle θ for a typical solid black printed by laser EP. The solid line is experimental data. The dotted line is the model of Eqs. (5), (9), and (10) with s_a = 5.1°, s_b = 12.7°, and F = 0.3.
Figure 9. Closeup of the specular band for experimental samples 1 and 2.
Figure 12. BGDF, σ versus angle θ, for a typical solid black printed by laser EP. The solid line is experimental data. The dotted line is the model of Eq. (8) with k_a = 0.95 and k_b = 0.20.
Figure 10. Schematic illustration of partial fusing of toner.
σ²(θ) = F · P_a(θ) · (1 − P_a(θ)) · I² + (1 − F) · P_b(θ) · (1 − P_b(θ)) · I².   (7)
Equation (7) needs to be adjusted to account for the aperture effect of the camera pixels, as described above. However, one might expect the pixel aperture effect, the constant k in Eq. (4), not to be the same for the two regions. Thus we write Eq. (8). Equations (5)–(8) represent an expanded facet model of specular reflections:
σ²(θ) = F · P_a(θ) · (1 − P_a(θ)) · I² · k_a² + (1 − F) · P_b(θ) · (1 − P_b(θ)) · I² · k_b².   (8)
Figure 13. Example for a sample of solid black toner printed by a typical laser EP printer. The solid line is Eq. (3), and the points are from experimental measurements of μ and σ² over the range −50° ≤ θ ≤ 50°.
APPLYING THE EXPANDED FACET MODEL
In order to model the BRDF and BGDF, the two individual PDF functions P_a(θ) and P_b(θ) are needed. These functions were assumed to be normal distributions described by Eqs. (9) and (10):
P_a(θ) = (1/(s_a √(2π))) e^{−θ²/(2 s_a²)},   (9)
P_b(θ) = (1/(s_b √(2π))) e^{−θ²/(2 s_b²)}.   (10)
By combining Eqs. (5), (9), and (10), the BRDF can be modeled by adjusting the parameters s_a, s_b, and F to achieve the best fit with the experimental data. Figure 11 shows the result for one of the printed samples. The model parameters s_a, s_b, and F were adjusted to achieve the minimum rms deviation from the experimental data.
Equation (8) has two additional parameters, k_a and k_b, that must be adjusted to model the BGDF, σ versus θ. Figure 12 shows the minimum rms deviation between the model and the data, and Figure 13 shows the corresponding plot of σ² versus P·(1 − P). The model provides a rationale for the significant deviation from linearity predicted by Eq. (3).
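The minimum-rms fit described above can be sketched as a brute-force parameter search. The "data" here are synthetic, generated from the mixture model itself using the Fig. 11 values (s_a = 5.1°, s_b = 12.7°, F = 0.3); a real fit would of course use the measured P(θ) and a proper optimizer.

```python
import numpy as np

def mixture(theta, s_a, s_b, F):
    # Eq. (5) with the normal PDFs of Eqs. (9) and (10)
    pa = np.exp(-theta**2 / (2 * s_a**2)) / (s_a * np.sqrt(2 * np.pi))
    pb = np.exp(-theta**2 / (2 * s_b**2)) / (s_b * np.sqrt(2 * np.pi))
    return F * pa + (1 - F) * pb

theta = np.linspace(-50, 50, 101)
data = mixture(theta, 5.1, 12.7, 0.3)   # synthetic "measured" P(theta), Fig. 11 values

# Brute-force search for the minimum rms deviation (a stand-in for a real optimizer)
best, best_rms = None, np.inf
for s_a in np.arange(3.0, 7.05, 0.1):
    for s_b in np.arange(10.0, 15.05, 0.1):
        for F in np.arange(0.10, 0.51, 0.05):
            rms = np.sqrt(np.mean((mixture(theta, s_a, s_b, F) - data) ** 2))
            if rms < best_rms:
                best, best_rms = (s_a, s_b, F), rms
```

Because the generating parameters lie on the search grid, the recovered triple best should land on (5.1, 12.7, 0.3) with a vanishing residual, confirming the fitting machinery before it is pointed at noisy measurements.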
Figure 14. Examples of differences in behavior observed and modeled for other solid samples of black toner from different printers. Model parameters s_a, s_b, F, k_a, and k_b are also shown.
SPATIAL SIGNIFICANCE OF PARAMETERS k_a AND k_b
Figure 14 illustrates the behavior of three additional samples of solid black toner printed by different electrophotographic printers. The differences in behavior are more easily observed by plotting σ² versus P·(1 − P). The solid lines show the models that best fit the data, and the modeled values of s_a, s_b, F, k_a, and k_b are also shown. From an analysis of 15 samples of black toner produced in different printers, this behavior appears to be representative of typical electrophotographic samples.
The physical meanings of the parameters s_a, s_b, and F are indicated in the diagram of Fig. 10. In all cases s_a < s_b, which suggests that the range of surface tilt angles in region (a) is less than the range of angles in region (b). This is reasonable if the toner in region (a) is more thoroughly fused than that in region (b). The fraction F in every case is less than 0.5, which suggests that there is less of the smooth region (a) than of the rougher region (b).
The physical meaning of the parameters k_a and k_b is less obvious. In every case k_a > k_b. This suggests that the effect of the
pixel aperture convolution with the facet size has more <strong>of</strong> a<br />
blurring effect in the rough region (b) than in the smooth<br />
region (a). A possible rationale <strong>for</strong> this observation may be<br />
that the rough region (b) is also a higher frequency region.<br />
The low pass filtering effect of the pixel aperture would indeed be expected to have a larger effect on the higher frequency region (b) than on the lower frequency region (a). Thus k_a and k_b provide spatial information about the gloss granularity in addition to the magnitude parameters s_a and s_b.
As a check of the interpretation of k_a and k_b as indices of relative spatial frequency, the (A) image illustrated in Fig. 3 was low-pass filtered with a Gaussian kernel of radius R. Values of R were selected over the range R = 0 (no filtering) to R = 20 μm. Each image was analyzed to extract experimental values of σ and P as described above, and values of the model parameters were determined by fitting the model to each data set. The results are shown in Figure 15. As one would expect, the smoothing kernel had only a small effect on the width parameters, s_a and s_b. However, the values of k_a and k_b declined significantly, with k_a decreasing much more than k_b.
Figure 15. Values of s_a, s_b, k_a, and k_b for a printed sample of black toner analyzed through low pass filters of radius 0 ≤ R ≤ 20 μm.
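The Gaussian low-pass sweep described above can be sketched as follows. This is a from-scratch illustration, not the authors' code; mapping the filter radius in μm to a pixel-domain sigma via an assumed sampling distance of 2 μm per pixel is a simplification for the example.

```python
import numpy as np

def gaussian_lowpass(image, sigma_px):
    """Separable Gaussian low-pass filter, standing in for the paper's
    smoothing kernel of radius R."""
    if sigma_px <= 0:
        return image.copy()
    radius = int(3 * sigma_px)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2.0 * sigma_px**2))
    kernel /= kernel.sum()                    # normalize to unit gain

    def smooth(a):
        # Reflect-pad so edge pixels are averaged, not darkened.
        padded = np.pad(a, radius, mode="reflect")
        return np.convolve(padded, kernel, mode="valid")

    rows = np.apply_along_axis(smooth, 1, image)
    return np.apply_along_axis(smooth, 0, rows)

rng = np.random.default_rng(0)
img = rng.random((64, 64))                    # stand-in for a gloss image
# Radii in um; 2.0 um per pixel is a hypothetical sampling distance.
sweep = {r: gaussian_lowpass(img, r / 2.0) for r in (0, 5, 10, 20)}
```

As in Fig. 15, each filtered image in the sweep would then be refit to extract the model parameters; heavier filtering attenuates the high-frequency fluctuations most.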
DISCUSSION<br />
The behavior shown in Fig. 15 is consistent with the interpretation of k_a and k_b as noise attenuation factors related to the low pass filtering effect of the effective pixel aperture, and with the assumption that facets in the smooth region (a) are larger (lower frequency) than those in the less well fused region (b). The smaller facets in region (b) are low pass filtered to a larger extent than those in region (a) by the pixel aperture effect, so k_b < k_a. Further filtering by the added Gaussian filters lowers both k_a and k_b, as expected, and they approach the same values for extreme low-pass filtering (R = 20 μm).
As discussed in a previous report, the width <strong>of</strong> the<br />
BRDF is an inverse index <strong>of</strong> traditional gloss. 13 A narrow<br />
curve correlates with a high gloss reading. In the current<br />
work, it appears that fused toner can be interpreted in terms<br />
<strong>of</strong> two spatial regions that differ in the degree <strong>of</strong> fusing. The<br />
well fused region has a narrow BRDF, indicated by the value of s_a, and the poorly fused region has a broader BRDF, indicated by s_b. The magnitude of the rms deviation of gloss, called gloss granularity, is indicated by the values of k_a and k_b. As is typical of granularity indices, their magnitude is
dependent on the effective spatial aperture <strong>of</strong> measurement.<br />
In this case that spatial aperture is the area <strong>of</strong> a camera pixel<br />
projected onto the surface. The range of behaviors of k_a and k_b observed in these experiments indicates that gloss granularity
has a significant spatial frequency component that remains<br />
to be examined in future research.<br />
REFERENCES<br />
1 J. S. Arney and Hung Tran, “An inexpensive micro-goniophotometry<br />
you can build”, Proc. IS&T’s PICS Conference on Digital Image Capture,<br />
Reproduction, and Image Quality (IS&T, Springfield, VA, 2002) pp.<br />
179–182.<br />
2 J. S. Arney, H. Hoon, and P. G. Anderson, “A micro-goniophotometer<br />
and the measurement <strong>of</strong> print gloss”, J. <strong>Imaging</strong> Sci. Technol. 48, 458<br />
(2003).<br />
3 J. M. Bennett and L. Mattsson, Introduction to Surface Roughness and<br />
Scattering, 2nd ed. (Optical Soc. <strong>of</strong> America, Washington, DC, 1999),<br />
Chap. 3.<br />
4 J. C. Stover, Optical Scattering: Measurement and Analysis (McGraw-Hill, NY, 1990).
5 I. Nimer<strong>of</strong>f, “Two-parameter gloss methods”, J. Res. Natl. Bur. Stand.<br />
58(3), 127 (1957).<br />
6 Standard Practice <strong>for</strong> Angle Resolved Optical Scatter Measurements on<br />
Specular or Diffuse Surfaces, Standard Procedure No. E 1392-96,<br />
American <strong>Society</strong> <strong>for</strong> Testing and <strong>Material</strong>s, 1996.<br />
7 I. Nimer<strong>of</strong>f, “Analysis <strong>of</strong> goniophotometric reflection curves”, J. Res.<br />
Natl. Bur. Stand. 48(6), 441 (1952).<br />
8 H. Rothe and D. Huester, “Application <strong>of</strong> circular and spherical statistics<br />
<strong>for</strong> the interpretation <strong>of</strong> BRDF measurements”, Proc. SPIE 3141(02), 13<br />
(1997).<br />
9 M. Colbert, S. Pattanaik, and J. Krivanek, “BRDF-shop: creating<br />
physically correct bidirectional reflectance distribution functions”, IEEE<br />
Comput. Graphics Appl. 26(1), 30 (2006).<br />
10 P.-Å. Johansson, “Optical homogeneity <strong>of</strong> prints”, doctoral thesis, KTH,<br />
Royal Institute <strong>of</strong> Technology, Stockholm, Sweden, 1999.<br />
11 M.-C. Béland, “Gloss variation <strong>of</strong> printed paper: relationship between<br />
topography and light scattering”, doctoral thesis, KTH, Royal Institute <strong>of</strong><br />
Technology, Stockholm, Sweden, 2001.<br />
12 R. E. Swing, An Introduction to Microdensitometry (SPIE Optical<br />
Engineering Press, Bellingham, WA, 1998).<br />
13 J. S. Arney, P. G. Engeldrum, and H. Zeng, “An Expanded Murray-<br />
Davies model <strong>of</strong> tone reproduction in halftone imaging”, J. <strong>Imaging</strong> Sci.<br />
Technol. 39, 502 (1995).<br />
14 J. S. Arney, C. Scigaj, and P. Mehta, “Linear color addition in halftones”,<br />
J. <strong>Imaging</strong> Sci. Technol. 45, 426 (2001).<br />
15 Y. Kipman, P. Mehta, K. Johnson, and D. Wolin, “A new method <strong>of</strong><br />
measuring gloss mottle and micro-gloss using a line-scan CCD camera<br />
based imaging system”, Proc. IS&T’s NIP17 (IS&T, Springfield, VA,<br />
2001) p. 714.<br />
16 J. S. Arney, L. Ye, and S. Banach, “Interpretation <strong>of</strong> gloss meter<br />
measurements”, J. <strong>Imaging</strong> Sci. Technol. 50(6), 567 (2006).<br />
17 J. S. Arney, P. G. Anderson, G. Franz, and W. Pfeister, “Color properties<br />
<strong>of</strong> specular reflections”, J. <strong>Imaging</strong> Sci. Technol. 50(3), 228 (2006).<br />
18 J. S. Arney, L. Ye, J. Wible, and T. Oswald, “Analysis <strong>of</strong> paper gloss”, J.<br />
Pulp Pap. Sci. 32(1), 19 (2006).<br />
19 M. Ashikhmin, S. Premoze, and P. Shirley, "A microfacet-based BRDF generator", Proc. SIGGRAPH (ACM Press, NY, 2000) pp. 65–74.
Journal of Imaging Science and Technology® 51(4): 299–309, 2007.
© Society for Imaging Science and Technology 2007
Forensic Examination <strong>of</strong> Laser Printers and Photocopiers<br />
Using Digital Image Analysis to Assess Print<br />
Characteristics<br />
J. S. Tchan<br />
MATAR Research Group, London College <strong>of</strong> Communication, Elephant and Castle,<br />
London SE1 6SB, England<br />
E-mail: j.tchan@lcc.arts.ac.uk<br />
Abstract. This paper describes a method that can assist in identifying the printing machine that produced a given print. The method used high spatial resolution and low-noise digital image analysis to measure the sharpness, intensity, and size characteristics of individual text characters. The relative variations of these variables were used to identify the machine that produced the print under examination. The results
showed that three machines could be distinguished and one <strong>of</strong><br />
these machines also showed differences in the print produced when<br />
the toner cartridge was changed. © 2007 <strong>Society</strong> <strong>for</strong> <strong>Imaging</strong> <strong>Science</strong><br />
and Technology.<br />
DOI: 10.2352/J.ImagingSci.Technol.(2007)51:4(299)
Received Oct. 12, 2006; accepted <strong>for</strong> publication Mar. 30, 2007.<br />
1062-3701/2007/51(4)/299/11/$20.00.
INTRODUCTION<br />
A reason why it is frequently not feasible to prosecute counterfeiters and document fraudsters is the difficulty of establishing links between the counterfeiters or fraudsters and their printing equipment. This difficulty arises from the extremely wide range of both cheap and expensive laser and ink jet printing machines available, a problem exacerbated by the continual and frequent commercial release of new varieties of printing machine.
In addition, both laser and ink jet printers, which make up most of the cheap office printing market, have disposable ink and toner cartridges, or cartridges that can be refilled. This can prevent valid chemical analysis of similarities or differences in chemical composition among the various ink and toner cartridges. Chemical analysis also requires the destruction of part of the evidence.
Methods which involve microscopy 1 are frequently used<br />
by <strong>for</strong>ensic scientists to determine the production source <strong>of</strong><br />
digital print. The linking <strong>of</strong> a document to a digital printer<br />
in these cases usually involves analyzing variables such as ink<br />
or toner overspray and assessing alignment, spacing and<br />
copy distortion.<br />
Investigations have been carried out by Oliver and<br />
Chen, 2 Tchan, Thompson, and Manning, 3 and Tchan 4,5 using<br />
digital image analysis to link documents to printing machines.<br />
Oliver and Chen have studied the relationship between<br />
the raggedness <strong>of</strong> print and text character distortion<br />
<strong>of</strong> different printers. Tchan, Thompson, and Manning have<br />
taken a similar approach but have also used neural networks<br />
to link the contrast, noise and edge characteristics <strong>of</strong> printed<br />
text to the printing machine that produced it. These attempts<br />
at using digital image analysis however could only<br />
provide a positive test <strong>for</strong> a small range <strong>of</strong> printing machines<br />
and do not account <strong>for</strong> the effect <strong>of</strong> replaceable toner and<br />
ink cartridges.<br />
The analysis <strong>of</strong> the actual shapes <strong>of</strong> the text characters<br />
produced by different print engines using digital image<br />
analysis is another possible way <strong>of</strong> fingerprinting printing<br />
machines, according to Tchan. 6 However, this method not<br />
only suffers from the influence <strong>of</strong> replaceable ink and toner<br />
components distorting the results <strong>of</strong> the analysis, but other<br />
drawbacks as well. First, processing huge amounts <strong>of</strong> image<br />
data is time-consuming due to the large number <strong>of</strong> fonts<br />
and their range <strong>of</strong> sizes from many different makes and<br />
models <strong>of</strong> printing machines. Secondly, the problem <strong>of</strong> ink<br />
spread on different kinds <strong>of</strong> paper or humidity conditions in<br />
ink jet printing processes distorts the shapes <strong>of</strong> text characters.<br />
An identification methodology <strong>for</strong> fingerprinting printing<br />
machines recently considered is to use a technique called<br />
ESDA 7 (Electrostatic Detection Apparatus). ESDA has been<br />
employed to detect and evaluate roller pressure marks on<br />
paper. 8 These pressure marks are due to the interaction between<br />
the paper feeder rollers and the paper substrate. The<br />
original application was <strong>for</strong> the detection <strong>of</strong> pen impressions<br />
several layers down in a notepad. It works by charging paper<br />
surfaces with a high voltage. If toner particles are applied to<br />
the charged paper surface, imperceptible pen marks due to<br />
writing on a piece <strong>of</strong> paper placed above this sheet may be<br />
revealed. In a notepad, for example, writing can be recovered from sheets many layers below the top sheet that carries the actual writing. As the ESDA technique has been
shown to detect weak imperceptible pressure marks from<br />
pens, it might be able to detect pressure marks from printing<br />
rollers.<br />
If the pressure marks can be detected, then linking a document to the printer that produced it would require comparing the width and spacing characteristics of the rollers of the machine in question.

Table I. List of the three printing machines and the printing machine with a change of toner cartridge used in the investigation.

Printing System
Canon IR 3570 Photocopier
HP 1200 Laser Jet Printer
HP 4250 Laser Jet Printer Toner Cartridge 1
HP 4250 Laser Jet Printer Toner Cartridge 2

However, detecting the pressure
marks may not always be possible <strong>for</strong> the following<br />
reasons. First, the pressure exerted by the rollers is weak,<br />
generally much weaker than pen pressure. If the pressure<br />
marks are present, they will sometimes be difficult to detect,<br />
even with the most sophisticated digital image processing<br />
systems. Secondly, assuming that the pressure marks are detectable,<br />
heavy handling can destroy them.<br />
A similar technique that has been recently explored concerns<br />
ink jet printers and indentations on the paper after a<br />
printed sheet has been fed through the machine. 9 These indentations<br />
are caused by the spoke wheels that feed the paper<br />
through the ink jet printer and are most perceptible on<br />
moist parts <strong>of</strong> the paper caused by wet ink. Due to the heavy<br />
pressure imparted by the spoke wheels, the indentations can<br />
be seen using optical microscopy without the aid <strong>of</strong> ESDA.<br />
However not all ink jet printers have spoke wheels so this<br />
technique does not apply to all ink jet printing systems.<br />
Another type <strong>of</strong> method that has been considered <strong>for</strong><br />
the identification <strong>of</strong> laser printers exploits imperfections in<br />
the print known as banding. Banding imperfections are lines<br />
across the printed page when smooth print is required. 10<br />
The effect has been attributed to two causes: first, fine banding due to imbalance of the rotor component of the polygon mirror or mechanical weaknesses of the laser scanning unit; secondly, rough banding caused by unsteady motion of the photoconductor drum or the fuser unit.
Mikkilineni et al. 11 have devised a system that uses a<br />
scanner to analyze relative texture differences on a printed<br />
page caused by banding effects. This system has shown that<br />
9 out <strong>of</strong> a set <strong>of</strong> 10 laser printing machines were successfully<br />
identified.<br />
The method described in this paper is an alternative<br />
method <strong>of</strong> measuring banding effects in laser printers and<br />
photocopiers. Instead <strong>of</strong> scanning images and analyzing the<br />
relative texture <strong>of</strong> text characters, it uses a high resolution<br />
and low-noise digital image analysis system to measure the<br />
following variables in printed text. These variables are sharpness,<br />
intensity, and size. The following section describes the<br />
methodology and the experimental setup involved.<br />
EXPERIMENTAL PROCEDURE<br />
When a completely black page was printed out on a photocopier,<br />
two different laser printers and on one <strong>of</strong> the two<br />
laser printers with a different toner cartridge, Table I, the<br />
following effects in Figure 1 were seen. These are sketches <strong>of</strong><br />
Figure 1. Sketches <strong>of</strong> the lines produced by the four different printing<br />
samples used in the investigation.<br />
the banding lines seen with an indication <strong>of</strong> their dimensions<br />
and separations. It was observed that some <strong>of</strong> the horizontal<br />
lines from the HP 4250 <strong>for</strong> the same toner cartridge<br />
were not in fixed positions and the lines were not always<br />
equal in number <strong>for</strong> different printed sheets. It is unknown<br />
whether some <strong>of</strong> the lines were random or followed a complex<br />
pattern since further investigation <strong>of</strong> this effect has yet<br />
to be made. However, <strong>for</strong> this part <strong>of</strong> the investigation, only<br />
a confirmation <strong>of</strong> the existence <strong>of</strong> the banding effects that<br />
are a common feature <strong>of</strong> digital printers was required.<br />
The experimental procedure can be separated into three distinct stages. In the first stage the banding effect was
observed <strong>for</strong> a test page that was entirely covered in solid<br />
toner as stated above. In stage two, a test page <strong>of</strong> the same<br />
text character was produced and physical differences in the<br />
print were investigated <strong>for</strong> each printing machine used. This<br />
was completed using high-resolution digital image analysis.<br />
In stage three, a page <strong>of</strong> ordinary text was produced and<br />
patterns in the text were again investigated using high-resolution digital image analysis. The process is illustrated in
Figure 2, the flow chart below, and will be discussed in<br />
greater detail later.<br />
The effects <strong>of</strong> banding on printed text were investigated<br />
using a high-resolution digital image analysis system, which<br />
has been built specifically to analyze the print. Figure 3 illustrates<br />
how the camera was attached to the stand and how<br />
the lighting source was attached to the camera.<br />
Figure 2. The three different stages <strong>of</strong> the investigation.<br />
Figure 4. The three stages required to make the necessary measurements<br />
<strong>for</strong> the analysis.<br />
Table II. A summary <strong>of</strong> the measurements taken.<br />
Measurements<br />
Variable<br />
1 Location <strong>of</strong> the peak maximum <strong>for</strong> the print<br />
region, Fig. 5.<br />
2 The value <strong>of</strong> the peak maximum <strong>for</strong> the print<br />
region, Fig. 5.<br />
3 The text character area is calculated by counting<br />
the number <strong>of</strong> pixels below an arbitrarily selected<br />
threshold between the nonimage and image<br />
peaks, Fig. 5.<br />
4 Integration <strong>of</strong> the peak area <strong>for</strong> the print region<br />
divided by the text character area to determine<br />
the average image intensity, Fig. 5.<br />
Figure 3. The camera, lens, lighting system, and stand.<br />
The camera employed in the investigation was a<br />
Hamamatsu C4742-95 camera. This camera had a Peltier<br />
cooled CCD chip to increase the signal to noise ratio. The<br />
camera was attached firmly to a camera stand that weighs<br />
approximately 30 kg. The lighting unit was firmly screwed<br />
onto the lens of the camera. The lighting unit consisted of a circular array of red LEDs. The LEDs were connected to a laboratory power supply with low ripple.
Figure 4 shows in block diagram <strong>for</strong>m the different<br />
hardware and s<strong>of</strong>tware components <strong>of</strong> the image analysis<br />
system. The image data from the camera were digitized to 8 bit resolution using Matrox MIL software. The data were subsequently analyzed using Visual Basic code compatible with Matrox MIL. Algorithms were developed in the Visual Basic ActiveX language that could perform the following computations on individual text characters.
Four measurements were taken from the text characters.<br />
They related to the image sharpness, intensity and size <strong>of</strong> the<br />
print under investigation. These are summarized in Table II.<br />
The image window size, Figure 5, was 480 by 483 pixels, with a tonal resolution of 256 gray levels.
Measurement 1 is the position <strong>of</strong> the peak in standard<br />
8 bit gray scale <strong>for</strong> the tonal distribution <strong>of</strong> the printed region.<br />
The location <strong>of</strong> this peak has the lower gray scale<br />
value, in this case at about 55, Fig. 5. The other peak corresponds<br />
to the tonal distribution <strong>of</strong> the unprinted white paper.<br />
The position <strong>of</strong> the peak <strong>for</strong> the white paper used in the<br />
investigation is located at just over 100, Fig. 5. Such a low<br />
value is due to the arbitrarily low lighting exposure chosen<br />
<strong>for</strong> the CCD camera. Attempting to stretch the distance between<br />
the two peaks <strong>for</strong> the black print and white paper<br />
regions, by increasing the exposure too much, can sometimes<br />
reduce the precision <strong>of</strong> the system.<br />
Measurement 2 is the height <strong>of</strong> the peak from the tonal<br />
distribution <strong>of</strong> the printed region.<br />
Figure 6. The two sets <strong>of</strong> test targets that were used to produce the results<br />
in this investigation.<br />
Figure 5. An illustration <strong>of</strong> the data acquired from the image analysis<br />
system.<br />
Measurement 3 is the area of the printed text character, which is calculated by counting the number of pixels below a fixed arbitrary gray scale level.
Measurement 4 averages the overall intensity <strong>of</strong> the<br />
printed region by integrating the total intensity from the<br />
printed region and dividing it by the number <strong>of</strong> pixels from<br />
the text character below the fixed arbitrary level chosen <strong>for</strong><br />
measurement 3. Figure 5 and Table II illustrate and summarize<br />
the measurements made. The measurements were taken<br />
individually and sequentially <strong>for</strong> each text character by<br />
manual alignment under the image window.<br />
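The four measurements of Table II can be sketched on a synthetic image window as follows. The threshold of 80 between the print and paper histogram peaks, and the synthetic gray levels (55 for print, 105 for paper, as in Fig. 5), are assumptions chosen for illustration.

```python
import numpy as np

def character_measurements(window, threshold=80):
    """Sketch of the four Table II measurements on an 8-bit grayscale
    window containing one text character. The threshold is arbitrary,
    as in the paper; 80 is an assumed value between the two peaks."""
    hist = np.bincount(window.ravel(), minlength=256)
    print_hist = hist[:threshold]                 # tonal peak of the print region
    peak_pos = int(np.argmax(print_hist))         # measurement 1: peak location
    peak_height = int(print_hist[peak_pos])       # measurement 2: peak height
    mask = window < threshold
    area = int(mask.sum())                        # measurement 3: character area, pixels
    mean_intensity = float(window[mask].mean())   # measurement 4: average intensity
    return peak_pos, peak_height, area, mean_intensity

# Synthetic 480 x 483 window: dark character (~55) on lighter paper (~105).
win = np.full((480, 483), 105, dtype=np.uint8)
win[100:300, 100:300] = 55
pos, height, area, mean_i = character_measurements(win)
```

On a real window the print peak would be spread over several gray levels, so the peak height and the area (pixel count below threshold) would differ, as they do in the paper's data.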
Figure 6 shows the two print samples used in the investigation.<br />
They were both printed on the same batch <strong>of</strong> standard<br />
laser printer paper in all cases. The font was Times New Roman and the font size was 22 points. This large size was used because it facilitated relatively quick measurements to show that the method is viable.
A test sheet was produced that consisted of a series of the letter "W" in Times New Roman font, 22 points in size. There were 17 W's across the page and 30 down the page.
The selection <strong>of</strong> the letter “W” was arbitrary. However, the<br />
size was important to facilitate ease <strong>of</strong> measurement when<br />
recording the data using the digital image analysis system.<br />
Figure 7. In the case <strong>of</strong> the normal text page a mask was required to<br />
eliminate the effect <strong>of</strong> adjacent text characters.<br />
In the case of the page of normal text, Fig. 6 on the right, a cardboard mask, Figure 7, was required to prevent nearby letters from influencing the readings, since unwanted parts of letters otherwise appeared in the image window. The
test set <strong>of</strong> “W’s” did not have this effect since the spacing <strong>of</strong><br />
the “W’s” was designed to eliminate the requirement <strong>for</strong> a<br />
mask. Figure 8 shows the “e’s” that were analyzed in the page<br />
<strong>of</strong> normal text.<br />
RESULTS<br />
First, the repeatability of the system was established by assessing variations in the intensity and area measurements for a
Figure 8. The letter "e's" that were selected from the test page for the classification of the two toner cartridges of the HP 4250.
single text character sampled 50 times. A maximum deviation from the mean of ±0.05% was observed when the mask was not employed, and ±0.03% when the mask was used, for the intensity measurements. These error values tripled for the area measurements. This text character print
sample was also used to check <strong>for</strong> any substantial drift in the<br />
system at the beginning <strong>of</strong> each day over a period <strong>of</strong> about<br />
20 days that the measurements were taken. Substantial drift, which could be caused by small changes in the LED voltage, lens settings, or focus, did not occur over the period of data
collection. This was probably due to the mechanical robustness<br />
<strong>of</strong> the optical system and the quality <strong>of</strong> the LED illumination<br />
system.<br />
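The repeatability figure quoted above (the maximum deviation from the mean over 50 repeated samples, expressed as a percentage) can be computed as in this sketch; the readings below are hypothetical values, not the paper's data.

```python
import numpy as np

def percent_deviation_from_mean(samples):
    """Maximum deviation from the mean of repeated measurements of one
    character, as a percentage of that mean (a sketch of the ±0.05% /
    ±0.03% repeatability check)."""
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean()
    return float(np.max(np.abs(samples - mean)) / mean * 100.0)

# Hypothetical repeated intensity readings for a single character.
readings = [99.97, 100.00, 100.03, 100.01, 99.99]
dev = percent_deviation_from_mean(readings)   # ≈ 0.03%
```

In practice this would be run on all 50 repeats of each measurement type; a daily rerun on the same character serves as the drift check described above.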
Secondly, two W test pages from the HP 4250 printer, one from each of the two toner cartridges, were analyzed.
This, as in all <strong>of</strong> this investigation, required individual sequential<br />
manual alignment and subsequent measurement<br />
from each text character on a line of text. Figure 9 shows how the size of the letter "W" for lines 1 and 10 of the grid changes down the page for the HP 4250 printer using the
two different toner cartridges. The consistent patterns indicate<br />
that the digital image analysis system has recorded<br />
meaningful results.<br />
Thirdly, a demonstration was made <strong>of</strong> how the four<br />
printing samples could be distinguished using the W template.<br />
This was achieved by using three adjacent W test<br />
sheets from each <strong>of</strong> the four printing samples in a print run<br />
<strong>of</strong> 10. The reported results in this section used only the first<br />
horizontal and vertical lines <strong>of</strong> the W test sheets because <strong>of</strong><br />
Figure 9. A comparison of the text character size down lines 1 and 10 for the two HP 4250 toner cartridges.
Figure 10. A comparison <strong>of</strong> the text character area down line 1 <strong>of</strong> the W test pages. Left, <strong>for</strong> 3 adjacent<br />
pages; right, their average.<br />
the time-consuming nature <strong>of</strong> recording the measurements<br />
and associated time constraints <strong>of</strong> the researcher. The time<br />
problem was only discovered during the experimental phase<br />
and centered on alignment difficulties <strong>of</strong> the text characters<br />
in the image window. The left-hand graphs <strong>of</strong> Figures 10–13<br />
show the individual data from the three sheets and on the<br />
right-hand side their averages.<br />
It was shown that the peak size, the average intensity<br />
and the size measurements yielded useful in<strong>for</strong>mation <strong>for</strong><br />
the classification process. The position <strong>of</strong> the image peak<br />
Figure 11. A comparison <strong>of</strong> the text character area across line 1 <strong>of</strong> the W test pages. Left, <strong>for</strong> 3 adjacent<br />
pages; right, their average.<br />
remained constant at either 54 or 55 and provided no useful classification data. It did, however, provide a useful check on the stability of the illumination levels throughout the period when the measurements were taken. The experimental results
using the W template show that all four printing<br />
samples could be differentiated by a combination <strong>of</strong> the<br />
Figure 12. A comparison <strong>of</strong> the relative average intensity across the W test pages, left, <strong>for</strong> 3 adjacent pages;<br />
right, their average.<br />
analysis <strong>of</strong> the area variation <strong>of</strong> the text characters down the<br />
page and the average intensity, peak size and area variations<br />
<strong>of</strong> the text characters across the page, Figs. 10–13.<br />
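One way such feature profiles could be turned into a classification is nearest-profile matching. This sketch is an assumption about how the comparison might be automated, not the paper's procedure, and the reference profiles below are invented for illustration.

```python
import numpy as np

def closest_printer(profile, references):
    """Illustrative matching step: compare the area-variation profile of
    an unknown page against stored reference profiles by rms difference
    and return the best-matching printer label."""
    best_label, best_rms = None, np.inf
    for label, ref in references.items():
        rms = np.sqrt(np.mean((np.asarray(profile, float)
                               - np.asarray(ref, float)) ** 2))
        if rms < best_rms:
            best_label, best_rms = label, rms
    return best_label

# Hypothetical area profiles (pixels) down line 1 for three samples.
refs = {
    "Canon IR 3570":  [410, 412, 409, 411],
    "HP 1200":        [430, 428, 431, 429],
    "HP 4250 cart 1": [420, 421, 419, 420],
}
label = closest_printer([419, 420, 420, 421], refs)   # → "HP 4250 cart 1"
```

A practical version would combine the down-page and across-page profiles of all three useful measurements, as the text describes.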
Finally, the masking technique (Fig. 7) was employed with the printed page of normal text shown in Figs. 6 and 8 to find differences between the printing examples, in this case from the HP 4250 printer when the toner cartridge was changed. Figure 14 shows a shallow well or an inverted sharp spike at position 15 for toner cartridge 1 and an inverted sharp spike for toner cartridge 2 at position 14. This result was obtained from a print run of 201 sheets. The graphs of Fig. 14 show the results from sheets 1, 101, and 201. The
measurements were also sampled for other sheets in the print run at 20 sheet intervals (numbers 1, 21, 41, 61, 81, 101, 121, 141, 161, 181, and 201) and were fully consistent: the inverted sharp spike at position 14 appears for all print samples from toner cartridge 2 and never for toner cartridge 1.

Figure 13. A comparison of the text character peak intensities across the W test pages. Left, for 3 adjacent pages; right, their average.
CONCLUSIONS
The results of the investigation thus far demonstrate the potential of the method for the forensic analysis of print, both in linking a machine to a particular document and in showing whether a document has been tampered with.
Figure 14. The area differences using the page of text for the HP 4250 laser printer for the two toner cartridges.
It has been shown that a small number of toner-based printing systems can be classified using high-resolution image analysis to measure the relative changes in the physical properties of individual text characters, both across and down pages of printed text. Even at this stage of its development the system has potentially useful forensic applications. These results also correlate with the work carried out by Mikkilineni et al. on measuring surface texture by scanning the text characters of laser printers.
In this investigation more work is required on smaller text characters. In particular, positive results are required for font sizes of 10 points, since this size is typical for documents. If limitations in the hardware become apparent, or the banding signatures become weaker when smaller font sizes are considered, then a larger set of text characters could be analyzed statistically to try to overcome the limitations.
However, the further work described above requires more precise measurement and analysis techniques, because of the labor intensive nature of the experimental work. Statistical analysis techniques such as moving averages or autocorrelation analysis can enhance the data, thereby reducing the volume of data required from the
print samples for accurate classification. The current masking method is difficult to carry out in practice for a large number of small text characters and needs an alternative. A possible solution to this problem is to use or develop character recognition software that can automatically isolate and classify individual text characters.
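To make the moving-average and autocorrelation suggestion concrete, here is a minimal numpy sketch. It is not from the paper: the character-area profile is synthetic, with a hypothetical period-8 banding signature buried in measurement noise. It shows how a short moving average suppresses uncorrelated noise and how the autocorrelation exposes the banding period.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic character-area profile: weak periodic banding (period 8
# samples, hypothetical) plus measurement noise.
n = 400
x = np.arange(n)
profile = 1.0 + 0.1 * np.sin(2 * np.pi * x / 8) + 0.1 * rng.standard_normal(n)

# Moving average: suppresses uncorrelated noise while keeping the
# banding component (window shorter than the banding period).
window = 3
smoothed = np.convolve(profile, np.ones(window) / window, mode="valid")

# Normalized autocorrelation of the mean-removed profile: the banding
# period shows up as a strong positive peak at lag 8.
centered = profile - profile.mean()
acf = np.correlate(centered, centered, mode="full")[n - 1:]
acf /= acf[0]
```

With more averaging or longer profiles the banding peak sharpens further, which is the sense in which such techniques can reduce the number of print samples needed for classification.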
The suggestions for further work indicate that, in the longer term, since data acquisition is time-consuming, an automated system using a high-resolution, fast scanning device is needed. Such a system would be expensive to develop and implement, but it would greatly increase the volume of print that can be processed in a given time and should enable machines to be examined both more quickly and more accurately from smaller font sized print.
REFERENCES
1. B. S. Lindblom and R. Gervais, Scientific Examination of Questioned Documents (Taylor and Francis, Boca Raton, FL, 2006).
2. J. Oliver and J. Chen, "Use of signature analysis to discriminate digital printing technologies", Proc. IS&T's NIP18 (IS&T, Springfield, VA, 2002) pp. 218–222.
3. J. S. Tchan, R. C. Thompson, and A. Manning, "The use of neural networks in an image analysis system to distinguish between laser prints and their photocopies", J. Imaging Sci. Technol. 44(2), 132–144 (2000).
4. J. S. Tchan, "Classifying digital prints according to their production process using image analysis and artificial neural networks", Proc. SPIE 3973, 105–116 (2000).
5. J. S. Tchan, "The development of an image analysis system that can detect fraudulent alterations made to printed images", Proc. SPIE 5310, 151–159 (2004).
6. J. S. Tchan, "Forensic analysis of print using digital image analysis", Proc. SPIE 5007, 61–72 (2003).
7. J. Levinson, Questioned Documents: A Lawyer's Handbook (Academic Press, London, 2001).
8. G. M. Laporte, "The use of an electrostatic detection device to identify individual and class characteristics on documents produced by printers and copiers: a preliminary study", J. Forensic Sci. 49(3), 610–620 (2004).
9. Y. Akao, K. Kobayashi, and Y. Seki, "Examination of spur marks found on inkjet printed documents", J. Forensic Sci. 50(4), 915–923 (2005).
10. J. You, "Banding reduction in an electrophotographic printer", J. Imaging Sci. Technol. 49(6), 635–640 (2005).
11. A. K. Mikkilineni, P. Chiang, G. N. Ali, G. T. C. Chiu, J. P. Allebach, and E. J. Delp, "Printer identification based on graylevel co-occurrence features for security and forensic applications", Proc. SPIE 5681, 430–440 (2005).
Journal of Imaging Science and Technology® 51(4): 310–316, 2007.
© Society for Imaging Science and Technology 2007

Moiré Analysis for Assessment of Line Registration Quality

Nathir A. Rawashdeh, Daniel L. Lau and Kevin D. Donohue
University of Kentucky, ECE, 453 F. Paul Anderson Tower, Lexington, Kentucky 40506-0046
E-mail: nathir@ieee.org

Shaun T. Love
Lexmark International, Inc., 740 W. New Circle Rd., Lexington, Kentucky 40550
Abstract. This paper introduces objective macro and micro line registration quality metrics based on Moiré interference patterns generated by superposing a lenticular lens grating over a hardcopy test page consisting of high-frequency Ronchi rulings. Metrics for macro and micro line registration are defined, and a measurement procedure is described to enhance the robustness of the metric computation over reasonable variations in the measurement process. The method analyzes low frequency interference patterns, which can be scanned at low resolutions. Experimental measurements on several printers are presented to demonstrate a comparative quality analysis. The metrics demonstrate robustness to small changes in the lenticular lens and grating superposition angle. For superposition angles varying between 2° and 5°, the coefficients of variance for the two metrics are less than 5%, which is small enough for delineating between test patterns of different print quality. © 2007 Society for Imaging Science and Technology.
DOI: 10.2352/J.ImagingSci.Technol.(2007)51:4(310)
INTRODUCTION
Image quality analysis is an important component in the development and operation of various digital imaging technologies, such as displays, scanners, and printers. To produce visually pleasing images, devices must be designed to minimize defects, such as problems related to color registration and line quality. An efficient way of measuring imaging defects is through the use of special test targets, which are designed to test the limits of the respective imaging technology. Analysis based on test target results can be used to track and minimize image defects during the development phase.
This paper presents a method for analyzing printed line quality by analyzing the Moiré patterns resulting from the superposition of a test pattern, consisting of finely spaced lines, and an array of cylindrical lenses of similar spacing. Other approaches to line quality attributes for hardcopy output include blurriness, raggedness, stroke width, darkness, contrast, fill, and registration.1–3 The test targets for these measures consist of a printed black line on a white background. The quality attributes are then quantified through measurements from the printed line. Blurriness measures the average transition length from light to dark,
(IS&T Member)
Received Jan. 5, 2007; accepted for publication Mar. 1, 2007.
1062-3701/2007/51(4)/310/7/$20.00.
and raggedness measures the geometric distortion of the line's edge from its ideal shape. Line width is the average stroke width measured from either edge along a direction normal to the line under analysis. Line darkness measures the mean line density, which can vary due to voids, for example. The contrast attribute captures the relationship between the darkness of the line and that of its surrounding field by measuring the mean reflectance factors. Contrast can vary due to blurring, extraneous marks, haze, or substrate type. Fill refers to the appearance of darkness within the inner boundary of the line. One example of line registration is the color registration of the CMYK components in an inkjet printer. If the same line is printed once with each color, then ideally all four color lines should collapse into one, and any consistent increase in line width would indicate position errors, or mis-registration, of one or more ink components.3
This paper introduces new metrics that differ from previous line quality attribute measures in that they are directly based on the printer's ability to create fine detailed lines. While this metric may be influenced by measures such as raggedness and blur, its use of fine details makes it unique relative to previous measures. The measurement method involves the analysis of low frequency Moiré patterns that change according to small changes in the test patterns. The test pattern consists of finely spaced parallel lines, which an imperfect printer reproduces with some line placement (or registration) errors. The parallel lines are no longer uniformly spaced in this case, and this is reflected in the resulting Moiré line shape. Moiré patterns are used as a nondestructive analysis tool in Moiré interferometry. For this method a photographic grid is printed on the surface of a material under investigation and is irradiated by coherent light. The interfering fringes (Moiré patterns) can indicate the presence of local stress and deformation for in-plane displacement.4,5 Moiré interferometry techniques have the advantage of being able to analyze a broad range of engineering materials in small analysis zones at high spatial resolution and sensitivity. This work extends the principles of Moiré interferometry to assess line registration quality by analyzing the Moiré patterns produced by the superposition
Rawashdeh et al.: Moiré analysis for assessment of line registration quality
of a lenticular grating on a printed Ronchi-ruling test pattern to characterize underlying printed line registration errors.

Figure 1. Moiré pattern of spacing $P$ and direction $\phi$ formed by the angled superposition of a Ronchi ruling and a lenticular grating.
The lenticular grating consists of a plastic sheet that is smooth on one side and holds an array of parallel cylindrical lenses or prisms on the other side. The printed test pattern is a Ronchi ruling (a rectangular spatial wave linear grating) with a line spacing similar to that of the lenticular grating. This quality assessment approach lends itself to automation, since the lenticular grating is thin enough to be fixed to the glass surface of a flatbed scanner and does not interfere with its automatic document feeder mechanism. Since only the shape of the Moiré lines is used for analysis, it is sufficient to use a relatively inexpensive scanner (or to scan faster), because high-resolution detail and tone reproduction accuracy are not crucial. This paper presents the underlying equations affecting the critical details of the Moiré patterns, describes a procedure for robust measurement and computation of macro and micro line quality metrics, and presents results for several printers. Measurements are analyzed and compared to a visual assessment of line quality based on a magnified view of the Ronchi pattern created with a high resolution scanner.
The text is organized as follows. The Moiré Model section describes the Moiré line model and discusses normalization techniques and ranges of superposition angles for robust measurements. The Line Registration Quality Measurements and Metrics section describes the measurement procedure and computation of the macro and micro line registration metrics. The Results and Analysis section presents measurement results from three different printers and analyzes measurement variability and quality assessment. Finally, the Conclusion section summarizes results and presents conclusions.
MOIRÉ MODEL
Figure 1 illustrates the Moiré fringe pattern produced by the superposition of a (printed) linear grid of spacing $P_0$ and a lenticular grating of spacing $P_1$ at an angle $\alpha$. The Moiré lines are produced by the lenticular lenses intersecting with the individual lines of the Ronchi ruling. Only two lenses are illustrated in this figure; however, a sheet consisting of many lenticular lenses produces extended patterns of Moiré lines.
Figure 2. Photograph of a printed horizontal Ronchi ruling test pattern with a lenticular lens grating superimposed at a small angle. The resulting Moiré patterns are the dark vertical curved lines.
The Moiré line spacing, as shown in Fig. 1, is related to the superposition parameters by6,7

$$P = \frac{P_0 P_1}{\sqrt{P_0^2 + P_1^2 - 2 P_0 P_1 \cos\alpha}}. \tag{1}$$

The angle $\phi$ of the Moiré lines with the base of the lenticular sheet is given by6,7

$$\tan\phi = \frac{P_1 \sin\alpha}{P_0 - P_1 \cos\alpha}. \tag{2}$$
An actual Moiré pattern from such a sheet is shown in Figure 2. The Moiré lines deviate from straight lines due to printer imperfections. The Ronchi ruling was printed with a 0.4233 mm spacing, and the lenticular lens sheet consisted of lenses with a spacing of 0.630 mm (40 lenses per inch); the lenses had a magnification factor of 1.505 (making the effective Ronchi line spacing equal to 0.637 mm). Fluctuations in the printed line spacing $P_0$ result from line registration errors and create deviations in the Moiré line angle $\phi$, according to Eq. (2).
The sensitivity of the resulting Moiré line direction angle $\phi$ to the superposition angle $\alpha$ is shown in Figure 3, which plots Eqs. (1) and (2) as functions of $\alpha$. For the 10° interval shown, the Moiré line spacing decreases from around 80 mm to 2.5 mm. Both $P$ and $\phi$ exhibit relatively little change for $\alpha$ greater than 4°. In practical implementations the superposition angle cannot be precisely controlled, so ensuring that these changes do not significantly affect the metric is critical to the usefulness of this method. Selecting an $\alpha$ around 4° therefore reduces the impact of small changes in the superposition angle and results in multiple low-frequency Moiré lines over the test pattern for robust analysis.
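The behavior of Eqs. (1) and (2) is easy to check numerically. The following Python sketch is an illustration, not part of the paper's toolchain; it evaluates the Moiré spacing and direction using the spacings quoted in the text ($P_0 = 0.637$ mm effective, $P_1 = 0.630$ mm):

```python
import math

def moire_spacing(p0, p1, alpha):
    """Moiré fringe spacing P, Eq. (1); alpha in radians."""
    return (p0 * p1) / math.sqrt(p0**2 + p1**2 - 2.0 * p0 * p1 * math.cos(alpha))

def moire_angle(p0, p1, alpha):
    """Moiré line direction phi, Eq. (2); returns radians."""
    return math.atan2(p1 * math.sin(alpha), p0 - p1 * math.cos(alpha))

# Spacings from the text: effective Ronchi 0.637 mm, lenticular 0.630 mm.
P0, P1 = 0.637, 0.630
spacings = {d: moire_spacing(P0, P1, math.radians(d)) for d in (1, 2, 4, 8)}
angles = {d: math.degrees(moire_angle(P0, P1, math.radians(d))) for d in (1, 2, 4, 8)}
```

Consistent with Fig. 3, the computed spacing falls rapidly with $\alpha$, from tens of millimetres near 1° to a few millimetres at 8°, while both $P$ and $\phi$ flatten out as $\alpha$ grows past about 4°.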
Figure 3. Plots of Moiré fringe spacing $P$ and direction $\phi$ as a function of the superposition angle $\alpha$ between a Ronchi ruling and a lenticular grating. Circles indicate manual measurements; solid lines are plots of Eqs. (1) and (2).

Also included in the plots of Fig. 3 are actual measurements of the Moiré line spacing and angle for five values of $\alpha$. The measurements were made by manually setting the superposition angle $\alpha$ and using a ruler to visually measure the resulting Moiré spacing and a protractor to measure the Moiré direction $\phi$. The resulting measurements agreed well with Eqs. (1) and (2), as can be seen from the measurement markers on the graphs of Fig. 3. For the measurement system proposed in this work the lenticular grating is of high precision, while the actual superposition angle may be variable depending on the mechanics used to load the test sheet. The following equations relate the underlying line registration to the Moiré pattern parameters used in the measurement. From this derivation, a normalization step is presented that reduces the sensitivity of metrics computed from the Moiré pattern to variations in the parameters of the measurement system.

A relationship between changes in the underlying line spacing and changes in the angle of the Moiré pattern can be seen by taking the reciprocal of Eq. (2) and adding deviation terms to the test pattern line spacing and Moiré pattern angle to obtain

$$\cot(\phi + \Delta\phi(x)) = \frac{P_0 + \Delta(x) - P_1\cos\alpha}{P_1\sin\alpha} = \frac{P_0 + \Delta(x)}{P_1\sin\alpha} - \cot\alpha, \tag{3}$$

where $\Delta(x)$ is the additive displacement term for the underlying line spacing $P_0$, and $\Delta\phi(x)$ is the deviation from the Moiré angle $\phi$. The angular deviations vary over the printed pattern based on changes in the underlying test pattern. Without loss of generality, this deviation is denoted as a function of a single variable $x$. To separate the deviation terms, a Taylor series can be applied to the cotangent function and expanded about $\phi$. After higher-order terms are dropped (assuming small deviations), Eq. (3) results in

$$\Delta\phi(x) \approx -\frac{\Delta(x)\sin^2\phi}{P_1\sin\alpha} + \left(\cot\phi + \cot\alpha - \frac{P_0}{P_1\sin\alpha}\right)\sin^2\phi, \tag{4}$$

where the terms in the parentheses are constant over $x$ and relate to a constant offset on the Moiré angle $\phi$; they are subtracted out in the estimation procedure. The effective gain term that scales the line position deviations to Moiré pattern angle deviations is given by

$$g_m = -\frac{\sin^2\phi}{P_1\sin\alpha}, \tag{5}$$

where the gain/sensitivity is determined by the lenticular grid spacing $P_1$ and superposition angle $\alpha$. An alternate derivation of $g_m$ can be obtained directly through the ratio of the root-mean-square (rms) deviations of $\phi$ and $P_0$. This would eliminate the offset (zero-order) term of Eq. (4) and allow the gain factor $g_m$ to be computed directly from the derivatives of Eq. (3) with respect to $\phi$ and $P_0$. The gain factor $g_m$ in this case is simply the ratio of the rms deviation of $\phi$ to that of $P_0$.

Since the deviations $\Delta\phi(x)$ will be extracted from the Moiré patterns and used for characterization, the sensitivity to $\alpha$ becomes an issue for consistent measurements (small changes in $\alpha$, for $\alpha$ near zero, can result in large changes in the gain). This variability can be significantly reduced by dividing the measured angle deviation by the measured distance between the Moiré lines, provided the effective Ronchi pattern line spacing is close to that of the lenticular grid. With $P_1$ equal to $P_0$, Eq. (1) can be simplified using the half-angle formula to show that the Moiré line spacing is related to $\alpha$ by

$$P = \frac{P_1}{2\sin(\alpha/2)}. \tag{6}$$

For small $\alpha$ (as is the case here), $\sin\alpha$ approximately equals $\alpha$ (in radians). Thus, applying this approximation to Eqs. (5) and (6), the normalized gain becomes

$$\bar{g}_m = \frac{g_m}{P} \approx -\frac{\sin^2\phi}{P_1^2}. \tag{7}$$

This equation shows that the repeatability of the measurement is enhanced through the normalization: the dominant scale factor controlling the gain on the angular displacement now depends primarily on the lenticular grid spacing, which can be precisely controlled and does not change with the superposition angle. The next section describes the extraction of $\Delta\phi(x)$ and $P$ from the scanned Moiré patterns and the development of the metrics based on the normalization described by Eq. (7).
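The claimed robustness of the normalized gain can be checked directly from Eqs. (5)–(7). The sketch below is illustrative only; it uses the fact that with $P_0 = P_1$, Eq. (2) gives $\tan\phi = \cot(\alpha/2)$ and hence $\phi = 90^\circ - \alpha/2$. It compares the raw and normalized gains over the 2°–5° range discussed in the abstract:

```python
import math

P1 = 0.630  # lenticular pitch in mm, from the text

def gains(alpha):
    """Raw gain g_m (Eq. (5)) and normalized gain g_m / P (Eq. (7))."""
    phi = math.pi / 2 - alpha / 2          # Moiré direction when P0 = P1
    g_m = -math.sin(phi) ** 2 / (P1 * math.sin(alpha))
    P = P1 / (2.0 * math.sin(alpha / 2))   # Moiré spacing, Eq. (6)
    return g_m, g_m / P

raw = [gains(math.radians(d))[0] for d in (2, 3, 4, 5)]
norm = [gains(math.radians(d))[1] for d in (2, 3, 4, 5)]

def spread(v):
    """Relative spread (max - min) / |mean| of a list of values."""
    return (max(v) - min(v)) / abs(sum(v) / len(v))
```

The raw gain varies by more than a factor of two across this angular range, while the normalized gain is essentially constant, which is precisely the motivation for the normalization of Eq. (7).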
Figure 4. Line misregistration in 8.5 mm long vertical slices of the top twenty lines of the actual printed test patterns of printers LP1 (top) and LP2 (bottom). Rulers show ideal line locations.
LINE REGISTRATION QUALITY MEASUREMENT AND METRICS
An example of a Moiré pattern used to extract parameters for the quality metrics is shown in Fig. 2. The pattern was created with a five inch square printed test ruling and a lenticular grating at a superposition angle indicated by the line at the top of the figure. The Moiré patterns result in wavy lines along an angled path. Their deviation from a straight path indicates a faulty printed test pattern due to the $\Delta(x)$ perturbations of Eq. (3). For example, the lines in the top half of the figure deviate to the left of the expected straight path and then back again. This corresponds to an increase, and then a decrease, of the angle deviation $\Delta\phi(x)$, which corresponds to changes in $\Delta(x)$ according to Eq. (4). This section describes how these changes can be extracted, characterized, and used to form line quality metrics. The underlying line imperfections for two laser printers are illustrated in Figure 4. This figure compares two portions of the printed Ronchi ruling test pattern. The slices are 8.5 mm long and contain the top 20 printed lines. The figure contains regular tick marks to indicate line numbers and their expected locations. Observe at the junction of the line sets that the line spacing is not consistent between the two printers. Line 1 is aligned for both prints; however, lines 6 through 19 do not align, and lines from printer LP1 (top line set) deviate from the ideal locations from line 6 onward. The printed lines from LP2 (bottom line set) also deviate from the ideal locations, but the deviations are less pronounced and only become large from line 14 onward, indicating that the line registration quality of printer LP2 is higher than that of printer LP1. The metrics described in this section will correctly assess this difference from values extracted over the whole printed line pattern.
The printed test pattern used for the results presented in this paper is a 5 × 5 in. square Ronchi ruling. A lenticular lens sheet is superimposed at an angle of around 4°, and the resulting pattern is scanned at 600 dpi on an HP ScanJet C7710A flatbed scanner. The scan results in a 3000 by 3000 pixel image, which yields 10 pixels per black-white line pair (corresponding to a density of 60 line pairs per inch, or a line spacing of 0.4233 mm). The targets are scanned in a monochrome setting and cropped to 2048 by 2048 pixels to limit scan edge effects. The lenticular lens sheet is a Pacur LENSTAR large format polyester sheet with 40 lenticules per inch. The sheet is 0.033 in. thick, which also corresponds to the lenticular focal length. The lenticules have a radius of 0.0146 in. and a width of 0.0251 in. For the analysis, the scan is low-pass filtered to emphasize the lower frequency Moiré patterns of interest, using a rotationally symmetric two-dimensional Gaussian correlation kernel of size 8 and standard deviation parameter of 8. Luminance variability from the scanner, which often affects banding metrics, for example, is mitigated by this approach because only the shape of the Moiré lines is used in the metric, not their intensity. The angle between the test pattern and lenticular grating was set by eye (near 4°) to produce Moiré patterns of good visibility and measurability after scanning.
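As a sketch of this preprocessing step, the following Python code applies a separable Gaussian low-pass filter. The kernel size 8 and standard deviation 8 come from the text; the synthetic "scan" is made up for illustration, combining a fine 10-pixel-period ruling with a slow Moiré-like modulation:

```python
import numpy as np

def gaussian_kernel(size=8, sigma=8.0):
    """Normalized 1D Gaussian; two passes of this separable kernel are
    equivalent to a rotationally symmetric 2D Gaussian kernel."""
    x = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-(x**2) / (2.0 * sigma**2))
    return k / k.sum()

def lowpass(image, size=8, sigma=8.0):
    """Smooth along rows, then along columns, with the 1D kernel."""
    k = gaussian_kernel(size, sigma)
    rows = np.apply_along_axis(np.convolve, 1, image, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="same")

# Synthetic scan fragment: fine ruling (10 px per line pair, as in the
# 600 dpi scan) plus a slow Moiré-like modulation (128 px period).
x = np.arange(256)
scan = np.sin(2 * np.pi * x[:, None] / 10) + 0.5 * np.sin(2 * np.pi * x[None, :] / 128)
smooth = lowpass(scan)
```

After filtering, the high-frequency ruling is strongly attenuated while the slow modulation survives, which is what makes the subsequent Moiré line tracing practical at low scan quality.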
The analysis program extracts the contiguous pixel locations of the local minima (or constant gray level) forming a pattern vertically oriented over the page. The groups of pixel locations associated with the Moiré patterns are characterized by a best-fit (least squares) line to the pixel minima to obtain an estimate of the Moiré line corresponding to a perfect line pattern. The groups of pixels near the line corresponding to the actual patterns are identified and smoothed using a higher-order polynomial (order 32). Since multiple lines exist over the page, a search for local minima is performed with a best-fit line to identify each Moiré pattern. To describe this process, denote the scanned Moiré pattern image as $I(x_n, y_m)$, where $x_n$ and $y_m$ respectively represent the discrete row and column positions of the image matrix. As illustrated in Fig. 2, the origin of this coordinate system is located at the top left pixel. The algorithm searches for Moiré lines by assuming the form

$$R(x_n; m, b) = m x_n + b, \tag{8}$$

where $R(x_n)$ is the $y$ coordinate of the Moiré line, and $m$ and $b$ are the slope and $y$ intercept, respectively. The line parameters are found through an exhaustive search over a range of $b$ and $m$ values in order to minimize the cost function

$$C(m, b) = \sum_{n=1}^{N} I(x_n, R(x_n; m, b)), \tag{9}$$

where $N$ is the total number of pixel rows in the scanned image. The parameters $b$ and $m$ associated with the best-fit line are determined by

$$(m_0, b_0) = \arg\min_{m,b} C(m, b). \tag{10}$$
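A direct implementation of the exhaustive search in Eqs. (8)–(10) is straightforward. The sketch below is illustrative; the synthetic image, slope grid, and intercept grid are made up. It recovers a single dark line by minimizing the summed image intensity along candidate lines:

```python
import numpy as np

def fit_moire_line(image, slopes, intercepts):
    """Exhaustive search over (m, b) for the line R(x; m, b) = m*x + b
    minimizing the summed image intensity along the line, Eqs. (8)-(10)."""
    xs = np.arange(image.shape[0])
    best_m, best_b, best_cost = None, None, np.inf
    for m in slopes:
        for b in intercepts:
            ys = np.clip(np.round(m * xs + b).astype(int), 0, image.shape[1] - 1)
            cost = image[xs, ys].sum()  # cost function of Eq. (9)
            if cost < best_cost:
                best_m, best_b, best_cost = m, b, cost
    return best_m, best_b

# Synthetic scan: bright field with one dark line (slope 0.1, intercept
# 40), standing in for a single Moiré line.
img = np.ones((200, 200))
xs = np.arange(200)
img[xs, np.clip(np.round(0.1 * xs + 40).astype(int), 0, 199)] = 0.0

m0, b0 = fit_moire_line(img, np.linspace(0.0, 0.2, 21), np.arange(0, 100))
```

Because a dark pixel contributes less to the sum, the line passing through the dark pixels attains the minimum cost, and the search recovers the slope and intercept that generated the synthetic line.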
Once the best fit is found over the image, the slope m 0 is<br />
fixed and b is incremented over the column pixels <strong>of</strong> Ix,y<br />
and local minima detected to find the other patterns. Since<br />
the images are relatively simple, the minima appear with<br />
good distinction, and a threshold can be set to ignore insignificant<br />
minimum peaks and collect the set <strong>of</strong> b values corresponding<br />
to local minima denoted by<br />
B = b 1 ,b 2 ,b 3 ,b 4 ,b 5 , ...,b Y ,<br />
11<br />
such that B is a vector, of length Y, containing the intercept values of the lines that are the best linear fits to the actual Moiré lines. The average distance between the minima is taken as an estimate of the Moiré line spacing, given by

P̂ = (1/(Y − 1)) Σ_{i=1}^{Y−1} (b_{i+1} − b_i).   (12)
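The intercept scan and spacing estimate of Eqs. (11) and (12) can be sketched as follows; this is an illustrative reading of the procedure rather than the authors' code, and the threshold value is an assumption:

```python
import numpy as np

def moire_intercepts_and_spacing(img, m0, threshold):
    """Fix the slope m0, sweep the intercept b over the columns, and keep
    significant local minima of the cost curve (Eq. 11); estimate the
    Moiré line spacing P_hat as the mean gap between minima (Eq. 12)."""
    n_rows, n_cols = img.shape
    x = np.arange(n_rows)
    costs = np.full(n_cols, np.inf)
    for b in range(n_cols):
        y = np.rint(m0 * x + b).astype(int)
        valid = (y >= 0) & (y < n_cols)
        if valid.any():
            costs[b] = img[x[valid], y[valid]].sum()
    # keep only pronounced (below-threshold) local minima of the cost curve
    B = np.array([b for b in range(1, n_cols - 1)
                  if costs[b] < costs[b - 1]
                  and costs[b] < costs[b + 1]
                  and costs[b] < threshold])
    p_hat = np.diff(B).mean() if len(B) > 1 else None   # Eq. (12)
    return B, p_hat
```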
The actual curved Moiré pattern can be found by locating the local minimum for each x coordinate in the neighborhood of each fitted line. For some lenticular grids, however, the locally dark image points appear at the lens intersections, creating a regular discontinuity over the pattern. To improve the detection of the Moiré pattern pixels, a midluminance gray level was used. Therefore, the y coordinates of the actual Moiré patterns were determined by the pixels closest to the Moiré pattern gray level I_m in the neighborhood of the fitted line. The collection of points for the ith Moiré pattern is denoted as

S_i(x) = arg min_y |I(x, y) − I_m|,   R(x; m_0, b_i) − P̂/2 ≤ y ≤ R(x; m_0, b_i) + P̂/2,   (13)
where I_m is the mean luminance of the Moiré patterns. Figure 5 illustrates the results of this extraction process for two sample laser printer outputs. A 32nd-order polynomial was fitted to the locus of points from Eq. (13) in order to smooth the Moiré patterns and overlay them, along with the best-fit lines, on the actual scanned image for visual inspection. With the approach described above, smoothing is important because the periodic dark bands at the lens intersections cause regular glitches in the points. While other methods, such as the median filter, can be used for smoothing, this work uses the 32nd-order polynomial fitted to the points identified by Eq. (13). The results observed in Fig. 5 demonstrate that the extraction procedure is indeed capturing the basic elements of the Moiré patterns.

Figure 5. Moiré analysis comparison between two laser printers. Straight lines indicate the ideal Moiré patterns, and curved lines are best-fit polynomials to actual Moiré patterns due to printer errors. (a) Laser printer LP1. (b) Laser printer LP2.
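The point extraction of Eq. (13) and the 32nd-order polynomial smoothing can be sketched as below; this is illustrative code, and the handling of the search window at the image borders is an assumption not specified in the paper:

```python
import numpy as np

def extract_pattern_points(img, m0, b_i, p_hat, i_m):
    """Trace one curved Moiré pattern (Eq. 13): for each row x, take the
    column whose gray level is closest to the target level i_m within
    +/- p_hat/2 of the fitted line R(x; m0, b_i)."""
    n_rows, n_cols = img.shape
    points = np.empty(n_rows, dtype=int)
    half = p_hat / 2.0
    for x in range(n_rows):
        center = m0 * x + b_i
        lo = max(0, int(np.floor(center - half)))
        hi = min(n_cols - 1, int(np.ceil(center + half)))
        window = img[x, lo:hi + 1]
        points[x] = lo + int(np.argmin(np.abs(window - i_m)))
    return points

def smooth_moire_points(x, y, degree=32):
    """Smooth the traced points with a 32nd-order polynomial, bridging the
    glitches at lens intersections. Polynomial.fit maps x into [-1, 1]
    internally, which keeps the high-order fit numerically stable."""
    poly = np.polynomial.Polynomial.fit(x, y, deg=degree)
    return poly(x)
```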
The deviation from all lines is characterized by a mean deviate at each row, given by

L(x_n) = (1/Y) Σ_{i=1}^{Y} [S̄_i(x_n) − R(x_n; m_0, b_i)],   (14)

where S̄_i is the resulting polynomial fit to the points of S_i in Eq. (13). The derivative of L(x) is equal to the tangent of the angle θ(x) and, for small values of θ(x), can be estimated with a numerical gradient as follows:

θ(x_n) ≈ tan θ(x_n) ≈ (1/2)[L(x_{n+1}) − L(x_{n−1})].   (15)

The estimate of the Moiré angle can be used to compute the macro quality metric referred to as the normalized average variance (NAV), given by

σ_L² = (1/(P̂²(N − 1))) Σ_{n=1}^{N} θ(x_n)².   (16)

Note this metric reflects the average line registration error over the whole test pattern. A micro quality line metric can be taken over local portions of the test pattern and involve the row corresponding to the worst deviation. This metric is called the normalized maximum deviation (NMD), and is given by

θ̄_L = (1/P̂) max_n |θ(x_n)|.   (17)

Table I. Line registration quality metrics for LP1 as a function of varying θ in degrees.

θ (deg)    P̂       NAV            NMD
2          6.14    2.11 × 10⁻⁹    13.1 × 10⁻⁵
3          4.26    2.08 × 10⁻⁹    12.7 × 10⁻⁵
4          3.12    2.29 × 10⁻⁹    12.9 × 10⁻⁵
5          2.57    2.28 × 10⁻⁹    13.7 × 10⁻⁵

Table II. Statistical measures of Moiré pattern deviation quality metrics for two laser and one inkjet printers.

Printer    θ (deg)    P̂       NAV            NMD
LP1        3          4.26    2.08 × 10⁻⁹    12.7 × 10⁻⁵
LP2        3.8        3.73    1.79 × 10⁻⁹    11.7 × 10⁻⁵
IP         3.7        3.66    2.37 × 10⁻⁹    11.4 × 10⁻⁵
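Given the deviation profile L(x_n) and the spacing estimate P̂, the NAV and NMD metrics of Eqs. (15)–(17) reduce to a few lines. The following is a sketch; np.gradient applies the same central difference as Eq. (15) at interior points:

```python
import numpy as np

def nav_nmd(L, p_hat):
    """Normalized average variance (Eq. 16) and normalized maximum
    deviation (Eq. 17) from the mean row deviation profile L(x_n)."""
    theta = np.gradient(L)   # small-angle estimate of theta(x_n), Eq. (15)
    nav = (theta ** 2).sum() / (p_hat ** 2 * (len(L) - 1))
    nmd = np.abs(theta).max() / p_hat
    return nav, nmd
```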
RESULTS AND ANALYSIS
To demonstrate the robustness of the metrics to superposition angle variation, a pattern from laser printer LP1 was scanned for four different θ values, and the resulting NAV and NMD quality metrics are presented in Table I. It can be seen that the spacing decreases with increasing angle, while the NAV and NMD measures stay relatively constant. Quantitatively, the coefficients of variance (CV) for the metrics over the θ variations are 4.9% and 3.3% for the NAV and NMD, respectively. The CV is the ratio of the standard deviation to the mean of a data set, and it provides a quantity related to the measurement resolution, which is affected by factors such as printer and scanner settings, as well as properties of the lenticular lenses used, such as lens spacing and precision.
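The CV figures can be reproduced, approximately, since the table entries are rounded, from the NAV column of Table I:

```python
import numpy as np

# NAV values over the four superposition angles in Table I (rounded, units of 1e-9)
nav = np.array([2.11, 2.08, 2.29, 2.28]) * 1e-9
cv = nav.std(ddof=1) / nav.mean()   # coefficient of variation: std / mean
# cv comes out near 0.05, consistent with the ~4.9% reported for the NAV metric
```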
As an example of how the quality metrics respond to different printers, the NAV and NMD metrics were computed using the outputs of two laser printers (LP1 and LP2, used in Fig. 5) and an inkjet printer denoted as IP. Table II shows a numerical comparison between these printer outputs, as well as the measurement parameters. The CV values computed from Table I can be used to examine the relative comparison of line registration quality between printers. For example, the difference between the NAV values as a percentage of their mean is 15% for printers LP1 and LP2 in Table II. This value is greater than the 4.9% variation expected from the measurement variability and thus indicates that the large scale (macro) line registration quality of LP2 is better than that of LP1 (consistent with observations in Figs. 4 and 5). In addition, the NMD measurements differ by 8.3%, which is greater than the 3.3% CV for the NMD measure. Comparing the inkjet printer IP with LP1, it is evident from the NAV values that IP has poorer quality (consistent with examination of scaled-up views of the line quality); however, the NMD values differ by 10.8% of their mean, which is greater than the 3.3% CV value. This indicates that even though LP1 has better line registration on a macro scale (on average across the page), it has greater isolated deviations than IP.
These results suggest that the above measures can serve as quality metrics for printed line registration. The NAV measure reflects the constancy of the average printed line spacing P_0 over the length of the test page. A quasiperiodic pattern in L(x_n) reflects banding-like intensity variations across the test page, as observed in Fig. 4 for LP1. These shape variations reflect periodic fluctuations in the printed line spacing, which are likely due to the same problems causing banding, such as imperfect rollers in the print process direction or gear noise. Moreover, L(x_n) can isolate process motion-related banding causes from other ones that affect reflectance, such as toner or ink deposition inconsistencies.
CONCLUSION
This work outlines the use of Moiré analysis for the quantification of line registration. The line registration metrics developed here are based on modeling the interference between a lenticular lens sheet and a hardcopy test target containing a Ronchi ruling or linear grating, and they provide examples of how the resulting Moiré patterns can be used to measure line registration quality. There are clearly other metrics that can be derived from the extracted Moiré patterns that can emphasize other issues depending on the application. For instance, if the Moiré line deviations are quasiperiodic, it is likely that these deviations indicate the root cause of banding. Therefore, metrics based on the periodicity of these deviations over macro regions can be used for banding characterization. The work derived general equations to help in the design of metrics that have good repeatability.
The experimental setup presented in this work suggests methods for volume processing of hardcopy samples. This would require a scanner with an automatic document feeder. A lenticular lens sheet could then be embedded into the scanner glass or fixed on top of the glass so that the test sheet could slide over it. The resulting patterns would then be scanned, and software applied to compute the performance analyses described in this paper. The monochrome, low-resolution scans are relatively easy to produce and analyze on a computer. A potential problem resulting from an automatic document feeder is maintaining a constant superposition angle between the test page and the lens sheet; however, it has been shown that the proposed metrics are robust to small changes in the angle. A more significant problem would arise from variations in the distance between the test pattern and the lenticular sheet, such as might result from trapped air or irregular pressure on the test pattern. In this case the Moiré lines will be artificially skewed, causing variations that arise from the distance rather than from line misregistration. It would be important in such a system to ensure that the automatic feed (or any other mechanism) minimizes this variation for accurate metrics.
Journal of Imaging Science and Technology® 51(4): 317–327, 2007.
© Society for Imaging Science and Technology 2007

Analysis of the Influence of Vertical Disparities Arising in Toed-in Stereoscopic Cameras

Robert S. Allison
Department of Computer Science and Centre for Vision Research, York University, 4700 Keele St., Toronto, Ontario M3J 1P3, Canada
E-mail: allison@cs.yorku.ca
Abstract. A basic task in the construction and use of a stereoscopic camera and display system is the alignment of the left and right images appropriately—a task generally referred to as camera convergence. Convergence of the real or virtual stereoscopic cameras can shift the range of portrayed depth to improve visual comfort, can adjust the disparity of targets to bring them nearer to the screen and reduce accommodation-vergence conflict, or can bring objects of interest into the binocular field of view. Although camera convergence is acknowledged as a useful function, there has been considerable debate over the transformation required. It is well known that rotational camera convergence or “toe-in” distorts the images in the two cameras, producing patterns of horizontal and vertical disparities that can cause problems with fusion of the stereoscopic imagery. Behaviorally, similar retinal vertical disparity patterns are known to correlate with viewing distance and strongly affect perception of stereoscopic shape and depth. There has been little analysis of the implications of recent findings on vertical disparity processing for the design of stereoscopic camera and display systems. I ask how such distortions caused by camera convergence affect the ability to fuse and perceive stereoscopic images. © 2007 Society for Imaging Science and Technology.
DOI: 10.2352/J.ImagingSci.Technol.(2007)51:4(317)
INTRODUCTION
In many stereoscopic viewing situations it is necessary to adjust the screen disparity of the displayed images for viewer comfort, to optimize depth perception, or to otherwise enhance the stereoscopic experience. Convergence of the real or virtual cameras is an effective means of adjusting portrayed disparities. A long-standing question in the stereoscopic imaging and display literature is: what is the best method to converge the cameras? Humans use rotational movements to binocularly align the visual axes of their eyes on targets of interest. Similarly, one of the easiest ways to converge the cameras is to pan them in opposite directions to “toe-in” the cameras. However, convergence through camera toe-in has side effects that can lead to undesirable distortions of stereoscopic depth. 1,2 In this paper we reanalyze these geometric distortions of stereoscopic space in the context of recent findings on the role of vertical disparities in stereoscopic space perception. We focus on a number of issues related to converged cameras and the mode of convergence: the effect of rectification; the relation between the geometry of the imaging device and the display device; fused and augmented displays; orthostereoscopy; the relation between parallax distortions in the display and the resulting retinal disparity; and the effect of these toe-in induced retinal disparities on depth perception and binocular fusion.

Received Dec. 5, 2006; accepted for publication Mar. 7, 2007.
1062-3701/2007/51(4)/317/11/$20.00.
Our interests lie in augmented-reality applications and stereoscopic heads for tele-operation applications. In these systems a focus is on the match and registration between the stereoscopic imagery and the “real world,” so we will concentrate on orthostereoscopic or near-orthostereoscopic configurations. These configurations have well known limitations for applications such as visualization and cinema, and other configurations may result in displays that are more pleasing and easier to fuse. However, it is important to note that our basic analysis generalizes to other configurations, and we will discuss other viewing arrangements when appropriate. 3,4 In a projector-based display system with separate right and left projectors, or in a binocular head mounted display (HMD) with independent left and right displays, the displays/projectors can also be converged mechanically or optically. In this paper we will also assume a single flat, fronto-parallel display (i.e., a monitor or projector display) so that the convergence of the projectors is not an issue. Since the left and right images are projected or displayed into the same plane, we will refer to these configurations as a “parallel display.” In most cases similar considerations will apply for an HMD with parallel left and right displays.
OPTIONS FOR CAMERA CONVERGENCE
We use the term convergence here to refer to a variety of means of realigning one stereoscopic half-image with respect to the other, including toe-in (or rotational) convergence and translational image shift.

Convergence can shift the range of portrayed depth to improve visual comfort and composition. Looking at objects presented stereoscopically farther or nearer than the screen causes a disruption of the normal synergy between vergence and accommodation in most displays. Normally accommodation and vergence covary but, in a stereoscopic display, the eyes should remain focused at the screen regardless of disparity. The accommodation-vergence conflict can cause visual stress and disrupt binocular vision. 5 Convergence of the cameras can be used to adjust the disparity of targets of interest to bring them nearer to the screen and reduce this conflict.
Allison: Analysis of the influence of vertical disparities arising in toed-in stereoscopic cameras
Table I. Typical convergence for stereoscopic sensors and displays. "Natural" modes of convergence are shown in bold.

DISPLAY/SENSOR   REAL OR VIRTUAL CAMERA CONVERGENCE
GEOMETRY         Translation                      Rotation
Flat             Horizontal Image Translation     Toed-in camera, toed-in
                 Differential translation of        projector combination
                   computer graphics images       Toed-in stereoscopic camera
                 Image sensor shift                 or robot head
                 Variable baseline camera
Spherical        Human viewing of planar          Haploscope
                   stereoscopic displays?         Human physiological vergence
Convergence can also be used to shift the range of portrayed depth. For example, it is often preferable to portray stereoscopic imagery in the space behind rather than in front of the display. With convergence a user can shift stereoscopic imagery to appear “inside” the display and reduce interposition errors between the stereoscopic imagery and the edges of the displays.

Cameras used in stereoscopic imagers have a limited field of view, and convergence can be used to bring objects of interest into the binocular field of view.

Finally, convergence, or more appropriately translation, of the stereoscopic cameras can also be used to adjust for differences in a user’s interpupillary distance. The latter transformation is not typically called convergence since the stereoscopic baseline is not maintained.
In choosing a method of convergence there are several issues one needs to consider. What type of 2D image transformation is most natural for the imaging geometry? Can a 3D movement of the imaging device accomplish this transformation? In a system consisting of separate acquisition and display subsystems, is convergence best achieved by changing the imaging configuration and/or by transforming the images (or projector configuration) prior to display? If an unnatural convergence technique must be used, what is the impact on stereoscopic depth perception?
Although camera convergence is acknowledged as a useful function, there has been considerable debate over the correct transformation required. Since the eyes (and the cameras in imaging applications) are separated laterally, convergence needs to be an opposite horizontal shift of the left and right eye images on the sensor surface or, equivalently, on the display. The most appropriate type of transformation to accomplish this 2D shift—rotation or translation—depends on the geometry of the imaging and display devices. We agree with the view that the transformation should reflect the geometry of the display and imaging devices in order to minimize distortion (see Table I). One could argue that a “pure” vergence movement should affect the disparity of all objects equally, resulting in a change in mean disparity over the entire image without any change in relative disparity between points.
For example, consider a spherical imaging device such as the human eye, where expressing disparity in terms of visual angle is a natural coding scheme. A rotational movement about the optical centre of the eye would scan an image over the retina without distorting the angular relationships within the image. Thus the natural convergence movement with such an imaging device is a differential rotation of the two eyes, as occurs in physiological convergence (although freedom to choose various spherical coordinate systems complicates the definition of disparity 6).
A flat sensor is the limiting form of a spherical sensor with an infinite radius of curvature, and thus the rotation of the sensor becomes a translation parallel to the sensor plane. For displays that rely on projection onto a single flat, fronto-parallel display surface (many stereoscopic displays, with the notable exception of some head-mounted displays and haploscopic systems), depth differences should be represented as linear horizontal disparities in the image plane. The natural convergence movement is a differential horizontal shift of the images in the plane of the display. Acquisition systems with parallel cameras are well matched to such display geometry since a translation on the display corresponds to a translation in the sensor plane. This model of parallel cameras is typically used for the virtual cameras in stereoscopic computer graphics 7 and the real cameras in many stereoscopic camera setups.
Thus horizontal image translation of the images on the display is the preferred minimal-distortion method to shift convergence in a stereoscopic rig with parallel cameras when presented on a parallel display. This analysis corresponds to current conventional wisdom. If the stereo baseline is to be maintained, then this vergence movement is a horizontal translation of the images obtained from the parallel cameras rather than a translation of the cameras themselves. For example, in computer-generated displays, the left and right half images can be shifted in opposite directions on the display surface to shift portrayed depth with respect to the screen. With real camera images, a problem with shifting the displayed images to accomplish convergence is that in doing so, part of each half-image is shifted off the display, resulting in a smaller stereoscopic image.
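The effect of horizontal image translation on screen disparities can be made concrete with a toy sketch; the sign convention and numbers here are illustrative assumptions, not the paper's: shifting the left half-image by +s pixels and the right by −s adds the constant 2s to every screen disparity, leaving relative disparities untouched.

```python
import numpy as np

def converge_by_hit(disparities_px, shift_px):
    """Convergence by horizontal image translation (HIT): with the left
    half-image shifted by +shift_px and the right by -shift_px, every
    screen disparity changes by the same constant 2*shift_px, so relative
    disparities between scene points are preserved."""
    return disparities_px + 2.0 * shift_px

d = np.array([-10.0, 0.0, 25.0])    # screen disparities of three scene points
d_new = converge_by_hit(d, -5.0)    # shift both half-images inward
# np.diff(d_new) equals np.diff(d): depth relations between points are unchanged
```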
An alternative is to shift the imaging device (e.g., CCD array) behind the camera lens, with opposite sign of shift in the two cameras forming the stereo rig. This avoids some of the problems associated with rotational convergence discussed below. Implementing a large, variable range of convergence with mechanical movements or selection of subarrays from a large CCD can be complicated. Furthermore, many lenses have significant radial distortion, and translating the center of the imaging device away from the optical axis increases the amount of radial distortion. Worse, for matched lenses the distortions introduced in each sensor image will be opposite if the sensors are shifted in opposite directions. This leads to increased disparity distortion. Toed-in cameras can center the image on the optical axis and reduce this particular problem.
Figure 1. A plan view of an array of points located in the X-Z plane at eye level. The solid dots show the true position of the points and also their reconstruction based on images from a parallel camera orthostereoscopic rig presented at a 0.7 m viewing distance. The open diamond-shaped markers show the reconstructed position of the points in the array when the cameras are converged using horizontal image translation (HIT). As predicted, the points that are truly at 1.1 m move in to appear near the screen distance of 0.7 m. Also depth and size should appear scaled appropriately for the nearer distance. But notice that depth ordering and planarity are maintained. Circles at a distance of zero denote the positions of the eyes.

If we converge nearer than infinity using horizontal image shift, then far objects should be brought toward the
plane of the screen. With convergence via horizontal image shift, a frontal plane at the camera convergence distance should appear flat and at the screen distance. However, depth for a given retinal disparity increases approximately with the square of distance. Thus if the cameras are converged at a distance other than the screen distance to bring a farther (or nearer) target toward the screen, then the depth in the scene should be distorted nonlinearly, but depth ordering and planarity are maintained (Figure 1). This apparent depth distortion is predicted for both the parallel and toed-in configurations. In the toed-in case it would be added to the curvature effects discussed below. Similar arguments can be made for size distortions in the image (or equivalently the apparent spacing of the dots in Fig. 1). See Woods 1 and Diner and Fender 2 for an extended discussion of these distortions.
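The square-of-distance behavior can be checked numerically with the standard screen-disparity geometry; the symbols and numbers below are illustrative assumptions, not the paper's notation. For eye separation I and screen distance V, a point at distance Z produces screen parallax d = I(Z − V)/Z, which inverts to Z = IV/(I − d); since dZ/dd = Z²/(IV), the depth produced by a fixed small disparity step grows roughly with Z².

```python
def screen_disparity(Z, I=0.065, V=0.7):
    """Screen parallax (m) of a point at distance Z m, for eye separation I
    and screen distance V (uncrossed disparity positive): d = I*(Z - V)/Z."""
    return I * (Z - V) / Z

def depth_from_disparity(d, I=0.065, V=0.7):
    """Distance reconstructed from screen parallax: Z = I*V / (I - d)."""
    return I * V / (I - d)

# The same small disparity increment yields about 4x the depth change at
# twice the distance, illustrating the approximately Z**2 scaling.
eps = 1e-6
dz_near = depth_from_disparity(screen_disparity(0.7) + eps) - 0.7
dz_far = depth_from_disparity(screen_disparity(1.4) + eps) - 1.4
```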
It is important to note that these effects are predicted from the geometry and do not always correspond to human perception. Percepts of stereoscopic space tend to deviate from the geometric predictions based on Keplerian projections and Euclidean geometry. 6 Vergence on its own is not a strong cue to distance, and other depth cues in the display besides horizontal disparity can affect the interpretation of stereoscopic displays. For example, it has been known for over 100 years that observers can use vertical disparities in the stereoscopic images to obtain more veridical estimates of stereoscopic form. 8 In recent years, a role for vertical disparities in human stereoscopic depth perception has been confirmed. 9,10
Translation of the images on the display or of the sensors behind the lenses maintains the stereoscopic camera baseline and hence the relative disparities in the acquired or simulated image. Shifting of the images can be used to shift this disparity range to be centered on the display to ease viewing comfort. However, in many applications this disparity range is excessive and other techniques may be more suitable. Laterally shifting the cameras away from or toward each other increases or decreases the range of disparities corresponding to a given scene. Control of the stereo rig baseline serves a complementary function to convergence by adjusting the “gain” of stereopsis instead of simply the mean disparity. This function is often very useful for mapping a depth range to a useful or comfortable disparity range in applications such as computer graphics, 4,11 photogrammetry, etc.
In augmented reality or other enhanced vision systems that fuse stereoscopic imagery with direct views of the world (or with displays from other stereoscopic image sources), orthostereoscopic configurations (or at least consistent views) are important. In these systems, proper convergence of the camera systems and calibration of image geometry are required so that objects in the display have appropriate disparity relative to their real world counterparts. A parallel camera orthostereoscopic configuration presents true disparities to the user if presented on a parallel display. Thus, geometrically at least, we should expect to see true depth. In practice this seldom occurs because of the influence of other depth cues (accommodation-vergence conflict, changes in effective interpupillary distance with eye movements, flatness cues corresponding to viewing a flat display, etc.).
In summary, an orthostereoscopic parallel-camera/<br />
parallel-display configuration can present accurate disparities<br />
to the user. 1,7 On parallel displays, convergence by horizontal<br />
shift of the images obtained from parallel cameras<br />
introduces no distortion of horizontal or vertical screen disparity<br />
(parallax). Essentially, convergence by this method<br />
brings the two half images into register without changing<br />
relative disparity. This can reduce vergence-accommodation<br />
conflict and improve the ability to fuse the imagery. Geometrically,<br />
one would predict effects on perceived depth—<br />
the apparent depth of imagery with respect to the screen and<br />
the depth scaling in the image are affected by the simulated<br />
vergence. 1,13 However, this amounts to a relief transformation<br />
implying that depth ordering and coplanarity should be<br />
maintained. 2,10<br />
CAMERA TOE-IN<br />
While horizontal image translation is attractive theoretically,<br />
there are often practical considerations that limit use of the<br />
method and make rotational convergence attractive. For example,<br />
with a limited camera field of view and a nonzero<br />
stereo baseline there exists a region of space near to the<br />
J. Imaging Sci. Technol. 51(4)/Jul.-Aug. 2007 319<br />
Allison: Analysis of the influence of vertical disparities arising in toed-in stereoscopic cameras<br />
Figure 2. (a) The Toronto IRIS Stereoscopic Head 2 (TRISH II), an example of a robot head built for a wide<br />
range of working distances. With such a system, a wide range of camera convergence is required to bring<br />
objects of interest into view of the cameras. With off-the-shelf cameras this can be most conveniently achieved<br />
with camera toe-in. (b) A hypothetical stereo rig with a limited camera field of view. Objects in near working<br />
space are out of the binocular field of view, which is indicated by the cross-hatch pattern.<br />
cameras that cannot be seen by one or both cameras. In<br />
some applications such as landscape photography this region<br />
of space may be irrelevant; in other applications such as<br />
augmented reality or stereoscopic robot heads this may correspond<br />
to a crucial part of the normal working range (see<br />
Figure 2). Rotational convergence of the cameras can increase<br />
the near working space of the system and center the<br />
target in the camera images. 14 Other motivations for rotational<br />
convergence include the desire to center the target on<br />
the camera optics (e.g., to minimize camera distortion) and<br />
the relative simplicity and large range of motion possible<br />
with rotational mechanisms. Given that rotational convergence<br />
of stereo cameras is often implemented in practice, we<br />
ask: what effects do the distortions produced by these movements<br />
have on the perception of stereoscopic displays?<br />
It is well known that the toed-in configuration distorts<br />
the images in the two cameras, producing patterns of horizontal<br />
and vertical screen disparities (parallax). Geometrically,<br />
deviations from the parallel-camera configuration may<br />
result in spatial distortion unless compensating transformations<br />
are introduced mechanically, optically or electronically<br />
in the displayed images, 2,12 for example unless a pair of projectors<br />
(or an HMD with separate left and right displays) with<br />
matched convergence, or a parallel display with special distortion<br />
correction techniques, is used. 15,16 For the rest of<br />
this paper we will assume a single projector or display system<br />
(parallel display) and a dual sensor system with parallel<br />
or toed-in cameras.<br />
The effects of the horizontal disparities have been well<br />
described in the literature and we review them before turning<br />
to the vertical disparities in the next section. The depth<br />
distortions due to the horizontal disparities introduced can<br />
be estimated geometrically. 1 The geometry of the situation is<br />
illustrated in Figure 3. The imaging space world coordinate<br />
system is centered between the cameras, a is the intercamera<br />
distance, and the angle of convergence is θ (using the conventional<br />
stereoscopic camera measure of convergence rather<br />
than the physiological one).<br />
Let us assume the cameras converge symmetrically at<br />
point C located at distance F. A local coordinate system is<br />
attached to each camera and rotated ±θ about the y axis<br />
with respect to the imaging space world coordinate system.<br />
The coordinates of a point P = (X, Y, Z)^T in the left and right<br />
cameras are<br />
<br />
X_l = (X + a/2) cos θ − Z sin θ,<br />
Y_l = Y,<br />
Z_l = Z cos θ + (X + a/2) sin θ,<br />
<br />
X_r = (X − a/2) cos θ + Z sin θ,<br />
Y_r = Y,<br />
Z_r = Z cos θ − (X − a/2) sin θ.     (1)<br />
<br />
After perspective projection onto the converged CCD array<br />
(coordinate frame u-v centered on the optic axis and letting<br />
f = 1.0) we get the following image coordinates for the left,<br />
(u_l, v_l)^T, and right, (u_r, v_r)^T, arrays:<br />
<br />
u_l = X_l/Z_l = [(X + a/2) cos θ − Z sin θ] / [Z cos θ + (X + a/2) sin θ],<br />
v_l = Y_l/Z_l = Y / [Z cos θ + (X + a/2) sin θ],     (2)<br />
Figure 3. Imaging and display geometry for symmetric toe-in convergence at point C and viewing at distance<br />
D (plan view).<br />
Figure 4. Keystone distortion due to toe-in. (a) Left (+) and right images for a regularly spaced grid of<br />
points, with the stereo camera converged (toed-in) on the grid. (b) Corresponding disparity vectors comparing<br />
left-eye with right-eye views demonstrate both horizontal and vertical components of the keystone distortion.<br />
u_r = X_r/Z_r = [(X − a/2) cos θ + Z sin θ] / [Z cos θ − (X − a/2) sin θ],<br />
v_r = Y_r/Z_r = Y / [Z cos θ − (X − a/2) sin θ].     (3)<br />
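The toed-in projection lends itself to a direct numerical check. The following Python sketch implements Eqs. (1)-(3); the function name and test values are ours, and the convergence half-angle θ is taken as atan(a/2F) so that the optic axes intersect at C:<br />

```python
import math

def toe_in_project(P, a, theta):
    """Image coordinates (f = 1) of world point P = (X, Y, Z) in the left and
    right cameras of a rig with baseline a, each camera toed in by +/-theta
    about the y axis (Eqs. (1)-(3))."""
    X, Y, Z = P
    c, s = math.cos(theta), math.sin(theta)
    # Camera-frame coordinates, Eq. (1)
    Xl, Zl = (X + a / 2) * c - Z * s, Z * c + (X + a / 2) * s
    Xr, Zr = (X - a / 2) * c + Z * s, Z * c - (X - a / 2) * s
    # Perspective projection, Eqs. (2) and (3)
    return (Xl / Zl, Y / Zl), (Xr / Zr, Y / Zr)

# Converge symmetrically on C = (0, 0, F)
a, F = 0.0625, 0.70
theta = math.atan2(a / 2, F)
(ul, vl), (ur, vr) = toe_in_project((0.1, 0.1, F), a, theta)
# off-axis points acquire a vertical disparity (vl != vr): the keystone effect
```

The convergence point itself projects to the image centers (u = v = 0 in both cameras), while off-axis points pick up the vertical disparities analyzed below.<br />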
The CCD image is then reprojected onto the display screen.<br />
We assume a single display/projector model with central<br />
projection and a magnification of M with respect to the<br />
CCD sensor image, resulting in the following screen coordinates<br />
for the point in the left, (U_l, V_l)^T, and right, (U_r, V_r)^T,<br />
eye images:<br />
<br />
(U_l, V_l)^T = M (u_l, v_l)^T,    (U_r, V_r)^T = M (u_r, v_r)^T.<br />
Toeing-in the stereoscopic rig to converge on a surface centers<br />
the images of the target in the two cameras but also<br />
introduces a keystone distortion due to the differential perspective<br />
(Figure 4). In contrast, convergence by shifting the<br />
CCD sensor behind the camera lens (or shifting the half<br />
images on the display) changes the mean horizontal disparity<br />
but does not entail keystone distortion. For a given focal<br />
length and camera separation, the extent of the keystone<br />
distortion is a function of the convergence distance and not<br />
the distance of the target.<br />
To see how the keystoning affects depth perception, assume<br />
the images are projected onto a screen at distance D<br />
and viewed by a viewer with interocular distance of e. If the<br />
magnification from the CCD sensor array to screen image is<br />
Figure 5. Geometrically predicted perception (curved grid) of displayed<br />
images taken from a toed-in stereoscopic camera rig converged on a<br />
fronto-parallel grid made with 10 cm spacing (asterisks), based on horizontal<br />
disparities (associated size distortion not shown). Camera convergence<br />
distance F and display viewing distance D are 0.70 m;<br />
e=a=62.5 mm; f=6.5 mm; see Fig. 3 and text for definitions. The<br />
icon at the bottom of the figure indicates the position of the world coordinate<br />
frame and the eyeballs.<br />
M and both images are centered on the display, then the geometrically<br />
predicted coordinates of the point in display space<br />
are (after Ref. 1)<br />
<br />
P_d = (X_d, Y_d, Z_d)^T,<br />
X_d = e(U_l + U_r) / (2[e − (U_r − U_l)]),<br />
Y_d = e(V_l + V_r) / (2[e − (U_r − U_l)]),<br />
Z_d = eD / [e − (U_r − U_l)],     (4)<br />
<br />
where U_r − U_l is the horizontal screen parallax of the point.<br />
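As a sketch (our notation, not code from Ref. 1), Eq. (4) amounts to a small triangulation routine, with the eyes assumed at (±e/2, 0) and the screen at distance D:<br />

```python
def reconstruct(Ul, Vl, Ur, Vr, e, D):
    """Geometrically predicted display-space position (Eq. (4)) of a point
    shown at screen coordinates (Ul, Vl) to the left eye and (Ur, Vr) to the
    right eye, for eye separation e and screen (viewing) distance D."""
    p = Ur - Ul                         # horizontal screen parallax
    Xd = e * (Ul + Ur) / (2 * (e - p))
    Yd = e * (Vl + Vr) / (2 * (e - p))
    Zd = e * D / (e - p)
    return Xd, Yd, Zd

# zero parallax places the point in the screen plane; crossed parallax
# (Ur < Ul) places it in front of the screen
```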
If we ignore vertical disparities for the moment, converging<br />
the camera causes changes in the geometrically predicted<br />
depth. For instance, if the cameras toe-in to converge<br />
on a frontoparallel surface (parallel to the stereo baseline),<br />
then from geometric considerations the center of the object<br />
should appear at the screen distance but the surface should<br />
appear curved (Figure 5). This curvature should be especially<br />
apparent in the presence of undistorted stereoscopic<br />
reference imagery, as would occur in augmented reality<br />
applications. 16 In contrast, if convergence is accomplished<br />
via horizontal image translation, then a frontal plane at the<br />
camera convergence distance should appear flat and at the<br />
screen distance, although depth and size will be scaled as<br />
discussed in the previous section.<br />
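The predicted curvature can be reproduced by chaining the camera and display models. The sketch below is our helper, with values assumed from Figure 5 (e = a = 62.5 mm, F = D = 0.70 m, M = 1); it returns the predicted depth Z_d of a point on a frontal plane at the convergence distance:<br />

```python
import math

def predicted_depth(x, a, F):
    """Predicted depth Z_d of the point (x, 0, F) on a frontal plane at the
    convergence distance F, imaged by a rig toed in by theta = atan(a/2F)
    and viewed on a screen at D = F with M = 1 and e = a (Eqs. (1)-(4))."""
    theta = math.atan2(a / 2, F)
    c, s = math.cos(theta), math.sin(theta)
    # toed-in image coordinates of the point (Eqs. (1)-(3), f = 1)
    ul = ((x + a / 2) * c - F * s) / (F * c + (x + a / 2) * s)
    ur = ((x - a / 2) * c + F * s) / (F * c - (x - a / 2) * s)
    p = ur - ul                  # horizontal screen parallax (M = 1)
    return a * F / (a - p)       # Z_d from Eq. (4) with e = a, D = F

a, F = 0.0625, 0.70
# the center of the plane is predicted at the screen distance, but points
# toward the edges are predicted beyond it, i.e., the plane appears curved
center, edge = predicted_depth(0.0, a, F), predicted_depth(0.20, a, F)
```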
USE OF VERTICAL DISPARITY IN STEREOPSIS<br />
The pattern of vertical disparities in a stereoscopic image<br />
depends on the geometry of the stereoscopic rig. With our<br />
spherical retinas, disparity is best defined in terms of visual<br />
angle. An object that is located eccentric to the median plane<br />
of the head is closer to one eye than the other (Figure 6).<br />
Hence, it subtends a larger angle at the nearer eye than at the<br />
further. The vertical size ratio (VSR) between the images of<br />
an object in the two eyes varies as a function of the object’s<br />
eccentricity with respect to the head. Figure 6 also shows the<br />
variation of the vertical size ratio of the right eye image to<br />
the left eye image for a range of eccentricities and<br />
distances.<br />
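Under a small-angle approximation (vertical angular size inversely proportional to the distance of the element from each eye), the VSR of Figure 6 can be sketched as follows; the function and parameter values are our own illustration:<br />

```python
import math

def vsr(x, d, e=0.065):
    """Approximate vertical size ratio (right-eye to left-eye image size) of a
    small vertical element at lateral eccentricity x and distance d from the
    head, for interocular separation e; small-angle approximation."""
    r_left = math.hypot(x + e / 2, d)    # distance from the left eye
    r_right = math.hypot(x - e / 2, d)   # distance from the right eye
    return r_left / r_right

# the VSR gradient across eccentricity is steeper for nearer surfaces,
# as in Figure 6(b)
```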
It is evident that, for centrally located targets, the gradient<br />
of vertical size ratios varies with distance of the surface<br />
from the head. This is relatively independent of the vergence<br />
state of the eyes and the local depth structure. 17 Howard 18<br />
turned this relationship around and suggested that people<br />
could judge the distance of surfaces from the gradient of the<br />
VSR. Gillam and Lawergren 19 proposed a computational<br />
model for the recovery of surface distance and eccentricity<br />
based upon processing of VSR and VSR gradients. An alternative<br />
computational framework 10,20 uses vertical disparities<br />
to calculate the convergence posture and gaze eccentricity of<br />
the eyes rather than the distance and eccentricity of a target<br />
surface. For our purposes, these models make the same predictions<br />
about the effects of camera toe-in. However, the<br />
latter model uses projections onto flat projection surfaces<br />
(hypothetical flat retinae), which makes visualization easier<br />
and matches well with our previous discussion of camera<br />
toe-in.<br />
With flat imaging planes, disparities are usually measured<br />
in terms of linear displacement in the image plane. If<br />
the cameras in a stereoscopic rig are toed in (or if eyes with<br />
flat retinae are converged), then the left and right camera<br />
images have opposite keystone distortion. It is interesting to<br />
note that, in contrast to the angular disparity case, the gradients<br />
of vertical disparities are a function of camera convergence<br />
but are affected little by the distance of the surface.<br />
These vertical disparity gradients on flat cameras/retinae<br />
provide an indication of the convergence angle of the cameras<br />
and hence the distance of the fixation point.<br />
For a pair of objects or for depth within an object, the<br />
relationship between relative depth and relative disparity is a<br />
function of distance from the observer. To an extent, the<br />
visual system is able to maintain an accurate perception of<br />
depth of an object at various distances despite disparity<br />
varying inversely with the square of the distance between the<br />
object and the observer. This “depth constancy” demonstrates<br />
an ability to account for the effects of viewing distance<br />
on stereoscopic depth. The relationship between the<br />
retinal image size of an object and its linear size in the world<br />
is also a function of distance. To the degree that vertical<br />
disparity gradients are used as an indicator of the distance of<br />
a fixated surface for three-dimensional reconstruction, toe-in<br />
produced vertical disparity gradients would be expected to<br />
indirectly affect depth and size perception. Psychophysical<br />
experiments have demonstrated that vertical disparity gradients<br />
strongly affect perception of stereoscopic shape, size and<br />
depth 9,10,21 and implicate vertical disparity processing in human<br />
size and depth constancy.<br />
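The inverse-square relation can be checked numerically. In this sketch (our notation and test values), the angular disparity between two midline points separated by a fixed depth interval roughly quadruples when the viewing distance is halved:<br />

```python
import math

def relative_disparity(d, dd, e=0.065):
    """Angular disparity (radians) between two points on the midline at
    distances d and d + dd from an observer with interocular separation e."""
    return 2 * (math.atan(e / (2 * d)) - math.atan(e / (2 * (d + dd))))

# doubling the viewing distance shrinks the disparity of the same 5 cm
# depth interval by roughly a factor of four
near = relative_disparity(1.0, 0.05)
far = relative_disparity(2.0, 0.05)
```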
Figure 6. (a) A vertical line located eccentric to the midline of the head is nearer to one eye than the other.<br />
Thus it subtends a larger angle in the nearer eye than the further (adapted from Howard and Rogers 6 ). (b) The<br />
gradient of vertical size ratio of the image of a surface element in the left eye to that in the right eye varies as<br />
a function of distance of the surface (shown as a series of lines: distances of 70, 60, 50, 40, and 30 cm in<br />
order of steepness).<br />
VERTICAL DISPARITY IN TOED-IN STEREOSCOPIC<br />
CAMERAS<br />
First, consider a stereoscopic camera and parallel display system<br />
that intends to portray realistic depth and that has camera<br />
separation equal to the eye separation. If the camera is<br />
converged using the toe-in method at a fronto-parallel surface<br />
at the distance of the screen, then the center of the<br />
target will have zero horizontal screen disparity. However,<br />
the camera toe-in will introduce keystone distortion into the<br />
two images, with the pattern of horizontal disparities predicting<br />
curvature as discussed above. What about the pattern of<br />
vertical disparities? The pattern of vertical disparities produced<br />
by a toed-in camera configuration resembles the gradient<br />
of vertical size disparities on the retinae that can arise<br />
due to differential perspective of the two eyes. As discussed<br />
in the previous section, this differential perspective forms a<br />
natural and rich source of informative parameters contributing<br />
to human stereoscopic depth perception.<br />
Given that camera toe-in generates such gradients of<br />
vertical disparity in stereoscopic imagery, is it beneficial to<br />
use camera toe-in to provide distance information in a stereoscopic<br />
display? In other words, should the toed-in configuration<br />
be used to converge the cameras and preserve the<br />
sense of absolute distance and size, shape and depth constancy?<br />
Perez-Bayas 22 argued that toed-in camera configurations<br />
are more natural since they present these vertical disparities.<br />
The principal problem with this claim is that it<br />
considers the screen parallax of stereoscopic images rather<br />
than their retinal disparities. These keystone distortions are<br />
in addition to the natural retinal vertical disparities present<br />
when viewing a scene at the distance of the screen.<br />
In order to estimate the effect on depth perception we<br />
need to consider the retinal disparities generated by the stereoscopic<br />
image. The keystone distortion occurs in addition<br />
to the retinal vertical disparity pattern inherent in the image<br />
because it is portrayed on the flat screen. Consider a fronto-parallel<br />
surface located at the screen distance from the camera,<br />
which we intend to display at the screen. Projections onto<br />
spherical retinas are hard to visualize, so let us consider flat<br />
retinae converged (toed-in) at the screen distance. Alternatively,<br />
one could imagine another pair of converged cameras viewing<br />
the display, one centered at the center of each eye. The images<br />
on these converged flat retinae would of course have differential<br />
keystone distortion<br />
Figure 7. (a) Simulation of the keystone distortion and gradient of VSR<br />
present in a stereo half image for a toed-in configuration. The plus symbols<br />
show the keystone distortion in the displayed image of a grid for a<br />
camera converged at 70 cm, and the circle symbols indicate the exaggerated<br />
VSR distortion present in the retinal half image for an observer<br />
viewing the display at 70 cm (flat retina). (b) Predicted distorted appearance<br />
(circles) of a set of frontal plane surfaces (asterisks) if depth from<br />
disparity is scaled according to the distance indicated by an exaggerated<br />
VSR. Typically the surface is not mislocalized in depth but curvature is<br />
induced. The predicted curvature based on the equations provided<br />
by Duke and Wilcox 28 is also shown (diamonds). The simulated positions<br />
of the eyes are denoted by circles at zero distance and the screen by a<br />
line at 70 cm.<br />
when viewing a frontal surface such as the screen. When<br />
displaying images from the toed-in stereoscopic camera,<br />
which already have keystone distortion, the result is an exaggerated<br />
gradient of vertical disparity in the retinal images,<br />
appropriate for a much nearer surface. For a spherical retina<br />
the important measure is the gradient of vertical size ratios<br />
in the image. The vertical size ratios in the displayed images<br />
imposed by the keystone distortion are in addition to the<br />
natural VSR for a frontal surface at the distance of the<br />
screen. Clearly, the additional keystone distortion indicates a<br />
nearer surface in this case as well [Figure 7(a)].<br />
From either the flat camera or spherical retina model we<br />
predict spatial distortion if disparities are scaled according to<br />
the vertical disparities, which indicate a closer target. Such a<br />
misjudgement of perceived distance would be predicted to<br />
have effects on perceived depth and size [open circles in Fig.<br />
7(b)]. There is little evidence that observers actually<br />
mislocalize surfaces at a nearer distance when a vertical disparity<br />
gradient is imposed. However, there is strong evidence<br />
for effects of VSR gradients on depth constancy processes.<br />
If a viewer fixates a point on a fronto-parallel screen,<br />
then at all screen distances nearer than infinity the images of<br />
other points on the screen have horizontal disparity (retinal<br />
but not screen disparity). This is because the theoretical locus<br />
of points in three-dimensional space with zero retinal<br />
disparity, which is known as the horopter (the Vieth-Muller<br />
circle), curves inward toward the viewer and away from the<br />
frontal plane. The curvature of the horopter increases at<br />
nearer distances (Figure 8). 23 Thus a frontal plane presents a<br />
pattern of horizontal disparities that varies with distance. If<br />
depth constancy is to be maintained for fronto-parallel<br />
planes then the distance of the surface needs to be taken into<br />
account. Rogers and Bradshaw 21 showed that vertical disparity<br />
patterns can have a strong influence on frontal plane<br />
judgements, particularly for large field of view displays. Specifically,<br />
“flat”—or zero horizontal screen disparity—planes<br />
are perceived as curved if vertical disparity gradients indicate<br />
a distance other than the screen distance.<br />
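The growth of frontal-plane retinal disparity with nearness can be verified with a short calculation. This is our sketch, with the eyes assumed at (±e/2, 0) and fixation at the screen center:<br />

```python
import math

def frontal_plane_disparity(ecc_deg, D, e=0.065):
    """Retinal (angular) disparity of a point on a fronto-parallel screen at
    distance D and eccentricity ecc_deg while the viewer fixates the screen
    center; zero would mean the point lies on the Vieth-Muller circle."""
    x = D * math.tan(math.radians(ecc_deg))
    # direction to the point minus direction to fixation, at each eye
    at_left = math.atan((x + e / 2) / D) - math.atan(e / (2 * D))
    at_right = math.atan((x - e / 2) / D) + math.atan(e / (2 * D))
    return at_left - at_right

# the magnitude of the frontal-plane disparity at a fixed eccentricity
# grows as the screen is brought nearer (compare D = 0.3 m and D = 0.7 m)
```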
In our case, the toe-in induced vertical disparity introduces<br />
a cue that the surface is nearer than specified by the<br />
horizontal screen disparity. Thus a zero horizontal screen<br />
disparity pattern for a frontal surface at the true distance<br />
would be interpreted as lying at a nearer distance. The disparities<br />
would be less than expected from a frontal plane at the<br />
nearer distance. As a result, surfaces in a scene should appear<br />
curved more concavely than they are in the real scene. Notice<br />
that the distortion is in the opposite direction to the<br />
distortion created by horizontal disparities due to the<br />
keystoning.<br />
Thus the effect of vertical disparity introduced by the<br />
keystone distortion is complicated. The vertical disparity introduces<br />
a cue that the surface is nearer than specified by the<br />
horizontal screen disparity. Thus, from vertical disparities,<br />
we would expect a bias in depth perception and concave<br />
distortion of stereoscopic space. This may counter the convex<br />
distortions introduced by the horizontal disparities discussed<br />
above. So the surface may appear flatter than expected<br />
from the distorted horizontal disparities. But the<br />
percept is not more “natural” than the parallel configura-<br />
Figure 8. Disparity of a point on a fronto-parallel surface as a function of distance. Horizontal disparity for a<br />
given eccentricity increases with nearness due to the increasing curvature of the Vieth-Muller circle (see text).<br />
tion. Rather, two distortions due to camera toe-in act to<br />
cancel each other out.<br />
Do toed-in configurations provide useful distance<br />
information for objects at other distances or<br />
for nonorthostereoscopic configurations?<br />
Since the toe-in induced vertical disparity gradients are superimposed<br />
upon the natural vertical disparity at the retinae,<br />
they do not provide natural distance cues for targets near the<br />
display under orthostereoscopic configurations.<br />
Nonorthostereoscopic configurations are more common<br />
than orthostereoscopic and we should consider the effects of<br />
toe-in on these configurations. Magnification and minification<br />
of the images will scale the disparities in the images as<br />
well, so that the vertical gradient of vertical size ratio will be<br />
relatively unchanged under uniform magnification. Hence<br />
we expect a similar curvature distortion under magnification<br />
or minification.<br />
Hyperstereoscopic and hypostereoscopic configurations<br />
exaggerate and attenuate, respectively, the horizontal and<br />
vertical disparities due to camera toe-in, and the magnitude<br />
of the stereoscopic distortions will be scaled. However, for<br />
both configurations the sign of the distortion is the same,<br />
and vertical disparities from camera toe-in predict concave<br />
curvature of stereoscopic space, with increased distortion<br />
with an increased stereo baseline.<br />
For surfaces outside the plane of the screen, vertical<br />
keystone distortion from toe-in still introduces spatial distortion.<br />
A surface located at a distance beyond the screen in<br />
a parallel camera, orthostereoscopic configuration will have<br />
VSR gradients on spherical retinae appropriate to its distance<br />
due to the imaging geometry. For a toed-in camera<br />
system, all surfaces in the scene will have additional vertical<br />
disparity gradients due to the keystoning. These increased<br />
vertical disparity gradients would indicate a nearer convergence<br />
distance or a nearer surface; thus the distance of the far<br />
surface should be underestimated and concave curvature introduced.<br />
The distance underestimation would be compounded<br />
by rescaling of disparity for the near distance,<br />
which should compress the depth range in the scene.<br />
What about partial toe-in? For example, suppose we<br />
toed in on a target at 3 m and displayed it at 1.0 m with the<br />
centers of the images aligned. Would the vertical disparities<br />
in the image indicate a more distant surface, perhaps even<br />
one at 3 m (this would be the case if viewed in a haploscope)?<br />
A look at the pattern of vertical screen disparities in<br />
this case, however, shows that they are appropriate for a<br />
surface that is nearer than the 3 m surface, and in fact nearer<br />
than the screen if the half images are aligned on the screen.<br />
Thus when the vertical screen disparities are compounded<br />
by the inherent vertical retinal disparities introduced by<br />
viewing the screen, the toe-in induced distortion actually<br />
indicates a nearer surface rather than the further surface<br />
desired. We will see below that vertical disparity manipulations<br />
can produce the impression of a further surface, but the<br />
required transformation is opposite to the one introduced by<br />
camera toe-in.<br />
Do the toed-in configurations improve depth and size<br />
scaling?<br />
Vertical disparities have been shown to be effective in the<br />
scaling of depth, shape and size from disparity. 9,21 When the<br />
cameras are toed-in, the vertical disparities indicate a nearer<br />
surface. Therefore, camera toe-in should cause micropsia (or<br />
apparent shrinking of linear size) appropriate for the nearer<br />
distance. Similarly, depth from disparity should be scaled<br />
appropriate to a nearer surface and depth range should be<br />
compressed. Thus, if toe-in is used to converge an otherwise<br />
orthostereoscopic rig, then image size and depth should be<br />
compressed. Vertical disparity cues to distance are most effective<br />
in a large field of view display, and the curvature, size<br />
and depth effects are most pronounced in these types of<br />
displays. 9,21<br />
In the orthostereoscopic case with parallel cameras,<br />
there are no vertical screen disparities, the vertical disparities<br />
in the retinal images are appropriate for the screen<br />
distance, and no size or depth distortions due to vertical<br />
disparity are predicted. Vertical disparities in the retinal (but<br />
not display) images can thus help obtain veridical stereoscopic<br />
perception.<br />
I use computer graphics or image processing to render<br />
stereoscopic images. Can I use VSR to give an<br />
impression of different distances? If so, how?<br />
Incorporating elements that carry vertical disparity information (for example, horizontal edges) can lead to more veridical depth perception,8 and in this simple sense vertical disparity cues can assist in the development of effective stereoscopic displays. It is not certain that manipulating vertical disparity independent of vergence would be of use to content creators, but it is possible. In the lab we do this to look at the effects of vertical disparity gradients and to manipulate the effects of vertical disparities with vergence held constant. We have seen that toe-in convergence introduces a vertical disparity cue indicating that a surface is nearer than other cues indicate. This will scale stereoscopic depth, shape, and size appropriately, particularly for large displays. To make the surface appear farther away, the opposite transformation is required to reduce the vertical disparity gradients in the retinal image; this essentially entails "toe-out" of the cameras. VSR manipulations, intentional or due to camera toe-in, exacerbate cue conflict in the display, as the distance estimate obtained from the vertical disparities will conflict with accommodation, vergence, and other cues to distance.
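In rendering terms, such a manipulation can be sketched as a coordinate transform applied to each half-image: a horizontal gradient of vertical magnification, with opposite sign in the two eyes, mimics the keystone pattern that toe-in produces. The gradient magnitude `g` below is a hypothetical value for illustration, not a figure from the paper.

```python
import numpy as np

def vsr_gradient(points_uv, g):
    """Apply a horizontal gradient of vertical magnification to image
    points (u, v): v' = v * (1 + g * u).  Opposite signs of g in the two
    half-images mimic the keystone pattern produced by camera toe-in;
    g = 0 leaves the images unchanged (the parallel-camera case)."""
    uv = np.asarray(points_uv, float).copy()
    uv[:, 1] *= 1.0 + g * uv[:, 0]
    return uv

# A small grid of image points (normalized coordinates).
grid = np.array([(u, v) for u in (-0.4, 0.0, 0.4) for v in (-0.3, 0.3)])
left = vsr_gradient(grid, +0.05)      # hypothetical gradient magnitude
right = vsr_gradient(grid, -0.05)
vert_disp = left[:, 1] - right[:, 1]  # zero on the vertical midline (u = 0)
```

The resulting vertical disparity is zero on the vertical midline and largest in the image corners, as with camera toe-in.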
FUSION OF VERTICAL DISPARITY
In many treatments of the camera convergence problem it is noted that the vertical disparities introduced by toed-in camera convergence may interfere with the ability to fuse the images and cause visual discomfort.24 Certainly, the vertical fusional range is known to be smaller than the horizontal fusional range,23 making it likely that vertical disparities could be problematic. Tolerance to vertical disparities depends on several factors, including the size of the display and the presence of reference surfaces.
When a stereoscopic image pair has an overall vertical misalignment, such as arises from vertical camera misalignment, viewers can compensate with vertical vergence and sensory fusional mechanisms. Vertical vergence is a disjunctive eye movement in which the left and right eyes move in opposite directions vertically (vertical misalignment can also often be partially compensated by tilting the head with respect to the display). Vertical disparities are integrated over a fairly large region of space to form the stimulus to vertical vergence.25 Larger displays increase the vertical vergence response and the vertical fusional range. Thus we predict that vertical disparities will be better tolerated in large displays. In agreement with this, Speranza and Wilcox26 found that up to 30 minutes of arc of vertical disparity could be tolerated in a stereoscopic IMAX film without significant viewer discomfort. However, convergence via camera toe-in gives local variations in vertical disparity, and thus images of objects in the display have spatially varying vertical disparities. Averaging retinal vertical disparities over a region of space should therefore be less effective in compensating for vertical disparity due to camera toe-in than for overall vertical camera misalignment. Furthermore, any vertical vergence made to fuse one portion of the display will increase vertical disparity in other parts of the display.
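The role of display size can be made concrete with a small-angle conversion. The viewing distances below are illustrative assumptions, not figures from the cited study; the point is simply that the same angular vertical disparity corresponds to very different physical offsets on a desktop monitor and a distant large-format screen.

```python
import math

def disparity_to_offset(arcmin, viewing_distance_m):
    """On-screen offset (metres) corresponding to a given angular
    vertical disparity, using the small-angle approximation."""
    return viewing_distance_m * math.radians(arcmin / 60.0)

# Hypothetical viewing distances for illustration.
desktop = disparity_to_offset(30.0, 0.6)   # ~0.6 m to a desktop monitor
imax = disparity_to_offset(30.0, 20.0)     # ~20 m to a large-format screen
```

At these assumed distances, 30 arcmin is roughly half a centimetre on the desktop display but well over ten centimetres on the large screen, even though the angular disparity at the eyes is identical.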
The ability to fuse a vertically disparate image is reduced when nearby stimuli have different vertical disparities, particularly if the target and background are similar in depth.27 In many display applications the frame of the display is visible and serves as a frame of reference. In other applications, such as augmented reality and enhanced vision displays, the stereoscopic imagery may be imposed upon other imagery. The presence of these competing stereoscopic images can be expected to reduce the tolerance to vertical disparity due to camera convergence.27 This indicates that vertical disparity distortions should be particularly disruptive in augmented reality displays, where the stereoscopic image is superimposed on other real or synthetic imagery, and that parallel cameras or image rectification should be used.
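Rectification of a toed-in pair can be performed without depth information, because a pure rotation of the camera induces an exact homography on its image. The following sketch (hypothetical intrinsic parameters; one standard construction rather than the specific method of the cited references) builds the warp H = K R K⁻¹ that re-renders a toed-in view as if it had been captured by a parallel camera.

```python
import numpy as np

def rectifying_homography(K, toe_in_rad):
    """Homography that re-renders a toed-in camera's image as if the
    camera had been rotated back to the parallel configuration:
    H = K R K^-1.  A pure rotation between the two viewpoints induces
    an exact homography, so no scene depth is needed."""
    c, s = np.cos(toe_in_rad), np.sin(toe_in_rad)
    R = np.array([[c, 0.0, -s],     # undo a yaw about the vertical axis
                  [0.0, 1.0, 0.0],
                  [s, 0.0, c]])
    return K @ R @ np.linalg.inv(K)

def warp(H, u, v):
    """Apply a homography to one pixel position."""
    p = H @ np.array([u, v, 1.0])
    return p[0] / p[2], p[1] / p[2]

# Hypothetical intrinsics: focal lengths 800 px, principal point (320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
H = rectifying_homography(K, np.radians(2.0))
```

Warping through H moves the principal point purely horizontally, while points off the horizontal midline also move vertically, undoing the keystone pattern. The cost, as noted in the conclusions, is that the images must be resampled.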
ADAPTATION AND SENSORY INTEGRATION OF TOE-IN INDUCED VERTICAL DISPARITY
The human visual system relies on a variety of monocular and binocular cues to judge distance and relative depth in a scene. The effects of toe-in induced horizontal and vertical disparities on depth and distance perception discussed above will be reduced when viewing a scene rich in these cues. The extent of the perceptual distortion depends on perceptual biases and the relative effectiveness of the various cues. For example, Bradshaw and Rogers21 performed an experiment using dot displays to study size and depth scaling as a function of distance indicated by vertical disparities and vergence. They argued that using vertical disparity information to drive size and depth constancy requires measuring the relevant disparity gradients over a fairly large retinal area, whereas vergence signals, correlated with egocentric distance, could be obtained during binocular viewing of a point source of light. Accordingly, when displays were small, subjects responded as if they were scaling the stimulus appropriate for the distance indicated by vergence; when displays were large, subjects responded as if they were scaling the stimulus appropriate for the distance indicated by vertical disparity. When other cues reliably indicate a different distance than toe-in induced vertical disparities, the effect of the latter on depth and size perception may be small. However, latent, even imperceptible, cue conflicts are believed to be a causal factor in simulator sickness symptoms such as eye strain and nausea.5
When sensory conflict is persistent, the visual system shows a remarkable ability to adapt or recalibrate. Following prolonged viewing of a test stimulus that appears curved due to keystone-type vertical disparity transformations, a nominally flat stimulus appears curved in the opposite direction. Duke and Wilcox28 have claimed this adaptation is driven by
the induced curvature in depth rather than by the vertical disparities directly. In general, such an aftereffect can reflect "habituation" or "fatigue" of mechanisms sensitive to the adapting pattern, a recalibration of the vertical disparity signal, or a change in the relative weighting of cues driving depth constancy. At present it is unclear which of these adaptive changes can be produced by prolonged exposure to keystone patterns of vertical disparity.
The effect of vertical disparities induced by toe-in convergence also depends on context and may differ depending on the type of task being performed by the subject and whether it involves size constancy, depth constancy, absolute distance judgements, or other spatial judgements. For example, Wei et al.29 reported that full-field vertical disparities are not used to derive the distance-dependent gain term for the linear vestibulo-ocular reflex, a reflexive eye movement that compensates for head movements, under conditions where vertical disparities drive depth constancy.
CONCLUSIONS
In conclusion, we concur with the conventional wisdom that horizontal image translation is theoretically preferable to toe-in convergence with parallel stereoscopic displays. Toed-in camera convergence is a convenient and often used technique that is often well tolerated24 despite the fact that it theoretically and empirically results in geometric distortion of stereoscopic space. The distortion of stereoscopic space should be more apparent in fused or augmented reality displays, where the real world serves as a reference against which to judge the disparity distortion introduced by the toe-in technique. In these cases, and for near viewing when the distortions are large, the distortions may be ameliorated through camera rectification techniques15,30 if resampling of the images is practical.
It has been asserted by others that, since camera convergence through toe-in introduces vertical disparities into the stereoscopic imagery, it should give rise to more natural or accurate distance perception than the parallel camera configuration. We have argued in this paper that these assertions are theoretically unfounded, although vertical disparity gradients are an effective cue for depth and size constancy that could be used by creators of stereoscopic content. The geometrical distortions predicted from the artifactual horizontal disparities created by camera toe-in may be countered by opposite distortions created from the vertical disparities. However, when displayed on a single projector or monitor display, the vertical disparity gradients introduced by unrectified, toed-in cameras do not correspond to the gradients experienced by a real observer viewing a scene at the camera convergence distance. This is because the keystoning due to the camera toe-in is superimposed upon the natural vertical disparity pattern at the eyes.
Our analysis and data27 imply that stereoscopic display/camera systems that fuse or superimpose multiple stereoscopic images from a number of sensors should be more susceptible to toe-in induced fusion and depth-distortion problems than displays that present a single stereoscopic image stream. Analysis of toe-in induced vertical disparity reinforces the recommendation that rectification of the stereoscopic imagery should be considered for fused stereoscopic systems, such as augmented reality displays or enhanced vision systems, that require toed-in cameras to view targets at short distances.
ACKNOWLEDGMENTS
The support of the Ontario Centres of Excellence and NSERC Canada is gratefully acknowledged. An abbreviated version of this paper was presented at IS&T/SPIE Electronic Imaging 2004 [R. Allison, Proc. SPIE 5291, 167–178 (2004)].
REFERENCES
1 A. Woods, T. Docherty, and R. Koch, Proc. SPIE 1915, 36 (1993).
2 D. B. Diner and D. H. Fender, Human Engineering in Stereoscopic Viewing Devices (Plenum Press, New York and London, 1993).
3 L. Lipton, Foundations of the Stereoscopic Cinema (Van Nostrand–Reinhold, New York, 1982).
4 Z. Wartell, L. F. Hodges, and W. Ribarsky, IEEE Trans. Vis. Comput. Graph. 8(2), 129 (2002).
5 J. P. Wann, S. Rushton, and M. Mon-Williams, Vision Res. 35(19), 2731 (1995).
6 I. P. Howard and B. J. Rogers, Depth Perception (I. Porteous, Toronto, 2002).
7 L. Lipton, The Stereographics Developers Handbook (Stereographics Corp., San Rafael, CA, 1997).
8 H. von Helmholtz, Physiological Optics, English translation by J. P. C. Southall from the 3rd German edition of Handbuch der Physiologischen Optik, Vos, Hamburg (Dover, New York, 1962).
9 B. J. Rogers and M. F. Bradshaw, Nature (London) 361, 253 (1993).
10 J. Garding, J. Porrill, J. E. W. Mayhew, and J. P. Frisby, Vision Res. 35(5), 703 (1995).
11 M. Siegel and S. Nagata, IEEE Trans. Circuits Syst. Video Technol. 10(3), 387 (2000).
12 A. State, J. Ackerman, G. Hirota, J. Lee, and H. Fuchs, Proc. International Symposium on Augmented Reality (ISAR) 2001 (IEEE, Piscataway, NJ, 2001), pp. 137–146.
13 V. S. Grinberg, G. Podnar, and M. W. Siegel, Proc. SPIE 2177, 56 (1994).
14 A. State, K. Keller, and H. Fuchs, Proc. International Symposium on Mixed and Augmented Reality (ISMAR) 2005 (IEEE, Piscataway, NJ, 2005), pp. 28–31.
15 N. Dodgson, Proc. SPIE 3295, 100 (1998).
16 S. Takagi, S. Yamazaki, Y. Saito, and N. Taniguchi, Proc. IEEE & ACM ISAR 2000 (IEEE, Piscataway, NJ, 2000), pp. 68–77.
17 B. Gillam and B. Lawergren, Percept. Psychophys. 34(2), 121 (1983).
18 I. P. Howard, Psychonomic Monograph Supplements 3, 201 (1970).
19 B. Gillam, D. Chambers, and B. Lawergren, Percept. Psychophys. 44, 473 (1988).
20 J. E. W. Mayhew and H. C. Longuet-Higgins, Nature (London) 297, 376 (1982).
21 M. F. Bradshaw, A. Glennerster, and B. J. Rogers, Vision Res. 36(9), 1255 (1996).
22 L. Perez-Bayas, Proc. SPIE 4297, 251 (2001).
23 K. N. Ogle, Researches in Binocular Vision (Hafner, New York, 1964).
24 L. B. Stelmach, W. J. Tam, F. Speranza, R. Renaud, and T. Martin, Proc. SPIE 5006, 269 (2003).
25 I. P. Howard, X. Fang, R. S. Allison, and J. E. Zacher, Exp. Brain Res. 130(2), 124 (2000).
26 F. Speranza and L. Wilcox, Proc. SPIE 4660, 18 (2002).
27 R. S. Allison, I. P. Howard, and X. Fang, Vision Res. 40(21), 2985 (2000).
28 P. A. Duke and L. M. Wilcox, Vision Res. 43(2), 135 (2003).
29 M. Wei, G. C. DeAngelis, and D. E. Angelaki, J. Neurosci. 23, 8340 (2003).
30 O. Faugeras and Q. Luong, The Geometry of Multiple Images (MIT Press, Cambridge, MA, 2001).
Journal of Imaging Science and Technology® 51(4): 328–336, 2007.
© Society for Imaging Science and Technology 2007

Improved B-Spline Contour Fitting Using Genetic Algorithm for the Segmentation of Dental Computerized Tomography Image Sequences

Xiaoling Wu, Hui Gao, Hoon Heo, Oksam Chae, Jinsung Cho, Sungyoung Lee and Young-Koo Lee
Department of Computer Engineering, Kyunghee University, 1 Seochun-ri, Kiheung-eup, Yongin-si, Kyunggi-do, 449-701, South Korea
E-mail: yklee@khu.ac.kr
Abstract. In the dental field, 3D tooth modeling, in which each tooth can be manipulated individually, is an essential component of the simulation of orthodontic surgery and treatment. However, in dental computerized tomography slices, teeth are located close together or inside alveolar bone that has an intensity similar to that of teeth. This makes it difficult to segment a tooth individually before building its 3D model. Conventional methods such as the global threshold and snake algorithms fail to accurately extract the boundary of each tooth. In this paper, we present an improved contour extraction algorithm based on B-spline contour fitting using a genetic algorithm. We propose a new fitting function that incorporates the gradient direction information on the fitting contour to prevent it from invading the areas of other teeth or alveolar bone. Furthermore, to speed up convergence to the best solution, we use a novel adaptive probability for crossover and mutation in the evolutionary program of the genetic algorithm. Segmentation results for real dental images demonstrate that our method can accurately determine the boundary and 3D model of individual teeth where other methods fail. Independent manipulation of each tooth model demonstrates the practical usage of our method. © 2007 Society for Imaging Science and Technology.
DOI: 10.2352/J.ImagingSci.Technol.(2007)51:4(328)
INTRODUCTION
The accurate 3D modeling of the mandible and the simulation of tooth movement play an important role in preoperative planning for dental and maxillofacial surgery. The 3D reconstruction of the teeth can be used in virtual reality based training for orthodontics students and for preoperative assessment by dental surgeons. For 3D modeling, tooth segmentation to extract the individual contour of a tooth is of critical importance. Automated tooth segmentation methods from 3D digitized images have been researched for the measurement and simulation of orthodontic procedures.1 These methods provide, as a segmentation result, the interstices between the teeth along with their locations and orientations. However, they do not give individual tooth contour information, which reveals more details that are helpful in dental study. A thresholding method, used in existing segmentation and reconstruction systems, is known to be efficient for automatic hard tissue segmentation.2,3 Some morphological filtering methods are used for creating intermediary

Received Oct. 28, 2006; accepted for publication Mar. 30, 2007.
1062-3701/2007/51(4)/328/9/$20.00.
slices by interpolation for modeling teeth in 3D.4 Morphological operations have also been combined with the thresholding method for dental segmentation in x-ray films.2 However, neither the thresholding method nor the morphological filtering method is suitable for separating individual tooth regions in tooth computerized tomography (CT) slices, because some teeth touch each other and some are located inside alveolar bone with a CT intensity profile similar to that of teeth.5 A modified watershed algorithm was suggested to create closed-loop contours of teeth while alleviating the over-segmentation problem of the watershed algorithm.5 Although this reduces the number of regions significantly, it still produces many irrelevant basins that make it difficult to define an accurate tooth contour. A seed-growing segmentation algorithm6 based on B-spline fitting was suggested for arbitrary shape segmentation in sequential images. The best contour of an object is determined by fitting the initial contour passed from the previous frame to the edges detected in the current frame. For the fitting operation, an objective function defined by the sum of distances between the initial contour and the object edges is used. For this algorithm to work properly, the complete object boundary should be extracted by global thresholding and the object should be located apart from other objects. If other objects are located nearby, as in the case of the tooth CT image, the shape of the initial contour must be very close to the actual object contour to prevent it from being fitted to the boundaries of the nearby objects.
Many snake algorithms have been proposed for medical image analysis applications.7–10 However, in CT image sequences where objects are closely located, the classical snake algorithms have not been successful due to difficulties in initialization and the existence of multiple extrema. A snake is successful only when it is initialized close to the structure of interest and there is no object with intensity values similar to those of the structure of interest.7 The snake models for object boundary detection search for an optimal contour that minimizes (or maximizes) an objective function. The objective function generally consists of the internal energy representing the properties of a contour shape and the external potential energy depending on the image force. The final shape of the contour is influenced by how these two energy terms are represented. However, many snakes tend to shrink when
Wu et al.: Improved B-spline contour fitting using genetic algorithm for the segmentation of dental...
their external energy is relatively small due to the lack of image forces.7 Some snakes also suffer from the limited flexibility of representing the contour shape and a large number of derivative terms in their internal energy representation. B-spline based snakes have been developed, as the B-spline snake and B-snake, to enhance geometric flexibility and optimization speed by means of a small number of control points instead of snaxels.11,12 The B-spline snake controls contour shapes by a stiffening parameter as well as its control points, and detects object boundaries in noisy environments by using gradient magnitude information instead of edge information. This algorithm introduces a stiffening factor to the B-spline function13 that varies the spacing between the spline knots and the number of sampled points used during the evaluation of the objective function. In addition, the factor controls the smoothness of the curve and reduces the computation of the cost function. Although the algorithm was proposed to extract the contour of a deformable object in a single image, it can be applied to tooth segmentation in CT slices. However, in tooth CT data, the algorithm may cause the contour of a tooth to expand to include contours of nearby teeth and alveolar bone, or it may cause the contour to contract to a small region.
A B-spline fitting algorithm employing a genetic algorithm (GA) has been used to overcome local extrema lying in the vicinity of an object of interest.14–17 In this case, it was shown that the GA does not require exhaustive search while avoiding high-order derivatives for curve fitting or matching problems.18,19 However, the conventional GA-based B-spline fitting still suffers from the influence of other objects and often fails to extract the object boundary from image sequences when similar objects are adjacent to each other.
In this paper, we propose an improved B-spline contour fitting algorithm using a GA to generate a smooth and accurate tooth boundary for the 3D reconstruction of a tooth model. We devise a new B-spline fitting function by incorporating the gradient direction information on the fitting contours to search for the tooth boundary while preventing it from being fitted to neighboring spurious edges. We also present an evolution method that accelerates the search by automatic and dynamic determination of the GA probabilities for crossover and mutation. Experimental results show that our method can successfully extract the individual tooth boundary where other methods fail to do so.
BACKGROUND
Dental CT images have the following two distinct characteristics: (1) an individual tooth often appears with neighboring hard tissues such as other teeth and alveolar bone, and (2) these neighboring hard tissues have the same or similar intensity values to the tooth of interest. Thus, a fixed threshold value for each tooth across all slices is not effective, as shown in Figure 1. When we try to obtain a tooth region by a thresholding method, the lower and upper limits of the threshold value for a given tooth at each slice are given by the two curves in Fig. 1. Any threshold value within these limits
Figure 1. Threshold values for a certain tooth, computed manually at different slices.
produces the tooth region with an accuracy better than 90%. This shows that an individual segmentation method is required for each tooth in each slice.
There are many segmentation methods, each of which has its own limitations in separating individual tooth regions on CT images.3–6 An optimal thresholding scheme20 can be attempted by taking advantage of the fact that the shape and intensity of each tooth change gradually through the CT image slices. However, even if an optimal threshold is determined for every slice, the result of the segmentation is unsatisfactory because of neighboring hard tissue. For the 3D reconstruction of an individual tooth model, the tooth boundary needs to be defined more precisely.
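One way to operationalize a per-slice optimal threshold that exploits the gradual slice-to-slice change is to restrict a between-class-variance (Otsu-style) search to a window around the previous slice's threshold. This sketch illustrates that general idea only; it is not the scheme of Ref. 20, and the window size is an arbitrary choice.

```python
import numpy as np

def otsu_threshold(pixels, lo, hi):
    """Pick the threshold in [lo, hi] maximizing Otsu's between-class
    variance (weighted squared difference of the two class means)."""
    best_t, best_var = lo, -1.0
    for t in range(lo, hi + 1):
        fg, bg = pixels[pixels >= t], pixels[pixels < t]
        if fg.size == 0 or bg.size == 0:
            continue
        var = fg.size * bg.size * (fg.mean() - bg.mean()) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def track_thresholds(slices, t0, window=10):
    """Propagate the threshold through the slice sequence, searching only
    a small window around the previous slice's threshold."""
    ts = [t0]
    for sl in slices[1:]:
        ts.append(otsu_threshold(sl.ravel(), ts[-1] - window, ts[-1] + window))
    return ts
```

As the text notes, even a per-slice optimal threshold leaves the tooth merged with neighboring hard tissue, which is why a contour-based method is pursued next.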
B-Spline Contour Fitting
The B-spline curve has attractive properties for the representation of an object contour with arbitrary shape. These properties also make it suitable for the curve fitting process; they are summarized as follows.
• An object of any shape, including those with angular points, can be represented by a set of control points, a knot sequence, and a basis function. In many fitting problems, where the knot sequence and basis function can be fixed, the shape of the contour can be adjusted by simply repositioning the control points.
• The shape of the contour changes little when the number of control points is reduced within some tolerable limit, which reduces the information needed for the fitting process. This allows the fitting process to be faster, with fewer variables over which to optimize.
We choose the uniform cubic closed B-spline curve, shown in Eqs. (1) and (2), to describe the object contours in the image.
$$\mathbf{r}(s) = \begin{pmatrix} r_x(s) \\ r_y(s) \end{pmatrix} = \begin{pmatrix} \displaystyle\sum_{i=0}^{n-1} x_i\, B_0(s-i) \\[1.5ex] \displaystyle\sum_{i=0}^{n-1} y_i\, B_0(s-i) \end{pmatrix} \qquad (1)$$
$$B_0(s) = \begin{cases} s^3/2 - s^2 + 2/3 & \text{if } t_0 \le s < t_1, \\ (2-s)^3/6 & \text{if } t_1 \le s < t_2, \\ 0 & \text{otherwise.} \end{cases} \qquad (2)$$
In these equations, r(s) represents the coordinates of a contour pixel at a specific value of the parameter s, and (x_i, y_i) represents the coordinates of the ith control point. The B-spline basis functions are translated copies of B_0(s). In the case of tooth segmentation we use a closed uniform knot sequence, with (t_0, t_1, ..., t_n) = (0, 1, ..., n) and t_0 = t_n, where n is the total number of control points.
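Equations (1) and (2) can be evaluated directly. In the sketch below, writing the basis with |s| covers both branches of the symmetric cubic kernel, and wrapping the parameter difference modulo n is one way (an implementation choice) to realize the closed knot sequence.

```python
import numpy as np

def b0(s):
    """Uniform cubic B-spline basis of Eq. (2), written with |s| so one
    expression covers both sides of the symmetric kernel."""
    a = np.abs(np.asarray(s, float))
    out = np.zeros_like(a)
    m1 = a < 1.0
    m2 = (a >= 1.0) & (a < 2.0)
    out[m1] = a[m1] ** 3 / 2.0 - a[m1] ** 2 + 2.0 / 3.0
    out[m2] = (2.0 - a[m2]) ** 3 / 6.0
    return out

def closed_bspline(ctrl, samples_per_span=8):
    """Evaluate the closed uniform cubic B-spline of Eq. (1): a sum of
    shifted copies of b0 weighted by the control points, with the
    parameter wrapped modulo n to close the curve."""
    ctrl = np.asarray(ctrl, float)
    n = len(ctrl)
    s = np.linspace(0.0, n, n * samples_per_span, endpoint=False)
    pts = np.zeros((len(s), 2))
    for i in range(n):
        d = (s - i + n / 2.0) % n - n / 2.0   # wrapped distance to knot i
        pts += np.outer(b0(d), ctrl[i])
    return pts
```

Because the shifted basis functions sum to one, the curve stays inside the convex hull of the control points; for example, with control points at (1,0), (0,1), (-1,0), (0,-1) the curve at s = 0 passes through (2/3, 0).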
The B-spline fitting function f is represented in Eq. (3) (Ref. 11) as follows:

$$f = \sum_{k=0}^{M-1} \left\| \nabla I\big(\mathbf{r}(s_k)\big) \right\| \qquad (3)$$

where M is the total number of contour points. The fitting function is maximized when the contour conforms to the object boundary. The B-spline fitting function makes use of only the external force, computed from the gradient magnitude on the contour. The smoothness constraint is implicitly represented by the B-spline itself.
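Given a precomputed gradient-magnitude image, Eq. (3) is simply a sum over the M sampled contour points. A nearest-pixel lookup, as sketched below, is one implementation choice; bilinear sampling would also work.

```python
import numpy as np

def fitting_value(grad_mag, contour_pts):
    """Eq. (3): sum the image gradient magnitude at the sampled contour
    points.  contour_pts holds (x, y) positions; grad_mag is indexed as
    [row, column] = [y, x], with nearest-pixel rounding and clipping."""
    h, w = grad_mag.shape
    idx = np.round(np.asarray(contour_pts, float)).astype(int)
    idx[:, 0] = np.clip(idx[:, 0], 0, w - 1)   # x -> column
    idx[:, 1] = np.clip(idx[:, 1], 0, h - 1)   # y -> row
    return float(grad_mag[idx[:, 1], idx[:, 0]].sum())
```

A contour lying along a strong edge accumulates large gradient magnitudes and so scores a high fitness, which is the quantity the genetic search maximizes below.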
B-spline Contour Fitting using Genetic Algorithm
The genetic algorithm is a probabilistic technique for searching for an optimal solution. The solution is described by a vector, called a "chromosome," which is obtained by maximizing a fitting function. Hence the definition of the fitting function significantly affects the solution. A sequence of evolutionary operations is repeated for a chromosome to evolve to its final state. The end of the evolutionary operation is determined by checking the fitness values, which represent the goodness of each chromosome in the population.
A chromosome is a collection of genes, and a gene represents a control point of the B-spline. Since the chromosome represents a complete contour and a gene uses the actual location of a control point, the search algorithm has neither ambiguity about the contour location nor a potential bias toward particular shapes. To reduce the size of a gene, we use an index value as the gene instead of two coordinate values.16,17 Composing a search area based on the indices provides a search area of arbitrary shape, to which the search for the final position of each control point is confined. This chromosome scheme guarantees that gene information does not spread over the chromosome, which results in schemata of short length and low order.16 Accordingly, there is a high probability of fast convergence. A new generation is made through the sequence of evolutionary operations and, during the evolutionary processes, the crossover and mutation steps significantly affect the quality of the final solution and the speed of convergence.
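As a sketch of this index-based gene encoding (the function name, the 5×5 window size, and the example coordinates are our own illustration, not taken from the paper): each gene stores a single index into a small search window centered on a control point's initial location, instead of two coordinate values.

```python
def index_to_point(index, center, width=5):
    """Decode a gene (an index into a width x width search window
    centered on `center`) back to an (x, y) control-point location."""
    cx, cy = center
    half = width // 2
    dx = index % width - half    # column offset in [-half, half]
    dy = index // width - half   # row offset in [-half, half]
    return (cx + dx, cy + dy)

# A chromosome is one index gene per B-spline control point.
centers = [(10, 10), (30, 12), (28, 34), (8, 32)]  # initial control points
chromosome = [12, 0, 24, 7]                        # 12 = window center
control_points = [index_to_point(g, c) for g, c in zip(chromosome, centers)]
```

Because every gene is confined to its own window, gene information stays local within the chromosome, which is what keeps the schemata short and low-order.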
IMPROVED B-SPLINE CONTOUR FITTING USING GENETIC ALGORITHM
Fitting Function Based on Gradient Magnitude and Direction
The fitting function measures the fitness of a candidate contour to the object boundary in the current slice. The fitness value is the basis for deciding when to terminate the evolutionary process and for selecting elite chromosomes for the mating pool. In existing active contour models, the fitting function consists of internal forces controlling the smoothness of the contour and an external force representing the object boundary information in the image.7,12 One drawback of this representation is that it requires determining weight values to balance the two components.
The B-spline snake uses a simple fitting function with only an external force computed from the gradient magnitude along the contour; the internal force terms are replaced by a stiffening parameter and the implicit smoothness constraint of the B-spline representation. However, for image data such as tooth CT slices, such fitting functions often produce a contour fitted to the boundary of a nearby object, and they generate a contour contracted to a small region unless the stiffening parameter is set properly.
Note that the magnitude of the intensity difference between the inside and outside of an object contour may vary. However, if the relative intensity between the two sides is maintained throughout the contour, the sign of the intensity difference between the sides is inverted when the contour expands out to the boundary of another object. Hence, by fixing the direction in which the parameter s traverses the curve, we know in advance which side is inside (or outside), and thus whether the contour is fitted to the object of interest or to an adjacent object. In this paper, the fitting function to be maximized is designed to exploit this property of the data: the gradient direction information allows the fitness function to penalize any portion of a contour fitted to a neighboring object.
To compute the fitness value for a candidate solution (chromosome), we first generate the contour points from the B-spline representation of the solution and trace the contour, as shown in Figure 2(a). At the kth contour point r(s_k), a unit normal vector n(s_k) is computed. The outer- and inner-region pixel locations p_k^o and p_k^i, respectively, are then identified using n(s_k) according to
p_k^o = r(s_k) + n(s_k)    (4)

and

p_k^i = r(s_k) − n(s_k).    (5)
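Equations (4) and (5) can be sketched as follows (an illustrative implementation under our own assumptions: the paper only states that a unit normal is computed, so estimating it from the local tangent by central differences, and the counterclockwise contour orientation, are choices we make here):

```python
import numpy as np

def inner_outer_points(contour):
    """For each contour point r(s_k) of a closed contour (M x 2 array,
    counterclockwise), estimate a unit normal n(s_k) from the local
    tangent and return the sample points p^o = r + n and p^i = r - n."""
    r = np.asarray(contour, dtype=float)
    # Central-difference tangent along the closed contour.
    tangent = np.roll(r, -1, axis=0) - np.roll(r, 1, axis=0)
    tangent /= np.linalg.norm(tangent, axis=1, keepdims=True)
    # Rotate the tangent -90 degrees: outward normal for a CCW contour.
    normal = np.stack([tangent[:, 1], -tangent[:, 0]], axis=1)
    return r + normal, r - normal   # (p^o, p^i)
```

In practice the returned locations would be rounded (or interpolated) to pixel positions before sampling intensities.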
Then the fitness value is determined from the gradient magnitude and direction information at each contour point, as follows.
330 J. Imaging Sci. Technol. 51(4)/Jul.-Aug. 2007
Wu et al.: Improved B-spline contour fitting using genetic algorithm for the segmentation of dental...
Figure 2. (a) Definition of inner and outer regions. (b) Illustration of the fitting function: the right object is the object of interest, the left object is adjacent, and the thick black curve is the fitting curve. (c) Twisted contour.
f = Σ_{k=0}^{M−1} (γ_k − δ_k),    (6)

where

γ_k = |∇I(r(s_k))|,  if I(p_k^i) − I(p_k^o) > 0,
γ_k = −|∇I(r(s_k))|, if I(p_k^i) − I(p_k^o) < 0,

and

δ_k = C, if r(s_k) = r(s_j) for some j ∈ {0, 1, …, M−1}, j ≠ k,
δ_k = 0, if r(s_k) ≠ r(s_j) for all such j.
I(p_k^i) and I(p_k^o) are the intensity values inside and outside the kth contour point, respectively. The equation is further illustrated by Fig. 2(b), where part of the contour attaches to another object; in that portion I(p_k^i) < I(p_k^o), so the negative gradient magnitude is assigned, penalizing the fitness value. The figure also shows that in the other portions the contour correctly conforms to the tooth boundary; there I(p_k^i) > I(p_k^o), so the positive gradient magnitude is added to the fitness value. Note that when there is no difference in gradient direction, which may happen when the inner and outer pixel values are identical, I(p_k^i) = I(p_k^o) and the point contributes nothing to the sum. This prevents the contour from being misfitted when it lies inside an object region of uniform intensity, such as the interior of a tooth.
A constant penalty C is deducted from the fitness value when the contour is twisted, as shown in Fig. 2(c). Our experiments showed that setting the penalty too high hindered the search for the contour maximizing the sum of gradient magnitudes; the proposed fitting method performs best when C is set to around 0.1% of that sum.
Improved Adaptive Evolutionary Operations
The evolutionary process generates a new population of candidate solutions through three genetic operators: reproduction (or selection), crossover, and mutation. The selection operation constructs the mating pool from the current population for the crossover operation; the results presented here use a tournament selection scheme.16 The crossover operation generates two child chromosomes by swapping genes between the two parent chromosomes. In this paper we present a one-point cutting scheme with an improved adaptive crossover probability, and we also use an adaptive mutation probability scheme in our evolutionary process.
A conventional GA generally uses fixed crossover and mutation probabilities. The adaptive genetic algorithm (AGA) proposed by Srinivas and Patnaik21 instead uses variable crossover and mutation probabilities, determined automatically from the fitness values during the fitting process, to achieve a fast convergence rate; the evolution probabilities therefore no longer need to be set to constants. At the beginning of the fitting process, we consider all possible control point locations in the search area. As the process goes on, the evolutionary probabilities are set such that a candidate solution near the optimum
quickly converges to the actual solution. In AGA,21 the crossover probability is adaptively determined from the fitness value f according to

p_c = k_1 (f_best − f) / (f_best − f_avg),  if f ≥ f_avg,
p_c = k_2,                                  if f < f_avg,    (7)

where f_best and f_avg are the best and average fitness values in the mating pool, respectively, and k_1 and k_2 are constants set to 1.0. Hence, if f = f_best (so f ≥ f_avg), p_c = 0 and the chromosome is preserved, while the value of k_1 still ensures a high occurrence of crossover for less fit chromosomes. If f < f_avg, crossover is performed without exception, since the corresponding chromosome has a low fitness value.
The mutation operation is likewise implemented using a mutation probability p_m:

p_m = k_3 (f_best − f) / (f_best − f_avg),  if f ≥ f_avg,
p_m = k_4,                                  if f < f_avg,    (8)

where k_3 and k_4 are constants set to 0.5. As in the case of crossover, mutation does not affect the chromosome with the best fitness value; if f < f_avg, however, mutation takes place with probability 0.5, since k_4 = 0.5.
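Both probabilities share the same form, so Eqs. (7) and (8) can be sketched with one helper (an illustrative sketch; the guard for the degenerate case where all fitnesses are equal is our own addition):

```python
def adaptive_prob(f, f_best, f_avg, k_high, k_low):
    """AGA-style probability: scaled by how close the fitness f is to
    the best when f >= f_avg, and a constant otherwise.
    k_high plays the role of k1 (or k3), k_low of k2 (or k4)."""
    if f >= f_avg:
        if f_best == f_avg:      # degenerate: all fitnesses equal
            return 0.0           # preserve everything
        return k_high * (f_best - f) / (f_best - f_avg)
    return k_low

def p_c(f, f_best, f_avg):       # crossover, Eq. (7): k1 = k2 = 1.0
    return adaptive_prob(f, f_best, f_avg, 1.0, 1.0)

def p_m(f, f_best, f_avg):       # mutation, Eq. (8): k3 = k4 = 0.5
    return adaptive_prob(f, f_best, f_avg, 0.5, 0.5)
```

Note that the best chromosome (f = f_best) always gets probability 0, which is exactly the preservation property the text describes.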
In this paper we propose an improved adaptive crossover probability. To maintain solutions with high fitness values, we generate a random number p_r and consider its relationship to p_c1 and p_c2, the crossover probabilities computed from the two parent chromosomes (the father and mother chromosomes, respectively). When two parent chromosomes are selected, two children are generated as follows.
(1) Generate a random number p_r between 0 and 1 to determine the adaptive crossover probability, a random number p_l between 0 and 1 to determine the crossing site, and a random number p_s between 0 and 1 to determine which side of the crossing site should be exchanged with the corresponding portion of the mate.
(2) Replace f in Eq. (7) by the fitness value of each parent to compute the crossover probabilities p_c1 and p_c2.
(3) If p_r > p_c1 and p_r > p_c2, pass both parents to the next generation unchanged.
(4) If p_r lies between p_c1 and p_c2 with p_c1 > p_c2: if p_s < 0.5, the left portion of the father chromosome is exchanged with the corresponding portion of the mother chromosome to generate one child, and the mother chromosome passes directly to the next generation as the other child; if p_s ≥ 0.5, the right portion of the father chromosome is exchanged instead, and again the other child is a copy of the mother chromosome. Similarly, if p_c1 < p_c2, the mother chromosome is modified while the father chromosome passes to the next generation unchanged, with the exchanged side again determined by p_s.
(5) If p_r is less than both p_c1 and p_c2, generate two child chromosomes as the ordinary one-point crossover does.
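The steps above can be sketched as follows. This is our own reading of the procedure, not the authors' code: the mapping of p_l to a cut position, the strict/non-strict comparisons, and the injectable random source (for testing) are illustrative choices.

```python
import random

def improved_crossover(father, mother, pc1, pc2, rng=random):
    """Steps (1)-(5): p_r decides whether/how to cross, p_l picks the
    cutting site, p_s picks which side of the site is swapped.
    Parents are gene lists; returns two children."""
    pr, pl, ps = rng.random(), rng.random(), rng.random()
    n = len(father)
    cut = max(1, int(pl * n))                   # crossing site
    if pr > pc1 and pr > pc2:                   # step (3): no crossover
        return father[:], mother[:]
    if min(pc1, pc2) < pr <= max(pc1, pc2):     # step (4): one-sided swap
        # The parent with the larger p_c (lower fitness) is changed;
        # the fitter parent is copied through unchanged.
        changed, kept = (father, mother) if pc1 > pc2 else (mother, father)
        if ps < 0.5:                            # exchange left of the cut
            child = kept[:cut] + changed[cut:]
        else:                                   # exchange right of the cut
            child = changed[:cut] + kept[cut:]
        return child, kept[:]
    # step (5): p_r below both probabilities -> ordinary one-point crossover
    return father[:cut] + mother[cut:], mother[:cut] + father[cut:]
```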
In the proposed operation, chromosomes with high fitness values survive until a new chromosome with higher fitness is created. This supports rapid searching for an optimal solution by exploiting the crossover scheme, which can swap either side of the crossing site.
EXPERIMENTAL EVALUATION
We tested the proposed contour segmentation on two kinds of data: synthetic images and two sets of real dental CT image sequences with slice thicknesses of 0.67 mm and 1 mm and an x-y resolution of 512×512. Visual C++ with DICOM libraries22 for reading 16-bit CT images and the 3D graphics library OpenGL were used to implement the proposed algorithm. The CT images are saved in DICOM format, an international standard for medical images, after acquisition with a commercially available Shimadzu SCT-7800 CT scanner. The test data were prepared to reveal the capability of the proposed algorithm to find an accurate boundary among many similar nearby objects. We compared the proposed algorithm with the existing B-spline snake algorithm, which uses a gradient-magnitude-based external force in the fitting function.11
First, we applied these algorithms to a synthetic image resembling a tooth surrounded by alveolar bone. To generate the results, we constructed a B-spline contour with 8 control points and selected 20 initial chromosomes, with a 40×40 search window for each control point. For the following B-spline snake examples the stiffening parameter is set to 2. As shown in Figure 3, the proposed algorithm extracts an accurate object boundary while the existing B-spline snake fails.
Figure 3. Contours extracted from the synthetic data (number of control points CP = 8). (a) By the B-spline snake method. (b) By the proposed method.

We also applied the two algorithms to real CT image sequences, where an individual tooth often appears together with neighboring hard tissues such as other teeth and alveolar bone. If too many control points are used for a contour, the smoothing effect on the curve is reduced, and consequently twisted parts of the contour are generated, as shown in Figure 4. Figure 5 shows part of the test results on a different set of slices with lower resolution. Since the test image is small, a 10×10 search area suffices for each control point.
As shown in Fig. 5, an individual tooth often appears alongside neighboring hard tissues such as other teeth and alveolar bone, and the proposed algorithm produces better results than the B-spline snake; the difference stems from the fitting function.
Part of the segmentation results for the slice sequences is shown in Figure 6, and results for a molar, which has a more complicated shape, are shown in Figure 7. In Fig. 6, the far-left images show the results of tooth initialization for the first slice, obtained by interactively applying a proper threshold to each tooth. As segmentation proceeds slice by slice, the misfitting error in the results of the existing method increases, in contrast with the results of the proposed method.
Figure 4. Tooth contours extracted from a CT image (CP = 16). (a) By the proposed method. (b) By B-spline snake.
Table I lists part of the numerical results of the segmentation. N is the number of slices spanned by each tooth. FPE (false positive error) is the percentage of area reported as tooth by the algorithm but not by manual segmentation; FNE (false negative error) is the percentage of area reported by manual segmentation but not by the algorithm. The similarity and dissimilarity indices S_agr and S_dis,23,10 which measure the agreement and disagreement between the algorithm area A_alg and the manually segmented area A_man, are computed according to

S_agr = 2 |A_man ∩ A_alg| / (|A_man| + |A_alg|),    (9)

S_dis = 2 |A_man ∪ A_alg − A_man ∩ A_alg| / (|A_man| + |A_alg|).    (10)
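For binary segmentation masks, Eqs. (9) and (10) can be computed directly (an illustrative sketch; the function name is ours):

```python
import numpy as np

def similarity_indices(a_man, a_alg):
    """S_agr and S_dis of Eqs. (9)/(10) for two binary masks:
    S_agr = 2|A_man ∩ A_alg| / (|A_man| + |A_alg|)
    S_dis = 2|(A_man ∪ A_alg) - (A_man ∩ A_alg)| / (|A_man| + |A_alg|)"""
    a_man = np.asarray(a_man, bool)
    a_alg = np.asarray(a_alg, bool)
    inter = np.logical_and(a_man, a_alg).sum()
    union = np.logical_or(a_man, a_alg).sum()
    total = a_man.sum() + a_alg.sum()
    return 2.0 * inter / total, 2.0 * (union - inter) / total
```

Identical masks give S_agr = 1 and S_dis = 0; disjoint masks give S_agr = 0.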
Figure 5. Tooth contours extracted from a CT image sequence (CP = 8). (a) By the proposed method. (b) By B-spline snake.
These indices were calculated for validation on the N slices of each tooth. The averaged value of S_agr, together with its minimum and maximum, is shown in Table I, and we conclude that the proposed method successfully isolates the individual tooth regions, in contrast with the B-spline snake results shown in Table II.
The proposed fitting method is designed for fast contour extraction through the improved crossover method, which uses a random number to copy genes of a superior chromosome to an inferior one when the random number falls within the range of the crossover probabilities of its parents, p_c1 and p_c2. Furthermore, the proposed crossover method decides which side of the crossing site will be exchanged between parent chromosomes. The decided part fosters chromo-
Table I. Segmentation results for 8 teeth by the proposed method from the same set of CT scans.

Tooth  N   FPE (%)  FNE (%)  S_agr  S_min  S_max  S_dis
1      20  4.43     8.37     0.935  0.915  0.977  0.131
2      22  7.88     3.45     0.945  0.916  0.973  0.111
3      25  8.96     4.48     0.935  0.901  0.968  0.131
4      24  8.46     6.47     0.926  0.905  0.970  0.148
5      27  5.81     8.29     0.929  0.917  0.967  0.143
6      26  2.07     7.05     0.953  0.923  0.971  0.094
7      25  5.21     3.79     0.955  0.927  0.976  0.089
8      23  5.69     1.42     0.965  0.932  0.983  0.069

Figure 6. Tooth contours extracted from a CT image sequence (CP = 16). (a) By the proposed method. (b) By B-spline snake.
Table II. Segmentation results for 8 teeth by B-spline snake from the same set of CT scans.

Tooth  N   FPE (%)  FNE (%)  S_agr  S_min  S_max  S_dis
1      20  6.12     27.21    0.814  0.574  0.952  0.373
2      22  26.01    1.16     0.879  0.628  0.956  0.241
3      25  45.86    11.28    0.756  0.316  0.897  0.487
4      24  29.89    4.59     0.842  0.764  0.941  0.313
5      27  28.06    8.06     0.836  0.726  0.933  0.328
6      26  15.09    8.81     0.884  0.818  0.948  0.232
7      25  27.98    5.03     0.852  0.755  0.936  0.296
8      23  10.12    3.89     0.932  0.771  0.972  0.136
Figure 8. Comparison of convergence rates.
Figure 7. Extracted contours of a molar (CP = 32). (a) By the proposed method. (b) By B-spline snake.
somes to be competent, with a high fitness value. We implement two genetic B-spline fittings with existing crossover methods to analyze the performance of the proposed crossover. Both existing methods generate the initial population randomly with a uniform distribution but use different crossover methods: "Method A" uses a fixed p_c of 0.75, and "Method B" uses AGA, which determines p_c adaptively. Figure 8 compares the convergence rate of the proposed crossover method with those of the existing methods in terms of the fitness value across chromosome generations. The figure shows that the proposed crossover method results in a better
Figure 10. Manipulation of a tooth. (a) Every tooth can be manipulated. (b) Simulation of tooth extraction.
Figure 9. Wireframe models of tooth and mandible. (a) 3D reconstruction of a tooth. (b) 3D reconstruction of the mandible.
convergence rate than either method A or B. The proposed crossover method preserves the chromosomes with high fitness for fast convergence, and the results show that randomly selecting either side of the crossing site is effective in the improved crossover operation.
Individual segmentation of all the teeth can be used to reconstruct a model of the mandible, as shown in Figures 9 and 10. Every tooth can be separated from the jaw for the simulation of dental treatments.
CONCLUSIONS
In this paper, we presented an improved genetic B-spline curve fitting algorithm for extracting individual smooth tooth contours from CT slices while preventing the contour from becoming twisted. The algorithm yields accurate individual tooth contours by overcoming the problem of a tooth contour expanding out to the boundaries of other teeth during the fitting process. Furthermore, we devised a crossover method that accelerates the convergence rate by both conserving chromosomes with high fitness values and allowing exchange of either side of the crossing site. The test results show that the proposed segmentation algorithm successfully extracts a smooth tooth contour under difficult conditions, such as the presence of similar objects in close vicinity.
This paper also demonstrated the possibility of reconstructing a 3D model in which each tooth is modeled and manipulated separately for the simulation of dental surgery. Such anatomical 3D models can facilitate diagnosis, preoperative planning, and prosthesis design. Together with radiography of the mandible, they provide an accurate mechanical model of an individual tooth and of its root for endodontic and orthodontic operations. Hence the 3D reconstruction of the teeth can be used in virtual-reality-based training for orthodontics students and for preoperative assessment by dental surgeons.
ACKNOWLEDGMENTS
This research was supported by the MIC (Ministry of Information and Communication), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Advancement) (IITA-2006-C1090-0602-0002). The authors are grateful to K. Blankenship and Y. Blankenship for their effort in proofreading this paper.
REFERENCES
1 T. Kondo, S. H. Ong, and K. W. C. Foong, "Tooth segmentation of dental study models using range images", IEEE Trans. Med. Imaging 23, 350–362 (2004).
2 E. H. Said, D. E. M. Nassar, G. Fahmy, and H. H. Ammar, "Teeth segmentation in digitized dental X-ray films using mathematical morphology", IEEE Trans. Inf. Forensics Security 1, 178–189 (2006).
3 J. H. Ryu, H. S. Kim, and K. H. Lee, "Contour based algorithms for generating 3D CAD models from medical images", Int. J. Adv. Manuf. Technol. 24, 112–119 (2004).
4 A. G. Bors, L. Kechagias, and I. Pitas, "Binary morphological shape-based interpolation applied to 3-D tooth reconstruction", IEEE Trans. Med. Imaging 21, 100–108 (2002).
5 G. Bohm, C. Knoll, V. G. Colomer, M. Alcaniz-Raya, and S. Albalat, "Three-dimensional segmentation of bone structures in CT images", Proc. SPIE 3661, 277–286 (1999).
6 S. Liu and W. Ma, "Seed-growing segmentation of 3D surfaces from CT-contour data", Computer-Aided Design 31, 517–536 (1999).
7 M. Kass, A. Witkin, and D. Terzopoulos, "Snakes: Active contour models", Int. J. Comput. Vis. 1, 321–331 (1988).
8 C. Han and W. S. Kerwin, "Detecting objects in image sequences using rule-based control in an active contour model", IEEE Trans. Biomed. Eng. 50, 705–710 (2003).
9 A. Klemencic, S. Kovacic, and F. Pernus, "Automated segmentation of muscle fiber images using active contour models", Cytometry 32, 317–326 (1998).
10 J. Klemencic, V. Valencic, and N. Pecaric, "Deformable contour based algorithm for segmentation of the hippocampus from MRI", Proc. 9th International Conference on Computer Analysis of Images and Patterns, Lect. Notes Comput. Sci. 2124, 298–308 (2001).
11 P. Brigger, J. Hoeg, and M. Unser, "B-spline snakes: A flexible tool for parametric contour detection", IEEE Trans. Image Process. 9, 1484–1496 (2000).
12 S. Menet, P. Saint-Marc, and G. Medioni, "Active contour models: overview, implementation and applications", Proc. IEEE Conf. on Systems, Man and Cybernetics (IEEE Press, Piscataway, NJ, 1990), pp. 194–199.
13 G. Farin, Curves and Surfaces for CAGD, 4th ed. (Academic Press, New York, 1996), pp. 141–168.
14 M. Cerrolaza, W. Annicchiarico, and M. Martinez, "Optimization of 2D boundary element models using B-splines and genetic algorithms", Engineering Analysis with Boundary Elements (Elsevier, Oxford, 2000), Vol. 24, pp. 427–440.
15 W. Annicchiarico and M. Cerrolaza, "Finite elements, genetic algorithms and B-splines: a combined technique for shape optimization", Finite Elem. Anal. Design 33, 125–141 (1999).
16 C. Ooi and P. Liatsis, "Co-evolutionary-based active contour models in tracking of moving obstacles", Proc. International Conference on Advanced Driver Assistance Systems (IEEE, London, 2001), pp. 58–62.
17 L. A. MacEachern and T. Manku, "Genetic algorithms for active contour optimization", Proc. IEEE Int. Symp. on Circuits and Systems (IEEE Press, Piscataway, NJ, 1998), pp. 229–232.
18 M.-S. Dao, F. G. B. De Natale, and A. Massa, "Edge potential functions (EPF) and genetic algorithms (GA) for edge-based matching of visual objects", IEEE Trans. Multimedia 9, 120–135 (2007).
19 L. Ballerini and L. Bocchi, "Multiple genetic snakes for bone segmentation", Proc. EvoWorkshops, Lect. Notes Comput. Sci. 2611, 346–356 (2003).
20 R. C. Gonzalez and R. E. Woods, Digital Image Processing (Addison Wesley, Reading, MA, 1993), pp. 447–455.
21 M. Srinivas and L. M. Patnaik, "Adaptive probabilities of crossover and mutation in genetic algorithms", IEEE Trans. Syst. Man Cybern. 24, 656–667 (1994).
22 OFFIS Institute for Information Technology website, http://www.offis.de/index-e.php/, accessed October 2006.
23 A. P. Zijdenbos, B. M. Dawant, R. A. Margolin, and A. C. Palmer, "Morphometric analysis of white matter lesions in MR images: Method and validation", IEEE Trans. Med. Imaging 13, 716–724 (1994).
Journal of Imaging Science and Technology® 51(4): 337–347, 2007.
© Society for Imaging Science and Technology 2007

Colorimetric Characterization Model for Plasma Display Panel

Seo Young Choi, Ming Ronnier Luo and Peter Andrew Rhodes
Department of Color & Polymer Chemistry, University of Leeds, Leeds, United Kingdom LS2 9JT
E-mail: seoyoung228@googlemail.com

Eun Gi Heo and Im Su Choi
PDP Division, Samsung SDI, 508 Sungsung-Dong, Chonan City, Chungchongnam-Do 330-300, South Korea
Abstract. This paper describes a new device characterization model applicable to plasma display panels (PDPs). PDPs are inherently dissimilar to cathode ray tube and liquid crystal display devices, so new techniques are needed to model their color characteristics. Their intrinsic properties and distinct colorimetric characteristics are first introduced, followed by model development. A large deviation from colorimetric additivity was found, together with a variation in color due to differences in the number of pixels in a color patch (pattern size). Three colorimetric characterization models, which define the relationship between the number of sustain pulses and CIE XYZ values, were successfully derived for the full pattern size: a three-dimensional lookup table (3D-LUT) model, a single-step polynomial model, and a two-step polynomial model comprising three 1D LUTs with a transformation matrix. The single-step and two-step polynomial models having more than 8 terms and the 3D-LUT model produced the most accurate results; however, the single-step polynomial model was selected and extended to other pattern sizes because of its simplicity and good performance. Finally, a comprehensive model was proposed which can predict CIE XYZ at sizes different from that used for the training set. It was found that one combined training set formed from two different pattern sizes could give better results than a single-size training set. © 2007 Society for Imaging Science and Technology.
DOI: 10.2352/J.ImagingSci.Technol.(2007)51:4(337)
INTRODUCTION
Large-size displays such as plasma display panel (PDP), liquid crystal display (LCD), and digital light processing (DLP) TVs are promising candidates for replacing the cathode ray tube (CRT) displays that are currently in widespread use. Plasma display technology has the following characteristics: it is thin and light, has superior video image performance, and is available in large screen sizes from 42 to 100 inches. These enable PDPs to be used inside stores for product promotion and, increasingly, for home theater. One of the desirable properties for large displays is to achieve a "lifelike" appearance under a range of practical viewing conditions as judged in terms of color and image quality. It is therefore important to investigate the colorimetric behavior of a PDP and to build a characterization model based on its intrinsic physical properties. Much research has already been conducted on LCD and CRT technology; however, relatively little work has been performed for plasma displays. 1–5 Only one paper deals with PDP characterization, based on the gain-offset-gamma (GOG) model (previously developed for CRTs) at one pattern size; however, the physical properties of PDPs and the pattern-size effect were not considered in this model. 6 In other words, the model could not be successfully extended to different pattern sizes. The International Electrotechnical Commission (IEC) has issued a standard, IEC 61966-5, which includes methods and parameters for investigating the use of PDPs to display colored images in multimedia applications. 7 It does include the pattern-size effect as a display-area-ratio characteristic, but changes in other characteristics, such as color gamut, due to pattern size were not considered. It also assumes that a PDP's RGB channels are independent. Unfortunately, PDPs typically exhibit significant additivity failure compared to other display technologies. It is therefore essential that this additivity failure be taken into account during the development of any device characterization model.
The simplified structure of a PDP RGB cell is shown in Figure 1. A PDP is composed of two glass plates having a 100 μm gap and filled with a rare gas mixture which can
IS&T member.
Received Jul. 27, 2006; accepted for publication Mar. 1, 2007.
1062-3701/2007/51(4)/337/11/$20.00.
Figure 1. The structure of a PDP's RGB cells.
Choi et al.: Colorimetric characterization model <strong>for</strong> plasma display panel<br />
Figure 2. A 16.7 ms frame includes 8 subfields. The black boxes are the durations of the sustain periods, proportional to 1, 2, 4, …, 128.
include a 500 torr Xe–Ne or Xe–Ne–He mixture which, when excited, results in the Xe atoms emitting vacuum ultraviolet (vuv) radiation at 147 and 173 nm. This vuv radiation then excites the red, green, and blue phosphors located on the rear glass plate. The discharge also emits red-orange visible light due to neon, which causes a subsequent reduction in color purity (see the Colorimetric Characteristics of a PDP section). Each pixel has three individual RGB microdischarge cells. An alternating current (ac) is generated by a dielectric barrier discharge operating in a glow regime to generate plasma in each cell. The ac voltage is approximately rectangular, with a frequency on the order of 100 kHz and a rise time of about 200–300 ns. 8 Different intensity levels are obtained by modulating the number of ac pulses (sustain pulses) in a discharge cell. For CRTs and LCDs, in contrast, intensity levels are controlled according to voltage level. In addition, the luminance output of a PDP depends on the pattern size displayed, even when the RGB input values remain the same. The average level of the input video signal (a product of the RGB input values and the pattern size) increases in proportion to the pattern size, and this is accompanied by an increase in power consumption. As a result, the power consumed for large-area patterns must be regulated by the automatic power control (APC) function, which holds power consumption within a certain upper limit. Consequently, the luminance output, as governed by the APC function, differs according to pattern size.
This paper describes an investigation into the colorimetric characteristics of a PDP and the development of three device characterization models which describe the relationship between the number of sustain pulses of the RGB input and the resultant CIE XYZ values at one particular pattern size. In addition, one of these models was extended to take into account other pattern sizes.
PHYSICAL PROPERTIES OF A PDP
Overall Transfer Process of the Input Video Signal
As mentioned earlier, one unique feature of a PDP is that, for a fixed RGB input, its luminance output varies according to the pattern size displayed. The average level of an input video signal increases not only in proportion to the RGB input but also with the increase in pattern size. Furthermore,
power demand also grows, because a larger input video signal leads to a bigger discharge current in the PDP. To protect the electronic components from damage, it is necessary to impose a limit on power consumption. This is accomplished by controlling the discharge current. There are two methods for controlling this: adjusting the number of RGB sustain pulses and modifying the input level of the video signal. The PDP used in this study adopts the first method. The number of RGB sustain pulses is adjusted through the APC function, which is determined by the average intensity level of the input video signal. To explain the role of the APC function here, it is assumed that each discharge cell can display 256 gray levels. Unlike a CRT, each cell is only capable of being turned on or off (binary). Each gray level is obtained by modulating the number of sustain pulses during one frame (16.7 ms; 60 frames per second = 60 Hz). A frame is divided into eight subfields, having weight ratios of 1, 2, 4, …, 128 (Figure 2). The function of a subfield is to modulate light output over time. This is accomplished by dividing each video frame into shorter time periods where each cell is
either turned on or off. Each subfield has a sustain period (see black box in Fig. 2), whose duration is proportional to its weight ratio, and an address period (see white box in Fig. 2), whose duration is the same for all 8 subfields. The address period is used to switch a given cell on or off. An 8-bit binary coding is used to obtain 256 gray levels, since 256 levels can be achieved by assigning on/off states to any combination of the eight subfields. In practice, the number of sustain pulses is determined by the sum of the products of the sustain pulse limit and the weight ratios of the subfields that are turned on. This
calculation is shown in Table I. The sustain pulse limit, as<br />
mentioned previously, safeguards the display’s electronics. It<br />
Table I. An example of a subfield configuration and the calculation of the number of sustain pulses for a color patch with input value 5.

Subfield             1      2      3      4      5      6      7      8
Weight ratio       1/255  2/255  4/255  8/255 16/255 32/255 64/255 128/255
(decimal)          0.004  0.008  0.016  0.031  0.063  0.125  0.251  0.502
Configuration(a)     1      0      1      0      0      0      0      0

The sustain pulse limit, 2600, is given in the APC table defined by the manufacturer.
Calculation: 2600 × 0.004 + 2600 × 0.016 = 52, the practical number of sustain pulses assigned to the RGB cells.
(a) 8-bit binary coding: 0 is off and 1 is on; the weight ratios sum to 1. This configuration corresponds to input value 5.
338 J. Imaging Sci. Technol. 51(4)/Jul.-Aug. 2007
Figure 3. Flowchart explaining the transformation of the input video signal into the emission of light in a PDP.
Figure 4. Plot of CIE X values versus the number of red sustain pulses at 4 pattern sizes. Points (1), (2), and (3) correspond to the maximum number of sustain pulses at 100%, 60%, and 30% pattern size, respectively.
Table II. Maximum CIE XYZ for the RGB primaries and the range of the number of sustain pulses at 4%, 30%, 60%, and 100% pattern size.

Color   CIE XYZ      4%      30%     60%     100%
Red     Max X       498.1   454.5   330.5   230.4
Green   Max Y       593.9   546.2   395.7   284.7
Blue    Max Z      1311.6  1175.6   809.9   551.7
Sustain pulse range  0–2594  0–2594  0–1826  0–1260
is determined from a manufacturer-defined APC table according to the average input video signal. Combinations of the eight subfields yield 256 gray levels corresponding to the supplied input values; however, the actual number of sustain pulses (and hence the brightness of each level) is controlled by the sustain pulse limit.
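The calculation in Table I can be sketched in a few lines of code. This is an illustrative reconstruction, assuming the pulse limit of 2600 and the rounded weight ratios quoted in Table I (the function name is ours, not the paper's):

```python
# Rounded subfield weight ratios from Table I (2**k / 255 for k = 0..7).
WEIGHT_RATIOS = [0.004, 0.008, 0.016, 0.031, 0.063, 0.125, 0.251, 0.502]

def sustain_pulses(input_value, pulse_limit=2600):
    """Number of sustain pulses for an 8-bit input value.

    Subfield k is 'on' when bit k of the input value is set; the pulse
    count is the pulse limit times the sum of the active weight ratios.
    """
    ratio = sum(w for k, w in enumerate(WEIGHT_RATIOS) if input_value >> k & 1)
    return round(pulse_limit * ratio)
```

For input value 5 (binary 00000101, subfields 1 and 3 on), this reproduces the 52 pulses of Table I; input 255 turns on all eight subfields and yields the full limit of 2600.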
The overall transfer process from input video signal to output stimulus can be expressed in terms of the steps shown in Figure 3, which includes an example for a full white pattern. The RGB input values of the video signal are transformed into the number of RGB sustain pulses by the PDP's logic board. These are calculated from the subfield configuration corresponding to the RGB input value and the sustain pulse limit assigned by the APC table. The number of sustain pulses equals the number of plasma discharges occurring in each cell. A succession of discharge pulses occurs between the two sustain electrodes inside the front plate of the RGB cells according to the number of RGB sustain pulses assigned. The rare gas mixture (Xe–Ne or Xe–Ne–He) emits vacuum ultraviolet (vuv) photons at 147 and 173 nm during discharge in the RGB cells, and their intensities are governed by the number of RGB sustain pulses. Xenon is used as the vuv emitter, and neon acts as a buffer gas which lowers the breakdown voltage, i.e., the minimum voltage needed to initiate the plasma. The vuv photons are converted into visible photons by the phosphor materials deposited on the inner walls of the RGB discharge cells. Based on an understanding of this process, the final characterization model was developed between CIE XYZ values and the number of RGB sustain pulses (rather than simply the RGB input values).
Pattern Size Effect
As mentioned in the Overall Transfer Process of the Input Video Signal section, brightness varies according to pattern size. Figure 4 depicts CIE X values plotted against the normalized number of R sustain pulses. In the figure, there are four sets of data points corresponding to pattern sizes of 4%, 30%, 60%, and 100%, respectively. Each set includes 26 equal
steps of the red channel. The decrease in maximum X value with increasing size shown in Fig. 4 can be explained by the increased power demand of larger pattern sizes: to limit the total power consumption of a PDP, the APC function must lower the number of sustain pulses for larger color patches. Table II also illustrates the size effect on color patches at 4% and 30% of the total screen area. It can be seen that they have a higher maximum range of the number of sustain pulses, which results in larger CIE XYZ values than those for the 60% and 100% pattern sizes. The highest number of sustain pulses for the G primary color at 100% pattern size is 1260, while at the 4% pattern size it is 2594, even though the input RGB values are the same (0, 255, 0). Hence, the resultant maximum Y value for the 4% pattern size is higher than for the 100% pattern size. This implies that the number of sustain pulses should be used as the input color specification, contrary to conventional approaches to display characterization, which only consider the input digital RGB values.
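As a rough illustration of this APC behavior, the measured pulse limits in Table II can be interpolated to estimate the limit at an intermediate pattern size. The linear interpolation below is our assumption for illustration only, not the manufacturer's actual APC table:

```python
import numpy as np

# Maximum sustain pulses vs. pattern size (percent of screen), from Table II.
SIZES = np.array([4.0, 30.0, 60.0, 100.0])
LIMITS = np.array([2594.0, 2594.0, 1826.0, 1260.0])

def pulse_limit(size_pct):
    """Estimated sustain pulse limit for a given pattern size (percent)."""
    return float(np.interp(size_pct, SIZES, LIMITS))
```

At the measured sizes this returns the Table II values exactly; between them it gives a plausible, monotonically decreasing estimate (e.g. 1543 pulses at 80%).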
EXPERIMENTAL METHODS<br />
The colorimetric characterization <strong>of</strong> a 42-inch Samsung<br />
high-definition PDP (model PPM42H3) was evaluated. Its<br />
Table III. (a) Color patches for the three characterization models at 100% pattern size. (b) Color patches used for developing single-step polynomial models at other pattern sizes.

(a) Models at 100% pattern size       Training sets                    Testing set
3D LUT model                          6-level 3D LUT                   115-color test set
Single-step polynomial model          6-, 5-, 4-, and 3-level 3D LUTs  115-color test set
Two-step polynomial model:
  three 1D LUTs                       52 steps of RGB                  115-color test set
  transformation matrix               6-, 5-, 4-, and 3-level 3D LUTs  115-color test set

(b) 4%, 30%, and 60% pattern sizes
Training sets: 5-, 4-, and 3-level 3D LUTs at each pattern size
Test sets: three 27-color test sets at the three pattern sizes
pixel resolution is 1024 × 768 with an aspect ratio of 16:9, and it is capable of addressing 512 intensity levels per channel, although only 256 were used here. Its pixel pitch is 0.912 mm (H) × 0.693 mm (V), so the total display area is 933.89 × 532.22 mm². Color patches were displayed in the middle of the screen and generated by a computer-controlled graphics card equipped with a digital visual interface (DVI) output. This allows the PDP's logic board to receive a digital signal directly from the computer.
Contrary to typical display characterization, the number of sustain pulses was used as the input color specification in this study, as explained previously in the Pattern Size Effect section. Color measurements were taken using a Minolta CS-1000 tele-spectroradiometer (TSR) in a dark room. The repeatability of both the TSR and the PDP was evaluated using 15 colors measured twice over a two-month period. The median and maximum ΔE*ab were 0.38 and 1.18 during this time. This performance is considered satisfactory. Measurement
patches consisted of rectangles which were 100%, 80%, 60%, 45%, 30%, or 4% of the display area. The background was set to black (except for the 100% case).
Three characterization models were first developed for the 100% pattern size. Table III(a) describes the data set generated for the three types of characterization models: 3D LUT, single-step polynomial, and two-step polynomial
model. For the 3D LUT model, 6 levels were used. For the<br />
other two models, 6-, 5-, 4-, and 3-level 3D LUTs were compared<br />
in order to determine which training set gave the optimum<br />
performance with the fewest measurements.
The 6-level 3D LUT was first generated using 1, 15, 43, 66, 105, and 255 digital input values for each of the RGB channels. These values were empirically determined to give approximately uniform coverage of the CIE XYZ destination
color space.
Figure 5. Plot of the 216 colors of the 6-level 3D LUT in the (a) XY, (b) YZ, and (c) a*b* planes.
The distribution of the 6-level training set is shown as XY and YZ projections in Figures 5(a) and 5(b), respectively. In addition, an a*b* diagram depicting the whole of the 6-level training set can be seen in Fig. 5(c).
Among the 6 digital input values, 1, 15, 43, 105, and 255<br />
were used to make the 5-level 3D LUT; 1, 15, 66, and 255<br />
were used for the 4-level 3D LUT; and 1, 43, and 255 for the 3-level 3D LUT.
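The 3D LUT model evaluates colors between measured grid points by interpolation. A minimal sketch of trilinear interpolation on a non-uniformly spaced grid follows; the grid axes are the digital levels above, but the table entries here are a synthetic stand-in (in the paper each entry would be a measured CIE XYZ triple):

```python
import numpy as np

LEVELS = np.array([1.0, 15.0, 43.0, 66.0, 105.0, 255.0])  # 6-level grid axes

# Synthetic stand-in: entry [i, j, k] would hold measured XYZ for grid point
# (R_i, G_j, B_k); a linear function is used here so results are exact.
TABLE = np.array([[[[r + g + b] * 3 for b in LEVELS]
                   for g in LEVELS] for r in LEVELS])

def lut_lookup(rgb):
    """Trilinear interpolation in a non-uniformly spaced 3D LUT."""
    idx, frac = [], []
    for axis in range(3):
        i = int(np.clip(np.searchsorted(LEVELS, rgb[axis]) - 1,
                        0, len(LEVELS) - 2))
        idx.append(i)
        frac.append((rgb[axis] - LEVELS[i]) / (LEVELS[i + 1] - LEVELS[i]))
    out = np.zeros(3)
    for corner in range(8):          # the 8 corners of the enclosing cell
        w, pos = 1.0, []
        for a in range(3):
            bit = corner >> a & 1
            pos.append(idx[a] + bit)
            w *= frac[a] if bit else 1.0 - frac[a]
        out += w * TABLE[pos[0], pos[1], pos[2]]
    return out
```

With denser grids (more levels) the cell sizes shrink and the interpolation error drops, which is why the 6-level LUT outperforms the 3-level one.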
Another model, called the "two-step polynomial" model, includes three 1D LUTs between the normalized RGB luminance values and the number of RGB sustain pulses. This is followed by a transformation from normalized RGB luminance values to XYZ values via a transformation matrix. The three 1D LUTs were created, one for each of the RGB channels, from 52 equal steps in RGB space. Linear interpolation was then used to predict the normalized luminance values between the data points. Six transformation matrices were used. One of them was the primary matrix obtained by measurements of the RGB primary colors. This matrix can be used for the ideal case where there is little interaction among the RGB channels and little unwanted emission. Additionally, five transformation matrices for nonideal cases were derived using polynomial regression between the measured XYZ values for the 6-, 5-, 4-, and 3-level 3D LUTs and their corresponding normalized RGB luminance values. The cross terms RG, RB, GB, and RGB were included in the matrix to compensate for cross-channel interaction. The square terms R², G², and B² were also included, although these terms have no particular physical meaning. These terms were chosen because they were included in some previous display characterization studies. 1,9
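The two steps above can be sketched as follows. The 1D LUT curves are illustrative (linear), and the matrix entries are placeholders rather than values measured in the paper; only the structure (LUT lookup, then matrix multiply) mirrors the model:

```python
import numpy as np

# Step 1: three 1D LUTs mapping sustain-pulse count -> normalized channel
# luminance.  The paper builds these from 52 measured steps per channel;
# linear curves stand in for the measurements here.
PULSES = np.linspace(0.0, 2594.0, 52)
LUTS = {ch: PULSES / PULSES[-1] for ch in "rgb"}

def normalized_rgb(pr, pg, pb):
    return np.array([np.interp(p, PULSES, LUTS[c])
                     for p, c in zip((pr, pg, pb), "rgb")])

# Step 2: a 3x4 transformation matrix (primaries plus an offset column).
# Placeholder numbers, not measured values from the paper.
M = np.array([[230.4,  60.0,  30.0, 0.0],
              [120.0, 284.7,  40.0, 0.0],
              [ 10.0,  50.0, 551.7, 0.0]])

def to_xyz(pr, pg, pb):
    """Two-step model: 1D LUTs, then the transformation matrix."""
    r, g, b = normalized_rgb(pr, pg, pb)
    return M @ np.array([r, g, b, 1.0])
```

Driving only the red channel at its maximum pulse count returns the first column of the matrix, which is how the primary matrix is populated from measurements in the ideal, additive case.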
The performance of the three models was evaluated using three test sets. The first set includes 4 × 4 × 4 bright color patches that were chosen to correspond to L* values of 45, 85, 95, and 99 for each of the RGB channels. Two additional sets, one of 24 colors (L* < 40) and one of 27 colors composed of three L* values (20, 60, and 90), were also added to verify model performance. The three sets were merged to form a combined test set of 115 colors. The color difference ΔE*ab between the predicted and measured values for these test colors was calculated to evaluate the accuracy of the characterization models. All measured tristimulus values were corrected by subtracting those of the black level.
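Model accuracy is reported as ΔE*ab. A compact helper using the standard CIE 1976 formulas (the reference white passed in should be the measured peak white of the display):

```python
import numpy as np

def xyz_to_lab(xyz, white):
    """CIE 1976 L*a*b* from tristimulus values, relative to a reference white."""
    def f(t):
        return np.where(t > (6.0 / 29.0) ** 3,
                        np.cbrt(t),
                        t / (3.0 * (6.0 / 29.0) ** 2) + 4.0 / 29.0)
    fx, fy, fz = f(np.asarray(xyz, float) / np.asarray(white, float))
    return np.array([116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)])

def delta_e_ab(xyz_pred, xyz_meas, white):
    """The paper's accuracy metric: Euclidean distance in L*a*b*."""
    return float(np.linalg.norm(xyz_to_lab(xyz_pred, white)
                                - xyz_to_lab(xyz_meas, white)))
```

The reference white maps to (L*, a*, b*) = (100, 0, 0), and a perfect prediction gives ΔE*ab = 0.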
A subsequent experiment was carried out to investigate model performance for different pattern sizes. Only the single-step polynomial model was further developed in this experiment. Three training sets at each pattern size were used to generate the 3D LUTs [see Table III(b)]. Performance was then evaluated by measuring 27 test colors consisting of combinations of three input levels producing L* values of 20, 60, and 90 for each channel.
Table IV gives the terms used to develop the single-step model and the transformation matrices of the two-step model. The polynomial coefficients were computed from experimental data consisting of 216, 125, 64, or 27 colors measured from the 6-, 5-, 4-, and 3-level 3D LUTs, respectively. Each sample comprises a set of RGB sustain-pulse counts and the corresponding XYZ values. In the two-step model's transformation matrix, a polynomial relationship was determined between the normalized RGB luminances and XYZ values. All calculations were executed using Matlab.
Table IV. Detailed description of the terms used in the single- and two-step polynomial models.

Matrix of polynomial coefficients   Independent variables
3 × 3    R, G, B
3 × 4    R, G, B, 1
3 × 5    R, G, B, RGB, 1
3 × 8    R, G, B, RG, RB, GB, RGB, 1
3 × 11   R, G, B, R², G², B², RG, RB, GB, RGB, 1
3 × 20   3 × 11 plus R³, G³, B³, R²G, R²B, G²R, G²B, B²R, B²G
3 × 35   3 × 20 plus R³G, R³B, G³R, G³B, B³R, B³G, R²GB, RG²B, RGB², R⁴, G⁴, B⁴, R²G², R²B², G²B²
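Fitting such a matrix is a linear least-squares problem, since the model is linear in its coefficients. A sketch for the 3 × 11 term set of Table IV, using synthetic data in the test (the function names are ours):

```python
import numpy as np

def poly_terms(rgb):
    """The 11 independent variables of the 3 x 11 model in Table IV."""
    r, g, b = rgb
    return np.array([r, g, b, r*r, g*g, b*b, r*g, r*b, g*b, r*g*b, 1.0])

def fit_matrix(train_rgb, train_xyz):
    """Least-squares fit of the 3 x 11 coefficient matrix from training
    colors (normalized inputs) and their measured XYZ values."""
    A = np.array([poly_terms(c) for c in train_rgb])       # N x 11
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(train_xyz), rcond=None)
    return coeffs.T                                        # 3 x 11

def predict(matrix, rgb):
    return matrix @ poly_terms(rgb)
```

A 27-color training grid (a 3-level 3D LUT) already provides more samples than the 11 unknowns per output channel, which is why even the smallest training set in Table III can drive this model.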
COLORIMETRIC CHARACTERISTICS OF A PDP<br />
Spectral Characteristics<br />
The spectral power distributions of the maximum-intensity RGB primaries are shown in Figure 6(a). Two kinds of green phosphor were used, for boosting luminance and stabilizing the discharge: Zn₂SiO₄:Mn with a broad band at 526 nm and YBO₃:Tb with a sharp peak at 544 nm, respectively. To improve red saturation, two types of red phosphor were mixed: (Y,Gd)BO₃:Eu with three main peaks at 593, 611, and 628 nm, together with Y(V,P)O₄:Eu, which has a sharp peak at 620 nm. The former red phosphor appears red-orange due to its main 593 nm emission peak, and it also possesses the highest conversion efficiency of vuv radiation into red visible light. On the other hand, Y(V,P)O₄:Eu, having a sharp main peak at 620 nm, offers good red color purity. BaMgAl₁₀O₁₇:Eu was employed as the blue phosphor. It generates high luminance from a vuv excitation source but degrades under the harsh conditions of high-energy vuv radiation; consequently, it is a key factor limiting display longevity. For these reasons, the spectral properties of a PDP are more complex than those of other kinds of display. Fig. 6(b) is an enlargement of Fig. 6(a) and illustrates the red-orange emission of Ne gas. The intensity of the red-orange emission due to Ne gas is quite small compared with the other main peaks caused by the phosphor materials; hence the maximum radiance values used for Figs. 6(a) and 6(b) are different. This Ne emission causes a decrease in color purity, as mentioned earlier. The characteristic red-orange Ne gas emission at 585.2 nm results from atomic electronic transitions from the higher-energy 2p quantum state to the lower-lying 1s energy level. 10
Temporal Stability<br />
A PDP needs time to reach a steady state for accurate measurement. Accordingly, temporal stability was evaluated over 60 minutes using four pattern sizes, each displaying a white color, as shown in Figure 7. The white color at the 4% size, having the highest number of sustain pulses, shows the
Figure 6. (a) Spectral power distributions of the normalized RGB primaries. (b) Visible emission of neon gas between 580 and 680 nm.
Figure 7. Plot of relative luminance values over time for white at 4%, 30%, 60%, and 100% pattern size.
Figure 8. (a) White color patch at 30% pattern size with a black background. (b) White color patch at 30% pattern size with a white background.
largest gap between its initial state and a steady state. For the 4% and 30% color patches, the luminance decreases significantly at the beginning. Although the time to reach a steady state is similar for all four pattern sizes (around 40 min), the decrease in luminance is dependent on pattern size. A decrease in pattern size leads to an increase in the number of RGB sustain pulses, accompanied by an increase in the temperature of the RGB cells. If a small color patch is displayed on a PDP, the initial temperature is higher than that for a larger patch; however, this is offset by a greater rate of temperature change, so the time to reach a steady state does not depend on pattern size. Conversely, the decrease in luminance ratio is inversely proportional to pattern size due to the shorter time needed to reach a stable RGB cell temperature.
Spatial Independence<br />
The two color patches shown in Figures 8(a) and 8(b) have<br />
the same center square color and different backgrounds.<br />
Spatial independence measures how much the central white color is affected by changes in the background color. 11 The central white colors of Figs. 8(a) and 8(b) have quite different CIE XYZ values, (445, 444, 550) and (160, 166, 186), respectively, because the colorimetric characteristics of a PDP are dictated by the number of RGB sustain pulses as determined by the APC, rather than directly by the RGB input values. For example, in order to produce the same light output for the white patches in Figs. 8(a) and 8(b), different numbers of sustain pulses would be needed.
Color Gamut<br />
A color gamut is the range of colors achievable on a given color reproduction medium under a specified set of viewing conditions. Figure 9 shows the color gamut under dark viewing conditions defined by the primary and secondary colors of the PDP. The ranges of tristimulus values displayed differ according to pattern size (as explained in the Pattern Size Effect section). The color gamuts of the four pattern sizes are plotted in a CIELAB a*b* diagram [Fig. 9(a)]. In addition, the color gamuts of the four pattern sizes at a hue angle of 137° are compared using a CIELAB C*L* diagram [Fig. 9(b)]. The CIE L*a*b* color coordinates were calculated based on the peak white at 4% pattern size, representing an L* of 100 (848.4 cd/m²). Therefore, the C*L* coordinates of the white color patches for the 30%, 60%, and 100% pattern sizes show lower values than those at the 4% pattern size. As pattern size decreases, the range of colors achievable on a PDP becomes larger.
Figure 9. (a) Color gamuts for the 4%, 30%, 60%, and 100% pattern sizes plotted in an a*b* diagram. (b) Color gamuts at a hue angle of 137° for the 4%, 30%, 60%, and 100% pattern sizes plotted in a C*L* diagram.
Tone Reproduction Curve
The tone reproduction curve depicts the relationship between RGB input values and their resultant luminance values. The RGB luminances of a CRT are controlled by cathode
voltages. For PDPs, on the other hand, the number <strong>of</strong><br />
RGB sustain pulses controls luminance. Figures 10(a) and 10(b) illustrate the intrinsic properties of the PDP studied: they plot normalized Y values, with the white luminance at the 4% size set to 1, against the normalized number of sustain pulses and the input values, respectively.
It can be clearly seen in Fig. 10(a) that an increase in the number of RGB sustain pulses leads to an increase in RGB luminance. Furthermore, a larger patch size has a smaller range of sustain pulse numbers, because a larger pattern size is constrained to a lower maximum number of sustain pulses in order to avoid exceeding the power consumption limitations. The points indicated by the arrows correspond to 95% relative luminance with respect to the maximum white luminance value at each pattern size.
The intrinsic TRC of a PDP [Figs. 10(a) and 10(b)] should be modified so that the slope of the low luminance range is much smaller than that at high luminances. Figure 10(c) shows the result after gamma is modified by adjusting the number of sustain pulses. The shape of the TRC after gamma modification is the same regardless of pattern size. The usable range of the number of sustain pulses, however, depends on pattern size. For example, to produce a white patch on this PDP using an input value of 255, either 466, 766, 1376, or 2594 RGB sustain pulses are assigned, respectively, for the 100%, 60%, 30%, and 4% pattern sizes. The range of sustain pulses available for white at 100% pattern size, 0 to 466, is quantized to 256 levels to make white luminance follow a power function of approximately 2.2 (the gamma value).
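The gamma modification just described can be sketched numerically. This is a minimal illustration assuming a simple power-law quantization of the pulse range; the panel's actual APC firmware mapping is not published:

```python
def pulses_for_input(d, max_pulses, gamma=2.2):
    """Map an 8-bit input value d to a sustain-pulse count so that
    luminance, which is proportional to pulse count, follows an
    approximately gamma-2.2 power function."""
    return round(max_pulses * (d / 255.0) ** gamma)

# Maximum pulse counts for white at each pattern size, from the text.
MAX_PULSES = {"100%": 466, "60%": 766, "30%": 1376, "4%": 2594}
```

For the 100% pattern size this quantizes the 0 to 466 pulse range into 256 levels, with `pulses_for_input(255, 466)` giving 466.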
Additivity<br />
The channel additivity was evaluated for white at four pattern sizes. The results are given in Table V in terms of percentage change in tristimulus values and color difference ΔE*ab. The tristimulus values of the RGB patches having the same number of sustain pulses as the peak white patch were measured to evaluate additivity for each pattern size. A substantial difference between the tristimulus values for white and the sum of the red, green, and blue channels was found. The latter is larger than the former by about 15%, which corresponds to approximately 6 ΔE*ab units. The available power to a cell drops as other cells become active, leading to a reduction in brightness. This means that, for example, in terms of luminance, R + G + B > white. To counteract the problem of deviation from additivity, several matrix coefficients, including RGB channel cross terms (RG, RB, GB, and RGB), were incorporated into the transformation matrices of the two-step model. In addition, a 3D LUT model was implemented in which many measured data points were included so as to compensate for the inherent additivity failure of a PDP.
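Given measured tristimulus values for the separate R, G, and B patches and for white, the additivity failure reported in Table V can be computed as a per-component percentage. A small sketch (the function name is ours):

```python
def additivity_failure(xyz_r, xyz_g, xyz_b, xyz_w):
    """Percentage by which the sum of the individually measured channels
    exceeds the measured white, for each tristimulus component."""
    return tuple(100.0 * ((r + g + b) - w) / w
                 for r, g, b, w in zip(xyz_r, xyz_g, xyz_b, xyz_w))
```

A result of roughly +15% per component, as found here, corresponds to about 6 ΔE*ab units.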
COLORIMETRIC CHARACTERIZATION MODEL FOR<br />
A PDP<br />
Testing the Models’ Per<strong>for</strong>mance at 100% Pattern Size<br />
As introduced in the third section, three types of characterization models were developed: 3D-LUT, single-step polynomial, and two-step polynomial. These were tested using the 115-color test set. All of the models developed here are based on a 100% pattern size. The results are summarized in Table VI in terms of mean and 95th percentile ΔE*ab units.
It can be seen that the 3D-LUT model using tetrahedral interpolation [12] gave a reasonable prediction of the test data, with a mean and a 95th percentile of 1.3 and 2.5 ΔE*ab units. The two-step model using the primary matrix, whose coefficients were based on the measurement data, gave quite poor performance, with a mean difference of 4.3.
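The tetrahedral interpolation used by the 3D-LUT model can be sketched for a single LUT cell as follows. This is the standard six-tetrahedra decomposition of a cube, not code from the paper, and the lattice construction and cell lookup are omitted:

```python
import numpy as np

def tetra_interp(frac, corners):
    """Tetrahedral interpolation inside one LUT cell.

    frac: fractional position (dx, dy, dz), each in [0, 1].
    corners: dict mapping each corner index (i, j, k) in {0,1}^3 to an
    output vector (e.g. measured XYZ)."""
    # Walk from (0,0,0) to (1,1,1), taking axes in order of decreasing
    # fractional coordinate; this path selects one of the six tetrahedra
    # that partition the cube.
    order = sorted(range(3), key=lambda a: frac[a], reverse=True)
    idx = (0, 0, 0)
    value = np.asarray(corners[idx], dtype=float).copy()
    for a in order:
        nxt = list(idx)
        nxt[a] = 1
        nxt = tuple(nxt)
        value += frac[a] * (np.asarray(corners[nxt], float)
                            - np.asarray(corners[idx], float))
        idx = nxt
    return value
```

For corner values that are linear in the cell coordinates the interpolation is exact, which is a handy sanity check.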
Comparing different single-step polynomial models, there is a trend that the higher order polynomial models performed better than the lower order ones. However, this is only true for the models developed using more training samples.
J. Imaging Sci. Technol. 51(4)/Jul.-Aug. 2007 343
Choi et al.: Colorimetric characterization model <strong>for</strong> plasma display panel<br />
For those models developed using the 3-level and 4-level training samples, the higher order polynomial models did not exhibit more accurate prediction than the lower order models. This could be caused by over-fitting the measurement noise when using higher order polynomial models based on a small number of training samples. Overall, the 3×11 polynomial model developed using the 4-level training data set (which included 64 colors) was found to be acceptable for industrial applications. Using the 3×20 model can lead to further improvements in the modeling performance.
In comparing the single- and two-step models, the single-step model performed slightly better than the two-step model, except for the 3×3 model. This implies that single-step polynomial models of higher order already account for the cross-talk between different channels in PDPs, so there is no need to include a 1D-LUT normalization.
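A single-step polynomial model of this kind can be fitted by ordinary least squares. The sketch below assumes one plausible choice of the 11 terms and normalized RGB sustain-pulse inputs; the paper does not list the exact term set:

```python
import numpy as np

def poly11_terms(rgb):
    """One plausible 11-term expansion of normalized (R, G, B):
    linear, cross-product, squared, and constant terms (assumed)."""
    r, g, b = rgb
    return [r, g, b, r * g, g * b, r * b, r * r, g * g, b * b,
            r * g * b, 1.0]

def fit_single_step(rgb_train, xyz_train):
    """Least-squares fit of a 3x11 matrix M such that XYZ ~ M @ terms."""
    A = np.array([poly11_terms(p) for p in rgb_train])        # N x 11
    M, *_ = np.linalg.lstsq(A, np.asarray(xyz_train, float), rcond=None)
    return M.T                                                # 3 x 11

def predict_xyz(M, rgb):
    return M @ np.array(poly11_terms(rgb))
```

With a 4-level training grid (64 colors) the design matrix is well conditioned, consistent with the finding above that 64 samples suffice.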
Figure 11 shows the performance of different polynomials for a single-step model in terms of mean ΔE*ab using the training and testing data sets. It can be seen that the models predicted more accurately as the number of terms increased, up to the 3×11 and 3×20 polynomial models. The 3×35 model fits the training data set best; however, it performed poorly for the testing data set because it models the noise in the training data set.
In real applications, both the forward and reverse characterization models are used, i.e., from device signal to XYZ and vice versa. However, not all models are analytically invertible, and so reverse models were developed having the same structure as the forward model. The numerical reversibility of the single-step model was also tested. The testing procedure is shown in Figure 12 and does not require any color measurement. Here the 115-color test set, defined in terms of XYZ, was again used to first predict RGB sustain pulses using the reverse model and then further predict the corresponding XYZ via the forward model. Finally, the color difference was calculated between the target XYZ and predicted XYZ values. The results for each combination of forward and reverse polynomial models developed with the 4-level training data are given in Table VII. It can be seen that the 3×11 polynomial model gives acceptable performance (its mean and 95th percentile are 0.3 and 0.8 ΔE*ab, respectively). This can be further improved by using the 3×20 model. Both models outperformed the other models studied.
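The round-trip test of Figure 12 is straightforward to express in code. In this sketch `forward` and `reverse` are the fitted model functions, and a plain Euclidean distance in XYZ stands in for the ΔE*ab computation used in the paper:

```python
import numpy as np

def round_trip_error(forward, reverse, xyz_targets):
    """Reversibility check: target XYZ -> reverse model -> RGB sustain
    pulses -> forward model -> predicted XYZ, then compare with the
    target. No color measurement is needed."""
    errs = []
    for xyz in xyz_targets:
        rgb = reverse(xyz)
        xyz_back = forward(rgb)
        errs.append(np.linalg.norm(np.asarray(xyz_back, float)
                                   - np.asarray(xyz, float)))
    return float(np.mean(errs)), float(np.percentile(errs, 95))
```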
Figure 10. (a) The relationship between normalized number of sustain pulses and normalized white luminance for 4, 30, 60, and 100% pattern sizes before the modification of gamma. (b) The relationship between normalized input values and normalized white luminance for four pattern sizes before the modification of gamma. (c) The relationship between normalized input values and normalized white luminance for four pattern sizes after modifying gamma.
Testing the Models’ Performance at Different Pattern Sizes
Using the same approach as for the 100% pattern size (previous section), different single-step polynomial models were developed using 3-, 4-, and 5-level 3D-LUT training data for each of the 4%, 30%, and 60% pattern sizes. Very similar performances were found, and so only the results from the 30% pattern size are reported in Table VIII, in terms of mean and 95th percentile ΔE*ab values. The results showed that the 3×11 and 3×20 polynomial models using 4- or 5-level training data gave a reasonable prediction. These results are very similar to those found at 100% pattern size (see Table VI).
Developing a Single Characterization Model<br />
As mentioned in the Pattern size effect section, light output is proportional to the number of sustain pulses, and the range of these is regulated by the APC according to pattern size. Hence the characterization models developed earlier are only applicable to a single pattern size. A new method is developed here which aims to predict the colors displayed at different pattern sizes. In order to make a single model which can predict CIE XYZ at pattern sizes other than that used for the training set, it is necessary to select a training set covering the whole range of sustain pulses used for the test set. Suppose, for example, that the model needs to predict CIE XYZ values of several colors at 80% and 45% sizes. There are two approaches to the selection of a training set for this purpose. First, a set having a smaller size than 45% can be used, because this covers a higher range of sustain pulses than those available for the 80% and 45% sizes. Second, two
Table V. Tristimulus additivity failure and corresponding color difference for white at 4, 30, 60, and 100% pattern size.

Pattern size | RGB IVs | Y (cd/m²) | APC level | No. of sustain pulses | ΔX | ΔY | ΔZ | ΔE*ab
100% | 255 | 166.1 | 255 |  466 | 15.1% | 13.9% | 16.4% | 5.6
 60% | 255 | 263.2 | 219 |  766 | 16.7% | 15.2% | 19.2% | 6.5
 30% | 255 | 443.6 | 146 | 1376 | 17.6% | 17.8% | 19.9% | 6.6
  4% | 255 | 848.0 |   0 | 2594 | 16.7% | 16.6% | 21.1% | 6.6
Table VI. Testing the performance in terms of ΔE*ab of the characterization models using the 115-color test set. The models were developed based on 6-, 5-, 4-, and 3-level training sets.

Training set | 3D LUT | Single-step polynomial model: 3×5, 3×8, 3×11, 3×20, 3×35 | Two-step polynomial model: Primary matrix(a), 3×3, 3×4, 3×8, 3×11, 3×20, 3×35
6-level Mean 1.3 3.8 3.1 1.5 1.2 1.2 3.2 3.5 1.7 1.5 1.3 1.4
        95th 2.5 9.4 11.7 3.8 2.7 2.8 8.3 8.3 4.5 3.4 2.9 3.0
5-level Mean 3.5 2.8 1.4 1.2 1.1 4.3 3.1 3.2 1.6 1.5 1.3 1.3
        95th 8.3 9.4 3.3 2.7 2.3 7.4 7.4 4.1 3.2 2.7 2.5
4-level Mean 3.4 2.5 1.5 1.2 3.9 6.9 3.0 3.0 1.6 1.5 1.3 3.8
        95th 7.8 7.8 3.7 2.8 11.7 6.6 6.8 3.6 3.4 3.0 10.3
3-level Mean 3.2 2.3 1.6 7.0 69.5 3.0 3.1 1.5 1.6 3.6 17.9
        95th 8.0 7.2 3.5 22.9 305.5 6.8 6.7 3.7 3.6 11.5 55.2
(a) The primary matrix was obtained from RGB primary colors.
Table VII. Reversibility result of polynomials for the 4-level training set in terms of ΔE*ab.

        3×5   3×8   3×11  3×20  3×35
Mean    2.3   1.6   0.3   0.2   1.7
95th    5.9   4.9   0.8   0.5   4.8
Figure 11. A comparison of average ΔE*ab values against the number of terms used in the single-step polynomial model, for the test and training sets.
training sets, one having a smaller pattern size than 45% and another having 100% size, can be combined to make a new training set. The reasoning behind the second method is to improve model accuracy. If two color patches at different pattern sizes but with the same RGB sustain pulses are measured, a subtle difference in their XYZ values can be found. One practical example is that the luminance values for the 30%, 60%, and 100% pattern sizes using the same number of sustain pulses (408) are 111, 107, and 104, respectively. This is because the available power to a cell differs slightly with the number of activated cells at different pattern sizes. Although the first training set may be sufficient, the polynomial coefficients computed from the second training set can be expected to take into account this small color difference due to pattern size. In real applications, the number of sustain pulses used to display complex images typically corresponds to those associated with 40–50% pattern sizes. Therefore, we generated the first training set as a 4-level 3D LUT for the 30% pattern size. In addition, a second training set was produced by combining two 4-level 3D LUTs of 100% and 30% pattern sizes. These two 4-level 3D LUTs were composed of different combinations of RGB sustain pulses. The test data included three 27-color test sets at 80%, 60%, and 45% sizes. Tables IX(a) and IX(b) summarize the results from the first and second training sets. The results from the second training set show that the polynomials with 11, 20, and 35 terms performed well and gave similar predictive accuracy for the three pattern sizes in Table IX(b). Table IX(a) summarizes the results for the models developed using only the 30% pattern size. The results in Table IX(a) are worse than those in Table IX(b) in all cases. This demonstrates that it is better to use the combined training set of 100% and 30% sizes for predicting midsized test colors.
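Building the combined training set amounts to stacking the measurements from the two pattern sizes before a single least-squares fit, so that the fitted coefficients average over the small pattern-size-dependent luminance shift. A sketch, in which the variable names and the `expand` term-expansion callback are ours:

```python
import numpy as np

def fit_combined(pulses_100, xyz_100, pulses_30, xyz_30, expand):
    """Fit one polynomial model on the union of the 100% and 30%
    pattern-size training sets. `expand` maps an RGB sustain-pulse
    triple to its polynomial term vector."""
    A = np.array([expand(p) for p in list(pulses_100) + list(pulses_30)])
    Y = np.vstack([xyz_100, xyz_30])
    M, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return M.T   # one 3 x n_terms matrix covering both pattern sizes
```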
CONCLUSIONS<br />
The physical properties <strong>of</strong> a PDP which affect colorimetric<br />
characterization were examined. Also, colorimetric characteristics<br />
unique to PDP displays were investigated. Among<br />
those, a pattern-size influence and a substantial additivity<br />
Figure 12. The process <strong>for</strong> testing reversibility.<br />
Table VIII. A comparison of the performances in terms of ΔE*ab using the 27-color test set and 5-, 4-, and 3-level training sets at 30% pattern size.

Training set      3×5   3×8   3×11  3×20  3×35
5-level  Mean     3.8   3.3   1.1   1.3   1.5
         95th     8.8   9.1   2.2   3.0   3.4
4-level  Mean     3.7   3.0   1.4   1.5   3.2
         95th     7.6   6.6   2.4   2.6   8.8
3-level  Mean     3.9   3.2   2.2   8.6   26.9
         95th     7.7   6.8   4.2   18.2  52.0
Table IX. (a) Comparison of the models’ performance (ΔE*ab) using each 27-color test set at 80%, 60%, and 45% pattern size; each model was developed using the 4-level training set at 30% pattern size. (b) The same comparison, with each model developed using two 4-level training sets at 30% and 100% pattern sizes.

(a) Training set: 4-level, 30% pattern size
Test set      3×5   3×8   3×11  3×20  3×35
80%  Mean     5.1   5.3   2.1   2.5   2.9
     95th     9.5   10.6  4.2   5.2   7.6
60%  Mean     4.8   4.9   1.9   2.2   2.4
     95th     9.5   10.1  4.3   4.8   6.7
45%  Mean     3.9   3.7   1.8   1.7   2.4
     95th     7.7   7.6   3.2   3.1   6.3

(b) Training set: mixture of 100% 4-level and 30% 4-level pattern sizes
Test set      3×5   3×8   3×11  3×20  3×35
80%  Mean     4.4   4.8   1.7   1.5   1.1
     95th     9.0   11.0  3.5   3.3   2.5
60%  Mean     4.2   4.4   1.7   1.5   1.3
     95th     8.8   10.3  3.7   2.8   2.3
45%  Mean     3.3   3.1   1.6   1.4   1.6
     95th     6.8   6.9   2.7   2.0   3.2
failure were found. These must be considered when building an accurate colorimetric characterization model.
Initially, three characterization methods relating the number of sustain pulses to CIE XYZ values at 100% pattern size were derived, in order to determine an appropriate model for a PDP. In the forward direction, single- and two-step polynomial models with more than 8 terms each, and a 3D LUT model, showed the best results for the 6-, 5-, and 4-level training sets. However, the single-step model was eventually selected because of its simplicity. The number of training samples required to obtain good model performance with the least measurement was 64, for the 4-level 3D LUT. The reversibility of the single-step model was also evaluated using a 4-level 3D LUT, and this was shown to produce satisfactory results for the 11- and 20-term polynomials. The single-step model was therefore extended to various other pattern sizes. The results
validated that the polynomial regression method using the<br />
4-level training set was a good characterization model <strong>for</strong><br />
this PDP display.<br />
Finally, one comprehensive training set, consisting of two 4-level 3D LUTs corresponding to 100% and 30% pattern sizes, was produced to predict CIE XYZ at intermediate pattern sizes (i.e., sizes which were not present in the training set). The outcomes demonstrated that the single-step model can be successfully applied to estimate colors at different pattern sizes using just one combined training set.
REFERENCES
1. N. Katoh, T. Deguchi, and R. S. Berns, “An Accurate Characterization of CRT Monitor (I) Verification of Past Studies and Clarification of Gamma”, Opt. Rev. 8, 305 (2001).
2. N. Katoh, T. Deguchi, and R. S. Berns, “An Accurate Characterization of CRT Monitor (II) Proposal for an Extension to CIE Method and its Verification”, Opt. Rev. 8, 397 (2001).
3. M. D. Fairchild and J. E. Gibson, “Colorimetric Characterization of Three Computer Displays (LCD and CRT)”, Munsell Color Science Laboratory Technical Report, http://www.cis.rit.edu/mcsl/research/PDFs/GibsonFairchild.pdf (2000).
4. Y. S. Kwak and L. MacDonald, “Characterization of a Desktop LCD Projector”, Displays 21, 179 (2000).
5. D. R. Wyble and H. Zhang, “Colorimetric Characterization Model for DLP Projectors”, Proc. IS&T/SID 11th Color Imaging Conference (IS&T, Springfield, VA, 2003), pp. 346–350.
6. G. Kutas and P. Bodrogi, “Colorimetric Characterization of HD-PDP Device”, IS&T’s 2nd European Conference on Color Graphics, Imaging and Vision (IS&T, Springfield, VA, 2004), pp. 65–69.
7. Multimedia Systems and Equipment – Color Measurement and Management, Part 5: Equipment using Plasma Display Panels, IEC 61966-5, 2001.
8. J. P. Boeuf, “Plasma Display Panels: Physics, Recent Developments and Key Issues”, J. Phys. D 36, R53 (2003).
9. P. Bodrogi and J. Schanda, “Testing a Calibration Method for Color CRT Monitors: A Method to Characterize the Extent of Spatial Interdependence and Channel Interdependence”, Displays 16(3), 123 (1995).
10. R. S. Van Dyck, C. E. Johnson, and H. A. Shugart, “Lifetime Lower Limits for the ³P₀ and ³P₂ Metastable States of Neon, Argon and Krypton”, Phys. Rev. A 5, 991 (1972).
11. P. Green and L. MacDonald, Color Engineering: Achieving Device Independent Color (John Wiley and Sons Ltd, West Sussex, UK, 2002), p. 158.
12. H. R. Kang, Color Technology for Electronic Imaging Devices (SPIE Optical Engineering Press, Bellingham, WA, 1997), p. 64.
<strong>Journal</strong> <strong>of</strong> <strong>Imaging</strong> <strong>Science</strong> and Technology® 51(4): 348–359, 2007.<br />
© <strong>Society</strong> <strong>for</strong> <strong>Imaging</strong> <strong>Science</strong> and Technology 2007<br />
Real-Time Color Matching Between Camera and LCD<br />
Based on 16-bit Lookup Table Design in Mobile Phone<br />
Chang-Hwan Son<br />
School <strong>of</strong> Electrical Engineering and Computer <strong>Science</strong>, Kyungpook National University, 1370,<br />
Sankyuk-dong, Buk-gu, Daegu 702-701, Korea<br />
Cheol-Hee Lee<br />
Major <strong>of</strong> Computer Engineering, Andong National University, 388, Seongcheon-dong, Andong,<br />
Gyeongsangbuk-Do 760-747, Korea<br />
Kil-Houm Park and Yeong-Ho Ha <br />
School <strong>of</strong> Electrical Engineering and Computer <strong>Science</strong>, Kyungpook National University, 1370,<br />
Sankyuk-dong, Buk-gu, Daegu 702-701, Korea<br />
E-mail: yha@ee.knu.ac.kr<br />
Abstract. Based on the concept <strong>of</strong> multimedia convergence, imaging<br />
devices, such as cameras, liquid crystal displays (LCDs), and<br />
beam projectors, are now built-in to mobile phones. As such, mobile<br />
cameras capture still images or moving pictures, then store them as<br />
digital files, making it possible <strong>for</strong> users to replay moving pictures<br />
and review captured still images. Increasingly, users want the LCD in the mobile phone (hereafter, the mobile LCD) to reproduce the
same colors as the real scene. Accordingly, this paper proposes a<br />
method <strong>for</strong> color matching between mobile camera and mobile LCD<br />
that includes characterizing the mobile camera and mobile LCD,<br />
gamut mapping, camera noise reduction, and a 16-bit lookup table<br />
(LUT) design. First, to estimate the CIELAB values <strong>for</strong> the objects in<br />
the real scene, mobile camera characterization is achieved through<br />
polynomial regression <strong>of</strong> the optimal order determined by investigating<br />
the relation between captured RGB values and measured<br />
CIELAB values <strong>for</strong> a standard color chart. Thereafter, mobile LCD<br />
characterization is conducted based on 16-bit/pixel processing because<br />
<strong>of</strong> the reduced bit depth <strong>of</strong> the images displayed on a mobile<br />
LCD. In addition, a sigmoid model is used to find the luminance<br />
value corresponding to the RGB control signal, instead of the gain offset gamma and S-curve models, because the luminance curve is adjusted by the system designer for preferred color reproduction.
After completing the two types <strong>of</strong> characterization, gamut<br />
mapping is per<strong>for</strong>med to connect the source medium (mobile camera)<br />
with the target medium (mobile LCD), then a combination <strong>of</strong><br />
sigmoid functions with different parameters to control the shape is<br />
applied to the luminance component <strong>of</strong> the gamut-mapped CIELAB<br />
values to reduce camera noise. Finally, a three-dimensional RGB<br />
LUT is constructed using 16-bit/pixel-based data to enable color<br />
matching <strong>for</strong> moving pictures and inserted into the mobile phone.<br />
Experimental results show that moving pictures transmitted by a<br />
mobile camera can be realistically reproduced on a mobile LCD<br />
without any additional computation or memory burden. © 2007 <strong>Society</strong><br />
<strong>for</strong> <strong>Imaging</strong> <strong>Science</strong> and Technology.<br />
DOI: 10.2352/J.ImagingSci.Technol.(2007)51:4(348)
IS&T Member.
Received Dec. 1, 2006; accepted <strong>for</strong> publication Mar. 30, 2007.<br />
1062-3701/2007/51(4)/348/12/$20.00.
INTRODUCTION<br />
With the appearance <strong>of</strong> multimedia convergence in mobile<br />
phones that can now provide such functions as web browsing,<br />
3D games, television broadcasting, and image capturing,<br />
in addition to communication, manufacturers have invested<br />
heavily in superhighway communication networks, next-generation memory chips, and encryption technology for
reliable e-commerce operations. The use <strong>of</strong> color reproduction<br />
technology in mobile phones has also been recently<br />
introduced to support the development <strong>of</strong> mobile cameras,<br />
mobile beam projectors, and mobile liquid crystal displays<br />
(LCDs). In particular, with the rapid increase in mobile<br />
cameras, mobile phones can now capture and store still images<br />
or moving pictures as digital files, making it possible <strong>for</strong><br />
users to replay the moving pictures and review captured still<br />
images anytime and anywhere. However, mobile LCDs are<br />
currently unable to reproduce the original colors captured by<br />
a mobile camera due to a reduced bit depth, lower backlight luminance, and weak resolution [1]. In addition, mobile cameras
have a small lens, low dynamic range, and poor modulation<br />
transfer function (MTF), plus each device senses or<br />
displays in a different way, as each has unique characteristics [2]. As a result, there is a significant difference in
the color appearance when captured images are displayed on<br />
a mobile LCD. Therefore, real-time color matching between
mobile camera and mobile LCD in a mobile phone needs to<br />
be considered to ensure a better image quality.<br />
The aim <strong>of</strong> color matching is to achieve color consistency<br />
even when an image moves across various devices and<br />
undergoes many color transformations [3]. Several color
matching approaches have already been suggested, <strong>for</strong> example,<br />
a simple method is to transmit the RGB digital values<br />
from the original device to the reproducing device, referred<br />
to as device-dependent color matching. Yet, since this<br />
method is no more than physical data transmission, accurate<br />
color matching cannot be achieved across various devices.<br />
Meanwhile, spectral-based approaches match the spectral reflectance<br />
curves <strong>of</strong> the original and reproduced colors, so the<br />
original and reproduction look the same under any illuminant; i.e., there is no metamerism.

Son et al.: Real-time color matching between camera and LCD based on 16-bit lookup table design in mobile phone

Figure 1. The block diagram of the proposed method.

However, the computation
<strong>of</strong> reflectance is very complex and time consuming,<br />
making a spectral-based approach inappropriate for real-time
color matching. Another method is colorimetric color<br />
matching to reproduce the same CIE chromaticity and relative<br />
luminance compared with the original color. This has<br />
already been widely applied to imaging devices, such as<br />
monitors, printers, and scanners based on the International Color Consortium (ICC) profile, yet not to mobile phones,
which have only been considered as a means <strong>of</strong> communication<br />
until quite recently. However, with multimedia convergence,<br />
mobile manufacturers have become aware <strong>of</strong> the<br />
importance <strong>of</strong> the ICC pr<strong>of</strong>ile <strong>for</strong> color matching between<br />
mobile cameras and mobile LCDs. Accordingly, this paper<br />
presents a real-time color matching system <strong>for</strong> mobile cameras<br />
and mobile LCDs based on the concept <strong>of</strong> the ICC<br />
pr<strong>of</strong>ile.<br />
The proposed color matching system is composed <strong>of</strong><br />
four steps: Characterization <strong>of</strong> the mobile LCD and mobile<br />
camera, gamut mapping, noise reduction, and a<br />
16-bit-based lookup table (LUT) design. The device characterization<br />
defines the relationship between the tristimulus<br />
values (CIEXYZ or CIELAB) and RGB digital values. In general,<br />
mobile camera characterization is modeled by a polynomial<br />
regression, and the more the polynomial order increases,<br />
the better the per<strong>for</strong>mance. However, <strong>for</strong> a higher<br />
polynomial order, most estimated tristimulus values exceed<br />
the boundary <strong>of</strong> the maximum lightness and chroma, making<br />
the implementation <strong>of</strong> mobile camera characterization<br />
difficult, as the relation between the tristimulus values and<br />
digital RGB values has not been analyzed. Thus a polynomial<br />
order is suggested based on investigating the relation<br />
between RGB digital values trans<strong>for</strong>med using the opponent<br />
color theory and CIELAB values. Meanwhile, <strong>for</strong> the mobile<br />
LCD characterization, a sigmoid function instead <strong>of</strong> a conventional<br />
method, such as the gain <strong>of</strong>fset gamma (GOG) or<br />
S-curve model, is used to estimate the luminance curve<br />
made by the system designer to achieve a preferable color<br />
reproduction or to improve the perceived contrast <strong>of</strong> the<br />
mobile LCD. Furthermore, the characterization is conducted<br />
based on 16-bit data processing, as a mobile LCD is controlled<br />
based on 16-bit data, in contrast to digital TVs or<br />
monitors with 24-bit data. After completing the two types <strong>of</strong><br />
characterization, a gamut-mapping algorithm is applied to<br />
connect the source medium (mobile camera) with the target<br />
medium (mobile LCD).<br />
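As a concrete illustration of the sigmoid characterization, one plausible logistic form is sketched below. The paper's exact functional form and fitted parameter values are not given in this excerpt, so both are assumptions:

```python
import math

def sigmoid_trc(d, l_max, k, d0):
    """Sigmoid tone-reproduction curve: luminance as a logistic function
    of the digital count d. l_max, k (slope), and d0 (midpoint) would be
    fitted to luminances measured on the mobile LCD."""
    return l_max / (1.0 + math.exp(-k * (d - d0)))
```

Unlike the GOG model, a curve of this shape can follow the designer-adjusted, S-shaped luminance response of the panel.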
Although the three processes mentioned above are sufficient<br />
to obtain colorimetric color matching <strong>for</strong> still images,<br />
noise reduction and an LUT design still need to be considered<br />
to achieve real-time color matching <strong>for</strong> moving pictures.<br />
In a mobile camera, various camera noises, such as<br />
CCD noise and thermal noise, are incorporated into moving<br />
pictures and further amplified after color matching, thereby<br />
degrading the image quality, especially in the dark region <strong>of</strong><br />
the achromatic axis. Thus, to solve this problem, a combination<br />
<strong>of</strong> two sigmoid functions with different parameters to<br />
control the shape is applied to the lightness component <strong>of</strong><br />
the gamut-mapped tristimulus values to change the contrast<br />
ratio. As a result, the lightness values <strong>for</strong> the camera noise<br />
are reduced in the dark region <strong>of</strong> the achromatic axis,<br />
thereby reducing the amplified camera noises. In addition, a<br />
three-dimensional (3D) RGB LUT is designed based on 16-bit data to reduce the complex computation of serial-based processing and to facilitate color matching for moving pictures.
J. Imaging Sci. Technol. 51(4)/Jul.-Aug. 2007 349
Son et al.: Real-time color matching between camera and LCD based on 16-bit lookup table design in mobile phone

PROPOSED METHOD
Figure 1 shows a block diagram of the proposed algorithm, which achieves real-time color matching between a mobile camera and a mobile LCD. First, to predict the CIELAB values of arbitrary objects in a real scene, the mobile camera characterization is conducted by finding the relation between
the RGB digital values of a standard color chart captured in a lighting booth and the CIELAB values measured with a colorimeter. The CIELAB values estimated from the mobile camera characterization of the input RGB values are then transformed into an achievable color range that can be reproduced by the mobile LCD, a process referred to as gamut mapping. Next, the lightness values of the gamut-mapped CIELAB values are changed using the parameters of a sigmoid function obtained from a visual experiment, to reduce the camera noise incorporated into a moving picture, and are then combined with the two untouched color signals. Thereafter, the modified gamut-mapped CIELAB values are converted into color-matched RGB values for display on the mobile LCD based on a sigmoid-based mobile-LCD characterization, which accounts for the luminance curve adjusted for preferred color reproduction, along with 16-bit data processing due to the reduced bit depth of the mobile LCD. Finally, a 3D-RGB LUT is constructed using the 16-bit/pixel data to enable color matching for moving pictures and is inserted into the mobile phone, thereby allowing 24-bit moving pictures to be reproduced on the mobile LCD with higher image quality.
CHARACTERIZATION OF THE MOBILE LCD BASED ON 16-bit DATA PROCESSING
Display characterization predicts the tristimulus values for the input digital values and may be conducted by a measurement-based or a modeling-based approach. [4-6] A measurement-based approach measures a large number of patches made from combinations of input digital values using a colorimeter and estimates the tristimulus value of an arbitrary digital value by interpolation or polynomial regression. This approach improves the characterization accuracy, yet it requires a large amount of measurement data and extensive memory and is relatively complex. Meanwhile, a modeling-based approach finds the relationship between the digital input data and the tristimulus values using a mathematical function fitted to a smaller number of measurements. The GOG and S-curve models have been used as typical mathematical functions and have been applied to different types of display. In general, the GOG model is appropriate for CRT displays because their electro-optical transfer function, the relationship between the grid voltage and beam current, follows a power curve, while an LCD is a binary device that switches from an OFF state to an ON state and follows an S-shaped curve, so the S-curve model is adopted for LCD characterization. The overall procedure of the modeling-based approach is identical in both cases except that the electro-optical transfer function is modeled with a different mathematical function. The first step of modeling-based characterization is to convert the digital value into a luminance value for each RGB channel. This can be done by estimating the coefficients of the mathematical function with optimization programming. In the case of the GOG model, the mathematical function can be described as

Y_ch = [k_g,ch (d_ch / (2^N − 1)) + k_o,ch]^γ_ch ,  (1)
where ch represents the RGB channel, d_ch is the input digital value, and N is the bit number; k_g,ch, k_o,ch, and γ_ch are the gain, offset, and gamma parameters of the GOG model, respectively. Y_ch is the normalized luminance value corresponding to the normalized input digital value for each channel. To obtain all parameters of the GOG model, the digital value of each channel is independently sampled at M uniform intervals. This assumes no channel interaction; that is, the light emitted from a pixel location depends only on the R, G, B triplet of that pixel and is independent of the input digital values of other pixels. [7] Then, the CIEXYZ values for the M sampled digital values are acquired by measuring, with a colorimeter, the displayed patches created from the M sampled digital values. Even though the displayed patches are made with 8-bit data for each channel, the 8-bit M-sampled RGB digital values corresponding to the measured CIEXYZ values are in practice converted to (5,6,5)-bit data in the mobile LCD; thus the digital values of the 8-bit R channel and B channel are divided by 8, while those of the G channel are divided by 4:
d'_R = d_R / 2^Δ_R ,  d'_G = d_G / 2^Δ_G ,  d'_B = d_B / 2^Δ_B ,  (2)
where d_R, d_G, and d_B are the digital values of the displayed patch for each channel, and Δ_R, Δ_G, and Δ_B are the differences in bit number between the 8-bit/channel patches and the bit number per channel of the mobile LCD. Therefore, d_ch in Eq. (1) is substituted with d'_R, d'_G, d'_B, which are used to find the luminance curve of the mobile LCD.
Of the measured CIEXYZ values, the Y values are selected as Y_ch, assuming that the shapes of X, Y, and Z are identical after normalization. This is referred to as channel-chromaticity constancy: the spectrum of light from a channel has the same basic shape and only undergoes a scaling in amplitude as the digital value of that channel is varied. [7] Finally, the pairs of M-sampled digital values (d'_R, d'_G, d'_B) and Y values are substituted into Eq. (1), yielding all parameters of the GOG model through nonlinear optimization. The second step is to transform the luminance value of each channel calculated by the GOG model into the CIEXYZ value. This stage can be simply achieved by a matrix operation:
[X]   [X_r,max  X_g,max  X_b,max] [Y_R]
[Y] = [Y_r,max  Y_g,max  Y_b,max] [Y_G] ,  (3)
[Z]   [Z_r,max  Z_g,max  Z_b,max] [Y_B]

where Y_R, Y_G, and Y_B are the luminance values of each channel, Y_ch=R,G,B. In each column, the matrix coefficients are the CIEXYZ values at the maximum digital value of that channel, and they can be directly measured with a colorimeter.
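The matrix operation of Eq. (3) can be sketched in Python; note that the primary tristimulus values in M below are hypothetical placeholders, not measurements from the paper.

```python
import numpy as np

# Columns of M are the CIEXYZ values measured at each channel's maximum
# digital value (hypothetical example numbers, not the paper's data).
M = np.array([
    [41.2, 35.8, 18.0],   # X_r,max  X_g,max  X_b,max
    [21.3, 71.5,  7.2],   # Y_r,max  Y_g,max  Y_b,max
    [ 1.9, 11.9, 95.0],   # Z_r,max  Z_g,max  Z_b,max
])

def luminance_to_xyz(y_rgb):
    """Eq. (3): map per-channel luminances (Y_R, Y_G, Y_B) to CIEXYZ."""
    return M @ np.asarray(y_rgb, float)

xyz_white = luminance_to_xyz([1.0, 1.0, 1.0])  # all channels at full drive
```

Driving a single channel at full value simply returns that channel's measured column, which is why the matrix can be built from three colorimeter readings.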
Through the two steps described above, display characterization can be accomplished.

Figure 2. Electro-optical transfer function for the mobile LCD: (a) GOG model, (b) GOG model except saturation region, (c) S-curve model, and (d) sigmoid model.

In the case of the S-curve model, only the power-curve function shown in Eq. (1) is replaced with an S-shaped mathematical function in the process of display characterization:

Y_ch = A_ch (d_ch / (2^N − 1))^α_ch / [(d_ch / (2^N − 1))^β_ch + C_ch] ,  (4)
where A_ch, α_ch, β_ch, and C_ch are the model parameters. Equation (4) produces various S-shaped curves according to the parameter values, and if both β_ch and C_ch are zero, Eq. (4) follows a gamma curve like Eq. (1). All parameters in Eq. (4) can be obtained by applying the same process as the first step explained for the GOG model. Using these parameters, the input digital value is converted into the luminance value and then transformed into the CIEXYZ value through the matrix operation.
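As a sketch, the transfer models of Eqs. (1) and (4) and the bit-depth reduction of Eq. (2) can be written directly in Python; the function names are illustrative, not from the paper.

```python
def gog(d, n_bits, k_g, k_o, gamma):
    """Eq. (1): GOG model, normalized luminance for digital value d."""
    x = k_g * d / (2 ** n_bits - 1) + k_o
    return max(x, 0.0) ** gamma

def s_curve(d, n_bits, A, alpha, beta, C):
    """Eq. (4): S-curve model; with beta = C = 0 the denominator is 1
    and the model reduces to a gamma curve, as the text notes."""
    x = d / (2 ** n_bits - 1)
    return A * x ** alpha / (x ** beta + C)

def reduce_bits(d, delta):
    """Eq. (2): drop the low `delta` bits of an 8-bit patch value,
    e.g. delta = 3 for R/B (8 -> 5 bit) and delta = 2 for G (8 -> 6 bit)."""
    return d // (2 ** delta)
```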
To conduct the characterization of the mobile LCD, we applied the conventional methods to a cellular phone, a Samsung SCH-500. In this mobile phone, each RGB pixel value is represented by (5,6,5) bits and the image size is fixed at 240×320. Figures 2(a)-2(c) show the electro-optical transfer functions resulting from the GOG model, the GOG model without the saturation region, and the S-curve model. In Figure 2, the three types of lines represent the estimated luminance values obtained by conventional characterization for each channel, while the three types of marks indicate the measured luminance values for each channel. As seen in Fig. 2, the shape of the electro-optical transfer function of the mobile display differs from the power-curve shape of a CRT display or the S-curve shape of an LCD display. As the input digital value moves toward the middle point, the gradient of the luminance curve rapidly increases and then immediately decreases, producing a saturation region. This is due to the adjustment of the luminance curve by the system designer, intended to enhance the contrast ratio and overcome the low channel bit number. As a result, a conventional GOG model or S-curve model does not follow the luminance curve in the saturation region and cannot be directly applied to the mobile LCD. Therefore, we used the sigmoid function to model the electro-optical transfer function of the mobile LCD, based on visual observation of the luminance curve. The sigmoid function is expressed as
sigmoid(x, a, c) = 1 / (1 + exp(−a(x − c))) .  (5)

Table I. Estimated parameters of the sigmoid function.
            a-parameter   c-parameter
R-channel   11.9149       0.4647
G-channel   11.1892       0.4508
B-channel   11.3273       0.4359

Table II. Performance of mobile LCD characterization with various methods.
                                     Average ΔE*_ab   Maximum ΔE*_ab
GOG model                            15.655           32.4424
GOG model except saturation region    8.7614          17.5898
S-curve model                         6.9801          15.2279
Sigmoid model                         3.9683          14.6831
The sigmoid function is symmetric with respect to c and is constant if a is zero. The shape of the sigmoid function depends on the absolute value of a: as |a| increases, the gradient of the sigmoid function around c increases rapidly. Figure 2(d) shows the electro-optical transfer function resulting from the sigmoid model. In Fig. 2(d), the estimated luminance curve closely follows the measured luminance values, so the estimation error is expected to be reduced. The estimated coefficients of the sigmoid function are shown in Table I. The estimated curve is nearly symmetric with respect to 0.45, and the absolute value of a, which determines the shape of the sigmoid function, is almost the same for all channels.
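A minimal sketch of the sigmoid electro-optical transfer model of Eq. (5), using the Table I parameters; the helper names are our own.

```python
import math

def sigmoid(x, a, c):
    """Eq. (5): sigmoid electro-optical transfer model for the mobile LCD."""
    return 1.0 / (1.0 + math.exp(-a * (x - c)))

# Estimated (a, c) parameters per channel from Table I.
PARAMS = {"R": (11.9149, 0.4647), "G": (11.1892, 0.4508), "B": (11.3273, 0.4359)}

def estimated_luminance(d, n_bits, channel):
    """Predicted normalized luminance for a (5,6,5)-bit digital value."""
    a, c = PARAMS[channel]
    return sigmoid(d / (2 ** n_bits - 1), a, c)
```

At x = c the model outputs exactly 0.5, which is the symmetry point the text describes.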
To evaluate the performance of each method, the CIE 1976 color difference ΔE*_ab, the Euclidean distance between the estimated and measured CIELAB values, was used to measure the characterization error. Sixty-four patches were tested, and Table II shows the characterization errors of the various model-based methods. The GOG model had the largest color difference, and the characterization error remained severe even when the GOG model was used excluding the saturation region. For the S-curve model, the average ΔE*_ab was approximately 6.9, a normal color difference; however, in the middle region, the estimated luminance values show a significant difference from the measured values. The sigmoid model has a good average color difference, smaller than 6.0, which is indistinguishable to human vision.
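The ΔE*_ab metric used here is simply the Euclidean distance in CIELAB space:

```python
import math

def delta_e_ab(lab1, lab2):
    """CIE 1976 color difference: Euclidean distance in CIELAB space."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(lab1, lab2)))

# An average error of 3.9683 (sigmoid model, Table II) is this distance
# between estimated and measured CIELAB values, averaged over 64 patches.
```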
DECISION OF POLYNOMIAL ORDER FOR THE CHARACTERIZATION OF MOBILE CAMERA
Camera characterization finds the relationship between the tristimulus values and the digital RGB values. Through accurate camera characterization, we can obtain information about an object's color in a real scene and reproduce that color on the mobile LCD. The general procedure of camera characterization is shown in Figure 3. First, a standard color chart such as a Macbeth or Gretag color chart is placed with 0/45° geometry in a lighting booth, where the illuminant is set to D65 to reflect the perceived color corresponding to a daylight condition. [8] The standard color chart is then captured by a mobile camera set with autofocusing to avoid color clipping. The captured RGB digital values of each patch in the standard color chart are averaged to reduce camera noise and the nonuniformity of the illumination. Next, the tristimulus values of the standard color chart are acquired by measuring each patch of the color chart or from standard data provided by the manufacturer. Finally, polynomial regression with least-squares fitting is applied to find the relationship between the captured RGB digital values and the measured tristimulus values. [9,10]

Figure 3. The procedure for mobile camera characterization.

In general, the performance of camera characterization improves as the polynomial order increases. Practically,
for a higher polynomial order, most estimated tristimulus values exceed the boundary of the maximum lightness and chroma, and implementing mobile camera characterization becomes difficult. This is because the characteristic curve of the mobile camera, i.e., the relationship between the tristimulus values and the digital RGB values, has not been analyzed to suggest an appropriate polynomial order. To determine the polynomial order, the RGB digital values are manipulated based on opponent-color theory and compared with the CIELAB values. The CIELAB space is an opponent color coordinate system composed of a lightness signal and two color signals obtained from differences of the three color signals. Thus the RGB digital values are transformed into a lightness signal and two color signals, (R + G + B)/3, R − B, and G − B, just as in an opponent color space. Figures 4 and 5 show the relationship between the manipulated RGB values and the CIELAB values for a cellular camera and a PDA camera.

Figure 4. The characteristics of the cellular camera: (a) L* vs. average RGB value of gray samples, (b) a* vs. R−B, and (c) b* vs. G−B.

Figure 5. The characteristics of the PDA camera: (a) L* vs. average RGB value of gray samples, (b) a* vs. R−B, and (c) b* vs. G−B.

From a visual evaluation, the distribution of the measurement data was found to be slightly dispersed due to nonuniform illumination intensities across the spatial positions on the color chart, where the lux-meter measurements at the four corners were 1990, 2055, 1955, and 1922 lux, respectively. Although efforts were made to correct the nonuniformity of the illumination intensity, modeling the lux-meter measurements according to their distance from the center of the color chart is not trivial owing to their random distribution; this issue has therefore been left for future work. However, it was still clear that the manipulated RGB digital values were roughly linear with the CIELAB values:
(R + G + B)/3 ∝ L* ,  R − B ∝ a* ,  G − B ∝ b* .  (6)
Therefore, a first-order polynomial is adopted, and the mathematical model of mobile camera characterization is expressed as linear equations:

L* = α_L,1 + α_L,2 R + α_L,3 G + α_L,4 B ,
a* = α_a,1 + α_a,2 R + α_a,3 G + α_a,4 B ,  (7)
b* = α_b,1 + α_b,2 R + α_b,3 G + α_b,4 B .
Equation (7) can be equivalently expressed in vector form,

P = V^T Λ ,

    [ 1    ...  1   ]        [ α_L,1  α_a,1  α_b,1 ]        [ L*_1  a*_1  b*_1 ]
V = [ R_1  ...  R_n ] ,  Λ = [ α_L,2  α_a,2  α_b,2 ] ,  P = [  ...   ...   ...  ] ,  (8)
    [ G_1  ...  G_n ]        [ α_L,3  α_a,3  α_b,3 ]        [ L*_n  a*_n  b*_n ]
    [ B_1  ...  B_n ]        [ α_L,4  α_a,4  α_b,4 ]
where n is the number of patches in the color chart. The ultimate goal of mobile camera characterization is to derive the coefficients of the linear equations, which can be obtained by the pseudoinverse transformation of Eq. (8):

Λ = (V V^T)^−1 V P .  (9)
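The fit of Eqs. (7)-(9) is an ordinary least-squares problem; a sketch with illustrative function names follows.

```python
import numpy as np

def fit_camera(rgb, lab):
    """Eqs. (7)-(9): first-order fit from captured RGB to CIELAB.

    rgb: (n, 3) captured digital values; lab: (n, 3) measured CIELAB.
    Returns the 4x3 coefficient matrix Lambda of Eq. (8).
    """
    n = len(rgb)
    V = np.vstack([np.ones(n), np.asarray(rgb, float).T])  # 4 x n, Eq. (8)
    P = np.asarray(lab, float)                             # n x 3
    return np.linalg.inv(V @ V.T) @ V @ P                  # Eq. (9)

def predict_lab(coef, rgb):
    """Apply the fitted linear model to arbitrary captured RGB values."""
    rgb = np.atleast_2d(np.asarray(rgb, float))
    V = np.hstack([np.ones((len(rgb), 1)), rgb])
    return V @ coef
```

On noise-free data generated by a known linear model, the fit recovers the coefficients exactly, which is a convenient sanity check before fitting measured patches.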
Using the derived coefficients, an arbitrary captured digital value can be converted into a CIELAB value. However, some of the estimated CIELAB values may exceed the maximum values of CIELAB space because of linear-regression error. To solve this problem, the excess lightness is subtracted from the lightness value, and the two color signals are linearly compressed while preserving their hue:
L*' = L* − (L*_max − 100) ,  (10)

a*' = k (a* / a*_max) ,  b*' = (b* / a*) a*' ,  (11)

where k is a constant for color-signal compression, and L*_max and a*_max are the estimated maximum lightness and color-signal values, respectively.

Table III. Estimation errors of mobile camera characterization.
                                    Average ΔE*_ab   Maximum ΔE*_ab
Cellular camera (Samsung SCH-100)   4.3605           12.2098
PDA camera (Samsung SPH-M400)       6.2638           16.8828

Table III shows the performance of mobile camera characterization for the color chart; the PDA camera shows poorer performance than the cellular camera. When observing the moving picture transmitted from the mobile camera, the PDA camera is subject to more noise than the cellular camera, which produces a larger characterization error.
REAL-TIME COLOR MATCHING BETWEEN MOBILE CAMERA AND MOBILE LCD BASED ON 16-bit LUT DESIGN INCLUDING NOISE REDUCTION
Colorimetric color matching reproduces the same CIE chromaticity and relative luminance as the original color. It has been widely applied to imaging devices such as monitors, printers, and scanners based on the ICC profile, but not to mobile phones, because mobile phones have been considered primarily communication devices. However, with multimedia convergence, mobile manufacturers have become aware of the necessity of color matching between the mobile camera and the mobile LCD. Given the characterizations of the mobile camera and mobile LCD, the achievable ranges of colors (gamuts) must be considered to implement a color-matching system. Figure 6 shows the gamut difference between the mobile camera under a D65 environment (points) and the mobile LCD (solid color). As shown in Fig. 6, the gamut of the mobile camera is larger than that of the mobile LCD and has a regular form resulting from the use of linear equations. Thus, significant parts of the mobile camera gamut may be unachievable within the gamut of the mobile LCD, and it is necessary to alter the original colors (mobile camera) to ones that the given output medium (mobile LCD) is capable of reproducing. This process is frequently referred to as gamut mapping. In this paper, gamut mapping with variable and multiple anchor points is used to reduce any sudden color changes at the gamut boundary and to increase the lightness range that conventional gamut mapping toward a single anchor point reduces. [11]
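The paper's gamut mapping uses variable and multiple anchor points (Ref. 11), whose details are not reproduced here; as a simplified stand-in, the sketch below clips an out-of-gamut color along a line toward a single anchor on the achromatic axis, which illustrates the basic operation only.

```python
import math
import numpy as np

def clip_toward_anchor(lab, in_gamut, anchor_l=50.0):
    """Simplified gamut clipping (NOT the paper's multiple-anchor method):
    move an out-of-gamut CIELAB color toward the anchor (anchor_l, 0, 0)
    until it enters the target gamut. `in_gamut` is a device-specific
    predicate; the anchor is assumed to lie inside the gamut."""
    p = np.asarray(lab, float)
    anchor = np.array([anchor_l, 0.0, 0.0])
    if in_gamut(p):
        return p
    lo, hi = 0.0, 1.0                # fraction of the way to the anchor
    for _ in range(50):              # bisection along the segment
        mid = (lo + hi) / 2.0
        if in_gamut(p + mid * (anchor - p)):
            hi = mid
        else:
            lo = mid
    return p + hi * (anchor - p)

# Toy display gamut: chroma limited to 40 (purely illustrative).
toy_gamut = lambda q: math.hypot(q[1], q[2]) <= 40.0
```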
Figure 6. Gamut mismatches between the mobile camera (points) and the mobile LCD (solid color): (a) projected onto the (a*, b*) plane and (b) projected onto the (L*, b*) plane.

In general, the performance of colorimetric color matching between cross media, such as a monitor and a printer, depends on the gamut mapping and the device characterization. However, in the case of a mobile camera, various camera noises, such as CCD noise and thermal noise, can be included in moving pictures and become amplified after color matching, especially in the dark region of the achromatic axis, although not in the chromatic region, owing to the blending of the reproduced signal. To solve this problem, a combination of two sigmoid functions with different parameters is applied to the lightness component of the gamut-mapped tristimulus values to change the contrast ratio. The sigmoid function is expressed as
S_i = Σ_{n=0..i} [1 / √(2πσ²)] exp(−(100 x_n / m − x_0)² / (2σ²)) ,  i = 1, 2, ..., m ,  (12)

S_LUT = [(S_i − min(S)) / (max(S) − min(S))] (L*_max,out − L*_min,out) + L*_min,out .  (13)
Equation (12) is a discrete cumulative normal function S, where x_0 and σ are the mean and standard deviation of the normal distribution, respectively, and m is the number of points used in the discrete lookup table. x_n is the gamut-mapped lightness component of the CIELAB values, which is then scaled into the dynamic range of the mobile LCD, as given in Eq. (13), where L*_min,out and L*_max,out are the black-point and white-point lightness values of the mobile LCD. In Eq. (12), x_0 controls the centering of the sigmoid function, and σ controls its shape.

Figure 7. Modified sigmoid function for noise reduction: (a) sigmoid functions with different parameters and (b) the combination of two sigmoid functions.

To find the parameters that conceal the camera noise through lightness remapping, visual experiments were repeated while adjusting the two parameters, and we found that a combination of two sigmoid functions is needed. In Figure 7(a), the solid line with x_0 = 30 and σ = 11.025 is the optimal curve for reducing the camera noise in the dark region, yet the remapped lightness values in the bright region are significantly increased, forming a saturation region. Thus another sigmoid function with x_0 = 40 and σ = 27.35, represented by the dotted line in Fig. 7(a), is applied to the gamut-mapped lightness values larger than an input lightness of 20 in order to make the lightness of the reproduced image similar to that of the original image. Ultimately, the combination of the two sigmoid functions, shown as the solid line in Fig. 7(b), decreases the lightness of the camera noise in the dark region of the achromatic axis; as a result, the amplified camera noise is hardly observable to the human eye.
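The lightness LUT of Eqs. (12) and (13), with the two experimentally chosen parameter sets, can be sketched as below; the exact rule for joining the two curves at an input lightness of 20 is our assumption from the text, not the paper's formula.

```python
import math

def cumulative_normal_lut(m, x0, sigma, l_min, l_max):
    """Eqs. (12)-(13): discrete cumulative normal S_i, rescaled to the
    display lightness range [l_min, l_max]."""
    s, acc = [], 0.0
    for i in range(m + 1):
        acc += math.exp(-((100.0 * i / m - x0) ** 2) / (2.0 * sigma ** 2)) \
               / math.sqrt(2.0 * math.pi * sigma ** 2)
        s.append(acc)
    lo, hi = min(s), max(s)
    return [(v - lo) / (hi - lo) * (l_max - l_min) + l_min for v in s]

# Steep curve (x0=30, sigma=11.025) darkens the noisy shadow region;
# above an input lightness of 20, the gentler curve (x0=40, sigma=27.35)
# takes over (assumed combination rule).
dark = cumulative_normal_lut(100, 30.0, 11.025, 0.0, 100.0)
light = cumulative_normal_lut(100, 40.0, 27.35, 0.0, 100.0)
combined = [dark[i] if i <= 20 else light[i] for i in range(101)]
```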
The serial processing described above, including the characterization, gamut mapping, and noise reduction, is computationally complex and is not appropriate for real-time processing. Therefore, a 3D-RGB LUT is constructed based on N grid points for each channel. The input RGB digital values are uniformly sampled on n×n×n grid points, which are processed by the serial color matching, resulting in new corresponding output RGB values. The input and output RGB digital values are stored in the 3D-LUT, and arbitrary input RGB values are calculated by interpolation. This 3D-LUT can be inserted into the mobile LCD without any difficulties associated with memory or computation. In practice, in a mobile phone a moving picture has 8 bits per channel, while the displayed RGB image on the LCD screen is represented by (5,6,5) bits per channel. Thus, before an image is displayed on the LCD screen, the 24-bit moving-picture data are quantized into 16-bit data through a bit operation. For example, suppose that the moving-picture data to be displayed are (51,102,153). The final data are calculated by applying the bitwise AND operation (&) with 8-bit masking data. Table IV shows an example of the AND operation.

Table IV. Example of bit quantization.
8-bit input data & 8-bit masking data   Quantized result
00110011 & 11111000                     R-channel: 00110
01100110 & 11111100                     G-channel: 011001
10011001 & 11111000                     B-channel: 10011

Table V. An example of the proposed 3D-RGB LUT.
Input (R G B)   Output (R G B)
26 64  0        24 64 17
32 64  0        25 64 18
 0  0  6         3  3  4
 6  0  6        10  3  2
13  0  6        14  0  1
19  0  6        17  0  1
26  0  6        21  0  3
32  0  6        23  0  4
 0 13  6         0 19  4
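The masking step of Table IV, together with packing into a single 16-bit (5,6,5) word, can be sketched as follows.

```python
def quantize_565(r, g, b):
    """Quantize 8-bit RGB to the LCD's (5,6,5)-bit representation by
    masking off the low bits (Table IV; the paper writes the AND as '&&'),
    then pack the result into one 16-bit word."""
    r5 = r & 0xF8   # 11111000: keep the top 5 bits
    g6 = g & 0xFC   # 11111100: keep the top 6 bits
    b5 = b & 0xF8   # 11111000: keep the top 5 bits
    return (r5 >> 3) << 11 | (g6 >> 2) << 5 | (b5 >> 3)

word = quantize_565(51, 102, 153)  # the paper's example pixel
```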
EXPERIMENTS AND RESULTS
To conduct a subjective experiment on colorimetric color matching, test images were captured using a mobile camera; these included both a face image and color chart images captured in a lighting booth with D65 illumination. Statistically, the face image is one of the most frequently captured images, and people are very sensitive to how their skin color is displayed on a mobile LCD. For this reason, a face image representing skin color was selected as a test image. Similarly, the color chart image captured under D65 illumination was used as a test image because the characterization of the mobile camera was conducted under D65 illumination, and the subjective performance of color matching can easily be evaluated by comparing the displayed image with the real object as seen in the lighting booth.

Figure 8. The experimental results with the cellular phone: (a) and (b) device-dependent color matching; (c) and (d) proposed color matching.

Figure 9. The experimental results with the PDA camera: (a) and (b) device-dependent color matching; (c) and (d) proposed color matching.

In addition, device-dependent color matching was compared to evaluate the performance of the colorimetric color matching. Device-dependent color matching directly transmits the captured image to the mobile LCD, while colorimetric color matching sends the captured image through the 3D-RGB LUT, after which it is quantized and transmitted to the mobile LCD. Table V shows a part of the data set stored in the 3D-RGB LUT designed for the 16-bit system. In the R channel and B channel, the maximum digital value is 2^5, whereas the G channel's maximum digital value is 2^6. Based on the 16-bit LUT, colorimetric color matching between the mobile camera and the mobile LCD can be processed in real time.
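A sketch of the LUT construction and lookup described above; `transform` stands in for the serial characterization, gamut-mapping, and noise-reduction chain, and trilinear interpolation is our assumption for the unspecified interpolation method.

```python
import numpy as np

def build_lut(n, transform):
    """Sample the serial color-matching pipeline on an n x n x n grid of
    normalized RGB inputs; `transform` is a placeholder for the full chain."""
    g = np.linspace(0.0, 1.0, n)
    lut = np.empty((n, n, n, 3))
    for i, r in enumerate(g):
        for j, gg in enumerate(g):
            for k, b in enumerate(g):
                lut[i, j, k] = transform((r, gg, b))
    return lut

def apply_lut(lut, rgb):
    """Trilinear interpolation between the stored grid points."""
    n = lut.shape[0]
    x = np.clip(np.asarray(rgb, float), 0.0, 1.0) * (n - 1)
    i0 = np.minimum(x.astype(int), n - 2)   # lower corner of the cell
    f = x - i0                              # fractional position inside it
    out = np.zeros(3)
    for dr in (0, 1):
        for dg in (0, 1):
            for db in (0, 1):
                w = ((f[0] if dr else 1 - f[0]) *
                     (f[1] if dg else 1 - f[1]) *
                     (f[2] if db else 1 - f[2]))
                out += w * lut[i0[0] + dr, i0[1] + dg, i0[2] + db]
    return out
```

With the identity transform the interpolated output reproduces the input exactly, a useful check that the grid indexing and weights are correct.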
SUBJECTIVE EXPERIMENT OF DEVELOPED COLOR MATCHING BASED ON 16-bit LUT DESIGN
Figure 8 shows the captured images displayed on the cellular phone. Figures 8(a) and 8(b) show the images resulting from device-dependent color matching, while Figs. 8(c) and 8(d) show the images resulting from LUT-based colorimetric color matching. In Fig. 8(a), even though the picture is taken against the light, the face region is very bright, owing to the tendency of the electro-optical transfer function of the mobile LCD to saturate the bright region, as shown. In addition, the colorfulness of the "table" and "cloth" regions is reduced relative to the original colors, and the image quality is degraded. As shown in Fig. 8(c), the skin color in the "face" region is more natural and realistic than in Fig. 8(a), and object colors such as "cloth" and "table" are well reproduced on the LCD. For the Macbeth Color Chart seen in Fig. 8(b), the colors of each patch are washed out and exhibit major differences in appearance compared with the original colors seen in the D65 lighting booth, because device-dependent color matching is only physical data transmission. On the other hand, the result shown in Fig. 8(d) adequately represents the colorfulness of the original colors seen in the D65 daylight, especially the red and green hues.
Figure 9 shows the results of color matching for a PDA phone; the same effect is observed. Figure 10 shows the resulting images of colorimetric color matching considering the noise reduction. Figure 10(a) is the image obtained by conventional colorimetric color matching, and its quality is significantly degraded by the camera noise. By applying the combination of two sigmoid functions in Fig. 7(b) to conventional colorimetric matching, the contrast ratio of the reproduced image is changed; as a result, camera noise is not observable to the human eye, as shown in Fig. 10(b). Consequently, color matching based on the 3D-LUT accurately reproduces the object colors seen in the real scene and thus improves the color fidelity of the mobile display. For moving pictures, the same noise-reduction results can be achieved with no problems of computation or memory.

Figure 10. The results of noise reduction: (a) before lightness remapping and (b) after lightness remapping.

Figure 11. Quality evaluations of device-dependent color matching and proposed color matching.
QUANTITATIVE EVALUATION OF THE DEVELOPED COLOR MATCHING
To evaluate colorimetric color matching based on the 16-bit RGB LUT, a Macbeth Color Chart composed of 24 patches was used as a test image. For quantitative evaluation of device-dependent color matching, the Macbeth Color Chart is first captured in the D65 lighting booth and displayed on the mobile LCD. Then, the CIELAB value of each patch is measured using a colorimeter and compared with the CIELAB data of the Macbeth Color Chart measured in the D65 lighting booth, thus calculating the CIE 1976 color difference. For the proposed color matching, the Macbeth Color Chart is again captured in the D65 lighting booth and displayed on the mobile LCD through use of the 16-bit RGB LUT. Then, the CIELAB value of each patch is measured using a colorimeter and compared with the CIELAB data of the Macbeth Color Chart measured in the D65 lighting booth. Figure 11 shows the result of quantitative evaluation using the CIE 1976 color difference. In Fig. 11, several patches corresponding to the proposed color matching have a larger color difference than for the conventional method, due to the characterization errors of the mobile camera and mobile LCD. However, the proposed color matching has a lower average color difference of 15.56, whereas device-dependent color matching has an average color difference of 24.395.
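The CIE 1976 color difference used here is the Euclidean distance between CIELAB triples. A minimal sketch of the per-patch and average computation; the sample CIELAB pairs below are invented for illustration, not the paper's measurements:

```python
import math

def delta_e_76(lab1, lab2):
    """CIE 1976 color difference: Euclidean distance in CIELAB."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(lab1, lab2)))

# Hypothetical (displayed, reference) CIELAB pairs for a few patches
pairs = [((52.0, 30.1, 10.5), (55.0, 26.1, 13.5)),
         ((80.2, -4.0, 60.0), (75.2, -4.0, 48.0))]

# Average color difference over the patch set, as reported in Fig. 11
avg = sum(delta_e_76(m, r) for m, r in pairs) / len(pairs)
```

In the experiment this average is taken over all 24 Macbeth patches for each matching method.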
Therefore, the proposed color matching achieves better colorimetric reproduction than the conventional method, and it is concluded that object color transmitted by the mobile camera in real time can be accurately and realistically reproduced on a mobile LCD.
CONCLUSIONS
This paper presented a method for real-time color matching between mobile camera and mobile LCD, involving characterization of the mobile camera and mobile LCD, gamut mapping, noise reduction, and a LUT design. The characterization of the mobile LCD is conducted based on 16-bit processing, and a sigmoid function is used to estimate the electro-optical transfer function. Meanwhile, for the characterization of the mobile camera, the optimal polynomial order is determined by transforming the captured RGB data into an opponent color space and finding the relationship between the transformed RGB values and the measured CIELAB values. Following the two types of characterization, gamut mapping is executed to overcome the gamut difference between the mobile camera and the mobile LCD, then noise-reduction processing is applied to the lightness component of the gamut-mapped CIELAB values. Finally, to reduce the complex computation of serial-based color matching, a 3D RGB LUT is designed based on 16-bit data and inserted into the mobile phone. Experiments demonstrated that the proposed color matching realistically reproduced object colors from a real scene on a mobile LCD and improved the color fidelity of the mobile display. The LUT was also designed without any further computation or memory burden, making real-time processing possible.
Acknowledgments
This work was financially supported by the Ministry of Education and Human Resources Development (MOE), the Ministry of Commerce, Industry and Energy (MOCIE), and the Ministry of Labor (MOLAB) through the fostering project of the Lab of Excellency.
REFERENCES
1. J. Luo, "Displaying images on mobile devices: capabilities, issues, and solutions", Wirel. Commun. Mob. Comput. 2, 585–594 (2002).
2. J. Y. Hardeberg, Acquisition and Reproduction of Colour Images: Colorimetric and Multispectral Approaches (Universal Publishers/Dissertation.com, 2001).
3. H. R. Kang, Color Technology for Electronic Imaging Devices (SPIE Optical Engineering Press, Bellingham, WA, 1996).
4. R. S. Berns, "Methods for characterizing CRT displays", Displays 16, 173–182 (1996).
5. Y. S. Kwak and L. W. MacDonald, "Characterisation of a desktop LCD projector", Displays 21, 179–194 (2000).
6. N. Tamura, N. Tsumura, and Y. Miyake, "Masking model for accurate colorimetric characterization of LCD", Proc. IS&T/SID Tenth Color Imaging Conference (IS&T, Springfield, VA, 2002), pp. 312–316.
7. G. Sharma, "LCDs versus CRTs: color-calibration and gamut considerations", Proc. IEEE 90, 605–622 (2002).
8. M. D. Fairchild, Color Appearance Models (Addison-Wesley, Reading, MA, 1998).
9. G. Hong, M. R. Luo, and P. A. Rhodes, "A study of digital camera colorimetric characterization based on polynomial modeling", Color Res. Appl. 26, 76–84 (2001).
10. M. R. Pointer, G. G. Attridge, and R. E. Jacobson, "Practical camera characterization for colour measurement", Imaging Sci. J. 49, 63–80 (2001).
11. C. S. Lee, Y. W. Park, S. J. Cho, and Y. H. Ha, "Gamut mapping algorithm using lightness mapping and multiple anchor points for linear tone and maximum chroma reproduction", J. Imaging Sci. Technol. 45, 209–223 (2001).
Journal of Imaging Science and Technology® 51(4): 360–367, 2007.
© Society for Imaging Science and Technology 2007

Solving Under-Determined Models in Linear Spectral Unmixing of Satellite Images: Mix-Unmix Concept (Advance Report)
Thomas G. Ngigi and Ryutaro Tateishi
Center for Environmental Remote Sensing, Chiba University, 1-33 Yayoi, Inage, Chiba, 263-8522, Japan
E-mail: tgngigi@hotmail.com
Abstract. This paper reports on a simple novel concept for addressing the problem of underdetermination in linear spectral unmixing. Most conventional unmixing techniques fix the number of end-members on the dimensionality of the data, and none of them can derive multiple (2+) end-members from a single band. The concept overcomes these two limitations. Further, the concept creates a processing environment that allows any pixel to be unmixed without any sort of restrictions (e.g., minimum determinable fraction), impracticalities (e.g., negative fractions), or trade-offs (e.g., either positivity or unity sum) that may be associated with conventional unmixing techniques. The proposed mix-unmix concept is used to generate fraction images of four spectral classes from the first principal component only of Landsat 7 ETM+ data (aggregately resampled to 240 m). The correlation coefficients of the mix-unmix image fractions versus reference image fractions of the four end-members are 0.88, 0.80, 0.67, and 0.78. © 2007 Society for Imaging Science and Technology.
DOI: 10.2352/J.ImagingSci.Technol.(2007)51:4(360)
PROBLEM STATEMENT/INTRODUCTION, AND OBJECTIVE
"… the number of bands must be more than the number of end-members …" is perhaps the most ubiquitous statement in the field of linear spectral unmixing. This is simply because most of the conventional unmixing techniques are based on least squares,1 convex geometry,2 or a combination of both, and the number of end-members (unknowns) is dependent on the dimensionality (equations) of the data. Least squares can unmix up to as many end-members as the dimensionality of the data, and at the very best exceed it by one when the unity constraint is enforced. In convex geometry, the number of determinable end-members (at the unmixing stage) is equal to the number of vertices of the data simplex, and this number exceeds the dimensionality of the data by one. After extracting the end-member spectra, most of the convex geometry-based techniques apply the least squares approach (combined case) in computing the fractions of the end-members.
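To make the dimensionality limit concrete: with two bands plus the enforced unity constraint, least squares yields a square 3 x 3 system and hence at most three end-member fractions. A minimal sketch with made-up end-member DNs, solved by Cramer's rule:

```python
def solve3(a, y):
    """Solve a 3x3 linear system a . f = y by Cramer's rule."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    d = det(a)
    sol = []
    for j in range(3):
        # Replace column j with the right-hand side
        m = [[a[i][k] if k != j else y[i] for k in range(3)] for i in range(3)]
        sol.append(det(m) / d)
    return sol

# Two bands, three end-members: rows are band-1 DNs, band-2 DNs,
# and the unity (sum-to-one) constraint. All DN values are illustrative.
E = [[10, 90, 170],
     [20, 100, 200],
     [1, 1, 1]]
pixel = [114, 134, 1]          # observed DNs plus the unity constraint
fractions = solve3(E, pixel)   # recovers the mixing fractions
```

With more end-members than this, the system becomes under-determined, which is exactly the gap the mix-unmix concept targets.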
Some linear spectral unmixing techniques include the Sequential Maximum Angle Convex Cone (SMACC) Spectral Tool,3 (Generalized) Orthogonal Subspace Projection,4,5 Convex Cone Analysis,6 N-FINDR,7 Orasis,7 and Iterative Error Analysis.7 Keshava8 gives a detailed account of spectral unmixing techniques. A number of commercially available software packages, including ENVI, IDRISI Kilimanjaro, PCI, and ERDAS Imagine, have linear spectral unmixing modules. The greatest fundamental commonality of all conventional linear spectral unmixing techniques is that none of them can derive multiple (2+) end-members from a single band. The objective of the mix-unmix concept is to overcome this problem and unmix as many end-members as can be deciphered from the reference data, without introducing any sort of restrictions, impracticalities, or trade-offs that may be associated with conventional unmixing techniques.
DESCRIPTION OF THE MIX-UNMIX CONCEPT
As the term implies, the model consists of two branches, namely, mixing and unmixing. The mixing branch entails development of hypothetical mixed pixels on the basis of the desired end-members' actual digital numbers (DNs). Unmixing involves determination of each real image pixel DN's contributory end-members and their fractions by back-propagating through the mixing branch, using a pixel of the same DN in the hypothetical image as a proxy. This preliminary study demonstrates the concept on a single simulated band.
Mixing Branch
Nominally, the end-members are paired up hierarchically into a single hypothetical mixed class (Figure 1; EM1 = end-member 1, EM1.2 = combined end-members 1 and 2). Essentially, in pairing up, each and every DN from a member of a pair is combined with each and every DN from the other member, at complementary percentages ranging from 0% to 100%, giving rise to various "mixture tables" (MTs) whose number depends on the ranges of training DNs of the two members.
Theory of the Mixing Branch and Formation of Mixture Tables (MTs)
The number of possible DN combinations, MTs, of two members, A and B, of a pair is equal to the product of their training DN ranges, i.e.,

MTs = (A max DN − A min DN + 1)(B max DN − B min DN + 1),

where
• A max DN = maximum DN of A,
• A min DN = minimum DN of A,
• B max DN = maximum DN of B,
• B min DN = minimum DN of B.

The number of possible percentage combinations, N%s, of the two members is given by

N%s = 100% ÷ MI + 1,

where MI = adopted mixture interval. Subsequently, the total number of possible DN and percentage combinations of the two is MTs × N%s. The expression also gives the total number of possible mixture pixels of the two.

Thus all pixels in a hypothetical band composed of only two end-members, EM1 and EM2, would be defined by the following expression (discussed assuming that the end-members' training DNs range from 10–89 and 90–150, respectively, in the band, a mixture interval of 10%, and linear mixing):

DN1.2,i = f1,i × DN1,i + f2,i × DN2,i,  (1)

where
• f1,i = percentage of EM1 in pixel i [Table I(a), 1st column],
• f2,i = percentage of EM2 in pixel i [Table I(a), 2nd column],
• f1 + f2 = 100%,
• DN1,i = DN of EM1 in pixel i [Table I(a), 2nd row],
• DN2,i = DN of EM2 in pixel i [Table I(a), 3rd row],
• DN1.2,i = mixture DN of DN1,i and DN2,i in pixel i [Table I(a), all cells excluding the first two columns and first three rows].

Figure 1. Bottom-up pairing up of end-members as well as the resultant super-end-members; pairing up involves mixing the two members-of-a-pair's DN ranges at all complementary fractions. In case the number of super-end-members is even but not a multiple of four, one level of mixing is skipped for one pair, as indicated by EM5 and EM6 for six end-members. For an odd number, one end-member is simply carried forward to the next level individually, as indicated by EM7 for seven end-members. In this case, EM5.6 and EM7 have the same hierarchical status as EM1.2.3.4. At the base of the branch are training DN ranges and assumed fractions of the end-members; the DNs are known by in situ observation, from spectral libraries, or identification of pure pixels in the image to be unmixed, etc. At the top of the branch are hypothetical pixels' DN values resulting from mixing all the end-members' spectra at all possible complementary fractions.

Received Dec. 5, 2006; accepted for publication Mar. 22, 2007.
1062-3701/2007/51(4)/360/8/$20.00.
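For the worked example (EM1 training DNs 10–89, EM2 training DNs 90–150, 10% mixture interval), the counting formulas give MTs = 80 × 61 = 4880 and N%s = 11. A small sketch of the enumeration; integer truncation of the mixture DN is an assumption chosen here, since the paper's exact rounding convention is not recoverable from the printed tables:

```python
def count_mts(a_min, a_max, b_min, b_max):
    """Number of mixture tables (MTs): product of the training DN ranges."""
    return (a_max - a_min + 1) * (b_max - b_min + 1)

def count_percent_rows(mixture_interval):
    """Number of complementary percentage pairs: N%s = 100% / MI + 1."""
    return 100 // mixture_interval + 1

def mixture_column(dn1, dn2, mixture_interval=10):
    """One MT column: the EM1.2 DN for every complementary percentage pair."""
    return [int((f * dn1 + (100 - f) * dn2) / 100)
            for f in range(0, 101, mixture_interval)]

mts = count_mts(10, 89, 90, 150)    # 4880 mixture tables
rows = count_percent_rows(10)       # 11 percentage rows per table
col = mixture_column(10, 90)        # first MT column of Table I(a)
```

Enumerating all 4880 tables of 11 rows each yields every hypothetical mixed-pixel DN of the pair.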
Table I. (a) MTs of EM1 and EM2. The EM1.2 DNs are computed as: EM1.2 DN = EM1%(EM1 DN) + EM2%(EM2 DN). Note that EM1% and EM2% are complementary.

MT =              1     2    …   4879   4880
EM1 DN =         10    11    …     88     89
EM2 DN =         90    90    …    150    150
EM1%   EM2%             EM1.2 DN
  0    100       90    90    …    150    150
 10     90       82    82    …    143    143
 20     80       74    74    …    137    137
 30     70       66    66    …    131    131
 40     60       58    58    …    125    125
 50     50       50    51    …    119    119
 60     40       42    43    …    113    113
 70     30       34    35    …    107    107
 80     20       26    27    …    100    101
 90     10       18    19    …     94     95
100      0       10    11    …     88     89
(b) Min-max MTs of EM1 and EM2. (c) Min-max LUTs of EM1 and EM2.

Min-MT (Min-LUT): EM1 DN = 10, EM2 DN = 90. Max-MT (Max-LUT): EM1 DN = 89, EM2 DN = 150.

EM1%   EM2%   EM1.2 DN (min)   EM1.2 DN (max)   Overlap with EM1.2 vector 90–150
  0    100          90               150                 60
 10     90          82               143                 53
 20     80          74               137                 47
 30     70          66               131                 41
 40     60          58               125                 35
 50     50          50               119                 29
 60     40          42               113                 23
 70     30          34               107                 17
 80     20          26               101                 11
 90     10          18                95                  5
100      0          10                89                  0

where min EM1 DN = minimum DN of EM1 and max EM1 DN = maximum DN of EM1.
Bounding Mixture Tables, and Various Numbers of End-Members
From Table I(a), it is very clear that, given constant fractions of EM1 and EM2, the mixture-class DNs (EM1.2 DNs) always fall between the values in the first and last MTs; thus the two MTs fully give the ranges of all possible DNs of EM1.2. Hereinafter, the two are referred to as min-MT and max-MT, respectively, and min-max MTs collectively [Table I(b)].

Similarly, min-max MTs of the other paired end-members are generated: for EM3 and EM4, in Eq. (1) EM1, EM2, and EM1.2 are replaced with EM3, EM4, and EM3.4, respectively. Table II(a) shows the min-max MTs of EM3 and EM4, whose DN ranges are 151–180 and 181–210, respectively.

Next, second-level min-max MTs are developed from the above first-level MTs: for EM1.2 and EM3.4, in Eq. (1) EM1, EM2, and EM1.2 are replaced with EM1.2, EM3.4, and EM1.2.3.4, respectively. Table III(a) shows the min-max MTs of EM1.2 and EM3.4. Since EM1.2 represents EM1 and EM2, and EM3.4 represents EM3 and EM4, the second-level min-max MTs inherently represent all the possible DN outcomes of mixing all the end-members EM1, EM2, EM3, and EM4 at all possible complementary fractions.
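Because only the extreme training DNs matter for the bounding tables, the min-MT and max-MT columns can be generated directly. A sketch that reproduces the bounding columns of Tables I(b) and II(a); as before, integer truncation of the mixture DN is an assumption made to match the printed values:

```python
def bounding_luts(em_a, em_b, mi=10):
    """Min- and max-MT columns for one pair of end-members.

    em_a, em_b: (min DN, max DN) training ranges of the two members.
    Returns rows of (A%, B%, min mixture DN, max mixture DN).
    """
    rows = []
    for f in range(0, 101, mi):
        lo = int((f * em_a[0] + (100 - f) * em_b[0]) / 100)  # min-MT column
        hi = int((f * em_a[1] + (100 - f) * em_b[1]) / 100)  # max-MT column
        rows.append((f, 100 - f, lo, hi))
    return rows

em12 = bounding_luts((10, 89), (90, 150))     # Table I(b)
em34 = bounding_luts((151, 180), (181, 210))  # Table II(a)
```

Each (lo, hi) pair is one bounding vector of the combined class at the given complementary percentages.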
Table II. (a) Min-max MTs of EM3 and EM4. The EM3.4 DNs are computed as: EM3.4 DN = EM3%(EM3 DN) + EM4%(EM4 DN). Note that EM3% and EM4% are complementary. (b) Min-max LUTs of EM3 and EM4.

Min-MT (Min-LUT): EM3 DN = 151, EM4 DN = 181. Max-MT (Max-LUT): EM3 DN = 180, EM4 DN = 210.

EM3%   EM4%   EM3.4 DN (min)   EM3.4 DN (max)   Overlap with EM3.4 vector 172–201
  0    100         181               210                 20
 10     90         178               207                 29
 20     80         175               204                 29
 30     70         172               201                 29
 40     60         169               198                 26
 50     50         166               195                 23
 60     40         163               192                 20
 70     30         160               189                 17
 80     20         157               186                 14
 90     10         154               183                 11
100      0         151               180                  8
For more end-members, the process is successively repeated, as shown in Fig. 1. For three end-members, in Eq. (1) EM1, EM2, and EM1.2 are replaced with EM1.2, EM3, and EM1.2.3, respectively.
Unmixing Branch
This is similar to the mixing branch (Fig. 1) but with the arrows (processing) reversed and the MTs renamed look-up tables (LUTs) [Tables I(c), II(b), and III(b)]. As discussed below, a real image pixel DN is fractionalized into the two highest-level super-end-members, each of which is then split into its two constituent end-members. The process continues until the finest level (the end-members of interest) from which the mixing branch was constructed (see Figure 2).
Fractionalization
This discussion demonstrates the unmixing process on a single-band image composed of the four end-members outlined in the Mixing Branch section. For each DN in the band, all the vectors in which it lies are identified; e.g., the italicized EM1.2.3.4 DNs in Table III(b) give all the possible vectors for DN 180, with the lower nodes located in Table III(b-1) and the upper nodes in Table III(b-2); the first vector is 172–204 (bold). Each one of these vectors is a combination of two minor vectors, one apiece from EM1.2 and EM3.4 (italicized); e.g., for the vector 172–204, the constituent vectors are 90–150 (bold) from EM1.2 and 181–210 (bold) from EM3.4. The most probable vector (MPV) in which the DN 180 lies is computed as shown in Eqs. (2) and (3) below.

Figure 2. Top-bottom fractionalization of a pixel: first into the two highest-level super-end-members, then effectively into the four second-highest-level super-end-members by fractionalizing each of the highest-level super-end-members into two. The process is repeated successively down to the lowest-level end-members that were used to build up the mixing branch. At the top of the branch is a universe of values encompassing all the DNs in the image to be unmixed (assuming that the image is composed of only the end-members used in the mixing branch). At the base of the branch are the estimated contributory percentages (fractions) of the end-members; cf. Fig. 1.
Table III. (a-1) Min-MT of EM1.2 and EM3.4. The EM1.2.3.4 DNs are computed as: EM1.2.3.4 DN = EM1.2%(EM1.2 DN) + EM3.4%(EM3.4 DN). Note that EM1.2% and EM3.4% are complementary. (b-1) Min-LUT of EM1.2 and EM3.4.

EM1.2 DN(a)  EM1.2%  EM3.4%   EM3.4 DN(b): 181 178 175 172 169 166 163 160 157 154 151
                              EM1.2.3.4 DN
 90            0      100     181 178 175 172 169 166 163 160 157 154 151
 90           10       90     172 169 167 164 161 158 156 153 150 148 145
 90           20       80     163 160 158 156 153 151 148 146 144 141 139
 90           30       70     154 152 150 147 145 143 141 139 137 135 133
 90           40       60     145 143 141 139 137 136 134 132 130 128 127
 90           50       50     136 134 133 131 130 128 127 125 124 122 121
 90           60       40     126 125 124 123 122 120 119 118 117 116 114
[Rows 8 to 117]
 10           70       30      61  60  60  59  58  57  56  55  54  53  52
 10           80       20      44  44  43  42  42  41  41  40  39  39  38
 10           90       10      27  27  27  26  26  26  25  25  25  24  24
 10          100        0      10  10  10  10  10  10  10  10  10  10  10

(a-2) Max-MT of EM1.2 and EM3.4. (b-2) Max-LUT of EM1.2 and EM3.4.

EM1.2 DN(c)  EM1.2%  EM3.4%   EM3.4 DN(d): 210 207 204 201 198 195 192 189 186 183 180
                              EM1.2.3.4 DN
150            0      100     210 207 204 201 198 195 192 189 186 183 180
150           10       90     204 201 199 196 193 191 188 185 182 180 177
150           20       80     198 196 193 191 188 186 184 181 179 176 174
150           30       70     192 190 188 186 184 182 179 177 175 173 171
150           40       60     186 184 182 181 179 177 175 173 172 170 168
150           50       50     180 179 177 176 174 173 171 170 168 167 165
150           60       40     174 173 172 170 169 168 167 166 164 163 162
[Rows 8 to 117]
 89           70       30     125 124 124 123 122 121 120 119 118 117 116
 89           80       20     113 113 112 111 111 110 110 109 108 108 107
 89           90       10     101 101 100 100 100  99  99  99  99  98  98
 89          100        0      89  89  89  89  89  89  89  89  89  89  89

(a) Column 1 elements are from the EM1 and EM2 min-MT [Table I(b), column 3].
(b) Row 1 elements are from the EM3 and EM4 min-MT [Table II(a), column 3].
(c) Column 1 elements are from the EM1 and EM2 max-MT [Table I(b), column 4].
(d) Row 1 elements are from the EM3 and EM4 max-MT [Table II(a), column 4].
where<br />
n<br />
lower nodes<br />
i=1<br />
cMNxDN =<br />
n<br />
, 2<br />
cMNyDN =<br />
n<br />
upper nodes<br />
i=1<br />
n<br />
, 3<br />
• cMNxDN=lower node <strong>of</strong> DN 180 MPV from combined<br />
classes M and N (EM1.2 or EM3.4),<br />
• cMNyDN=upper node ditto,<br />
• lower nodes=all the EM1.2.3.4 italicized DN in Table<br />
III(b-1),<br />
• upper nodes=ditto Table III(b-2),<br />
• n=number <strong>of</strong> EM1.2.3.4 italicized DN vectors=count<br />
<strong>of</strong> EM1.2.3.4 italicized DN nodes in Table III(b-1) or<br />
Table III(b-2).<br />
From Eqs. (2) and (3), cMNxDN=156 and cMNyDN<br />
=190. From Table III(b), the pair <strong>of</strong> nodes most close to the<br />
pair 156/190 is 156/191 and it is adopted as the MPV <strong>for</strong><br />
the DN 180. This vector 156–191 [Table III(b) bold and<br />
underlined] lies at the intersection <strong>of</strong> EM1.2 vector 90–150<br />
364 J. <strong>Imaging</strong> Sci. Technol. 514/Jul.-Aug. 2007
Ngigi and Tateishi: Solving under-determined models in linear spectral unmixing <strong>of</strong> satellite images: mix-unmix concept<br />
given rise to the EM1.2 vector 90–150 is taken to be proportional<br />
to the amount <strong>of</strong> overlap with it. Each probability is<br />
also taken to be the probability <strong>of</strong> the corresponding paired<br />
percentages (PPs) having given rise to the EM1.2 vector 90–<br />
150 since the PV was developed from them (PPs). Similarly<br />
<strong>for</strong> Table II(b) PVs and PPs in the case <strong>of</strong> EM3.4 vector<br />
172–201. There are seven possible overlap scenarios as depicted<br />
by Figure 3. Table I(c) last column gives the weights<br />
<strong>of</strong> the EM1 and EM2 vectors to EM1.2 vector 90–150, and<br />
Table II(b) ditto EM3 and EM4 vectors to EM3.4 vector<br />
172–201.<br />
From Fig. 3 and Tables I(c) and II(b), the most probable<br />
percentage contribution (MPPC) <strong>of</strong> each daughter class to<br />
its mother class is computed as<br />
Figure 3. Possible universal overlap scenarios. A is EMx.y’s most probable<br />
vector, e.g., EM1.2 MPV 90-150 or EM3.4 MPV 172-201; all/<br />
some <strong>of</strong> the other non-arrowed lines B-H are vectors contained in<br />
EMx.y’s min-max LUTs e.g., Table Ic <strong>for</strong> EM1.2, or Table IIb <strong>for</strong><br />
EM3.4; arrowed lines are the respective overlaps.<br />
and EM3.4 vector 172–201. There<strong>for</strong>e, by extension, the DN<br />
180 most probably resulted from these EM1.2 and EM3.4<br />
vectors as the combination most probably gave rise to the<br />
DN 180 MPV 156–191.<br />
Further, percentages-wise, the DN 180 could have resulted<br />
from any <strong>of</strong> the paired percentages associated with the<br />
EM1.2.3.4 italicized DNs vectors. The most probable contributory<br />
paired percentages (MPPC) are computed as<br />
where<br />
MPPC x.y =<br />
n<br />
i% x.y p i <br />
i=1<br />
±<br />
n<br />
n p i<br />
i=1<br />
n<br />
i=1<br />
p i v i<br />
2<br />
n<br />
n 2 <br />
i=1<br />
p i<br />
,<br />
• i% x.y =ith paired percentages <strong>of</strong> x EM1.2<br />
and y EM3.4,<br />
• p i =weight <strong>of</strong> i% x.y =count <strong>of</strong> i% x.y ’s EM1.2.3.4<br />
italicized DNs,<br />
• n=count <strong>of</strong> probable contributory paired<br />
percentages,<br />
• v=i% x.y −MPPC x,y . The second term in Eq. 4<br />
is computed after the first one.<br />
From Eq. (4), EM1.2% =16.67% ±2.34% and EM3.4%<br />
=83.33% ±2.34%. Hence, the DN 180 most probably resulted<br />
from these EM1.2 and EM3.4 percentages combinations<br />
as the pair most probably gave rise to the DN 180<br />
MPV 156-191.<br />
Next, the EM1.2 90-150 and EM3.4 172-201 vectors are<br />
checked against the lower level min-max LUTs, Tables I(c)<br />
and II(b), respectively, and all the vectors with which they<br />
(EM1.2 and EM3.4 vectors) overlap <strong>for</strong>m the universe <strong>of</strong><br />
possible vectors (PVs) from which they (EM1.2 and EM3.4<br />
vectors) or, in other words, a fraction <strong>of</strong> the value 180, arose.<br />
The probability (weight) <strong>of</strong> each <strong>of</strong> the Table I(c) PVs having<br />
4<br />
where<br />
cM %=<br />
q<br />
cM% i p i<br />
i=1<br />
±<br />
q<br />
q p i<br />
i=1<br />
q<br />
i=1<br />
p i v i<br />
2<br />
q<br />
q 2 <br />
i=1<br />
p i<br />
,<br />
5<br />
• cM% =MPPC <strong>of</strong> daughter class cM (EM1 or EM2) to<br />
its mother class (EM1.2). EM3 or EM4 <strong>for</strong> EM3.4;<br />
• cM% i =percent <strong>of</strong> cM’s ith probable vector—Table I(c)<br />
columns 1 and 2 <strong>for</strong> EM1 and EM2, respectively; Table<br />
II(b) columns 1 and 2 <strong>for</strong> EM3 and EM4, respectively;<br />
• p i =overlap range <strong>of</strong> cM’s ith probable vector with its<br />
cM mother’s MPV. Table I(c) last column <strong>for</strong> EM1<br />
and EM2. Table II(b) last column <strong>for</strong> EM3 and EM4;<br />
• q=count <strong>of</strong> probable paired-percentages;<br />
• v=cM%-cM% i . The second term in Eq. (5) is computed<br />
after the first one.<br />
From Eq. (5) and Tables I(c) and II(b), the MPPCs <strong>of</strong><br />
EM1 and EM2 to EM1.2, and EM3 and EM4 to EM3.4 are;<br />
EM1=71% ±2.84%, EM2=29% ±2.84%, EM3<br />
=59% ±2.42%, and EM4=41% ±2.42%.<br />
The MPPC of an end-member to the original pixel DN is simply the product of all MPPCs along the path from the end-member itself to the pixel DN. Hence, for end-member:
• 1: EM1% × EM1.2% = 71% × 16.67% = 11.84% ± 1.73%;
• 2: EM2% × EM1.2% = 29% × 16.67% = 4.83% ± 0.83%;
• 3: EM3% × EM3.4% = 59% × 83.33% = 49.16% ± 2.44%;
• 4: EM4% × EM3.4% = 41% × 83.33% = 34.17% ± 2.23%.
The standard deviation of the product AB is computed as

  σ_AB = AB √( (σ_A/A)² + (σ_B/B)² ),   (6)

where σ_k = standard deviation of k.
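The path products and the propagation rule of Eq. (6) combine into a short sketch. The function name is ours, and the uncertainty of the mother-class percentage (EM1.2% here) comes from the earlier mixing branch, outside this excerpt, so it is left as a placeholder:

```python
import math

def product_with_sigma(a, sigma_a, b, sigma_b):
    """Eq. (6): sigma_AB = AB * sqrt((sigma_A/A)^2 + (sigma_B/B)^2)."""
    ab = a * b
    return ab, abs(ab) * math.sqrt((sigma_a / a) ** 2 + (sigma_b / b) ** 2)

# End-member 1: EM1% x EM1.2% = 71% x 16.67%
sigma_em12 = 0.0  # placeholder; the actual value comes from the mixing branch
em1_frac, em1_sigma = product_with_sigma(0.71, 0.0284, 0.1667, sigma_em12)
```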
J. Imaging Sci. Technol. 51(4)/Jul.-Aug. 2007
Ngigi and Tateishi: Solving under-determined models in linear spectral unmixing of satellite images: mix-unmix concept
Figure 4. (Left) Location of the study area; (right) 30 m resolution Landsat ETM+ data (RGB = 342).
Figure 5. (Left) Reference data: 30 m resolution spectral classes. The spectral classes correspond to broad information classes: dense vegetation (C1), dense vegetation/bare land (C2), bare land/dense vegetation (C3), and bare land (C4). (Right) Raw image: 240 m resolution first principal component. The rectangles show locations of the reference (black) and mix-unmix (red) training sites.
STUDY TEST
Four spectral end-members are mapped from a single-band simulated raw image.

Study Area and Data
Landsat ETM+ data of 21 February 2000, covering southern-central Kenya, is utilized to generate both the reference and raw data. The land covers in the area transition from dense vegetation (forest) to bare land (Figure 4).
Simulation of Reference and Raw Data
K-means classification is run on ETM+ bands 1, 2, 3, 4, 5, and 7 to produce four spectral classes, which are adopted as reference data. The six-band data set is resampled to 240 m resolution (mimicking moderate resolution data, e.g., MODIS bands 1 and 2 at 250 m resolution), and a principal components transformation (PCT) is then executed on the new data set. Each of the resampled bands and PCs is taken as a candidate raw image.
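The quantitative steps of this simulation, block-resampling a 30 m band to 240 m and the principal components transformation, can be sketched in NumPy as below. The function names are ours, the k-means step is omitted for brevity, and this is an illustrative rendition rather than the authors' processing chain:

```python
import numpy as np

def block_resample(band, f=8):
    """Aggregate a 30 m band to 240 m by averaging f x f pixel blocks."""
    h, w = (band.shape[0] // f) * f, (band.shape[1] // f) * f
    b = band[:h, :w].reshape(h // f, f, w // f, f)
    return b.mean(axis=(1, 3))

def principal_components(bands):
    """PCT over a stack shaped (n_bands, rows, cols): eigendecompose the
    band covariance matrix and project every pixel onto the PCs."""
    n, r, c = bands.shape
    X = bands.reshape(n, -1).astype(float)
    X -= X.mean(axis=1, keepdims=True)
    vals, vecs = np.linalg.eigh(np.cov(X))
    order = np.argsort(vals)[::-1]        # PC1 = largest variance
    pcs = vecs[:, order].T @ X
    return pcs.reshape(n, r, c)
```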
Selection of Band to Unmix, and its Unmixing
A section from the 30 m resolution reference spectral classes image (black rectangle in Figure 5), hereinafter referred to as the reference training site, is geographically overlaid on each candidate raw image (240 m resolution), and the pure pixels of each spectral class are identified in the overlay section (red rectangle in Fig. 5), hereinafter referred to as the mix-unmix training site. A pixel in the mix-unmix training site is pure if the geographically corresponding 8-pixel × 8-pixel block in the reference training site is composed of a single class; 8 is the ratio of the two resolutions.

Figure 6. (a) Comparison of the DN distribution curves of C1, C2, C3, and C4 in the mix-unmix training site across the original bands and principal components (PCs). The least overlap between the curves occurs in PC1 and, thus, it is adopted as the raw band to unmix. Y axes = frequencies, X axes = DNs (values not shown). (b) Training DNs of EM1, EM2, EM3, and EM4.
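The pure-pixel test just described (a coarse pixel is pure only if its corresponding 8 × 8 reference block holds a single class) can be sketched as follows; the function name and the −1 sentinel for mixed blocks are our own choices:

```python
import numpy as np

def pure_pixels(ref_classes, f=8):
    """For each coarse (240 m) pixel, return its class id if the
    geographically corresponding f x f block of 30 m reference classes
    contains a single class (pure pixel), else -1 (mixed)."""
    h, w = (ref_classes.shape[0] // f) * f, (ref_classes.shape[1] // f) * f
    blocks = ref_classes[:h, :w].reshape(h // f, f, w // f, f).swapaxes(1, 2)
    flat = blocks.reshape(h // f, w // f, f * f)
    is_pure = (flat == flat[..., :1]).all(axis=-1)
    return np.where(is_pure, flat[..., 0], -1)
```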
Figure 6(a) compares the pure-pixel DN distribution curves of the four spectral classes in the mix-unmix training site. Since the four spectral classes exhibit the highest spectral dissimilarity among themselves in the first PC, it is adopted as the raw image to be unmixed. The training DN ranges of the four spectral classes (now denoted as end-members, EMs) are as shown in Fig. 6(b). The raw image (first PC) is unmixed under the mix-unmix concept, on the basis of the training DNs, into the four end-members.
Mix-Unmix Fraction Images versus Reference Fraction Images
Reference fraction images of the four spectral classes (Fig. 5) are generated by computing the percentage coverage of each class in every 8-pixel × 8-pixel block (each block is 240 m × 240 m). The reference fraction images are compared with the mix-unmix fraction images. Figure 7 outlines the entire image processing flow, and Figure 8 compares the fraction images. The correlation coefficients of the mix-unmix image fractions versus the reference image fractions of the four end-members are 0.88 (EM1), 0.80 (EM2), 0.67 (EM3), and 0.78 (EM4).

Figure 7. Data processing flowchart.
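The reference fractions and the reported correlation check translate directly into NumPy; the sketch below uses our own names and is illustrative rather than the authors' code:

```python
import numpy as np

def class_fractions(ref_classes, class_id, f=8):
    """Percent coverage of one class in each f x f (240 m x 240 m) block."""
    h, w = (ref_classes.shape[0] // f) * f, (ref_classes.shape[1] // f) * f
    m = (ref_classes[:h, :w] == class_id).astype(float)
    return 100.0 * m.reshape(h // f, f, w // f, f).mean(axis=(1, 3))

def fraction_correlation(ref_frac, unmix_frac):
    """Pearson correlation between reference and mix-unmix fraction images."""
    return np.corrcoef(ref_frac.ravel(), unmix_frac.ravel())[0, 1]
```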
DISCUSSION
The mix-unmix fraction images show transition patterns (highest to lowest concentration levels) similar to the reference fraction images for all the end-members, though the correlation coefficients are not "very high." Although a mixture interval of 10% is used throughout this study, any value that divides 100 into a whole number can be used. We cannot use 100 itself, as it would mean that each pixel contains just a single end-member.
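The admissible mixture intervals follow directly from that divisibility condition; a one-line enumeration (ours, for illustration):

```python
# Mixture intervals must divide 100 evenly; 100 itself is excluded.
intervals = [d for d in range(1, 100) if 100 % d == 0]
print(intervals)  # [1, 2, 4, 5, 10, 20, 25, 50]
```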
FUTURE
As discussed in the Mixing Branch section, only the extreme DN values (i.e., bounding mixture tables) are used in this study. Also, the training DNs of end-members are assumed to be "frequency-less." As the mix-unmix software develops, all mixture tables and training DN distribution curves will be incorporated.
The effects of the adopted mixture interval and of overlap between training DNs on the accuracy of the concept will be addressed once the above is implemented. Also, performance of the concept across different numbers of end-members, different resolutions, and different geographical scales will be tested.
Figure 8. Reference fraction images (upper row) and mix-unmix fraction images (lower row) of the four end-members (first column = EM1, second = EM2, third = EM3, fourth = EM4). White and black are background, i.e., 0%.
CONCLUSIONS
This preliminary investigation shows that the mix-unmix concept is capable of addressing the problem of under-determination in linear spectral unmixing, a revolutionary dimension in data processing, as the number of end-members is not pegged to the number of available bands. It is the only method that truly solves the problem of under-determination: the Sequential Maximum Angle Convex Cone (SMACC) Spectral Tool does not work on a single band, and Generalized Orthogonal Subspace Projection cannot generate additional bands from a single band. Further, the mix-unmix concept creates a processing environment that allows any pixel to be unmixed without any restrictions (e.g., minimum determinable fraction), impracticalities (e.g., negative fractions), or trade-offs (e.g., either positivity or unity sum) that may be associated with conventional unmixing techniques.
REFERENCES
1. Y. E. Shimabukuro and J. A. Smith, "The least squares mixing methods to generate fraction images derived from remote sensing multispectral data", IEEE Trans. Geosci. Remote Sens. 29, 16 (1991).
2. J. W. Boardman, "Geometric mixture analysis of imaging spectrometry data", Proc. Int. Geosci. Remote Sens. Symposium 4, 2369 (1994).
3. J. Gruninger, A. J. Ratkowski, and M. L. Hoke, "The sequential maximum angle convex cone (SMACC) endmember model", Proc. SPIE 5425, 1 (2004).
4. R. Hsuan and C. Yang-Lang, "Error Analysis for Band Generation in Generalized Process Orthogonal Subspace Projection", IEEE Geoscience and Remote Sensing Symposium Proceedings, IGARSS (IEEE Press, Piscataway, NJ, 2005).
5. I. Emmett, "Hyperspectral Image Classification Using Orthogonal Subspace Projections: Image Simulation and Noise Analysis", http://www.cis.rit.edu/~ejipci/Reports/osp_paper.pdf (2001).
6. A. Ifarraguerri and C. Chang, "Multispectral and Hyperspectral Image Analysis with Convex Cones", IEEE Trans. Geosci. Remote Sens. 37, 756 (1999).
7. M. E. Winter and E. M. Winter, "Comparison of approaches for determining end-members in hyperspectral data", Proc. IEEE Aerospace Conference (IEEE Press, Piscataway, NJ, 2000).
8. N. Keshava, "A Survey of Spectral Unmixing Techniques", Lincoln Lab. J. 14, 55 (2003).
Journal of Imaging Science and Technology® 51(4): 368–379, 2007.
© Society for Imaging Science and Technology 2007

Color Shift Model-Based Segmentation and Fusion for Digital Autofocusing

Vivek Maik
Image Processing and Intelligent Systems Lab, Graduate School of Advanced Imaging Science, Multimedia and Film, Chung Ang University, Seoul 156-756, South Korea
E-mail: vivek5681@wm.cau.ac.kr

Dohee Cho
Digital/Scientific Imaging Lab, Graduate School of Advanced Imaging Science, Multimedia and Film, Chung Ang University, Seoul 156-756, South Korea

Jeongho Shin
Department of Web Information Engineering, Hankyong National University, Anseong 456-749, South Korea

Donghwan Har
Digital/Scientific Imaging Lab, Graduate School of Advanced Imaging Science, Multimedia and Film, Chung Ang University, Seoul 156-756, South Korea

Joonki Paik
Image Processing and Intelligent Systems Lab, Graduate School of Advanced Imaging Science, Multimedia and Film, Chung Ang University, Seoul 156-756, South Korea
Abstract. This paper proposes a novel color shift model-based segmentation and fusion algorithm for digital autofocusing of color images. The source images are obtained using new multiple filter-aperture configurations. We shift color channels to change the focal point of the given image at different locations. For each respective location we then select the optimal focus information and, finally, use soft decision fusion and blending (SDFB) to obtain fully focused images. The proposed autofocusing algorithm consists of: (i) color channel shifting and alignment for varying focal positions; (ii) optimal focus region selection and segmentation using the sum modified Laplacian (SML); and (iii) SDFB, which enables smooth transition across region boundaries. By utilizing segmented images for different focal point locations, the SDFB algorithm can combine images with multiple, out-of-focus objects. Experimental results show the performance and feasibility of the proposed algorithm for autofocusing images with one or more differently out-of-focus objects. © 2007 Society for Imaging Science and Technology.
DOI: 10.2352/J.ImagingSci.Technol.(2007)51:4(368)

Received Sep. 25, 2006; accepted for publication Mar. 22, 2007.
1062-3701/2007/51(4)/368/12/$20.00.
INTRODUCTION
Demand for digital autofocusing techniques is rapidly increasing in many visual applications, such as camcorders, digital cameras, and video surveillance systems. Until now, most focusing efforts have been put on gray scale images. Even with specialized color processing techniques, each color channel is processed independently for autofocusing applications. In this paper, a novel autofocusing algorithm utilizing the color shift property is proposed, which can restore an image with multiple, differently focused objects. We propose a new filter-aperture (FA) model for autofocusing color images. The proposed method avoids fusion of multiple captured source images, as it uses a single input image. The FA model separates and distributes the out-of-focus blur into different color channels. The multiple FA models also make it possible to generate as many source images as necessary for fusion-based autofocusing. Multiple focal points are identified on the image, and color channel shifting aligns each channel with the respective focal point. For each alignment the sum modified Laplacian (SML) operator is used to obtain a numerical measure indicating the degree of focus of that image. The in-focus pixels are selected and combined at each step using soft decision fusion and blending (SDFB) to produce the in-focus image with maximum focus metric. The SML operator can also be used to estimate a number of focal points, starting from the minimum degree of focus in the input image. The proposed algorithm does not use any restoration filter, which usually results in undesired artifacts, such as ringing, reblurring, and noise clustering.
The rest of the paper is organized as follows. The following section summarizes existing techniques and presents the major contribution of the proposed work. The section titled "Multiple FA Model" gives a detailed description of the multiple FA method, and "Digital Autofocusing Algorithm" describes the proposed autofocusing algorithm. "Experimental Results" shows the simulation results and comparisons with existing methods. Finally, we offer concluding remarks.

EXISTING STATE-OF-THE-ART AUTOFOCUSING METHODS

FA Model
The conventional photo sensor array uses micro lenses in front of every pixel to concentrate light onto the photosensitive region.1 In this paper, we interpret this optical design as a gradual step: multiple detectors can be placed beneath each micro lens, instead of multiple arrays of detectors. The artificial compound eye sensor (insect eyes) is composed of a micro lens array and a photo sensor.2 However, the imaging quality of these optical designs is fundamentally inferior to a camera system with a large single lens; the resolution of these small lens arrays is severely limited by diffraction. The "wave front coding" system3 is similar to the proposed system (see Figure 1) in that it provides a way to decouple the trade-off between aperture size and depth of field, but their design is very different. Rather than collecting and resorting rays of light, they use aspheric lenses that produce images with a depth-independent blur. Deconvolution of these images retrieves image details at all depths, as shown in Figure 2.

Figure 1. Block diagram of the proposed algorithm.
Autofocusing Methods
The traditional autofocusing system in a camera usually consists of two different modules: analysis and control. The analysis module estimates a degree-of-focus of an image projected onto the image plane. The control module performs focusing functions by moving the lens assembly to the optimal focusing position according to the degree-of-focus information estimated in the analysis module. There are five different focusing techniques: manual focusing (MF), infrared autofocusing (IRAF), through-the-lens autofocusing (TTLAF), semi-digital autofocusing (SDAF), and fully digital autofocusing (FDAF).4–7 Table I briefly summarizes and compares those techniques.
The FDAF systems usually involve restoration and fusion methods in the control module, which operate using prior information such as the point spread function (PSF), gradients, multiple source inputs, etc., to obtain the details about the
out-of-focus blur in images.

Figure 2. Representation of the schematic of (a) a wave front coding system, (b) the proposed FA system.

Table I. Comparison of conventional AF systems with the proposed system.

Autofocusing technique | Analysis module | Control module | Focusing accuracy | Hardware specifications
Manual | Human decision | Manual | Subject to human operation | Low shutter speed, f/2–f/8.0
IRAF | Calculating the time of IR travel | Moving the focusing lens | High | High shutter speed, f/3.5–f/5.6
TTLAF | Minimizing a phase difference | Moving the focusing lens | Very high under good conditions | High shutter speed, f/2–f/11
SDAF | Calculating high frequency of image | Moving the focusing lens | Acceptable | Medium shutter speed, f/2–f/28
FDAF | Estimating PSF, blur models | Restoration filters and fusion methods | Acceptable | NIL
Proposed method | Color channel shifting | Multiple filter aperture (FA) | Acceptable | 30 to 1/4,000 sec, f/5.6–f/22

Image fusion-based autofocusing methods have focused on the operation of multiple source
images using wavelet or discrete cosine transformations (DCT)8,9 with an a priori obtained camera PSF. Other methods use pyramid-based representations to decompose the source images into different spatial scales and orientations.10,11 Similar results, although with more artifacts and less visual stability, can be achieved by using a set of basis functions.12 Another technique, similar to the pyramid representation approach, is based on the wavelet transform, decomposing the image into various subbands.13,14 The output is generated by selecting the decomposed subband with maximum energy. Restoration-based techniques have been carried out to overcome the out-of-focus problem. However, restoration of images with different depths of field tends to cause reblurring and ringing artifacts in regions with low depth of field or in-focus regions.15,16 Even with equal depth of field, the nature of restoration poses a serious limitation to the visual quality of the restored images. Another drawback is the slow convergence of the iterative framework.
The main contributions of the proposed method are listed below:
(a) Multiple apertures and corresponding sensors can enhance depth information.
(b) The focusing process is inherently designed in accordance with color information.
(c) Neither image restoration nor blur identification is necessary.
(d) A set of images with multiple apertures and focus settings can be generated from a single image with channel shifting.
(e) The fusion algorithm involves separate feature-based fusion and color blending consistency to preserve the channel dependencies.
(f) The proposed algorithm does not need transformation or convolution operations.

Figure 3. General single aperture model.
Recently, images obtained at different shutter speeds were combined into an image in which the full dynamic range is preserved.17 The proposed approach extends and generalizes the standard fusion approach to color images. The proposed approach does not need multiple source images captured at different aperture settings. Instead, we derive different source images from a single out-of-focus image to obtain various positions of focal points.18–20 For each focal point the three color channels are aligned and the corresponding images are used for fusion.
MULTIPLE FILTER-APERTURE (FA) MODEL
An aperture of a lens can adjust the amount of incoming light accepted through the lens. It can also control the focal length, camera-to-object distance, and depth of field. Generally, the center of an aperture is aligned on the optical axis of the lens. Any controlled aperture accepts light from various angles depending on the object position. Correspondingly, the convergence pattern on the imaging plane forms either a point or a circular region, as shown in Figure 3. For objects placed at near-, mid-, and far-focal distances the image convergence takes place either in front of, on, or behind the CCD/CMOS sensor. However, this image convergence, as well as the blur information, can only be represented in a bi-axial plane as shown in Fig. 3.

Figure 4. Aperture shifted from the center.
Figure 5. Multiple aperture set-up for red and blue channel filters.
An interesting alternative, a tri-axial representation of the image and the out-of-focus blur, can be achieved using a non-centric aperture as shown in Figure 4. For a non-centric aperture located either on the upper or lower part of the optical axis, the convergence pattern is split between these axes. The split difference between the patterns gives another dimension to the conventional bi-axial plane, making it a tri-axial representation. For objects at the same positions (near, in, far focal distances), the convergence patterns of the channel aperture form an overlapping convergence on the CCD/CMOS sensor. For instance, the near focal distance object converges on the upper part of the optical axis where, at the same position, the far focal distance object converges on the lower part. If these overlapping channels are exactly aligned, then we will have a focused pattern in the image.
An extension of the above approach is to use a lens with two apertures on either side of the optical axis. An interesting phenomenon can be observed: for the near and far focused objects, the convergence pattern lies on opposite sides for each aperture, in reverse order for each channel. For example, the red aperture can have near-focused convergence on the top and far-focused convergence on the bottom, whereas the blue aperture has far-focused convergence on the top and near-focused convergence at the bottom, as shown in Figure 5. This phenomenon is called filter-aperture (FA) extraction. The out-of-focus blur is now distributed among the color channels forming the image.
Now we extend the above multiple aperture convergence to a typical RGB image scenario. To obtain an RGB image using the multiple aperture configuration we need to obtain the R, G, and B channel convergence patterns separately. This can be done using three apertures in a Bayer pattern, where the images are individually obtained on the sensor for the three apertures and later combined to form the RGB image. Evidently, multiple apertures provide additional depth information for objects at different distances. Since any color image is composed of three channels, we have used three apertures and, correspondingly, three filters (see Figure 6).
Figure 6. Multiple FA model showing the convergence pattern for the R, G, and B color channels.
The main advantage of the FA model is that it provides an alternative method for blur estimation in autofocusing applications. Images acquired using a normal lens have uniform or spatially variant blur confined to all channels. However, in the proposed algorithm, by using three filtered sensors the autofocusing problem turns into the alignment of the R, G, and B channels with various depths of field. The out-of-focus phenomena with single and multiple aperture lenses are compared in Figure 7. As shown in Fig. 7(b), the out-of-focus blur is modeled as a misalignment of the three color channels R, G, and B.
DIGITAL AUTOFOCUSING ALGORITHM
The proposed algorithm uses the image obtained from the multiple FA configuration for the autofocusing application. The proposed autofocusing algorithm consists of the following procedures to obtain a well-restored image: (i) salient feature computation, (ii) color channel shifting and alignment for selected pixels, and (iii) soft decision fusion and blending.
Salient Focus Measure
The feature saliency computation process contains a family of functions that estimate saliency information. In practice, these functions can operate on individual pixels or on a local region of pixels. When combining images having different focus measures, for instance, a desirable saliency measure would provide a quantitative measure that increases when features are better focused. Various saliency measures, including variance and gradients, have been employed and validated for related applications. The saliency function only selects the frequencies in the focused image that will be attenuated due to defocusing. One way to detect a high-frequency component is to apply the following absolute Laplacian operator:

  ∇²L_k = |∂²L_k/∂x²| + |∂²L_k/∂y²|.   (1)

Figure 7. Comparison of out-of-focus blurs for a single aperture model and the proposed multiple aperture models: (a) and (b) out-of-focus image captured using an ordinary camera and the proposed FA system under the same focal settings; (c) restored result using the regularized restoration method; (d) restored result using the proposed channel shifting and fusion algorithm.
The second derivatives in the x and y directions often have opposite signs and tend to cancel each other. In textured images this phenomenon may occur frequently, making the Laplacian behave in an unstable manner. However, this problem can be overcome by using the absolute Laplacian as in Eq. (1). In order to accommodate possible variations in the size of texture elements, we compute the partial derivatives using a variable spacing between pixels. Hence a discrete approximation to the modified Laplacian, ML_k(i,j), for pixel intensity I(i,j), is given as

  ML_k(i,j) = |2I(i,j) − I(i−1,j) − I(i+1,j)| + |2I(i,j) − I(i,j−1) − I(i,j+1)|.   (2)
Finally, the focus measure at a point (i,j) is computed as the sum of modified Laplacian values, in a small window around (i,j), that are greater than a prespecified threshold value:

  f(i,j) = Σ_{p=i−N}^{i+N} Σ_{q=j−N}^{j+N} ML_k(p,q)   for ML_k(p,q) ≥ T₁.   (3)

The heuristically determined threshold value T₁ in the range 40–60 provides acceptable results in most cases. The parameter N represents the window size for computing the focus measure. In contrast to region-based autofocusing methods, we typically use a smaller window size, e.g., N = 1. Equation (3) is referred to as the sum modified Laplacian (SML), which is used as an intermediate image estimate for determining focus information.
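Equations (1)-(3) translate almost directly into NumPy. The sketch below uses the fixed unit pixel spacing of Eq. (2) and a thresholded windowed sum for Eq. (3); the function name and loop structure are our own choices, not the authors' implementation:

```python
import numpy as np

def sum_modified_laplacian(img, N=1, T1=40):
    """Focus measure of Eqs. (1)-(3): absolute ('modified') Laplacian per
    pixel, then a (2N+1) x (2N+1) windowed sum of values >= T1."""
    I = img.astype(float)
    ml = np.zeros_like(I)
    ml[1:-1, 1:-1] = (
        np.abs(2 * I[1:-1, 1:-1] - I[:-2, 1:-1] - I[2:, 1:-1])
        + np.abs(2 * I[1:-1, 1:-1] - I[1:-1, :-2] - I[1:-1, 2:])
    )
    ml[ml < T1] = 0.0                      # keep only ML_k >= T1
    pad = np.pad(ml, N, mode="constant")   # windowed sum around each pixel
    f = np.zeros_like(I)
    for dp in range(-N, N + 1):
        for dq in range(-N, N + 1):
            f += pad[N + dp : N + dp + I.shape[0],
                     N + dq : N + dq + I.shape[1]]
    return f
```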
Figure 8. Schematic of the channel alignment procedure for the R, G, and B channels.
Color Channel Shift and Alignment<br />
For shifting and aligning color channels we need to find the optimal pixels-of-interest at different positions in the image according to their focus measures. These pixels-of-interest are referred to as focal point pixels. The term “focal point pixel” refers to a pixel-of-interest around which channel shifting and alignment is carried out. For a given image, the SML measure can be used to determine the focal point region, whose focus measure is significantly lower than that of other regions of the image. Within a given region, we select the focal point pixel either at the center of the region or at the pixel with the lowest focus measure. Similar operations can be performed for different selected focal point regions in different neighborhoods. Henceforth, for a corresponding focal point pixel, we perform channel alignment and remove the out-of-focus blur in that given neighborhood (see Figure 8).<br />
For a particular image captured using the FA configuration, the out-of-focus blur is confined to the channels on either side of the green channel, as shown in Figure 9. As can be seen from the figure, the green channel suffers minimal blur distortion because its sensor was placed at the center, whereas the red and the blue channels have maximal blur distortion. The proposed autofocusing technique uses the green channel as the reference and aligns the red and the blue channels to the green channel at any particular location, as<br />
\[ I_{RGB} = S_{(r,c)}\left(I_R + I_B\right) + I_G, \tag{4} \]<br />
where S_(r,c) represents the shift operator and the shift vector (r,c) represents the amount of shift in the row and column directions for the respective red and blue channels with respect to the reference focal point on the green channel. If the shift vectors are not identical, we can generalize the above equation as<br />
\[ I_{RGB} = S\!\left(I_R(r_1,c_1) + I_B(r_2,c_2)\right) + I_G. \tag{5} \]<br />
The shift vectors on the same sensor filter are linearly dependent.<br />
For a particular reference channel it is possible to<br />
estimate the exact number <strong>of</strong> shift vectors using the sensor<br />
filter configurations. For example, in our experiments the<br />
green channel has been used as reference, hence the red and<br />
blue pixels are misaligned by a pattern corresponding to the<br />
sensor filter as shown in Figure 10.<br />
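The channel shift and alignment step described above can be sketched as follows. Integer (row, column) shift vectors and circular wrap-around at the image borders are our own implementation assumptions, not details from the paper.

```python
import numpy as np

def align_channels(rgb, red_shift, blue_shift):
    """Align the R and B channels to the G reference by integer
    (row, column) shift vectors (illustrative sketch).

    Circular shifting at the borders is an implementation assumption.
    """
    out = np.array(rgb, copy=True)
    out[..., 0] = np.roll(rgb[..., 0], red_shift, axis=(0, 1))   # shift R
    out[..., 2] = np.roll(rgb[..., 2], blue_shift, axis=(0, 1))  # shift B
    # G (index 1) is the reference channel and stays in place
    return out
```

The same routine with negated shift vectors serves as the inverse shift operator used later for registration against the reference image.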
S<strong>of</strong>t Decision Fusion and Blending<br />
In order to merge images with multiple focal point planes,<br />
image fusion is required on multiple channel images. Un<strong>for</strong>tunately,<br />
when the channel-shifted images are directly fused,<br />
misalignment or misregistration is unavoidable. The pixels of different channel-aligned images, when fused together, may overlap or be missed because of the channel shifting. This problem can be overcome by applying an inverse shift operation to the images with respect to a reference image. The reference image has to be chosen from one of the several channel-shifted images extracted using channel shifting and alignment. In the proposed approach we choose as the reference image the one whose focal point is located approximately at the center of the image,<br />
\[ I_k = S^{-1}_{(r,c)\mid I_r}\!\left(I_k, I_r\right), \tag{6} \]<br />
where I_r represents the reference image for registration and S^{-1} represents the inverse shift operation. After selection of<br />
J. Imaging Sci. Technol. 51(4)/Jul.-Aug. 2007 373<br />
Maik et al.: Color shift model-based segmentation and fusion <strong>for</strong> digital aut<strong>of</strong>ocusing<br />
Figure 10. Row and column shift vectors for color channel shifting and alignment with reference to the G channel.<br />
Figure 9. Multiple FA model image: (a) R, (b) G, and (c) B channels, respectively.<br />
the focal point using the SML operator and channel alignment<br />
by channel shifting, the appropriate regions need to be<br />
selected from the given image. Given the location <strong>of</strong> the<br />
pixel-of-interest for channel alignment, we simply select an approximate region defined by its neighborhood. For a more efficient fusion process, however, we can isolate the region around that pixel using neighborhood connectivity. For a given pixel-of-interest and eight-neighborhood connectivity, we can extract the region more accurately for the purpose of image fusion as<br />
Figure 11. Experimental setup: (a) digital camera with the FA model; (b) interior configuration of the FA; and (c) the R, G, and B sensor filters.<br />
Table II. A sample set of shift vectors estimated for different locations.<br />
Region | Red channel (r,c) | Blue channel (r,c)<br />
R1 (upper left) | (10,4) | (8,5)<br />
R2 (upper middle) | (7,6) | (8,5)<br />
R3 (upper right) | (10,5) | (7,6)<br />
R4 (center left) | (9,4) | (10,5)<br />
R5 (center middle) | (9,3) | (9,3)<br />
R6 (center right) | (8,6) | (10,2)<br />
R7 (lower left) | (10,4) | (9,4)<br />
R8 (lower middle) | (11,6) | (9,4)<br />
R9 (lower right) | (11,5) | (8,3)<br />
\[ F_k = \sum_{x=i-N}^{i+N} \;\sum_{y=j-N}^{j+N} f_p\big(x(s_i,\ldots,s_{i+k}),\; y(t_i,\ldots,t_{j+k})\big), \tag{7} \]<br />
where F_k represents the region around the pth focal point pixel f_p, and (s,t) represent the neighborhood connectivity. Even though the SML operator can provide an accurate measure, we need to extract the specific region from the<br />
Table III. Hardware configuration for the multiple FA system.<br />
Hardware title | Specifications<br />
Digital camera | Nikon D-100<br />
R, G, B filters | Green: Kodak Wratten Filter No. 58; Blue: Kodak Wratten Filter No. 47; Red: Kodak Wratten Filter No. 25<br />
Focusing | APO-Symmar-L-150; f-5.6, f-11, f-22<br />
Sensor | 23.7 × 15.6 mm RGB CCD; 6.31 million total pixels<br />
Lens mounting | Schneider Apo-Tele-Xenar<br />
Relative aperture / focal length | 5.6 / 250<br />
Shutter speed | 30 to 1/4,000 sec and bulb<br />
Color mode | Triple mode for R, G, and B channels<br />
image for fusion. One of the disadvantages of the FA model is that, for channel-aligned images with closely located focal points, the SML operator does not always perform well. Hence, we use a color-based region segmentation algorithm for extracting selective regions from the channel-aligned images if the SML results are not good enough.<br />
Figure 12. Experimental results: (a) the source image; (b) the focal point location for channel shifting and alignment; (c) the image after channel shifting to the new focal point location; and (d) the final image fused from (a) and (c).<br />
Figure 13. Experimental results: (a) the source image; (b) objects closer to the left side of the image are focused and objects to the right side are out of focus; (c) a similar setup with the focus on the right side; and (d) the final image fused from (b) and (c).<br />
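The eight-neighborhood region extraction around a focal point pixel can be sketched as a standard flood fill. The binary mask (e.g., pixels whose SML measure or segmentation label matches the focal point region) and the function name are our own illustrative assumptions.

```python
from collections import deque

import numpy as np

def grow_region(mask, seed):
    """Extract the 8-connected region of `mask` containing `seed`.

    Illustrative sketch; `mask` is a boolean array marking candidate
    pixels, `seed` is the (row, col) of the focal point pixel.
    """
    h, w = mask.shape
    region = np.zeros((h, w), dtype=bool)
    if not mask[seed]:
        return region
    queue = deque([seed])
    region[seed] = True
    while queue:  # breadth-first flood fill
        i, j = queue.popleft()
        for di in (-1, 0, 1):  # visit all eight neighbors
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w and mask[ni, nj] and not region[ni, nj]:
                    region[ni, nj] = True
                    queue.append((ni, nj))
    return region
```

Only the connected component containing the seed is returned, so disjoint blobs of the mask do not leak into the extracted fusion region.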
When fusing color images, features such as edges and textures<br />
should be preserved, and also the color blending consistency<br />
should be maintained. The fusion process is per<strong>for</strong>med on<br />
each level <strong>of</strong> the channel-aligned images in conjunction with<br />
SML to generate the composite image C. The reconstruction<br />
process integrates in<strong>for</strong>mation from different levels as<br />
\[ I_{c_k} = F_k \cdot I_{a_k} + \left(1 - F_k\right) I_{b_k} \tag{8} \]<br />
and<br />
\[ I_a = S^{-1}_{(r,c)\mid I_r}\!\left(I_a, I_r\right), \tag{9} \]<br />
where I_{c_k} represents the reconstructed image from the two input images I_{a_k} and I_{b_k}. The variable k represents the regions extracted based on their respective focus measures. The inverse shifting operation is described in Eq. (9), where I_r represents the reference image and (r,c)|I_r the corresponding shift vectors with respect to I_r. A typical problem of image fusion is the appearance of unnatural borders between the different decision regions due to overlapping blur at focus boundaries. To combat this, soft decision blending can be employed using smoothing or low-pass filtering of the saliency parameter F_k. In this paper Gaussian smoothing has<br />
been used to obtain the desired effect <strong>of</strong> blending. This creates<br />
weighted decision regions where a linear combination <strong>of</strong><br />
pixels in the two images A and B are used to generate corresponding<br />
pixels in the fused image C. We then have<br />
Figure 14. Experimental results: (a) the source image; (b)-(c) channel shifted images for focal points at near and center positions; and (d) the final image obtained by fusing (a)-(c).<br />
\[ I_{c_k} = \tilde{F}_k \cdot I_{a_k} + \left(1 - \tilde{F}_k\right) I_{b_k}, \tag{10} \]<br />
where F̃_k is a smoothed version of F_k. At times there can be missing pixels in the fused image that were not selected using the SML. The number of missing pixels varies from image to image, but is always confined to a very small portion of the entire image. The missing pixels have to be replaced with pixels from one of the available channel-aligned images. One simple way to find an appropriate replacement is to take the location of the missing pixel in the image and match it with the image whose focal point pixel is nearest to the missing pixel:<br />
\[ I(x,y) = \min\, \mathrm{dis}\big(I(x,y),\, f_p(x,y)\big). \tag{11} \]<br />
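The soft decision blending described above — Gaussian smoothing of the saliency mask followed by a linear combination of the two channel-aligned images — can be sketched as follows. The kernel width sigma and the separable-convolution implementation are our own choices.

```python
import numpy as np

def _gauss1d(sigma):
    """Normalized 1D Gaussian kernel with a 3-sigma radius."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x * x / (2.0 * sigma * sigma))
    return k / k.sum()

def soft_blend(Ia, Ib, Fk, sigma=2.0):
    """Smooth the saliency mask Fk, then mix the two images
    (illustrative sketch; sigma is an assumed parameter)."""
    k = _gauss1d(sigma)
    smooth = lambda a, axis: np.apply_along_axis(
        lambda r: np.convolve(r, k, mode="same"), axis, a)
    F = smooth(smooth(Fk.astype(float), 0), 1)  # separable 2D smoothing
    return F * Ia + (1.0 - F) * Ib
```

The smoothed mask turns the hard region boundary into a gradual transition, so fused pixels near focus boundaries become weighted averages of the two sources rather than abrupt switches.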
EXPERIMENTAL RESULTS<br />
Dataset Simulation and Experiments<br />
In order to demonstrate the per<strong>for</strong>mance <strong>of</strong> the proposed<br />
algorithm, we used test images captured using the proposed<br />
multiple FA model with multiple out-<strong>of</strong>-focus objects in the<br />
background. The experimental setup is shown in Figure 11<br />
which represents the camera used <strong>for</strong> the experiments along<br />
with the multiple FA configurations <strong>of</strong> the camera and the<br />
sensor filter. The hardware specifications used <strong>for</strong> the system<br />
are listed in Table III. Experiments were performed on RGB images of size 640×480. Here, each image contains multiple objects at different distances from the camera.<br />
Figure 12(a) represents a test image with low depth-<strong>of</strong>-field,<br />
where focus is on the objects close to the camera lens. The<br />
channels aligned <strong>for</strong> the focal point are shown in Fig. 12(b).<br />
The image after channel shifting is shown in Fig. 12(c). The<br />
blue object in the back <strong>of</strong> the astronaut was out <strong>of</strong> focus in<br />
Fig. 12(a), which is now in-focus in Fig. 12(c), whereas the<br />
other regions <strong>of</strong> the image tend to get defocused. The fused<br />
image <strong>of</strong> Figs. 12(a) and 12(c) is shown in Fig. 12(d). Similar<br />
results with multiple objects are shown in Figures 13 and<br />
14. The selected focal points for the channel alignment and shifting are represented in Figures 15(f) and 15(g).<br />
Table IV. Image quality comparisons for the various autofocusing methods.<br />
Autofocusing method | Prior information | Mode | Input frames | Operation | RMSE | PSNR<br />
Wiener filter | PSF | Gray | 1 | Pixel based | 12.35 | 23.36<br />
Iterative filter | NIL | Gray | 1 | Pixel based | 8.56 | 26.32<br />
Constrained least squares filter | Edge | Gray | 1 | Pixel based | 9.56 | 25.10<br />
Pyramid fusion | NIL | Gray, Color | At least 2 | Window and pixel based | 5.68 | 28.42<br />
Wavelet fusion | NIL | Gray, Color | At least 2 | Window and pixel based | 5.02 | 29.95<br />
Proposed | NIL | Color | 1 | Window and pixel based | 8.06 | 26.41<br />
Figure 15. Experimental results: (a) the source image; (b)-(d) channel shifted images for focal points in the right, center, and left positions; (e)-(g) the SML results for selected regions; and (h) the final image obtained by fusing (b)-(d).<br />
Figures 15(e)-15(g) illustrate the results of the SML operator for the selected regions. Figures 15(b)-15(d) represent images which have different out-of-focus regions obtained from a single source image using channel shifting. Figure 15(h) represents the fused image from Figs. 15(b)-15(d). The resulting fused image contains the in-focus regions from the respective images. The above set of results illustrates the feasibility of the proposed fusion-based algorithm for autofocusing.<br />
Per<strong>for</strong>mance Comparison<br />
For measuring the per<strong>for</strong>mance <strong>of</strong> the multiple FA configurations,<br />
various test images were captured using the proposed<br />
system as well as the ordinary Nikon D-100 camera.<br />
The test images were then processed <strong>for</strong> out-<strong>of</strong>-focus removal<br />
using the proposed channel shifting and fusion algorithm<br />
and the ordinary camera images were restored using<br />
some state-of-the-art restoration methods, including the Wiener filter, regularized iterative restoration, and the constrained least squares filter, as well as some existing fusion-based methods including pyramid decomposition and wavelet methods. The<br />
performance metrics, in the form of PSNR and RMSE, were obtained for the test images using the above algorithms, as given in Table IV. As can be seen in the table, the images captured using the multiple FA configurations show some degradation compared to conventional camera images when there is no out-of-focus blur. With out-of-focus blur, however, the quality of the conventional camera images drops drastically after restoration processing and becomes more or less comparable with that of the images restored using color channel shifting and fusion. The fusion methods give slightly higher image quality, but they require multiple source input images to achieve good performance, whereas the proposed method achieves comparable quality with just a single source input image, making it more suitable and efficient for a wider range of potential applications.<br />
For aligning the blue channel with the green channel<br />
the pixels have to be shifted in an upward direction and<br />
towards the left or diagonally to the left and vice versa <strong>for</strong><br />
the red channel. In our experiments we tried precomputing<br />
the shift vectors at nine different locations on a test image<br />
manually using the above convention. We found that the<br />
shift vectors differ slightly <strong>for</strong> different regions on the image,<br />
as shown in Table II. These shift vectors were then used<br />
accordingly <strong>for</strong> various test images based on the location <strong>of</strong><br />
the focal point pixel in one <strong>of</strong> the nine regions. The corresponding<br />
shift vectors were then used to align the channels.<br />
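For illustration, the lookup of precomputed shift vectors by focal point location can be sketched as below, using the Table II values. The assumption that the nine regions tile the 640×480 image as a uniform 3×3 grid is ours.

```python
# (red (r, c), blue (r, c)) shift vectors per region R1..R9, from Table II
TABLE_II = [
    ((10, 4), (8, 5)), ((7, 6), (8, 5)), ((10, 5), (7, 6)),   # upper row
    ((9, 4), (10, 5)), ((9, 3), (9, 3)), ((8, 6), (10, 2)),   # center row
    ((10, 4), (9, 4)), ((11, 6), (9, 4)), ((11, 5), (8, 3)),  # lower row
]

def shift_vectors_for(x, y, width=640, height=480):
    """Pick the Table II shift vectors for the region containing the
    focal point pixel (x, y); a uniform 3x3 tiling is assumed."""
    col = min(3 * x // width, 2)
    row = min(3 * y // height, 2)
    return TABLE_II[3 * row + col]
```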
CONCLUSIONS<br />
In this paper, we proposed an autofocusing algorithm which restores an image containing multiple, differently out-of-focus objects. A novel FA configuration is proposed for modeling out-of-focus blur in images. The proposed algorithm starts from a single input image, from which multiple source images with different apertures are generated using channel shifting. Fusion is then carried out on regions segmented from each source image using the SML operator. The soft decision fusion algorithm suppresses undesired artifacts in the merged regions of the fused images. Experimental results show that the proposed algorithm works well for images with multiple out-of-focus objects.<br />
ACKNOWLEDGMENTS<br />
This research was supported by Seoul Future Contents Convergence<br />
(SFCC) Cluster established by Seoul R&BD Program<br />
and by the Korea <strong>Science</strong> and Engineering Foundation<br />
(KOSEF) through the National Research Laboratory Program<br />
funded by the Ministry <strong>of</strong> <strong>Science</strong> and Technology<br />
(M103 0000 0311-06J0000-31110).<br />
REFERENCES<br />
1 Y. Ishihara and K. Tanigaki, “A high photosensitivity IL-CCD image sensor with monolithic resin lens array”, in Proc. IEEE Integrated Electronic and Digital Microscopy (IEEE Press, Piscataway, NJ, 1983) pp. 497–500.<br />
2 J. Tanida, R. Shogenji, Y. Kitamura, K. Yamada, M. Miyamoto, and S. Miyatake, “Color imaging with an integrated compound imaging system”, Opt. Express 11, 2109–2117 (2003).<br />
3 E. R. Dowski and G. E. Johnson, “Wavefront coding: a modern method of achieving high performance and/or low cost imaging systems”, Proc. SPIE 3779, 137–145 (1999).<br />
4 S. Kim and J. K. Paik, “Out-<strong>of</strong>-focus blur estimation and restoration <strong>for</strong><br />
digital aut<strong>of</strong>ocusing system”, Electron. Lett. 34, 1217–1219 (1998).<br />
5 J. Shin, V. Maik, J. Lee, and J. Paik, “Multi-object digital aut<strong>of</strong>ocusing<br />
using image fusion”, Lect. Notes Comput. Sci. 3708, 806–813 (2005).<br />
6 V. Maik, J. Shin, and J. Paik, “Regularized image restoration by means <strong>of</strong><br />
fusion <strong>for</strong> digital aut<strong>of</strong>ocusing”, Lecture Notes in Artificial Intelligence<br />
3802, 929–934 (2005).<br />
7 G. Ligthart and F. Groen, “A comparison of different autofocus algorithms”, IEEE Int. Conf. Pattern Recognition (IEEE Press, Piscataway, NJ, 1992) pp. 597–600.<br />
8 M. Subbarao, T. C. Wei, and G. Surya, “Focused image recovery from<br />
two defocused images recorded with different camera settings”, IEEE<br />
Trans. Image Process. 4, 1613–1628 (1995).<br />
9 M. Matsuyama, Y. Tanji, and M. Tanka, “Enhancing the ability <strong>of</strong><br />
NAS-RIF algorithm <strong>for</strong> blind image deconvolution”, Proc. IEEE Int.<br />
Conf. Circuits and Systems, vol. 4 (IEEE Press, Piscataway, NJ, 2000) pp.<br />
553–556.<br />
10 A. Tekalp and H. Kaufman, “On statistical identification of a class of linear space-invariant image blurs using non-minimum-phase ARMA models”, IEEE Trans. Acoust., Speech, Signal Process. ASSP-36, 1360–1363 (1988).<br />
11 K. Kodama, H. Mo, and A. Kubota, “All-in-focus image generation by<br />
merging multiple differently focused images in three-dimensional<br />
frequency domain”, Lect. Notes Comput. Sci. 3768, 303–314 (2005).<br />
12 K. Aizawa, K. Kodama, and A. Kubota, “Producing object-based special<br />
effects by fusing multiple differently focused images”, IEEE Trans.<br />
Circuits Syst. Video Technol. 10, 323–330 (2000).<br />
13 L. Bogoni and M. Hansen, “Pattern selective color image fusion”, Int. J.<br />
Pattern Recognit. Artif. Intell. 34, 1515–1526 (2001).<br />
14 S. Li, J. T. Kwok, and Y. Wang, “Combination <strong>of</strong> images with diverse<br />
focuses using the spatial frequency”, Int. J. Inf. Fusion 2, 169–176<br />
(2001).<br />
15 Z. Zhang and R. S. Blum, “A categorization of multiscale-decomposition-based image fusion schemes with a performance study for a digital camera application”, Proc. IEEE 87, 1315–1326 (1999).<br />
16 V. Maik, J. Shin, J. Lee, and J. Paik, “Pattern selective image fusion <strong>for</strong><br />
multi-focus image reconstruction”, Lect. Notes Comput. Sci. 3691,<br />
677–684 (2005).<br />
17 A. Kubota, K. Kodama, and K. Aizawa, “Registration and blur estimation<br />
method <strong>for</strong> multiple differently focused images”, Proc. IEEE Int. Conf.<br />
Image Processing, vol. 2 (IEEE Press, Piscataway, NJ, 1999) pp. 515–519.<br />
18 S. K. Lee, S. H. Lee, and J. S. Choi, “Depth measurement using frequency<br />
analysis with an active projection”, Proc. IEEE Conf. Image Processing,<br />
vol. 3 (IEEE Press, Piscataway, NJ, 1999) 906–909.<br />
19 G. Piella, “A general framework <strong>for</strong> multi resolution image fusion: From<br />
pixels to regions”, J. Inf. Fusion 4, 259–280 (2003).<br />
20 T. Adelson and J. Y. Wang, “Single lens stereo with a plenoptic camera”,<br />
IEEE Trans. Pattern Anal. Mach. Intell. 2, 99–106 (1992).<br />
<strong>Journal</strong> <strong>of</strong> <strong>Imaging</strong> <strong>Science</strong> and Technology® 51(4): 380–385, 2007.<br />
© <strong>Society</strong> <strong>for</strong> <strong>Imaging</strong> <strong>Science</strong> and Technology 2007<br />
Error Spreading Control in Image Steganographic<br />
Embedding Schemes Using Unequal Error Protection<br />
Ching-Nung Yang, Guo-Jau Chen and Tse-Shih Chen<br />
CSIE Dept., National Dong Hwa University, #1, Da Husueh Rd., Sec. 2, Hualien, Taiwan<br />
E-mail: cnyang@mail.ndhu.edu.tw<br />
Rastislav Lukac<br />
The Edward S. Rogers Sr. Department <strong>of</strong> ECE, University <strong>of</strong> Toronto, 10 King’s College Road, Toronto,<br />
Ontario, M5S 3G4 Canada<br />
Abstract. A steganographic scheme proposed by van Dijk and Willems can alter a relatively small number of bits to hide the secret compared to other schemes, while reducing the distortion and improving the resistance against steganalysis. However, one bit error in the embedding scheme of van Dijk and Willems may result in a multibit error when extracting the hidden data. This problem is called error spreading. It is observed that only some single-bit errors suffer from error spreading. In this paper, we propose a new steganographic solution which takes advantage of unequal error protection codes and allows for different protection of different secret bits. Thus the proposed solution can effectively protect the bits which could suffer from error spreading. In addition, it saves parity bits, thus greatly reducing the number of bit alterations compared to the relevant previous schemes. Experimentation using various test images indicates that the proposed solution achieves a tradeoff between the performance and the protection of the embedded secret. © 2007 Society for Imaging Science and Technology.<br />
DOI: 10.2352/J.ImagingSci.Technol.(2007)51:4(380)<br />
INTRODUCTION<br />
Steganography is a method <strong>of</strong> hiding or embedding the secret<br />
message into a cover media to ensure that an unintended<br />
party will not be aware <strong>of</strong> the existence <strong>of</strong> the embedded<br />
secret. Popular steganographic techniques <strong>for</strong> visual<br />
data protection embed the secret message such as a binary<br />
image by manipulating the least significant bit (LSB) plane<br />
<strong>of</strong> a cover image, thus producing the so-called stegoimage.<br />
In the simplest form, the data embedding can be realized by setting c_0 = s, with c_0 denoting the least significant bit of a pixel from the cover image and s denoting the secret bit.<br />
A more efficient steganographic method was proposed<br />
by van Dijk and Willems. 1 Their coded LSB method attempts<br />
to reduce the distortion when the noise (e.g., due to faulty<br />
communication channels) or active attacks by the third party<br />
(intentional modifications <strong>of</strong> some insignificant bits in a<br />
cover image to prevent the extraction <strong>of</strong> hidden secret by the<br />
authorized user) are introduced into the stegoimage. However,<br />
in some situations, one error bit produced during the<br />
stegoimage transmission phase or by active attacks <strong>of</strong>ten results<br />
in two or more errors when decoding (extracting) the<br />
Received Oct. 28, 2006; accepted <strong>for</strong> publication Mar. 22, 2007.<br />
1062-3701/2007/51(4)/380/6/$20.00.<br />
embedded message. This phenomenon, known as an error<br />
spreading problem, affects the clearness <strong>of</strong> the extracted secret<br />
image. A straight<strong>for</strong>ward application <strong>of</strong> error correction,<br />
i.e., using stegoencoding to hide the secret first and then<br />
adding the parity to provide the error correcting (EC) capability,<br />
will inevitably increase the required amount <strong>of</strong> bit<br />
alterations. Thus, the risk <strong>of</strong> being detected will increase.<br />
Zhang and Wang 2 proposed a new stegoencoding approach<br />
combining the coded LSB and EC capability simultaneously<br />
to address the error spreading problem. Their solution has<br />
the same error correcting capability <strong>for</strong> all protected bits, but<br />
according to our observation the error spreading does not<br />
affect all spatial locations <strong>of</strong> the secret image.<br />
In this paper, we propose a more reasonable solution,<br />
unequal error protection (UEP) codes, to obtain the different<br />
protection ability <strong>for</strong> nonaffected and affected bits and to<br />
save parity bits. It will be shown that the proposed solution<br />
outper<strong>for</strong>ms the previous relevant solutions in terms <strong>of</strong> the<br />
trade<strong>of</strong>f between the per<strong>for</strong>mance and the protection <strong>of</strong> the<br />
embedded secret.<br />
The rest <strong>of</strong> this paper is organized as follows. In the<br />
Coded LSB Scheme Section, the coded LSB scheme is described.<br />
In the EC Codes Based Error Correction: Zhang-<br />
Wang Scheme Section, Zhang-Wang stegoencoding based on<br />
EC codes is presented to show a solution <strong>for</strong> the error<br />
spreading problem. Our scheme based on UEP codes is proposed<br />
in the UEP Codes Based Error Correction: The Proposed Scheme Section. Motivation and design characteristics<br />
are discussed in detail. In the Comparison and Experimental<br />
Results Section, the proposed method is tested using a variety<br />
<strong>of</strong> test images. The effect <strong>of</strong> UEP codes-based data embedding<br />
is evaluated and compared with the previous approaches.<br />
Finally, conclusions are drawn in the Conclusions<br />
Section.<br />
CODED LSB SCHEME<br />
In the plain LSB embedding scheme, the secret bits are hidden<br />
by simply replacing LSBs <strong>of</strong> the cover pixels. Due to the<br />
noiselike appearance <strong>of</strong> the LSB plane <strong>of</strong> natural images,<br />
embedding n bits implies, on average, the alteration of n/2 original LSBs. To reduce the number of altered LSBs and<br />
Table I. Eight cosets for the coded LSB scheme with l=3, n=7 using the (7,4) Hamming code with G(x) = x³ + x² + 1.<br />
L0: 0000000, 0001101, 0011010, 0010111, 0110100, 0111001, 0101110, 0100011, 1101000, 1100101, 1110010, 1111111, 1011100, 1010001, 1000110, 1001011<br />
L1: 0000001, 0001100, 0011011, 0010110, 0110101, 0111000, 0101111, 0100010, 1101001, 1100100, 1110011, 1111110, 1011101, 1010000, 1000111, 1001010<br />
L2: 0000010, 0001111, 0011000, 0010101, 0110110, 0111011, 0101100, 0100001, 1101010, 1100111, 1110000, 1111101, 1011110, 1010011, 1000100, 1001001<br />
L3: 0000011, 0001110, 0011001, 0010100, 0110111, 0111010, 0101101, 0100000, 1101011, 1100110, 1110001, 1111100, 1011111, 1010010, 1000101, 1001000<br />
L4: 0000100, 0001001, 0011110, 0010011, 0110000, 0111101, 0101010, 0100111, 1101100, 1100001, 1110110, 1111011, 1011000, 1010101, 1000010, 1001111<br />
L5: 0000101, 0001000, 0011111, 0010010, 0110001, 0111100, 0101011, 0100110, 1101101, 1100000, 1110111, 1111010, 1011001, 1010100, 1000011, 1001110<br />
L6: 0000110, 0001011, 0011100, 0010001, 0110010, 0111111, 0101000, 0100101, 1101110, 1100011, 1110100, 1111001, 1011010, 1010111, 1000000, 1001101<br />
L7: 0000111, 0001010, 0011101, 0010000, 0110011, 0111110, 0101001, 0100100, 1101111, 1100010, 1110101, 1111000, 1011011, 1010110, 1000001, 1001100<br />
preserve the original features such as edges and fine details of the cover image, the coded LSB scheme of Ref. 1 divides the secret image into chips of l bits. Each l-bit chip is then embedded into the LSBs of n pixels using (n,k) cyclic codes, where n is the code length and l = n − k.<br />
Let \( G_1(x) = \sum_{i=0}^{k} g_{1,i} x^i \) and \( G_2(x) = \sum_{i=0}^{l} g_{2,i} x^i \) be two binary polynomials of degree k and l, respectively, such that \( G_1(x) \cdot G_2(x) = x^n + 1 \). Using (n,k) cyclic codes with the generating function G(x) = G_2(x), it is possible to construct 2^l code sets, each consisting of 2^k unique codewords. Thus, each code set can be used to describe one l-bit secret chip by choosing the nearest codeword to represent the embedded secret, as described in Algorithm 1.<br />
Algorithm 1
Inputs: secret message of l bits s_{l−1}s_{l−2}…s_0, and cover image O with n LSBs c_{n−1}c_{n−2}…c_0.
Output: stegoimage O′ with n LSBs c′_{n−1}c′_{n−2}…c′_0.
Step 1: Choose one cyclic (n,k) code with the generating function G(x)=g_l x^l + ⋯ + g_1 x + g_0, and encode all 2^k k-tuples to construct a code set (coset) of 2^k codewords. Choose one n-tuple that does not appear in this coset, and add this unused n-tuple to all the codewords in the coset to construct another coset.
Step 2: Repeat Step 1 until all 2^n codewords are used. The process generates 2^l different cosets L_0, L_1, …, L_{2^l−1}, with 2^k codewords in each coset.
Step 3: Encrypt the secret bits s_{l−1}, s_{l−2}, …, s_0 by choosing the coset L_i with i=Σ_{j=0}^{l−1} s_j 2^j. Then, find the codeword c′_{n−1}c′_{n−2}…c′_0 in the coset L_i such that the Hamming distance between c_{n−1}c_{n−2}…c_0 and c′_{n−1}c′_{n−2}…c′_0 is minimum.
Step 4: Deliver the n LSBs c′_{n−1}c′_{n−2}…c′_0 to the corresponding pixels in the embedded stegoimage O′.
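In effect, Steps 1-3 are syndrome coding: each coset L_i collects exactly the n-bit words whose remainder modulo G(x) equals the binary value i, so embedding reduces to finding the word with the target remainder that is nearest to the cover LSBs. The following Python sketch restates the algorithm in that form for (n,k)=(7,4); the function names and the brute-force search are ours, not the paper's:

```python
def poly_mod(v, g=0b1101, gdeg=3, n=7):
    """Remainder of the bit-polynomial v modulo g over GF(2); 0b1101 is G(x)=x^3+x^2+1."""
    for shift in range(n - 1, gdeg - 1, -1):
        if (v >> shift) & 1:
            v ^= g << (shift - gdeg)
    return v

def embed(cover_lsbs, secret):
    """Nearest 7-bit word whose coset index (remainder mod G) equals the 3-bit secret."""
    return min((w for w in range(128) if poly_mod(w) == secret),
               key=lambda w: bin(w ^ cover_lsbs).count("1"))

def extract(stego_lsbs):
    """Recover the secret as the coset index of the stego LSBs."""
    return poly_mod(stego_lsbs)
```

For the worked (7,4) example in the text, embed(0b0001110, 0b101) returns 0b1001110, one bit away from the cover LSBs; averaging the alteration count over all 128 covers and 8 secrets gives l_alt = 0.875.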
As shown in Ref. 2, the efficiency of steganographic schemes can be demonstrated using the so-called embedding rate (ER), defined as the number of embedded bits per pixel, i.e., ER=l/n. Another suitable criterion is the so-called embedding efficiency (EE), calculated as the number of embedded bits per altered bit, i.e., EE=l/l_alt, where l_alt denotes the average number of LSB alterations when l secret bits are embedded into n LSBs. The value l_alt can be calculated from all codewords in the cosets by a computer program. The ER parameter is suitable for discussing the embedding capacity, whereas the EE parameter is used to evaluate the distortion in the cover image. It is obvious that for the plain LSB embedding scheme (where l=n), ER=l/n=n/n=100% and EE=l/l_alt=n/(n/2)=200%. For comparison purposes, the values corresponding to Algorithm 1 (the coded LSB scheme with l=3 and n=7) are provided below.
Let us assume Algorithm 1 with n=7 and l=3, i.e., the objective is to embed three secret bits into seven LSBs. The above setting implies that k=4, resulting in x^7+1=(x^4+x^3+x^2+1)(x^3+x^2+1) and G(x)=x^3+x^2+1. After the first two steps of Algorithm 1, we construct eight cosets with sixteen codewords in each coset (see Table I). Suppose that 101 denotes the secret and 0001110 denotes the original set of LSBs. We use the coset L_5 to find the codeword 1001110, which has the minimum Hamming distance (equal to one) from 0001110; thus only one LSB is altered to embed three secret bits. The embedding rate and embedding efficiency are ER=3/7=42.9% and EE=3/0.875=343%, respectively. Note that l_alt=0.875 for Table I. It is easy to see that the plain LSB embedding scheme (ER=100%, EE=200%) has a larger embedding capacity than the coded LSB scheme. On the other hand, the coded LSB scheme modifies fewer bits, thus reducing the distortion in the cover image.
However, the coded LSB scheme suffers from the error spreading problem: the occurrence of one error bit in the encoded LSBs may cause more than one bit error during the secret message extraction process. Considering the above example, we can use 0000000 to carry the secret 000. Suppose that there is one error bit, for example, 0010000. Then, according to Table I, the extracted secret is 111, i.e., there are three error bits. However, the error pattern 0000001 results in the extracted secret 001, which corresponds to only one error bit (no error spreading). This sug-
J. Imaging Sci. Technol. 51(4)/Jul.-Aug. 2007 381
Yang et al.: Error spreading control in image steganographic embedding schemes using unequal error protection<br />
Table II. Eight cosets for the EC-based coded LSB scheme (Zhang-Wang scheme) with l=3, N_E=11, and one-error correcting capability, using the (7,4) Hamming code and the (11,7) shortened Hamming code.
L_0: 00000000000, 10010001101, 10110011010, 00100010111, 11110110100, 01100111001,
     01000101110, 11010100011, 01111101000, 11101100101, 11001110010, 01011111111,
     10001011100, 00011010001, 00111000110, 10101001011
L_1: 11100000001, 01110001100, 01010011011, 11000010110, 00010110101, 10000111000,
     10100101111, 00110100010, 10011101001, 00001100100, 00101110011, 10111111110,
     01101011101, 11111010000, 11011000111, 01001001010
L_2: 01010000010, 11000001111, 11100011000, 01110010101, 10100110110, 00110111011,
     00010101100, 10000100001, 00101101010, 10111100111, 10011110000, 00001111101,
     11011011110, 01001010011, 01101000100, 11111001001
L_3: 10110000011, 00100001110, 00000011001, 10010010100, 01000110111, 11010111010,
     11110101101, 01100100000, 11001101011, 01011100110, 01111110001, 11101111100,
     00111011111, 10101010010, 10001000101, 00011001000
L_4: 10100000100, 00110001001, 00010011110, 10000010011, 01010110000, 11000111101,
     11100101010, 01110100111, 11011101100, 01001100001, 01101110110, 11111111011,
     00101011000, 10111010101, 10011000010, 00001001111
L_5: 01000000101, 11010001000, 11110011111, 01100010010, 10110110001, 00100111100,
     00000101011, 10010100110, 00111101101, 10101100000, 10001110111, 00011111010,
     11001011001, 01011010100, 01111000011, 11101001110
L_6: 11110000110, 01100001011, 01000011100, 11010010001, 00000110010, 10010111111,
     10110101000, 00100100101, 10001101110, 00011100011, 00111110100, 10101111001,
     01111011010, 11101010111, 11001000000, 01011001101
L_7: 00010000111, 10000001010, 10100011101, 00110010000, 11100110011, 01110111110,
     01010101001, 11000100100, 01101101111, 11111100010, 11011110101, 01001111000,
     10011011011, 00001010110, 00101000001, 10111001100
gests that if the error falls in the rightmost three positions, i.e., 0000100, 0000010, and 0000001, then there is still only one error in the extracted secret and the damage is not expanded. However, for the other four error patterns 1000000, 0100000, 0010000, and 0001000, the decoded secret is 110, 011, 111, and 101, respectively, indicating that the extracted secret suffers from more than one error bit. Since such errors affect the quality of the extracted secret message, the scheme should be improved by using an error correcting mechanism, such as the one described below based on EC codes.
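Because extraction is linear, an error pattern e shifts the extracted secret by the coset index of e, so the number of corrupted secret bits is simply the Hamming weight of that index. A short sketch (our formulation of the table lookup above) enumerates all seven single-bit error positions:

```python
G = 0b1101  # G(x) = x^3 + x^2 + 1

def coset_index(v):
    """Remainder of the 7-bit word v modulo G over GF(2), i.e., its coset number."""
    for shift in range(6, 2, -1):
        if (v >> shift) & 1:
            v ^= G << (shift - 3)
    return v

# Number of wrong secret bits caused by a single error in LSB position i (0 = rightmost)
spread = {i: bin(coset_index(1 << i)).count("1") for i in range(7)}
```

This reproduces the cases discussed in the text: positions 0-2 corrupt a single secret bit, while positions 3-6 corrupt two or three bits (e.g., the error pattern 0010000, position 4, flips all three secret bits).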
EC CODES BASED ERROR CORRECTION:<br />
ZHANG-WANG SCHEME<br />
Following the previous approach, a coded LSB scheme with ER=l/n is constructed using the generating function G(x)=G_2(x). Then, the 2^k n-tuple vectors in each coset are encoded into an (N_E, n) cyclic EC code, with N_E denoting the code length of the EC code. Although in this improved coded LSB scheme the embedding rate is reduced to ER=l/N_E, the scheme now has the error correcting capability of the (N_E, n) cyclic EC code.
Let us consider the previous example with one-error correcting capability and parameters n=7, l=3, and N_E=11. A (7,4) Hamming code is used to embed three secret bits, and an (11,7) shortened Hamming code is used to achieve one-error correcting capability. For example, embedding the secret 000 into the 7-tuple 0001101 first and then prepending the parity 1001 forms the codeword 10010001101, which is listed as the second codeword in the coset L_0 (see Table II, listing all eight generated cosets). If one error occurs in the sixth position, resulting in the codeword 10010101101, then the minimum Hamming distance is associated with the codeword 10010001101 from L_0, and the extracted secret is 000. Because of the error correcting capability of the (11,7) shortened Hamming code, the error is always corrected no matter where the error bit occurs, thus overcoming the error spreading problem. On the other hand, the approach is less efficient than the conventional method, as it reduces the embedding rate from 42.9% to 3/11=27.3% and also decreases the embedding efficiency from 343% to 3/2.625=114% (note that l_alt=2.625 for Table II). Therefore, a different correction mechanism is needed. Since only an error in the first k bits of an n-tuple will produce additional errors in the secret extraction phase, it should be sufficient to ensure the validity of the first k bits instead of all n bits.
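The exact generator of the (11,7) shortened Hamming code behind Table II is not spelled out here, so the sketch below builds a generic one-error-correcting (11,7) code from a parity-check matrix with distinct nonzero 4-bit columns. The column choice, and hence the parity bits, are our assumption and will not reproduce Table II bit-for-bit, but the encode/correct round trip illustrates why any single bit error is repaired before secret extraction:

```python
MSG_COLS = [3, 5, 6, 7, 9, 10, 11]   # 7 distinct non-unit 4-bit columns (assumed, not the paper's)
PAR_COLS = [8, 4, 2, 1]              # identity block, so H = [MSG_COLS | I_4]
COLS = MSG_COLS + PAR_COLS           # one parity-check column per codeword bit

def encode(m):
    """Append 4 parity bits to a 7-bit word so that the 4-bit syndrome is zero."""
    p = 0
    for i in range(7):
        if (m >> (6 - i)) & 1:
            p ^= MSG_COLS[i]
    return (m << 4) | p

def correct(r):
    """Fix up to one flipped bit in an 11-bit word via its syndrome; return the 7-bit word."""
    syn = 0
    for i in range(11):
        if (r >> (10 - i)) & 1:
            syn ^= COLS[i]
    if syn:                          # syndrome equals the column of the flipped bit
        r ^= 1 << (10 - COLS.index(syn))
    return r >> 4
```

Flipping any single one of the 11 bits of encode(0b0001101) and running correct() recovers 0b0001101, matching the behavior the scheme relies on.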
UEP CODES BASED ERROR CORRECTION: THE<br />
PROPOSED SCHEME<br />
UEP codes, a category of EC codes, allow different protection for different bit locations. 3,4 In practice, some information bits are protected against a greater number of errors than other, less significant, information bits. Basically, a UEP code can be denoted as (n, k, d_1, d_2, …, d_k). By employing UEP codes to protect the message, the occurrence of no more than ⌊(d_i−1)/2⌋ errors in the transmitted codeword does not affect the correctness of the i-th bit in the decoded message.
It was noted that the first k bits of the vectors need enhanced protection to prevent error spreading. Therefore, we propose to apply UEP codes to assure the correctness of these k bits and to reduce the number of redundant parity bits. The main difference between the UEP-based scheme and the EC-based scheme relates to the use of
Table III. Eight cosets for the UEP-based coded LSB scheme (the proposed scheme) with l=3, N_U=10, using the (10,7,3,3,3,3,2,2,2) UEP code. Asterisks mark the codewords used in the example.
L_0: 0000000000*, 1100001101, 1000011010, 0100010111, 1000110100, 0100111001,
     0000101110, 1100100011, 0001101000, 1101100101, 1001110010, 0101111111,
     1001011100, 0101010001, 0001000110, 1101001011
L_1: 0010000001*, 1110001100, 1010011011, 0110010110, 1010110101, 0110111000,
     0010101111, 1110100010, 0011101001, 1111100100, 1011110011, 0111111110,
     1011011101, 0111010000, 0011000111, 1111001010
L_2: 0100000010, 1000001111, 1100011000, 0000010101, 1100110110, 0000111011,
     0100101100, 1000100001, 0101101010, 1001100111, 1101110000, 0001111101,
     1101011110, 0001010011, 0101000100, 1001001001
L_3: 0110000011, 1010001110, 1110011001, 0010010100, 1110110111, 0010111010,
     0110101101, 1010100000, 0111101011, 1011100110, 1111110001, 0011111100,
     1111011111, 0011010010, 0111000101, 1011001000
L_4: 1000000100*, 0100001001, 0000011110, 1100010011, 0000110000, 1100111101,
     1000101010, 0100100111, 1001101100, 0101100001, 0001110110, 1101111011,
     0001011000, 1101010101, 1001000010, 0101001111
L_5: 1010000101, 0110001000, 0010011111, 1110010010, 0010110001, 1110111100,
     1010101011, 0110100110, 1011101101, 0111100000, 0011110111, 1111111010,
     0011011001, 1111010100, 1011000011, 0111001110
L_6: 1100000110, 0000001011, 0100011100, 1000010001, 0100110010, 1000111111,
     1100101000, 0000100101, 1101101110, 0001100011, 0101110100, 1001111001,
     0101011010, 1001010111, 1101000000, 0001001101
L_7: 1110000111, 0010001010, 0110011101, 1010010000, 0110110011, 1010111110,
     1110101001, 0010100100, 1111101111, 0011100010, 0111110101, 1011111000,
     0111011011, 1011010110, 1111000001, 0011001100
(N_U, n, d_1, d_2, …, d_k) UEP codes, with N_U denoting the code length of the UEP code, instead of (N_E, n) EC codes. Since the value of N_U is smaller than N_E, the proposed UEP-based coded LSB scheme will have a higher embedding rate while still providing the same protection of the secret against error spreading.
As before, let us consider the scenario with one-error correcting capability and parameters n=7, l=3, and N_U=10. Suppose that the (10,7,3,3,3,3,2,2,2) UEP code is used to ensure the protection against errors. The corresponding eight cosets are listed in Table III. Assuming that the embedded secret and the encoded result are, for instance, 000 and 0000000000, respectively, one error can occur in the following cases:
• The presence of the error in the 7th bit (from the right) implies the codeword 0001000000. As shown in Table III, there is only one codeword, 0000000000, in the coset L_0 with unit Hamming distance to 0001000000. In this case, the recovered secret is 000, i.e., there is no processing error.
• If the error affects the 3rd bit (from the right), beyond the error correcting capability of the considered UEP code, then 0000000000 in the coset L_0 and 1000000100 in the coset L_4 are the two codewords with unit Hamming distance to the codeword 0000000100 under consideration. The decoding process can result in the extracted secret 000 or 100, respectively. Thus, even in the latter situation (i.e., 100), there is still only one error, suggesting no error spreading.
• Finally, the alteration of the 8th bit (from the right) due to the error implies 0010000000, which will be decoded as 0000000000 in the coset L_0 or 0010000001 in the coset L_1. This suggests that the secret can be extracted as 000 or 001, respectively. Similar to the previous case, even when 001 is used as the extracted secret, the proposed method still overcomes the error spreading problem.
It is evident that when no more than one error falls in the first four bits of the original 7-bit vector, the use of UEP codes ensures that the first four bits will be correctly decoded and the error will be corrected, as ⌊(3−1)/2⌋=1; i.e., the minimum Hamming distance between the first four bits of two codewords is three, providing one-error correcting capability for the first four bits. However, no error spreading is observed even when one error occurs in other places, since a single error there results, in the worst case, in a single error in the decoded secret bits. The achieved embedding rate and embedding efficiency are ER=l/N_U=30% and EE=l/l_alt=133%. Note that l_alt=2.25 for Table III.
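The parity rule behind Table III can be read off its codewords: each 10-bit entry is the 7-bit coset word prefixed by three parity bits that depend linearly on it. The sketch below uses parity columns recovered that way (the encoder layout is our reconstruction, not stated in the paper) and decodes by brute force over all 128 codewords, returning every secret at minimum distance:

```python
P_COLS = [0b110, 0b101, 0b101, 0b011, 0b100, 0b010, 0b001]  # parity columns read off Table III

def uep_encode(v):
    """Prefix the 7-bit coset word v with its 3 UEP parity bits."""
    p = 0
    for i in range(7):
        if (v >> (6 - i)) & 1:
            p ^= P_COLS[i]
    return (p << 7) | v

def coset_index(v, g=0b1101):
    """Secret carried by v: remainder of v modulo G(x) = x^3 + x^2 + 1 over GF(2)."""
    for shift in range(6, 2, -1):
        if (v >> shift) & 1:
            v ^= g << (shift - 3)
    return v

def uep_decode(r):
    """All candidate secrets of the codewords nearest to the received 10-bit word r."""
    words = [uep_encode(v) for v in range(128)]
    dmin = min(bin(r ^ w).count("1") for w in words)
    return {coset_index(w & 0x7F) for w in words if bin(r ^ w).count("1") == dmin}
```

The three bullet cases above come out as uep_decode(1 << 6) == {0b000}, uep_decode(1 << 2) == {0b000, 0b100}, and uep_decode(1 << 7) == {0b000, 0b001}: every nearest-codeword candidate differs from the true secret 000 in at most one bit.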
By employing the familiar representation used for UEP codes, the (11,7) shortened Hamming code can be represented as (11,7,3,3,3,3,3,3,3). Therefore, if we protect the first four bits only, then we can save one redundant checking bit by using the (10,7,3,3,3,3,2,2,2) UEP code. Note that the error-correcting capability of (N_U, n, d_1, d_2, …, d_k) UEP codes is no better than that of (N_E, n, d) EC codes. However, UEP codes offer a better embedding rate and embedding efficiency and also overcome the error spreading problem.
COMPARISON AND EXPERIMENTAL RESULTS<br />
Different analytical tools, such as sample pair analysis 5 and image quality metrics, 6 are used to analyze steganographic solutions. To resist various attacks on the stegoimage while still providing the required performance, an ideal steganographic scheme should be constructed by considering the relation between the cover image and the
Table IV. Comparison of the conventional, EC-based, and UEP-based coded LSB schemes for n=2 to 12.

Conventional scheme | Zhang-Wang EC-based scheme | Proposed UEP-based scheme
(n,k,l)   ER=l/n | (N_E,n,d)  ER=l/N_E | (N_U,n,d_1,d_2,…,d_k)          ER=l/N_U
(2,1,1)   50.0%  | (5,2,3)    20.0%    | (4,2,3,2)                       25.0%
(4,1,3)   75.0%  | (7,4,3)    42.9%    | (6,4,3,2,2,2)                   50.0%
(5,4,1)   20.0%  | (9,5,4)    11.1%    | (8,5,3,3,3,3,2)                 12.5%
(6,4,2)   33.3%  | (10,6,4)   20.0%    | (9,6,3,3,3,3,2,2)               22.2%
(7,4,3)   42.9%  | (11,7,4)   27.3%    | (10,7,3,3,3,3,2,2,2)            30.0%
(8,4,4)   50.0%  | (12,8,4)   33.3%    | (11,8,3,3,3,3,2,2,2,2)          36.4%
(9,4,5)   55.6%  | (13,9,4)   38.5%    | (12,9,3,3,3,3,2,2,2,2,2)        41.7%
(10,4,6)  60.0%  | (14,10,4)  42.9%    | (13,10,3,3,3,3,2,2,2,2,2,2)     46.2%
(11,4,7)  63.6%  | (15,11,4)  46.7%    | (14,11,3,3,3,3,2,2,2,2,2,2,2)   50.0%
(12,4,8)  66.7%  | (17,12,5)  47.1%    | (15,12,3,3,3,3,2,2,2,2,2,2,2,2) 53.3%
stegoimage (or the relations between the stegopixel and the original pixel), and simultaneously the scheme should allow high embedding rates to be achieved. Therefore, the schemes under consideration are evaluated here in terms of the embedding rate, the embedding efficiency, and the peak signal-to-noise ratio (PSNR) calculated between the original cover image and its stegoversion.
Table IV shows the embedding rates and the codes used in the conventional, EC-based, and UEP-based schemes for n=2 to 12. As can be seen from the listed results, all UEP-based schemes have a shorter code length than the EC-based schemes. Both these schemes have the ability to correct the first k bits when no more than one error occurs, and thus avoid the error spreading problem. Table V shows the detailed comparison of the three schemes with (n,k,l)=(7,4,3). The code-based schemes address the error spreading problem at the cost of smaller ER and EE. The UEP-based scheme, with ER=30.0% and EE=133%, needs 3334 pixels in a cover image to embed 1000 (≈3334×30%) secret bits while altering 750 LSBs (≈3334×2.25/10) within these embedded pixels; as can be seen, it outperforms the EC-based scheme.
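The pixel and alteration counts in Table V follow directly from ER and l_alt; a quick check (scheme parameters taken from the text, rounding choices ours):

```python
import math

# (l, code length n or N, average alterations l_alt per n embedded LSBs)
schemes = {
    "Conventional": (3, 7, 0.875),
    "EC-based":     (3, 11, 2.625),
    "UEP-based":    (3, 10, 2.25),
}

for name, (l, n, l_alt) in schemes.items():
    pixels = math.ceil(1000 * n / l)   # pixels needed to embed 1000 secret bits
    altered = pixels * l_alt / n       # expected number of altered LSBs
    print(f"{name}: {pixels} pixels, ~{altered:.0f} altered LSBs")
```

This reproduces 2334 pixels/292 altered LSBs for the conventional scheme and 3334/750 for the UEP-based scheme; for the EC-based row it yields about 875 altered LSBs versus the 877 listed in Table V, presumably a difference in rounding.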
In order to compare the distortion caused by the schemes under consideration, the well-known 259×259 test gray-scale images "Baboon", "Barb", "Boat", "Elaine", "Lena", and "Peppers" have been used as the cover images. The secret NDHU (National Dong Hwa University) "logo" and "text" gray-scale images to be embedded are shown in Figure 1 (secret images of size 59×59, 47×47, and 49×49 pixels, left to right).
Table V. ER and EE for the conventional, EC-based, and UEP-based coded LSB schemes with (n,k,l)=(7,4,3).

Coded LSB scheme  ER     EE     Embedding of 1000 bits
                                Pixels needed  Altered LSBs
Conventional      42.9%  342%   2334           292
EC-based          27.3%  114%   3667           877
UEP-based         30.0%  133%   3334           750
Table VI. PSNR (dB) between the cover image and its stegoimage for the conventional, EC-based, and UEP-based schemes.

Cover image  Conventional  EC-based  UEP-based
Baboon       57.173        54.671    54.691
Barb         57.148        54.687    54.675
Boat         57.181        54.696    54.677
Elaine       57.143        54.675    54.681
Lena         57.151        54.683    54.710
Peppers      57.179        54.693    54.671
Note that due to the different embedding rates of the different schemes, secret images of 59×59, 47×47, and 49×49 pixels have been used for the conventional, EC-based, and UEP-based schemes, respectively, to ensure a fair comparison. The achieved PSNR values are listed in Table VI. The results indicate that the considered schemes produce high-quality stegoimages; the highest PSNR was achieved by the conventional scheme due to its higher EE (343% versus 114% for the EC-based scheme and 133% for the UEP-based scheme). This suggests that adding the error correcting capability does not seriously distort the stegoimage, and that employing error correcting codes (EC or UEP) in the coded LSB embedding scheme constitutes a reasonable and practical solution to overcome the error spreading problem.
Figure 2. Recovered secret images with BER=2%, 4%, and 8% for errors placed in random positions.
To further study the error spreading problem, two types of error patterns, namely random errors and worst errors, have been added to the LSBs of the stegoimages. The first type (random errors) means that the errors are randomly distributed in the n-bit vector, whereas the second type (worst errors) means that the errors occur in the worst positions (the first k bits of the original n-bit vector), where they cause error spreading. Figures 2 and 3 show the corresponding results obtained by extracting the embedded secret images from the noise-corrupted stegoimages. Visual inspection of the results reveals that in a noisy environment the schemes based on UEP and EC codes have comparable performance and clearly outperform the conventional coded LSB scheme. Moreover, since the proposed UEP-based scheme has a higher ER than the EC-based scheme, it can be concluded that our solution provides a tradeoff between the data embedding performance and the protection of the embedded secret.
CONCLUSIONS<br />
A refined steganographic solution was introduced. Using UEP codes, we overcame the error spreading problem in the coded LSB steganographic scheme originally proposed by van Dijk and Willems. Our solution has the same correction effect as the Zhang-Wang EC-based scheme while allowing for higher embedding rates. This suggests that the solution
Figure 3. Recovered secret images with BER=2%, 4%, and 8% for errors placed in the worst positions.
proposed in this paper embeds the same secret message with higher efficiency and produces less distortion in the generated stegoimage. The proposed solution is suitable for applications, such as the transmission of private digital materials (e.g., documents or signature images) through public and wireless networks, where data hiding and protection against communication errors are required or recommended.
ACKNOWLEDGMENT<br />
This work was supported in part by TWISC@NCKU, National Science Council, under Grant NSC 94-3114-P-006-001-Y.
REFERENCES<br />
1. M. van Dijk and F. Willems, “Embedding information in grayscale images”, Proc. 22nd Symp. Inform. Theory in the Benelux (Elsevier, Netherlands, 2001), pp. 147–154.
2. X. Zhang and S. Wang, “Stego-encoding with error correction capability”, IEICE Trans. Fundamentals E88-A, 3663–3667 (2005).
3. W. J. van Gils, “Two topics on linear unequal error protection codes: bounds on their length and cyclic code classes”, IEEE Trans. Inf. Theory 29, 866–876 (1983).
4. M. C. Lin, C. C. Lin, and S. Lin, “Computer search for binary cyclic UEP codes of odd length up to 65”, IEEE Trans. Inf. Theory 36, 924–935 (1990).
5. S. Dumitrescu, X. Wu, and Z. Wang, “Detection of LSB steganography via sample pair analysis”, IEEE Trans. Signal Process. 51, 1995–2007 (2003).
6. I. Avcibas, N. Memon, and B. Sankur, “Steganalysis using image quality metrics”, IEEE Trans. Image Process. 12, 221–229 (2003).
Journal of Imaging Science and Technology® 51(4): 386–390, 2007.
© Society for Imaging Science and Technology 2007
In Situ X-ray Investigation of the Formation of Metallic Silver Phases During the Thermal Decomposition of Silver Behenate and Thermal Development of Photothermographic Films
B. B. Bokhonov, M. R. Sharafutdinov and B. P. Tolochko<br />
Institute of Solid State Chemistry and Mechanochemistry, Russian Academy of Science, Kutateladze 18,
Novosibirsk 630128, Russia<br />
E-mail: bokhonov@solid.nsk.su<br />
L. P. Burleva and D. R. Whitcomb <br />
Health Group, Eastman Kodak Company, Oakdale, Minnesota 55128<br />
Abstract. Metallic silver formation, resulting from the thermal decomposition of silver behenate, AgBe, and from the thermally induced reduction of AgBe incorporated into a photothermographic imaging construction, has been compared by in situ x-ray investigation. In the case of the thermal decomposition of individual AgBe crystals, the main factor that determines the growth of the silver particles is the change in the AgBe crystal structure, leading to the formation of intermediate mesomorphic phases that still retain a characteristic layer structure. By contrast, development of AgBe-containing photothermographic films generates silver particles by the reduction of intermediate silver complexes, which are in a liquid state during the development process. The silver nanoparticles resulting from these processes exhibit different sizes and morphologies that are important for optimizing the optical properties of photothermographic films. © 2007 Society for Imaging Science and Technology.
DOI: 10.2352/J.ImagingSci.Technol.(2007)51:4(386)
INTRODUCTION<br />
Silver behenate, (AgO2C22H43)2, is one of the fundamental components of photothermographic materials because it provides the silver ions for reduction in the thermal development process that leads to the formation of a visible image. 1–4 In the literature, there are a large number of reports devoted to the investigation of the phase changes of long, saturated-chain silver carboxylates, including silver behenate, in thermal systems, as well as the effect of individual components added to “dry silver” photothermographic formulations. 5–11 The x-ray investigation of silver carboxylates with 2 to 22 carbon atoms 12 showed that all of these crystal structures fall into the triclinic class and contain two molecules in the unit cell. Among the dominant characteristics of the silver carboxylate crystal structures, which are defined by their significant anisotropic physical and chemical properties, 1,13 is the presence of a lay-
<br />
IS&T Member
Received Jan. 17, 2007; accepted for publication Mar. 22, 2007.
1062-3701/2007/51(4)/386/5/$20.00.
ered structure in which a double layer of silver ions separates a double layer of long methylene chains. For example, the solid-state crystal structure of silver stearate (AgSt, (AgO2C18H35)2) shows that the molecules are actually dimers connected together, forming a polymer. 3
Thermally induced phase changes in the silver carboxylate<br />
crystals have been investigated by various analytical<br />
methods, such as NMR, IR, conductivity, DSC, and XRD.<br />
The temperatures of the multiple-phase transitions for silver carboxylates having various chain lengths have been characterized. 5,14–17 Upon transition from the crystalline state to the isotropic liquid, the silver carboxylates undergo up to six to seven phase changes in the following sequence: crystal state → curd → super curd (SUC) → sub-waxy (SW) → waxy (W) → super waxy (SUW) → sub-neat (SN) → neat (N) → isotropic liquid. 5,18 It may be relevant that the phase
changes in the silver carboxylate from the crystalline state<br />
into the super curd (SUC) or sub-waxy (SW) phase occur in<br />
the 120–125°C range, the temperature at which the thermal<br />
development in photothermography is normally carried out.<br />
X-ray diffraction, calorimetric, and IR methods were<br />
used in the investigation of the structural changes in the
silver stearate crystal lattice. 5 It was shown that, upon heating,<br />
the silver stearate structure proceeded through a series<br />
of mesomorphic states. That is, the first phase transition occurred at 122°C, which was associated with a packing disorder of the aliphatic chains, manifested by a significant decrease in the separation between silver ion layers. It was proposed that increasing the temperature above 130°C leads to further disorder and the breakup of the silver ion layers, and is responsible for the onset of the thermal decomposition reaction of the silver stearate, resulting in the formation of metallic silver and paraffin byproducts.
The structure transformation of polycrystalline silver
behenate was also studied by x-ray diffraction during in situ<br />
heating. 15 In contrast to the results reported in Ref. 5, these<br />
authors 15 observed an increase in the interlayer spacing during the heating of silver behenate crystals. The authors also
indicated that heating silver behenate over 120°C irreversibly<br />
trans<strong>for</strong>ms it from a crystalline to an amorphous state.<br />
Further, at 138–142°C, the first phase changes are observed, established by the appearance of diffraction peaks at smaller 2θ Bragg angles, which correspond to an increase in the interlayer distance in the silver behenate structure. Consistent with these results, upon heating above 145°C the silver behenate crystals transform into a liquid-crystalline state and generate metallic silver phases at 180°C. The authors of that report consider the initial stage of heating to be a disordering of the silver behenate aliphatic chains. However, despite the agreement on the structural transformations occurring in silver behenate15 and silver stearate,5 there is a significant difference in the explanation of the subsequent structural changes in the phase transformations of these silver carboxylates: while heating silver stearate decreases its interlayer spacing,5 heating silver behenate crystals initially produces an amorphous phase followed by an increased distance between the layers.15
Such contradictory heating behavior of silver stearate and silver behenate is quite surprising, given the close similarity of the silver stearate and behenate structures (C18 and C22 chain lengths, respectively) and of their phase-transformation temperatures. In addition, we have recently reported the thermal decomposition of silver myristate, AgMy, (AgO2C14H27)2, under conditions similar to those for AgSt and noted similar behavior.5 Considering the importance of AgBe as a material for photothermographic imaging products, and the contradiction between the trend observed for the thermal decomposition of AgMy and AgSt (in solids and in photothermographic films) and the interlayer spacing differences reported for AgBe, we have continued the systematic investigation of the effect of increasing chain length on the thermal properties of the AgBe member of this series. Once the reasons for the formation of the solid products of these chemical reactions are better understood, novel routes to control these processes should become possible, and photothermographic properties can be further improved. In this work, we present the results of our in situ x-ray diffraction investigation of the formation of metallic silver from the thermal decomposition of pure silver behenate, as well as from the thermal development of photothermographic materials based on silver behenate.
EXPERIMENTAL
The synthesis of silver behenate was carried out by the exchange reaction between sodium behenate and silver nitrate, as typically practiced.5 Photothermographic films were prepared from pure AgBe (not a mixture of chain lengths, as is common in photothermography) and preformed AgBr, along with the normal additional imaging components, as described elsewhere.19
Figure 1. Phthalazine and 4-methylphthalic acid.

X-ray experiments were carried out at the time-resolved diffractometry station (channel 5b) of the VEPP-3 storage ring, BINP (λ = 1.506 Å). Transmission mode was used for small-angle x-ray scattering (SAXS). X-ray patterns were recorded with the one-coordinate detector OD-3, with 0.01° angular resolution and a 30 s recording time per frame. Samples were heated at 1°C/min in a special tube furnace, and sample temperatures were monitored with a thermocouple.
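For orientation when reading the in situ patterns, the acquisition parameters above (a 1°C/min ramp and 30 s per frame) mean that each diffraction frame integrates over about 0.5°C of the ramp. A minimal sketch of the frame-to-temperature mapping (the perfectly linear ramp and the 20°C starting point are our assumptions):

```python
HEATING_RATE = 1.0  # °C per minute, from the experimental description
FRAME_MIN = 0.5     # 30 s recording time per frame, in minutes

def frame_temperature(frame, t_start=20.0):
    """Mean sample temperature (°C) during 0-based frame `frame`,
    assuming a perfectly linear heating ramp."""
    t0 = t_start + frame * FRAME_MIN * HEATING_RATE  # ramp value at frame start
    return t0 + 0.5 * FRAME_MIN * HEATING_RATE       # mid-frame temperature

# Each 30 s frame spans 0.5°C of the ramp:
print(frame_temperature(0))    # 20.25
print(frame_temperature(200))  # 120.25, near the thermal development range
```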
RESULTS AND DISCUSSION

Despite the fact that the thermal decomposition of silver carboxylates and the development of photothermographic films both produce metallic silver as the solid product, the chemical transformations occurring in these processes are completely different. While the thermal decomposition of a silver carboxylate proceeds according to the following scheme:7

(AgO2CnH2n−1)2 → 2Ag + 2CO2 + C2(n−1)H2(2n−1)   (1)

with the formation of metallic silver and paraffin, the development stages of photothermographic films are more complicated. Thermally induced reduction of the silver ions during film development can be summarized in simplified form as follows:

(AgO2CnH2n−1)2 → silver ion intermediates → 2Ag + 2HO2CnH2n−1.   (2)
This reduction reaction is the result of prior exposure of the silver halide in the film, which forms active latent image centers that catalyze the thermal development step at 110–130°C. The silver ion intermediates are reduced at the latent image centers, resulting in crystalline silver particles, which comprise the visible image. It is generally accepted that the silver ion source for silver particle formation at the latent image center is the silver carboxylate,1–4 which is not light sensitive in the visible region of the spectrum, and that the transport of silver ions from the silver carboxylate to the latent image center proceeds through the thermally initiated formation of various silver complexes with the components added to the formulation (developers, toners, and antifoggants).1–6
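The decomposition scheme for the dimeric silver carboxylate, (AgO2CnH2n−1)2 → 2Ag + 2CO2 + C2(n−1)H2(2n−1), can be sanity-checked for atom balance. A small sketch (element bookkeeping only; n = 22 corresponds to behenate, n = 14 to myristate):

```python
from collections import Counter

def formula(ag=0, c=0, h=0, o=0):
    """Element counts as a Counter (zero entries are harmless)."""
    return Counter(Ag=ag, C=c, H=h, O=o)

def carboxylate_dimer(n):
    """(AgO2CnH2n-1)2: the dimeric silver carboxylate of an n-carbon acid."""
    return formula(ag=2, c=2 * n, h=2 * (2 * n - 1), o=4)

def decomposition_products(n):
    """2Ag + 2CO2 + C2(n-1)H2(2n-1): the decarboxylation products."""
    return formula(ag=2) + formula(c=2, o=4) + formula(c=2 * (n - 1), h=2 * (2 * n - 1))

for n in (14, 18, 22):  # myristate, stearate, behenate
    assert carboxylate_dimer(n) == decomposition_products(n)
print(dict(carboxylate_dimer(22)))  # {'Ag': 2, 'C': 44, 'H': 86, 'O': 4}
```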
A recent investigation of the phase changes occurring during the development process raised doubts about whether the silver ions forming the visible image come only from the silver carboxylate phase.20 Contrary to all other literature,1–4 the reduction of a model system composition (AgBe/AgBr with phthalazine (PHZ, Figure 1), 4-methylphthalic acid (4MPA, Fig. 1), and developer) resulted in a significant decrease (45%) in the x-ray peak intensity for the AgBr phase. This change was proposed to be related to a contribution of silver ions from the AgBr to the formation of the metallic silver particles of the image. As discussed below, in the full photothermographic imaging formulation we see no change in the AgBr signal.
The in situ investigation of the structural and phase changes during the thermal decomposition of silver behenate and during the development of photothermographic films prepared from it showed that the processes accompanying the thermal formation of the silver particles are significantly different.
In Situ Investigation of the Phases Formed in the Process of Thermal Decomposition of AgBe
The change in the diffraction pattern of AgBe in the small-angle region (2θ = 0.4–10°) during in situ heating from 20 to 220°C is shown in Figure 2(a). Increasing the temperature through this range is accompanied by a change in the AgBe x-ray diffraction pattern due to phase transformations occurring within the heated powder. As the temperature is increased, the reflections of the high-temperature phases shift to higher diffraction angles, which confirms the decrease in the interlayer spacing of the silver carboxylate structure. Analysis of the diffraction data for these phases over this temperature interval allows us to distinguish at least six phases with different structural characteristics, Fig. 2(b).
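The reasoning that links a peak shift to a spacing change is Bragg's law, nλ = 2d sinθ, at the λ = 1.506 Å quoted in the Experimental section. A small sketch (the ~58.4 Å room-temperature long spacing of AgBe is a literature value, used here only for illustration, not a number from this paper):

```python
import math

WAVELENGTH = 1.506  # Å, from the experimental section

def two_theta_deg(d, wavelength=WAVELENGTH, order=1):
    """Peak position 2θ (degrees) for a spacing d (Å): nλ = 2d sinθ."""
    return 2 * math.degrees(math.asin(order * wavelength / (2 * d)))

def d_spacing(two_theta, wavelength=WAVELENGTH, order=1):
    """Invert Bragg's law: spacing d (Å) from a peak position 2θ (degrees)."""
    return order * wavelength / (2 * math.sin(math.radians(two_theta / 2)))

# First-order reflection of the ~58.4 Å AgBe long spacing (literature value):
print(round(two_theta_deg(58.4), 3))  # 1.478, inside the 0.4–10° window
# A reflection moving to higher 2θ means a smaller layer spacing:
print(round(d_spacing(1.5), 1), round(d_spacing(2.0), 1))  # 57.5 43.1
```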
It should be noted that similar structural changes in AgBe and AgMy, with the formation of intermediate phases, were established previously.5,6,21 According to those authors,5 the first three phases correspond to transitions within the AgBe crystalline state, but at 155°C silver behenate transforms into a liquid-crystalline material.
The x-ray diffraction patterns of the intermediate phases formed during heating, Fig. 2(b), include at least two series of layer reflections, which is evidence for the formation of a two-dimensional structure. Increasing the temperature above 230°C leads to the disappearance of the diffraction signature of a layered structure. At the same time, an increase in SAXS intensity is observed in the small 2θ region, with a maximum at 2θ = 1.17°, Fig. 2(c). Subsequent heating to 250°C changes the shape of the small-angle scattering peak, seen as a decrease in the intensity of the SAXS maximum. Simultaneously, peaks appear at small scattering angles, 2θ ≈ 0.8°.
The in situ x-ray diffraction of the thermal decomposition of AgBe in the wide-angle region (WAXS, 2θ = 25–55°) is shown in Figure 3. Heating the powder to 140°C does not appear to change the diffraction pattern significantly. Upon heating above 145°C, the crystal-phase reflections of AgBe disappear, and beginning at 230°C broad reflections from the (111) and (200) planes of metallic silver are observed, Fig. 3, whose intensity increases as the temperature is raised. Given that the first phase transition in silver behenate occurs at 128°C, the persistence of the crystalline AgBe phase in the x-ray diffraction pattern is evidence that the first phase transition occurs from one crystalline state to another.

Figure 2. (a) Change in the x-ray diffraction pattern of silver behenate during in situ heating. (b) X-ray diffraction patterns for the initial (20°C) and intermediate phases formed in the heating of AgBe. (c) SAXS of AgBe.
Figure 3. WAXS of AgBe during in situ thermal decomposition.

Figure 4. Change in the x-ray diffraction pattern of AgBe during the development of photothermographic films: initial decrease in the AgBe layer peak intensities with a simultaneous increase in the intensity of the small-angle scattering peaks.

Figure 5. Change in the x-ray diffraction pattern of photothermographic films during thermal development.
In Situ X-ray Diffraction Investigation of Phase Formation During Development of Photothermographic Films Prepared with AgBe
The in situ x-ray investigation of the formation of phases during the development of photothermographic films prepared with AgBe showed that the silver phases form at temperatures significantly lower than that of the first phase transformation (126°C). No shifts of the AgBe reflections are observed during heating of the photothermographic films from 20 to 80°C. Above 80°C, however, the intensities of the x-ray diffraction peaks related to the layered structure of AgBe (001) decrease, with a simultaneous increase in intensity in the small-angle scattering (SAXS) region, 2θ = 0.4–1.2°, Figure 4.
The in situ x-ray diffraction patterns over 2θ = 24–54° showed that reflections due to metallic silver appear at temperatures as low as 80°C, Figure 5. Upon increasing the temperature (or the development time), the intensity of the silver reflections increases.
It is important to note that the in situ investigation of the thermal development of the films did not reveal any additional reflections from intermediate solid phases. More significantly, the intensity of the silver bromide (200) reflection remained completely unchanged up to and after processing, Fig. 5.
A comparison of the half-widths of the silver (111) and (200) reflections recorded during the decomposition of AgBe, Fig. 3, and in the developed photothermographic films, Fig. 5, provides clear evidence that the silver crystals in the developed films are significantly larger than those formed in the thermal decomposition of pure AgBe, similar to what was observed for AgMy.6
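The half-width comparison rests on Scherrer broadening, D ≈ Kλ/(β cosθ), where β is the peak FWHM in radians. A sketch with illustrative numbers only (the shape factor K = 0.9 and an Ag(111) position near 2θ ≈ 37.2° at λ = 1.506 Å are our assumptions; no half-widths are tabulated in the text):

```python
import math

WAVELENGTH = 1.506  # Å, from the experimental section
K = 0.9             # Scherrer shape factor (assumed)

def scherrer_size(fwhm_deg, two_theta, wavelength=WAVELENGTH, k=K):
    """Crystallite size (Å) from a peak FWHM (degrees 2θ) at position
    `two_theta` (degrees), via D = K*lambda / (beta * cos(theta))."""
    beta = math.radians(fwhm_deg)
    theta = math.radians(two_theta / 2)
    return k * wavelength / (beta * math.cos(theta))

# A broad peak (decomposition of pure AgBe) vs. a sharp one (developed film):
print(round(scherrer_size(1.0, 37.2)))  # 82  -> roughly 8 nm crystallites
print(round(scherrer_size(0.2, 37.2)))  # 410 -> roughly 41 nm crystallites
```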
All of these results on the thermal decomposition of AgBe and the thermal development of photothermographic films can be summarized as follows. The formation of metallic silver phases from the thermal decomposition of pure AgBe occurs through a series of intermediate mesomorphic phases. The formation of silver particles, established by the appearance of a small-angle scattering signal in the x-ray pattern of the in situ heated sample, proceeds after the destruction of the AgBe layer structure.
The development of the photothermographic films shows the formation of silver phases at 80°C, corresponding to a decrease in the intensity of the silver behenate layer-structure reflections with a simultaneous increase in the SAXS intensity. In particular, it must be emphasized that the formation of the silver phase does not proceed through any change in the intensity of the silver halide peaks; it is therefore clear that the silver particles form from the reduction of silver ions originating only from the AgBe crystals.
Overall, these results are in good agreement with previous reports.22,23 That is, the initial stages of thermal decomposition of an individual silver carboxylate form nanosized (2–5 nm) silver particles, which subsequently agglomerate up to 10–15 nm, crystallizing on the lateral planes of the silver carboxylate crystals. In our opinion, this stage of silver particle growth is the cause of the changes in the shape of the small-angle scattering curve, in which a decrease in intensity and a shift of the SAXS maxima toward smaller angles were observed. Finally, it should be noted that the difference between the thermal behavior of pure AgBe described here and in Ref. 16 could be attributed to differences in the preparation procedures. The same effect may influence the x-ray results in the study of the role of AgBr in the photothermographic process.15
CONCLUSIONS

The differences in the diffraction data during the development of photothermographic films and during the thermal decomposition of pure AgBe are related to the differences in the chemical transformations in these processes: in contrast to the thermal decomposition of pure AgBe, development of the photothermographic films generates silver particles by the reduction of intermediate silver complexes, which are in the liquid state (and thus not observable by x-ray diffraction). In the case of the thermal decomposition of individual AgBe crystals, the main factor determining the growth of the silver particles is the structural change leading to the formation of intermediate mesomorphic phases, which still retain the characteristic layer structure.
ACKNOWLEDGMENT

The authors gratefully thank T. Blanton (Eastman Kodak Company) for helpful discussions.
REFERENCES

1. P. J. Cowdery-Corvan and D. R. Whitcomb, in Handbook of Imaging Materials, edited by A. S. Diamond and D. S. Weiss (Marcel Dekker, New York, 2002), p. 473.
2. D. H. Klosterboer, in Neblette's Eighth Edition: Imaging Processes and Materials, edited by J. M. Sturge, V. Walworth, and A. Shepp (Van Nostrand-Reinhold, New York, 1989), Chap. 9, p. 279.
3. V. M. Andreev, E. P. Fokin, Yu. I. Mikhailov, and V. V. Boldyrev, Zhur. Nauch. Priklad. Fotogr. Kinemat. 24, 311 (1979).
4. T. Maekawa, M. Yoshikane, H. Fujimura, and I. Toya, J. Imaging Sci. Technol. 45, 365 (2001).
5. B. B. Bokhonov, M. R. Sharafutdinov, B. P. Tolochko, L. P. Burleva, and D. R. Whitcomb, J. Imaging Sci. Technol. 49, 389 (2005).
6. B. B. Bokhonov, L. P. Burleva, A. A. Sidelnikov, M. R. Sharafutdinov, B. P. Tolochko, and D. R. Whitcomb, J. Imaging Sci. Technol. 47, 89 (2003).
7. V. M. Andreev, L. P. Burleva, and V. V. Boldyrev, J. Sib. Branch Acad. Sci. USSR 5(5), 3 (1984).
8. D. R. Whitcomb and R. D. Rogers, J. Imaging Sci. Technol. 43, 517–520 (1999).
9. D. R. Whitcomb and M. Rajeswaran, J. Imaging Sci. Technol. 47, 107 (2003).
10. D. R. Whitcomb and R. D. Rogers, Inorg. Chim. Acta 256, 263 (1997).
11. P. Z. Velinzon, S. I. Gaft, O. A. Karekina, N. K. Ryasinskaya, and I. G. Chezlov, Zhur. Nauch. i Priklad. Fotogr. 48(3), 35–45 (2003).
12. V. Vand, A. Aitken, and R. K. Campbell, Acta Crystallogr. 2, 398–403 (1949).
13. A. E. Gvozdev, Ukr. Fiz. Zh. (Russ. Ed.) 24, 1856 (1979).
14. M. Ikeda, Photograph. Sci. Eng. 24(6), 277 (1980).
15. I. Geuens, I. Vanwelkenhuysen, and R. Gijbels, Proc. 2000 International Symposium on Silver Halide Imaging (IS&T, Springfield, VA, 2000), pp. 203–233.
16. K. Binnemans, R. V. Deun, B. Thijs, I. Vanwelkenhuysen, and I. Geuens, Chem. Mater. 16, 2021 (2004).
17. X. Liu, S. Liu, J. Zhang, and W. Cao, Thermochim. Acta 440, 1 (2006).
18. V. M. Andreev, L. P. Burleva, B. B. Bokhonov, and Y. I. Mikhailov, Izv. Sib. Otd. AN SSSR, Ser. Khim. Nauk. 2(4), 58 (1983).
19. C. Zou, J. B. Philip, S. M. Shor, M. C. Skinner, and C. P. Zhou, US Patent No. 5,434,043 (1995).
20. H. Strijckers, J. Imaging Sci. Technol. 47, 100 (2003).
21. T. N. Blanton, S. Zdzieszynski, M. Nikolas, and S. Misture, Powder Diffr. 48, 27 (2005).
22. B. B. Bokhonov, L. P. Burleva, and D. R. Whitcomb, J. Imaging Sci. Technol. 43, 505 (1999).
23. B. B. Bokhonov, L. P. Burleva, D. R. Whitcomb, and M. R. V. Sahyun, Microsc. Res. Tech. 42, 152 (1998).
J. Imaging Sci. Technol. 51(4), Jul.–Aug. 2007
IS&T Corporate Members

IS&T Corporate Members provide significant financial support, thereby assisting the Society in achieving its goals of disseminating information and providing professional services to imaging scientists and engineers. In turn, the Society provides a number of material benefits to its Corporate Members. For complete information on the Corporate Membership program, contact IS&T at info@imaging.org.
Sustaining Corporate Members

Adobe Systems Inc., 345 Park Avenue, San Jose, CA 95110-2704
Canon USA Inc., One Canon Plaza, Lake Success, NY 11042-1198
Eastman Kodak Company, 343 State Street, Rochester, NY 14650
Hewlett-Packard Company, 1501 Page Mill Road, Palo Alto, CA 94304
Lexmark International, Inc., 740 New Circle Road NW, Lexington, KY 40511
Xerox Corporation, Wilson Center for Research and Technology, 800 Phillips Road, Webster, NY 14580

Supporting Corporate Members

Fuji Photo Film Company, Ltd., 210 Nakanuma, Minami-ashigara, Kanagawa 250-0193, Japan
Konica Minolta Holdings Inc., No. 1 Sakura-machi, Hino-shi, Tokyo 191-8511, Japan
TREK, Inc./TREK Japan KK, 11601 Maple Ridge Road, Medina, NY 14103-0728
Pitney Bowes, 35 Waterview Drive, Shelton, CT 06484

Donor Corporate Members

ABBYY USA Software House, Inc., 47221 Fremont Blvd., Fremont, CA 94538
Axis Communications AB, Emdalavägen 14, SE-223 69 Lund, Sweden
Ball Packaging Europe GmbH, Technical Center Bonn, Friedrich-Woehler-Strasse 51, D-53117 Bonn, Germany
Cheran Digital Imaging & Consulting, Inc., 798 Burnt Gin Road, Gaffney, SC 29340
Clariant Produkte GmbH, Division Pigments & Additives, 65926 Frankfurt am Main, Germany
Felix Schoeller Jr. GmbH & Co. KG, Postfach 3667, D-49026 Osnabruck, Germany
Ferrania SpA, Viale Martiri Della Liberta' 57, Ferrania (Savona) I-17014
GretagMacbeth
Logo GmbH & Co. KG, Westfälischer Hof Garbrock 4, 48565 Steinfurt, Germany
GR8moments Limited, Units 15-16 Town Yard Industrial Estate, Station Street, Leek, Staffordshire, England ST13 8BF
Hallmark Cards, Inc., Chemistry R & D, 2501 McGee #359, Kansas City, MO 64141-6580
ILFORD Imaging Switzerland GmbH, Route de l'Ancienne Papeterie 1, CH-1723 Marly, Switzerland
MediaTek Inc., No. 1 Dusing Rd., 1, Hsinchu 300, Taiwan R.O.C.
Pantone, Inc., 590 Commerce Blvd., Carlstadt, NJ 07072-3098
Quality Engineering Associates (QEA), Inc., 99 South Bedford Street #4, Burlington, MA 01803
The Ricoh Company, Ltd., 16-1 Shinei-cho, Tsuzuki-ku, Yokohama 224-0035, Japan
Sharp Corporation, 492 Minosho-cho, Yamatokoriyama, Nara 639-1186, Japan
Sony Corporation/Sony Research Center, 6-7-35 Kita-shinagawa, Shinagawa, Tokyo 141, Japan

*as of 7/1/07
Journal of the Imaging Society of Japan, Vol. 46, No. 3, 2007

CONTENTS

Original Papers
Analysis of the Magnetic Force Acting on the Toner in the Black Image Area and White Image Area in the Magnetic Printer (2), N. KOKAJI ... 172 (2)
A Model for Electrostatic Discharge in the Toner Layer during the Transfer Process, M. MAEDA, K. NISHIWAKI, K. MAEKAWA and M. TAKEUCHI ... 178 (8)

Imaging Today: The Law of Environmental Standard, Safety Standard and Energy Saving
Introduction, H. YAMAZAKI, T. TAKEUCHI, K. NAGATO, K. MARUYAMA and K. SUZUKI ... 184 (14)
The Notification Systems of New Chemical Substances in the World, T. YAMAMOTO ... 185 (15)
The Environmental Regulations in Japan, H. SATO ... 192 (22)
The Trend of European Product Environmental Legislation and Eco-labels, R. IWANAGA, K. FUJISAWA and A. MATSUMOTO ... 199 (29)
Practical Side of the Environmentally Conscious Technology for the Product, T. BISAIJI, K. YASUDA, T. ARAI, K. SUZUKI, K. AKATANI and M. HASEGAWA ... 207 (37)

Lectures in Science
Introduction of Optics (I): The Behavior of a Beam of Light upon Reflection or Refraction at a Plane Surface, H. MUROTANI ... 216 (46)

Meeting Reports ... 223 (53)
Announcements ... 224 (54)
Guide for Authors ... 229 (59)
Contents of J. Photographic Society of Japan ... 230 (60)
Contents of J. Printing Science and Technology of Japan ... 231 (61)
Contents of J. Inst. Image Electronics Engineers of Japan ... 232 (62)
Contents of Journal of Imaging Science and Technology ... 233 (63)
Essays on Imaging

The Imaging Society of Japan
c/o Tokyo Polytechnic University, 2-9-5 Honcho, Nakano-ku, Tokyo 164-8768, Japan
Phone: 03-3373-9576  Fax: 03-3372-4414  E-mail:
Advanced Measurement Systems for All R&D and Quality Control Needs in Electrophotography, Inkjet and Other Printing Technologies

Electrophotographic Component Testing
• PDT®-2000 series, PDT®-1000L, PDT®-1000: electrophotographic characterization, uniformity mapping, and defect detection for large and small format OPC drums
• ECT-100™: OPC drum coating thickness gauge
• MFA-2000™: magnetic field distribution analysis in mag roller magnets
• DRA-2000™: semi-insulating components testing, including charge rollers, mag rollers, transfer rollers, transfer belts, and print media
• TFS-1000™: toner fusing latitude testing

Objective Print Quality Analysis for All Digital Printing Technologies
• IAS®-1000: fully automated high-volume print quality testing
• Scanner IAS®: scanner-based high-speed print quality analysis
• Personal IAS®, PocketSpec™, DIAS™: handheld series for print quality, distinctness of image (DOI), and color measurements; truly portable, no PC connection required

Quality Engineering Associates, Inc.
99 South Bedford Street #4, Burlington, MA 01803 USA
Tel: +1 (781) 221-0080 • Fax: +1 (781) 221-7107 • info@qea.com • www.qea.com
imaging.org: your source for imaging technology conferences

UPCOMING CONFERENCES

NIP23: 23rd International Conference on Digital Printing Technologies
Join us in Anchorage, Alaska! September 16-21, 2007
www.imaging.org/conferences/nip23

Come and learn about...

NIP23 Sessions
• Advanced Materials and Nanoparticles in Imaging
• Color Science and Image Processing
• Digital Art
• Electronic Paper and Paper-like Displays
• Environmental Issues
• Fusing, Curing and Drying
• Image Permanence
• Ink Jet Printing Materials
• Ink Jet Printing Processes
• Media for Digital Printing and Displays
• Photo-electronic Materials and Devices
• Print and Image Quality
• Printing Systems Engineering and Optimization
• Production Digital Printing
• Security and Forensic Printing
• Textile & Industrial Printing
• Thermal Printing
• Toner Based Printing Materials
• Toner Based Printing Processes

Digital Fabrication 2007
www.imaging.org/conferences/df2007

DF 2007 Sessions
• Industrial and Commercial Applications
• Materials and Substrates
• New and Novel Direct Write Methods
• Printed Architectural Components
• Printed Electronics and Devices
• Printing of Biomaterials
• Plus: Joint Session Intellectual Property Panel on "Future and Limitations of Ink Jet and Electrophotography"

For more information visit the conference websites, www.imaging.org/conferences, or contact us at info@imaging.org