here - Biomedical Computation Review

ities of a finite training set (overfitting) by monitoring the progress of the FES/classifier. The independent test set is used for external cross-validation, but only after completion of the FES and identification of the final classifier. With small datasets, even partitioning into a training set and a test set is statistically suspect, so k-fold cross-validation is used instead: the dataset is split into k equal parts (typically 5 to 10), and the classifier is trained on k-1 parts and tested on the remaining part. One then cycles through all k choices of the held-out part and averages the test results. For small sample sizes, however, the variance of the averaged test accuracies tends to be unacceptably large, while overtraining remains a threat.

For highly imbalanced classes (e.g., rare disease vs. healthy), overall classification accuracy can be misleading. Consider 90 samples in the healthy class but only 10 in the disease class: a classifier that misclassifies all 10 disease samples still achieves 90% overall accuracy. Balanced sensitivity and specificity (i.e., comparable accuracies for both classes) are therefore a more appropriate goal. Balance can be achieved by undersampling the majority class, oversampling the minority class, or penalizing misclassifications differently for different classes, for example by assigning a higher cost to misclassifying a disease sample.

For each sample, we also compute class probabilities. This matters clinically: additional tests would be suggested if a classifier assigned a patient to the disease class with 55% probability, whereas immediate treatment would commence if this probability were 90%.

In the biomedical field, the twin curses are generally both active. They must be dealt with in concert; otherwise, overly optimistic and frequently wrong conclusions will result.
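A minimal sketch of the two evaluation ideas above (k-fold cross-validation and why overall accuracy misleads on imbalanced classes), written in plain Python with no machine-learning library. The helpers `k_fold_indices`, `fit_majority` and `cross_validate` are hypothetical illustrations, not the authors' method; the "classifier" is a deliberately trivial baseline that ignores all features and always predicts the most common training label, which reproduces the text's case of misclassifying all 10 disease samples.

```python
import random
from collections import Counter

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i.
    return [indices[i::k] for i in range(k)]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred, positive=1):
    """Mean of sensitivity (disease class) and specificity (healthy class)."""
    pos_preds = [p for t, p in zip(y_true, y_pred) if t == positive]
    neg_preds = [p for t, p in zip(y_true, y_pred) if t != positive]
    sensitivity = sum(p == positive for p in pos_preds) / len(pos_preds)
    specificity = sum(p != positive for p in neg_preds) / len(neg_preds)
    return (sensitivity + specificity) / 2

def fit_majority(train_labels):
    """Hypothetical baseline: ignore features, predict the commonest training label."""
    return Counter(train_labels).most_common(1)[0][0]

def cross_validate(y, k=5):
    """Train on k-1 folds, test on the held-out fold, cycle k times."""
    folds = k_fold_indices(len(y), k)
    fold_accuracies = []
    pooled_pred = [None] * len(y)  # out-of-fold prediction for every sample
    for i, test_idx in enumerate(folds):
        train_labels = [y[j] for f, fold in enumerate(folds) if f != i for j in fold]
        label = fit_majority(train_labels)
        for j in test_idx:
            pooled_pred[j] = label
        fold_accuracies.append(accuracy([y[j] for j in test_idx], [label] * len(test_idx)))
    # Average the k test accuracies; balanced accuracy uses the pooled predictions.
    return sum(fold_accuracies) / k, balanced_accuracy(y, pooled_pred)

y = [0] * 90 + [1] * 10  # 90 healthy (0) and 10 disease (1) samples, as in the text
mean_acc, bal_acc = cross_validate(y, k=5)
print(f"mean CV accuracy: {mean_acc:.2f}, balanced accuracy: {bal_acc:.2f}")
# prints "mean CV accuracy: 0.90, balanced accuracy: 0.50"
```

Balanced accuracy is computed here on the pooled out-of-fold predictions rather than per fold, because with only 10 disease samples a plain (non-stratified) fold could contain no disease cases at all, leaving sensitivity undefined for that fold. The 0.50 balanced accuracy exposes what the 90% figure hides: sensitivity is zero.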
■

Seeing Science, continued

With a name inspired by Friedrich Nietzsche's Ecce Homo, a meditation on how one becomes what one is, the project explores human evolution by examining similarities between genes from human beings and a target organism, in this case the rice plant. Ecce Homology is a physically interactive new-media work that visualizes genetic data as calligraphic forms. A novel computer-vision-based interface allows multiple participants, through their movement in the installation space, to select genes from the human genome for visualization using the Basic Local Alignment Search Tool (BLAST). Five projectors display the resulting changes in Ecce Homology's calligraphic forms across a 40-foot-wide wall.

"If we worked on the genomic calligraphy visualization further, it could be useful to scientists," she says, "but the installation is not a tool; it's art. And it's specifically ambiguous and a bit mysterious, by intention."

Ecce Homology, which was first displayed two years ago at the Fowler Museum in Los Angeles, works on many levels, both scientifically and artistically. "People assume that there's value in the vast amounts of genomic data we are generating," says West, "but data is not knowledge, and in order for us to derive knowledge from it, we need to interpret it. The more complex it is, the harder it is for human beings to do that and, consequently, the greater our need to find new approaches." So, says West, "we've produced an artwork that both speaks to this need and lets viewers interact fluidly with the data in a visceral way."

Ultimately, West says, the exhibit poses the question, "If you were to do work that's truly hybrid art/science, what would that process be like? And would there be any outcome that would point to how art might nurture scientific discovery?"

For more information about Ecce Homology, visit www.insilicov1.org.
■

Ecce Homology's custom software transforms strings of genetic code into luminous, scientifically accurate visualizations that incorporate multiple biological features. For protein sequences, stroke placement, shape and brush quality are determined by physical and chemical properties of the amino acids, such as the ratio of mass to volume, hydrophobicity, or ionization. The visualization is built from chunks of the amino-acid sequence that are segmented by a "turn prediction" algorithm. Each segment's calligraphic stroke is joined to its neighbor by a connector whose shape is based on a secondary-structure property of the segment. The result resembles calligraphy.

Courtesy: Ruth West

www.biomedicalcomputationreview.org
Fall 2005 BIOMEDICAL COMPUTATION REVIEW 25
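The caption describes segmenting a protein sequence and mapping amino-acid properties onto brush strokes. The sketch below is purely illustrative and is not Ecce Homology's software. It assumes the Kyte-Doolittle hydropathy scale as the "hydrophobicity" property, substitutes a deliberately naive rule (end a segment after each proline or glycine, residues often found in turns) for the real turn-prediction algorithm, and maps each segment's mean hydropathy linearly onto a stroke width. The names `segment_at_turn_formers`, `stroke_width` and `TURN_FORMERS`, the width range, and the example sequence are all hypothetical.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Hypothetical stand-in for a real turn predictor: Pro and Gly end a segment.
TURN_FORMERS = {"P", "G"}

def segment_at_turn_formers(sequence):
    """Split a protein sequence after each Pro/Gly residue (naive turn rule)."""
    segments, current = [], ""
    for residue in sequence:
        current += residue
        if residue in TURN_FORMERS:
            segments.append(current)
            current = ""
    if current:
        segments.append(current)
    return segments

def stroke_width(segment, min_width=1.0, max_width=10.0):
    """Map a segment's mean hydropathy linearly onto [min_width, max_width]."""
    mean_h = sum(KYTE_DOOLITTLE[r] for r in segment) / len(segment)
    # The Kyte-Doolittle scale spans -4.5 (Arg) to 4.5 (Ile).
    fraction = (mean_h + 4.5) / 9.0
    return min_width + fraction * (max_width - min_width)

for seg in segment_at_turn_formers("MKTAYIAKQRGLVEPWIIL"):
    print(seg, round(stroke_width(seg), 2))
```

A real renderer would drive several stroke attributes (placement, shape, brush texture) from several properties at once and would draw connectors from a secondary-structure prediction; this sketch only shows the general pattern of "segment, summarize a property, map it to a visual parameter."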