Mr Sach Mukherjee

Report on the IEEE Bioinformatics Conference 2004 

Sach Mukherjee 

Department of Engineering Science 

University of Oxford. 

The IEEE Computer Society Bioinformatics Conference (or CSB) is one of a handful of 

highly selective meetings in the area of bioinformatics and computational biology. Held every 

year at Stanford University, CSB attracts around six hundred attendees, but is single-track. 

Bioinformatics is an inherently multidisciplinary field, attracting, among others, engineers, 

biologists and statisticians, and CSB attendees tend to reflect this diversity. This year’s 

conference took place in mid-August and comprised around thirty talks, a hundred posters 

and a half-dozen tutorials. 

My own research focuses on the development and application of statistical and machine 

learning methods to the analysis of gene expression data. In particular, my work addresses the 

problem of finding genes whose levels of activity differ between conditions such as healthy 

and diseased, a task called ‘gene selection’. Having shown theoretically that many widely used

methods for gene selection are simply not very robust, and that an unsupervised learning 

approach could be used to significantly ameliorate the problem, I was eager to communicate 

my findings to the bioinformatics community. I was interested to hear the views of wet-lab 

biologists and statisticians working in broadly the same area. Beyond my immediate research 

area, I was also keen to learn as much as possible about where bioinformatics in general is 

heading. My goals in attending CSB were therefore threefold: to advertise my own work, 

interact with scientists working on similar problems, and also to network more broadly with 

researchers from other areas of bioinformatics. 

Having previously attended a large international meeting in machine learning (NIPS 2002, in 

Vancouver), I was aware that planning would be crucial to fully realising my goals. I had two

separate pieces of work accepted to the conference, a paper and a poster. I was fortunate to 

have had my paper selected for podium presentation, which provided me with the opportunity 

of giving a 30-minute presentation to the entire conference. My talk turned out to be

scheduled relatively early in the conference, so with regard to promoting my work, I focused 

my efforts on simply preparing an effective talk. The likely diversity of the audience led me 

to avoid excessive jargon and concentrate on presenting a coherent ‘story’. I also compiled a 

short list of people working in my sub-area, with whom I specifically wanted to establish

contact, and decided to make every effort to have at least one meaningful conversation with 

each of them.

The conference was organized into a total of 14 sessions covering every area of 

bioinformatics, from protein structure to gene ontologies. The organizers kept the sessions 

short with frequent coffee-breaks in between. This arrangement proved extremely popular as 

the breaks provided excellent opportunities to interact one-on-one with speakers, and discuss 

ideas in more detail than is possible when asking a question at the end of a talk. Coffee-breaks 

also turned out to be an excellent opportunity for ‘targeted’ networking, and over the course 

of the conference I was able to have good conversations with most of the researchers on my 

list. The setting of the conference could not have been better, with all meals taken in 

Stanford's sunny Dohrmann Grove, and posters presented in open-air sessions either side of 

the main auditorium. Meal-times proved quite productive too, as I found myself sharing tables 

with researchers from a wide variety of backgrounds. This had the effect of opening my eyes 

to areas of research that I would otherwise rarely encounter. 

My own talk went extremely well, and generated a good deal of interest. I spent several 

coffee breaks afterwards discussing my work with interested members of the audience, many 

of whom asked extremely good questions. The most useful comments concerned the 

biological validation of my largely theoretical results. A key problem in expression analysis is 

the scarcity of ‘gold-standard’ datasets, where the truth is known a priori, but several 

attendees were able to point me to some very good data upon which to further test my 

methods. Indeed, ideas that emerged at CSB have since formed the basis for a new paper 

which is currently under review. By and large, my impression was that my work did indeed 

raise awareness of some important problems in expression analysis, and that my progress on 

addressing those problems was on the right track. I was especially pleased to have a number 

of good conversations with biologists and statisticians, and to be invited by several groups to 

give seminars to them in the near future. The talk had the further effect of generating interest 

in my poster, which I found very well attended once I had given the talk. On my return, I was 

also very happy to find that my paper was one of ten or so to have been selected for inclusion 

in a special issue of the Journal of Bioinformatics and Computational Biology. 

The other talks were generally of a high technical standard, although a few failed to hold my 

interest on account of poor presentation. Personally, I found the poster sessions especially 

effective, as I was able to understand the work at my own pace and then interact one-on-one 

with the authors. One of the best posters was by Berger et al. of UCSB, on the analysis

of two different kinds of microarray data using a linear algebraic method called ‘Generalized 

Singular Value Decomposition’. Another excellent poster was by Xie et al. who showed that 

the dynamic range of gene expression measurements depends on the functional category of

the genes. This is an extremely interesting, albeit preliminary, finding, with important 

implications for the analysis of expression data. 

The best of the keynote speakers were Gene Myers and Michael Eisen, both from the 

University of California at Berkeley. Prof Myers' leadership in bioinformatics pre-dates the 

term itself, and he provided a superb overview of where the field is heading. His theme was 

the importance of seeing the ‘big picture’ in terms of biology. He rightly noted that it is 

crucial to go beyond simply ‘cranking the handle’ and observed that there is a growing 

tendency in the field to apply ever more complex methods to data without giving much 

thought to the underlying scientific questions. Dr. Eisen talked about Drosophila 

development, and displayed an infectious enthusiasm for his subject. He emphasised the 

importance of bringing together diverse data types to shed light on developmental processes. 

With so much work in bioinformatics rather narrowly focused on array analysis (including, 

admittedly, my own), his perspective as a biologist was timely and valuable. 

Thus, my overall impression was that bioinformatics is heading into a period where key 

themes will be the fusion of heterogeneous data types and the integration of biological 

knowledge into computational algorithms. To date, a lot of work has focused on one of 

expression, protein or sequence information, but increasingly these data will have to be dealt 

with together. Bayesian methods have proved adept at handling data fusion in areas like

robotics and sensors, and my feeling is that Bayesian inference will prove if anything even 

more popular in bioinformatics, where computational efficiency is generally less of a concern 

than in many ‘real-time’ engineering systems. One of the bottlenecks in current 

bioinformatics research seems to be the fact that in many ways experimentalists and theorists 

do not yet share a common language. One solution to this problem is simply to have the 

various communities interact more. Indeed, teams are becoming ever more multidisciplinary,

with researchers from different fields interacting on a daily, rather than weekly or monthly, 

basis. Mixed wet-lab and theory institutes are now appearing at a number of universities, 

including the Centre for Gene Function here at Oxford. It was clear at the conference that 

only a handful of computer scientists and statisticians really have a good understanding of 

biology, and equally that few biologists are really comfortable with the language of 

mathematics. Looking at more mature fields like biophysics and biochemistry, it seems that 

what is needed are individuals who have both wet-lab training and an understanding of 

mathematics and computation. Talking to some of the Stanford undergraduates at the 

conference, it seemed that this is precisely the direction in which their training is now headed.

Overall, CSB 2004 was a great experience, enjoyable as well as productive. The lessons I 

took away were to learn more biology, take even more care to make my talks and writing 

accessible to biologists as well as engineers, and finally to stay in bioinformatics! While 

clearly in its early stages as a discipline, the sheer breadth of questions to be answered and the 

need for powerful computational algorithms with which to address those questions make it an 

exciting field for information engineers to work in.
