Mr Sach Mukherjee
Report on the IEEE Bioinformatics Conference 2004<br />
Sach Mukherjee<br />
Department of Engineering Science<br />
University of Oxford.<br />
The IEEE Computer Society Bioinformatics Conference (or CSB) is one of a handful of<br />
highly selective meetings in the area of bioinformatics and computational biology. Held every<br />
year at Stanford University, CSB attracts around six hundred attendees, but is single-track.<br />
Bioinformatics is an inherently multidisciplinary field, attracting, among others, engineers,<br />
biologists and statisticians, and CSB attendees tend to reflect this diversity. This year’s<br />
conference took place in mid-August and comprised around thirty talks, a hundred posters<br />
and a half-dozen tutorials.<br />
My own research focuses on the development and application of statistical and machine<br />
learning methods to the analysis of gene expression data. In particular, my work addresses the<br />
problem of finding genes whose levels of activity differ between conditions such as healthy<br />
and diseased, a task called ‘gene selection’. Having shown theoretically that many widely used<br />
methods for gene selection are simply not very robust, and that an unsupervised learning<br />
approach could be used to significantly ameliorate the problem, I was eager to communicate<br />
my findings to the bioinformatics community. I was interested to hear the views of wet-lab<br />
biologists and statisticians working in broadly the same area. Beyond my immediate research<br />
area, I was also keen to learn as much as possible about where bioinformatics in general is<br />
heading. My goals in attending CSB were therefore threefold: to advertise my own work,<br />
interact with scientists working on similar problems, and also to network more broadly with<br />
researchers from other areas of bioinformatics.<br />
Having previously attended a large international meeting in machine learning (NIPS 2002, in<br />
Vancouver), I was aware that planning would be crucial to fully realising my goals. I had two<br />
separate pieces of work accepted to the conference, a paper and a poster. I was fortunate to<br />
have had my paper selected for podium presentation, which provided me with the opportunity<br />
of giving a 30-minute presentation to the entire conference. My talk turned out to be<br />
scheduled relatively early in the conference, so with regard to promoting my work, I focused<br />
my efforts on simply preparing an effective talk. The likely diversity of the audience led me<br />
to avoid excessive jargon and concentrate on presenting a coherent ‘story’. I also compiled a<br />
short list of people working in my sub-area, with whom I specifically wanted to establish<br />
contact, and decided to make every effort to have at least one meaningful conversation with<br />
each of them.
The conference was organized into a total of 14 sessions covering every area of<br />
bioinformatics, from protein structure to gene ontologies. The organizers kept the sessions<br />
short with frequent coffee-breaks in between. This arrangement proved extremely popular as<br />
the breaks provided excellent opportunities to interact one-on-one with speakers, and discuss<br />
ideas in more detail than is possible when asking a question at the end of a talk. Coffee-breaks<br />
also turned out to be an excellent opportunity for ‘targeted’ networking, and over the course<br />
of the conference I was able to have good conversations with most of the researchers on my<br />
list. The setting of the conference could not have been better, with all meals taken in<br />
Stanford's sunny Dohrmann Grove, and posters presented in open-air sessions either side of<br />
the main auditorium. Meal-times proved quite productive too, as I found myself sharing tables<br />
with researchers from a wide variety of backgrounds. This had the effect of opening my eyes<br />
to areas of research that I would otherwise rarely encounter.<br />
My own talk went extremely well, and generated a good deal of interest. I spent several<br />
coffee breaks afterwards discussing my work with interested members of the audience, many<br />
of whom asked extremely good questions. The most useful comments concerned the<br />
biological validation of my largely theoretical results. A key problem in expression analysis is<br />
the scarcity of ‘gold-standard’ datasets, where the truth is known a priori, but several<br />
attendees were able to point me to some very good data upon which to further test my<br />
methods. Indeed, ideas that emerged at CSB have since formed the basis for a new paper<br />
which is currently under review. By and large, my impression was that my work did indeed<br />
raise awareness of some important problems in expression analysis, and that my progress on<br />
addressing those problems was on the right track. I was especially pleased to have a number<br />
of good conversations with biologists and statisticians, and to be invited by several groups to<br />
give seminars to them in the near future. The talk had the further effect of generating interest<br />
in my poster, which I found very well attended once I had given the talk. On my return, I was<br />
also very happy to find that my paper was one of ten or so to have been selected for inclusion<br />
in a special issue of the Journal of Bioinformatics and Computational Biology.<br />
The other talks were generally of a high technical standard, although a few failed to hold my<br />
interest on account of poor presentation. Personally, I found the poster sessions especially<br />
effective, as I was able to understand the work at my own pace and then interact one-on-one<br />
with the authors. One of the best posters was by Berger et al. of UCSB, on the analysis<br />
of two different kinds of microarray data using a linear algebraic method called ‘Generalized<br />
Singular Value Decomposition’. Another excellent poster was by Xie et al., who showed that<br />
the dynamic range of gene expression measurements depends on the functional category of
the genes. This is an extremely interesting, albeit preliminary, finding, with important<br />
implications for the analysis of expression data.<br />
The best of the keynote speakers were Gene Myers and Michael Eisen, both from the<br />
University of California at Berkeley. Prof Myers' leadership in bioinformatics pre-dates the<br />
term itself, and he provided a superb overview of where the field is heading. His theme was<br />
the importance of seeing the ‘big picture’ in terms of biology. He rightly noted that it is<br />
crucial to go beyond simply ‘cranking the handle’ and observed that there is a growing<br />
tendency in the field to apply ever more complex methods to data without giving much<br />
thought to the underlying scientific questions. Dr. Eisen talked about Drosophila<br />
development, and displayed an infectious enthusiasm for his subject. He emphasised the<br />
importance of bringing together diverse data types to shed light on developmental processes.<br />
With so much work in bioinformatics rather narrowly focused on array analysis (including,<br />
admittedly, my own), his perspective as a biologist was timely and valuable.<br />
Thus, my overall impression was that bioinformatics is heading into a period where key<br />
themes will be the fusion of heterogeneous data types and the integration of biological<br />
knowledge into computational algorithms. To date, a lot of work has focused on one of<br />
expression, protein or sequence information, but increasingly these data will have to be dealt<br />
with together. Bayesian methods have proved adept at handling data fusion in areas like<br />
robotics and sensors, and my feeling is that Bayesian inference will prove if anything even<br />
more popular in bioinformatics, where computational efficiency is generally less of a concern<br />
than in many ‘real-time’ engineering systems. One of the bottlenecks in current<br />
bioinformatics research seems to be the fact that in many ways experimentalists and theorists<br />
do not yet share a common language. One solution to this problem is simply to have the<br />
various communities interact more. Indeed, teams are becoming ever more multidisciplinary,<br />
with researchers from different fields interacting on a daily, rather than weekly or monthly,<br />
basis. Mixed wet-lab and theory institutes are now appearing at a number of universities,<br />
including the Centre for Gene Function here at Oxford. It was clear at the conference that<br />
only a handful of computer scientists and statisticians really have a good understanding of<br />
biology, and equally that few biologists are really comfortable with the language of<br />
mathematics. Looking at more mature fields like biophysics and biochemistry, it seems that<br />
what is needed are individuals who have both wet-lab training and an understanding of<br />
mathematics and computation. Talking to some of the Stanford undergraduates at the<br />
conference, it seemed that this is precisely the direction in which their training is now headed.
Overall, CSB 2004 was a great experience, enjoyable as well as productive. The lessons I<br />
took away were to learn more biology, take even more care to make my talks and writing<br />
accessible to biologists as well as engineers, and finally to stay in bioinformatics! While<br />
clearly in its early stages as a discipline, the sheer breadth of questions to be answered and the<br />
need for powerful computational algorithms with which to address those questions make it an<br />
exciting field for information engineers to work in.