

Report on the IEEE Bioinformatics Conference 2004

Sach Mukherjee
Department of Engineering Science
University of Oxford

The IEEE Computer Society Bioinformatics Conference (or CSB) is one of a handful of
highly selective meetings in the area of bioinformatics and computational biology. Held every
year at Stanford University, CSB attracts around six hundred attendees, but is single-track.
Bioinformatics is an inherently multidisciplinary field, attracting, among others, engineers,
biologists and statisticians, and CSB attendees tend to reflect this diversity. This year’s
conference took place in mid-August and comprised around thirty talks, a hundred posters
and a half-dozen tutorials.

My own research focuses on the development and application of statistical and machine
learning methods to the analysis of gene expression data. In particular, my work addresses the
problem of finding genes whose levels of activity differ between conditions such as healthy
and diseased, a task called ‘gene selection’. Having shown theoretically that many widely used
methods for gene selection are simply not very robust, and that an unsupervised learning
approach could be used to significantly ameliorate the problem, I was eager to communicate
my findings to the bioinformatics community. I was interested to hear the views of wet-lab
biologists and statisticians working in broadly the same area. Beyond my immediate research
area, I was also keen to learn as much as possible about where bioinformatics in general is
heading. My goals in attending CSB were therefore threefold: to advertise my own work,
to interact with scientists working on similar problems, and to network more broadly with
researchers from other areas of bioinformatics.

Having previously attended a large international meeting in machine learning (NIPS 2002, in
Vancouver), I was aware that planning would be crucial to fully realising my goals. I had two
separate pieces of work accepted to the conference, a paper and a poster. I was fortunate to
have had my paper selected for podium presentation, which provided me with the opportunity
of giving a 30-minute presentation to the entire conference. My talk turned out to be
scheduled relatively early in the conference, so with regard to promoting my work, I focused
my efforts on simply preparing an effective talk. The likely diversity of the audience led me
to avoid excessive jargon and concentrate on presenting a coherent ‘story’. I also compiled a
short list of people working in my sub-area, with whom I specifically wanted to establish
contact, and decided to make every effort to have at least one meaningful conversation with
each of them.


The conference was organized into a total of 14 sessions covering every area of
bioinformatics, from protein structure to gene ontologies. The organizers kept the sessions
short with frequent coffee-breaks in between. This arrangement proved extremely popular as
the breaks provided excellent opportunities to interact one-on-one with speakers, and discuss
ideas in more detail than is possible when asking a question at the end of a talk. Coffee-breaks
also turned out to be an excellent opportunity for ‘targeted’ networking, and over the course
of the conference I was able to have good conversations with most of the researchers on my
list. The setting of the conference could not have been better, with all meals taken in
Stanford's sunny Dohrmann Grove, and posters presented in open-air sessions either side of
the main auditorium. Meal-times proved quite productive too, as I found myself sharing tables
with researchers from a wide variety of backgrounds. This had the effect of opening my eyes
to areas of research that I would otherwise rarely encounter.

My own talk went extremely well, and generated a good deal of interest. I spent several
coffee breaks afterwards discussing my work with interested members of the audience, many
of whom asked extremely good questions. The most useful comments concerned the
biological validation of my largely theoretical results. A key problem in expression analysis is
the scarcity of ‘gold-standard’ datasets, where the truth is known a priori, but several
attendees were able to point me to some very good data upon which to further test my
methods. Indeed, ideas that emerged at CSB have since formed the basis for a new paper
which is currently under review. By and large, my impression was that my work did indeed
raise awareness of some important problems in expression analysis, and that my progress on
addressing those problems was on the right track. I was especially pleased to have a number
of good conversations with biologists and statisticians, and to be invited by several groups to
give seminars to them in the near future. The talk had the further effect of generating interest
in my poster, which I found very well attended once I had given the talk. On my return, I was
also very happy to find that my paper was one of ten or so to have been selected for inclusion
in a special issue of the Journal of Bioinformatics and Computational Biology.

The other talks were generally of a high technical standard, although a few failed to hold my
interest on account of poor presentation. Personally, I found the poster sessions especially
effective, as I was able to understand the work at my own pace and then interact one-on-one
with the authors. One of the best posters was by Berger et al. of UCSB, on the analysis
of two different kinds of microarray data using a linear algebraic method called ‘Generalized
Singular Value Decomposition’. Another excellent poster was by Xie et al., who showed that
the dynamic range of gene expression measurements depends on the functional category of
the genes. This is an extremely interesting, albeit preliminary, finding, with important
implications for the analysis of expression data.

The best of the keynote speakers were Gene Myers and Michael Eisen, both from the
University of California at Berkeley. Prof Myers' leadership in bioinformatics pre-dates the
term itself, and he provided a superb overview of where the field is heading. His theme was
the importance of seeing the ‘big picture’ in terms of biology. He rightly noted that it is
crucial to go beyond simply ‘cranking the handle’ and observed that there is a growing
tendency in the field to apply ever more complex methods to data without giving much
thought to the underlying scientific questions. Dr. Eisen talked about Drosophila
development, and displayed an infectious enthusiasm for his subject. He emphasised the
importance of bringing together diverse data types to shed light on developmental processes.
With so much work in bioinformatics rather narrowly focused on array analysis (including,
admittedly, my own), his perspective as a biologist was timely and valuable.

Thus, my overall impression was that bioinformatics is heading into a period where key
themes will be the fusion of heterogeneous data types and the integration of biological
knowledge into computational algorithms. To date, a lot of work has focused on just one of
expression, protein or sequence information, but increasingly these data will have to be dealt
with together. Bayesian methods have proved adept at handling data fusion in areas like
robotics and sensors, and my feeling is that Bayesian inference will prove if anything even
more popular in bioinformatics, where computational efficiency is generally less of a concern
than in many ‘real-time’ engineering systems. One of the bottlenecks in current
bioinformatics research seems to be the fact that in many ways experimentalists and theorists
do not yet share a common language. One solution to this problem is simply to have the
various communities interact more. Indeed, teams are becoming ever more multidisciplinary,
with researchers from different fields interacting on a daily, rather than weekly or monthly,
basis. Mixed wet-lab and theory institutes are now appearing at a number of universities,
including the Centre for Gene Function here at Oxford. It was clear at the conference that
only a handful of computer scientists and statisticians really have a good understanding of
biology, and equally that few biologists are really comfortable with the language of
mathematics. Looking at more mature fields like biophysics and biochemistry, it seems that
what is needed are individuals who have both wet-lab training and an understanding of
mathematics and computation. Talking to some of the Stanford undergraduates at the
conference, it seemed that this is precisely the direction in which their training is now headed.


Overall, CSB 2004 was a great experience, enjoyable as well as productive. The lessons I
took away were to learn more biology, take even more care to make my talks and writing
accessible to biologists as well as engineers, and finally to stay in bioinformatics! While
bioinformatics is clearly in its early stages as a discipline, the sheer breadth of questions to
be answered and the need for powerful computational algorithms with which to address those
questions make it an exciting field for information engineers to work in.
