11.07.2015 Views

Information Retrieval and extraction


Table 2. Experiment results with stemming

Collection     Full index   Incremental   Search       Average     Precision    Precision
               time (sec)   time (sec)    time (sec)   precision   at R (30%)   at 10 docs
FBIS3          211.6        -             2            0.13        0.236        0.30
FBIS3+FBIS4    483          217.8         3.4          0.10        0.107        0.16

There was a problem in our experiment. From Figure 1, we can see columns 9 and 18. We use a function named “RetEval”, which has several parameters that we need to tune. The parameters are shown in Figure 9.

5. Conclusion

This is the first time we have tried to use the Lemur system to build our IR system. We encountered many problems while using the toolkit. Partway through, we still wanted to build the system on our own. However, the Lemur toolkit supports the construction of basic text retrieval systems using language modeling methods, as well as traditional methods such as those based on the vector space model and Okapi. As the toolkit evolves, it is expected to support research in a broader range of information technologies, such as filtering and even question answering. In short, the toolkit is so attractive that we still decided to use it. Lemur has many applications for indexing and retrieval that are fully functional for many purposes, so we use them almost "out of the box". In addition, since Lemur was written to facilitate research on LM and IR, the design allows us to try out new retrieval methods by subclassing abstract interfaces, or to write new applications based on existing methods.

The big problem for us is that we do not clearly understand the parameters the toolkit defines. We tried many times to tune the parameters in order to get better results. That approach was not very smart, so we searched for our problem on their public forum. This forum is for the users and developers of the Lemur toolkit to discuss the software and share tips on using Lemur, as well as to ask questions.
The developers of the toolkit monitor this forum on a regular basis. In the forum, we found many reported problems with the toolkit. Some code in the Lemur toolkit is wrong, and we found the error described in this forum. The performance is not as good as we expected.

6. Appendix
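As a supplement to Table 2, the evaluation measures it reports can be sketched in Python. This is only a minimal sketch under stated assumptions, not the code of Lemur's RetEval: the function names, the document identifiers, and the sample data are hypothetical, document ids in a ranking are assumed to be unique, and "Precision at R (30%)" is read here as interpolated precision at 30% recall, which the report does not confirm.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Sum of the precision at each rank holding a relevant document,
    divided by the total number of relevant documents."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def interpolated_precision_at_recall(
    ranked: Sequence[str], relevant: Set[str], recall_level: float
) -> float:
    """Highest precision at any rank whose recall is at least recall_level.

    Returns 0.0 when the ranking never reaches that recall.
    """
    if not relevant:
        return 0.0
    hits = 0
    best = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            if hits / len(relevant) >= recall_level:
                best = max(best, hits / rank)
    return best


if __name__ == "__main__":
    # Hypothetical ranking and relevance judgments for one query.
    ranked_docs = ["d3", "d7", "d1", "d9", "d4"]
    relevant_docs = {"d3", "d1", "d8"}
    print("P@5:", precision_at_k(ranked_docs, relevant_docs, 5))
    print("AP:", average_precision(ranked_docs, relevant_docs))
    print("P at 30% recall:",
          interpolated_precision_at_recall(ranked_docs, relevant_docs, 0.3))
```

The figures in Table 2 would correspond to such per-query values averaged over all test queries. The averaging step is left out here because the report does not describe the query set.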


Figure 1. Flow Overview
Figure 2. Process of parsing query


Figure 3. The parameter of indexing FBIS3
Figure 4. The parameter of incremental indexing FBIS4
Figure 5. The parameter of indexing FBIS3 + FBIS4


Figure 6. The retrieval of FBIS3
Figure 7. The retrieval of FBIS3 + FBIS4


Figure 8. Selection in Retrieval model

7. Reference

[1] CMU Lemur. http://www.lemurproject.org/
[2] Information Retrieval: Data Structures & Algorithms. http://www.dcc.uchile.cl/~rbaeza/iradsbook/irbook.html
