SESSION NOVEL ALGORITHMS AND APPLICATIONS + ...
Int'l Conf. Foundations of Computer Science | FCS'11
Crafting a Lightweight Search Engine
Abstract – Web-based search is commonly perceived
as a desirable functionality of web sites. Although
several open source search engines can be tailored and
embedded in a web site for free, those engines tend to
be general purpose and may not be easy to adapt into an
efficient search engine for a particular application
domain. In this paper, I present an easy but practical
alternative that readers can follow as a template to
build their own lightweight search engines with only a
little programming.
Keywords: Lightweight Search Engine, Web-Based Query.
1 Introduction<br />
As more and more information is digitized and
made available on the web, online search has become
widely used and is commonly perceived as an essential
functionality of a web site. Several open source search
engines, such as ASPSeek [1], Lucene [2], Namazu [3],
mnoGoSearch [4], and WebGlimpse [5], are available for
web programmers to customize and fit into their own
business operations. However, most of those free search
engines were designed for general purposes and may not
be easy to convert into special purpose search engines
[6, 7]. Besides, embedding a large component into a
small-scale operational web site may compromise search
performance because of its heavy consumption of CPU
time.
Another downside of depending on an open source search
engine is the limited free technical support. Although
some volunteers are willing to share their know-how
through mailing lists or online conferencing, most
advanced consulting still has to be paid for. Therefore,
embedding an open source engine into a special purpose
web site may not be the best choice despite the openness
of its source code. With these concerns in mind, this
hands-on project is crafted to provide an easy
alternative for those who need a web-based search engine
but for whom embedding an open source engine is not
practical. Instead of spending a great deal of time
striving to understand a huge open source search engine
and then tailoring it to fit a specific domain, we can
actually achieve very similar functionality by crafting
a lightweight search engine from scratch that requires
only a little programming.
Feng-Jen Yang
Department of Mathematics and Information Sciences
University of North Texas at Dallas
Dallas, TX 75241
As an illustrative implementation of this lightweight
search engine, I hypothetically confine the search
domain to online book lookup and have the search engine
perform partial-match searches instead of exact-match
searches to allow some fuzziness in the comparison
operations. For the purpose of quick prototyping, I use
only three technologies that most programmers are
familiar with, namely Microsoft Access, Active Server
Pages (ASP), and the HyperText Markup Language (HTML).
To ensure that most readers can follow and try out this
implementation, the rest of this paper is written in an
instructional and stepwise manner.
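To make the difference between exact-match and partial-match lookup concrete, the following is a minimal sketch and not the implementation presented in this paper. It is written in Python, with an in-memory SQLite database standing in for the Microsoft Access and ASP stack. The table `books`, its columns, the sample rows, and the function names are all hypothetical.

```python
import sqlite3


def build_catalog():
    """Create an in-memory table holding a few hypothetical sample books."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (title TEXT, author TEXT)")
    conn.executemany(
        "INSERT INTO books (title, author) VALUES (?, ?)",
        [
            ("Web Programming Basics", "Author A"),
            ("Programming the Web Server", "Author B"),
            ("Database Design Notes", "Author C"),
        ],
    )
    return conn


def exact_search(conn, keyword):
    """Return titles that equal the keyword character for character."""
    rows = conn.execute(
        "SELECT title FROM books WHERE title = ? ORDER BY title",
        (keyword,),
    )
    return [row[0] for row in rows]


def partial_search(conn, keyword):
    """Return titles that contain the keyword anywhere.

    The keyword is wrapped in % wildcards and passed as a bound
    parameter, so it is never spliced into the SQL text itself.
    SQLite's LIKE ignores ASCII letter case by default.
    """
    rows = conn.execute(
        "SELECT title FROM books WHERE title LIKE ? ORDER BY title",
        (f"%{keyword}%",),
    )
    return [row[0] for row in rows]
```

With these sample rows, `exact_search(conn, "programming")` finds nothing, while `partial_search(conn, "programming")` returns both titles that contain the word. Wildcard characters differ between database engines and access layers, so the `%` used here should be checked against the target database before reuse.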
1.1 The Coherence of Search Operations<br />
An effective search engine is not only efficient in
looking up items but, more importantly, also able to
cope with users' partial, fuzzy, or incomplete memory
of the keywords and other search criteria. Personally,
I perceive this as the coherence between general users
and the search operations. This expectation can be met
by allowing users to perform searches based on their
partial or fuzzy memory of the data they are looking
for. Technically, this can be achieved by allowing
partial and incomplete keywords to be used as search
criteria [8]. Indeed, expecting users to spell out
complete and correct keywords for the items they are
looking for is neither necessary nor practical in real
life, since much of human memory is retained only for a
short term. The fuzziness and uncertainty caused by
short-term human memory should be considered and
incorporated into the design and implementation of
search operations.
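One way to accept partial and incomplete keywords can be sketched as follows, under illustrative assumptions rather than as the implementation in this paper. The user's input is split into fragments, and an item is accepted when every fragment appears somewhere in it, ignoring letter case. The function names and sample titles are hypothetical.

```python
def fragments(query):
    """Split raw user input into lowercase keyword fragments."""
    return query.lower().split()


def matches(query, title):
    """Return True when every fragment occurs somewhere in the title."""
    haystack = title.lower()
    return all(part in haystack for part in fragments(query))


def search(query, titles):
    """Return the titles that contain all fragments, in original order.

    A query with no fragments (empty or only whitespace) returns
    nothing instead of matching every title.
    """
    if not fragments(query):
        return []
    return [title for title in titles if matches(query, title)]
```

For example, the incomplete input `sear eng` would still find a title such as "Lightweight Search Engines", because both fragments occur in it. Plain substring matching of this kind tolerates incomplete keywords but not misspelled ones.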
1.2 The Replacement for Web Crawler<br />
Unlike a typical search engine that relies on a
web crawler to glean and index information