09.01.2013 Views

SESSION NOVEL ALGORITHMS AND APPLICATIONS + ...



Int'l Conf. Foundations of Computer Science | FCS'11

Crafting a Lightweight Search Engine

Feng-Jen Yang
Department of Mathematics and Information Sciences
University of North Texas at Dallas
Dallas, TX 75241

Abstract – Web-based search is commonly perceived as a desirable functionality of web sites. Although there are several open source search engines that can be tailored and embedded in a web site for free, those engines tend to be general-purpose and may not be easy to adapt into an efficient search engine for a particular application domain. In this paper, I present an easy but practical alternative that readers can follow as a template to build their own lightweight search engines with only a little programming.

Keywords: Lightweight Search Engine, Web-Based Query.

1 Introduction

As more and more information is digitized and made available on the web, online search is now widely used and commonly perceived as an essential functionality of a web site. Although there are several open source search engines, such as ASPSeek [1], Lucene [2], Namazu [3], mnoGoSearch [4], and WebGlimpse [5], available for web programmers to customize and fit into their own business operations, most of those free search engines were designed for general purposes and may not be easy to adapt into special-purpose search engines [6, 7]. Besides, embedding a large component into a small-scope operational web site may compromise search performance because of its heavy consumption of CPU time. Another downside of relying on an open source search engine is the limited free technical support. Although some volunteers are willing to share their know-how through mailing lists or online conferencing, most advanced consulting is still paid. Thus, embedding an open source engine into a special-purpose web site may not be the best choice despite the openness of its source code. With these concerns in mind, this hands-on project is crafted to provide an easy alternative for those who need a web-based search engine but for whom embedding an open source engine is not practical. Instead of spending a great deal of time striving to understand a huge open source search engine and then tailoring it to fit a specific domain, we can achieve very similar functionality by crafting a lightweight search engine from scratch that requires only a little programming.

As an illustrative implementation of this lightweight search engine, I hypothetically confined the search domain to online book lookup and had the search engine perform partial-match searches instead of exact-match searches to allow some fuzziness in the comparison operations. For quick prototyping, I use only three technologies that most programmers are familiar with, namely Microsoft Access, Active Server Pages (ASP), and the HyperText Markup Language (HTML). To ensure that most readers can follow and try out this implementation, the rest of this paper is written in an instructional, stepwise manner.

1.1 The Coherence of Search Operations

An effective search engine is not only efficient in looking up items but, more importantly, also able to cope with users' partial, fuzzy, or incomplete memory of the keywords and other search criteria. Personally, I perceive this as the coherence between general users and the search operations. This expectation can be met by allowing users to perform searches based on their partial or fuzzy memory of the data they are looking for. Technically, this can be achieved by allowing partial and incomplete keywords to be used as search criteria [8]. Indeed, expecting users to spell out complete and correct keywords for the items they are looking for is neither necessary nor practical in real life, since most human memory is retained only for a short term. The fuzziness and uncertainty caused by human short-term memory should be considered and incorporated into the design and implementation of search operations.
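The idea of accepting partial keywords as search criteria can be illustrated with a small sketch. This is not the implementation described in this paper, which targets Microsoft Access and ASP; it is a minimal stand-in that assumes Python with its built-in SQLite module, a hypothetical `books` table with `title` and `author` columns, and invented sample rows. The partial match itself is done with the SQL `LIKE` operator, wrapping the user's fragment in `%` wildcards so it may appear anywhere in the title.

```python
import sqlite3

# Invented sample rows, for illustration only.
SAMPLE_BOOKS = [
    ("Search Engine Design", "Author A"),
    ("Web Programming Primer", "Author B"),
    ("Database Systems Basics", "Author C"),
]


def build_catalog(rows):
    """Create an in-memory catalog; the table and column names are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (title TEXT, author TEXT)")
    conn.executemany("INSERT INTO books (title, author) VALUES (?, ?)", rows)
    return conn


def escape_like(fragment, escape_char="\\"):
    """Escape LIKE wildcards so the user's text is matched literally."""
    return (
        fragment.replace(escape_char, escape_char * 2)
        .replace("%", escape_char + "%")
        .replace("_", escape_char + "_")
    )


def search_titles(conn, fragment):
    """Return the titles that contain the fragment anywhere (partial match)."""
    pattern = "%" + escape_like(fragment) + "%"
    cursor = conn.execute(
        # The fragment is passed as a bound parameter, never spliced into the SQL.
        "SELECT title FROM books WHERE title LIKE ? ESCAPE '\\' ORDER BY title",
        (pattern,),
    )
    return [row[0] for row in cursor]


if __name__ == "__main__":
    catalog = build_catalog(SAMPLE_BOOKS)
    # An incomplete keyword is enough to find the book.
    print(search_titles(catalog, "engin"))
```

With the sample rows above, the incomplete keyword "engin" is enough to retrieve "Search Engine Design". SQLite's `LIKE` ignores case for ASCII letters by default, which suits fuzzy recall of a title. Binding the fragment as a parameter and escaping its wildcards keeps user input from altering the query.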

1.2 The Replacement for Web Crawler

Unlike a typical search engine that counts on a web crawler to glean and index information
