SESSION NOVEL ALGORITHMS AND APPLICATIONS + ...

22 Int'l Conf. Foundations of Computer Science | FCS'11 |

Crafting a Lightweight Search Engine

Feng-Jen Yang
Department of Mathematics and Information Sciences
University of North Texas at Dallas
Dallas, TX 75241

Abstract – Web-based search is commonly perceived as a desirable feature of web sites. Although several open source search engines can be tailored and embedded in a web site for free, those engines tend to be general purpose and may not be easy to adapt into an efficient search engine for a particular application domain. In this paper, I present an easy but practical alternative that readers can follow as a template to build their own lightweight search engines with only a little programming.

Keywords: Lightweight Search Engine, Web-Based Query.

1 Introduction

As more and more information is digitized and made available on the web, online search has become popular and is commonly perceived as an essential feature of a web site. Although several open source search engines, such as ASPSeek [1], Lucene [2], Namazu [3], mnoGoSearch [4], and WebGlimpse [5], are available for web programmers to customize and fit into their own business operations, most of those free search engines were designed for general purposes and may not be easy to convert into special-purpose search engines [6, 7]. Besides, embedding a large component into a small web site may compromise search performance because of its heavy consumption of CPU time. Another downside of relying on an open source search engine is the limited free technical support. Although some volunteers are willing to share their know-how through mailing lists or online conferencing, most advanced consulting is still paid. For these reasons, embedding an open source engine into a special-purpose web site may not be the best choice despite the openness of its source code.

With these concerns in mind, this hands-on project provides an easy alternative for those who need a web-based search engine but for whom embedding an open source engine is not practical. Instead of spending a great deal of time striving to understand a huge open source search engine and then tailoring it to fit a specific domain, we can achieve very similar functionality by crafting a lightweight search engine from scratch that requires only a little programming.

As an illustrative implementation of this lightweight search engine, I hypothetically confine the search domain to online book lookup and have the search engine perform partial-match searches instead of exact-match searches, to allow some fuzziness during the comparison operations. For quick prototyping, I use only three technologies that most programmers are familiar with, namely Microsoft Access, Active Server Pages (ASP), and the HyperText Markup Language (HTML). To ensure that most readers can follow and try out this implementation, the rest of this paper is written in an instructional, stepwise manner.

1.1 The Coherence of Search Operations

An effective search engine is not only efficient in looking up items but, more importantly, also able to cope with users' partial, fuzzy, or incomplete memory of keywords and other search criteria. I regard this as the coherence between general users and the search operations. This expectation can be met by allowing users to search based on their partial or fuzzy memory of the data they are looking for. Technically, this can be achieved by allowing partial and incomplete keywords to be used as search criteria [8].
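The idea of accepting partial and incomplete keywords can be illustrated with a minimal Python sketch. This is not the paper's implementation, which is built on Access, ASP, and HTML; the function names `partial_match` and `search` and the sample titles are hypothetical, and case-insensitive substring containment is assumed as the matching rule.

```python
def partial_match(keyword: str, text: str) -> bool:
    """Return True when the keyword occurs anywhere in the text, ignoring case."""
    return keyword.strip().casefold() in text.casefold()


def search(keyword: str, titles: list[str]) -> list[str]:
    """Keep every title that contains the (possibly incomplete) keyword."""
    if not keyword.strip():
        return []  # assumption: an empty query returns nothing rather than everything
    return [title for title in titles if partial_match(keyword, title)]


# Hypothetical sample titles, invented purely for illustration.
SAMPLE_TITLES = ["Database Systems", "Intro to Algorithms", "Web Data Mining"]
```

With these sample titles, a user who only remembers the fragment "data" would still retrieve both "Database Systems" and "Web Data Mining", whereas an exact-match comparison would retrieve neither.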
Indeed, expecting users to spell out complete and correct keywords for the items they are looking for is neither necessary nor practical in real life, since most human memory is retained only for a short term. The fuzziness and uncertainty caused by human short-term memory should be considered and incorporated into the design and implementation of search operations.

1.2 The Replacement for Web Crawler

A typical search engine counts on a web crawler to glean and index information automatically from the entire World Wide Web. For smaller search domains, it is not necessary to be overwhelmed by world-wide information. Instead, it can be more efficient to replace the crawler with a supportive database that is maintained through regular database design and administration. In this project, I adopted Microsoft Access as the platform for creating and maintaining the database that runs at the backend to support the web-based searches.

2. The Supportive Database

In this demonstrative implementation, a relational database is created and run at the backend to support the web-based query operations. For simplicity, the data contents are deliberately minimized to a single table, which can be created in the integrated development environment of Microsoft Access in the following steps:

1. Start from a blank database and use the table design wizard to create a table named Book with the following schema, in which the No field is chosen to be the primary key:
2. Within the same wizard, set the No field with the following properties:
3. Within the same wizard, set the Title field with the following properties:
4. Within the same wizard, set the Author field with the following properties:
5. Open the Book table and enter the following hypothetical data:
6. Name the database BookDB.accdb and save it to the following folder (see footnote 1): C:\inetpub\wwwroot\search

Although the above database is very simplified, it does reflect the typical form of data collections used for online search. The number of columns as well as the number of tables can be expanded as needed. The whole database can also be further refined by normalizing the schema to a suitable level.

3. The User Interface

Since the supportive database is hidden from general users and runs at the backend, the search engine must provide a friendly frontend interface that allows users to specify search criteria for their target data. The engine can then look for the items that partially match these criteria. In text-based search, the search criteria are usually represented by a combination of keywords entered by users. To work with users' fuzzy and uncertain memory of their intended data, this search engine relaxes the restriction on search

Footnote 1: To use a Windows PC as the web server, the Internet Information Server (IIS) component of Windows must be installed. After installing IIS, the inetpub folder and its subfolder wwwroot are created automatically. The programmer has to create the search folder as a subfolder of wwwroot and save the database at this location.
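As a rough illustration of how the Book table from Section 2 could serve partial-match lookups, the sketch below uses Python's built-in sqlite3 module as a stand-in for Microsoft Access, which is an assumption made only to keep the example self-contained. The table and field names (Book, No, Title, Author) come from the text; the column types, the sample rows, and the helper names `build_demo_db`, `like_pattern`, and `find_books` are hypothetical.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory Book table with No as the primary key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE Book ("No" INTEGER PRIMARY KEY, '
        "Title TEXT NOT NULL, Author TEXT NOT NULL)"
    )
    # Invented sample rows; the paper's own hypothetical data is not reproduced here.
    rows = [
        (1, "Database Basics", "A. Writer"),
        (2, "Web Programming", "B. Author"),
        (3, "Advanced Databases", "C. Writer"),
    ]
    conn.executemany("INSERT INTO Book VALUES (?, ?, ?)", rows)
    return conn


def like_pattern(fragment: str) -> str:
    """Wrap a user fragment in % wildcards, escaping any wildcard characters it contains."""
    escaped = (
        fragment.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    return f"%{escaped}%"


def find_books(conn: sqlite3.Connection, title_part: str = "", author_part: str = ""):
    """Return rows whose Title and Author both contain the given fragments."""
    sql = (
        'SELECT "No", Title, Author FROM Book '
        "WHERE Title LIKE ? ESCAPE '\\' AND Author LIKE ? ESCAPE '\\' "
        'ORDER BY "No"'
    )
    params = (like_pattern(title_part), like_pattern(author_part))
    return conn.execute(sql, params).fetchall()
```

An empty fragment becomes the pattern `%%`, so a field the user leaves blank does not restrict the result. Passing the fragments as query parameters, rather than concatenating them into the SQL string, keeps user input from being interpreted as SQL.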