13.07.2015 Views

IMPROVING THE RELEVANCY OF DOCUMENT SEARCH ... - EJUM


Improving the relevancy of document search using the multi-term adjacency keyword-order model. pp 1-10

The search process begins with scanning the first document D1 sequentially from the first term to the last. Then for each term, the term that is found to be a keyword will be added to the keyword array (the purpose of the keyword array is to group all the adjacency keywords together). To construct the document vector, the information needed are terms, keyword group size, keyword group order, term frequency and number of keyword-order pairs that exist.

Term frequency is accumulated each time the term occurs in the document. Keyword group size is computed based on the size of the keyword array, whereas the keyword group order and number of keyword-order pairs are determined based on the pseudo-code shown in Fig 1.

    Input: keyword array, query array
    Output: keyword group order, keyword-order pairs
    Begin
      …
      For each term in keyword array
        Get the position in query array where term matches keyword
        Get next term in keyword array
        Get next keyword in query array
        Compare next term with next keyword
        If next term = next keyword,
          Set keyword group order = 1
          Increment keyword-order pair by 1
      …
    End

Fig. 1. Pseudo-code for determining the keyword group order and keyword-order pairs

After all the terms have been scanned, the term-by-group matrix is computed for the documents. In term-by-group matrix, rows correspond to terms in the document; columns correspond to keyword group and cells correspond to frequency of term occurrence in the keyword group. Table 4 shows the term-by-group matrix for document vector D1. The types of group are denoted as Type 1 for standalone keywords, Type 2 for group with two keywords, Type 3 for group with three keywords.

Malaysian Journal of Computer Science. Vol. 25(1), 2012
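The scanning, keyword-order and term-by-group steps described on this page can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names (group_adjacent_keywords, order_and_pairs, term_by_group_matrix) are hypothetical, the parts elided by "…" in Fig. 1 are not reconstructed, the keyword group order is assumed to default to 0 when no pair matches, each query keyword is assumed to occur only once in the query, and the group type is assumed to equal the number of keywords in the group.

```python
from collections import defaultdict


def group_adjacent_keywords(doc_terms, query):
    """Scan the document once and collect runs of adjacent query keywords."""
    query_set = set(query)
    groups, current = [], []
    for term in doc_terms:
        if term in query_set:
            current.append(term)      # extend the current keyword array
        elif current:
            groups.append(current)    # a non-keyword closes the group
            current = []
    if current:                       # close a group that ends the document
        groups.append(current)
    return groups


def order_and_pairs(keyword_group, query):
    """Return (keyword group order, number of keyword-order pairs).

    A pair is counted when a term and the term after it in the group
    are also neighbours, in the same order, in the query.
    Assumption: the order flag stays 0 if no pair matches.
    """
    order, pairs = 0, 0
    for i in range(len(keyword_group) - 1):
        pos = query.index(keyword_group[i])   # position of term in query
        next_term = keyword_group[i + 1]
        if pos + 1 < len(query) and query[pos + 1] == next_term:
            order = 1
            pairs += 1
    return order, pairs


def term_by_group_matrix(groups):
    """Rows are terms, columns are group types (assumed = group size),
    cells count how often the term occurs in groups of that type."""
    matrix = defaultdict(lambda: defaultdict(int))
    for group in groups:
        for term in group:
            matrix[term][len(group)] += 1
    return {term: dict(cols) for term, cols in matrix.items()}


if __name__ == "__main__":
    # Hypothetical query and document, not taken from the paper.
    query = ["information", "retrieval", "system"]
    doc = ("the information retrieval system stores "
           "information about retrieval").split()
    groups = group_adjacent_keywords(doc, query)
    for group in groups:
        print(group, order_and_pairs(group, query))
    print(term_by_group_matrix(groups))
```

The example query and document are made up for illustration. The keyword group size is simply the length of each returned group, and term frequency could be accumulated in the same scan with a plain counter.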
