Finding All Approximate Palindromes


The 25th Workshop on Combinatorial Mathematics and Computation Theory

L. C. Chen and R. C. T. Lee
Department of Computer Science and Information Engineering, National Chi Nan University
{s95321503,rctlee}@ncnu.edu.tw

Abstract

In this paper, we discuss the approximate palindrome finding problem. An approximate palindrome with error $k$ is a string $S = S_1 S_2$ or $S = S_1 a S_2$ such that the edit distance between $S_1$ and the reverse of $S_2$ is smaller than or equal to $k$. We give an algorithm to solve the all approximate palindrome finding problem, defined as follows: given a text string $T$ and an integer $k$, find all occurrences of approximate palindromes with error $k'$ in $T$ for $0 \le k' \le k$.

1. Introduction

A string $S$ is a palindrome if $S = \overline{S}$, where $\overline{S}$ denotes the reverse of $S$. The palindrome finding problem is defined as follows: given a string $S$, find all palindromes within $S$. Algorithms to solve this problem can be found in [9].

We now give the palindrome a new definition: a string $S$ is a palindrome if it is of the form $S_1 S_2$ or $S_1 a S_2$, where $a$ is a single character, such that $S_1$ exactly matches $\overline{S_2}$. A palindrome is even (odd) if it is of the form $S_1 S_2$ ($S_1 a S_2$).

Based upon the above definitions, we define an approximate palindrome with error $k$ as a string of the form $S_1 S_2$ or $S_1 a S_2$ such that $ED(S_1, \overline{S_2}) \le k$, where $ED(A, B)$ denotes the edit distance between two strings $A$ and $B$, that is, the minimum number of substitutions, insertions and deletions of characters needed to transform $A$ into $B$. Edit distances can be computed with the dynamic programming approach [3, 8 and 14]. In [13], the maximal approximate palindrome with error $k$ was defined, and algorithms to solve the maximal approximate palindrome finding problem were given.

In this paper, we solve the problem of finding all approximate palindromes with error at most $k$.
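To make the definitions above concrete, here is a minimal Python sketch (our own illustration, not code from the paper) of the standard unit-cost edit distance recurrence, plus a brute-force test of the approximate palindrome definition that simply tries every split point, since the definition does not fix where $S_1$ ends. The names `edit_distance` and `is_approx_palindrome` are hypothetical, and the brute force is far slower than the algorithm of Section 2; it is only meant as a reference check.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)    # ED(a[:i], "") = i
        for j, cb in enumerate(b, start=1):
            cur[j] = min(
                prev[j] + 1,               # delete ca from a
                cur[j - 1] + 1,            # insert cb into a
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            )
        prev = cur
    return prev[-1]


def is_approx_palindrome(s: str, k: int) -> bool:
    """True if s = S1 S2 or s = S1 a S2 with ED(S1, reverse(S2)) <= k
    for some split point (brute force over all splits)."""
    n = len(s)
    for cut in range(n + 1):            # even form: S1 = s[:cut], S2 = s[cut:]
        if edit_distance(s[:cut], s[cut:][::-1]) <= k:
            return True
    for mid in range(n):                # odd form: s[mid] is the middle character a
        if edit_distance(s[:mid], s[mid + 1:][::-1]) <= k:
            return True
    return False
```

For example, `is_approx_palindrome("abaaaaca", 1)` holds because the split abaa | aaca gives $ED(abaa, acaa) = 1$, while with $k = 0$ no split works.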
2. The All Approximate Palindrome Finding Problem

We define the all approximate palindrome finding problem as follows: given a text $T$ and an integer $k$, find all occurrences in $T$ of approximate palindromes with error at most $k'$, for $0 \le k' \le k$.

Before we present the algorithm, let us note an important point, which we call the "length property". Suppose we have strings $A$ and $B$ with $|A| = m$. If $ED(A, B) \le k$, then $m - k \le |B| \le m + k$. This holds because each insertion or deletion changes the length by exactly one and a substitution leaves it unchanged. That is, the length of $B$ can be neither too small nor too large. This property has been used in many papers [2, 5, 10, 11, and 15]. We now use it to solve the approximate palindrome finding problem.

In the following, we let $T = t_1 t_2 \cdots t_n$ and let $T(i, j)$ denote $t_i t_{i+1} \cdots t_j$.

Before giving the details of our algorithm, let us assume that we are going to find all even approximate palindromes. We perform a linear scan of the text $T$. To further simplify our discussion, we assume that $i \le n/2$. Let $S_1$ denote $\overline{T(1, i)}$ and $S_2$ denote $T(i+1, \min\{2i+k, n\})$. By the length property, no prefix of $S_2$ longer than $i + k$ can be within edit distance $k$ of a prefix of $S_1$, which is why $S_2$ stops at $t_{2i+k}$. Our problem now becomes the following: for each $k' \in \{0, 1, \ldots, k\}$, find each prefix $A$ of $S_2$ such that $ED(S_1(1, i'), A) \le k'$, where $1 \le i' \le i$. Each such pair gives the approximate palindrome $T(i - i' + 1, i + |A|)$.

Thus our problem is essentially the same as the approximate string matching problem, defined as follows: given a text $T = t_1 t_2 \cdots t_n$, a pattern $P = p_1 p_2 \cdots p_m$ and an error bound $k$, find all the substrings of $T$ whose edit distances to $P$ are less than or equal to $k$. Many studies focus on algorithms to find the edit distance between two strings, and many of them [1, 3, 4, 6, 7, 10, 11, 12, 14, 15, and 16] use the following rule: consider two strings $A = A_1 A_2$ and $B = B_1 B_2$; if $A_2 = B_2$ and $ED(A_1, B_1) \le k$, then $ED(A, B) \le k$.

There are two approaches to the approximate string matching problem. The first approach [12, 2, 10, 11 and 15] is a local approach, which computes $ED(P, S)$ where $S$ is a substring of $T$ starting from each location $i$. That is, for every possible position $i$, we test whether there exists a substring $S$ of $T$ starting from this location such that $ED(P, S) \le k$. The second approach [16], named Approach 2, is a global approach which can solve the problem in $O(\lceil m/w \rceil nk)$ time, where $w$ is the word size. We shall use the technique in [16] to solve the problem.

In the above, we only discussed one case. In the following, we give the precise definitions of $S_1$ and $S_2$ for all cases.

Case 1: We find even approximate palindromes.
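If $1 \le i \le n/2$, let $S_1$ be $\overline{T(1, i)}$ and $S_2$ be $T(i+1, \min\{2i+k, n\})$; otherwise, let $S_1$ be $\overline{T(\max\{2i-n-k+1, 1\}, i)}$ and $S_2$ be $T(i+1, n)$.

Case 2: We find odd approximate palindromes, with $t_{i+1}$ as the middle character. If $1 \le i \le n/2$, let $S_1$ be $\overline{T(1, i)}$ and $S_2$ be $T(i+2, \min\{2i+k+1, n\})$; otherwise, let $S_1$ be $\overline{T(\max\{2i-n-k+2, 1\}, i)}$ and $S_2$ be $T(i+2, n)$.

Essentially, for each location $i$ in $T$, we compute the edit distances between prefixes of $S_1$ and prefixes of $S_2$. Given two strings $S_1 = p_1 p_2 \cdots p_{m'}$ and $S_2 = t_1 t_2 \cdots t_{n'}$ (here $p_j$ and $t_i$ denote the characters of $S_1$ and $S_2$, not of $T$) and $k$, we define

$$M^k(i, j) = \begin{cases} 1, & \text{if } ED(S_1(1, j), S_2(1, i)) \le k, \\ 0, & \text{otherwise,} \end{cases}$$

where $0 \le i \le n'$ and $0 \le j \le m'$.

We now use dynamic programming to find $M^k(i, j)$. Let $M^k_I(i, j)$, $M^k_D(i, j)$ and $M^k_S(i, j)$ denote the values of $M^k(i, j)$ related to insertion, deletion and substitution, respectively, where

$$M^k_I(i, j) = \begin{cases} 1, & \text{if } t_i \ne p_j \text{ and } M^{k-1}(i-1, j) = 1, \text{ or } t_i = p_j \text{ and } M^k(i-1, j-1) = 1, \\ 0, & \text{otherwise,} \end{cases}$$

$$M^k_D(i, j) = \begin{cases} 1, & \text{if } t_i \ne p_j \text{ and } M^{k-1}(i, j-1) = 1, \text{ or } t_i = p_j \text{ and } M^k(i-1, j-1) = 1, \\ 0, & \text{otherwise,} \end{cases}$$

$$M^k_S(i, j) = \begin{cases} 1, & \text{if } t_i \ne p_j \text{ and } M^{k-1}(i-1, j-1) = 1, \text{ or } t_i = p_j \text{ and } M^k(i-1, j-1) = 1, \\ 0, & \text{otherwise.} \end{cases}$$

For $k = 0$, every term involving $M^{-1}$ is taken to be 0. After $M^k_I(i, j)$, $M^k_D(i, j)$ and $M^k_S(i, j)$ are found, we determine

$$M^k(i, j) = M^k_I(i, j) \lor M^k_D(i, j) \lor M^k_S(i, j),$$

where $\lor$ is the logical or operation.

Our algorithm is presented in the following:

Input: A string $T = t_1 t_2 \cdots t_n$ and an integer $k$.
Output: All occurrences of approximate palindromes of $T$ with error $k'$, $0 \le k' \le k$.

for i = 1 to n do
    determine S_1 and S_2 for location i (Case 1 or Case 2); let m' = |S_1| and n' = |S_2|
    for k' = 0 to k do
        for 0 ≤ i' ≤ n' and 0 ≤ j' ≤ m' do
            if i' = 0 then M^{k'}(0, j') = 1 if j' ≤ k', and 0 otherwise
            else if j' = 0 then M^{k'}(i', 0) = 1 if i' ≤ k', and 0 otherwise
            else M^{k'}(i', j') = M^{k'}_I(i', j') or M^{k'}_D(i', j') or M^{k'}_S(i', j')
        end for
    end for
end for

Each entry with $M^{k'}(i', j') = 1$ corresponds to an approximate palindrome with error $k'$ whose left part is the reverse of $S_1(1, j')$ and whose right part is $S_2(1, i')$, with the middle character $t_{i+1}$ between them in Case 2.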
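The case definitions and the recurrences above can be sketched in Python as follows. This is an illustration under our own assumptions rather than the authors' implementation: the helper names (`split_for`, `prefix_matrices`, `palindromes_at`, `all_approx_palindromes`) are hypothetical, locations are 1-based to match $T(i, j)$, and the matrices are plain lists indexed `M[k'][i'][j']` like $M^{k'}(i', j')$. Because $M_I$, $M_D$ and $M_S$ share the same match case, the code folds their logical or into one branch on $t_{i'} = p_{j'}$.

```python
def split_for(T: str, i: int, k: int, odd: bool) -> tuple[str, str, int]:
    """(S1, S2, 1-based start of S2 in T) for location i, per Case 1 / Case 2.
    S1 is returned already reversed."""
    n = len(T)
    gap = 2 if odd else 1                  # S2 starts at t_{i+1} (even) or t_{i+2} (odd)
    if i <= n / 2:
        left, right = 1, min(2 * i + k + gap - 1, n)
    else:
        left, right = max(2 * i - n - k + gap, 1), n
    s1 = T[left - 1:i][::-1]               # reverse of T(left, i)
    s2 = T[i + gap - 1:right]              # T(i + gap, right)
    return s1, s2, i + gap


def prefix_matrices(s1: str, s2: str, k: int) -> list[list[list[int]]]:
    """M[kk][i][j] = 1 iff ED(s1[:j], s2[:i]) <= kk, for 0 <= kk <= k."""
    n2, m1 = len(s2), len(s1)
    M: list[list[list[int]]] = []
    for kk in range(k + 1):
        cur = [[0] * (m1 + 1) for _ in range(n2 + 1)]
        below = M[kk - 1] if kk > 0 else None   # M^{kk-1}; M^{-1} is all zeros
        for i in range(n2 + 1):
            for j in range(m1 + 1):
                if i == 0:
                    cur[i][j] = int(j <= kk)        # ED(p_1..p_j, empty) = j
                elif j == 0:
                    cur[i][j] = int(i <= kk)        # ED(empty, t_1..t_i) = i
                elif s2[i - 1] == s1[j - 1]:
                    cur[i][j] = cur[i - 1][j - 1]   # t_i = p_j: diagonal, same kk
                elif below is not None:
                    # t_i != p_j: insertion, deletion or substitution with kk - 1
                    cur[i][j] = int(bool(below[i - 1][j] or below[i][j - 1]
                                         or below[i - 1][j - 1]))
        M.append(cur)
    return M


def palindromes_at(T: str, i: int, k: int, odd: bool = False) -> dict[int, set[tuple[int, int]]]:
    """For each k' <= k, the 1-based spans (start, end) of approximate
    palindromes with error k' centred at location i."""
    if odd and i >= len(T):
        return {kp: set() for kp in range(k + 1)}   # no middle character t_{i+1}
    s1, s2, s2_start = split_for(T, i, k, odd)
    M = prefix_matrices(s1, s2, k)
    found = {}
    for kp in range(k + 1):
        spans = set()
        for ip in range(len(s2) + 1):
            for jp in range(len(s1) + 1):
                if M[kp][ip][jp] and (ip or jp or odd):   # skip the empty even string
                    spans.add((i - jp + 1, s2_start + ip - 1))
        found[kp] = spans
    return found


def all_approx_palindromes(T: str, k: int) -> dict[int, set[tuple[int, int]]]:
    """Scan every location i = 1..n, even and odd, and merge the spans."""
    result: dict[int, set[tuple[int, int]]] = {kp: set() for kp in range(k + 1)}
    for i in range(1, len(T) + 1):
        for odd in (False, True):
            for kp, spans in palindromes_at(T, i, k, odd).items():
                result[kp] |= spans
    return result
```

Run on the text of the example that follows, `palindromes_at("abaaaacaacaa", 4, 1)` gives, for error 1, spans whose substrings are exactly a, aa, aaa, aaaa, baaaa, aaaac, baaaac and abaaaaca. Spans are collected in sets because the same occurrence can be reached from more than one location $i$.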

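The paper states that it uses the technique of [16] but does not spell out how. One plausible reading, sketched below purely as an assumption, packs each column $M^{k'}(i, \cdot)$ into a single integer (bit $j$ holds $M^{k'}(i, j)$), so that all $j$ are updated with a few shifts and ors, in the spirit of the Wu and Manber bit vectors. Unlike the substring search of [16], bit 0 is not re-seeded at every text position, because each palindrome is anchored at its centre. The function name `bitparallel_prefix_rows` is hypothetical.

```python
def bitparallel_prefix_rows(s1: str, s2: str, k: int) -> list[list[int]]:
    """V[kk][i] is an int whose bit j equals M^kk(i, j), i.e. 1 iff
    ED(s1[:j], s2[:i]) <= kk.  Bit 0 stands for the empty prefix of S1."""
    m = len(s1)
    full = (1 << (m + 1)) - 1                 # keep only bits 0..m
    # B[c]: bit j is set iff p_j == c (j >= 1); bit 0 is never set
    B: dict[str, int] = {}
    for j, c in enumerate(s1, start=1):
        B[c] = B.get(c, 0) | (1 << j)
    V: list[list[int]] = []
    for kk in range(k + 1):
        row = [((1 << (kk + 1)) - 1) & full]  # M^kk(0, j) = 1 iff j <= kk
        for i in range(1, len(s2) + 1):
            # match: t_i = p_j and M^kk(i-1, j-1) = 1
            v = (row[i - 1] << 1) & B.get(s2[i - 1], 0)
            if kk > 0:
                # The t_i != p_j guard can be dropped: when t_i = p_j, any of
                # these terms being 1 already forces M^kk(i-1, j-1) = 1.
                lo = V[kk - 1]
                v |= lo[i - 1]                # insertion:    M^{kk-1}(i-1, j)
                v |= lo[i - 1] << 1           # substitution: M^{kk-1}(i-1, j-1)
                v |= lo[i] << 1               # deletion:     M^{kk-1}(i, j-1)
            row.append(v & full)
        V.append(row)
    return V
```

Python integers have arbitrary precision, so the $\lceil m/w \rceil$ machine words in the complexity bound are hidden inside the big-integer arithmetic. On $S_1 = aaba$ and $S_2 = aacaa$, the bits of `bitparallel_prefix_rows("aaba", "aacaa", 1)[1]` reproduce the $M^1$ table of the example below.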

An Example

We are given the string $T = abaaaacaacaa$ and $k = 1$. Since there would be too many approximate palindromes if we considered every location $i$, we only present the result for $i = 4$, and we only find the even approximate palindromes. In this case, $T(1, 4) = abaa$, $S_1 = \overline{T(1, 4)} = aaba$ and $S_2 = T(5, 9) = aacaa$, since $\min\{2i+k, n\} = \min\{9, 12\} = 9$.

In the tables below, rows correspond to the prefixes of $S_1$ ($j = 0, \ldots, 4$) and columns to the prefixes of $S_2$ ($i = 0, \ldots, 5$); "-" marks the empty prefix.

M^0:
         -  a  a  c  a  a
      -  1  0  0  0  0  0
      a  0  1  0  0  0  0
      a  0  0  1  0  0  0
      b  0  0  0  0  0  0
      a  0  0  0  0  0  0

From the table $M^0$, aa $= T(4, 5)$ and aaaa $= T(3, 6)$ are all of the even approximate palindromes with error 0.

M^1:
         -  a  a  c  a  a
      -  1  1  0  0  0  0
      a  1  1  1  0  0  0
      a  0  1  1  1  0  0
      b  0  0  1  1  0  0
      a  0  0  0  0  1  0

From the table $M^1$, we can see that a, aa, aaa, aaaa, baaaa, aaaac, baaaac and abaaaaca are all of the even approximate palindromes with error 1. For example, $M^1(4, 4) = 1$ because $ED(aaba, aaca) = 1$, which gives the palindrome $T(1, 8) = abaaaaca$.

3. Concluding Remarks

In this paper, we introduced the definition of approximate palindromes and the all approximate palindromes finding problem. We then used the "length property" and the approach in [16] to solve this problem faster. In the future, we plan to investigate the all approximate palindrome subsequence problem.

References

[1] Amir, A., Keselman, D., Landau, G. M., Lewenstein, M., Lewenstein, N. and Rodeh, M., Text Indexing and Dictionary Matching with One Error, Journal of Algorithms, Vol. 37, 2000, pp. 309-325.
[2] Fredriksson, K. and Navarro, G., Average-Optimal Multiple Approximate String Matching, ACM Journal of Experimental Algorithmics, Vol. 9, No. 1.4, 2004, pp. 1-47.
[3] Hyyro, H., Bit-parallel approximate string matching algorithms with transposition, Journal of Discrete Algorithms, Vol. 3, 2005, pp. 215-229.
[4] Huynh, T. N. D., Hon, W. K., Lam, T. W. and Sung, W. K., Approximate String Matching Using Compressed Suffix Arrays, Theoretical Computer Science, Vol. 352, 2006, pp. 240-249.
[5] Holub, J. and Melichar, B., Approximate string matching using factor automata, Theoretical Computer Science, Vol. 249, 2000, pp. 305-311.
[6] Hyyro, H. and Navarro, G., Bit-parallel Witnesses and their Applications to Approximate String Matching, Algorithmica, Vol. 4, No. 3, 2005, pp. 203-231.
[7] Landau, G. and Vishkin, U., Fast parallel and serial approximate string matching, Journal of Algorithms, Vol. 10, 1989, pp. 157-169.
[8] Lee, R. C. T., Tseng, S. S., Chang, R. C. and Chang, Y. T., Introduction to the Design and Analysis of Algorithms, McGraw-Hill, 2005, pp. 253-316.
[9] Manacher, G., A new linear-time on-line algorithm for finding the smallest initial palindrome of the string, Journal of the ACM, Vol. 22, 1975, pp. 8-19.
[10] Navarro, G. and Baeza-Yates, R., Very fast and simple approximate string matching, Information Processing Letters, Vol. 72, 1999, pp. 65-70.
[11] Navarro, G. and Baeza-Yates, R., A Hybrid Indexing Method for Approximate String Matching, Journal of Discrete Algorithms, Vol. 1, No. 1, 2000, pp. 205-239.
[12] Needleman, S. B. and Wunsch, C. D., A general method applicable to the search for similarities in the amino acid sequence of two proteins, Journal of Molecular Biology, Vol. 48, 1970, pp. 443-453.
[13] Porto, A. H. L. and Barbosa, V. C., Finding approximate palindromes in strings, Pattern Recognition, Vol. 35, 2002, pp. 2581-2591.
[14] Sellers, P. H., String Matching with Errors, Journal of Algorithms, Vol. 20, No. 1, 1980, pp. 359-373.
[15] Tarhio, J. and Ukkonen, E., Approximate Boyer-Moore String Matching, SIAM Journal on Computing, Vol. 22, No. 2, 1993, pp. 243-260.
[16] Wu, S. and Manber, U., Fast Text Searching: Allowing Errors, Communications of the ACM, Vol. 35, 1992, pp. 83-91.
