
Data Encryption Based On Protein Synthesis - Nguyen Dang Binh


Figure 3: Illustration of the first method - one byte of data is translated to one byte of code.

Figure 4: Illustration of the second method - one byte of data is translated to a string.

In Figures 5-a and 5-b, these two coding methods are compared in terms of their coding structure.

Figure 5-a: the table of unique codes produced by the first method; its elements are 00, 01, 10, and 11.

Figure 5-b: the table of unique codes produced by the second method; its elements are A, H, K, and M.

3.3 The transmission

Since data communication in digital systems takes the form of 0s and 1s, the first method complies well with these communication channels. The second method, however, produces alphabetic outputs rather than binary digits, so additional encoding is required to transform the alphabet into the binary system while improving security. This is carried out by Huffman coding.

4 Required background of Huffman Code

Huffman coding is a lossless method of compressing data and a form of entropy encoding [1, 6]. Lossless data compression is an algorithm in which the original data can be reconstructed exactly from the compressed data. Entropy encoding is lossless data compression that assigns codes to symbols, with the length of each codeword inversely related to the frequency of that symbol [4, 7]. Specifically, Huffman codes are variable-length codes: shorter codewords are assigned to the symbols with the highest frequencies. This gives them an advantage over fixed-length codes, as the total number of bits produced by encoding data with a variable-length code is fewer than by encoding the same data with a fixed-length code.
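As a minimal sketch of this advantage (not taken from the paper; the symbols and frequencies below are assumed for illustration), compare the average codeword length of a fixed-length code and a variable-length code over the four-symbol alphabet A, H, K, M:

```python
# Hypothetical symbol frequencies (probabilities summing to 1).
freqs = {"A": 0.5, "H": 0.25, "K": 0.125, "M": 0.125}

# A fixed-length code for 4 symbols needs 2 bits per symbol.
fixed_avg = 2.0

# A Huffman code for these frequencies, derived by hand:
# the most frequent symbol gets the shortest codeword.
huffman = {"A": "0", "H": "10", "K": "110", "M": "111"}

# Average bits per symbol = sum of frequency * codeword length.
var_avg = sum(freqs[s] * len(code) for s, code in huffman.items())

print(fixed_avg)  # 2.0
print(var_avg)    # 1.75
```

Under these assumed frequencies the variable-length code averages 1.75 bits per symbol against 2 bits for the fixed-length code; the gap grows as the frequency distribution becomes more skewed.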

4.1 Encoding using Huffman Tree

The technique works by building a binary tree of nodes. First, each symbol to be encoded is treated as a single tree consisting of one node, labelled with the frequency of its symbol. The two trees whose root nodes have the smallest frequencies are selected and become the subtrees of a new root, whose frequency is the sum of the frequencies of its subtrees; the two selected subtrees are then removed from the forest. This process repeats until only one tree - the Huffman encoding tree - remains. Each level is represented using a one-bit code, 0 or 1: classically, a value of 0 is associated with an edge to a left child and a value of 1 with an edge to a right child. Thus the codes of the most frequent symbols have fewer bits, as their leaves are nearer to the root [1].
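The construction above can be sketched as follows. This is an illustrative implementation, not the paper's code; the symbol frequencies are assumed, and a counter is used only to break ties deterministically when two trees have equal frequency:

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build a Huffman code from a {symbol: frequency} map.

    Repeatedly merges the two lowest-frequency trees into a new
    root whose frequency is the sum of its subtrees, then labels
    left edges 0 and right edges 1 (the classical convention).
    """
    tiebreak = count()  # keeps heap comparisons off the tree nodes
    heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # least frequent tree
        f2, _, right = heapq.heappop(heap)  # second least frequent
        # New root: frequency is the sum of its two subtrees.
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):        # internal node: recurse
            walk(node[0], prefix + "0")    # left edge labelled 0
            walk(node[1], prefix + "1")    # right edge labelled 1
        else:
            codes[node] = prefix or "0"    # leaf: record its codeword
    walk(heap[0][2], "")
    return codes

# Assumed frequencies for the four symbols from Figure 5-b.
codes = huffman_codes({"A": 8, "H": 4, "K": 2, "M": 2})
print(codes)  # {'A': '0', 'H': '10', 'K': '110', 'M': '111'}
```

Note that the most frequent symbol, A, ends up nearest the root and so receives the shortest codeword, exactly as the text describes.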

There are two variations for constructing Huffman codes: arbitrary and right-heavy Huffman coding. In the former, after combining the two least frequent symbols as subtrees of a new root, the decision of which subtree to assign as the right or left child of the root is arbitrary. In the latter, the subtree with the greater frequency is always assigned as the right child. "By concatenating the labels associated with the edges that make up the path from the root to a leaf, we get a binary string. Thus the mapping is defined [1]."
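The right-heavy merge step can be sketched in isolation. This helper is hypothetical (the paper names the variant but gives no code); it takes two (frequency, tree) pairs and always places the heavier subtree on the right:

```python
def merge_right_heavy(t1, t2):
    """Combine two (frequency, tree) pairs into a new root.

    In the right-heavy Huffman variant, the subtree with the
    greater frequency is always assigned as the right child;
    the combined root's frequency is the sum of both subtrees.
    """
    (f1, a), (f2, b) = t1, t2
    left, right = (a, b) if f1 <= f2 else (b, a)
    return (f1 + f2, (left, right))

# Merging a leaf K (frequency 2) with a subtree H (frequency 4):
total, tree = merge_right_heavy((2, "K"), (4, "H"))
print(total, tree)  # 6 ('K', 'H')
```

The result is the same whichever order the arguments arrive in, which is precisely what distinguishes this variant from the arbitrary one: the child placement is fixed by frequency, not by choice.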
