Data Encryption Based On Protein Synthesis - Nguyen Dang Binh


For this process to begin, instructions or blueprints are required. These instructions can be found in DNA. Deoxyribonucleic acid (DNA) is a nucleic acid that contains the genetic instructions used in the development and functioning of all known living organisms. The DNA is situated in the nucleus, organized into chromosomes.

2.1 DNA

The main role of DNA molecules is the long-term storage of information needed to construct other components of cells, such as proteins and RNA molecules. The DNA that makes up the human genome can be subdivided into information bytes called genes [5]; each gene encodes a unique protein that performs a specialized function in the cell. On the other hand, other DNA sequences have structural purposes, or are involved in regulating the use of this genetic information.
In terms of structure, DNA is a long polymer made from repeating units called nucleotides. In living organisms, DNA does not usually exist as a single molecule, but instead as a tightly-associated pair of molecules. Two long strands of the DNA entwine like vines, in the shape of a double helix [5, 8]. The nucleotide repeats contain both the segment of the backbone of the molecule, which holds the chain together, and a base, which interacts with the other DNA strand in the helix. In general, a base linked to a sugar is called a nucleoside and a base linked to a sugar and one or more phosphate groups is called a nucleotide. The DNA double helix is stabilized by hydrogen bonds between the bases attached to the two strands. The four bases found in DNA are adenine (abbreviated A), cytosine (C), guanine (G) and thymine (T) [5, 8].
Each type of base on one strand forms a bond with just one type of base on the other strand. This is called complementary base pairing [5, 8], with A bonding only to T, and C bonding only to G. This arrangement of two nucleotides binding together across the double helix is called a base pair [5, 8]. Due to a weak bond between base pairs, DNA strands could be pulled apart like a zipper.

Figure 1: The chemical structure of DNA. Hydrogen bonds are shown as dotted lines.

2.2 CODONS

As previously mentioned, proteins are assembled from amino acids using information encoded in genes. Each protein has its own unique amino acid sequence that is specified by the nucleotide sequence of the gene encoding this protein. The genetic code is a set of three-nucleotide sets called CODONS and each three-nucleotide combination stands for an amino acid, for example AUG stands for methionine [5, 8]. There are 4 bases in 3-letter combinations; there are 64 possible codons (4^3 combinations). These encode the twenty standard amino acids, giving most amino acids more than one possible codon. There are also three 'stop' or 'nonsense' codons signifying the end of the coding region [5, 8]; these are the TAA, TGA and TAG codons. This is shown in Figure 2.

Figure 2: the combination of 4 base pairs in codons

2.3 Transcription

DNA transcription is a process that involves the transcribing of genetic information, i.e. the codons of a gene from DNA to RNA [5]. Simply stated, in transcription an mRNA template, encoding the sequence of the protein in the form of a trinucleotide code, is transcribed from the genome to provide a template for translation. Transcription copies the template from one strand of the DNA double helix, called the template strand [5, 8].
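The pairing rule and the codon count from Sections 2.1 and 2.2 can be checked with a short Python sketch. It is only an illustration: the names DNA_COMPLEMENT, complement_strand, ALL_CODONS and STOP_CODONS are hypothetical and do not come from the paper.

```python
from itertools import product

# Complementary base pairing in DNA: A pairs only with T, C only with G.
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_strand(strand):
    """Return the base-by-base complement of a DNA strand."""
    return "".join(DNA_COMPLEMENT[base] for base in strand)


# Every three-letter combination of the four DNA bases (4^3 of them).
ALL_CODONS = ["".join(triplet) for triplet in product("ACGT", repeat=3)]

# The three stop codons named in the text, written in the DNA alphabet.
STOP_CODONS = {"TAA", "TGA", "TAG"}

print(complement_strand("ATGC"))            # TACG
print(len(ALL_CODONS))                      # 64
print(len(ALL_CODONS) - len(STOP_CODONS))   # 61
```

The last figure shows why most amino acids have more than one codon: 61 non-stop codons are shared among twenty amino acids.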
Like DNA, RNA is composed of nucleotide bases. RNA, however, contains the nucleotides adenine, guanine, cytosine and uracil (U) [5]. When RNA polymerase transcribes the DNA, guanine pairs with cytosine and adenine pairs with uracil. RNA polymerase moves along the DNA until it reaches a terminator sequence. At that point, RNA polymerase releases the mRNA polymer and detaches from the DNA [5, 8]. This stage involves the following steps:
• DNA unwinds.
• RNA polymerase recognizes a specific base sequence in the DNA called a promoter and binds to it. The promoter identifies the start of a gene, which strand is to be copied, and the direction in which it is to be copied.
• Complementary bases are assembled (U instead of T).
• A termination code in the DNA indicates where transcription will stop.
The mRNA produced is called an mRNA transcript.

2.4 Translation

The next step is to produce a chain of amino acids based on the sequence of nucleotides in the mRNA. The nucleotide sequence of an mRNA molecule is read from one end of the mRNA to the other, in groups of three successive bases, the codons introduced earlier. In the cytoplasm, mRNA combines with one or more ribosomes. Ribosomes act as catalysts to assemble individual amino acids into polypeptide chains. Ribosomes contain a small and a large subunit. Each subunit contains rRNA of varying length and a set of proteins. One portion of the mRNA molecule attaches to the smaller subunit and a tRNA with its amino acid attaches to the other subunit; thus the codon of the mRNA attracts a complementary anticodon on the tRNA. This codon-anticodon matching brings a specified amino acid into position [5]. After pairing with the mRNA, the tRNA with its amino acid is held in a vice-like grip on the ribosome's larger subunit. The ribosome then moves on to a new location along the mRNA to repeat the same process. A second tRNA now approaches the ribosome and pairs its anticodon with the second codon of the mRNA.
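The transcription steps and the codon-by-codon reading described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: strand direction, promoters and terminator sequences are ignored, and the function names transcribe and read_codons are hypothetical rather than taken from the paper.

```python
# Pairing used during transcription: each RNA base is complementary to the
# template-strand base, with uracil (U) taking the place of thymine (T).
TEMPLATE_TO_MRNA = {"A": "U", "T": "A", "C": "G", "G": "C"}

# Stop codons in the RNA alphabet (TAA, TGA, TAG with T replaced by U).
STOP_CODONS = {"UAA", "UGA", "UAG"}


def transcribe(template_strand):
    """Build the mRNA complementary to a DNA template strand."""
    return "".join(TEMPLATE_TO_MRNA[base] for base in template_strand)


def read_codons(mrna):
    """Read the mRNA in groups of three bases until a stop codon appears."""
    codons = []
    for start in range(0, len(mrna) - 2, 3):
        codon = mrna[start:start + 3]
        if codon in STOP_CODONS:
            break  # chain terminator: nothing after it is read
        codons.append(codon)
    return codons


mrna = transcribe("TACGGCATT")
print(mrna)               # AUGCCGUAA
print(read_codons(mrna))  # ['AUG', 'CCG']
```

In the example, the first codon read is AUG, the codon for methionine mentioned in Section 2.2, and reading stops at UAA.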
Thus two tRNA molecules and their amino acids stand next to one another on the mRNA. In a fraction of a second these two amino acids are joined together by a special enzyme to form a dipeptide. The first tRNA is then freed and moves back to the cytoplasm, leaving its amino acid attached to the second amino acid. The ribosome then proceeds along the mRNA, repeating the same process until it reaches the final one or two codons of the mRNA, which are chain terminators or stop signals. The peptide bond is formed by removal of water between amino acids. The polypeptide is then released from the ribosome and coils to yield the functional protein [5].

3. Data Encryption

From the protein synthesis process, three concepts are borrowed for the proposed encryption algorithm. Amino acids correspond to our basic units of data; combinations of these basic units produce the codes, which correspond to codons in protein synthesis; and codon tables serve the purpose of coding tables (refer to Figure 2). These concepts are expanded below.

3.1 Data Unit

Digital computers operate on zeroes and ones, meaning that all the data computers process is ultimately a large amount of information in the form of binary digits. However, for constructing coding tables, single bits are not appropriate data units because they do not carry enough semantic weight. Instead, an alternative is to consider an array of 8 bits, i.e. a byte. The advantage of this option is that we are dealing with 256 different states rather than the 2 states of the former scheme. By encrypting these bytes, the entire data will be encoded.

3.2. Coding Data Units and Table of Codes

With regard to Figure 2, the combination of four elements (i.e. U, C, A, G) in 3 positions provides 4^3 = 64 states, from which the twenty amino acids are produced, with more than one codon used for some units (amino acids).
To enhance the coding efficiency, an appropriate number of elements should be combined in an appropriate number of positions to cater for n states. Two methods are suggested, differing in their output. In the first method, every two bits are treated as one element. Because 2 bits account for 4 states (i.e. 00, 01, 10, 11), and there are four 2-bit elements in one byte, this gives 4^4 = 256 states. Figure 3 illustrates this coding scheme. In the second method, as in the previous scheme, there are 4 elements in one byte; the difference, however, is that each of 00, 01, 10, 11 is assigned to an alphabetic letter. Simply stated, the outcome is a combination of 4 letters. Again 4^4 = 256 (we will refer to this mechanism throughout the paper as the second method). Figure 4 illustrates this coding scheme.
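A minimal Python sketch of the two methods is given below. It assumes the elements are taken from the most significant bits first, and it assumes the letters A, C, G, U for the second method; the text does not state either choice, so both are illustrative only, as are the function names.

```python
# Assumed letter assignment for the second method (not given in the text).
LETTERS = {0b00: "A", 0b01: "C", 0b10: "G", 0b11: "U"}
LETTER_TO_BITS = {letter: bits for bits, letter in LETTERS.items()}


def byte_to_elements(value):
    """First method: split one byte into four 2-bit elements (high bits first)."""
    if not 0 <= value <= 255:
        raise ValueError("value must fit in one byte")
    return [(value >> shift) & 0b11 for shift in (6, 4, 2, 0)]


def byte_to_letters(value):
    """Second method: write each 2-bit element as its assigned letter."""
    return "".join(LETTERS[element] for element in byte_to_elements(value))


def letters_to_byte(code):
    """Inverse of byte_to_letters: rebuild the byte from four letters."""
    value = 0
    for letter in code:
        value = (value << 2) | LETTER_TO_BITS[letter]
    return value


print(byte_to_elements(0x4D))    # [1, 0, 3, 1]
print(byte_to_letters(0x4D))     # CAUC
print(letters_to_byte("CAUC"))   # 77
print(len({byte_to_letters(b) for b in range(256)}))  # 256
```

The final line confirms the 4^4 = 256 count: every byte value maps to a distinct four-letter code, so the mapping can be reversed without loss.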