11.07.2015 Views

Encyclopedia of Computer Science and Technology


characters and strings

As computer use became more widespread internationally, even 256 characters proved to be inadequate. A new standard called Unicode can accommodate all of the world’s alphabetic languages, including Arabic, Hebrew, and Japanese (Kana). Unicode schemes can also be used to encode ideographic languages (such as Chinese) and languages such as Korean that use syllabic components. At present each ideograph has its own character code, but Unicode 3.0 includes a scheme for describing ideographs through their component parts (radicals). Most modern operating systems use Unicode exclusively for character representation. However, support in software such as Web browsers is far from complete, though steadily improving. Unicode also includes many sets of internationally used symbols such as those used in mathematics and science. In order to accommodate this wealth of characters, Unicode uses 16 bits to store each character, allowing for 65,536 different characters at the expense of requiring twice the memory storage.

Programming with Strings

Before considering how characters are actually manipulated in the computer, it is important to realize that what a binary value such as 1000001 (decimal 65) stored in a byte of memory actually represents depends on the context given to it by the program accessing that location. If the program declares an integer variable, then the data is numeric. If the program declares a character (char) value, then the data will be interpreted as an uppercase “A” (in the ASCII system).

Most character data used by programs actually represents words, sentences, or longer pieces of text. Multiple characters are represented as a string.
For example, in traditional BASIC the statement:

    NAME$ = "Homer Simpson"

declares a string variable called NAME$ (the $ is a suffix indicating a string) and sets its value to the character string “Homer Simpson.” (The quotation marks are not actually stored with the characters.)

Some languages (such as BASIC) store a string in memory by first storing the number of characters in the string, followed by the characters, with one in each byte of memory. In the family of languages that includes C, however, there is no string type as such. Instead, a string is stored as an array of char. Thus, in C the preceding example might look like this:

    char Name[20] = "Homer Simpson";

This declares Name as an array with room for 20 characters, and initializes it to the string literal “Homer Simpson.”

An alternative form is:

    char *Name = "Homer Simpson";

Here Name is a pointer that holds the memory location where the data begins. The string of characters “Homer Simpson” is stored starting at that location. The two forms are equivalent for reading the string, but only the array form gives the program its own copy of the characters that it can safely modify.

Unlike the case with BASIC, in the C languages the number of characters is not stored at the beginning of the data. Rather, a special “null” character is stored to mark the end of the string.

Programs can test strings for equality or even for greater than or less than. However, programmers must be careful to understand the collating sequence, or the order given to characters in a character set such as ASCII. For example, the test

    If State = "CA"

will fail if the current value of State is “ca.” The lowercase characters have different numeric values than their uppercase counterparts (and indeed must, if the two are to be distinguished).
Similarly, the expression:

    "Zebra" < "aardvark"

is true because uppercase “Z” comes before lowercase “a” in the collating sequence.

Programming languages differ considerably in their facilities for manipulating strings. BASIC includes built-in functions for determining the length of a string (LEN) and for extracting portions of a string (substrings). For example, given the string Test consisting of the text “Test Data,” the expression Right$ (Test, 4) would return “Data.”

Following their generally minimalist philosophy, the C and C++ languages contain no built-in string facilities. Rather, these are provided as part of the standard library, which can be included in programs as needed. Consider the following little program:

    #include <iostream>
    #include <cstring>

    int main ()
    {
        char String1[20];
        char String2[20];
        strcpy (String1, "Homer");
        strcpy (String2, "Simpson");
        // Concatenate String2 to the end of String1
        strcat (String1, String2);
        std::cout << String1;
        return 0;
    }
