Encyclopedia of Computer Science and Technology

Speech recognition begins with digitizing the speech sounds and converting them into a standard, compact representation. The analysis can be based on matching the input sounds to one of about 200 “spectral equivalence classes” from which the representation can be created. Alternatively, algorithms can use data based on modeling how the human vocal tract produces speech sounds, and extract key features that then become the speech representation. Neural networks can also be “trained” to recognize speech features (see neural network). The latter two approaches are potentially more flexible but also considerably more difficult, and tend to be used in research rather than in commercial voice recognition systems.

Whichever form of representation is used, it must then be matched to the characteristics of particular words or phonemes, usually with the aid of sophisticated statistical and time-fitting techniques. The simplest systems work on a word level, which may suffice if the system is restricted to a simple vocabulary and the user speaks slowly and distinctly enough. Such systems usually require that the user “train” the system by speaking selected words and phrases. The user can then control the system with a set of voice commands.

Creating a system that can handle the full range of language is much more difficult. This kind of system breaks the language down into phonemes, its basic sound constituents (English has about 40 phonemes). The system includes a stored dictionary of phoneme sequences and the corresponding words.
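
As a rough illustration of the stored dictionary just described, the following Python sketch maps phoneme sequences to words and decodes a stream of phonemes by greedy longest match. The dictionary entries, the ARPAbet-style phoneme symbols, and the names `PHONEME_DICTIONARY` and `decode` are all assumptions made for this example; they are not taken from any particular recognition system, and real recognizers rely on statistical matching rather than exact lookup.

```python
# Hypothetical mini-dictionary: phoneme sequences (ARPAbet-style symbols,
# chosen for illustration only) mapped to the words they spell out.
PHONEME_DICTIONARY = {
    ("OW", "P", "AH", "N"): "open",
    ("F", "AY", "L"): "file",
    ("S", "EY", "V"): "save",
    ("K", "L", "OW", "Z"): "close",
}

# Longest phoneme sequence stored in the dictionary.
MAX_LEN = max(len(seq) for seq in PHONEME_DICTIONARY)


def decode(phonemes):
    """Turn a list of phonemes into words by greedy longest-match lookup."""
    words = []
    i = 0
    while i < len(phonemes):
        # Try the longest candidate first, then progressively shorter ones.
        for length in range(min(MAX_LEN, len(phonemes) - i), 0, -1):
            candidate = tuple(phonemes[i:i + length])
            if candidate in PHONEME_DICTIONARY:
                words.append(PHONEME_DICTIONARY[candidate])
                i += length
                break
        else:
            # No dictionary entry starts here: mark it and skip one phoneme.
            words.append("<unknown>")
            i += 1
    return words


print(decode(["OW", "P", "AH", "N", "F", "AY", "L"]))
```

Called on the phonemes for "open" followed by "file", the sketch returns the two words in order; a phoneme that fits no entry is reported as `<unknown>`.
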
However, “understanding” which words are being spoken is more than a matter of matching phoneme sequences to a dictionary. For one thing, the sound of the first or last phoneme in a word can change depending on the phoneme in an adjacent word.

Once the speech has been recognized, it can be converted to character data (see characters and strings) and treated as though the text had been entered from the keyboard. This means, for example, that a user could dictate text to be placed in a word processor document as well as using voice commands to perform tasks such as formatting text. (Special words can be used to introduce and end commands.)

Voice control and dictation have been offered commercially by such companies as Dragon Systems and Kurzweil. Microsoft now includes speech recognition and synthesis facilities in the latest version of its popular office suite, Office 2007.

Voice Synthesis

The other part of the speech equation is the ability to have the computer turn character codes into spoken words. The most primitive approach is to digitally record appropriate spoken words or phrases, which can then be replayed when speech is desired. Naturally, what is spoken is limited to what is available in the recorded library, although the words and phrases can be combined in various ways. Since the combinations lack the natural transitions that speakers use, the result sounds “mechanical.” Common applications include automated announcements in train stations or in prompts for voicemail systems.

To produce a synthesizer that can “speak” any natural language text, the system must have a dictionary that gives the phonemes found in each word.
The 40 or so different phonemes can then be digitally recorded and the system would then identify the phonemes in each word and play them to create speech. While this solves the limited vocabulary problem, the synthesized speech is rather unnatural and hard to understand. This is because, as noted earlier, the way phonemes are sounded changes under the influence of adjacent phonemes, and these nuances are lacking in a simple phoneme playback.

More sophisticated voice synthesis systems record natural speech and identify all the possible combinations of half of a phoneme and half of an adjacent phoneme. That way the possible transition sounds are also recorded, and the resulting speech sounds considerably more natural. The drawback is that more memory and processing power are required, but these commodities are becoming increasingly cheap.

Speech recognition and synthesis technology has made only slow inroads into the computing mainstream, such as office applications. Given the costs of hardware, software, and training, the keyboard remains more productive and cost-effective for most applications. However, voice technology does have a growing number of specialty uses, including security and access systems, speech synthesis for disabled persons who cannot see or speak, and enabling service robots to interact with people in the environment. Speech technology has also been a long-standing topic in artificial intelligence and robotics research.

spreadsheet

With the possible exception of word processing, no personal computer application caught the imagination of the business world as quickly as did the spreadsheet, which first appeared as Daniel Bricklin’s VisiCalc in 1979. VisiCalc quickly became the “killer app,” the application that could justify corporate purchases of Apple II computers. When the IBM PC began to dominate the office computing industry in the mid-1980s, it had a new spreadsheet, Lotus 1-2-3. By the end of the decade, however, Microsoft’s Excel
