Voice Recognition and Text-To-Speech

Text to Speech Explained

Text-To-Speech is the artificial synthesis of speech by an electronic speech engine.

There are, however, two methods of producing text-to-speech: true text-to-speech and captioned text-to-speech.

True text-to-speech requires a lot of processing power and can only be accomplished using a desktop PC or a speech engine specifically designed for the Pocket PC platform. Currently (up to Palm OS 4.1), Palm-based PDAs do not have the necessary processing power to accomplish this.

Captioned text-to-speech only requires that the PDA can play audio files. It can be used within a program to provide audio feedback for specific commands or information. For example, the speech for numbers and time phrases could be captioned so that a device can speak the time by playing the captioned phrases in a specific order. This method is employed by some symbolic communication packages that speak when icons are selected.
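The speaking-clock example can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular package works: the clip file names (such as "the_time_is.wav") and the function names are assumptions made for the example, and playing the files is left to whatever audio player the device provides.

```python
# A minimal sketch of captioned text-to-speech for speaking the time.
# Assumption: one pre-recorded clip exists per word, named "<word>.wav".

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
    "fifteen", "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty"}


def number_clips(n):
    """Return the clip names needed to speak an integer from 0 to 59."""
    if not 0 <= n <= 59:
        raise ValueError("only 0-59 is covered by the recorded clips")
    if n < 20:
        return [ONES[n]]
    tens, ones = divmod(n, 10)
    clips = [TENS[tens]]
    if ones:
        clips.append(ONES[ones])
    return clips


def time_clips(hour, minute):
    """Return the ordered list of clip files that speak a 24-hour time."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    suffix = "am" if hour < 12 else "pm"
    hour12 = hour % 12 or 12          # 0 and 12 are both spoken as "twelve"
    clips = ["the_time_is"] + number_clips(hour12)
    if minute == 0:
        clips.append("oclock")
    else:
        if minute < 10:
            clips.append("oh")        # e.g. "two oh five"
        clips += number_clips(minute)
    clips.append(suffix)
    return [name + ".wav" for name in clips]


if __name__ == "__main__":
    # The device would play these files one after another.
    print(time_clips(14, 5))
```

Because every phrase is a fixed recording, the device only needs to play files in the right order; no speech synthesis takes place on the PDA itself.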

Captioned text-to-speech can also be used to allow users to produce their own audio books. A desktop computer can be used to produce an audio file for any text, which can then be transferred to a PDA for playback.

All Pocket PC PDAs can play Wave (.wav) files, and Windows Media Player can also play MP3 files.

Among Palm-based PDAs, only the Sony Clie models have the native ability to play MP3 and Wave files.

Palm PDAs can play Wave files once these have been converted to a database file, but their poor-quality speaker makes playback quite poor. However, no functional software program has yet been located that lets the user play any Wave file through a user-friendly interface.

Why would a user want Text-To-Speech?

Users with a specific learning difficulty may find audio feedback useful because they may have slow reading speeds and/or poor reading comprehension skills. Most users might also find text-to-speech a good way to review their own writing and highlight grammatical errors.

Users who have a visual impairment, or whose study environment makes visual concentration difficult, might prefer text-to-speech as a more effective reading method.

Voice Command Not Speech Recognition!

Pocket PC-based PDAs have built-in microphones that allow the user to run voice command recognition software. Currently, PDAs do not have the processing power required for speech recognition, in which any text can be entered into a computer by dictation (the speech is recognised and the text is entered into the document).

Pocket PC PDAs can, however, be used with voice command recognition. The user speaks either a simple command drawn from a limited recognition vocabulary, or a string of commands (a command with recognised variables, e.g. read > next item).
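The difference between a simple command and a command with recognised variables can be illustrated with a short Python sketch. The vocabulary below and the function name parse_command are invented for the example and are not taken from any real product; the sketch also assumes the recognition engine has already turned the audio into text.

```python
# A minimal sketch of matching recognised speech against a limited
# command vocabulary. All command words here are hypothetical.

# Each command maps to the set of variables it accepts.
# An empty set means the command is a simple, single-word command.
COMMANDS = {
    "calendar": set(),
    "contacts": set(),
    "read": {"next item", "previous item"},
}


def parse_command(recognised_text):
    """Return (command, variable) if the text is in the vocabulary, else None."""
    words = recognised_text.lower().split()
    if not words:
        return None
    command, variable = words[0], " ".join(words[1:])
    if command not in COMMANDS:
        return None                   # outside the limited vocabulary
    allowed = COMMANDS[command]
    if not allowed:
        # Simple command: accept it only when nothing follows it.
        return (command, None) if not variable else None
    # Command with a variable: the rest must be a recognised variable.
    return (command, variable) if variable in allowed else None


if __name__ == "__main__":
    print(parse_command("Read next item"))
    print(parse_command("calendar"))
    print(parse_command("write an essay"))
```

Anything outside the small vocabulary is rejected, which is what separates voice command recognition from free dictation.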

Voice recognition programs are either "speaker dependent" or "speaker independent." The speaker-dependent programs adjust to the way you speak, which requires a training process. Speaker-dependent programs tend to be smaller, faster, and more accurate than speaker-independent programs. The speaker-independent programs attempt to recognise anyone's speech, without having been trained beforehand.

Most voice recognition software allows the user to retrieve and display appointments, tasks and events, look up contacts, and perform basic navigation commands such as "read next".

Voice recognition is limited on current Palm-based PDAs.

The Pocket PC operating system does not generally come with voice recognition. However, some models, such as the HP iPAQs, come with a speech recognition engine (IBM ViaVoice comes with the iPAQs).

Why would a User want Voice Recognition?

Voice recognition can be useful in controlling the basic PIM functions of a PDA, "hands-free". This might be useful to a user with a manual dexterity problem or a mobility problem.

Pocket PC Software

There are various text-to-speech engines and voice recognition engines available to developers, but only some have been built into user-friendly applications.

The following text-to-speech and voice recognition applications are available for Pocket PC operating systems. However, because evaluating them is time-consuming, only the manufacturer links are listed.

Fonix iSpeak for Pocket PC

This application provides true text-to-speech. The user can have the software read email text and text files aloud. The speech engine includes two good-quality, human-like voices that include inflections, intonations, and pauses. There is also a choice of nine other voices, which take much less space but sound much more synthesised. The application is operated from a simple-to-use interface that lists all the text files, MP3 files and email headers. The only limiting factor is that while a text file or email is playing, its content cannot be viewed within the application at the same time. The software also provides text-to-speech within Word for reviewing words and sentences as they are typed, and includes a useful clipboard reader that reads aloud the contents of the clipboard.

Fonix VoiceAlert

This application is a combination of basic text-to-speech and voice recognition. It includes a digital clock that announces the current time in quarter, half, or full-hour increments. VoiceAlert automatically retrieves the information entered into the calendar, tasks, and "to do" lists on the Pocket PC, and speaks the details of each appointment, event or task aloud as its time approaches. It includes Fonix Voice Central Lite, which can be trained to recognise simple commands such as "Pocket Word" or "Calendar", which take you directly to the desired application or task without having to click through several menus with the stylus.

Symbolic Communication software

Pocket WinSpeak

This is a symbolic communication package that is very similar to the desktop version of WinSpeak. Symbols are selected and the corresponding text or phrases are spoken aloud. It uses symbols and letters to build sentences of any length with captioned DECtalk speech or recorded messages. The cell icons can use digital photos and scanned images as well as various symbols. Up to 1,000 grids may be stored on a memory card. The system works with prestored vocabularies such as Ingfield Dynamic Vocabularies (Levels A and B).