Have you talked to your PC lately?

With the latest speech recognition packages, a computer can develop an acute ear for an individual's voice and deliver impressive results - and all for less than £100.

David Fo
Monday 12 May 1997 23:02 BST

I've been talking to my computer a lot lately. It's a good listener - now. It wasn't initially. It used to disagree with me all the time. Although speech-recognition software is finally of real use to people, from lawyers and doctors to the disabled, users still need to invest time in training their software, even if the investment in time (and money) has been reduced considerably.

Indeed, there are now three surprisingly good speech-recognition systems available for under £100, from IBM, Dragon Systems, and Kurzweil. Previously such packages cost £500-£1,000.

Generally, dictation is carried out slowly, with a pause between each word. The initial temptation is to speak very ... slowly ... indeed, but once you trust the computer to understand your words most of the time, you find you can speak reasonably fast, with only very short pauses between the words. It is certainly a lot faster than most of us can type.

According to an IBM survey, only 15 per cent of PC users can touch-type. Even for a reasonable typist anything more than 30 words per minute is good, which is less than you might achieve from any of the speech-recognition systems straight from the box. Once user and system are used to each other, they should easily do 60 to 80wpm, climbing to 120wpm or more for regular users with a set vocabulary. The benefits for some professions who have to dictate reams of notes, such as lawyers and pathologists, are considerable, especially as there are upgraded versions of the various systems available with their specialist vocabularies.

But it takes time to get the best out of these packages. There are systems that can recognise virtually any voice without training, such as those used for automated telephone queries, but they use very limited vocabularies and need powerful computers.

Training can be frustrating at first, as the system tries to learn your speech patterns and often picks the wrong word. Each time it does you must correct it, so it can learn. It should take only one correction for it to recognise a word from then on - provided you say the word the same way each time.

Some of the early results can be laughable. Occasionally it becomes annoying. But any stress (or laughter) in your voice makes it even more likely to misbehave. You have to remain calm and speak slowly and evenly. It eventually becomes relaxing, especially as you do not need to strain over a hot keyboard any more. Having had repetitive strain injury, I am used to aching wrists and shoulders, which is why having a listening PC is such a boon.

To get the best from each package requires a fast Pentium PC, preferably with lots of memory, and some time spent in setting up the microphone correctly (as that can make a big difference to recognition accuracy). Some soundcards (such as the Ensoniq on my Gateway 2000 PC) will not work with the headset microphones supplied, so check with the dealer in advance if you use anything other than a Creative Labs Soundblaster 16 soundcard.

DragonDictate Solo

Besides dictating, Solo also allows you to control applications, so you can command your PC by voice instead of using the mouse and keyboard, which makes it the best bet for hands-free use. This version will only work with one of these Microsoft applications (your choice): Word, Excel, PowerPoint, Access or Internet Explorer. But it also works with Netscape Navigator (version 3.0 plus), Adobe Acrobat Reader, Microsoft Exchange and Windows 95 accessories such as WordPad, Calculator or Solitaire. So if you need it most for Excel, you can at least dictate into WordPad and surf the Net with Navigator. Solo opens as a floating toolbar above your application, with pop-down menus as required.

It has a 120,000-word vocabulary (10,000-word active dictionary), and with very little use was delivering about 92 per cent accuracy (with most of the errors of the 2/to and 4/for variety). To reduce errors further, you can alter the settings so it changes words depending on the context, replacing "to big" with a more likely "too big", although that affects speed.

As the word history (or "Ooops Buffer") can remember up to 32 words, you do not have to keep stopping if you are in full flow. However, it can get frustrating trying to remember what commands you can say. There is a "what can I say" function which will show you, but it could be simpler, and a quick reference card with the most-used functions would help.

As with its rivals, more documentation (including an extended tutorial) would be helpful, although the help files are generally good. It also has the best on-screen tutorial, with lots of demonstrations and a cartoon dragon roaring flames. Besides improving your familiarity with the software, the tutorial lets it learn more about the way you speak, which makes your first words look less like baby-talk.

Conclusion: Best choice for disabled users, and good all-round choice for most users. The only choice at this price if you want to do more than dictate. Good recognition, average ease of use, good tutorial, average documentation. More features than you will probably ever use.

DragonDictate Solo, £79.99, on CD-Rom. Endeavour Technologies (01932 827324, www.endeavour.co.uk). System requirements: Windows 3.1 or 95, a 486 DX4/66 (Pentium recommended), at least 16Mb RAM, Soundblaster 16 or compatible soundcard.

IBM VoiceType Simply Speaking

This system offered the best instant speech recognition. Straight from the box it recognised 85 per cent of words, or more, provided they were already in its 30,000-word vocabulary (you can add a further 27,000 words of your own). Once more familiar, it was getting up to 96 per cent right. Like its rivals, it should improve further with use.

Its big advantage is that you can speak without having to stop to correct words as you go along; useful if you are reading, as you do not have to keep glancing from paper to screen. Instead, it records your speech, so that when you click on a word you hear what you said. This allows you to correct everything in one batch (via the keyboard), although it won't save the audio for later playback. But if it is crucial that what you say is completely accurate, this method does make you more likely to miss an error in a long document, unless you keep glancing at the monitor.

If you do look at the screen, you will notice that it often displays a couple of words before alighting on its choice. That is because it uses a certain amount of intelligence in relating each word to those around it, to try to give a good fit. Another reason its test results were so good may be that with the smallest vocabulary, there is less likelihood of it getting words mixed up, but as most people use only a few thousand words regularly, that may not matter so much.

The very basic word processor it comes with may be adequate for some users and it can save in Word 6.0 or RTF format, but if you want to integrate it with other Windows applications you have to upgrade to the £650 VoiceType Dictation version, which also gives command and control, and allows you to work with macros and do your corrections (or have your secretary do them for you) later. It even lets you dictate on to a (digital) tape or mini-disc and plug that in to the computer, bypassing the Dictaphone typist, although you still have to speak slowly.

Conclusion: Best choice for dictators, especially if you need to input lots of text. Worst choice for disabled users or if you don't want to use the keyboard. Very good recognition, good ease of use, average (limited) documentation, limited features, limited vocabulary.

IBM VoiceType Simply Speaking, £89, on CD-Rom. IBM (01705 492 249, www.software.ibm.com/workgroup/voicetyp). System requirements: Windows 95, a 100MHz Pentium, at least 16Mb RAM, Soundblaster 16 (or 100 per cent compatible) or IBM Mwave soundcard.

Kurzweil VoicePad Pro

The simplest of the three to get running. Its "active words" window displays what you can say in any situation, which is handy when you cannot remember a command or how it likes its punctuation marks (its use of "period" instead of "full-stop" displays its US origins, although you can add your own commands).

Word correction is relatively easy, although having to move around the screen to correct earlier words is not as simple as Solo's "Ooops" command. However, when it misunderstood a "correct-that" (commands are said as one word) and then heard me say "take-2" instead of "take-4" (the commands to choose a displayed correction), I ended up deleting a word ("delete-that") every time I wanted to correct it. Unfortunately, the training module did not offer "correct-that" as a choice (just as a subject heading). But delving further brought up the Recognition Wizard, which deals simply with such problems as "confuses a pair of words", although it should tell you that you can type in the words instead of scrolling through the 200,000 words and 7,700 commands in its vocabulary. Although only 30,000 words can be active at any time, having so many to call on means that once it is used to your voice, its recognition can be very good.

However, straight from the box it was about 73 per cent accurate. That rose quickly to about 85 per cent after initial training, and was above 90 per cent in a few days. An add-on, Talk Commands, which includes a wide range of useful (UK-oriented) macros and commands, makes it easier to use. It is included in a £92 package with a high-quality microphone which helps to improve recognition, especially in noisy environments.

Conclusion: Best choice for occasional users. Good for disabled users. Nice, straightforward design. Good recognition, good ease of use, good documentation, good features.

Kurzweil VoicePad Pro, £93, on CD-Rom. Talking Technologies (0171 602 4107, www.talk-systems.com). System requirements: Windows 3.11 or 95, a 486 DX4/75 (Pentium for Windows 95), at least 16Mb RAM, Soundblaster 16 or Windows-compatible 16-bit soundcard.
