Starting a dialogue with your computer

As manufacturers develop machines that can respond to spoken commands, Cliff Joseph reports on a system which will cope even with different accents

THE TROUBLE with computers is that they just don't understand us. And, by and large, we don't understand them either.

Computers understand numbers, but people tend to deal in ideas that can rarely be expressed in precise numerical terms. So using a computer involves a major effort on the part of the user to master the obscure half-English commands the machine can understand.

Improving the 'interface' - the way that computers communicate with people - is the Holy Grail of the computer industry. Apple Computer built a billion-dollar business on the back of its easy-to-use Macintosh computers, and Microsoft's Windows software is selling in millions because it brings a Macintosh-style interface to the clumsy IBM PC.

These systems are easier to use than their predecessors, but they still force us to do most of the work. What if, instead of us learning to understand them, computers could learn to understand us and to accept commands in ordinary spoken English?

Voice recognition, as this technology is called, has been a staple of science fiction for years, but in real life it is a technology still in its infancy.

Most computer manufacturers are doing research into voice recognition, but Apple seems to be ahead at the moment and the company has recently been demonstrating a system that looks as if it has finally overcome the main stumbling blocks.

In principle, computers can be trained to understand simple commands quite easily. You could speak the word 'delete' into a microphone linked to the computer, then use special software to tell the computer that that particular sound signal is associated with its own 'delete' command.
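
The training step described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not how any particular product works: the class name `CommandTrainer`, the three-number "feature vectors" standing in for a processed sound signal, and the distance threshold are all assumptions made for the example.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class CommandTrainer:
    """Hypothetical trainer: stores one example sound per command."""

    def __init__(self):
        self.templates = {}  # command name -> stored feature vector

    def train(self, command, features):
        # Associate this particular sound signal with a command.
        self.templates[command] = list(features)

    def recognise(self, features, threshold=1.0):
        # Find the stored sound closest to the one just heard.
        if not self.templates:
            return None
        best = min(self.templates,
                   key=lambda c: distance(self.templates[c], features))
        # Reject sounds that are not close to anything we were taught.
        if distance(self.templates[best], features) <= threshold:
            return best
        return None


trainer = CommandTrainer()
trainer.train("delete", [0.9, 0.1, 0.4])   # invented numbers
trainer.train("open", [0.2, 0.8, 0.5])     # invented numbers

result = trainer.recognise([0.85, 0.15, 0.42])  # close to "delete"
unknown = trainer.recognise([5.0, 5.0, 5.0])    # close to nothing
print(result, unknown)
```

Because each command is matched against a single stored example, every new word has to be taught separately, and a speaker whose voice produces different numbers may not be recognised at all.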

But the moment you add a microphone to a computer you come up against practical problems. Few computers are designed to use microphones, so the first thing you need is a digital signal processor that can turn the sound into a digital form the machine can understand.

This extra hardware adds to the cost of the system, and training the computer to understand each new word individually is very time-consuming.

Even worse, the whole thing could grind to a halt if someone with a different accent comes along and tries to use the machine. Teaching computers to differentiate between accents requires much greater computing power, and this has the effect of further limiting the number of commands that can be understood.

But systems like this already exist, and have a wide range of applications. The Voice Navigator, a system sold in the UK by the London-based VoiceQuest, is used by many people with disabilities that prevent them from using keyboards. The Department of Employment is running a 'back to work' programme using voice-control systems for those who suffer from repetitive-strain injuries and other work-related disabilities.

Other applications include computer-aided design, where many people use the Voice Navigator for controlling repetitious commands such as drawing shapes on screen or filling shapes with colour. Banks and financial institutions are investing heavily in voice-controlled home-banking systems.

The cost and technical limitations of voice recognition mean that it is still mainly used for this type of specialist application, but as the price comes down voice control will be more widely used, and this could bring about a major change in the way we relate to computers, according to Christophe DeBuchet, of VoiceQuest. 'It's a totally different feeling,' he says. 'It's incredible. The computer is no longer just a tool, it becomes your computer. You have a totally different relationship with it.'

One recent survey claimed that 50 per cent of senior managers were still intimidated by computers, which suggests that the vast investment made in information technology by businesses is not being fully exploited.

If using a desktop computer suddenly became a simple matter of sitting in front of it and telling it what you wanted, this techno-fear could be a thing of the past. It could also be the advance needed to bring about the sci-fi dream of homes in which electrical appliances are controlled by computers.

But before this is possible there is one more hurdle to overcome. Coping with continuous speech - whole sentences rather than just single words - is immensely complicated. Instead of responding parrot-fashion to specific sounds, the computer really does have to understand the rules of grammar and spoken language. Words like 'too' and 'to' sound exactly the same and can be distinguished only if the machine understands the context in which they are used.
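
A toy Python sketch shows what using context to separate 'too' from 'to' might look like. It is a deliberately crude stand-in for real language understanding: the table `FOLLOWED_BY` and its counts are invented for the example, and the function `pick_spelling` is a hypothetical name, not part of any real system.

```python
# Invented counts of how often each spelling is followed by a given word.
FOLLOWED_BY = {
    "to": {"record": 9, "the": 8, "open": 7},
    "too": {"loud": 6, "many": 5, "late": 4},
}


def pick_spelling(candidates, next_word):
    """Choose the candidate most often seen before next_word.

    If no candidate has been seen before next_word, the first
    candidate in the list is returned.
    """
    return max(candidates,
               key=lambda w: FOLLOWED_BY.get(w, {}).get(next_word, 0))


first = pick_spelling(["to", "too"], "record")
second = pick_spelling(["to", "too"], "loud")
print(first, second)
```

Looking only at the next word is far weaker than understanding grammar, but it illustrates why the machine needs more than the sound itself to decide which word was spoken.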

This is the area where Apple has made its breakthrough. Existing systems that understand continuous speech can require up to 100 times the computing power available in a personal computer and cost tens of thousands of pounds.

Apple's system, codenamed Casper, is a piece of software that will run on an ordinary Macintosh, and as Apple already equips most of its machines with microphones Casper does not need any extra hardware.

One of the Casper demonstrations involves linking a Macintosh to a video recorder and telling it: 'Set my video to record BBC 1 from 7pm to 8pm tomorrow night.'
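
Apple has not said how Casper turns such a sentence into an instruction, so the following Python sketch is purely illustrative. It assumes the speech has already been converted to text, and the pattern `COMMAND`, the helper `to_24h` and the function `parse_recording` are hypothetical names invented for this example.

```python
import re

# Hypothetical pattern for one kind of sentence: "record X from A to B [day]".
COMMAND = re.compile(
    r"record (?P<channel>.+?) from (?P<start>\d{1,2}(?:am|pm))"
    r" to (?P<end>\d{1,2}(?:am|pm))(?: (?P<day>today|tomorrow))?",
    re.IGNORECASE,
)


def to_24h(text):
    """Convert a time such as '7pm' to an hour on the 24-hour clock."""
    hour = int(text[:-2]) % 12
    if text.lower().endswith("pm"):
        hour += 12
    return hour


def parse_recording(sentence):
    """Return the recording details, or None if the sentence does not fit."""
    match = COMMAND.search(sentence)
    if match is None:
        return None
    return {
        "channel": match.group("channel"),
        "start_hour": to_24h(match.group("start")),
        "end_hour": to_24h(match.group("end")),
        "day": (match.group("day") or "today").lower(),
    }


request = parse_recording(
    "Set my video to record BBC 1 from 7pm to 8pm tomorrow night."
)
print(request)
```

A fixed pattern like this copes with only one phrasing of the request, which is a long way from understanding whatever the user happens to say.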

At the demonstration I saw, this worked perfectly, though Casper has let its developers down on other occasions. Equally impressive is its ability to cope with a combination of continuous speech and varying accents.

One of Apple's marketing managers is a Frenchman with a bizarre Parisian-Texan accent, and Casper copes with his voice as well as with any other. At the moment Casper is too inconsistent to present as a marketable product, and other problems - such as how to filter out background noise in a busy office - still have to be solved.

But the main goal, that of producing a system that can understand continuous speech on an ordinary personal computer, is finally in sight.

The potential for this technology is vast, and goes beyond just computers. Every electrical device that has a control panel - video, washing machine, lift, cash-point - could be controlled by a chip designed to accept voice commands. Already, in France, there is a trial telephone booth that has no handset or dial. You just say what number you want.

Apple will not reveal the secrets behind Casper, or even in what form it expects voice systems to be sold. Industry analysts estimate that the system needs another two to three years' work to make it marketable, but Apple is rumoured to be launching a monitor with built-in microphone and speakers some time this year, perhaps suggesting that a basic system for accepting simple commands rather than continuous speech is in the pipeline.

Apple's ambitions to move into consumer electronics are well known and, let's face it, anything that could program your video recorder for you is guaranteed to sell like hot cakes.

(Photograph omitted)