This is fantasy, of course, but only just. In another two to five years, such a VCR will be in the shops.
The speech technology for such a VCR and many other products now coming on to the market comes from the Belgian firm Lernout and Hauspie. Founded in 1987, L&H struggled in its early years, even begging farmers for loans. Then, in September 1993, telecoms giant AT&T took a 5 per cent stake, valuing the company at $100m. Since then, L&H has not looked back. Today, it is listed on the NASDAQ exchange, has imported a hard-nosed, hi-tech business guru as president, bought up one of its largest competitors, bought up a major translation company and signed a licensing agreement with Microsoft.
"The relationship with Microsoft is a dangerous liaison," says Gaston Bastiaens, L&H president. "But without the liaison, you are not a player."
Microsoft's main interest is building speech capabilities into Windows CE - a cut-down version of the Windows 95 operating system which is used in palm-top computers. But Microsoft also hopes CE will be built into consumer devices, opening up a whole new market.
Microsoft also plans to build speech capabilities into the final version of Memphis, the operating system that will replace Windows 95 at the end of 1997 or early 1998. While Memphis may not have a lot of speech features "out of the box", Microsoft is licensing key technologies from L&H so that Microsoft and other companies will be able to integrate speech functions in applications.
Getting a PC to work well with speech is not an easy task. The general-purpose processors at the heart of PCs are not designed for speech, but as processors are getting more powerful, PCs are getting better at talking and listening.
At present, the best a PC can manage in speech recognition is slow speech with pauses between words, and that only from a user who has trained the system to his or her voice. But improvements in software and increased processing power mean that is about to change. Everyone is trying to come up with a "continuous speech" product, where you can speak more naturally, without pauses between the words. L&H believes it will be first to market when it launches its new software late this year in the US. True, you will only be able to speak at around 70 words a minute, but it's a start.
L&H works in many areas of speech technology. One underlying technology in which it has particular expertise is text-to-speech processing. First, a phrase such as "I wind a watch in the wind", in which "wind" is pronounced two different ways, has to be analysed for meaning and then broken down phonetically. The correct sounds are then retrieved from a database, and prosody - the rhythm and intonation of speech - is applied to the phrase.
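The stages described above can be illustrated with a toy sketch. It is not L&H's method: the lexicon, the phoneme symbols, the one-line part-of-speech rule and the falling pitch contour are all assumptions made up for this example, and a real system would use far richer analysis.

```python
# Toy text-to-speech front end. Every table and rule here is an
# illustrative assumption, not L&H's actual data or algorithm.

# Words with a single pronunciation (hypothetical phoneme symbols).
LEXICON = {
    "i": "AY",
    "a": "AH",
    "watch": "W AA CH",
    "in": "IH N",
    "the": "DH AH",
}

# Homographs: the pronunciation depends on the part of speech.
HOMOGRAPHS = {
    "wind": {"verb": "W AY N D", "noun": "W IH N D"},
}

DETERMINERS = {"a", "the"}


def guess_pos(words, i):
    """Crude rule: a word right after a determiner is a noun, else a verb."""
    if i > 0 and words[i - 1] in DETERMINERS:
        return "noun"
    return "verb"


def to_phonemes(text):
    """Analyse the phrase and return one phoneme string per word."""
    words = text.lower().rstrip(".?!").split()
    result = []
    for i, word in enumerate(words):
        if word in HOMOGRAPHS:
            result.append(HOMOGRAPHS[word][guess_pos(words, i)])
        else:
            result.append(LEXICON[word])
    return result


def apply_prosody(phoneme_words):
    """Attach a toy pitch factor that falls from 1.0 to 0.7 over the phrase."""
    n = len(phoneme_words)
    return [
        (phonemes, round(1.0 - 0.3 * i / max(n - 1, 1), 2))
        for i, phonemes in enumerate(phoneme_words)
    ]


if __name__ == "__main__":
    for phonemes, pitch in apply_prosody(to_phonemes("I wind a watch in the wind")):
        print(phonemes, pitch)
```

In this sketch the first "wind" follows "I" and is read as a verb, while the second follows "the" and is read as a noun, so the two get different sounds.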
L&H uses the latter stages of the text-to-speech technology to store speech very efficiently. Instead of spoken phrases, it can store the phonetic text and then re-create the speech by accessing a database of sounds. Instead of the millions of bytes needed to store recorded speech, a few hundred will do. This all saves money.
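A rough calculation shows where the saving comes from. The figures below are assumptions chosen for illustration (uncompressed 8kHz, 8-bit audio and one byte per phonetic character); the article does not say what formats L&H actually uses.

```python
# Back-of-envelope storage comparison. Sample rate, sample size and the
# one-byte-per-character phonetic encoding are assumptions for illustration.

def pcm_bytes(seconds, sample_rate_hz=8000, bytes_per_sample=1):
    """Bytes needed to store uncompressed recorded speech."""
    return int(seconds * sample_rate_hz * bytes_per_sample)


def phonetic_bytes(phoneme_string):
    """Bytes needed to store the same phrase as phonetic text."""
    return len(phoneme_string.encode("ascii"))


if __name__ == "__main__":
    phrase = "AY W AY N D AH W AA CH"  # hypothetical phonetic spelling
    print("recorded:", pcm_bytes(3), "bytes")
    print("phonetic:", phonetic_bytes(phrase), "bytes")
```

Under these assumptions a three-second recording needs 24,000 bytes, while its phonetic text needs only a few dozen. The shared database of sounds is a fixed cost paid once, however many phrases are stored.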
Its most successful product to use this technology is a Chinese pocket translator that has helped thousands of tourists. The user looks up a phrase in English and pushes a button. The system looks up the Chinese translation and the text-to-speech system reads out the Chinese phrase. Sound quality may not be fantastic, but, with the heart of the translator being a sub-$10 digital signal processing chip using a tiny amount of memory, the package can sell for less than $50.
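The translator's two steps, a phrase lookup followed by speech re-created from stored sounds, might be sketched as follows. The phrasebook entries, the syllable units and the byte strings standing in for audio are all placeholders invented for this example, not the contents of the real device.

```python
# Toy phrase translator: look up a phrase, then join stored sound units.
# All table contents are hypothetical placeholders.

# English phrase -> sequence of sound-unit names (pinyin syllables here).
PHRASEBOOK = {
    "hello": ["ni", "hao"],
    "thank you": ["xie", "xie"],
}

# Sound-unit name -> audio data (dummy bytes standing in for real samples).
SOUND_DB = {
    "ni": b"\x01\x02",
    "hao": b"\x03\x04",
    "xie": b"\x05\x06",
}


def speak(english_phrase):
    """Return the audio for a phrase by concatenating its sound units."""
    syllables = PHRASEBOOK[english_phrase.lower()]
    return b"".join(SOUND_DB[s] for s in syllables)


if __name__ == "__main__":
    print(speak("Hello"))
```

Because each phrase is stored only as a short list of unit names, adding a phrase costs a few bytes as long as its sounds are already in the database.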
Speech technology looks set to be an important part of our work and home life very soon. L&H believes that adding speech into washing machines and cookers will only add some $20-30 to the price of a product. It could be fun to talk to our video recorders, microwaves and washing machines. What if they start talking to each other, and ignore us altogether?