Stuff the search engines

The amount of information we have to deal with keeps growing at an alarming rate. How can we cope? Mike Lynch has a reasonable idea.
The Independent Online

It's not every day you meet a netrepreneur whose inspiration - and, ultimately, his billions - come from an 18th-century cleric and statistician. Mike Lynch's hugely successful software company, Autonomy, counts everyone from Microsoft to the BBC to Volvo to the US Department of Defense among the clients whose knowledge-management problems it solves singlehandedly. So successful is his company, in fact, that it's giving away Kenjin, a slimline version of its search-engine-killing software for the home PC user.

Kenjin - Japanese for "wise man" - turns your PC (though, sadly, not your Mac) into a very wise machine indeed, and obviates the need for search engines. "Let me show you why search engines are dead," says Lynch, demonstrating a fatter version of Kenjin on his PC. Kenjin allows your computer not only to "read" and "understand" prose, without distinguishing whether it's on your hard drive or on the web, but also to act on that understanding - to link the piece of prose to other sites or documents.

"It's much more accurate than a search engine because it's not using keywords," says Lynch. "Just because computers work with URLs, e-mail addresses and complex directory systems, why should we? Kenjin does away with all this, and gives users access to documents and web pages based on what they are about rather than where they are stored or what keywords they contain."

Autonomy's "Dynamic Reasoning Engine", which underlies Kenjin and Autonomy's other software products, lets companies automate the handling of "unstructured" information. What Lynch calls "doing the next bit" can mean categorising the information, like recognising an e-mail as a request for Tinky Winky to attend a child's birthday party; linking it to other information, such as by automatically writing hypertext links; or routing it according to its subject.

Lynch developed the Dynamic Reasoning Engine to cope with the new economy's dirty little secret: the information explosion. He did so with the help of the pattern-recognition theorem of one Thomas Bayes, now pushing up daisies in Tunbridge Wells. Allow Dr Lynch to explain.

"There's actually a really nasty little problem at the bottom of the whole of the new economy that no one likes to talk about," he says. "If you take a computer and you give it something in prose, it understands about the same amount as your dog does - basically absolutely nothing. At very best it can spot whether a few keywords are in it.

"As soon as you want to do anything with that type of information, a human has to get involved - ie, it's manual." In the new economy, he says, "There's been this massive shift from using information in a form that's very friendly to the computer - things like databases - to using prose, which the academic would call unstructured information - e-mail, web pages, news stories.

"Various figures say the amount of that type of information in use is doubling every three months at the moment. If you take any large company - a Lucent or a BT or a DaimlerChrysler - about 80 per cent of their valuable knowledge is in that form. You've got a fundamental problem here... you can't afford to hire twice as many people every three months. It's crunch time."

Lynch disparages the two other attempts at solving the problem of exploding information. The first is having a human read and then manually tag everything, which Lynch says is "incredibly labour-intensive", not to mention inaccurate. "If you tag general news to very big categories like 'basketball', you end up with about 800 tags," he says. Take this to the level of "college" and not just "basketball", for instance, and "you'll end up with 32,000 tags."

The other way is "to duck the whole problem and throw it into some sort of search engine". That's no good, either. "Those technologies are just using keywords - they can't understand context, which means you get a horribly large amount of irrelevant information." That matters especially, he says, for technologies like Wireless Application Protocol (WAP), where the interface is tiny. "Everyone thinks the more information you can find on something, the better. It's actually completely the opposite. You want a very small amount of very, very accurate information."

Autonomy's pattern-recognition code has some other killer applications, including personalisation for WAP phones, but going much further. "You could walk into a shop, type in a barcode and it would come back with quotes telling you automatically where's the cheapest place to buy that product," Lynch explains.

How does it work, without giving away the family recipe? Lynch explains. "The first thing is to understand the power of subtlety," he says. "Computers are very blunt instruments - they're obsessed with what's true and what's false. I'm afraid in the real world, things are usually only partly true or partly false."

Using Lynch's approach, a computer can recognise what a document is actually about. "If I tell you I've got the words 'sea', 'ice', 'fish', 'flightless', 'feather', 'detergent', the probability that this document isn't about the effects of oil pollution on penguins the birds is vanishingly small," he says; it is far less likely to be about, say, Penguin books. Taking a lot of weak information together is powerful: "Rather than using the black and the white, I'm using the shades of grey."
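
To see how several weak clues can add up to near-certainty, here is a minimal sketch of the textbook "naive Bayes" calculation. It is not Autonomy's code, which the article does not describe: the two topics, the word probabilities and the function name `posterior` are all invented for illustration.

```python
import math

# ASSUMPTION: invented values of P(word | topic), chosen only to illustrate
# the idea. None of these numbers comes from Autonomy's software.
LIKELIHOODS = {
    "penguin_birds": {
        "sea": 0.05, "ice": 0.04, "fish": 0.05,
        "flightless": 0.02, "feather": 0.03, "detergent": 0.01,
    },
    "penguin_books": {
        "sea": 0.01, "ice": 0.005, "fish": 0.005,
        "flightless": 0.0005, "feather": 0.001, "detergent": 0.0005,
    },
}

# ASSUMPTION: before seeing any words, both topics are equally likely.
PRIORS = {"penguin_birds": 0.5, "penguin_books": 0.5}


def posterior(words):
    """Return P(topic | words) by Bayes' rule, treating words as independent."""
    log_scores = {}
    for topic, prior in PRIORS.items():
        # Work in logs so that many small probabilities do not underflow.
        log_scores[topic] = math.log(prior) + sum(
            math.log(LIKELIHOODS[topic][word]) for word in words
        )
    # Subtract the largest log score before exponentiating, then normalise.
    top = max(log_scores.values())
    unnormalised = {t: math.exp(s - top) for t, s in log_scores.items()}
    total = sum(unnormalised.values())
    return {t: value / total for t, value in unnormalised.items()}


one_clue = posterior(["sea"])
six_clues = posterior(["sea", "ice", "fish", "flightless", "feather", "detergent"])
print(one_clue["penguin_birds"])   # a single weak clue: roughly 0.83
print(six_clues["penguin_birds"])  # six weak clues together: very close to 1
```

With these made-up numbers, no single word settles the question, but all six together leave almost no room for the Penguin-books reading - the "shades of grey" adding up.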

The technology also lets computers "learn" the relationships between words, such as the fact that, in some contexts, the words "laptop" and "portable" might mean the same thing. Pattern recognition does what for some 20 years linguistics has failed to do. "You can't just use linguistic ideas to solve linguistic problems," says Lynch. "The dog walked into the room; it was furry. There's no rule of language to tell you what the 'it' refers to. You just have to know that, generally speaking, dogs are more furry than rooms."

Has Lynch got plans to extrapolate from the work of any other 18th-century scientist-clerics? "We're a lot more recent," he says - he's looking to some work that Claude Shannon, "the father of information theory", did in the 1940s, "...that affects things like encryption and telecommunications". His work showed, says Lynch, "paraphrasing very greatly, that the value of a piece of information is proportional to how unexpected it is.

"If I say to you, 'In the carpark there are...' and the next word is 'cars', it's quite likely that'll be the next idea, but it's also very boring. If I say, 'in the carpark there are rioters', it's unlikely but it's also more interesting." This will mean your computer may one day be able not only to understand documents, e-mails and web pages, but "make some sort of assessment as to which is more important".
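
Lynch's paraphrase corresponds to what information theory calls self-information, or "surprisal": an event with probability p carries -log2(p) bits, so the rarer the event, the more bits it carries. The sketch below applies that standard formula to his carpark example. The two probabilities are invented for illustration, and nothing here is claimed to reflect how Autonomy would implement the idea.

```python
import math


def surprisal_bits(probability):
    """Shannon self-information, in bits, of an event with this probability."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be greater than 0 and at most 1")
    return -math.log2(probability)


# ASSUMPTION: invented probabilities for the word that follows
# "In the carpark there are..."; they are not measured from any real text.
NEXT_WORD_PROBABILITY = {"cars": 0.5, "rioters": 0.001}

bits = {
    word: surprisal_bits(p) for word, p in NEXT_WORD_PROBABILITY.items()
}
for word, value in bits.items():
    print(f"{word}: {value:.2f} bits")
```

Under these assumed figures, "cars" carries exactly one bit while "rioters" carries roughly ten, which matches the intuition that the unlikely continuation is the more interesting one. Note that the relationship is logarithmic rather than strictly proportional, which is presumably part of what Lynch means by "paraphrasing very greatly".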

One day, then, if Lynch's interest in information theory bears technological fruit, your mobile phone might not only locate potential perfect strangers for you - it might even decide which one to ask out for coffee.

 

Kenjin is available for free download at www.kenjin.com
