Fast and easy way to cut a long story short

Summarising a text is a key human skill. But can software do it? Roderick Neil Kay looks at progress
Automatic summarising has long been considered one of the most prized goals in artificial intelligence, or AI. Working summarisers have now finally appeared, although they are based on far more superficial techniques.

The ability to summarise a text is founded on core intellectual skills, so much so that summarising exercises still appear in numerous IQ and recruitment tests, including those run for the elite of the Civil Service.

Like many other products involving natural language, the new summarising technology doesn't quite live up to its billing. Summarisers have now been developed by a host of companies, including Oracle, InXight and BT, all based on similar statistical techniques. The simplest such method runs as follows. First, find the most frequently occurring words in a text, excluding trivial words. Second, locate the sentences in which groups of these words occur. Third, extract these sentences and compile them in chronological order. Then describe the compilation as a summary, without blinking.
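The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used by any of the products named. The stop list, the function names (`summarise`, `split_sentences`, `words`), the `ratio` and `top_n` parameters and the simple count-based sentence scoring are all invented here for the example, and "chronological order" is taken to mean the order in which the sentences appear in the text.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop list standing in for "trivial words".
STOP_WORDS = frozenset(
    "a an and are as at be but by for from has have in is it its of on or "
    "that the this to was were which with".split()
)


def split_sentences(text):
    """Naive splitter (an assumption): break after ., ! or ? plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(text):
    """Lower-case the text and pull out runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def summarise(text, ratio=0.2, top_n=10):
    """Return roughly `ratio` of the sentences, chosen by keyword count."""
    sentences = split_sentences(text)
    if not sentences:
        return ""

    # Step 1: find the most frequent words, excluding trivial ones.
    counts = Counter(w for w in words(text) if w not in STOP_WORDS)
    keywords = {w for w, _ in counts.most_common(top_n)}

    # Step 2: score each sentence by how many keyword occurrences it holds.
    # (A simplification: no attempt is made to measure how tightly the
    # keywords cluster within the sentence.)
    scores = [sum(1 for w in words(s) if w in keywords) for s in sentences]

    # Step 3: keep the best-scoring sentences and restore original order.
    keep = max(1, math.ceil(len(sentences) * ratio))
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    chosen = sorted(ranked[:keep])
    return " ".join(sentences[i] for i in chosen)
```

Calling `summarise(article_text, ratio=0.2)` would ask for roughly a fifth of the sentences. Nothing in the sketch understands the text; it only counts.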

One of the most surprising things about the statistical techniques at the centre of the new software is that the ideas have been around for a long time, almost before the dawn of AI, in fact. As early as 1958, while working for IBM, Luhn ran some experiments on a corpus of technical articles, using the algorithm just described. He was enthusiastic about the results: "The auto-abstract is perhaps the first example of a machine-generated equivalent of a completely intellectual task in the field of literature evaluation."

But at the time, text retrieval wasn't the hot topic it is today, and his idea was never marketed in the form of software. The view within the AI community, which has always aimed at getting computers to understand language, has been that statistical techniques are OK as far as they go, but if you consider what a human can do, that isn't very far. The emergence at this point of the new summarisers probably owes as much to bumped-up demand as it does to advancement in the field.

While the recent crop of summarisers fall reassuringly short of human performance, their wide availability should generate the kind of interest which leads to improvement. And anyway, enough of human condescension; let us allow the computer to speak - or rather to summarise - for itself. The following extract is a 20 per cent summary of this article, produced by BT's Netsumm.

"Automatic summarising has long been considered one of the most prized goals in artificial intelligence, but working summarisers have now finally appeared based on far more superficial techniques. Summarisers have now been developed by a host of companies not normally associated with text processing: Oracle, InXight and British Telecom, all based on similar statistical heuristics. The view within the AI community, which has always aimed at getting computers to understand language, has been that statistical techniques are OK as far they go, but if you consider what a human can do, it isn't very far"n
