Fast and easy way to cut a long story short
Summarising a text is a key human skill. But can software do it? Roderick Neil Kay looks at progress
Tuesday 29 July 1997
The ability to summarise a text is founded on core intellectual skills, so much so that summarising exercises still appear in numerous IQ and recruitment tests, including those run for the elite of the Civil Service.
Like many other products involving natural language, the new summarising technology doesn't quite live up to its billing. Summarisers have now been developed by a host of companies, including Oracle, InXight and BT, all based on similar statistical techniques. The simplest such method runs as follows. First, find the most frequently occurring words in a text, excluding trivial words. Second, locate the sentences in which groups of these words occur. Third, extract these sentences and compile them in chronological order. Then describe the compilation as a summary, without blinking.
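The three steps just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used by any of the products named: the stop-word list, the sentence splitter, the function names and the `ratio` and `top_words` parameters are all invented for the example, and a sentence is scored simply by how many frequent words it contains rather than by how tightly they cluster.

```python
import re
from collections import Counter

# Assumed, deliberately tiny list of "trivial" words; a real system would use a longer one.
STOP_WORDS = frozenset(
    "a an and are as at be but by for from has have in is it its of on or "
    "that the their this to was were which with".split()
)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence):
    """Lower-case words in a sentence, with the trivial words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def summarise(text, ratio=0.2, top_words=10):
    """Return roughly `ratio` of the sentences, chosen by keyword count."""
    sentences = split_sentences(text)
    if not sentences:
        return ""

    # Step 1: find the most frequently occurring non-trivial words.
    counts = Counter(w for s in sentences for w in tokenize(s))
    keywords = {w for w, _ in counts.most_common(top_words)}

    # Step 2: score each sentence by how many keyword occurrences it holds.
    scores = [sum(1 for w in tokenize(s) if w in keywords) for s in sentences]

    # Step 3: keep the best-scoring sentences and restore their original order.
    keep = max(1, round(len(sentences) * ratio))
    best = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))[:keep]
    return " ".join(sentences[i] for i in sorted(best))
```

Calling `summarise(article_text, ratio=0.2)` on a long text would return about a fifth of its sentences. Nothing in the sketch understands what those sentences say, which is precisely the limitation at issue.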
One of the most surprising things about the statistical techniques at the centre of the new software is that the ideas have been around for a long time, almost before the dawn of AI, in fact. As early as 1958, while working for IBM, Hans Peter Luhn ran some experiments on a corpus of technical articles, using the algorithm just described. He was enthusiastic about the results: "The auto-abstract is perhaps the first example of a machine-generated equivalent of a completely intellectual task in the field of literature evaluation."
But at the time, text retrieval wasn't the hot topic it is today, and his idea was never marketed in the form of software. The view within the AI community, which has always aimed at getting computers to understand language, has been that statistical techniques are OK as far as they go, but if you consider what a human can do, that isn't very far. The emergence at this point of the new summarisers probably owes as much to bumped-up demand as it does to advancement in the field.
While the recent crop of summarisers fall reassuringly short of human performance, their wide availability should generate the kind of interest which leads to improvement. And anyway, enough of human condescension; let us allow the computer to speak - or rather to summarise - for itself. The following extract is a 20 per cent summary of this article, produced by BT's Netsumm.
"Automatic summarising has long been considered one of the most prized goals in artificial intelligence, but working summarisers have now finally appeared based on far more superficial techniques. Summarisers have now been developed by a host of companies not normally associated with text processing: Oracle, InXight and British Telecom, all based on similar statistical heuristics. The view within the AI community, which has always aimed at getting computers to understand language, has been that statistical techniques are OK as far as they go, but if you consider what a human can do, it isn't very far."