Second site: Size does matter

The Independent Culture
I REALISE many of you were disappointed that I didn't supply a link on the Second Site Web pages to the search engine facility I described a few weeks ago, which allows a voyeuristic overview of the searches other people are running. As much of the function of that service is to provide easy access to pornography, supplying a link to it was out of the question. However, by way of compensation to those of you who felt shortchanged, I've posted a link to a site which answers the slow-burning question of the summer: what is that song at the beginning of The Sopranos, on the ride out to New Jersey?

A number of you who e-mailed me about the missing link intimated that you were merely curious to find out more about the workings of the Web. Search engines are indeed what makes the Web go round, and an ideal place to learn about their machinations is a site called Search Engine Watch, run by Danny Sullivan. There's plenty to watch: lists of engines and thumbnail descriptions of them, search tips for novices or power users, and the struggle over which search engine can claim the biggest index.

The size question was thrown into relief back in July, when the scientific journal Nature published an article rating the coverage offered by different engines. As Sullivan explains, a search engine has two basic elements. One is known as the "spider", or "crawler"; it skates across the Web by following the links between pages. The other is the index, which contains details of the pages the spider has visited. According to the Nature authors' estimates, there were about 800 million pages on the Web last February. The largest index belonged to an up-and-coming engine called Northern Light, which beat the better-known AltaVista service by a short head. But its 128 million pages still only represented 16 per cent of the Web, on these estimates.
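For the curious, the spider-and-index arrangement can be sketched in a few lines of Python. This is only a toy under stated assumptions, not how any real engine works: the four-page "Web", the page names and the `crawl`, `build_index` and `coverage` helpers are all invented for illustration. It does show why a page nobody links to never reaches the index, and it checks the 16 per cent figure.

```python
from collections import deque

# Hypothetical toy Web: page name -> (text, outgoing links). Invented data.
TOY_WEB = {
    "home": ("welcome to the toy web", ["news", "music"]),
    "news": ("search engine news and tips", ["home", "music"]),
    "music": ("theme song from a television show", ["news"]),
    "orphan": ("nobody links to this page", []),
}

def crawl(start):
    """Spider: follow links breadth-first from start; return pages in visit order."""
    seen = {start}
    queue = deque([start])
    visited = []
    while queue:
        page = queue.popleft()
        visited.append(page)
        _, links = TOY_WEB[page]
        for link in links:
            if link in TOY_WEB and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited

def build_index(pages):
    """Index: map each word to the set of visited pages that contain it."""
    index = {}
    for page in pages:
        text, _ = TOY_WEB[page]
        for word in text.split():
            index.setdefault(word, set()).add(page)
    return index

def coverage(indexed, total):
    """Share of the Web held in an index, as a percentage."""
    return 100 * indexed / total

pages = crawl("home")
index = build_index(pages)
print(pages)                                  # pages the spider reached
print(sorted(index.get("search", set())))     # pages containing "search"
print(coverage(len(pages), len(TOY_WEB)))     # coverage of the toy Web
print(coverage(128_000_000, 800_000_000))     # Northern Light, on Nature's estimates
```

The "orphan" page is never visited because no link leads to it, which is one simple reason an index can fall short of the whole Web.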

FAST Search, an engine from Norway, took over the top spot in August with a 200 million-page index. Then Excite announced its completion of a 250 million-page index. Whether bigger will always be better is an open question. Sullivan notes that professional researchers tend to favour engines with large indexes such as AltaVista because they want to increase the likelihood of detecting obscure items. But larger indexes may also worsen the problem of picking out the needles of relevant information from the haystack of irrelevant Web pages returned by undiscriminating engines. Perhaps the solution would be smaller indexes for general-interest engines, and a range of engines geared to specialist needs. But, as Sullivan also points out, there may not be much profit in the latter.

In a survey published this year, the techie news service CNET examined the major engines AltaVista, Excite, HotBot, Infoseek and Lycos. It picked out HotBot, and also recommended a service called SavvySearch from a clutch of "metasearchers", which summon a number of different engines to search for the user's selected keywords. But for most users, there's little practical difference between search engines. You may as well choose the one with the interface you find most friendly. Or you may prefer a directory service, such as that offered by Yahoo! Directories are based on entries compiled by humans rather than spiders; the human editors provide brief descriptions of sites and organise them under subject headings. That's why Yahoo! searches typically return a list of sites from its million-strong directory, followed by a list of Web pages, which are supplied by the Inktomi search engine.
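The metasearch idea mentioned above can be sketched in the same spirit. This is a hedged illustration, not a description of SavvySearch: `engine_a`, `engine_b` and the example addresses are hypothetical stand-ins, and the merging rule (keep the first appearance of each address, in the order the engines are asked) is just one simple choice among many.

```python
# Hypothetical stand-ins for real engines: each returns a ranked list of addresses.
def engine_a(query):
    return ["a.example/one", "shared.example/page"]

def engine_b(query):
    return ["shared.example/page", "b.example/two"]

def metasearch(query, engines):
    """Send the query to every engine and merge the results, dropping duplicates."""
    merged = []
    seen = set()
    for engine in engines:
        for url in engine(query):
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

results = metasearch("sopranos theme song", [engine_a, engine_b])
print(results)
```

A page found by both engines appears only once in the merged list.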

Yahoo! dates back to 1994, and remains the Web's favourite search source. It also has a site for the UK and Ireland, offering the option to restrict searches to the British Isles, a feature that makes trawling the Web in home waters much easier.

You can visit www.poptel.org.uk/secondsite for links to pages mentioned or contact Marek Kohn on secondsite@poptel.net
