Network: The Web untangled

Can't find what you're looking for on the Internet? Don't blame your search engine. The real culprits are sneaky webmasters and designers. Or so says David Peterschmidt, CEO of Inktomi.
The Internet is the place where you can find anything - as long as you know where to look. According to industry estimates, there are some 450 million documents on the Web today. Next year, that figure will probably be a billion.

For Net users, this richness of information has a downside: slow surfing and time-consuming, inaccurate Web searches. Anyone who has used the Net to try to find a product, or even a well-known company's home page, knows the frustration that can come with looking for a website.

In 1994, a group at the University of California at Berkeley, led by academics Eric Brewer and Paul Gauthier, started a venture with the goal of providing an alternative to supercomputers. In the mid-1990s, the US government and universities were worried about the rising cost of supercomputing. The Berkeley group started to investigate ways of linking workstations together over a network to provide supercomputer power.

To test their hypothesis, the academics needed an application. They picked an Internet search engine, ideal because it is "scaleable". As the number of documents on the Internet increased, the team could add more computers to their network to make the search system more powerful.

The search engine idea rapidly gained ground and attracted interest from website developers. In 1996, the Berkeley group set up a company, Inktomi, to market the technology. The first customer was Wired Digital, the Internet arm of the US-based technology magazine. Inktomi's technology became the power behind Hotbot, Wired's search portal. According to David Peterschmidt, who joined Inktomi three months after its launch and is now the president and CEO, the original team was happy to work behind the scenes. "They didn't want to be a portal company," he explains. "They are engineering and infrastructure people, not media people."

Recently, Inktomi has become far more recognised by Web users. Inktomi provides search facilities for more than 40 websites, and several, such as Yahoo!, now choose to display Inktomi's logo on their portals. The company develops other tools alongside search, to improve users' experience of the Web. Inktomi's caching software, Traffic Server, makes Web surfing significantly faster even across low-speed connections. The package is used by some of the leading Internet service providers, including AOL.

Net users this side of the Atlantic have long complained about the US bias of search engines. Inktomi believes it has a solution with a new "cluster" of European search servers. The cluster is hosted in the UK by BT, which will market Inktomi's search engine as part of its package of Internet services.

If Inktomi can provide better Web searches, it will find a fertile market: research carried out for the company this month by Mori revealed that half of all UK Internet users were unhappy with the performance of their preferred search engine. The most common complaint is that search engines fail to produce relevant "hits".

"If a first-time user goes to the Internet and wants to look for a company, they may not get the company website right away," Peterschmidt admits. "But more and more the algorithms and the way that we help people find information mean they should be able to find it even if they are relatively new to the Internet."

Running a search engine is a complicated and increasingly automated process. In the early days, directories such as Yahoo! were created largely by hand. Human editors hunted on the Net for interesting sites and added the ones they liked to the listings. The sheer scale of the Web means that is no longer an option. Inktomi, along with other search companies, relies on automated "Web crawlers" that search the Net for sites and list their contents. The search engine then uses this information to build a database using keywords and mathematical algorithms. This database then responds to Web users' search requests with a list of documents, in descending order of relevance.
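To make the crawl, index and rank pipeline concrete, here is a minimal sketch in Python. It is not Inktomi's method: the page texts, the example URLs, and the scoring rule (simply counting how often each query word appears) are all assumptions chosen for illustration, and real engines use far more elaborate algorithms.

```python
from collections import Counter, defaultdict

def build_index(pages):
    """Map each keyword to the pages that contain it, with occurrence counts."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for word, count in Counter(text.lower().split()).items():
            index[word][url] = count
    return index

def search(index, query):
    """Return matching URLs in descending order of a naive relevance score."""
    scores = Counter()
    for word in query.lower().split():
        for url, count in index.get(word, {}).items():
            scores[url] += count  # assumed score: total keyword occurrences
    # Sort by score (highest first), breaking ties alphabetically by URL.
    return [url for url, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]

# Hypothetical pages standing in for what a Web crawler would collect.
pages = {
    "a.example/home": "cheap flights and cheap hotels",
    "b.example/news": "flights delayed by weather",
    "c.example/food": "recipes for dinner",
}
index = build_index(pages)
results = search(index, "cheap flights")
```

Here the first page ranks above the second because it contains both query words, and the third page does not appear at all.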

A search site's effectiveness depends on the efficiency of the Web crawler and of the indexing software. Inktomi's Web crawlers are designed to visit each site at least every 30 days. Ideally, says Peterschmidt, they should go back every 10 days. This is no easy task.
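The revisit target can be pictured as a simple scheduling rule. The sketch below is an assumption for illustration only, with a hypothetical function name and made-up dates: it lists the sites whose last visit is at least a given number of days old, using the article's 30-day figure as the default.

```python
from datetime import date, timedelta

def due_for_recrawl(last_visited, today, max_age_days=30):
    """Return the sites whose last crawl is at least max_age_days old."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(url for url, seen in last_visited.items() if seen <= cutoff)

# Hypothetical crawl log: site -> date of the crawler's last visit.
last_visited = {
    "a.example": date(1999, 5, 1),
    "b.example": date(1999, 5, 20),
    "c.example": date(1999, 4, 1),
}
today = date(1999, 6, 1)
due_30 = due_for_recrawl(last_visited, today)       # 30-day target
due_10 = due_for_recrawl(last_visited, today, 10)   # the "ideal" 10-day target
```

Tightening the window from 30 days to 10 puts more sites in the queue on any given day, which is one reason the shorter cycle is harder to sustain as the Web grows.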

"There is a lot more content all the time," he explains. "The challenge is finding and indexing that, but maintaining relevancy. It has to bring you documents that are germane to what you are looking for. That requires continual research and development."

Users are becoming more experienced, and technology more sophisticated, leading Peterschmidt to believe that the Internet is becoming both more useful and easier to use.

"There are better search results than 12 months ago, even though the Web's content is four times larger," he says. "We are getting much better at finding the fine grain of information. We are also learning a lot from popularity: what are the most frequently visited sites? That helps us understand what users want, and how they want it to look and feel."

Inktomi's challenge goes beyond keeping ahead of the Web's incredible growth. Search engines are also locked in a constant struggle with website developers, who want their clients' pages to appear at the top of visitors' search results - whether or not the content merits it.

For David Peterschmidt, it is an ongoing game of hi-tech double bluff. His information scientists and software engineers develop ever more sophisticated ways to search and rank sites, and Web designers respond with tricks, ranging from pages with nothing but keywords to hidden "meta-tags" to fool the crawlers into putting their sites top of the list.
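One crude defence against pages made of "nothing but keywords" is to measure how much of a page is a single repeated word. The sketch below is purely illustrative: the function names and the 50 per cent threshold are assumptions, not a description of how Inktomi or any other search company detects the trick.

```python
from collections import Counter

def keyword_density(text):
    """Return the share of the page taken up by its most frequent word."""
    words = text.lower().split()
    if not words:
        return 0.0
    top_count = Counter(words).most_common(1)[0][1]
    return top_count / len(words)

def looks_stuffed(text, threshold=0.5):
    """Flag a page if one word makes up at least `threshold` of it (assumed cut-off)."""
    return keyword_density(text) >= threshold

stuffed = looks_stuffed("cheap cheap cheap cheap flights")
normal = looks_stuffed("flights delayed by weather")
```

A check this simple is easy for a determined designer to sidestep, which is exactly the cat-and-mouse dynamic the article describes.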

"There is a constant gaming that goes on between our technologists and the people who write the content," he says. "The people writing content ... hope that if they spray a page with a keyword it will get them there faster, even though the document itself may not be too valuable. It is an ongoing battle." It is a battle, though, that David Peterschmidt appears to relish.
