Is your site lost in cyberspace?

You can help potential visitors find their way to your site if you understand how search engines work
The Independent Online

IT IS universally agreed that finding information on the Web can be damn irritating. It's not impossible, of course. If it were, no one would use it (and we'd be out of jobs).

Most people, both novice and experienced Net users, will start their quest for information with one of the many search engines. They type in a description of what they are seeking - say, for a Marx Brothers movie clip from A Night at the Opera, they type in "Marx Brothers" - hit "search" and are whisked away to what is usually a massive list of possible Web pages that might, just might, have information that is relevant to them.

Naturally, if you have that famous movie clip of Groucho trying to stuff the entire crew of the cruise ship into his cabin, then you want your website placed towards the top, if not at the very top, of the search results list. Yet you are potentially competing against thousands of other Web pages that have the key words "Marx" and "Brothers" on them, some of which might be extolling the virtues of the proletariat rather than containing the comedic genius.

So, how do you separate Karl Marx's philosophies from Groucho Marx's shtick? Vladimir Lenin's revolution from John Lennon's "Revolution"? How do you make sure that the content you have gets found by the people who need it?

The place to start is understanding how the search engines your potential visitors use actually work.

Different flavours

Although the outcome is often the same - a list of search results - there are really two different types of search engines on the Web: crawlers and directories. These two methods differ primarily in the ways that they gather the data from which they create their index of sites, which is then searched.

Crawlers: Crawlers, such as AltaVista or Excite, use a program called a spider, which "crawls" through the Web, indexing pages along the way. Visitors can then search through the results that the spider finds. However, if a change is made to a Web page, the spider has to crawl through that page again before the change is detected. The World Wide Web is a really big place, so it might take a while for the spider to get back again.
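The crawl-and-index cycle can be sketched in a few lines of Python. This is a toy, not a real spider: the "Web" here is an in-memory dictionary standing in for the HTTP fetching a production crawler would do, and all the addresses are made up.

```python
from collections import deque

# A toy "Web": URL -> (page text, outgoing links). An illustrative
# stand-in for real pages fetched over the network.
PAGES = {
    "a.example": ("Marx Brothers film clips", ["b.example", "c.example"]),
    "b.example": ("Karl Marx and the proletariat", ["a.example"]),
    "c.example": ("A Night at the Opera review", []),
}

def crawl(start):
    """Breadth-first crawl from start, indexing each page's text once."""
    index = {}                       # URL -> page text, as of crawl time
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in index or url not in PAGES:
            continue                 # already indexed, or outside our toy Web
        text, links = PAGES[url]
        index[url] = text            # "index" the page
        queue.extend(links)          # follow every link found on the page
    return index
```

Note that the index holds whatever the spider saw on its last visit: until the spider happens to come back to a page, any edits to that page are invisible to searchers.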

Directories: Unlike the active crawlers, passive directories require website creators (or whoever wants to do it) to register a site in their index. The advantage of a directory such as Yahoo! or DMOZ is that they are far more selective as to what content is indexed, so searches tend to be more focused and produce more accurate results. However, directories are also harder to keep up to date, especially if a site has to be checked by a human being before entry. The other great advantage of a directory is that the searcher can actually bypass the search engine, and find what they are looking for by narrowing down the subject by selecting from lists of increasingly specific topics.

Hybrid Search Engines: Several search engines, for example, Yahoo!, will allow you to search indexes created both by crawlers and directories simultaneously. This allows you to deploy the advantages of both techniques at the same time.

The parts of a search engine

Whether a search engine uses a crawler or a registration directory to get its data, all search engines have at least two parts in common: the index and the search software.

The Index: All of the content that gets crawled by the spider, and/or all of the entries in the directory, get placed into the index. If the search engine uses a spider, then this massive database can contain every page that has been crawled, making it a carbon copy of the Web. If the search engine uses a directory, then only the titles, URLs, and descriptions of Web pages are included in the index.

The Search Software: When a visitor uses a search engine, they first enter one or more keywords. The search software then sifts through the index, matches the keyword(s) to Web pages, and ranks the pages in order of relevance. So, how does the search software make the crucial decision as to which pages are more relevant, and thus closer to the top of the list, than others?
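The matching step is usually built on what programmers call an inverted index: instead of storing page-by-page text, the software stores, for every word, the set of pages that word appears on, so a keyword lookup is instant. A minimal sketch, with invented page data, assuming the simplest possible query rule (a page must contain every keyword):

```python
def build_inverted_index(pages):
    """Map each word to the set of pages it appears on."""
    inverted = {}
    for url, text in pages.items():
        for word in text.lower().split():
            inverted.setdefault(word, set()).add(url)
    return inverted

def search(inverted, query):
    """Return the pages that contain every keyword in the query."""
    matches = [inverted.get(word.lower(), set()) for word in query.split()]
    return set.intersection(*matches) if matches else set()
```

Real search software layers ranking on top of this lookup, which is the subject of the next section.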

Ranking Web sites

Most search engines that use a crawler to produce the massive amounts of data used to search determine relevancy by following a set of rules that stay more or less consistent across products. If someone is using the search engine to find the words "Marx Brothers", the search engine will check to see:

Which pages have these words in the <title>.

On which pages one or both of the words appear.

How close to the top the words appear, assuming that the closer the words are to the beginning of a page, the more relevant that page is.

How frequently the words appear on the page.

How close together the words appear.

And after considering all of these criteria, the search engine produces the list of sites in order of relevancy. Well, almost.
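The five criteria above can be combined into a crude relevance score. The sketch below does exactly that; be warned that the weights are invented purely for illustration - the real engines keep theirs secret, as the next section explains.

```python
def relevance(title, text, keywords):
    """Score a page against the five criteria; all weights are invented."""
    words = text.lower().split()
    kws = [k.lower() for k in keywords]
    score = 0.0
    # 1. Keywords appearing in the <title>
    score += 10 * sum(k in title.lower() for k in kws)
    # 2. Which keywords appear on the page at all
    present = [k for k in kws if k in words]
    score += 5 * len(present)
    positions = [words.index(k) for k in present]
    if positions:
        # 3. How close to the top the first keyword appears
        score += 5 / (1 + min(positions))
        # 4. How frequently the keywords appear
        score += sum(words.count(k) for k in present)
    if len(positions) > 1:
        # 5. How close together the keywords appear
        score += 3 / (1 + abs(positions[0] - positions[1]))
    return score
```

Run against two hypothetical pages - a Groucho fan page and an essay on Das Kapital - a search for "Marx Brothers" scores the comedy page well above the philosophy page, since it wins on the title, both keywords, frequency, and proximity.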

Secret ingredients

While all of the major search engines follow this basic recipe, if they all worked in exactly the same way, then we would only need one search engine. Some crawlers index more pages than others, while many directories will use human beings to evaluate submitted websites. All search engines will put their own spin on searching to differentiate themselves from the competition.

Next week, I'll go further in depth into some of the secret ingredients that different search engines use to find your site. And then over the following weeks I'll be taking a look at how to optimise a site for searching, and some of the resources online to help you get a handle on the search engine monster.

Jason Cranford Teague is the author of 'DHTML For the World Wide Web'. If you have questions, you can find a complete archive of this column at WebbedEnvironments or send e-mails to: