Is your site lost in cyberspace?

You can help potential visitors find their way to your site if you understand how search engines work

IT IS universally agreed that finding information on the Web can be damn irritating. It's not impossible, of course. If it was, no one would use it (and we'd be out of jobs).

Most people, both novice and experienced Net users, will start their quest for information with one of the many search engines. They want to find something, say a Marx Brothers movie clip from A Night at the Opera, so they type in "Marx Brothers", hit "search" and are whisked away to what is usually a massive list of possible Web pages that might, just might, have information that is relevant to them.

Naturally, if you have that famous movie clip of Groucho trying to stuff the entire crew of the cruise ship into his cabin, then you want your website placed towards the top, if not at the very top, of the search results list. Yet you are potentially competing against thousands of other Web pages that have the key words "Marx" and "Brothers" on them, some of which might be extolling the virtues of the proletariat rather than containing the comedic genius.

So, how do you separate Karl Marx's philosophies from Groucho Marx's shtick? Vladimir Lenin's revolution from John Lennon's "Revolution"? How do you make sure that the content you have gets found by the people who need it?

The place to start is by understanding how the search engines your potential visitors will be using actually work.

Different flavours

Although the outcome is often the same - a list of search results - there are really two different types of search engines on the Web: crawlers and directories. These two methods differ primarily in the ways that they gather the data from which they create their index of sites, which is then searched.

Crawlers: Crawlers, such as AltaVista or Excite, use a program called a spider, which "crawls" through the Web, indexing pages along the way. Visitors can then search through the results that the spider finds. However, if a change is made to a Web page, the spider has to crawl through that page again before the change is detected. The World Wide Web is a really big place, so it might take a while for the spider to get back again.
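To make the crawling idea concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular search engine's spider is built: the "Web" is a small made-up dictionary called `WEB`, the `.example` addresses are invented, and the function name `crawl` is my own. The point it shows is that a change to a page stays invisible until the spider comes back.

```python
from collections import deque

# A hypothetical, in-memory stand-in for the Web: URL -> (page text, outgoing links).
WEB = {
    "home.example": ("Welcome to my film site", ["marx.example", "opera.example"]),
    "marx.example": ("Marx Brothers movie clips", ["opera.example"]),
    "opera.example": ("A Night at the Opera cabin scene", ["home.example"]),
}

def crawl(start_url, web):
    """Follow links breadth-first from start_url and return {url: text}."""
    index = {}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in index or url not in web:
            continue  # already visited, or a dead link
        text, links = web[url]
        index[url] = text      # record what the page said at crawl time
        queue.extend(links)    # schedule the pages it links to
    return index

index = crawl("home.example", WEB)

# The page changes after the spider has visited it...
WEB["marx.example"] = ("Groucho Marx stateroom clip", ["opera.example"])
stale = index["marx.example"]          # the index still holds the old text
index = crawl("home.example", WEB)     # only a fresh crawl picks up the change
fresh = index["marx.example"]
```

A real spider would fetch pages over the network and would have to decide how often to revisit each one; this sketch simply re-crawls everything.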

Directories: Unlike the active crawlers, passive directories require website creators (or whoever wants to do it) to register a site in their index. The advantage of directories such as Yahoo! or DMOZ is that they are far more selective as to what content is indexed, so searches tend to be more focused and produce more accurate results. However, directories are also harder to keep up to date, especially if a site has to be checked by a human being before entry. The other great advantage of a directory is that the searcher can actually bypass the search engine, and find what they are looking for by narrowing down the subject, selecting from lists of increasingly specific topics.
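A directory can be pictured as a tree of topics with registered sites at the leaves. The Python sketch below is a rough illustration under that assumption: the topic names, the `.example` sites, and the helper names `register` and `browse` are all invented for the example and do not describe how Yahoo! or DMOZ store their listings.

```python
# A hypothetical directory: nested topics, with registered sites at the leaves.
# Each site is stored as (title, URL, description).
DIRECTORY = {
    "Entertainment": {
        "Movies": {
            "Comedy": [
                ("Marx Brothers Clips", "marxclips.example", "Scenes from the films"),
            ],
        },
    },
    "Society": {
        "Politics": {
            "Marxism": [
                ("Karl Marx Archive", "marxarchive.example", "Writings on the proletariat"),
            ],
        },
    },
}

def register(directory, path, title, url, description):
    """Add a site under the given topic path, creating topics as needed."""
    node = directory
    for topic in path[:-1]:
        node = node.setdefault(topic, {})
    node.setdefault(path[-1], []).append((title, url, description))

def browse(directory, path):
    """Walk down the topic path and return whatever sits there."""
    node = directory
    for topic in path:
        node = node[topic]
    return node

# Someone registers a new site by hand...
register(DIRECTORY, ["Entertainment", "Movies", "Comedy"],
         "A Night at the Opera", "opera.example", "The stateroom scene")

# ...and a visitor finds it by narrowing down the topic, with no search box involved.
comedy_sites = browse(DIRECTORY, ["Entertainment", "Movies", "Comedy"])
titles = [title for title, _url, _description in comedy_sites]
```

Because the Groucho and Karl listings live under different topics, a visitor browsing the comedy branch never meets the proletariat at all.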

Hybrid Search Engines: Several search engines, for example Yahoo!, will allow you to search indexes created both by crawlers and directories simultaneously. This allows you to deploy the advantages of both techniques at the same time.

The parts of a search engine

Whether a search engine uses a crawler or a registration directory to get its data, all search engines have at least two parts in common: the index and the search software.

The Index: All of the content that gets crawled by the spider, and/or all of the entries in the directory, get placed into the index. If the search engine uses a spider, then this massive database can contain every page that has been crawled, making it a carbon copy of the Web. If the search engine uses a directory, then only the titles, URLs, and descriptions of Web pages are included in the index.
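One common way to organise such an index, which this column does not go into, is an inverted index: a table that maps each word to the pages containing it. The Python sketch below assumes that structure. The two pages in `PAGES` and the function names `tokenize`, `build_index` and `lookup` are made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical crawled pages: URL -> full page text.
PAGES = {
    "marxclips.example": "Marx Brothers clips from A Night at the Opera",
    "marxarchive.example": "Karl Marx and the proletariat",
}

def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def lookup(index, query):
    """Return the URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

index = build_index(PAGES)
both = lookup(index, "Marx")                   # matches Groucho and Karl alike
only_groucho = lookup(index, "Marx Brothers")  # the second word narrows it down
```

For a directory, the same structure could be built from titles and descriptions alone rather than from the full page text.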

The Search Software: When a visitor uses a search engine, they first enter one or more keywords. The index is then sifted through by search software which matches the keyword(s) to Web pages and ranks them in order of relevance. So, how does the search software make the crucial decision as to which pages are more relevant, and thus closer to the top of the list, than others?

Ranking Web sites

Most search engines that use a crawler to produce the massive amounts of data they search determine relevancy by following a set of rules that stays more or less consistent across products. If someone is using the search engine to find the words "Marx Brothers", the search engine will check to see:

Which pages have these words in the <title>.

Which pages one or both of the words appear on.

How close to the top the words appear, assuming that the closer the words are to the beginning of a page the more relevant that page is.

How frequently the words appear on the page.

How close the words appear together.

And after considering all of these criteria, it produces the list of sites in order of relevancy. Well, almost.
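The five checks above can be rolled into a single score per page. The Python sketch below is one hypothetical way to do it: the weights (3.0 for a title match, 2.0 for each word found, and so on) are arbitrary numbers chosen for illustration, the three sample pages are invented, and closeness is measured crudely from the first appearance of each word. No real search engine's formula is being reproduced here.

```python
import re

def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score(title, body, query):
    """Combine the five criteria into one number; the weights are arbitrary."""
    terms = tokenize(query)
    title_words = tokenize(title)
    body_words = tokenize(body)
    # Where each query word occurs in the body, counted in words from the top.
    positions = {t: [i for i, w in enumerate(body_words) if w == t] for t in terms}
    found = [t for t in terms if positions[t]]

    total = 0.0
    # 1. Query words in the title.
    total += 3.0 * sum(t in title_words for t in terms)
    # 2. How many of the query words appear on the page at all.
    total += 2.0 * len(found)
    if found:
        # 3. The earlier the first appearance, the higher the score.
        first = min(positions[t][0] for t in found)
        total += 1.0 / (1 + first)
        # 4. Frequency, as a share of all the words on the page.
        total += sum(len(positions[t]) for t in found) / len(body_words)
    # 5. Closeness, only meaningful when every query word is present.
    if len(terms) > 1 and len(found) == len(terms):
        firsts = [positions[t][0] for t in terms]
        total += 1.0 / (1 + max(firsts) - min(firsts))
    return total

def rank(pages, query):
    """Return page titles in order of relevance, dropping pages that score zero."""
    scored = [(score(title, body, query), title) for title, body in pages]
    return [title for s, title in sorted(scored, reverse=True) if s > 0]

# Hypothetical pages: (title, body text).
PAGES = [
    ("Political Writings", "Karl Marx wrote about the proletariat and his brothers in arms"),
    ("Gardening Tips", "How to grow tomatoes"),
    ("Marx Brothers Clips", "The Marx Brothers in A Night at the Opera"),
]

ranking = rank(PAGES, "Marx Brothers")
```

With these sample pages the clip site comes out ahead of the political one, mainly because the words are in its title and sit next to each other, and the gardening page drops out altogether.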

Secret ingredients

While all of the major search engines follow this basic recipe, they do not all work in exactly the same way; if they did, we would only need one search engine. Some crawlers index more pages than others, while many directories will use human beings to evaluate submitted websites. All search engines will put their own spin on searching to differentiate themselves from the competition.

Next week, I'll go further in depth into some of the secret ingredients that different search engines use to find your site. And then over the following weeks I'll be taking a look at how to optimise a site for searching, and some of the resources online to help you get a handle on the search engine monster.

Jason Cranford Teague is the author of 'DHTML For the World Wide Web'. If you have questions, you can find a complete archive of this column at WebbedEnvironments or send e-mails to: jason@webbedenvironments.com
