Network: Web Design: Finding an irritation bypass when in search of Marx

IT IS universally agreed that finding information on the Web can be damn irritating. It's not impossible, of course. If it were, no one would use it (and we'd be out of jobs).

Most people, both novice and experienced Net users, will start their quest for information with one of the many search engines. Say they are seeking a Marx Brothers movie clip from A Night at the Opera: they type in "Marx Brothers", hit "search" and are whisked away to what is usually a massive list of Web pages that might, just might, have information that is relevant to them.

Naturally, if you have that famous movie clip of Groucho trying to stuff the entire crew of the cruise ship into his cabin, then you want your website placed towards the top, if not at the very top, of the search results list. Yet you are potentially competing against thousands of other Web pages that have the key words "Marx" and "Brothers" on them, some of which might be extolling the virtues of the proletariat rather than containing any comedic genius.

So, how do you separate Karl Marx's philosophies from Groucho Marx's shtick? Vladimir Lenin's revolution from John Lennon's "Revolution"? How do you make sure that the content you have gets found by the people who need it?

The place to start is by understanding how the search engines that your potential visitors will be using work.

Different flavours

Although the outcome is often the same - a list of search results - there are really two different types of search engines on the Web: crawlers and directories. These two methods differ primarily in the ways that they gather the data from which they create their index of sites, which is then searched.

Crawlers: Crawlers, such as AltaVista or Excite, use a program called a spider, which "crawls" through the Web, indexing pages along the way. Visitors can then search through the results that the spider finds. However, if a change is made to a Web page, the spider has to crawl through that page again before the change is detected. The World Wide Web is a really big place, so it might take a while for the spider to get back again.
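To make the spider idea concrete, here is a minimal sketch in Python. It does not touch the real Web: the three pages and their addresses are invented for illustration, and real crawlers such as AltaVista's are far more elaborate. It only shows the basic loop of following links and recording what is found.

```python
from collections import deque

# A made-up, in-memory "Web": each address maps to (page text, links on that page).
# These pages and addresses are purely illustrative.
TOY_WEB = {
    "a.example/home": ("Marx Brothers film clips", ["a.example/opera", "b.example/karl"]),
    "a.example/opera": ("A Night at the Opera cabin scene", ["a.example/home"]),
    "b.example/karl": ("Karl Marx and the proletariat", []),
}


def crawl(start, web):
    """Visit every page reachable from `start`, breadth first, and return {address: text}."""
    seen = {start}
    queue = deque([start])
    pages = {}
    while queue:
        url = queue.popleft()
        if url not in web:  # dead link: nothing to record
            continue
        text, links = web[url]
        pages[url] = text  # remember the page as it looked at crawl time
        for link in links:
            if link not in seen:  # never queue the same address twice
                seen.add(link)
                queue.append(link)
    return pages


pages = crawl("a.example/home", TOY_WEB)
```

Note that `pages` is a snapshot. If one of the toy pages is edited afterwards, the snapshot still holds the old text until `crawl` is run again, which is the staleness problem described above.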

Directories: Unlike the active crawlers, passive directories require website creators (or whoever wants to do it) to register a site in their index. The advantage of a directory, such as Yahoo! or DMOZ, is that it is far more selective about what content is indexed, so searches tend to be more focused and produce more accurate results. However, directories are also harder to keep up to date, especially if a site has to be checked by a human being before entry. The other great advantage of a directory is that the searcher can actually bypass the search engine, and find what they are looking for by narrowing down the subject through lists of increasingly specific topics.
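A directory can be pictured as a tree of topics with registered sites at the leaves. The sketch below is an assumption about how one might model that in Python, not a description of how Yahoo! or DMOZ store their data; the topic names, sites and the `browse` and `register` helpers are all hypothetical.

```python
# Hypothetical directory: nested topics, with a list of registered sites at each leaf.
DIRECTORY = {
    "Entertainment": {
        "Movies": {
            "Comedy": [
                {"title": "Marx Brothers Clips", "url": "groucho.example",
                 "description": "Scenes from A Night at the Opera"},
            ],
        },
    },
    "Society": {
        "Philosophy": [
            {"title": "Karl Marx Archive", "url": "karl.example",
             "description": "Writings on the proletariat"},
        ],
    },
}


def browse(directory, path):
    """Follow a list of topic names down the tree and return whatever sits at the end."""
    node = directory
    for topic in path:
        node = node[topic]
    return node


def register(directory, path, title, url, description):
    """Add a site under the given topic path, creating missing topics on the way.

    Assumes the last topic in `path` is (or will become) a leaf holding a list of sites.
    """
    node = directory
    for topic in path[:-1]:
        node = node.setdefault(topic, {})
    node.setdefault(path[-1], []).append(
        {"title": title, "url": url, "description": description}
    )


comedy_sites = browse(DIRECTORY, ["Entertainment", "Movies", "Comedy"])
```

Drilling down from "Entertainment" to "Movies" to "Comedy" reaches the Groucho site without a single keyword being typed, and never passes anywhere near Karl.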

Hybrid Search Engines: Several search engines, for example, Yahoo!, will allow you to search indexes created both by crawlers and directories simultaneously. This allows you to deploy the advantages of both techniques at the same time.

The parts of a search engine

Whether they use a crawler or a registration directory to get their data, all search engines have at least two parts in common: the index and the search software.

The Index: All of the content that gets crawled by the spider, and/or all of the entries in the directory, gets placed into the index. If the search engine uses a spider, then this massive database can contain every page that has been crawled, making it something close to a carbon copy of the part of the Web the spider has visited. If the search engine uses a directory, then only the titles, URLs, and descriptions of Web pages are included in the index.

The Search Software: When a visitor uses a search engine, they first enter one or more keywords. The index is then sifted through by search software which matches the key word(s) to Web pages and ranks them in order of relevance. So, how does the search software make the crucial decision as to which pages are more relevant, and thus closer to the top of the list, than others?
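Before ranking comes matching. One common way to match key words to pages is an inverted index, which maps each word to the set of pages containing it. The sketch below assumes that approach and uses three invented pages; the function names are illustrative and no particular search engine is being described.

```python
import re
from collections import defaultdict


def words(text):
    """Split text into lower-case words, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map every word to the set of page addresses on which it appears."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in words(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the addresses of pages that contain every word in the query."""
    terms = words(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())  # keep only pages that also have this word
    return result


# Invented pages for illustration only.
pages = {
    "groucho.example": "The Marx Brothers in A Night at the Opera",
    "karl.example": "Karl Marx on the proletariat",
    "grimm.example": "The Brothers Grimm fairy tales",
}
index = build_index(pages)
hits = search(index, "Marx Brothers")
```

Here only the Groucho page contains both words. A real engine would usually also return pages with just one of the words and leave it to the ranking step to push them down the list.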

Ranking Web sites

Most search engines that use a crawler to gather the massive amounts of data they search determine relevancy by following a set of rules that stays more or less consistent across products. If someone is using the search engine to find the words "Marx Brothers", the search engine will check to see:

Which pages have these words in the

Which pages one or both of the words appear on.

How close to the top the words appear, assuming that the closer the words are to the beginning of a page the more relevant that page is.

How frequently the words appear on the page.

How close the words appear together.

And after considering all of these criteria, it produces the list of sites in order of relevancy. Well, almost.
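As a rough illustration, the last four checks in the list above can be turned into a score. Everything about the arithmetic here is an assumption made for the sake of example: real engines choose their own formulas and weights, and the first check in the list is left out because it is incomplete in the text. The three pages are invented.

```python
import re


def words(text):
    """Split text into lower-case words, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(text, query):
    """Score a page against a query; a higher score means more relevant (illustrative only)."""
    tokens = words(text)
    terms = set(words(query))
    positions = {t: [i for i, w in enumerate(tokens) if w == t] for t in terms}
    found = {t: p for t, p in positions.items() if p}
    if not found:
        return 0.0

    # Does the page have one, or all, of the words?
    coverage = len(found) / len(terms)
    # How close to the top does the first match appear?
    earliness = 1.0 / (1 + min(p[0] for p in found.values()))
    # How frequently do the words appear, relative to page length?
    frequency = sum(len(p) for p in found.values()) / len(tokens)
    # How close together do two different query words appear?
    proximity = 0.0
    if len(found) >= 2:
        tagged = sorted((i, t) for t, p in found.items() for i in p)
        gaps = [b[0] - a[0] for a, b in zip(tagged, tagged[1:]) if a[1] != b[1]]
        proximity = 1.0 / min(gaps)

    # Equal weights are an arbitrary choice for this sketch.
    return coverage + earliness + frequency + proximity


# Invented pages for illustration only.
pages = {
    "groucho": "Marx Brothers clips: the Marx Brothers in A Night at the Opera",
    "karl": "Writings of Karl Marx for his brothers in the proletariat",
    "grimm": "Fairy tales collected by the Brothers Grimm",
}
ranked = sorted(pages, key=lambda u: score(pages[u], "Marx Brothers"), reverse=True)
```

With these toy pages the Groucho page comes first, the Karl Marx page second and the Brothers Grimm page last, because the Groucho page has both words, early, often and side by side.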

Secret ingredients

While all of the major search engines follow this basic recipe, if all search engines worked exactly the same way, then we would only need one search engine. Some crawlers index more pages than others, while many directories will use human beings to evaluate submitted websites. All search engines will put their own spin on searching to differentiate themselves from the competition.

Next week, I'll go further in depth into some of the secret ingredients that different search engines use to find your site. And then over the following weeks I'll be taking a look at how to optimise a site for searching, and some of the resources online to help you get a handle on the search engine monster.

Jason Cranford Teague is the author of 'DHTML For the World Wide Web'. If you have questions, you can find an archive of his column at Webbed Environments (www.webbedenvironments.com) or e-mail him at jason@webbedenvironments.com
