
Part 9: Understand Your Engines

Effective searching requires understanding how best to use the features of your search services. But Internet searching is a highly competitive, dynamic area. New search engines crop up continually, others fold or are acquired, and feature sets change almost daily to keep pace.

This part is a comprehensive overview of the state of search services on the Internet as of Spring 1999, updated from the first version a year previous. When published, it was already possibly dated. The authors therefore take no responsibility for the accuracy or completeness of the information herein. Hey, we're just doing the best job we can. But we do make mistakes ....

Topic 34: Some Caveats: The Dynamic Search Business

Searching on the Internet extends from the quick question, for which a lot of information is known to exist, to serious and purposeful research on esoteric topics. Casual users simply surfing or posing the quick question likely do not need an understanding of query syntax and construction nor search engine features and operation. This tutorial is definitely geared to those who want to spend the time to get more enjoyment and results from serious searches.

As of early 1997, some 600 search services were known to exist on the Internet. Recent citations have noted as many as 1,800, and one Web site, www.beaucoup.com, includes references to more than 1,400 [15]; our own estimate is 11,000 on the low side and likely many times more than that amount [16, 38]. Major engines, including Galaxy, Magellan and WebCrawler, have recently gone out of business or been acquired by competitors. Major partnerships have been formed and some apparently separate engines, such as AOL NetFind, are branded implementations of other services (in this case, Excite). Entirely new services, such as Direct Hit, have also begun in the past year and achieved early prominence. The industry is clearly in flux.

This dynamism makes it impossible to keep absolutely current on the state of Internet search services. The information presented herein is a best-faith effort to provide an accurate snapshot of its state as of Spring 1999. Neither the authors nor BrightPlanet.com LLC makes any representations as to the accuracy or completeness of the information presented.

The authors do not intend endorsement by virtue of whether a search service is listed herein. The decision as to which engines to include as major ones comes from one of the more authoritative Web sites on search engines, www.searchenginewatch.com [39]. The engines included in that service were used to define which search services in this tutorial were classified as "major."

Additional updates of this tutorial are likely. The authors welcome identification of errors or provision of additional, useful information. These updates and corrections will be reflected in future versions.


Topic 35: Duplication, Coverage and Responsiveness

The best estimates of the number of publicly-available documents on the Internet hover around 800 million [11]. The fact that the numbers available are simply estimates and differ greatly is an indication of how little is truly known about the size of the Internet and the completeness with which search services cover it.

The same Science article by Steve Lawrence and Lee Giles of the NEC Research Institute from which the larger estimate was drawn also is the reference for much of the information on search engine duplication and coverage.

Lawrence and Giles (L&G) analyzed coverage of 575 mostly scientific or technical queries posed by researchers at their institute in December 1997. Krishna Bharat and Andrei Broder (B&B) of the Digital Systems Research Center recently conducted a similar study with nearly equivalent methodology [21]. Here are their findings for coverage of the Internet by six of the major services, all of which do full-text indexing (in other words, a directory service like Yahoo was not included in their analysis):


Search Engine      % Combined Coverage     % Coverage of Total Web
                   B & B       L & G       B & B       L & G
HotBot             48%         58%         42%         34%
AltaVista          62%         47%         50%         28%
Northern Light                 33%                     20%
Excite             20%         23%         17%         14%
Infoseek           17%         17%         15%         10%
Lycos                          4%                      3%

The combined coverage figure refers to what percentage of searches were successfully returned by that engine. Because none of the engines comprehensively covered the Internet, the percent coverage of the total Web represents the authors' estimate of gaps and overlap.

[You should always use multiple search services for your important queries.]

One of the main conclusions of both studies is that no search engine indexes more than about one-third to one-half of the publicly-available documents on the Internet. By applying these figures to the number of documents these services were known to have indexed as of late 1997, the authors came up with their estimates of 200 million to 320 million total documents on the Web. Even so, the authors believed their size estimate to be a lower bound, expecting the "true size of the Web to be much larger" than their methodology suggests [12].
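
The arithmetic behind a size estimate of this kind can be sketched in a few lines of Python. This is only an illustration of the general idea (a known index size divided by an estimated coverage fraction), not the studies' actual procedure; the function name and the sample figures are hypothetical and are not taken from either study.

```python
def estimate_total_documents(indexed_documents, coverage_fraction):
    """Estimate the total number of documents from one engine's index size
    and the fraction of the Web that engine is believed to cover."""
    if not 0 < coverage_fraction <= 1:
        raise ValueError("coverage_fraction must be greater than 0 and at most 1")
    return indexed_documents / coverage_fraction


# Hypothetical figures for illustration only: an engine that has indexed
# 50 million documents and is estimated to cover one quarter of the Web.
estimate = estimate_total_documents(50_000_000, 0.25)
print(f"Estimated total: {estimate:,.0f} documents")
```

Because the coverage fraction is itself an estimate, small errors in it shift the total substantially, which is one reason published size figures differ so widely.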

Three additional conclusions from the L&G study deserve mention. First, submitting queries to multiple search engines greatly increases the number of results obtainable. They estimated that combining queries to the six engines studied increased the likelihood of finding results by a factor of 3.5.

Second, they found surprisingly little duplication between the engines. With the largest two engines, HotBot and AltaVista, the number of duplicates was only 18% [40].

And, third, they found that "dead links," that is, pages listed on the search engines but no longer in existence, ranged from 1.6% to 5.3%. Though not universally true, engines that indexed more documents, such as HotBot, tended to have a higher incidence of dead links. This result should not be surprising, in that significant effort must be expended to maintain a larger database, and the room for error and untimeliness is greater.

Of course, size is not all that matters on the Internet. Many search engines justifiably make the argument that better and more accurate beats bigger. As a searcher, your interests should be on the quality of results. What perhaps is most disturbing, then, is that many quality results may not be indexed by the major engines in use. This possible lack of coverage is likely not a concern if the search topic is one of a broad, widespread nature. But, if looking for technical information or that which is inherently not part of the mainstream, these results are not comforting.

There is perhaps a serious methodological flaw at the heart of the Science article analysis. Recall two things: First, the subject of the analysis was technical queries; and, second, the nature of how items get listed initially by search engines.

Full-text search engines get their listings in one of two ways. Either a site developer submits one or more Web addresses asking the engine to index the site (in which case it is then scheduled for a later full-site indexing), or the 'spiders' used by the engines to find new content on the Web encounter the site and then include it. Spiders depend on links from previously visited sites to identify new ones. Information tucked away in the nooks and crannies of the Internet (in other words, some of the most specific information you may be trying to obtain) may have few if any links pointing to it. Without links, or without prior notification by the developers, spiders will find new sites only by chance.
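
The dependence of spiders on links can be illustrated with a toy crawl over an in-memory link graph. This is a minimal sketch, not how any particular engine's spider works; the site names and the `crawl` function are hypothetical, and a real spider would fetch pages over the network rather than read a dictionary.

```python
from collections import deque


def crawl(link_graph, seed_pages):
    """Breadth-first traversal: return every page reachable by following
    links outward from the seed pages."""
    discovered = set(seed_pages)
    queue = deque(seed_pages)
    while queue:
        page = queue.popleft()
        for target in link_graph.get(page, []):
            if target not in discovered:
                discovered.add(target)
                queue.append(target)
    return discovered


# Hypothetical link graph: each key links to the pages in its list.
link_graph = {
    "portal.example": ["news.example", "shop.example"],
    "news.example": ["shop.example"],
    "shop.example": [],
    # This site links out to others, but no other site links to it.
    "obscure-archive.example": ["news.example"],
}

found = crawl(link_graph, ["portal.example"])
missed = set(link_graph) - found
print("Never discovered:", sorted(missed))

# A developer submission amounts to adding the site as an extra seed.
found_with_submission = crawl(
    link_graph, ["portal.example", "obscure-archive.example"]
)
print("Discovered after submission:", len(found_with_submission), "sites")
```

In this sketch the site with no inbound links is never reached from the portal, and it enters the index only once it is supplied directly, which mirrors the two listing routes described above.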

Because businesses tend to actively seek listings on search engines, it is not at all clear that the lack of coverage implied by the Science article would apply to this sector. By focusing on technical searches, the authors could therefore have significantly overestimated the lack of coverage on the Internet. Whether coverage is better or worse for different subject areas or for different focuses on the Web is unknown at this time.

As professional information searchers have come to understand well, individual search engines can return outstanding results that are found on no other engine [41]. For this reason, and because of the inadequate coverage of any single engine, you should always submit your important queries to multiple search engines.
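
Once you have results from several engines, combining them is mostly a matter of removing duplicates. The following sketch assumes you have already collected result URLs by hand or by some other means; the engine names, URLs and function names are hypothetical, and the URL normalization is deliberately crude.

```python
def normalize_url(url):
    """Crude normalization so the same page listed two ways compares equal:
    drop the scheme, lowercase the host name, and strip a trailing slash."""
    url = url.strip()
    for prefix in ("http://", "https://"):
        if url.lower().startswith(prefix):
            url = url[len(prefix):]
            break
    host, slash, path = url.partition("/")
    return (host.lower() + slash + path).rstrip("/")


def merge_results(results_by_engine):
    """Map each distinct page to the list of engines that returned it."""
    merged = {}
    for engine, urls in results_by_engine.items():
        for url in urls:
            engines = merged.setdefault(normalize_url(url), [])
            if engine not in engines:
                engines.append(engine)
    return merged


# Hypothetical result lists from two engines for the same query.
results_by_engine = {
    "engine_a": [
        "http://www.example.com/falcons/",
        "http://birds.example.org/raptors.html",
    ],
    "engine_b": [
        "http://WWW.EXAMPLE.COM/falcons",
        "http://rare.example.net/peregrine.html",
    ],
}

merged = merge_results(results_by_engine)
duplicates = [url for url, engines in merged.items() if len(engines) > 1]
print(len(merged), "unique pages;", len(duplicates), "found by more than one engine")
```

Keeping the list of engines per page, rather than just a set of URLs, also shows at a glance which results were unique to a single engine.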


Topic 36: Boolean or Not?

For serious searching, perhaps the most important first decision facing you is your choice of search engines. Which search engines better cover the topics you are interested in? Which support the search features that will enable you to find what you want?

Not all searches are created equal. The increasing ability of some search engines to take your requests in context, and then enable you to narrow results based on your first attempt, is a promising development. Certainly being able to type in a few words and then begin receiving documents of value bodes well for common-topic searches. We ourselves use this approach when quick searches are needed.

We doubt, however, the ability of search engines in the near term to improve on this process for complicated searches or for hard-to-find information. Not only is coverage of such topics weak for a given engine, but the ability to anticipate refinements is weakened by the need to categorize information into levels insufficiently specific to the difficult query.

[Use search engines with full-text indexing and Boolean support for your most demanding queries.]

Thus, for difficult search topics, we still recommend the use of search engines with full Boolean support. Only you know what information you are seeking (even though it may be ill-defined or abstract). With full Boolean searching, you have complete control to find what you seek.

This recommendation, however, exacerbates the lack of coverage of any given search engine. By definition, hard-to-find information is not well-indexed, meaning you will likely need to use more than one search engine to get the robust results you desire.
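
To make the idea of Boolean control concrete, here is a minimal sketch of how AND, OR, NOT and parentheses correspond to set operations over a word index. The sample documents and helper names are hypothetical, and real engines add ranking, stemming and phrase handling on top of anything this simple.

```python
def build_index(documents):
    """Build a simple inverted index: word -> set of document ids."""
    index = {}
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


# Hypothetical miniature document collection.
documents = {
    1: "peregrine falcon nesting sites",
    2: "falcon sports team schedule",
    3: "bird watching guide hawk and falcon",
    4: "bird feeders for the garden",
}
index = build_index(documents)


def docs(term):
    """Return the ids of documents containing the term (empty set if none)."""
    return index.get(term, set())


and_result = docs("bird") & docs("falcon")          # bird AND falcon
or_result = docs("bird") | docs("falcon")           # bird OR falcon
not_result = docs("falcon") - docs("sports")        # falcon AND NOT sports
nested_result = (docs("bird") | docs("peregrine")) & docs("falcon")  # (bird OR peregrine) AND falcon

print("bird AND falcon:", sorted(and_result))
print("bird OR falcon:", sorted(or_result))
print("falcon AND NOT sports:", sorted(not_result))
print("(bird OR peregrine) AND falcon:", sorted(nested_result))
```

The NOT and parenthesized queries show where the control comes from: you decide exactly which documents are excluded and how terms are grouped, rather than leaving that to the engine's default matching.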

Topic 37: A Comparison of 100 Search Services

A listing comparing major features of 100 of the largest search services on the Web is shown below. For a larger listing of about 2,500 to 3,000 search services, see [16].


Search Service  URL Address  Boolean Operators  Results/Page  Multiple Pages?  Max. Listings
@BRINT — Business Research www.brint.com AND,OR,()," No
AlbanyNet www.albany.net --- 10 Yes
AltaVista UseNet www.altavista.digital.com --- 10 Yes 200
AltaVista UseNet Advanced www.altavista.digital.com AND,OR,(),NOT,NEAR,",* 30 Yes
AltaVista WEB www.altavista.digital.com --- 10 Yes 200
AltaVista WEB Advanced www.altavista.digital.com AND,OR,(),NOT,NEAR,",* 10 Yes 200
American Memory Collection Search lcweb2.loc.gov --- 20 Yes
America's Job Bank Search Index www.ajb.dni.us --- 200 No 200
AOL NetFind www.aol.com AND,OR,(),NOT,,", 40 Yes
AquaLink www.aqualink.com --- 40
ArchNet Archaeology spirit.lib.uconn.edu AND,OR,(),NOT,", 200 No 200
BizWeb www.bizweb.com # 200 No 200
c|net News www.news.com AND,OR,(),NOT,", 25 Yes 500
c|net Search.Com www.search.com --- 10 Yes
c|net Shareware.com www.shareware.com --- 100
CBS Sportsline cbs.sportsline.com --- 25 Yes
CNN Database www.cnn.com --- 10 Yes
CNNfn — the financial network www.cnnfn.com --- 100
CollegeNET collegenet.com 200 No 200
Computer Gaming World cgw.gamespot.com --- 50 Yes
DejaNews www.dejanews.com AND,OR,(),NOT,NEAR,* 50 Yes
Discovery Channel Online Search www.discovery.com/whatsonline/search.aspl --- 10 Yes
Encarta Online find.msn.com AND,OR,(),NOT,NEAR,* 50 Yes
Environmental Organization Web Directory www.webdirectory.com ---
EuroFerret www.euroferret.com/ --- 10 Yes
Excite www.excite.com AND,OR,(),NOT,,", 20 Yes
Excite News Tracker excite.com AND,OR,(),NOT,,", 10 Yes
Explorer-K-12 Math/Science unite.ukans.edu ---
Forum One — Online Discussion Forums www.forumone.com ---
Galaxy www.einet.net --- 20 Yes
HotBot hotbot.com --- 100 Yes
HotBot Advanced www.hotbot.com AND,OR,(),NOT,,", 100 Yes
IBM Infomarket-Research Reports www.infomarket.ibm.com --- 15 Yes
Inference FIND www.inference.com/infind/ AND,OR,(),NOT,,",* No
Infohiway www.infohiway.com --- 30 Yes
Infomine (Internet Enabling Tools) lib-www.ucr.edu/search/ucr_enbsearch.aspl AND,OR,(),#
Infoseek www.infoseek.com AND,OR,NOT 50 Yes 200
Internet ArtResources artresources.com/search.aspl-ssi AND,OR,(),NOT,",* No
Jayde Online Directory www.jayde.com --- 50 No 50
Lawcrawler www.lawcrawler.com AND,OR,(),NOT,NEAR,",* 10 Yes
Librarians' Index to the Internet sunsite.Berkeley.EDU --- 200 No 200
LinkMonster www.linkmonster.com AND,OR,(),NOT,",* 200 Yes
LinkStar www.linkstar.com --- 10 Yes
Liszt, the Mailing List Directory www.liszt.com AND,OR,(),NOT,"
Lycos Pro www.lycos.com AND,OR,(),NOT,NEAR,", 40 Yes
Magellan www.mckinley.com AND,OR,(),NOT,,", 10 Yes
Mamma Search Engines mamma.com --- 10 Yes
Metacrawler www.metacrawler.com --- 30 Yes
Microsoft(r) www.microsoft.com/search/default.asp AND,OR,(),NOT,NEAR,",* 10 Yes
Northern Light www.northernlight.com --- 25 Yes
OneLook Dictionaries www.onelook.com --- No
Open Text www.opentext.com --- 10 Yes None
Orientation.com — Asia www.orientation.com AND,OR,(),NOT,NEAR,",*
PC World Online www.pcworld.com --- No
PlanetSearch www.planetsearch.com --- 10 Yes
Point's Top 5% www.pointcom.com --- 10 Yes
Product Review Net www.productreviewnet.com --- No
PubMed — National Library of Medicine www.ncbi.nlm.nih.gov AND,OR,(),NOT,",*
Reference.com (Mailing List) www.reference.com AND,OR,(),NOT,NEAR,",* 200 No 200
SavvySearch www.savvysearch.com ---
Science Fiction Review Archives julmara.ce.chalmers.se --- 20 Yes
searchUK www.searchuk.com AND,OR,(),NOT,",*
Social Science Information Gateway www.sosig.ac.uk AND,OR,NOT,*
Spry Internet Wizard www.sprynet.com --- No
Surf Point www.surfpoint.com --- 30 Yes
The Sporting News www.sportingnews.com ---
The United Nations www.un.org ---
Time Magazine Online www.pathfinder.com/time --- 10 Yes
WebCrawler www.WebCrawler.com AND,OR,(),NOT,,", 25 Yes
WebCrawler News search.excite.com/wc AND,OR,(),NOT,,", 25 Yes
What's New Too! newtoo.manifest.com --- 25 Yes
Windows 95 Magazine Search www.win95mag.com --- No
World Wide Arts Resources world-arts-resources.com ',# No
WWW Virtual Law Library www.law.indiana.edu --- 20
WWW Virtual Library-US Government Information iridium.nttc.edu --- 100
WWWomen www.wwwomen.com ---
Yahoo search.yahoo.com --- 20 Yes Varies
Yahooligans www.yahooligans.com --- 25 Yes

Topic 38: Features of the Top 10 Search Services

Based on March 1999 rankings from Media Metrix [42], and including service characterizations from Search Engine Watch [39] and Search Engine Showdown [43], the chart below compares features for the major search services on the Web. Included in this listing are search engines (SE), and directories (D). For a further description of search service types, see Topic 2; for a description of the features listed, see Topic 33. Specific notes on some of the services are appended at the end of the table.

Some of the listed features are coded. These codes represent the authors' judgment as to the completeness of a feature compared to other services in the listing:

[ y ] means the feature is deemed to be as complete as others

[ y/n ] means the feature is not as complete as others offered or does not provide full functionality

A blank means that service does not offer the feature shown.

As before, we do not imply endorsement nor claim complete accuracy for the features presented. You are always advised to consult the online help topics for any given services. Features and sometimes syntax change on a periodic basis.

 
SERVICES (columns, left to right): Yahoo, Infoseek (GO), Excite, Lycos, AltaVista, Snap, HotBot, About.com, LookSmart, GoTo
GENERAL
    Ranking by User Base 1 2 3 4 5 6 7 8 9 10
    Type D SE SE SE SE D SE D D SE
    Unique Visitors/mo (Mill) 31.0 21.2 16.7 16.1 10.5 9.8 7.4 5.8 4.8 4.1
    Size (Mill pages) 1.2 45 55 50 150 110 110 0.4 1.0 110
    Pageviews/Visitor 176 22 64 23 21 9 13 --- 10 ---
STRUCTURED QUERIES
    Complete Boolean y y y y y y
    Stemming y y y y y y y/n y
    Case Sensitive y y y/n y/n y/n
    Phrases y y y y y y y y y
    AND y y y y y y y y y y
    OR y y y y y y y y y
    NOT y y y y y y y y
    NEAR y y y
    BEFORE y
    AFTER y
    ( ) y y y y y
INDEXING
    Separate Names/Titles y
    Metatag y y/n y y y y
    Title y y y y y y y y y
    Body y y y y y y y
    ALT Tags y y y
    Comments y y y y
RESULTS RANKINGS
    Relevancy 5 2 1 4 3 2 2 5 5 2
    Results Clustering y y/n y y y y
    Suggest Related Terms y y y y y
    Find Similar Pages y y y
    User Specified y
FILTERS
    Date y/n y/n y y y
    File/Media Types y/n y y y
LANGUAGE CHOICES
    Language y y y y
    Special Characters y
SPECIAL SEARCH OPTIONS
    Automatic Phrase Attempt y
    People's Names y y y
    Text y y y y y y y y
    Depth y/n y
    Anchor y
    Applet y y
    Domain y y y y
    Host y y y y
    Image y y/n y/n
    MP3 (music) y y y/n
    Link y y y y
    Title y y y y y y
    URL y y y y y
NOTES YA IS EX LY AV SN HB MC LS GT

The key for how the services determine relevance is: 1 -- 3/4 star review; 2 -- metatag keywords; 3 -- title keywords, popularity; 4 -- none; 5 -- in title, higher in category tree.

All of the directory services link to a standard search engine if their own listings do not satisfy the query. The directories and their associated engines are: Yahoo! - Inktomi; About.com - AltaVista; and LookSmart - AltaVista. There are also differences between the services that license the Inktomi engine: Snap, HotBot and GoTo. While all of these index and score pages in a similar manner, the options presented to the user can differ quite substantially, with HotBot providing the most power, GoTo the simplest interface.

Specific service notes are:

YA — people searching uses the Four11 specialty engine
IS — need to use commas to separate phrases and hyphenate words that need to appear next to one another; words within brackets are found if within 100 words of one another
EX — employs 'morphological analysis' to suggest refinement words for the keywords entered into a search
LY — can specify NEAR, BEFORE, FAR word distances. Lycos announced in April 1999 that it was going to switch its search engine service to a directory structure using Netscape's Open Directory format, an unprecedented move
AV — specialized functions for usenet searches; advanced searching turns off relevancy ranking (can hand enter); allows translation from different languages
SN — uses special search options through advanced search page with dropdown lists
HB — special search options through what HotBot calls meta words
MC — paid experts provide listings in about 600 topic areas
LS — uses AltaVista as source engine; presents results with category options for each entry; entries reviewed by editors
GT — simplest interface of all of the services.

The options shown in the table are often known by different names at the services that support them, and usually involve special syntax rules. Sometimes, too, the descriptions of how these features operate are difficult to find from the main pages of the services. Consult each service's home page directly; then try the advanced or power searching pages, the help sections or the frequently asked questions (FAQ) areas to read about the special operators and their rules.

Another useful resource, though based on relatively small sample sizes, is Greg Notess' Search Engine Showdown [43]. This site reports dead link percentages, unique hits, overlap and other statistics.

Topic 39: Specialty Engines

Specialty engines have the advantage of cataloging information particular to a narrow topic area, thus potentially increasing coverage versus the general search services. This advantage, however, often comes at the cost of not providing you with the search options and flexibility that the general services provide.

One of the most complete catalogs of Internet search engines is found at www.beaucoup.com. Its breakdown of more than 1,400 search engines by major topic area, for English-oriented services, is listed below [44]:


Category                       Count   Category                      Count
General                           76   School Listings/Student Aids     31
Multiple/Meta                     28   Educational Resources            28
Radio/TV                          18   Music/Sounds                     35
Publications                      38   Arts/Graphics                    46
Regional - Global                  5   New Sites/Reviews                18
Regional - Americas               42   Science/Nature/Technology        49
Regional - Europe                124   Business Directories             51
Regional - Asia/Aust/Africa/+     73   Email/Domains/People             16
Software - Windows                30   Computer/Programming             41
Software - Other                  15   Webmasters                       20
Reference                         64   Internet/WWW                     32
Language                          25   Social/Political/Environment     30
Literature                        25   Politics/Government/Law          64
Health and Fitness                18   Finance/Consumer                 51
Foods and Diet                    29   Malls/Classifieds                14
Medicine                          33   Large Corporations               67
Hobbies/Rec/Pets/Games            59   Potpourri                        46
Employment Listers                45
Corporate Employers               63   Total                         1,449

Be aware that some of these services catalog information that is not normally spidered or indexed by the general search services.

Not shown on the table above are searches in languages other than English. For example, major search alternatives are provided in the languages of Dutch, Spanish, German, Japanese, French, and specialty search services are offered in perhaps another 30 languages or so. For regional alternatives, Yahoo alone provides 12 different country-based search services and another 12 focused specifically on individual U.S. metropolitan areas. Similar diversification is occurring with other major search services.

These specializations are natural and reflect the huge size of the Internet (plus, obviously, the fact that English is not the only language used on the Web!). This specialization trend is likely to continue.

Other search engine directories that are comprehensive listings of other specialty engines on the Web are:

http://www.searchpower.com
http://www.123go.com/drw/search/search.htm
http://www.dreamscape.com/frankvad/search.aspl
http://www.finderseeker.com/

The last of these in fact seems to have the largest listing, though you will need to poke around some to reach all of the entries. To page individually through about 26 pages (about 2,500 engines), try this URL, and then continue paging using the 'Next Page' option at the bottom of each page:

http://www.finderseeker.com/cgi-bin/search.cgi?disp=99&sp=1&cat=&key=&country=

Depending on the topics of your searches, you are encouraged to test out and try these listing services.

Topic 40: Some Other Services to Watch

There are a number of other search services that bear watching, either because of new and unique search technology, or because of partnering or plans that may cause them to become big players:

  • AOL NetFind — this service, a branded implementation of Excite, deserves note because AOL has the largest installed user base on the Web. See http://www.aol.com/netfind/.
  • AskJeeves — the premise of this service is to accumulate questions posed by searchers to obtain specific information. AskJeeves already has a database of many millions of questions; if the question that you pose is new, they add it to their listing. If information has not already been compiled by AskJeeves staff in response to previous questions, the service provides a "smart" question query to leading search engines such as AltaVista, Yahoo!, Infoseek and WebCrawler. Though a long-time Web presence, AskJeeves has only recently gained prominence. Compaq is a major investor and Dell has a branded service called AskDudley. Recent major financing suggests this service is one to watch. See http://www.askjeeves.com.
  • BrightPlanet — BrightPlanet has uncovered a vast reservoir of Internet content that is 500 times larger than the known World Wide Web — the "deep" Web. What makes the discovery of the deep Web so significant is the quality of content found within. There are literally hundreds of billions of highly valuable documents hidden in searchable databases that cannot be retrieved by existing search engines. And BrightPlanet controls the only technology that can search it automatically. This discovery is the result of groundbreaking search technology developed over three years by BrightPlanet called the LexiBot™ — the first and only search technology capable of identifying, retrieving, qualifying, classifying and organizing "surface" and "deep" content from the World Wide Web. The LexiBot™ allows searchers to dive deep and explore hidden data from multiple sources simultaneously using directed queries. For the first time, businesses, researchers and consumers now have access to the most valuable and hard-to-find information on the Web, and can retrieve it with pinpoint accuracy. See www.CompletePlanet.com to learn more about the Deep Web and download a free trial version of the LexiBot™.
  • DirectHit — Direct Hit tracks which search result links users click and how long users stay at each site. The longer a user stays at a site, the higher it is ranked. As such, DirectHit is more of a "popularity" engine than a general search service. Management of DirectHit has been quoted as indicating they intend to add a general search component to the site. See http://www.directhit.com.
  • FAST — begun in Norway, this new service has the most ambitious plans reported to date in indexing the entire Web. Already the site has 80 million documents indexed, with claims to reach 200 million by Summer 1999 with an eventual goal for 1 billion documents. This may be a player to watch for those who desire complete Web indexing. Dell is a partner with FAST. Lycos has also partnered with FAST for MP3 music searches. See http://web.fast.no/fast.php3.
  • Google! — Originally developed at Stanford, Google has gone commercial. According to the company, it uses a complicated mathematical analysis, calculated on more than a billion hyperlinks on the web, to return high-quality search results. This analysis allows Google to estimate the quality, or importance, of every web page it returns in conjunction with standard text retrieval criteria. The key difference of Google's approach is the premise that the more influential sites that link to a result, the better it is. See http://www.google.com.
  • iAtlas — a proclaimed "filter" service, iAtlas claims to add filter capabilities for segments/topics, dates, addresses and document sizes to the basic Inktomi (same as HotBot) search capability. Testing of the filtering capabilities proved disappointing; however, the service was just announced in the past few months and may be experiencing some start-up pains. iAtlas wants to act like Inktomi in branding for specific vertical markets. See http://www.iatlas.com.
  • Metacrawler — part of the Go2Network, this metasearcher queries Lycos, Infoseek, WebCrawler, Excite, AltaVista, Thunderstone, About.com, LookSmart, and Yahoo. Results are combined with duplicates removed and presented in the standard format with description. See http://www.metacrawler.com.
  • MSN — Microsoft has most recently chosen the Inktomi engine to power its MSN search site. See http://home.microsoft.com/.
  • OneView — this unique service is based on accumulating bookmarks from individual submitters and then placing them in a comprehensive subject structure. The concept is intriguing because bookmarks often tend to be pre-screened, quality, comprehensive sites. Complete access to the site requires a free sign-in. The site is from Germany; much of its current documentation is in German. See http://www.oneview.com/.
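
The premise described for Google above, that a page is better when more influential pages link to it, can be sketched with a simplified iterative link-scoring routine in the style of the published PageRank idea. This is an illustrative sketch only and not Google's actual implementation; the function name, the damping value, the iteration count and the sample link graph are all assumptions made for the example.

```python
def link_rank(links, damping=0.85, iterations=100):
    """Score pages so that a link from a highly scored page counts for more.
    `links` maps each page to the list of pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its current score among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph for illustration.
links = {
    "hub.example": ["a.example", "b.example"],
    "a.example": ["b.example"],
    "b.example": ["hub.example"],
    "c.example": ["b.example"],
}

scores = link_rank(links)
for page, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{page}: {score:.3f}")
```

In this small graph the page with the most inbound links ends up with the highest score, and the page that nothing links to ends up with the lowest, regardless of how many links it sends out.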

Topic 41: Some Perplexing Behaviors

Search engines may not always perform as indicated on their help pages. These differences are due to constant changes in how they handle their service, strange quirks relating to their scoring and indexing methodologies, errors made by the developers of Web pages, and decisions the service may make to speed performance at high-traffic volume periods. We can illustrate some of these quirks using our standard AltaVista search source [1].

Let's take the simplest example we've used throughout this tutorial: posing the query query. Over a 24-hour period, AltaVista returned result counts ranging from 671,424 to 712,851, depending on the time of day. This indicates that, at times of high traffic, limits are placed internally on the search. In the worst cases, AltaVista or other engines may even provide a message that the server is busy, and prompt you to return at a later time to obtain results. Most often when this error occurs, a quick re-issue of the query will obtain results.

Here's another example of quirky behavior when comparing results counts from two different formats (text vs. phrase) using our basic query, and using the simple and advanced search forms on AltaVista:

Form/Query       "bird*"      bird*
Simple Form      3,935,170    3,935,170
Advanced Form    1,836,261    1,836,261

The advanced form returned only about half of the results of the simple form (this query was posed at a low-demand time). Sometimes, too, the unquoted version on the advanced search turns up no results matching the query. Clearly, both time of day and search form can affect results.

The order of terms in a seemingly equivalent pair of queries can also produce slightly different results. Let's compare these two queries:

"bird*" AND falcon
vs.

falcon AND "bird*"

The first query produces a results count of 9,581; the second a count of 9,228.

Of course, counts are not what you, the searcher, want when you search. The actual results pages for these examples were quite similar. But it's useful to realize that exactly how an engine operates may not be clear or consistent.

As Topic 21 and Topic 26 note, you must also exercise care in the use of capitalization, special characters and special terms. What you are thinking you are posing as a query may not be evaluated as such by specific engines.

This is not meant to be a criticism of the search engines, or of AltaVista in particular. Many anomalies occur because of improperly formatted Web pages. And, after all, engines are indexing millions of pages in very short periods of time and need to provide snappy response in all instances. The fact they do accurately index very high percentages is remarkable. But, you, as a searcher, should be aware results are not foolproof.


Copyright © 2001. BrightPlanet.com LLC. All rights reserved.