Part 9: Understand Your Engines
Effective searching requires understanding how best to use the features of your search services. But Internet searching is a highly competitive, dynamic area. New search engines are cropping up continually, others are folding or being acquired, and feature sets change almost daily in order to keep pace.
This part is a comprehensive overview of the state of search services on the Internet as of Spring 1999, updated from the first version a year earlier. When published, it was possibly already dated. The authors therefore take no responsibility for the accuracy or completeness of the information herein. Hey, we're just doing the best job we can. But we do make mistakes ....
Topic 34: Some Caveats: The Dynamic Search Business
Searching on the Internet extends from the quick question, for which a lot of information is known to exist, to serious and purposeful research on esoteric topics. Casual users simply surfing or posing the quick question likely do not need an understanding of query syntax and construction nor search engine features and operation. This tutorial is definitely geared to those who want to spend the time to get more enjoyment and results from serious searches.
As of early 1997, some 600 search services were known to exist on the Internet. Recent citations have noted as many as 1,800, and one Web site, www.beaucoup.com, includes references to more than 1,400 [15]; our own estimate is 11,000 on the low side and likely many times that amount [16, 38]. Major engines including Galaxy, Magellan and WebCrawler have recently gone out of business or been acquired by competitors. Major partnerships have been formed and some apparently separate engines, such as AOL NetFind, are branded implementations of other services (in this case, Excite). Entirely new services, such as Direct Hit, have also begun in the past year and achieved early prominence. The industry is clearly in flux.
This dynamism makes it impossible to keep absolutely current on the state of Internet search services. The information presented herein is a good-faith effort to provide an accurate snapshot of its state as of Spring 1999. Neither the authors nor BrightPlanet.com LLC makes any representations as to the accuracy or completeness of the information presented.
The authors do not intend endorsement by virtue of whether a search service is listed herein. The decision as to which engines to include as major ones comes from one of the more authoritative Web sites on search engines, www.searchenginewatch.com [39]. The engines included in that service were used to define which search services in this tutorial were classified as "major."
Additional updates of this tutorial are likely. The authors welcome identification of errors or provision of additional, useful information. These updates and corrections will be reflected in future versions.
Topic 35: Duplication, Coverage and Responsiveness
The best estimates of the number of publicly-available documents on the Internet hover around 800 million [11]. The fact that the numbers available are simply estimates and differ greatly is an indication of how little is truly known about the size of the Internet and the completeness with which search services cover it.
The same Science article by Steve Lawrence and Lee Giles of the NEC Research Institute from which the larger estimate was drawn also is the reference for much of the information on search engine duplication and coverage.
Lawrence and Giles (L&G) analyzed coverage of 575 mostly scientific or technical queries posed by researchers at their institute in December 1997. Krishna Bharat and Andrei Broder (B&B) of the Digital Systems Research Center recently conducted a similar study with nearly equivalent methodology [21]. Here are their findings for coverage of the Internet by six of the major services, all of which do full-text indexing (in other words, a directory service like Yahoo was not included in their analysis):
| Search Engine | % Combined Coverage (B&B) | % Combined Coverage (L&G) | % Coverage of Total Web (B&B) | % Coverage of Total Web (L&G) |
| HotBot | 48% | 58% | 42% | 34% |
| AltaVista | 62% | 47% | 50% | 28% |
| Northern Light | | 33% | | 20% |
| Excite | 20% | 23% | 17% | 14% |
| Infoseek | 17% | 17% | 15% | 10% |
| Lycos | | 4% | | 3% |
The combined coverage figure refers to what percentage of searches were successfully returned by that engine. Because none of the engines comprehensively covered the Internet, the percent coverage of the total Web represents the authors' estimate of gaps and overlap.
One of the main conclusions of both studies is that no search engine indexes more than about one-third to one-half of the publicly-available documents on the Internet. By applying these figures to the known number of documents these services had indexed as of late 1997, the authors were able to come up with their estimates of 200 million to 320 million total documents on the Web. Even so, the authors believed their size estimate to be a lower bound, expecting the "true size of the Web to be much larger" than their methodology suggests [12].
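The arithmetic behind a size estimate of this kind can be sketched in a few lines. This is only an illustration under stated assumptions, not the study authors' actual procedure: the function name is ours, and the index size of 110 million pages is an assumed input chosen for the example. The 34% coverage figure is the L&G value for HotBot from the table above.

```python
def estimate_web_size(indexed_pages_millions: float, coverage_fraction: float) -> float:
    """Scale an engine's index size up by its estimated share of the Web."""
    if not 0 < coverage_fraction <= 1:
        raise ValueError("coverage_fraction must be in (0, 1]")
    return indexed_pages_millions / coverage_fraction


# Assumed input: an engine with 110 million indexed pages (hypothetical value)
# that is estimated to cover 34% of the publicly-available Web.
estimate = estimate_web_size(110, 0.34)
print(f"Estimated Web size: about {estimate:.0f} million pages")
```

The estimate is only as good as the coverage fraction that goes into it, which is why such figures are best treated as rough bounds.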
Three additional conclusions from the L&G study deserve mention. First, submitting queries to multiple search engines greatly increases the number of results obtainable. They estimated that combining queries to the six engines studied increased the likelihood of finding results by a factor of 3.5.
Second, they found surprisingly little duplication between the engines. With the largest two engines, HotBot and AltaVista, the number of duplicates was only 18% [40].
And, third, they found that "dead links," that is, pages listed on the search engines but no longer in existence, ranged from 1.6% to 5.3%. Though not universally true, engines that indexed more documents, such as HotBot, tended to have a higher incidence of dead links. This result should not be surprising, in that significant effort must be expended to maintain a larger database, and the room for error and untimeliness is higher.
Of course, size is not all that matters on the Internet. Many search engines justifiably make the argument that better and more accurate beats bigger. As a searcher, your interest should be in the quality of results. What perhaps is most disturbing, then, is that many quality results may not be indexed by the major engines in use. This possible lack of coverage is likely not a concern if the search topic is of a broad, widespread nature. But if you are looking for technical information, or information that is inherently outside the mainstream, these results are not comforting.
There is perhaps a serious methodological flaw at the heart of the Science article analysis. Recall two things: First, the subject of the analysis was technical queries; and, second, the nature of how items get listed initially by search engines.
Full-text search engines get their listings in one of two ways. Either a site developer submits one or more Web addresses asking the engine to index the site (in which case it is then scheduled for a later full-site indexing), or the 'spiders' used by the engines to find new content on the Web encounter the site and then include it. Spiders depend on linkages from prior sites to identify new ones. Information tucked away in the nooks and crannies of the Internet (in other words, some of the most specific information you may be trying to obtain) may have few if any links to it. Without links, or without prior notification by the developers, spiders will only chance upon new sites.
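A toy model shows why link-following alone leaves gaps. The sketch below is a simplified illustration, not how any particular engine's spider works: the `crawl` function, the page names and the miniature "Web" are all hypothetical. It walks outward from a seed page and reports what it never reaches.

```python
from collections import deque


def crawl(link_graph, seed_pages):
    """Breadth-first traversal: return every page reachable from the seeds."""
    discovered = set(seed_pages)
    queue = deque(seed_pages)
    while queue:
        page = queue.popleft()
        for target in link_graph.get(page, []):
            if target not in discovered:
                discovered.add(target)
                queue.append(target)
    return discovered


# Hypothetical toy Web: 'niche-report' exists, but no other page links to it.
toy_web = {
    "portal": ["news", "shop"],
    "news": ["portal", "archive"],
    "shop": [],
    "archive": [],
    "niche-report": ["archive"],
}

found = crawl(toy_web, ["portal"])
missed = set(toy_web) - found
print("Found:", sorted(found))
print("Never reached:", sorted(missed))
```

Adding the unlinked page to the seed list, which is the equivalent of a developer submitting its address, is the only way this spider ever sees it.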
Because businesses tend to actively seek listings on search engines, it is not at all clear that the lack of coverage implied by the Science article would apply to this sector. By focusing on technical searches, the authors could therefore have significantly overestimated the lack of coverage on the Internet. Whether coverage is better or worse for different subject areas or for different focuses on the Web is unknown at this time.
As professional information searchers have come to well understand, individual search engines can return outstanding results that are found on no other engines [41]. For this reason, and the reason of inadequate coverage by those engines, you should always submit your important queries to multiple search engines.
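Submitting a query to several engines leaves you with overlapping result lists to reconcile. The following sketch shows one simple way to combine them; it is an assumption-laden illustration, with hypothetical engine labels and example URLs, and the normalization rule is deliberately crude (it lowercases the whole address, which is not always safe for the path portion of a real URL).

```python
def normalize(url: str) -> str:
    """Crude normalization so trivially different URLs compare equal."""
    url = url.strip().lower()
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]
    return url.rstrip("/")


def merge_results(results_by_engine):
    """Combine result lists, keeping the first occurrence of each URL."""
    merged, seen = [], set()
    for engine, urls in results_by_engine.items():
        for url in urls:
            key = normalize(url)
            if key not in seen:
                seen.add(key)
                merged.append((url, engine))
    return merged


# Hypothetical result lists; the engine names are labels only.
results = {
    "engine_a": ["http://www.example.com/falcons/", "http://birds.example.org"],
    "engine_b": ["http://example.com/falcons", "http://raptors.example.net/"],
}

merged = merge_results(results)
total = sum(len(urls) for urls in results.values())
print(f"{len(merged)} unique results from {total} returned")
```

Recording which engine first supplied each result also makes it easy to see which services contribute listings that the others miss.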
Topic 36: Boolean or Not?
For serious searching, perhaps the most important first choice facing you is choice of search engines. Which search engines better cover the topics you are interested in? Which support the search features that will enable you to find what you want?
Not all searches are created equal. The increasing ability of some search engines to take your requests in context, and then enable you to narrow results based on your first attempt, is a promising development. Certainly being able to type in a few words and then begin receiving documents of value bodes well for common-topic searches. We ourselves use this approach when quick searches are needed.
We doubt, however, the ability of search engines in the near term to improve on this process for complicated searches or for hard-to-find information. Not only is coverage of such topics weak for a given engine, but the ability to anticipate refinements is weakened by the need to categorize information into levels insufficiently specific to the difficult query.
Thus, for difficult search topics, we still must recommend the use of search engines with full Boolean support. Only you know what information you are seeking (even though it may be ill-defined or abstract). With full Boolean searching, you have complete control to find what you seek.
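To make the idea of Boolean control concrete, here is a minimal sketch of how AND, OR, NOT and parentheses narrow or widen a result set. It is an illustration only: the three-document collection, the `build_index` and `term` helpers and the queries are hypothetical, and real engines use far more elaborate indexes and syntax.

```python
# Hypothetical three-document collection, for illustration only.
documents = {
    1: "peregrine falcon nesting sites",
    2: "bird watching guide for beginners",
    3: "falcon and hawk bird identification",
}


def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


index = build_index(documents)


def term(word):
    """Return the ids of documents containing the word (empty set if none)."""
    return index.get(word.lower(), set())


# bird AND falcon: both terms must appear.
both = term("bird") & term("falcon")
# bird OR falcon: either term may appear.
either = term("bird") | term("falcon")
# falcon AND NOT hawk: exclude documents that mention hawk.
falcon_not_hawk = term("falcon") - term("hawk")
# (bird OR falcon) AND NOT guide: the parentheses group the OR first.
grouped = (term("bird") | term("falcon")) - term("guide")

print(sorted(both), sorted(either), sorted(falcon_not_hawk), sorted(grouped))
```

Each operator maps onto a set operation, which is why a query with full Boolean support returns exactly the documents you specify and nothing the engine has guessed on your behalf.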
This recommendation, however, exacerbates the lack of coverage of any given search engine. By definition, hard-to-find information is not well-indexed, meaning you will likely need to use more than one search engine to get the robust results you desire.
Topic 37: A Comparison of 100 Search Services
A listing comparing major features of 100 of the largest search services on the Web is shown below. For a larger listing of about 2,500 to 3,000 search services, see [16].
| Search Service | URL Address | Boolean Operators | Results/Page | Multiple Pages? | Max. Listings |
| @BRINT Business Research | www.brint.com | AND,OR,()," | | No | |
| AlbanyNet | www.albany.net | --- | 10 | Yes | |
| AltaVista UseNet | www.altavista.digital.com | --- | 10 | Yes | 200 |
| AltaVista UseNet Advanced | www.altavista.digital.com | AND,OR,(),NOT,NEAR,",* | 30 | Yes | |
| AltaVista WEB | www.altavista.digital.com | --- | 10 | Yes | 200 |
| AltaVista WEB Advanced | www.altavista.digital.com | AND,OR,(),NOT,NEAR,",* | 10 | Yes | 200 |
| American Memory Collection Search | lcweb2.loc.gov | --- | 20 | Yes | |
| America's Job Bank Search Index | www.ajb.dni.us | --- | 200 | No | 200 |
| AOL NetFind | www.aol.com | AND,OR,(),NOT,,", | 40 | Yes | |
| AquaLink | www.aqualink.com | --- | | | 40 |
| ArchNet Archaeology | spirit.lib.uconn.edu | AND,OR,(),NOT,", | 200 | No | 200 |
| BizWeb | www.bizweb.com | # | 200 | No | 200 |
| c|net News | www.news.com | AND,OR,(),NOT,", | 25 | Yes | 500 |
| c|net Search.Com | www.search.com | --- | 10 | Yes | |
| c|net Shareware.com | www.shareware.com | --- | 100 | | |
| CBS Sportsline | cbs.sportsline.com | --- | 25 | Yes | |
| CNN Database | www.cnn.com | --- | 10 | Yes | |
| CNNfn the financial network | www.cnnfn.com | --- | 100 | | |
| CollegeNET | collegenet.com | | 200 | No | 200 |
| Computer Gaming World | cgw.gamespot.com | --- | 50 | Yes | |
| DejaNews | www.dejanews.com | AND,OR,(),NOT,NEAR,* | 50 | Yes | |
| Discovery Channel Online Search | www.discovery.com/whatsonline/search.aspl | --- | 10 | Yes | |
| Encarta Online | find.msn.com | AND,OR,(),NOT,NEAR,* | 50 | Yes | |
| Environmental Organization Web Directory | www.webdirectory.com | --- | | | |
| EuroFerret | www.euroferret.com/ | --- | 10 | Yes | |
| Excite | www.excite.com | AND,OR,(),NOT,,", | 20 | Yes | |
| Excite News Tracker | excite.com | AND,OR,(),NOT,,", | 10 | Yes | |
| Explorer-K-12 Math/Science | unite.ukans.edu | --- | | | |
| Forum One Online Discussion Forums | www.forumone.com | --- | | | |
| Galaxy | www.einet.net | --- | 20 | Yes | |
| HotBot | hotbot.com | --- | 100 | Yes | |
| HotBot Advanced | www.hotbot.com | AND,OR,(),NOT,,", | 100 | Yes | |
| IBM Infomarket-Research Reports | www.infomarket.ibm.com | --- | 15 | Yes | |
| Inference FIND | www.inference.com/infind/ | AND,OR,(),NOT,,",* | | No | |
| Infohiway | www.infohiway.com | --- | 30 | Yes | |
| Infomine (Internet Enabling Tools) | lib-www.ucr.edu/search/ucr_enbsearch.aspl | AND,OR,(),# | | | |
| Infoseek | www.infoseek.com | AND,OR,NOT | 50 | Yes | 200 |
| Internet ArtResources | artresources.com/search.aspl-ssi | AND,OR,(),NOT,",* | | No | |
| Jayde Online Directory | www.jayde.com | --- | 50 | No | 50 |
| Lawcrawler | www.lawcrawler.com | AND,OR,(),NOT,NEAR,",* | 10 | Yes | |
| Librarians' Index to the Internet | sunsite.Berkeley.EDU | --- | 200 | No | 200 |
| LinkMonster | www.linkmonster.com | AND,OR,(),NOT,",* | 200 | Yes | |
| LinkStar | www.linkstar.com | --- | 10 | Yes | |
| Liszt, the Mailing List Directory | www.liszt.com | AND,OR,(),NOT," | | | |
| Lycos Pro | www.lycos.com | AND,OR,(),NOT,NEAR,", | 40 | Yes | |
| Magellan | www.mckinley.com | AND,OR,(),NOT,,", | 10 | Yes | |
| Mamma Search Engines | mamma.com | --- | 10 | Yes | |
| Metacrawler | www.metacrawler.com | --- | 30 | Yes | |
| Microsoft(r) | www.microsoft.com/search/default.asp | AND,OR,(),NOT,NEAR,",* | 10 | Yes | |
| Northern Light | www.northernlight.com | --- | 25 | Yes | |
| OneLook Dictionaries | www.onelook.com | --- | | No | |
| Open Text | www.opentext.com | --- | 10 | Yes | None |
| Orientation.com Asia | www.orientation.com | AND,OR,(),NOT,NEAR,",* | | | |
| PC World Online | www.pcworld.com | --- | | No | |
| PlanetSearch | www.planetsearch.com | --- | 10 | Yes | |
| Point's Top 5% | www.pointcom.com | --- | 10 | Yes | |
| Product Review Net | www.productreviewnet.com | --- | | No | |
| PubMed National Library of Medicine | www.ncbi.nlm.nih.gov | AND,OR,(),NOT,",* | | | |
| Reference.com (Mailing List) | www.reference.com | AND,OR,(),NOT,NEAR,",* | 200 | No | 200 |
| SavvySearch | www.savvysearch.com | --- | | | |
| Science Fiction Review Archives | julmara.ce.chalmers.se | --- | 20 | Yes | |
| searchUK | www.searchuk.com | AND,OR,(),NOT,",* | | | |
| Social Science Information Gateway | www.sosig.ac.uk | AND,OR,NOT,* | | | |
| Spry Internet Wizard | www.sprynet.com | --- | | No | |
| Surf Point | www.surfpoint.com | --- | 30 | Yes | |
| The Sporting News | www.sportingnews.com | --- | | | |
| The United Nations | www.un.org | --- | | | |
| Time Magazine Online | www.pathfinder.com/time | --- | 10 | Yes | |
| WebCrawler | www.WebCrawler.com | AND,OR,(),NOT,,", | 25 | Yes | |
| WebCrawler News | search.excite.com/wc | AND,OR,(),NOT,,", | 25 | Yes | |
| What's New Too! | newtoo.manifest.com | --- | 25 | Yes | |
| Windows 95 Magazine Search | www.win95mag.com | --- | | No | |
| World Wide Arts Resources | world-arts-resources.com | ',# | | No | |
| WWW Virtual Law Library | www.law.indiana.edu | --- | 20 | | |
| WWW Virtual Library-US Government Information | iridium.nttc.edu | --- | 100 | | |
| WWWomen | www.wwwomen.com | --- | | | |
| Yahoo | search.yahoo.com | --- | 20 | Yes | Varies |
| Yahooligans | www.yahooligans.com | --- | 25 | Yes | |
Topic 38: Features of the Top 10 Search Services
Based on March 1999 rankings from Media Metrix [42], and including service characterizations from Search Engine Watch [39] and Search Engine Showdown [43], the chart below compares features for the major search services on the Web. Included in this listing are search engines (SE), and directories (D). For a further description of search service types, see Topic 2; for a description of the features listed, see Topic 33. Specific notes on some of the services are appended at the end of the table.
Some of the listed features are coded. These codes represent the authors' judgment as to the completeness of a feature compared to other services in the listing:
y means the feature is deemed to be as complete as others
y/n means the feature is not as complete as others offered or does not provide full functionality
A blank means that service does not offer the feature shown.
As before, we do not imply endorsement nor claim complete accuracy for the features presented. You are always advised to consult the online help topics for any given services. Features and sometimes syntax change on a periodic basis.
| | Yahoo | Infoseek (GO) | Excite | Lycos | AltaVista | Snap | HotBot | About.com | LookSmart | GoTo |
| GENERAL | | | | | | | | | | |
| Ranking by User Base | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| Type | D | SE | SE | SE | SE | D | SE | D | D | SE |
| Unique Visitors/mo (Mill) | 31.0 | 21.2 | 16.7 | 16.1 | 10.5 | 9.8 | 7.4 | 5.8 | 4.8 | 4.1 |
| Size (Mill pages) | 1.2 | 45 | 55 | 50 | 150 | 110 | 110 | 0.4 | 1.0 | 110 |
| Pageviews/Visitor | 176 | 22 | 64 | 23 | 21 | 9 | 13 | --- | 10 | --- |
| STRUCTURED QUERIES | | | | | | | | | | |
| Complete Boolean | | | y | y | y | y | y | y | | |
| Stemming | y | y | | y | y | y | y | | y/n | y |
| Case Sensitive | | y | | | y | y/n | y/n | | | y/n |
| Phrases | y | y | y | y | y | y | y | y | | y |
| AND | y | y | y | y | y | y | y | y | y | y |
| OR | y | y | y | y | y | y | y | y | | y |
| NOT | y | y | y | y | y | y | y | y | | |
| NEAR | | | | y | y | | | y | | |
| BEFORE | | | | y | | | | | | |
| AFTER | | | | y | | | | | | |
| ( ) | | | y | y | y | y | y | | | |
| INDEXING | | | | | | | | | | |
| Separate Names/Titles | | y | | | | | | | | |
| Metatag | | y | | y/n | y | y | y | | | y |
| Title | y | y | y | y | y | y | y | y | | y |
| Body | | y | y | y | y | y | y | | | y |
| ALT Tags | | y | | y | y | | | | | |
| Comments | | y | | | | y | y | | | y |
| RESULTS RANKINGS | | | | | | | | | | |
| Relevancy | 5 | 2 | 1 | 4 | 3 | 2 | 2 | 5 | 5 | 2 |
| Results Clustering | | y | y/n | y | | y | y | | | y |
| Suggest Related Terms | | | y | | y | y | y | | | y |
| Find Similar Pages | | y | y | y | | | | | | |
| User Specified | | | | | y | | | | | |
| FILTERS | | | | | | | | | | |
| Date | y/n | | | | y/n | y | y | | | y |
| File/Media Types | | | | | y/n | y | y | | | y |
| LANGUAGE CHOICES | | | | | | | | | | |
| Language | | | | y | y | y | y | | | |
| Special Characters | | | | | y | | | | | |
| SPECIAL SEARCH OPTIONS | | | | | | | | | | |
| Automatic Phrase Attempt | | | | | y | | | | | |
| People's Names | y | | | | | y | | | y | |
| Text | y | y | y | y | y | y | y | | | y |
| Depth | | | | | | y/n | y | | | |
| Anchor | | | | | y | | | | | |
| Applet | | | | | y | | y | | | |
| Domain | | y | | | y | y | y | | | |
| Host | | y | y | | y | | y | | | |
| Image | | | | | y | y/n | y/n | | | |
| MP3 (music) | y | | | y | | | | | y/n | |
| Link | | y | | | y | y | y | | | |
| Title | y | y | | y | y | y | y | | | |
| URL | y | y | | y | y | | y | | | |
| NOTES | YA | IS | EX | LY | AV | SN | HB | MC | LS | GT |
The key for how the services determine relevance is: 1 -- 3/4 star review; 2 -- metatag keywords; 3 -- title keywords, popularity; 4 -- none; 5 -- in title, higher in category tree.
All of the directory services link to a standard search engine if their own listings do not satisfy the query. The directories and their associated engines are: Yahoo! - Inktomi; About.com - AltaVista; and LookSmart - AltaVista. There are also differences between the services that license the Inktomi engine: Snap, HotBot and GoTo. While all of these index and score pages in a similar manner, the options presented to the user can differ quite substantially, with HotBot providing the most power, GoTo the simplest interface.
Specific service notes are:
YA: people searching uses the Four11 specialty engine
IS: need to use commas to separate phrases and hyphenate words that need to appear next to one another; words within brackets are found if within 100 words of one another
EX: employs 'morphological analysis' to suggest refinement words for the keywords entered into a search
LY: can specify NEAR, BEFORE, FAR word distances. Lycos announced in April 1999 that it was going to switch its search engine service to a directory structure using Netscape's Open Directory format, an unprecedented move
AV: specialized functions for usenet searches; advanced searching turns off relevancy ranking (can hand enter); allows translation from different languages
SN: uses special search options through advanced search page with dropdown lists
HB: special search options through what HotBot calls meta words
MC: paid experts provide listings in about 600 topic areas
LS: uses AltaVista as source engine; presents results with category options for each entry; entries reviewed by editors
GT: simplest interface of all of the services.
The options shown in the table are often called by different names by the services that support them, and usually involve special syntax rules. Sometimes, too, the descriptions of how these features operate are difficult to find from the main pages of the services. Consult each service's home page directly; then try the advanced or power searching pages, the help sections or the frequently asked questions (FAQ) areas to read about the special operators and their rules.
Another useful resource, though based on relatively small sample sizes, is Greg Notess' Search Engine Showdown [43]. This site reports dead link percentages, unique hits, overlap and other statistics.
Topic 39: Specialty Engines
Specialty engines have the advantage of cataloging information particular to a narrow topic area, thus potentially increasing coverage versus the general search services. This advantage, however, often comes at the cost of not providing you with the search options and flexibility that the general services provide.
One of the most complete catalogs of Internet search engines is found at www.beaucoup.com. Its breakdown of more than 1,400 English-oriented search engines by major topic area is listed below [44]:
| Category | Count | Category | Count |
| General | 76 | School Listings/Student Aids | 31 |
| Multiple/Meta | 28 | Educational Resources | 28 |
| Radio/TV | 18 | Music/Sounds | 35 |
| Publications | 38 | Arts/Graphics | 46 |
| Regional - Global | 5 | New Sites/Reviews | 18 |
| Regional - Americas | 42 | Science/Nature/Technology | 49 |
| Regional - Europe | 124 | Business Directories | 51 |
| Regional - Asia/Aust/Africa/+ | 73 | Email/Domains/People | 16 |
| Software - Windows | 30 | Computer/Programming | 41 |
| Software - Other | 15 | Webmasters | 20 |
| Reference | 64 | Internet/WWW | 32 |
| Language | 25 | Social/Political/Environment | 30 |
| Literature | 25 | Politics/Government/Law | 64 |
| Health and Fitness | 18 | Finance/Consumer | 51 |
| Foods and Diet | 29 | Malls/Classifieds | 14 |
| Medicine | 33 | Large Corporations | 67 |
| Hobbies/Rec/Pets/Games | 59 | Potpourri | 46 |
| Employment Listers | 45 | | |
| Corporate Employers | 63 | Total | 1,449 |
Be aware that some of these services catalog information that is not normally spidered or indexed by the general search services.
Not shown in the table above are searches in languages other than English. For example, major search alternatives are provided in Dutch, Spanish, German, Japanese and French, and specialty search services are offered in perhaps another 30 languages or so. For regional alternatives, Yahoo alone provides 12 different country-based search services and another 12 focused specifically on individual U.S. metropolitan areas. Similar diversification is occurring with other major search services.
These specializations are natural and reflect the huge size of the Internet (plus, obviously, the fact that English is not the only language used on the Web!). This specialization trend is likely to continue.
Other search engine directories that are comprehensive listings of other specialty engines on the Web are:
http://www.searchpower.com
http://www.123go.com/drw/search/search.htm
http://www.dreamscape.com/frankvad/search.aspl
http://www.finderseeker.com/
The last of these in fact seems to have the largest listing; you will need to poke around some to get to the entries. To page individually through about 26 pages (about 2,500 engines), try this URL, and then continue paging using the 'Next Page' option at the bottom of each page:
http://www.finderseeker.com/cgi-bin/search.cgi?disp=99&sp=1&cat=&key=&country=
Depending on the topics of your searches, you are encouraged to test out and try these listing services.
Topic 40: Some Other Services to Watch
There are a number of other search services that bear watching, either because of new and unique search technology, or because of partnering or plans that may cause them to become big players:
- AOL Netfind: this service, a branded implementation of Excite, deserves note because AOL has the largest installed user base on the Web. See http://www.aol.com/netfind/.
- AskJeeves: the premise of this service is to accumulate questions posed by searchers to obtain specific information. AskJeeves already has a database of many million questions; if the question that you pose is new, they add it to their listing. If information has not already been compiled by AskJeeves staff in response to previous questions, the service provides a "smart" question query to leading search engines such as AltaVista, Yahoo!, Infoseek and WebCrawler. Though a long-time Web presence, AskJeeves has only recently gained prominence. Compaq is a major investor and Dell has a branded service called AskDudley. Recent major financing suggests this service is one to watch. See http://www.askjeeves.com.
- BrightPlanet: BrightPlanet has uncovered a vast reservoir of Internet content that is 500 times larger than the known World Wide Web: the "deep" Web. What makes the discovery of the deep Web so significant is the quality of content found within. There are literally hundreds of billions of highly valuable documents hidden in searchable databases that cannot be retrieved by existing search engines. And BrightPlanet controls the only technology that can search it automatically. This discovery is the result of groundbreaking search technology developed over three years by BrightPlanet called the LexiBot(TM), the first and only search technology capable of identifying, retrieving, qualifying, classifying and organizing "surface" and "deep" content from the World Wide Web. The LexiBot(TM) allows searchers to dive deep and explore hidden data from multiple sources simultaneously using directed queries. For the first time, businesses, researchers and consumers now have access to the most valuable and hard-to-find information on the Web, and can retrieve it with pinpoint accuracy. See www.CompletePlanet.com to learn more about the deep Web and download a free trial version of the LexiBot(TM).
- DirectHit: Direct Hit tracks which search result links users click and how long users stay at each site. The longer a user stays at a site, the higher it is ranked. As such, DirectHit is more of a "popularity" engine than a general search service. Management of DirectHit has been quoted as indicating they intend to add a general search component to the site. See http://www.directhit.com.
- FAST: begun in Norway, this new service has the most ambitious plans reported to date for indexing the entire Web. Already the site has 80 million documents indexed, with claims to reach 200 million by Summer 1999 and an eventual goal of 1 billion documents. This may be a player to watch for those who desire complete Web indexing. Dell is a partner with FAST. Lycos has also partnered with FAST for MP3 music searches. See http://web.fast.no/fast.php3.
- Google!: Originally developed at Stanford, Google has gone commercial. According to the company, it uses a complicated mathematical analysis, calculated on more than a billion hyperlinks on the Web, to return high-quality search results. This analysis allows Google to estimate the quality, or importance, of every Web page it returns in conjunction with standard text retrieval criteria. The key difference of Google's approach is the premise that the more influential the sites that link to a result, the better it is. See http://www.google.com.
- iAtlas: a proclaimed "filter" service, iAtlas claims to add filter capabilities for segments/topics, dates, addresses and document sizes to the basic Inktomi (same as HotBot) search capability. Testing of the filtering capabilities proved disappointing; however, the service was just announced in the past few months and may be experiencing some start-up pains. iAtlas wants to act like Inktomi in branding for specific vertical markets. See http://www.iatlas.com.
- Metacrawler: part of the Go2Network, this metasearcher queries Lycos, Infoseek, WebCrawler, Excite, AltaVista, Thunderstone, About.com, LookSmart and Yahoo. Results are combined with duplicates removed and presented in the standard format with description. See http://www.metacrawler.com.
- MSN: Microsoft has most recently chosen the Inktomi engine to power its MSN search site. See http://home.microsoft.com/.
- OneView: this unique service is based on accumulating bookmarks from individual submitters and then placing them in a comprehensive subject structure. The concept is intriguing because bookmarks often tend to be pre-screened, quality, comprehensive sites. Complete access to the site requires a free sign-in. The site is from Germany; much of its current documentation is in German. See http://www.oneview.com/.
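The link-analysis premise described in the Google entry above, that a page gains standing when influential pages link to it, can be illustrated with a small sketch. This is a textbook-style simplification and an assumption on our part, not the company's actual method: the function name, the damping value of 0.85, the iteration count and the toy link graph are all hypothetical.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Repeatedly pass each page's score along its outgoing links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                for other in pages:
                    new_scores[other] += damping * scores[page] / n
        scores = new_scores
    return scores


# Hypothetical link graph: 'leaf' and 'minor' each have exactly one inbound
# link, but the link to 'leaf' comes from the heavily linked 'hub'.
toy_links = {
    "a": ["hub", "minor"],
    "b": ["hub"],
    "c": ["hub"],
    "hub": ["leaf"],
    "leaf": [],
    "minor": [],
}

scores = link_scores(toy_links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(f"{page}: {scores[page]:.3f}")
```

In this toy graph, 'leaf' outscores 'minor' even though each has a single inbound link, because the page linking to 'leaf' is itself well linked.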
Topic 41: Some Perplexing Behaviors
Search engines may not always perform as indicated on their help pages. These differences are due to constant changes in how they handle their service, strange quirks relating to their scoring and indexing methodologies, errors made by the developers of Web pages, and decisions the service may make to speed performance at high-traffic volume periods. We can illustrate some of these quirks using our standard AltaVista search source [1].
Let's take the simplest example we've used throughout this tutorial: posing the query query. Depending on the time of day, during a 24-hour period AltaVista returned results ranging from 671,424 counts to 712,851 counts. This indicates that, at times of high traffic, limits are placed internally on the search. In the worst cases, AltaVista or other engines may even provide a message that the server is busy, and prompt you to return at a later time to obtain results. Most often when this error occurs, a quick re-issue of the query will obtain results.
Here's another example of quirky behavior when comparing results counts from two different formats (text vs. phrase) using our basic query, and using the simple and advanced search forms on AltaVista:
| Form/Query | "bird*" | bird* |
| Simple Form | 3,935,170 | 3,935,170 |
| Advanced Form | 1,836,261 | 1,836,261 |
The advanced form returned fewer than half as many results as the simple form (this query was posed at a low-demand time). Sometimes, too, the unquoted version on the advanced search turns up no results matching the query. Clearly, both time of day and search form can affect results.
The order of terms in a seemingly equivalent pair of queries can also produce slightly different results. Let's compare these two queries:
"bird*" AND falcon
vs.
falcon AND "bird*"
The first query produces a results count of 9,581; the second a count of 9,228.
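To put the size of these discrepancies in perspective, the counts quoted above can be compared directly. The short calculation below is just arithmetic on the figures in the text; the `percent_change` helper is our own naming.

```python
def percent_change(reference: float, other: float) -> float:
    """Signed difference of 'other' relative to 'reference', in percent."""
    return (other - reference) / reference * 100.0


# Counts quoted in the text above.
time_of_day = percent_change(712_851, 671_424)       # highest vs. lowest count, same query
form_effect = percent_change(3_935_170, 1_836_261)   # simple form vs. advanced form
order_effect = percent_change(9_581, 9_228)          # same terms, order swapped

for label, value in [("time of day", time_of_day),
                     ("search form", form_effect),
                     ("term order", order_effect)]:
    print(f"{label}: {value:+.1f}%")
```

The choice of search form matters far more than time of day or term order in these examples, although all three move the counts.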
Of course, counts are not what you, the searcher, want when you search. The actual results pages for these examples were quite similar. But it's useful to realize that exactly how an engine operates may not be clear or consistent.
As Topic 21 and Topic 26 note, you must also exercise care in the use of capitalization, special characters and special terms. What you think you are posing as a query may not be evaluated as such by specific engines.
This is not meant to be a criticism of the search engines, or of AltaVista in particular. Many anomalies occur because of improperly formatted Web pages. And, after all, engines are indexing millions of pages in very short periods of time and need to provide snappy response in all instances. The fact that they do accurately index very high percentages is remarkable. But you, as a searcher, should be aware that results are not foolproof.
Copyright © 2001. BrightPlanet.com LLC. All rights reserved.