Mike Thelwall
There have been many attempts to study the content of the Web, through either human or automatic agents. Describes five previously used Web survey methodologies, each justifiable in its own right, and presents a simple experiment that demonstrates concrete differences between them. The concept of crawling the Web also bears further inspection, including the scope of the pages to crawl, the method used to access and index each page, and the algorithm for the identification of duplicate pages. The issues involved here will be well known to many computer scientists but, with the increasing use of crawlers and search engines in other disciplines, they now require public discussion in the wider research community. Concludes that any scientific attempt to crawl the Web must make available the parameters under which it is operating so that researchers can, in principle, replicate experiments or be aware of and take into account differences between methodologies. Also introduces a new hybrid random page selection methodology.
Internet Research – Emerald Publishing
Published: May 1, 2002
Keywords: Surveys; Indexes
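
The abstract's call for crawls to publish their operating parameters, including the rule used to identify duplicate pages, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `CrawlParameters` fields, the `fingerprint` normalization rule (collapse whitespace, lower-case, then hash), and the example URLs are hypothetical and are not taken from the paper.

```python
import hashlib
import re
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class CrawlParameters:
    """Hypothetical record of the settings a crawl could publish."""
    seed_urls: tuple
    scope: str            # which pages are eligible, e.g. "same-domain"
    max_depth: int        # how many links away from a seed the crawl goes
    duplicate_rule: str   # how two pages are judged to be identical


def fingerprint(html: str) -> str:
    """Hash page content after collapsing whitespace and case (an assumed rule)."""
    normalized = re.sub(r"\s+", " ", html).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def group_duplicates(pages: dict) -> list:
    """Return sorted lists of URLs whose content fingerprints coincide."""
    groups = {}
    for url, html in pages.items():
        groups.setdefault(fingerprint(html), []).append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


params = CrawlParameters(
    seed_urls=("http://example.com/",),
    scope="same-domain",
    max_depth=3,
    duplicate_rule="sha256 of whitespace-collapsed, lower-cased HTML",
)

# Illustrative in-memory pages; a real crawl would fetch these over HTTP.
pages = {
    "http://example.com/": "<html><body>Home</body></html>",
    "http://example.com/index.html": "<html><body>HOME</body></html>\n",
    "http://example.com/about": "<html><body>About</body></html>",
}

duplicates = group_duplicates(pages)
published = asdict(params)  # plain dict that could be reported with the results
```

A different duplicate rule (for example, comparing only visible text, or treating differently cased pages as distinct) would group these pages differently, which is why the rule is recorded alongside the crawl's scope and depth.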