Indexing size approximation of WWW repository with leading information retrieval and web filtering robots

Ijaz Ali Shoukat, Mohsin Iftikhar, Abdul Haseeb

Research output: Contribution to journal › Article › Peer-review

Abstract

The World Wide Web is the largest information system, and the size of its index is difficult to estimate. The Web is a valuable and growing scientific resource that, like a digital library, lets readers explore electronic literature. Estimating the indexed size of the WWW has been an open problem since 1998. Yahoo has claimed an indexed size of 19 billion web documents, a figure Google has questioned because, according to the most recent published study by Gulli and Signorini, the total "indexed web size" was around 11.5 billion pages. The Web is growing rapidly, which raises several questions. What is the current size of the Web? Which search engine has the largest index of authentic information (PDF files)? Which search engine has the largest index of all types of web pages? This article answers these questions. We estimate the index sizes of the leading search engines (Google, Yahoo and MSN) with a simple, cost-effective approach, reasoning that complex heuristics are unnecessary when a simpler method suffices. Our technique queries the search engines with selected common affixes that could appear in almost every document or web page. The paper reports the total size of the current "indexed web contents" and provides a comparative analysis to help scholars determine which search engine offers more authentic information and a larger index.
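The abstract describes the affix-query technique only at a high level. The following is a minimal Python sketch, not the authors' procedure, of one way the hit counts returned for affix queries could be turned into per-engine index-size estimates and compared. It assumes the caller has already collected each engine's reported hit counts (no search API is called here); the function names, the engine and affix labels, and the choice of the maximum count as the estimate are illustrative assumptions. Reported hit counts are themselves approximations made by the engines, so the result is only as reliable as those counts.

```python
from typing import Dict, Mapping


def estimate_index_size(hit_counts: Mapping[str, int]) -> int:
    """Lower-bound estimate of one engine's index size (hypothetical helper).

    Each affix query can only match documents that are in the index, so each
    reported hit count is at most the index size. The largest count is
    therefore the tightest lower bound these queries provide.
    """
    if not hit_counts:
        raise ValueError("need at least one affix query result")
    if any(count < 0 for count in hit_counts.values()):
        raise ValueError("hit counts must be non-negative")
    return max(hit_counts.values())


def compare_engines(results: Mapping[str, Mapping[str, int]]) -> Dict[str, int]:
    """Estimate every engine's index size and return them largest first."""
    estimates = {
        engine: estimate_index_size(counts) for engine, counts in results.items()
    }
    # Sort by estimate, descending, so the engine with the larger index comes first.
    return dict(sorted(estimates.items(), key=lambda item: item[1], reverse=True))


if __name__ == "__main__":
    # Hypothetical toy hit counts, NOT measurements from the paper.
    toy_results = {
        "engine_a": {"affix_1": 120, "affix_2": 150},
        "engine_b": {"affix_1": 90, "affix_2": 80},
    }
    print(compare_engines(toy_results))  # {'engine_a': 150, 'engine_b': 90}
```

Taking the maximum rather than the sum of the hit counts avoids double-counting documents that match more than one affix; summing would tend to overestimate the index size.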
Original language: English
Pages (from-to): 71-75
Number of pages: 5
Journal: International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE)
Volume: 2
Issue number: 3
Publication status: Published (2011)
