Focused web crawling using decay concept and genetic programming

Mahdi Bazarganigilani, Ali Syed, Sandid Burki

Research output: Contribution to journal, Article, peer-reviewed

Abstract

The ongoing rapid growth of web information is a theme of research in many papers. In this paper, we introduce a new optimized method for web crawling. Using genetic programming enhances the accuracy of similarity measurement. This measurement applies to different parts of the web pages, including the title and the body. Consequently, the crawler uses this optimized similarity measurement to traverse the pages. To enhance the accuracy of crawling, we use the decay concept to limit the crawler to the effective web pages in accordance with the search criteria. The decay measurement gives every page a score according to the search criteria. The score decreases as the crawler traverses to greater depth. This value can be revised according to the similarity of the page to the search criteria. In this case, we use three kinds of measurement to set the thresholds. The results show that using genetic programming along with dynamic decay thresholds leads to the best accuracy.
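The decay idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation; every name and number in it is an assumption made for illustration: the in-memory `pages` and `links` dictionaries stand in for real fetching, the fixed title/body weights stand in for whatever weighting genetic programming would evolve, the word-overlap measure stands in for the paper's similarity measure, and `decay`, `threshold`, and `relevant` are hypothetical parameters with a single static threshold rather than the three threshold measurements the paper compares.

```python
from collections import deque


def token_overlap(text, query):
    """Jaccard overlap between the word sets of text and query."""
    a, b = set(text.lower().split()), set(query.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def similarity(page, query, w_title=0.6, w_body=0.4):
    """Weighted similarity over the title and body of a page.

    The weights are fixed placeholders (an assumption); in the paper's
    setting the combination would be optimized by genetic programming.
    """
    return (w_title * token_overlap(page["title"], query)
            + w_body * token_overlap(page["body"], query))


def focused_crawl(pages, links, seed, query,
                  decay=0.5, threshold=0.2, relevant=0.3):
    """Breadth-first crawl that stops following paths whose score decayed away.

    Each page inherits a score from its parent. The score shrinks by
    `decay` per level of depth and is reset to 1.0 when the page itself
    is similar enough to the query. Returns {url: score} for visited pages.
    """
    visited = {}
    queue = deque([(seed, 1.0)])
    while queue:
        url, inherited = queue.popleft()
        if url in visited or url not in pages:
            continue
        sim = similarity(pages[url], query)
        # Revise the score: a relevant page restores full strength.
        score = 1.0 if sim >= relevant else inherited
        visited[url] = score
        child_score = score * decay
        # Only expand links while the decayed score stays above the threshold.
        if child_score >= threshold:
            for nxt in links.get(url, []):
                queue.append((nxt, child_score))
    return visited
```

With these assumed defaults, a chain of irrelevant pages below a relevant seed is followed for two levels (scores 0.5 and 0.25) and then abandoned, while any relevant page met along the way resets the score and extends the crawl from there.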
Original language: English
Pages (from-to): 1-12
Number of pages: 12
Journal: International Journal of Data Mining & Knowledge Management Process
Volume: 1
Issue number: 1
Publication status: Published - Jan 2011
