TY - JOUR
T1 - Focused web crawling using decay concept and genetic programming
AU - Bazarganigilani, Mahdi
AU - Syed, Ali
AU - Burki, Sandid
PY - 2011/1
Y1 - 2011/1
N2 - The ongoing rapid growth of web information is a theme of research in many papers. In this paper, we introduce a new optimized method for web crawling. Using genetic programming enhances the accuracy of similarity measurement. This measurement applies to different parts of the web pages, including the title and the body. Consequently, the crawler uses such optimized similarity measurement to traverse the pages. To enhance the accuracy of crawling, we use the decay concept to limit the crawler to the effective web pages in accordance with the search criteria. The decay measurements give every page a score according to the search criteria. It decreases while traversing in more depth. This value could be revised according to the similarity of the page to the search criteria. In such case, we use three kinds of measurement to set the thresholds. The results show that using genetic programming along with the dynamic decay thresholds leads to the best accuracy.
AB - The ongoing rapid growth of web information is a theme of research in many papers. In this paper, we introduce a new optimized method for web crawling. Using genetic programming enhances the accuracy of similarity measurement. This measurement applies to different parts of the web pages, including the title and the body. Consequently, the crawler uses such optimized similarity measurement to traverse the pages. To enhance the accuracy of crawling, we use the decay concept to limit the crawler to the effective web pages in accordance with the search criteria. The decay measurements give every page a score according to the search criteria. It decreases while traversing in more depth. This value could be revised according to the similarity of the page to the search criteria. In such case, we use three kinds of measurement to set the thresholds. The results show that using genetic programming along with the dynamic decay thresholds leads to the best accuracy.
KW - Open access version available
KW - Decay Concept
KW - Focused Web Crawler
KW - Genetic Programming
KW - Similarity Space Model
M3 - Article
SN - 2230-9608
VL - 1
SP - 1
EP - 12
JO - International Journal of Data Mining & Knowledge Management Process
JF - International Journal of Data Mining & Knowledge Management Process
IS - 1
ER -