The rapid, ongoing growth of web information is a recurring research theme. In this paper, we introduce a new optimized method for web crawling. Genetic programming is used to enhance the accuracy of similarity measurement, which is applied to different parts of a web page, including the title and the body. The crawler then uses this optimized similarity measurement to traverse pages. To further enhance crawling accuracy, we use the decay concept to limit the crawler to web pages that are effective with respect to the search criteria. The decay measurement gives every page a score according to the search criteria; the score decreases as the crawler traverses to greater depth, and it can be revised according to the similarity of the page to the search criteria. We use three kinds of measurement to set the thresholds. The results show that using genetic programming together with dynamic decay thresholds leads to the best accuracy.
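The decay idea can be illustrated with a minimal sketch. It is not the paper's implementation; everything here is an assumption made for illustration: the in-memory page graph `DEMO_PAGES`, the Jaccard token overlap used as a similarity measure, the fixed title/body weights (standing in for whatever a genetic-programming search might evolve), and the `decay`, `threshold`, and `relevant` parameters. The threshold is static here, since the abstract does not specify how the dynamic thresholds are computed.

```python
from collections import deque


def jaccard(text_a, text_b):
    """Token-set overlap between two strings (an assumed similarity measure)."""
    set_a, set_b = set(text_a.lower().split()), set(text_b.lower().split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def similarity(page, query, w_title=0.6, w_body=0.4):
    """Weighted title/body similarity; the weights are hypothetical placeholders."""
    return (w_title * jaccard(page["title"], query)
            + w_body * jaccard(page["body"], query))


def crawl(pages, start, query, decay=0.5, threshold=0.2, relevant=0.3):
    """Breadth-first crawl in which each page carries a decaying score.

    The score shrinks by `decay` at every hop, is reset to 1.0 when a page
    is similar enough to the query, and links are no longer followed once
    the score passed to children would fall below `threshold`.
    """
    visited = []
    seen = {start}
    queue = deque([(start, 1.0)])
    while queue:
        url, score = queue.popleft()
        page = pages[url]
        visited.append(url)
        # Revise the score upward when the page matches the search criteria.
        if similarity(page, query) >= relevant:
            score = 1.0
        child_score = score * decay
        if child_score < threshold:
            continue  # decayed too far: do not follow this page's links
        for link in page["links"]:
            if link in pages and link not in seen:
                seen.add(link)
                queue.append((link, child_score))
    return visited


# Hypothetical toy graph: "a" and "c" are on topic, "b" -> "d" -> "f" drifts off topic.
DEMO_PAGES = {
    "a": {"title": "web crawling", "body": "genetic programming for web crawling", "links": ["b", "c"]},
    "b": {"title": "cooking", "body": "pasta recipes", "links": ["d"]},
    "c": {"title": "web crawling tips", "body": "crawling the web", "links": ["e"]},
    "d": {"title": "more cooking", "body": "soup", "links": ["f"]},
    "e": {"title": "crawler design", "body": "web crawling notes", "links": []},
    "f": {"title": "desserts", "body": "cake", "links": []},
}

if __name__ == "__main__":
    print(crawl(DEMO_PAGES, "a", "web crawling"))
```

In this toy graph the off-topic branch is abandoned after two hops because its score is never revised upward, while the on-topic branch keeps its full score. A dynamic variant would adjust `threshold` during the crawl rather than fixing it up front.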
|Number of pages||12|
|Journal||International Journal of Data Mining & Knowledge Management Process|
|Publication status||Published - Jan 2011|