Web robot detection in the scholarly information environment

Paul Huntington, David Nicholas, Hamid Mahmuei

Research output: Contribution to journal (Article)

19 Citations (Scopus)

Abstract

An increasing number of robots harvest information on the world wide web for a wide variety of purposes. Protocols developed at the inception of the web laid out voluntary procedures by which robot behaviour could be identified and, if necessary, excluded. Few robots now follow these protocols, and it is increasingly difficult to filter robot activity out of reports of on-site usage. This paper sets out the issues involved in identifying robots and assessing their impact on usage figures, drawing on a project that sought to establish the relative usage patterns of open access and non-open access articles in Glycobiology, an Oxford University Press journal that publishes articles in both forms within a single issue. A number of methods for identifying robots are compared; together, these methods attributed 40% of the journal's raw logs to robots.
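The voluntary procedures mentioned in the abstract include the Robots Exclusion Protocol, under which a compliant robot fetches `/robots.txt` before crawling a site. The sketch below illustrates, under stated assumptions, how two simple heuristics could be combined to estimate the robot share of a log: flagging clients that request `/robots.txt`, and flagging clients whose User-Agent string declares itself a robot. It is not the set of methods compared in the paper; the `Request` record, the `BOT_KEYWORDS` list, and the functions `robot_clients` and `robot_share` are hypothetical names chosen for illustration. Because, as the abstract notes, few robots now follow the protocol, heuristics like these will miss many robots in practice.

```python
from dataclasses import dataclass


# Hypothetical, simplified log record. Real server logs (for example in
# Apache's Combined Log Format) carry more fields and need proper parsing.
@dataclass(frozen=True)
class Request:
    ip: str
    user_agent: str
    path: str


# Illustrative keywords only; practical robot lists are far longer.
BOT_KEYWORDS = ("bot", "crawler", "spider", "slurp")


def robot_clients(requests):
    """Return the set of (ip, user_agent) clients flagged as robots."""
    flagged = set()
    for r in requests:
        client = (r.ip, r.user_agent)
        # Heuristic 1: robots honouring the exclusion protocol fetch /robots.txt.
        if r.path == "/robots.txt":
            flagged.add(client)
        # Heuristic 2: the User-Agent string self-declares a robot.
        elif any(k in r.user_agent.lower() for k in BOT_KEYWORDS):
            flagged.add(client)
    return flagged


def robot_share(requests):
    """Fraction of all requests made by clients flagged as robots."""
    if not requests:
        return 0.0
    flagged = robot_clients(requests)
    robot = sum(1 for r in requests if (r.ip, r.user_agent) in flagged)
    return robot / len(requests)


if __name__ == "__main__":
    # Hypothetical sample log: two human requests, three robot requests.
    log = [
        Request("10.0.0.1", "Mozilla/5.0", "/article/1"),
        Request("10.0.0.2", "Harvester/0.9", "/robots.txt"),
        Request("10.0.0.2", "Harvester/0.9", "/article/1"),
        Request("10.0.0.3", "Googlebot/2.1", "/article/2"),
        Request("10.0.0.1", "Mozilla/5.0", "/article/2"),
    ]
    print(f"{robot_share(log):.0%}")  # → 60%
```

Flagging at the client level rather than per request means that once a client reveals itself as a robot, all of its requests are excluded, not only the one that triggered the flag.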
Original language: English
Pages (from-to): 726-741
Number of pages: 16
Journal: Journal of Information Science
Volume: 34
Issue number: 5
Publication status: Published - 2008
