How many scientific articles are on the Internet?

Professor Lee Giles of the College of Information Sciences and Technology at Pennsylvania State University has dedicated a significant part of his career to developing search engines for research articles, so that the academic community has convenient access to the materials.

The professor recently published a first-of-its-kind study that estimates the number of research articles available on the Internet. The work, "The Number of Scholarly Documents on the Public Web", was published in the May issue of the journal PLoS ONE and cited in Nature.

The study considered only English-language documents and relied on the overlap in coverage between the two largest specialized search engines: Google Scholar and Microsoft Academic Search. Scholarly documents here means journal articles, conference papers, theses and dissertations, books, technical reports, and working papers (preliminary versions of scientific articles).

Statistical methods showed that at least 114 million English-language scholarly documents are accessible on the Internet, of which Google Scholar has access to about 100 million. At least 27 million documents (24%) are freely available.



The authors adapted the capture-recapture method, which is commonly used in ecology to estimate the size of animal populations. There it involves catching a certain number of animals, marking them, and releasing them into the wild. A second sample is then trapped in the same area. Scientists measure the proportion of marked animals in the second sample and make a rough estimate of the total population size with a simple formula.
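The article does not say which formula is meant. A common simple choice in ecology is the Lincoln-Petersen estimator, N ≈ n1 × n2 / m, where n1 is the number of animals marked in the first sample, n2 is the size of the second sample, and m is the number of marked animals found in it. The sketch below assumes that estimator; the function name and the numbers are purely illustrative and are not taken from the study. For search engines, the two samples would be the sets of documents each engine covers and m would be their overlap.

```python
def lincoln_petersen(marked_first, caught_second, recaptured):
    """Estimate total population size from two samples.

    marked_first  -- animals caught, marked and released in the first sample
    caught_second -- animals caught in the second sample
    recaptured    -- animals in the second sample that already carry a mark
    """
    if recaptured <= 0:
        raise ValueError("at least one recaptured animal is needed")
    if recaptured > min(marked_first, caught_second):
        raise ValueError("recaptured cannot exceed either sample size")
    # The marked share of the second sample (recaptured / caught_second)
    # is assumed to equal the marked share of the whole population
    # (marked_first / N); solving for N gives the estimate.
    return marked_first * caught_second / recaptured


# Hypothetical numbers: 200 animals marked, 150 caught later, 30 of them marked.
estimate = lincoln_petersen(200, 150, 30)
print(estimate)  # 1000.0
```

The estimate is only as good as its assumptions: both samples should be drawn independently from the same closed population, which is a rough approximation for animals and for search engine indexes alike.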

Giles's research has practical significance for him as a developer. In 1997 he and his colleagues released CiteSeer, an open search engine for scientific documents, mainly from the field of computer science. The search engine used the citations and references in documents to build a citation-based ranking index. It is considered the first automatic citation indexing system and a predecessor of tools such as Google Scholar and Microsoft Academic Search.
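The general idea of a citation index can be sketched in a few lines: invert each document's reference list to find out who cites whom, then rank documents by how many others cite them. This is a toy illustration with made-up document names and hypothetical function names, not CiteSeer's actual algorithm, which the article does not describe.

```python
from collections import defaultdict


def build_citation_index(references):
    """Invert {paper: [papers it cites]} into {paper: {papers citing it}}."""
    cited_by = defaultdict(set)
    for paper, cited in references.items():
        for target in cited:
            if target != paper:  # ignore self-citations
                cited_by[target].add(paper)
    return cited_by


def rank_by_citations(references):
    """Return all papers, most cited first; ties are broken by name."""
    cited_by = build_citation_index(references)
    papers = set(references) | set(cited_by)
    return sorted(papers, key=lambda p: (-len(cited_by.get(p, ())), p))


# Hypothetical reference lists extracted from four documents.
references = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": [],
    "D": ["C", "B"],
}
print(rank_by_citations(references))  # ['C', 'B', 'A', 'D']
```

A real system would also have to extract the reference lists from document text and match differently formatted citations to the same paper, which is the hard part that this sketch leaves out.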

In 2008 he released a new version, CiteSeerX, whose coverage expanded to physics, economics, medicine and other scientific fields. Giles is trying to estimate what infrastructure is needed to index the documents in each field.



Giles highlights the fact that 24% of all documents are freely available online, as direct links to the documents via Google Scholar (in computer science the share of freely available documents is 50%). The professor also notes that open-access documents are cited more often and carry more weight.

Article based on information from habrahabr.ru
