Impact of template removal on Web search
DOI: https://doi.org/10.5752/P.2316-9451.2012V1N1P28
Abstract
Previous work in the literature has indicated that the templates of web pages represent noisy information in web collections, and has advocated that simply removing templates improves the quality of results provided by Web search systems. In this paper, we study the impact of template removal in two distinct scenarios: large-scale web search collections, which consist of several distinct websites, and intrasite web collections, which involve searches within individual websites. Our work is the first in the literature to study the impact of template removal on search systems in large-scale Web collections. The study was carried out using an automatic template detection method that we proposed previously. As contributions, we present statistics on the application of this automatic template detection method to the well-known GOV2 reference collection, a large-scale Web collection. We also present experiments comparing the amount of template content detected by our automatic method with the amount obtained when humans select templates. Finally, we present experiments indicating that, in both scenarios studied, template removal does not improve the quality of results provided by search systems, but can serve as an effective lossy compression method by reducing the size of their indexes.
License
I (we) submit the present work, an original and unpublished manuscript, from my (our) authorship, to Abakós - Magazine of Interdisciplinary Studies on Science and Informatics, and I (we) agree that the copyright related to this work will become property of PUC Minas Publisher. No partial or full reproduction is allowed, by any means (printed or electronic), dissociated from Abakós. Any reproduction requires prior written authorization granted by the Editor.
I (we) declare that there is no conflict of interest of any kind among the subject matter, author(s), organization(s), institution(s) and person(s).
I (we) recognize that Abakós is licensed under CREATIVE COMMONS:
Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License (CC BY-NC-ND 3.0).