Study of Algorithms and Techniques of Web Search Engines Indexing
DOI: https://doi.org/10.32628/IJSRST25126323
Abstract
Compressing the huge index of a Web Search Engine (WSE) entails better utilization of memory hierarchies and thus a lower query processing time. In recent years, several works have addressed the problem of index compression. The majority of them focused on devising effective and efficient methods to encode the document identifiers (DocIDs) contained in the posting lists of Inverted File (IF) indexes [1-5]. Since posting lists are ordered sequences of integer DocID values and are usually accessed by scanning them from the beginning, they are stored as sequences of d-gaps, i.e. differences between successive DocID values.
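To make the d-gap idea concrete, here is a minimal Python sketch. It converts a sorted posting list into d-gaps and back, then packs the gaps with variable-byte coding. Variable-byte coding is only one common integer code, chosen here for illustration; the abstract does not say which encoder is used. The function names (`to_dgaps`, `from_dgaps`, `vbyte_encode`, `vbyte_decode`) and the sample DocIDs are hypothetical.

```python
from itertools import accumulate


def to_dgaps(doc_ids):
    """Turn a strictly increasing list of DocIDs into d-gaps.

    The first gap is taken relative to 0, so it equals the first DocID.
    """
    if any(b <= a for a, b in zip(doc_ids, doc_ids[1:])):
        raise ValueError("DocIDs must be strictly increasing")
    return [cur - prev for prev, cur in zip([0] + doc_ids[:-1], doc_ids)]


def from_dgaps(gaps):
    """Rebuild the DocIDs by taking a running sum of the gaps."""
    return list(accumulate(gaps))


def vbyte_encode(numbers):
    """Variable-byte code (illustrative choice): 7 payload bits per byte,
    low-order bits first, high bit set on every byte except the last."""
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append((n & 0x7F) | 0x80)  # more bytes follow
            n >>= 7
        out.append(n)  # last byte of this number, high bit clear
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n, shift = [], 0, 0
    for byte in data:
        n |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            numbers.append(n)
            n, shift = 0, 0
    return numbers


# Hypothetical posting list: large DocIDs that lie close together.
doc_ids = [100000, 100003, 100010, 100011]
gaps = to_dgaps(doc_ids)
encoded = vbyte_encode(gaps)

# Sequential decoding recovers the original list.
assert from_dgaps(vbyte_decode(encoded)) == doc_ids

print(gaps)
print(len(vbyte_encode(doc_ids)), "bytes raw vs", len(encoded), "bytes as d-gaps")
```

Because the gaps are much smaller than the DocIDs themselves, most of them fit in a single byte. A DocID in the middle of the list can only be recovered by summing the gaps from the start, which matches the sequential access pattern described above.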
References
L. Adamic, R. Lukose, A. Puniyani, and B. Huberman. Search in Power-Law Networks. Available at http://www.parc.xerox.com/istl/groups/iea/papers/plsearch/, 2001. DOI: https://doi.org/10.1103/PhysRevE.64.046135
Charu C. Aggarwal, Fatima Al-Garawi, and Philip S. Yu. Intelligent Crawling on the World Wide Web with Arbitrary Predicates. In Proceedings of the World Wide Web 2001 (WWW10), pages 96–105, 2001. DOI: https://doi.org/10.1145/371920.371955
Virgílio Almeida, Azer Bestavros, Mark Crovella, and Adriana de Oliveira. Characterizing reference locality in the WWW. In Proceedings of the IEEE Conference on Parallel and Distributed Information Systems (PDIS), Miami Beach, FL, 1996.
Brian Amento, Loren G. Terveen, and William C. Hill. Does “Authority” mean quality? Predicting expert quality ratings of web documents. Research and Development in Information Retrieval, pages 296–303, 2000. DOI: https://doi.org/10.1145/345508.345603
C.J. Van Rijsbergen. Information Retrieval. Butterworths, 1979. Available at http://www.dcs.gla.ac.uk/Keith/Preface.html.
Freeweb web site. http://freenet.sourceforge.net.
H. Williams and J. Zobel. Compressing integers for fast file access. Computer Journal, 42(3):193–201, 1999. DOI: https://doi.org/10.1093/comjnl/42.3.193
Ian H. Witten, Alistair Moffat, and Timothy C. Bell. Managing Gigabytes – Compressing and Indexing Documents and Images. Morgan Kaufmann Publishing, San Francisco, second edition, 1999.
Y. Xie and D. O’Hallaron. Locality in search engine queries and its implications for caching. In Proceedings of IEEE INFOCOM 2002, The 21st Annual Joint Conference of the IEEE Computer and Communications Societies, 2002.
Xu, Jinxi, and W.B. Croft. Effective Retrieval with Distributed Collections. In Proceedings of SIGIR98 conference,
C. Hoelscher. How internet experts search for information on the web. Paper presented at the World Conference of the World Wide Web, Internet, and Intranet, Orlando, FL, 1998.
License
Copyright (c) 2025 International Journal of Scientific Research in Science and Technology

This work is licensed under a Creative Commons Attribution 4.0 International License.
https://creativecommons.org/licenses/by/4.0