SEOMining


Search Engine Basics

Information retrieval services

  1. Directory: Categories and references to Web sites compiled by human editors
  2. Search engine: An index compiled automatically by "spiders" that constantly explore the Web
  3. Metasearch engine: Submits your query to multiple search services and combines their results
  4. Subject page: A topical collection of information, references, and links to other Web sites
  5. Link page: A Web site with links to many different sites (sometimes including search engines or directories), usually arranged in categories
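To make the metasearch idea concrete, here is a minimal Python sketch. It assumes each search service can be modeled as a function that returns a ranked list of URLs. The two services (`engine_a`, `engine_b`) are hypothetical stubs with canned results, so no network access is involved. The merging rule is reciprocal rank fusion with the commonly used constant k = 60, which is one choice made for illustration; real metasearch engines may combine results differently.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# A "search service" is modeled as any function that takes a query string
# and returns a ranked list of URLs, best match first (an assumption).
SearchService = Callable[[str], List[str]]


def metasearch(query: str, services: List[SearchService], k: int = 60) -> List[str]:
    """Send `query` to every service and merge the ranked result lists.

    Uses reciprocal rank fusion: a URL earns 1 / (k + rank) from each list
    it appears in, and URLs are sorted by their total score.
    """
    scores: Dict[str, float] = defaultdict(float)
    for service in services:
        for rank, url in enumerate(service(query), start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest combined score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical stub services returning canned results.
def engine_a(query: str) -> List[str]:
    return ["https://a.example/1", "https://shared.example/x", "https://a.example/2"]


def engine_b(query: str) -> List[str]:
    return ["https://shared.example/x", "https://b.example/1"]


if __name__ == "__main__":
    for url in metasearch("information retrieval", [engine_a, engine_b]):
        print(url)
```

With these stubs, `https://shared.example/x` ranks first because it collects a score from both services, even though neither service alone ranks it above every other result.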

Automated Information Retrieval Systems

Automated information retrieval (IR) systems were originally developed to help manage the large body of scientific literature that has accumulated since the 1940s. Many university, corporate, and public libraries now use IR systems to provide access to books, journals, and other documents. Commercial IR systems offer databases containing millions of documents in diverse subject areas. Dictionary and encyclopedia databases are now widely available through various online systems.
Information retrieval has been found useful in such disparate areas as office automation and software engineering. Indeed, any discipline that relies on documents to do its work could potentially use and benefit from information retrieval.

The data structures and algorithms needed to build IR systems are beyond the scope of this website.
An information retrieval system matches user queries, which are formal statements of information needs, to documents stored in a database. A document is a data object, usually textual, though it may also contain other types of data such as photographs, graphs, and structured data (for example, JSON).
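Although building a real IR system is out of scope here, the core idea of matching a query to documents can be shown with a toy inverted index. This Python sketch is illustrative only: it assumes documents are plain strings keyed by integer IDs, uses a simple lowercase tokenizer, and treats a query as a Boolean AND of its terms. Production systems typically add ranking, stemming, and much more.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    # Lowercase the text and keep runs of letters and digits as terms.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each term to the set of document IDs that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def match(query: str, index: Dict[str, Set[int]]) -> Set[int]:
    """Return IDs of documents containing every query term (Boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    docs = {
        1: "Search engines index the Web",
        2: "Library catalogs index books and journals",
        3: "Spiders explore the Web",
    }
    index = build_index(docs)
    print(sorted(match("index web", index)))  # [1]
```

The query `index web` matches only document 1, the one document that contains both terms; documents 2 and 3 each contain just one of them.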
Often, the documents themselves are not stored directly in the information retrieval system but are represented in it by document surrogates. This page, for example, is a document and could be stored in its entirety in an information retrieval database; alternatively, one could create a surrogate for it consisting of only its title, author, and abstract. Surrogates are typically used for efficiency, because they reduce both the size of the database and the time needed to search it. For convenience, document surrogates are often themselves referred to as documents.
An information retrieval system must support certain basic operations. There must be a way to enter documents into a database, change them, and delete them, and there must also be a way to search for documents and present them to a user. Information retrieval systems vary greatly in how they accomplish these tasks; the similarities and differences among IR systems are discussed in the next section.
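The basic operations listed above can be sketched as a small in-memory store of surrogates. In this Python sketch the `Surrogate` and `SurrogateStore` names are hypothetical, each surrogate keeps only a title, author, and abstract, and search is a simple all-terms match over those fields. It is meant to illustrate the operations, not the design of any particular IR system.

```python
import re
from dataclasses import dataclass, replace
from typing import Dict, List, Set


@dataclass(frozen=True)
class Surrogate:
    """Stands in for a full document: only the title, author, and abstract."""
    title: str
    author: str
    abstract: str


def terms(text: str) -> Set[str]:
    # Lowercased alphanumeric tokens, kept as a set for subset tests.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class SurrogateStore:
    """Hypothetical store supporting enter, change, delete, search, present."""

    def __init__(self) -> None:
        self._docs: Dict[int, Surrogate] = {}
        self._next_id = 1

    def add(self, doc: Surrogate) -> int:
        """Enter a surrogate and return its new ID."""
        doc_id = self._next_id
        self._docs[doc_id] = doc
        self._next_id += 1
        return doc_id

    def change(self, doc_id: int, **fields: str) -> None:
        """Replace selected fields; raises KeyError for an unknown ID."""
        self._docs[doc_id] = replace(self._docs[doc_id], **fields)

    def delete(self, doc_id: int) -> None:
        """Remove a surrogate; raises KeyError for an unknown ID."""
        del self._docs[doc_id]

    def search(self, query: str) -> List[int]:
        """Return IDs of surrogates whose fields contain every query term."""
        wanted = terms(query)
        if not wanted:
            return []
        return sorted(
            doc_id
            for doc_id, doc in self._docs.items()
            if wanted <= terms(f"{doc.title} {doc.author} {doc.abstract}")
        )

    def present(self, doc_id: int) -> str:
        """Format a surrogate for display to the user."""
        doc = self._docs[doc_id]
        return f"{doc.title} by {doc.author}"


if __name__ == "__main__":
    store = SurrogateStore()
    a = store.add(Surrogate("Search Engine Basics", "Example Author", "How spiders index the Web"))
    b = store.add(Surrogate("Library Systems", "Another Author", "Catalogs for books and journals"))
    store.change(b, abstract="Catalogs for books, journals, and the Web")
    store.delete(a)
    for doc_id in store.search("web"):
        print(store.present(doc_id))  # Library Systems by Another Author
```

Storing only surrogates keeps each record small, which is the efficiency trade-off described above: the full text is not searchable, but the database is smaller and faster to scan.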