Search Directories

Lesson 1

How Search Engines and Directories Work

Search engine directories operate on a fundamentally different premise from search engines that use web crawlers. Directories rely on human oversight and hierarchical categorization rather than on algorithmic web crawling and indexing. The sections below explore this approach:

Human-Curated Categorization:

At the core of search engine directories is the principle of human-mediated categorization. Websites are not automatically crawled but are instead submitted by site owners or identified by human editors. These editors review submissions and categorize them based on content and subject matter. This process aims to ensure a high level of quality and relevance because human judgment is used to screen for authority, accuracy, and value.

Hierarchical Organization:

Search engine directories employ a hierarchical structure to organize information, akin to a digital library's system of classification. Websites are arranged into categories and subcategories. This taxonomy facilitates a more intuitive search process for users who can navigate through layers of categories to find the type of sites they are interested in, from general to specific topics.
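The category tree described above can be modeled as a simple data structure. The minimal Python sketch below assumes a hypothetical `Category` class and made-up category names and URLs; it is not how any particular directory stored its data. Each node holds its subcategories and the sites an editor filed there, and browsing walks a path from general to specific.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """One node in a hypothetical directory taxonomy."""
    name: str
    subcategories: dict = field(default_factory=dict)  # name -> Category
    sites: list = field(default_factory=list)          # URLs an editor filed here

    def add_path(self, path):
        """Create (or reuse) nested subcategories along path; return the deepest one."""
        node = self
        for part in path:
            node = node.subcategories.setdefault(part, Category(part))
        return node

    def browse(self, path):
        """Walk from general to specific categories, as a user clicking through would."""
        node = self
        for part in path:
            node = node.subcategories[part]  # KeyError if the category does not exist
        return node


root = Category("Top")
root.add_path(["Science", "Astronomy", "Telescopes"]).sites.append("https://example.org/scopes")
root.add_path(["Science", "Biology"]).sites.append("https://example.org/cells")

astronomy = root.browse(["Science", "Astronomy"])
print(sorted(astronomy.subcategories))  # ['Telescopes']
print(root.browse(["Science", "Astronomy", "Telescopes"]).sites)
```

Because a site appears only where an editor filed it, a user who browses down the wrong branch will not find it; that is the trade-off of a fixed taxonomy.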

Quality Control:

Unlike algorithm-driven search engines, directories maintain quality by being selective about the websites they include. Human editors can assess the credibility and relevance of a site, excluding those of poor quality or those that do not meet the directory's guidelines. This vetting process is designed to provide a directory of websites that are trustworthy and substantive.

Search Methodology:

When a user queries a search engine directory, the system does not dynamically crawl the web to find new content. Instead, it searches its pre-defined categories to find matches within its curated list of sites. This means the results are limited to what has been reviewed and included, emphasizing quality over quantity.
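A directory search can be sketched as a filter over the curated entries. This minimal sketch assumes a hypothetical `DIRECTORY` list and `search_directory` function, and uses simple substring matching against each entry's title, description, and category path; real directories may have matched queries differently. What it shows is that no page is fetched at query time, so a query outside the curated list returns nothing.

```python
# A hypothetical curated directory: every entry was reviewed and filed by an editor.
DIRECTORY = [
    {"title": "Amateur Telescope Guide", "description": "Choosing and using telescopes",
     "category": "Science/Astronomy/Telescopes", "url": "https://example.org/scopes"},
    {"title": "Cell Biology Primer", "description": "Introduction to cells and organelles",
     "category": "Science/Biology", "url": "https://example.org/cells"},
    {"title": "Home Baking", "description": "Bread and pastry recipes",
     "category": "Home/Cooking", "url": "https://example.org/baking"},
]


def search_directory(query, entries=DIRECTORY):
    """Return entries whose title, description, or category contains every query word.

    Nothing is fetched from the web: only the curated entries are searched.
    """
    words = query.lower().split()
    results = []
    for entry in entries:
        haystack = " ".join(
            (entry["title"], entry["description"], entry["category"])
        ).lower()
        if all(word in haystack for word in words):
            results.append(entry)
    return results


print([e["url"] for e in search_directory("telescopes")])
print(search_directory("quantum computing"))  # [] because no editor has filed such a site
```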

Directory-Based Ranking:

In search engine directories, 'ranking' means something different than it does in search engines that use complex algorithms. The placement of a website within a directory is more static, based on the category it has been assigned to rather than on a continuously updated ranking score. Some directories may order sites within categories by additional criteria, such as user ratings or editorial preference, but these orderings are typically less fluid than algorithmic rankings.
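One plausible static ordering within a category is sketched below. It assumes a hypothetical `featured` flag standing in for editorial preference, with alphabetical order for everything else; actual directories used their own rules. The order comes only from editorial data, not from the user's query, so it stays the same until an editor changes it.

```python
# Hypothetical listings in one directory category. "featured" stands in for
# editorial preference; it is an assumption, not a field of any real directory.
LISTINGS = [
    {"title": "Zenith Star Charts", "featured": False},
    {"title": "Backyard Observing", "featured": False},
    {"title": "Messier Catalog Notes", "featured": True},
]


def order_category(listings):
    """Featured sites first, then alphabetical by title.

    There is no per-query score: the order depends only on editorial data.
    """
    return sorted(listings, key=lambda item: (not item["featured"], item["title"].lower()))


for item in order_category(LISTINGS):
    print(item["title"])
```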

Evolution and Integration:

It is important to note that the strict division between search engine directories and crawling search engines has blurred over time. Many traditional directories have integrated algorithmic search to broaden their coverage, while algorithmic search engines have adopted aspects of human judgment for certain functions, such as human quality raters who evaluate results and manual actions taken against spam.

Search engine directories are grounded in a theory of structured, human-mediated content curation and organization. Although their prominence has declined with the rise of powerful algorithm-based search engines, the principles of the directory approach (human curation, hierarchical classification, and an emphasis on quality) remain relevant, particularly in niche or specialized search applications where the trustworthiness and quality of content are paramount.


Traditional Web Searching

Up to this point in the course, you have not done any searching (unless you have tried a search with some of the sites you visited in the last module). Now that you have been introduced to different search services and to some challenges of searching, you can begin to practice some searches. The searching exercises in this module let you compare different categories of information retrieval services and different services in the same category. This module will also discuss in more detail the concepts and functions of directories and search engines, including their advantages and disadvantages compared to other information retrieval service categories, how they can complement each other, and why one may be more appropriate than the other in a particular search.
An understanding of how each type of search service functions will help you to create more effective search strategies.
After completing this module, you will be able to:
  1. Describe how directories are created and organized, and their advantages and limitations
  2. Describe how a search engine creates and maintains its database of sites
  3. Ask a search engine to find information with a search query
  4. Explain how a search engine's database affects your results

Search Engine Functions

Search engines fundamentally do three things:
  1. ingest content,
  2. return content matching incoming queries, and
  3. sort the returned content based upon some measure of how well it matches the query.
Relevance is the term used to describe this notion of "how well the content matches the query." In most cases the matched content consists of documents, and what is returned and ranked is the set of matching documents along with metadata describing each one.
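The three functions listed above can be illustrated with a toy engine. This is a minimal sketch under simplifying assumptions: the `TinySearchEngine` class is hypothetical, documents are split on whitespace, and the relevance score is just the total number of times the query terms occur in a document.

```python
from collections import Counter


class TinySearchEngine:
    """A toy engine showing the three core functions: ingest, match, and sort."""

    def __init__(self):
        self.docs = {}  # doc_id -> Counter of lowercase terms

    def ingest(self, doc_id, text):
        """1. Ingest content: store each document's term counts."""
        self.docs[doc_id] = Counter(text.lower().split())

    def match(self, query):
        """2. Return the ids of documents containing at least one query term."""
        terms = query.lower().split()
        return [d for d, counts in self.docs.items() if any(counts[t] for t in terms)]

    def search(self, query):
        """3. Sort matches by a crude relevance score: total occurrences of query terms."""
        terms = query.lower().split()
        scored = [(sum(self.docs[d][t] for t in terms), d) for d in self.match(query)]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))  # highest score first, ties by id
        return [(d, score) for score, d in scored]


engine = TinySearchEngine()
engine.ingest("a", "solar power and solar panels")
engine.ingest("b", "wind power")
engine.ingest("c", "baking bread")
print(engine.search("solar power"))  # [('a', 3), ('b', 1)]
```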
In most search engines, results are sorted by default on a relevance score that reflects how well each keyword in the query matches the same keyword in each document (typically taking into account how often the term appears in the document and how rare it is across the whole collection). The best matches receive the highest scores and appear at the top of the search results. The relevance calculation is highly configurable, however, and can be adjusted on a per-query basis to enable very sophisticated ranking behavior.
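To make keyword-based scoring concrete, the sketch below uses TF-IDF, a classic scheme in which a term counts more when it occurs often in a document and rarely across the collection. It is an illustration, not the exact formula of any specific engine (many engines use refinements such as BM25). The `boosts` parameter is a hypothetical stand-in for per-query relevance adjustment: it multiplies a term's contribution for that query only.

```python
import math
from collections import Counter

# Hypothetical sample documents.
DOCS = {
    "d1": "search engines rank documents",
    "d2": "relevance relevance scoring explained",
    "d3": "web directories organize sites by category",
}


def tf_idf_scores(query, docs, boosts=None):
    """Score each document as the sum over query terms of tf * idf * boost.

    tf    = how often the term occurs in the document
    idf   = log(N / df), so terms found in fewer documents weigh more
    boost = an optional per-query weight for a term (defaults to 1.0)
    """
    boosts = boosts or {}
    tokenized = {doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()}
    n = len(tokenized)
    scores = {}
    for term in query.lower().split():
        df = sum(1 for counts in tokenized.values() if counts[term] > 0)
        if df == 0:
            continue  # the term appears nowhere, so it cannot contribute
        idf = math.log(n / df)
        for doc_id, counts in tokenized.items():
            if counts[term] > 0:
                contribution = counts[term] * idf * boosts.get(term, 1.0)
                scores[doc_id] = scores.get(doc_id, 0.0) + contribution
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


print([doc_id for doc_id, _ in tf_idf_scores("search relevance", DOCS)])  # ['d2', 'd1']
print([doc_id for doc_id, _ in tf_idf_scores("search relevance", DOCS,
                                             boosts={"search": 3.0})])   # ['d1', 'd2']
```

Boosting "search" for this one query moves d1 above d2 without changing the documents or the index, which is the kind of per-query adjustment the paragraph describes.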
In this module, we will provide an overview of how relevance is calculated, how the relevance function can be easily controlled and adjusted through function queries, and how to implement popular domain-specific and user-specific relevance ranking features. We’ll start by looking at how ranking actually works.

Search Engine Optimization (SEO) is the practice of optimizing individual web pages or whole sites to make them search-engine friendly, so that they rank higher in search results. Simple SEO techniques can improve the visibility of your web pages in different search engines, especially Google, Yahoo, and Bing.

How Does a Search Engine Work?

Search engines perform several activities in order to deliver search results.
  1. Crawling: The process of fetching web pages by following links, starting from pages the search engine already knows about. This task is performed by software called a crawler or spider (Googlebot, in the case of Google).
  2. Indexing: The process of building an index of the fetched web pages and storing it in a large database from which pages can later be retrieved. Essentially, indexing identifies the words and expressions that best describe a page and assigns the page to those keywords.
  3. Processing: When a search request arrives, the search engine processes it by comparing the search string in the request with the indexed pages in the database.
  4. Calculating Relevancy: Because more than one page is likely to contain the search string, the search engine calculates how relevant each matching page in its index is to the search string.
  5. Retrieving Results: Finally, the search engine retrieves the best-matching results and displays them in the browser.
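The first two steps, crawling and indexing, followed by a simple lookup covering steps 3 to 5, are sketched below. To stay self-contained, the sketch assumes a hypothetical in-memory `WEB` dictionary instead of real HTTP fetching; a production crawler would also handle politeness, robots.txt rules, and duplicate content, all omitted here. Relevance in the hypothetical `lookup` function is simply the number of query words a page contains.

```python
from collections import defaultdict, deque

# A hypothetical in-memory "web": URL -> (page text, outgoing links).
WEB = {
    "/home": ("welcome to our garden shop", ["/tools", "/seeds"]),
    "/tools": ("garden tools and spades", ["/home"]),
    "/seeds": ("tomato seeds and flower seeds", ["/tools", "/broken-link"]),
    "/unlinked": ("nobody links to this page", []),
}


def crawl(start):
    """Step 1: breadth-first crawl, fetching each reachable page once."""
    fetched, queue, seen = {}, deque([start]), {start}
    while queue:
        url = queue.popleft()
        if url not in WEB:  # broken link: nothing to fetch
            continue
        text, links = WEB[url]
        fetched[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched


def build_index(pages):
    """Step 2: inverted index mapping each word to the set of pages containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def lookup(index, query):
    """Steps 3-5: process the query, rank pages by matched query words, return them."""
    counts = defaultdict(int)
    for word in query.lower().split():
        for url in index.get(word, ()):
            counts[url] += 1
    return sorted(counts, key=lambda url: (-counts[url], url))


pages = crawl("/home")
index = build_index(pages)
print(sorted(pages))                  # ['/home', '/seeds', '/tools']
print(lookup(index, "garden tools"))  # ['/tools', '/home']
```

Note that "/unlinked" never enters the index: a crawler can only find pages that some crawled page links to, or that are submitted to it directly (for example through a sitemap).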
Search engines such as Google often update their search algorithms several times per month, so a change in your rankings may be due to a new algorithm being rolled out. Changes to your own pages or to competing pages can also affect rankings.