SEOMining



Using Search Engines

Visit a few of the following websites and try different search queries on each to compare the results.
  1. Home Depot
  2. Amazon.com
  3. Relationaldbdesign
  4. PBS

As you explore these sites and put their search engines to the test, consider the following questions.
  1. How do the search engines differ in function and feel?
  2. What design elements make one site's search engine easier to use than that of another site? For example, was it obvious where to enter keywords?
  3. Did you get lost? If so, at what point in the search?

Search based on a user query (sometimes called ad hoc search because the range of possible queries is huge and not prespecified) is not the only text-based task that is studied in information retrieval. Other tasks include filtering, classification, and question answering. Filtering or tracking involves detecting stories of interest based on a person's interests and providing an alert using email or some other mechanism. Classification or categorization uses a defined set of labels or classes (such as the categories listed in the Yahoo! Directory) and automatically assigns those labels to documents.
Question answering is similar to search but is aimed at more specific questions, such as "What is the height of Mt. Everest?"
The goal of question answering is to return a specific answer found in the text, rather than a list of documents.


Information Retrieval

Information retrieval researchers have focused on a few key issues that remain just as important in the era of commercial web search engines working with billions of web pages as they were when tests were done in the 1960s on document collections containing about 1.5 megabytes of text. One of these issues is relevance. Relevance is a fundamental concept in information retrieval. Loosely speaking, a relevant document contains the information that a person was looking for when she submitted a query to the search engine. Although this sounds simple, there are many factors that go into a person's decision as to whether a particular document is relevant. These factors must be taken into account when designing algorithms for comparing text and ranking documents.
Simply comparing the text of a query with the text of a document and looking for an exact match, as might be done in a database system or using the grep utility in Unix, produces very poor results in terms of relevance. One obvious reason for this is that language can be used to express the same concepts in many different ways, often with very different words. This is referred to as the vocabulary mismatch problem in information retrieval.
It is also important to distinguish between topical relevance and user relevance.
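The weakness of exact matching described above can be seen in a small sketch. The three documents, the query, and the helper names (`exact_match`, `shared_terms`) are made up for illustration and do not come from any particular search engine. The example shows a grep-like substring match finding only the document that repeats the query wording, while a story about the same topic shares no words with the query at all.

```python
import re

# Toy collection, invented for this illustration.
DOCUMENTS = [
    "A tornado touched down in Kansas on Tuesday.",
    "Forecasters warn of severe weather events this week.",
    "The city council approved a new budget.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def exact_match(query: str, docs: list[str]) -> list[str]:
    """Grep-like search: keep documents that contain the query verbatim."""
    needle = query.lower()
    return [doc for doc in docs if needle in doc.lower()]


def shared_terms(query: str, doc: str) -> set[str]:
    """Words that appear in both the query and the document."""
    return tokenize(query) & tokenize(doc)


query = "severe weather events"

# Only the document that literally contains the phrase is returned.
print(exact_match(query, DOCUMENTS))

# The tornado story is on topic, yet it has no word in common with the query.
print(shared_terms(query, DOCUMENTS[0]))
```

In this toy case even a looser word-overlap comparison would miss the tornado story, which is the vocabulary mismatch problem in miniature.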
A text document is topically relevant to a query if it is on the same topic. For example, a news story about a tornado in Kansas would be topically relevant to the query 'severe weather events'. The person who asked the question (often called the user) may not consider the story relevant, however, if she has seen that story before, or if the story is five years old, or if the story is in Chinese from a Chinese news agency. User relevance takes these additional features of the story into account.
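One way to picture the distinction is as a second filter applied after topical matching. The sketch below is only an illustration under stated assumptions: the `Story` fields, the one-year age cutoff, and the set of languages the user reads are all hypothetical choices, and real systems weigh such features rather than applying hard rules.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Story:
    title: str
    on_topic: bool          # assumed output of some topical matcher
    published: date
    language: str           # language code such as "en" or "zh"
    seen_before: bool = False


def is_user_relevant(
    story: Story,
    today: date,
    languages: frozenset[str] = frozenset({"en"}),  # hypothetical user profile
    max_age_days: int = 365,                        # hypothetical freshness cutoff
) -> bool:
    """Topical relevance plus the user-specific checks named in the text."""
    if not story.on_topic:
        return False
    if story.seen_before:
        return False
    if (today - story.published).days > max_age_days:
        return False
    return story.language in languages


today = date(2024, 6, 1)
stories = [
    Story("Tornado strikes Kansas", True, date(2024, 5, 30), "en"),
    Story("Tornado strikes Kansas (already read)", True, date(2024, 5, 30), "en", seen_before=True),
    Story("Kansas tornado, five years on", True, date(2019, 5, 1), "en"),
    Story("Kansas tornado report in Chinese", True, date(2024, 5, 30), "zh"),
]

# All four stories are topically relevant; only the first passes the user checks.
kept = [s.title for s in stories if is_user_relevant(s, today)]
print(kept)
```

Every story here is on topic for "severe weather events", but the user-level checks discard the one already read, the one that is five years old, and the one in a language the user does not read.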