
Scoring query and document vectors with cosine similarity

Previously, we demonstrated how to measure the similarity of two vectors by calculating the cosine of the angle between them. We created vectors (lists of numbers) in which each number represents the strength of some feature, for example features describing different food items. We then calculated the cosine of the angle between each pair of vectors to determine how similar they were: the smaller the angle, the closer the cosine is to 1 and the more similar the items. In this section, we will expand on that technique and discuss how text queries and documents can be mapped into vectors for ranking purposes. We'll also look at some popular text-based feature-weighting techniques and how they can be integrated to create an improved relevance ranking formula.
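As a refresher, the cosine of the angle between vectors a and b is their dot product divided by the product of their lengths. The sketch below assumes three hypothetical food features (sweetness, sourness, fizziness) and made-up feature strengths chosen only for illustration; they are not values from the earlier example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return cos(theta) = (a . b) / (|a| * |b|) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction, so treat it as "no similarity".
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical feature order: [sweetness, sourness, fizziness]
apple_juice = [0.9, 0.5, 0.0]
lemonade = [0.7, 0.9, 0.0]
soda = [0.9, 0.1, 1.0]

print(round(cosine_similarity(apple_juice, lemonade), 3))
print(round(cosine_similarity(apple_juice, soda), 3))
```

With these made-up numbers, apple juice scores higher against lemonade (about 0.92) than against soda (about 0.62), because the soda vector points strongly along the fizziness dimension that apple juice lacks.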

Mapping text to vectors

In a typical search problem, we start with a collection of documents and then try to rank them based on how well they match a user's query. In this section, we'll walk through the process of mapping the text of queries and documents into vectors. In the last chapter, we used the example of a search for food and beverage items, like apple juice, so let's reuse that example here.
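One simple way to map text into vectors is a bag-of-words representation: each distinct term becomes one dimension, and a text's value on that dimension is how many times the term occurs in it. The sketch below assumes naive lowercase, whitespace-only tokenization and raw term counts, with no feature weighting yet; the helper names (`tokenize`, `to_vector`, `rank`) and the four sample documents are hypothetical.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase and split on whitespace only (no stemming or stop words).
    return text.lower().split()


def to_vector(text: str, vocabulary: list[str]) -> list[float]:
    # One dimension per vocabulary term; the value is the raw term count.
    counts = Counter(tokenize(text))
    return [float(counts[term]) for term in vocabulary]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 0.0 if norm_a == 0 or norm_b == 0 else dot / (norm_a * norm_b)


def rank(query: str, documents: list[str]) -> list[tuple[float, str]]:
    # Build a shared vocabulary so the query and every document use the same dimensions.
    vocabulary = sorted({t for text in documents + [query] for t in tokenize(text)})
    query_vector = to_vector(query, vocabulary)
    scored = [(cosine_similarity(query_vector, to_vector(d, vocabulary)), d) for d in documents]
    # Highest score first; Python's sort is stable, so ties keep their input order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


docs = ["apple juice", "apple pie", "orange juice", "sparkling water"]
for score, doc in rank("apple juice", docs):
    print(f"{score:.2f}  {doc}")
```

Here the query "apple juice" matches its identical document with a score of 1.00, each document sharing one of its two terms scores 0.50, and "sparkling water" scores 0.00. Raw counts treat every term as equally important, which is the gap that feature-weighting techniques aim to close.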

How search engines and directories work

If you have experience in searching, you may recall searches that succeeded quickly and returned high-quality results that were just what you wanted. You may also recall searches that dragged on through many attempts and still did not turn up results that were useful, specific, or current enough. If you are new to searching, it may seem premature to ask about your experiences, but you are probably already aware of the scale of the Web and some of the challenges in finding what you want.
Spend a minute or two reflecting on or writing about how the "perfect" search engine or directory would help you in your initial search effort and further help you if your first search was unsuccessful. Some characteristics of this ideal search engine might be:

  1. Giving you a "natural language" method of describing exactly what you want to find
  2. Offering easy-to-select menus or check boxes that add advanced search syntax to narrow or expand your search keywords
  3. Refining your search by performing a second search through the initial results
  4. Pointing to a particular result and indicating that you want more like that result

As you perform the searching exercises in this course, and in your own explorations, refer back to (or recall) your list as you look around each search site's main page (you may have to go to its Advanced Search page) to see whether any of your ideal search engine characteristics are present. You will likely find at least one search site in each category that offers some of what you want.