![]() ![]() How do we quantify the similarity between two documents in this vector space? A first attempt might consider the magnitude of the vector difference between two document vectors. This representation loses the relative ordering of the terms in each document recall our example from Section 6.2 (page ), where we pointed out that the documents Mary is quicker than John and John is quicker than Mary are identical in such a bag of words representation. ![]() The set of documents in a collection then may be viewed as a set of vectors in a vector space, in which there is one axis for each term. Unless otherwise specified, the reader may assume that the components are computed using the tf-idf weighting scheme, although the particular weighting scheme is immaterial to the discussion that follows. We denote by the vector derived from document, with one component in the vector for each dictionary term. ![]() Next: Queries as vectors Up: The vector space model Previous: The vector space model Contents Index ![]()
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |