Some Thoughts on Similarity Measures

Robert Korfhage
Dept. of Information Science
University of Pittsburgh
Pittsburgh, PA 15260
korfhage@lis.pitt.edu

Here are three thoughts on similarity measures for the vector model of retrieval. They are meant primarily to open discussion, and perhaps to stimulate some thinking and research.

Intrinsic versus extrinsic measures. The most commonly used similarity measures are probably the cosine measure and a variety of measures based on a distance calculation. Putting aside questions of orthogonality (which may be important), it seems that these two kinds of measures have very different characteristics. The cosine measure is an "extrinsic" measure: it measures the similarity of documents by the angle between their vectors. But the vectors are drawn from the origin, <0,0,...,0>, of the document space. That is, the measure makes use of an external reference point, and when that point is changed, the measure of document similarity changes, even though the documents in question have not. A distance-based measure, however, refers only to the two document points in the space. It is thus "intrinsic," with a value that is independent of where the origin or any other point in the space is located.

It is clear that the subspace consisting of documents "close" to a given document is very different for the two kinds of measures. Whether this is significant, and if so, which type of measure is appropriate in a given situation, is not known.

Mixed measures. Because the cosine measure is a monotonic function of the absolute value of the angle (at least within the range of angles used in information retrieval), there is really little reason to use the cosine rather than the angular measure itself. As a measure, the angle is a pseudometric, that is, it satisfies all of the metric axioms except that the angle between two document vectors can be zero even if the documents are not identical. It is easy to show that the sum of a metric and a pseudometric is a metric. Hence we could consider (distance + angle)(D1, D2) as a measure of similarity between D1 and D2. This is interesting: it is a distance measure, hence is in some sense "intrinsic" as defined above. Yet a portion of it is also "extrinsic," depending on the point from which the angle is measured. It appears that for documents far from the origin the distance component of this measure is dominant, while for documents close to the origin the angular component is the main similarity measure. How does this relate to retrieval?

Angles and reference points. Traditionally the cosine measure is developed with respect to the origin of the document space. Now suppose that we have a query, Q, and an additional reference point, R, perhaps a known document. Consider a document D. We could measure the angles between the document's vector and the Q vector, and between [...] wish), then figure out some way to combine these two measures. Alternatively, we could measure the angle between the Q-D vector and the Q-R vector, and use that as a partial basis for retrieval. It is only partial because this says nothing about the similarity of the document to the query. For that we could use a distance measure, or perhaps the usual cosine measure. This certainly changes the basis for retrieval decisions, and may possibly improve retrieval.
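The three thoughts above can be tried out numerically. What follows is only a minimal sketch in Python, not anything given in the article: the function names (euclidean, angle, mixed), the sample vectors, and the choice to add a Euclidean distance directly to an angle in radians are all assumptions made for illustration. It shows that the angle between two documents changes when the reference point moves while the distance does not, that (distance + angle) is dominated by one component or the other depending on how far the documents lie from the origin, and how the angle between the Q-D and Q-R vectors might be computed.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points: depends only on a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def angle(a, b, ref=None):
    """Angle in radians between a and b as seen from ref (default: the origin)."""
    if ref is None:
        ref = [0.0] * len(a)
    u = [x - r for x, r in zip(a, ref)]
    v = [y - r for y, r in zip(b, ref)]
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u) * sum(y * y for y in v))
    if norm == 0.0:
        raise ValueError("angle is undefined when a point coincides with ref")
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def mixed(a, b, ref=None):
    """The (distance + angle) measure; adding radians to a distance is an assumption."""
    return euclidean(a, b) + angle(a, b, ref)

# Hypothetical two-term document vectors.
d1, d2 = [1.0, 2.0], [2.0, 1.0]

# Extrinsic: the angle depends on the reference point it is measured from.
print("angle from origin:   ", angle(d1, d2))
print("angle from (-3, -3): ", angle(d1, d2, ref=[-3.0, -3.0]))
# Intrinsic: the distance never refers to a reference point at all.
print("distance:            ", euclidean(d1, d2))

# Pseudometric: distinct documents on the same ray have an angle of (about) zero.
print("angle, same direction:", angle([1.0, 1.0], [2.0, 2.0]))

# Far from the origin the distance term dominates; close to it the angle does.
for scale in (100.0, 0.01):
    a = [scale * x for x in d1]
    b = [scale * x for x in d2]
    print(scale, "distance:", euclidean(a, b), "angle:", angle(a, b))

# Angle between the Q-D and Q-R vectors: the angle between D and R seen from Q.
q, r, d = [0.0, 1.0], [3.0, 1.0], [2.0, 3.0]
print("angle at Q between D and R:", angle(d, r, ref=q))
```

Measuring the angle from an arbitrary point `ref` is the same as moving the origin to that point, which is why a single function covers both the traditional cosine setting and the query-centered variant.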

Some thoughts on similarity measures

Korfhage, Robert
ACM SIGIR Forum, Volume 29 (1)
Association for Computing Machinery, Mar 21, 1995
