Latent Semantic Indexing

Latent Semantic Indexing is a newish technique for indexing documents. Essentially, you get a list of all the important words in all your documents, then build an n-dimensional space (where n is the number of words in the list), and each document takes a place in that space according to whether or not it contains each of the words. You can then compare two documents for similarity by looking at each document’s vector in the space (the line between the origin and the document’s point) and seeing how small the angle between the two vectors is: the closer the cosine of that angle is to 1, the more similar the documents.
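To make that a bit more concrete, here is a rough sketch in Python of the comparison described above. It is only an illustration under some assumptions of mine: the function names (build_vocabulary, to_vector, cosine_similarity) and the sample documents are made up, every word is treated as "important" (a real system would presumably filter out common words), and it leaves out the singular value decomposition step that full LSI implementations typically apply to reduce the number of dimensions.

```python
import math


def build_vocabulary(documents):
    """Collect every distinct word across all documents, in sorted order.

    Assumption: every word counts as "important"; no stop-word filtering.
    """
    return sorted({word for doc in documents for word in doc.lower().split()})


def to_vector(document, vocabulary):
    """One dimension per vocabulary word: 1 if the document contains it, else 0."""
    words = set(document.lower().split())
    return [1 if term in words else 0 for term in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample documents, purely for illustration.
docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stock prices fell sharply today",
]

vocab = build_vocabulary(docs)
vectors = [to_vector(d, vocab) for d in docs]

cats = cosine_similarity(vectors[0], vectors[1])       # documents sharing words
unrelated = cosine_similarity(vectors[0], vectors[2])  # documents sharing none
print(cats, unrelated)
```

The two cat documents share three of their five distinct words, so they score well above the unrelated pair, which shares none and so scores zero. Plain lists keep the sketch dependency-free; for a real document collection you would probably want a sparse matrix library instead.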

This is basically how Autonomy works. I’m interested in this because, if it can be implemented, I don’t have to buy Autonomy :)
