<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Latent Semantic Indexing</title>
	<atom:link href="http://www.kryogenix.org/days/2003/04/28/semantic/feed" rel="self" type="application/rss+xml" />
	<link>http://www.kryogenix.org/days/2003/04/28/semantic</link>
	<description>scratched tallies on the prison wall</description>
	<pubDate>Wed, 03 Dec 2008 23:58:36 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.5</generator>
		<item>
		<title>By: Mark Guckeyson</title>
		<link>http://www.kryogenix.org/days/2003/04/28/semantic#comment-2454</link>
		<dc:creator>Mark Guckeyson</dc:creator>
		<pubDate>Thu, 01 Jan 1970 01:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.kryogenix.org/adpb/2003/04/28/semantic/#comment-2454</guid>
		<description>&lt;p&gt;Is there any chance I could see the code behind your implementation, Todd? I have something up and running, but every search I run returns all the docs, each with a relevance of 1; clearly not the desired behavior. It also returns nothing for words that are in the actual $self-&gt;{ word_list }, which seems very wrong as well.&lt;/p&gt;&lt;p&gt;The perl.com article has a few issues in it, btw&#8230; most notably the line in the final script that calls a method not available in the class: &lt;code&gt;$engine-&#62;set_threshold( 0.8 );&lt;/code&gt;&lt;/p&gt;&lt;p&gt;It&#8217;s a fascinating concept and I&#8217;m looking forward to seeing it return &#8220;real&#8221; results, but my non-math-genius head is having some trouble wrapping itself around the basic concepts.&lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>Is there any chance I could see the code behind your implementation, Todd? I have something up and running, but every search I run returns all the docs, each with a relevance of 1; clearly not the desired behavior. It also returns nothing for words that are in the actual $self->{ word_list }, which seems very wrong as well.</p>
<p>The perl.com article has a few issues in it, btw&#8230; most notably the line in the final script that calls a method not available in the class: <code>$engine-&gt;set_threshold( 0.8 );</code></p>
<p>It&#8217;s a fascinating concept and I&#8217;m looking forward to seeing it return &#8220;real&#8221; results, but my non-math-genius head is having some trouble wrapping itself around the basic concepts.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Maciej Ceglowski</title>
		<link>http://www.kryogenix.org/days/2003/04/28/semantic#comment-2455</link>
		<dc:creator>Maciej Ceglowski</dc:creator>
		<pubDate>Thu, 01 Jan 1970 01:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.kryogenix.org/adpb/2003/04/28/semantic/#comment-2455</guid>
		<description>&lt;p&gt;From what I can tell, Autonomy uses Bayesian networks, which are a rather different model from the vector-space approach underlying &lt;span class="caps"&gt;LSI&lt;/span&gt;.&lt;/p&gt;&lt;p&gt;Colin is right to say that the key step in &lt;span class="caps"&gt;LSI&lt;/span&gt; is the dimensionality reduction, which squishes things down and creates the expanded-recall magic.&lt;/p&gt;&lt;p&gt;You might want to take a look at Search::ContextGraph for a different approach that gives similar behavior to &lt;span class="caps"&gt;LSI&lt;/span&gt;, without all the overhead of the vector stuff. And send in patches :-)&lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>From what I can tell, Autonomy uses Bayesian networks, which are a rather different model from the vector-space approach underlying <span class="caps">LSI</span>.</p>
<p>Colin is right to say that the key step in <span class="caps">LSI</span> is the dimensionality reduction, which squishes things down and creates the expanded-recall magic.</p>
<p>You might want to take a look at Search::ContextGraph for a different approach that gives similar behavior to <span class="caps">LSI</span>, without all the overhead of the vector stuff. And send in patches :-)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Todd Larason</title>
		<link>http://www.kryogenix.org/days/2003/04/28/semantic#comment-2456</link>
		<dc:creator>Todd Larason</dc:creator>
		<pubDate>Thu, 01 Jan 1970 01:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.kryogenix.org/adpb/2003/04/28/semantic/#comment-2456</guid>
		<description>&lt;p&gt;I&#8217;m using a search engine trivially built on top of VectorSpace.pm from the perl.com story (with one change[1] to avoid a warning; I&#8217;m still not sure this is the right fix).&lt;/p&gt;&lt;p&gt;I&#8217;ve been pretty happy with it after not quite two months of use.&lt;/p&gt;&lt;p&gt;[1]&lt;/p&gt;&lt;pre&gt;&lt;code&gt;-               my $offset = $self-&gt;{'word_index'}-&gt;{$w};
-               index( $vector, $offset ) .= $value;
+               if (defined($w) &amp;amp;&amp;amp; defined($self-&gt;{word_index}-&gt;{$w})) {
+                   my $offset = $self-&gt;{'word_index'}-&gt;{$w};
+                   index( $vector, $offset ) .= $value;
+               }
&lt;/code&gt;&lt;/pre&gt;</description>
		<content:encoded><![CDATA[<p>I&#8217;m using a search engine trivially built on top of VectorSpace.pm from the perl.com story (with one change[1] to avoid a warning; I&#8217;m still not sure this is the right fix).</p>
<p>I&#8217;ve been pretty happy with it after not quite two months of use.</p>
<p>[1]</p>
<pre><code>-               my $offset = $self->{'word_index'}->{$w};
-               index( $vector, $offset ) .= $value;
+               if (defined($w) &amp;&amp; defined($self->{word_index}->{$w})) {
+                   my $offset = $self->{'word_index'}->{$w};
+                   index( $vector, $offset ) .= $value;
+               }
</code></pre>
]]></content:encoded>
	</item>
	<item>
		<title>By: colin_zr</title>
		<link>http://www.kryogenix.org/days/2003/04/28/semantic#comment-2457</link>
		<dc:creator>colin_zr</dc:creator>
		<pubDate>Thu, 01 Jan 1970 01:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.kryogenix.org/adpb/2003/04/28/semantic/#comment-2457</guid>
		<description>&lt;p&gt;I&#8217;m not an expert at all, so I may be confusing matters here, but I think your description misses the most important aspect of &lt;span class="caps"&gt;LSI&lt;/span&gt;, which is that the &lt;span class="caps"&gt;SVD&lt;/span&gt; stage finds similarities between terms, allowing searches to return results that do not contain the terms in the query. Your description sounds like a standard vector-space system without &lt;span class="caps"&gt;LSI&lt;/span&gt;.&lt;/p&gt;&lt;p&gt;As far as Autonomy is concerned, I&#8217;ve always had the impression that the important thing about their systems is the user interface and the degree of integration into whatever else it is that you&#8217;re doing, rather than the particular algorithms they use. (I also somewhat distrust them generally, though you probably shouldn&#8217;t base your buying decisions on the hunches of random readers of your site. :) )&lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>I&#8217;m not an expert at all, so I may be confusing matters here, but I think your description misses the most important aspect of <span class="caps">LSI</span>, which is that the <span class="caps">SVD</span> stage finds similarities between terms, allowing searches to return results that do not contain the terms in the query. Your description sounds like a standard vector-space system without <span class="caps">LSI</span>.</p>
<p>As far as Autonomy is concerned, I&#8217;ve always had the impression that the important thing about their systems is the user interface and the degree of integration into whatever else it is that you&#8217;re doing, rather than the particular algorithms they use. (I also somewhat distrust them generally, though you probably shouldn&#8217;t base your buying decisions on the hunches of random readers of your site. :) )</p>
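<p>A minimal sketch of the point above, in Python with only the standard library; it is not code from this thread or from the perl.com article. Everything in it is an illustrative assumption: the four-document toy corpus is hypothetical, power iteration with deflation on A A^T stands in for a proper SVD routine, and documents are scored by cosine similarity after projecting both the query and each document onto the k leading left singular vectors (comparing U_k^T q with U_k^T d), which is one common LSI scoring variant rather than the only one.</p>

```python
import math

# Hypothetical toy corpus: two "vehicle" documents and two "garden" documents.
DOCS = [
    "car engine wheel",
    "automobile engine wheel",
    "flower petal",
    "flower garden",
]


def term_doc_matrix(docs):
    """Raw-count term-by-document matrix A (one row per term, one column per doc)."""
    terms = sorted({w for d in docs for w in d.split()})
    index = {t: i for i, t in enumerate(terms)}
    a = [[0.0] * len(docs) for _ in terms]
    for j, d in enumerate(docs):
        for w in d.split():
            a[index[w]][j] += 1.0
    return terms, a


def dot(x, y):
    return sum(p * q for p, q in zip(x, y))


def normalize(x):
    n = math.sqrt(dot(x, x))
    return [v / n for v in x] if n else x


def top_left_singular_vectors(a, k, iters=200):
    """Approximate the k leading left singular vectors of A (the columns of U_k)
    by power iteration with deflation on the symmetric term-term matrix A A^T."""
    m = len(a)
    aat = [[dot(a[i], a[j]) for j in range(m)] for i in range(m)]
    vectors = []
    for _ in range(k):
        # Fixed all-ones start vector: adequate for this toy data, but a random
        # start is safer in general (it could be orthogonal to an eigenvector).
        v = normalize([1.0] * m)
        for _ in range(iters):
            v = normalize([dot(row, v) for row in aat])
        # Rayleigh quotient: the eigenvalue of A A^T, i.e. sigma squared.
        lam = dot(v, [dot(row, v) for row in aat])
        vectors.append(v)
        # Deflate so the next pass converges to the next-largest eigenvector.
        aat = [[aat[i][j] - lam * v[i] * v[j] for j in range(m)] for i in range(m)]
    return vectors


def project(u_k, x):
    """Coordinates of a term-space vector x in the k-dimensional LSI space: U_k^T x."""
    return [dot(u, x) for u in u_k]


def cosine(x, y):
    nx, ny = math.sqrt(dot(x, x)), math.sqrt(dot(y, y))
    return dot(x, y) / (nx * ny) if nx and ny else 0.0


if __name__ == "__main__":
    terms, a = term_doc_matrix(DOCS)
    u_k = top_left_singular_vectors(a, k=2)
    query = [1.0 if t == "car" else 0.0 for t in terms]
    columns = [[row[j] for row in a] for j in range(len(DOCS))]
    for doc, col in zip(DOCS, columns):
        raw = cosine(query, col)
        lsi = cosine(project(u_k, query), project(u_k, col))
        print(f"{doc!r:28} raw={raw:.3f} lsi={lsi:.3f}")
```

<p>With k = 2, the query "car" scores exactly 0 against "automobile engine wheel" under plain cosine similarity, since the two share no terms, but close to 1 after projection, because "car" and "automobile" co-occur with the same terms (engine, wheel) and so load on the same reduced dimension; the two garden documents stay near 0. A practical system would typically weight the counts (for example with tf-idf) and call a library SVD routine instead of hand-rolled power iteration.</p>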
]]></content:encoded>
	</item>
	<item>
		<title>By: Simon Willison</title>
		<link>http://www.kryogenix.org/days/2003/04/28/semantic#comment-2459</link>
		<dc:creator>Simon Willison</dc:creator>
		<pubDate>Thu, 01 Jan 1970 01:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.kryogenix.org/adpb/2003/04/28/semantic/#comment-2459</guid>
		<description>&lt;p&gt;I find this particularly fascinating because I did a course on vectors last term at Uni and was given the impression that they were mainly used for graphics-related stuff &#8211; seeing how they could be used for search as well was really interesting.&lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>I find this particularly fascinating because I did a course on vectors last term at Uni and was given the impression that they were mainly used for graphics-related stuff &#8211; seeing how they could be used for search as well was really interesting.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
