Zend_Search_Lucene + PHPMorphy: it's easy

I was reading the documentation for Zend_Search_Lucene. Everything is fine, everything is clear: take it and bolt it onto your website. But there is not a word about how to attach a stemmer or a morphological analyzer to it. As it turned out, getting it to work with, for example, PHPMorphy is very simple.
How to do it is described below.
This note will be useful mainly to developers who are facing the problem of full-text search on a website.
You will not find a manual for configuring Lucene or PHPMorphy here; there is already plenty of that information on the Internet.


So, let's start.
Before text is added to the index, it is split into tokens. The Zend_Search_Lucene_Analysis_Analyzer_* classes are responsible for how this happens. The analyzer takes text as input and produces a list of tokens as output. A token is a word that is written directly to the index, plus its position in the document. At least, that is how I understand it. In addition, the analyzer has filters that convert a word to, say, lower case, or reject words shorter than three letters.
All we need to do is write a filter that converts each word to its base form. That form is what ends up in the index. One more thing: every query against the index goes through the same tokenization and filtering. So the search runs on the base forms of the words, which is exactly what we need. The code is below:
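To make the token and filter idea concrete, here is a toy sketch in plain PHP. It is not Zend_Search_Lucene code: the function names `toy_tokenize` and `toy_apply_filters` and the array layout of a token are made up for illustration, and it assumes PHP 7.1+ with the mbstring extension. It only mimics the pipeline described above: text goes in, tokens with positions come out, and each filter either changes a token or drops it.

```php
<?php
// Toy illustration only; the real work is done by the
// Zend_Search_Lucene_Analysis_Analyzer_* classes.

/** Split UTF-8 text into tokens: a word plus its byte offsets in the text. */
function toy_tokenize(string $text): array
{
    preg_match_all('/[\p{L}\p{N}]+/u', $text, $matches, PREG_OFFSET_CAPTURE);

    $tokens = [];
    foreach ($matches[0] as [$word, $offset]) {
        $tokens[] = [
            'text'  => $word,
            'start' => $offset,
            'end'   => $offset + strlen($word),
        ];
    }
    return $tokens;
}

/** Run every token through each filter; a filter returns null to drop a token. */
function toy_apply_filters(array $tokens, array $filters): array
{
    $result = [];
    foreach ($tokens as $token) {
        foreach ($filters as $filter) {
            $token = $filter($token);
            if ($token === null) {
                continue 2; // token was dropped, go to the next one
            }
        }
        $result[] = $token;
    }
    return $result;
}

// Filter 1: convert the word to lower case.
$lowerCase = function (array $token): ?array {
    $token['text'] = mb_strtolower($token['text'], 'UTF-8');
    return $token;
};

// Filter 2: reject words shorter than three letters.
$dropShort = function (array $token): ?array {
    return mb_strlen($token['text'], 'UTF-8') < 3 ? null : $token;
};

$tokens = toy_apply_filters(
    toy_tokenize('The Cats sat on a mat'),
    [$lowerCase, $dropShort]
);
$words = array_column($tokens, 'text');
// $words is ['the', 'cats', 'sat', 'mat']
```

A base-form filter would be one more entry in that filter list: it receives a token and returns a token whose text is the dictionary form of the word.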

class My_PHPMorphy_TokenFilter extends Zend_Search_Lucene_Analysis_TokenFilter
{
    public function normalize(Zend_Search_Lucene_Analysis_Token $srcToken)
    {
        // See Zend_Search_Lucene_Analysis_TokenFilter_LowerCaseUtf8
        // and do exactly the same, but return the base form of the word
    }
}

$analyzer = new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8();

$analyzer->addFilter(new My_PHPMorphy_TokenFilter());

Zend_Search_Lucene_Analysis_Analyzer::setDefault($analyzer);
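For readers who want a starting point for the body of normalize(), here is one possible sketch of the same filter with the stub filled in. It is a guess at a reasonable implementation, not the original author's code. It assumes that Zend Framework 1 is already loadable (autoloader or include path), that `$morphy` is an already configured phpMorphy instance passed in through a constructor (the constructor is an addition that the stub does not have), that the phpMorphy dictionary is in UTF-8, and that the dictionary expects upper-case words, which is the case for the stock phpMorphy dictionaries as far as is known. Taking the first candidate base form is also just an assumption; a word can have several.

```php
<?php
// Sketch only: requires Zend Framework 1 and phpMorphy to be loaded already.

class My_PHPMorphy_TokenFilter extends Zend_Search_Lucene_Analysis_TokenFilter
{
    /** @var phpMorphy Configured elsewhere and injected (assumption). */
    private $_morphy;

    public function __construct($morphy)
    {
        $this->_morphy = $morphy;
    }

    public function normalize(Zend_Search_Lucene_Analysis_Token $srcToken)
    {
        // Assumption: the dictionary stores words in upper case.
        $word = mb_strtoupper($srcToken->getTermText(), 'UTF-8');

        // getBaseForm() returns an array of candidate base forms,
        // or false when the word cannot be resolved.
        $forms = $this->_morphy->getBaseForm($word);

        // Assumption: use the first candidate; fall back to the word itself.
        $base = (is_array($forms) && count($forms) > 0) ? reset($forms) : $word;

        // Same pattern as the LowerCaseUtf8 filter: build a new token with
        // the changed text and copy offsets and position increment over.
        $newToken = new Zend_Search_Lucene_Analysis_Token(
            mb_strtolower($base, 'UTF-8'),
            $srcToken->getStartOffset(),
            $srcToken->getEndOffset()
        );
        $newToken->setPositionIncrement($srcToken->getPositionIncrement());

        return $newToken;
    }
}
```

With this variant the filter would be registered as `$analyzer->addFilter(new My_PHPMorphy_TokenFilter($morphy));`. Because the same default analyzer handles both indexing and queries, both sides would be reduced to the same base forms.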


That's all. Index your documents and show the search results to the user as described in the Zend Framework manual.
Article based on information from habrahabr.ru
