Can't ignore punctuation in titles - Solr 5 and Drupal


Question

The other week I posted a question about removing punctuation from Solr search in Drupal. That was with Solr 4. Since then, my development work has moved from Solr 4 to Solr 5, and I am having the same problem again, but the fix from "Can't remove punctuation in Solr" no longer works. This causes problems when sorting by title, since many content titles are wrapped in quotes.

<field name="label" type="text" indexed="true" stored="true" termVectors="true" omitNorms="true"/>
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            />
    <filter class="solr.WordDelimiterFilterFactory"
            protected="protwords.txt"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="1"
            catenateNumbers="1"
            catenateAll="0"
            splitOnCaseChange="0"
            preserveOriginal="1"/>
    <filter class="solr.LengthFilterFactory" min="2" max="100" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

I've tried adding the following rules, but apostrophes and quotation marks stubbornly remain and interfere with sorting by title: anything that starts with a quote is listed first.

    <charFilter class="solr.HTMLStripCharFilterFactory" />
    <filter class="solr.ApostropheFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
        pattern="^\p{Punct}*(.*?)\p{Punct}*$"
        replacement="$1"/>
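
As a rough check of what that pattern does on its own, outside Solr, the sketch below applies an approximation of it in PHP. Two assumptions: PCRE has no `\p{Punct}`, so `[[:punct:]]` stands in for it, and the helper name `strip_outer_punct` is made up for illustration. A Solr token filter sees each token separately, so with a whitespace tokenizer the pattern would run once per word rather than once per title.

```php
<?php
// Hypothetical helper: approximates the PatternReplaceFilterFactory pattern.
// [[:punct:]] stands in for Java's \p{Punct}, which PCRE does not support.
function strip_outer_punct($token) {
  return preg_replace('/^[[:punct:]]*(.*?)[[:punct:]]*$/', '$1', $token);
}

// Emulate whitespace tokenization of a title, then filter each token.
$title = '"Hello, World!" she said';
$tokens = preg_split('/\s+/', trim($title));
$filtered = array_map('strip_outer_punct', $tokens);

print_r($filtered);
```

Each token loses its leading and trailing punctuation, while punctuation inside a token (as in `don't`) is left alone. This only shows how the expression behaves on plain strings; it does not show which field Drupal sorts on or how Solr 5 applies the filter.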


Recommended answer

Unfortunately, none of the Solr solutions I tried worked, so I solved it from the Drupal side, which turned out to be much simpler. The code below strips all special characters and numbers from the title, converts it to lowercase, and adds the result to the Solr document. The second function adds it to the available sort methods.

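/**
 * Implements hook_apachesolr_index_document_build().
 */
function my_module_apachesolr_index_document_build(ApacheSolrDocument $document, $entity, $entity_type, $env_id) {
  // Build a letters-only, lowercase sort key from the title.
  $title = trim($entity->title);
  // Spaces become underscores here, but the next step strips them again,
  // so words end up concatenated.
  $title = str_replace(' ', '_', $title);
  // Drop everything that is not an ASCII letter (punctuation, digits, underscores).
  $title = preg_replace('/[^a-z]+/i', '', $title);
  $title = strtolower($title);
  $document->addField('ss_new_sort', $title);
}

/**
 * Implements hook_apachesolr_query_prepare().
 */
function my_module_apachesolr_query_prepare(DrupalSolrQueryInterface $query) {
  // Expose the new field as a sort option, ascending by default.
  $query->setAvailableSort('ss_new_sort', array('title' => t('Title'), 'default' => 'asc'));
}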
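
To make the effect of that normalization concrete, here is a minimal standalone sketch that applies the same steps outside Drupal. The helper name `my_module_title_sort_key` and the sample titles are hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper mirroring the normalization steps in the hook above.
function my_module_title_sort_key($title) {
  $key = trim($title);
  // Keep ASCII letters only; punctuation, digits and spaces are removed.
  $key = preg_replace('/[^a-z]+/i', '', $key);
  return strtolower($key);
}

// Made-up sample titles for illustration.
$titles = array('"Zebra" Crossing', "'Tis the Season", 'Apple Pie', '1984');

// Sort the titles by their normalized keys.
usort($titles, function ($a, $b) {
  return strcmp(my_module_title_sort_key($a), my_module_title_sort_key($b));
});

foreach ($titles as $title) {
  echo my_module_title_sort_key($title), ' <= ', $title, "\n";
}
```

Quoted titles now sort by their first letter rather than by the quote character. Two side effects are worth knowing: a title made only of digits, such as `1984`, collapses to an empty key and sorts first, and non-ASCII letters are dropped along with the punctuation.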
