Excluding items selectively from Sitecore's Lucene search index - works when rebuilding with IndexViewer, but not when using Sitecore's built-in tools

Problem description

On a site powered by Sitecore 6.2, I need to give the user the ability to selectively exclude items from search results.

To accomplish this, I added a checkbox field named "Include in Search Results" and created a custom database crawler that checks the field's value:

~\App_Config\Include\Search Indexes\Website.config:

<search>
  <configuration type="Sitecore.Search.SearchConfiguration, Sitecore.Kernel" singleInstance="true">
    <indexes hint="list:AddIndex">
      <index id="website" singleInstance="true" type="Sitecore.Search.Index, Sitecore.Kernel">
        ...

        <locations hint="list:AddCrawler">
          <master type="MyProject.Lib.Search.Indexing.CustomCrawler, MyProject">
            ...
          </master>

          <!-- Similar entry for web database. -->
        </locations>
      </index>
    </indexes>
  </configuration>
</search>

~\Lib\Search\Indexing\CustomCrawler.cs:

using Sitecore.Search.Crawlers;
using Sitecore.Data.Items;

namespace MyProject.Lib.Search.Indexing
{
  public class CustomCrawler : DatabaseCrawler
  {
    /// <summary>
    ///   Determines if the item should be included in the index.
    /// </summary>
    /// <param name="item">The item being crawled.</param>
    /// <returns><c>true</c> if the item should be indexed; otherwise, <c>false</c>.</returns>
    protected override bool IsMatch(Item item)
    {
      if (item["include in search results"] != "1")
      {
        return false;
      }

      return base.IsMatch(item);
    }
  }
}

What's interesting is that if I rebuild the index using the Index Viewer application, everything behaves as expected: items whose "Include in Search Results" checkbox is not checked are left out of the search index.

However, when I use the search index rebuilder in the Sitecore Control Panel application or when the IndexingManager auto-updates the search index, all items are included, regardless of the state of their "Include in Search Results" checkbox.

I've also set numerous breakpoints in my custom crawler class, and the application never hits any of them when I rebuild the search index using the built-in indexer. When I use Index Viewer, it does hit all the breakpoints I've set.

How do I get Sitecore's built-in indexing processes to respect my "Include in Search Results" checkbox?

Recommended answer

I spoke with Alex Shyba yesterday, and we were able to figure out what was going on. There were a couple of problems with my configuration that were preventing everything from working correctly:


  • As Seth noted, there are two distinct search APIs in Sitecore, and my configuration file was using both of them. To use the newer API (the Sitecore.Search namespace that the crawler above is built on), only the sitecore/search/configuration section needs to be set up. In addition to the configuration shown in the question, I was also adding indexes in sitecore/indexes and sitecore/databases/database/indexes, which is not correct.

  • Instead of overriding IsMatch(), I should have been overriding AddItem(). Because of the way Lucene works, you can't update a document in place; you have to delete it first and then add the updated version.

When Sitecore.Search.Crawlers.DatabaseCrawler.UpdateItem() runs, it calls IsMatch() to decide whether to delete and re-add the item. If IsMatch() returns false, the delete is skipped as well, so an existing document for that item stays in the index even though the item should no longer be there.

By overriding AddItem(), I was able to instruct the crawler whether the item should be added to the index after its existing documents had already been removed. Here is what the updated class looks like:

~\Lib\Search\Indexing\CustomCrawler.cs:

using Sitecore.Data.Items;
using Sitecore.Search;
using Sitecore.Search.Crawlers;

namespace MyProject.Lib.Search.Indexing
{
  public class CustomCrawler : DatabaseCrawler
  {
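    // AddItem() runs after the item's existing documents have been deleted,
    // so skipping base.AddItem() leaves the item out of the index.
    protected override void AddItem(Item item, IndexUpdateContext context)
    {
      // Re-add only items whose "Include in Search Results" checkbox is checked ("1").
      if (item["include in search results"] == "1")
      {
        base.AddItem(item, context);
      }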
    }
  }
}
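To make the difference between the two hooks concrete, here is a simplified model of the update flow described above. It is a sketch under assumptions, not Sitecore code: the index is reduced to a set of item IDs, and the hypothetical local functions UpdateWithIsMatchFilter and UpdateWithAddItemFilter stand in for DatabaseCrawler.UpdateItem() with the checkbox test placed in IsMatch() or in AddItem(), respectively. It only mirrors the delete-then-add behavior the answer describes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model only: no Sitecore or Lucene calls. The "index" is the set
// of item IDs that currently have a document in it.

// Filter placed in IsMatch(): UpdateItem() bails out before deleting anything.
void UpdateWithIsMatchFilter(HashSet<string> index, string id, bool include)
{
    bool isMatch = include;   // stands in for the custom IsMatch(), false when unchecked
    if (!isMatch) return;     // early return skips the delete, so a stale document survives
    index.Remove(id);         // delete the existing document...
    index.Add(id);            // ...then add the updated version
}

// Filter placed in AddItem(): the delete always happens first.
void UpdateWithAddItemFilter(HashSet<string> index, string id, bool include)
{
    index.Remove(id);         // base IsMatch() passes, so the old document is deleted
    if (include) index.Add(id); // custom AddItem() re-adds only checked items
}

// An item that was indexed earlier and then had its checkbox cleared.
var viaIsMatch = new HashSet<string> { "home" };
var viaAddItem = new HashSet<string> { "home" };
UpdateWithIsMatchFilter(viaIsMatch, "home", include: false);
UpdateWithAddItemFilter(viaAddItem, "home", include: false);

Console.WriteLine($"IsMatch filter: {viaIsMatch.Contains("home")}"); // IsMatch filter: True (stale entry)
Console.WriteLine($"AddItem filter: {viaAddItem.Contains("home")}"); // AddItem filter: False (removed)
```

In the first case the stale entry survives because the early return skips the delete too; in the second, the delete always runs and only the re-add is conditional, which is why moving the checkbox test into AddItem() also fixes incremental updates.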


Alex also pointed out that some of my scalability settings were incorrect. Specifically:


  • The InstanceName setting was empty, which can cause problems on ephemeral (cloud) instances where the machine name might change between executions. We changed this setting on each instance to have a constant and distinct value (e.g., CMS and CD).

  • The Indexing.ServerSpecificProperties setting needs to be true so that each instance maintains its own record of when it last updated its search index.

  • The EnableEventQueues setting needs to be true to prevent race conditions between the search indexing and cache flush processes.

  • In development, Indexing.UpdateInterval should be set to a relatively small value (e.g., 00:00:15). This is not ideal for production environments, but it cuts down on the waiting you have to do when troubleshooting search indexing problems.

  • Make sure the history engine is turned on for each web database, including remote publishing targets:

<database id="production">
  <Engines.HistoryEngine.Storage>
    <obj type="Sitecore.Data.$(database).$(database)HistoryStorage, Sitecore.Kernel">
      <param connectionStringName="$(id)" />
      <EntryLifeTime>30.00:00:00</EntryLifeTime>
    </obj>
  </Engines.HistoryEngine.Storage>
  <Engines.HistoryEngine.SaveDotNetCallStack>false</Engines.HistoryEngine.SaveDotNetCallStack>
</database>
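The interval values above (00:00:15 for Indexing.UpdateInterval and 30.00:00:00 for EntryLifeTime) look like standard .NET TimeSpan strings in [d.]hh:mm:ss form. Assuming Sitecore reads them with TimeSpan semantics, this small check shows what the two example values mean:

```csharp
using System;

// Assumption: both settings use .NET TimeSpan notation, [d.]hh:mm:ss.
TimeSpan updateInterval = TimeSpan.Parse("00:00:15");    // Indexing.UpdateInterval (development value)
TimeSpan entryLifeTime  = TimeSpan.Parse("30.00:00:00"); // history engine EntryLifeTime

Console.WriteLine(updateInterval.TotalSeconds); // 15
Console.WriteLine(entryLifeTime.TotalDays);     // 30
```

So the development interval is 15 seconds, while the history entries in the configuration above are kept for 30 days.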


Since CD instances have no access to the Sitecore back end, I also installed RebuildDatabaseCrawlers.aspx (from this article) so that their search indexes can be rebuilt manually.
