How does Sunspot modify Solr's schema.xml? Does it modify it at all?

Question

Correct me if I'm wrong, but I think Solr only accepts fields that are already declared in schema.xml. So if I have a field called 'title', I need to declare it in the schema.

Sunspot's documentation does not mention modifying schema.xml. I am wondering how Sunspot modifies schema.xml so that custom fields can be added to the index.

I also know that Sunspot uses RSolr to do things. So if there is a way to modify the schema and reload data from DB to Solr using RSolr, please let me know.

Recommended answer

As karmajunkie alludes to, Sunspot uses its own standard schema. I'll go into how that works in a bit more detail here.

For the purposes of this discussion, Solr schemas consist mostly of two things: type definitions and field definitions.

A type definition sets up a type by specifying its name, the Java class for the type, and in the case of some types (notably text), a subordinate block of XML configuring how that type is handled.

A field definition allows you to define the name of a field, and the name of the type of value contained in that field. This allows Solr to correlate the name of a field in a document with its type, and a handful of other options, and thus how that field's value should be processed in your index.

Solr also supports a dynamicField definition, which, instead of a static field name, lets you specify a pattern with a glob in it. Incoming fields can have their names matched against these patterns in order to determine their types.
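To make the pattern matching concrete, here is a minimal Ruby sketch of the idea, not Solr's actual implementation. The `DYNAMIC_FIELDS` table and the `type_for` helper are hypothetical names made up for illustration, and `File.fnmatch` is a more general glob matcher than the one Solr uses.

```ruby
# Illustrative only: a toy version of dynamicField lookup.
# DYNAMIC_FIELDS and type_for are hypothetical, not part of Solr or Sunspot.
DYNAMIC_FIELDS = {
  "*_text"  => "text", # glob pattern => type name
  "*_texts" => "text"
}.freeze

# Return the type name of the first pattern that matches the incoming
# field name, or nil when no pattern matches.
def type_for(field_name, patterns = DYNAMIC_FIELDS)
  match = patterns.find { |glob, _type| File.fnmatch(glob, field_name) }
  match && match.last
end

puts type_for("body_text").inspect    # => "text"
puts type_for("body_unknown").inspect # => nil
```

The first matching pattern wins here, which is a simplification for the sake of the example.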

Sunspot's schema has a handful of field definitions for internally used fields, such as the ID and model name. Additionally, Sunspot makes liberal use of dynamicField definitions to establish naming conventions based on types.

This use of field naming conventions allows Sunspot to define a configuration DSL that creates a mapping from your model into an XML document ready to be indexed by Solr.

For example, this simple configuration block in your model…

searchable do
  text :body
end

…will be used by Sunspot to create a field name of body_text. This field name is matched against the *_text pattern for the following dynamicField definition in the schema:

<dynamicField name="*_text" type="text" indexed="true" stored="false" multiValued="true"/>

This maps any field with the suffix _text to Sunspot's definition of the text type. If you take a look at Sunspot's schema.xml, you'll see many other similar conventions for other types and options. The :stored => true option, for example, will typically add an s on that type's suffix (e.g., _texts).
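As a rough illustration of that naming convention, the sketch below derives an indexed field name from an attribute name, a type, and the stored flag. It is not Sunspot's own code: `TYPE_SUFFIXES` and `indexed_field_name` are hypothetical names, and the table only covers the text type described here.

```ruby
# Illustrative only: mimics the convention described above.
# TYPE_SUFFIXES and indexed_field_name are hypothetical, not Sunspot's API.
TYPE_SUFFIXES = { text: "text" }.freeze

# Build "<name>_<suffix>", appending "s" to the suffix for stored fields.
def indexed_field_name(name, type, stored: false)
  suffix = TYPE_SUFFIXES.fetch(type) # raises KeyError for unknown types
  suffix += "s" if stored
  "#{name}_#{suffix}"
end

puts indexed_field_name(:body, :text)               # => body_text
puts indexed_field_name(:body, :text, stored: true) # => body_texts
```

Because the generated name always ends in a known suffix, it is guaranteed to hit one of the dynamicField patterns in the schema without the schema having to know about your model.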

In my experience, both on client projects and my own, there are two good reasons to modify Sunspot's schema. The first is to change the text field's analyzers to suit the features your application needs. The second is to create brand new types (usually based on the text type) for a more fine-grained application of Solr analyzers.

For example, you can widen search matches with "fuzzy" searching by matching against a special text-based field that also uses linguistic stemming or NGrams. The tokens in the original text field can be used to populate spellcheck or to boost exact matches, while the tokens in a custom text_ngram or text_en field can broaden search results when the stricter matching fails.

Sunspot's DSL provides one final feature for mapping your fields to these custom fields. Once you have set up the type and its corresponding dynamicField definition(s), you can use Sunspot's :as option to override the convention-based name generation.

For example, adding a custom ngram type for the above, we might end up processing the body again with NGrams with the following Ruby code:

searchable do
  text :body
  # The block supplies the value, so the model needs no body_ngram method
  text :body_ngram, :as => 'body_ngram' do
    body
  end
end
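To show what the :as override changes, here is a small Ruby sketch under stated assumptions. `field_name_for`, `PATTERNS`, and `matched_pattern` are hypothetical helpers, and the `*_ngram` pattern stands in for whatever dynamicField you added to the schema for your custom type.

```ruby
# Illustrative only: these helpers are hypothetical, not Sunspot's API.
# With :as, the given name is used verbatim; otherwise the
# convention-based "<name>_<suffix>" form is generated.
def field_name_for(name, type_suffix, as: nil)
  return as.to_s if as
  "#{name}_#{type_suffix}"
end

# "*_ngram" is an assumed custom dynamicField pattern added to the schema.
PATTERNS = ["*_text", "*_ngram"].freeze

# Return the first schema pattern that the field name matches, or nil.
def matched_pattern(field_name)
  PATTERNS.find { |glob| File.fnmatch(glob, field_name) }
end

default_name  = field_name_for(:body, "text")
override_name = field_name_for(:body_ngram, "text", as: "body_ngram")

puts matched_pattern(default_name)  # => *_text
puts matched_pattern(override_name) # => *_ngram
```

The point of the override is that the name you pass must itself match a pattern in the schema, since no suffix is added for you.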
