How does Solr's schema-less feature work? How to revert it to classic schema?


Question

I just found that Solr 5 doesn't require a predefined schema file; it generates the schema based on the documents being indexed. How does this work in the background?

Is it good practice? And is there any way to disable it?

Recommended answer

The schemaless feature has been in Solr since version 4.3. It may only now be reasonably stable, though, because a concurrency issue with it was fixed in 4.10.

It is also called managed schema. When you configure Solr to use a managed schema, Solr uses a special UpdateRequestProcessor to intercept document indexing requests and guess the field types.
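To make the idea of type guessing concrete, here is a minimal Python sketch. It is not Solr's implementation; the function name, the order of the checks, the single date format, and the type labels are all assumptions chosen for illustration. It only shows the general approach of trying progressively looser parsers and falling back to a string.

```python
from datetime import datetime


def guess_field_type(value):
    """Return a hypothetical type label for a raw field value.

    Illustrative only: the labels and the order of checks are assumptions,
    not Solr's actual rules.
    """
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        text = value.strip()
        if text.lower() in ("true", "false"):
            return "boolean"
        # Try each parser in turn; the first one that succeeds wins.
        for label, parser in (("long", int), ("double", float)):
            try:
                parser(text)
                return label
            except ValueError:
                pass
        try:
            datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
            return "date"
        except ValueError:
            pass
    # Anything that did not match a narrower type is treated as a string.
    return "string"


for sample in ("42", "3.14", "true", "2015-06-01T12:00:00Z", "John Doe"):
    print(sample, "->", guess_field_type(sample))
```

The consequence worth noting is that the first value seen for a field decides its type, so a field whose first value happens to look numeric may reject later values that do not.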

Solr starts from your schema.xml file and creates a new file, called managed-schema by default, to store all the inferred schema information. Solr overwrites this file automatically whenever it detects a change to the schema.

Because Solr overwrites the file, you should not edit it by hand; use the Schema API if you want to make changes to the schema. See also the Schemaless Mode documentation.
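As a rough sketch of a Schema API call, the Python below builds (but does not send) a request that adds one field. The base URL, the core name mycore, the field name title, and the field type string are assumptions for illustration; check the Schema API documentation for your Solr version before relying on the exact payload.

```python
import json
import urllib.request


def build_add_field_request(base_url, collection, name, field_type, stored=True):
    """Build a POST request asking the Schema API to add a single field.

    The helper name and its parameters are hypothetical; only the request
    is constructed here, nothing is sent.
    """
    url = f"{base_url}/{collection}/schema"
    payload = {"add-field": {"name": name, "type": field_type, "stored": stored}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_add_field_request(
    "http://localhost:8983/solr", "mycore", "title", "string"
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To send it to a running Solr instance, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it easy to inspect the payload before it touches a live schema.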

To revert to the classic schema, stop Solr first: $ bin/solr stop

Go to server/solr/mycore/conf, where "mycore" is the name of your core/collection.

Edit solrconfig.xml:

  • search for <schemaFactory class="ManagedIndexSchemaFactory"> and comment out the whole element
  • search for <schemaFactory class="ClassicIndexSchemaFactory"/> and uncomment it
  • search for the <initParams> element that refers to add-unknown-fields-to-the-schema and comment out the whole <initParams>...</initParams>
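One way to double-check the edits above is to parse solrconfig.xml and see which schemaFactory element is still active. The sketch below is an assumption-laden helper, not part of Solr: the function name and the inline sample are hypothetical, and it relies on Python's ElementTree dropping XML comments when parsing, so commented-out elements are simply not found.

```python
import xml.etree.ElementTree as ET


def active_schema_factory(solrconfig_xml):
    """Return the class of the uncommented <schemaFactory>, or None.

    ElementTree discards comments by default, so an element that has been
    commented out does not appear in the parsed tree.
    """
    root = ET.fromstring(solrconfig_xml)
    factory = root.find("schemaFactory")
    return None if factory is None else factory.get("class")


# Hypothetical, heavily shortened stand-in for a real solrconfig.xml.
SAMPLE = """<config>
  <!--
  <schemaFactory class="ManagedIndexSchemaFactory">
    <bool name="mutable">true</bool>
    <str name="managedSchemaResourceName">managed-schema</str>
  </schemaFactory>
  -->
  <schemaFactory class="ClassicIndexSchemaFactory"/>
</config>"""

print(active_schema_factory(SAMPLE))
```

For a real file, read its contents from server/solr/mycore/conf/solrconfig.xml and pass the text to the same function.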

Rename managed-schema to schema.xml and you are done.

You can now start Solr again: $ bin/solr start. Then go to http://localhost:8983/solr/#/mycore/documents and check that Solr now refuses to index a document containing a new field not yet specified in schema.xml.
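The same check can be scripted instead of using the admin UI. The sketch below is an assumption rather than a documented recipe: it posts one JSON document to the core's /update handler and reports the HTTP status. The helper names, the document id, and the field brand_new_field are hypothetical. Pick a field name that matches no dynamicField pattern in your schema.xml, otherwise Solr may still accept it.

```python
import json
import urllib.error
import urllib.request


def build_update_request(base_url, core, docs):
    """Build a POST request that submits JSON documents to a core's /update handler."""
    url = f"{base_url}/{core}/update?commit=true"
    return urllib.request.Request(
        url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def try_index(request):
    """Send the request and return (status, body); needs a running Solr instance."""
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as error:
        # A rejected document surfaces here as an HTTP error status.
        return error.code, error.read().decode("utf-8")


probe = build_update_request(
    "http://localhost:8983/solr",
    "mycore",
    [{"id": "probe-1", "brand_new_field": "should be rejected"}],
)
print(probe.get_method(), probe.full_url)

# With Solr running, uncomment to perform the check:
# status, body = try_index(probe)
# print(status, body)
```

If the classic schema is active, the expectation is an error response naming the unknown field; a success status suggests the managed schema is still in use or a dynamic field matched.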

Whether schemaless mode is good practice depends on what you want. If you want to enforce a specific document structure (e.g. to make sure that all docs are "well-formed" according to your definition), then you want the classic schema management.

If, on the other hand, you don't know upfront what the document structure will be, then you might want to use the schema-less feature.

Although it is called schema-less, there are limits to the kinds of structures you can index. This is true of both Solr and Elasticsearch, by the way. For example, if you first index this doc:

{"name":"John Doe"}

then you will get an error if you next try to index a doc like this:

{"name": {
   "first": "Daniel",
   "second": "Dennett"
   }
}

That is because in the first doc the name field was of type string, while in the second doc it is an object.
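The conflict can be modelled with a small Python sketch. This is a toy model, not how Solr or Elasticsearch store their schemas; the class and function names are hypothetical. It records the kind of value first seen for each field and rejects any later document that disagrees.

```python
def json_kind(value):
    """Classify a decoded JSON value into a coarse kind."""
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    return "string"


class InferredSchema:
    """Toy schema that fixes each field's kind on first sight."""

    def __init__(self):
        self.fields = {}

    def index(self, doc):
        kinds = {name: json_kind(value) for name, value in doc.items()}
        # Validate the whole document before recording anything, so a
        # rejected document leaves the schema unchanged.
        for name, kind in kinds.items():
            known = self.fields.get(name, kind)
            if known != kind:
                raise ValueError(
                    f"field {name!r} was inferred as {known}, got {kind}"
                )
        self.fields.update(kinds)


schema = InferredSchema()
schema.index({"name": "John Doe"})
try:
    schema.index({"name": {"first": "Daniel", "second": "Dennett"}})
except ValueError as error:
    print("rejected:", error)
```

The point of the model is that an inferred schema is still a schema: once a field's type is fixed, later documents have to conform to it.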

If you need indexing that goes beyond these limitations, you could use SIREn, an open source semi-structured information retrieval engine implemented as a plugin for both Solr and Elasticsearch. (Disclaimer: I worked for the company that develops SIREn.)
