How does Solr's schema-less feature work? How do I revert to the classic schema?

Question

I just found out that Solr 5 doesn't require a predefined schema file; instead, it generates the schema based on the documents being indexed. How does this work in the background?

Is this a good practice or not? And is there any way to disable it?

Recommended answer

The schemaless feature has been in Solr since version 4.3, but it has probably only become reliable recently, because a concurrency issue affecting it was fixed in 4.10.

It relies on what Solr calls the managed schema. When you configure Solr for schemaless mode, it runs special UpdateRequestProcessors that intercept document indexing requests, guess the types of fields that are not yet in the schema, and add those fields to it.
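To make the idea of type guessing concrete, here is a minimal Python sketch of the kind of decision such a processor makes. It is a toy model, not Solr's implementation: the order in which types are tried (boolean, integer, floating point, date, then plain string), the date format, and the type labels are assumptions chosen for illustration. A real Solr config maps guessed types to field types declared in its own schema.

```python
from datetime import datetime


def guess_field_type(value):
    """Guess a (hypothetical) field type label for one raw field value."""
    # JSON-native values already carry a type; bool must be checked
    # before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if not isinstance(value, str):
        raise TypeError(f"unsupported value: {value!r}")
    # String values: try narrower types first, falling back to "string".
    if value.lower() in ("true", "false"):
        return "boolean"
    try:
        int(value)
        return "long"
    except ValueError:
        pass
    try:
        float(value)
        return "double"
    except ValueError:
        pass
    try:
        # Assumed ISO-8601 UTC format, e.g. 2015-03-01T00:00:00Z.
        datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
        return "date"
    except ValueError:
        pass
    return "string"


def infer_new_fields(doc, schema):
    """Return {field: guessed type} for fields of doc not yet in schema."""
    return {name: guess_field_type(v) for name, v in doc.items() if name not in schema}
```

For example, infer_new_fields({"id": "1", "price": "9.99"}, {"id": "string"}) reports price as a new double field. Trying narrower types first matters: "42" would also parse as a float, so checking integers before floats keeps it a long.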

Solr starts from your schema.xml file and creates a new file, called managed-schema by default, to store all the inferred schema information. Solr automatically overwrites this file whenever it detects changes to the schema.
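If you want to see what Solr has inferred without opening the managed-schema file, a running Solr also exposes its field list over HTTP at the core's /schema/fields endpoint. The sketch below assumes a core named mycore on the default port 8983, and a JSON response containing a "fields" list whose entries carry "name" and "type" keys.

```python
import json
from urllib.request import urlopen


def fields_from_response(body):
    """Map field name -> field type from a /schema/fields JSON response."""
    data = json.loads(body)
    return {f["name"]: f["type"] for f in data.get("fields", [])}


def fetch_fields(base_url="http://localhost:8983/solr", core="mycore"):
    """Read the current (possibly inferred) field list from a running Solr."""
    with urlopen(f"{base_url}/{core}/schema/fields?wt=json") as resp:
        return fields_from_response(resp.read().decode("utf-8"))
```

Keeping the parsing separate from the HTTP call makes it easy to inspect a saved response offline.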

If you want to make changes to the schema, you should then use the Schema API rather than editing that file by hand. See also the Schemaless Mode documentation.
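For example, adding a field through the Schema API means POSTing an add-field command to the core's /schema endpoint. This sketch builds the request with Python's standard library; the core name mycore, the field name title, and the field type text_general are assumptions, and the type you use must already exist in your schema.

```python
import json
from urllib.request import Request, urlopen


def add_field_request(name, field_type, stored=True,
                      base_url="http://localhost:8983/solr", core="mycore"):
    """Build (but do not send) a Schema API request that adds one field."""
    payload = {"add-field": {"name": name, "type": field_type, "stored": stored}}
    return Request(
        f"{base_url}/{core}/schema",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a request to a running Solr and decode its JSON response."""
    with urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Against a running instance you would call send(add_field_request("title", "text_general")); Solr then records the new field in managed-schema itself.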

To revert to the classic schema, first stop Solr: $ bin/solr stop

Go to server/solr/mycore/conf, where "mycore" is the name of your core/collection.

Edit solrconfig.xml:

  • Search for <schemaFactory class="ManagedIndexSchemaFactory"> and comment out the whole element.
  • Search for <schemaFactory class="ClassicIndexSchemaFactory"/> and uncomment it.
  • Search for the <initParams> element that refers to add-unknown-fields-to-the-schema and comment out the whole <initParams>...</initParams> block.
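After editing, you can double-check the result with a small script. This is a hypothetical helper, not part of Solr: it parses solrconfig.xml, reports the class of the active <schemaFactory>, and tells you whether any <initParams> still selects the add-unknown-fields-to-the-schema chain. Python's ElementTree discards XML comments by default, so commented-out elements are ignored here just as Solr ignores them.

```python
import xml.etree.ElementTree as ET

SCHEMALESS_CHAIN = "add-unknown-fields-to-the-schema"


def _inspect(root):
    # Only uncommented elements are visible: ElementTree drops comments.
    factory = root.find("schemaFactory")
    factory_class = factory.get("class") if factory is not None else None
    # Any <str> inside an active <initParams> naming the schemaless chain.
    chain_active = any(
        (s.text or "").strip() == SCHEMALESS_CHAIN
        for params in root.iter("initParams")
        for s in params.iter("str")
    )
    return factory_class, chain_active


def check_solrconfig_text(xml_text):
    """Return (schemaFactory class or None, schemaless chain still active?)."""
    return _inspect(ET.fromstring(xml_text))


def check_solrconfig_file(path):
    """Same check, reading e.g. server/solr/mycore/conf/solrconfig.xml."""
    return _inspect(ET.parse(path).getroot())
```

A result of ("ClassicIndexSchemaFactory", False) means both edits took effect. A None class means no schemaFactory is declared at all, in which case the default depends on your Solr version.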

Rename managed-schema to schema.xml and you are done.
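The rename can also be scripted. This hypothetical helper assumes Solr is already stopped and that conf_dir is the core's conf directory from the previous step (for example server/solr/mycore/conf). It refuses to overwrite an existing schema.xml, so an older file is not silently lost; move or delete it yourself if you really want to replace it.

```python
from pathlib import Path


def restore_classic_schema(conf_dir):
    """Rename managed-schema to schema.xml inside a core's conf directory."""
    conf = Path(conf_dir)
    managed = conf / "managed-schema"
    classic = conf / "schema.xml"
    if not managed.exists():
        raise FileNotFoundError(managed)
    if classic.exists():
        # Do not clobber an existing schema.xml.
        raise FileExistsError(classic)
    managed.rename(classic)
    return classic
```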

You can now start Solr again with $ bin/solr start, go to http://localhost:8983/solr/#/mycore/documents, and check that Solr now refuses to index a document containing a field that is not yet defined in schema.xml.
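The same check can be done from a script instead of the admin UI. This sketch posts a JSON array of documents to the core's /update handler using Python's standard library and returns the HTTP status and response body. The core name, URL, and the field name notinschema are assumptions; pick a field name that does not match any <dynamicField> pattern in your schema, otherwise Solr will accept it through that pattern.

```python
import json
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def update_request(docs, base_url="http://localhost:8983/solr", core="mycore"):
    """Build a JSON update request that indexes docs and commits."""
    return Request(
        f"{base_url}/{core}/update?commit=true",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def try_index(docs, **kwargs):
    """Send the update to a running Solr; return (HTTP status, body)."""
    try:
        with urlopen(update_request(docs, **kwargs)) as resp:
            return resp.status, resp.read().decode("utf-8")
    except HTTPError as err:
        # Error responses (e.g. an unknown field) still carry a body.
        return err.code, err.read().decode("utf-8")
```

With the classic schema in place, try_index([{"id": "1", "notinschema": "x"}]) should come back with an error status whose body mentions the unknown field; with schemaless mode still active, the same call would succeed and the field would be added.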

It depends on what you want. If you want to enforce a specific document structure (e.g. to make sure that all docs are "well-formed" according to your definition), then you want to use classic schema management.

If, on the other hand, you don't know the document structure up front, then you might want to use the schema-less feature.

While it is called schema-less, there are limits to the kinds of structures that you can index. This is true both for Solr and Elasticsearch, by the way. For example, if you first index this doc:

{"name":"John Doe"}

then you will get an error if you next try to index a doc like this one:

{
  "name": {
    "first": "Daniel",
    "second": "Dennett"
  }
}

That is because the first document made name a string field, while in the second document name is an object.
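The rule behind this error can be modeled in a few lines. The sketch below is a toy, not how Solr or Elasticsearch implement it: the first document that uses a field fixes that field's kind, and a later document that uses the same field with a different kind is rejected. The class and function names are hypothetical.

```python
def json_kind(value):
    """Classify a JSON value into the coarse kinds this toy model tracks."""
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    if isinstance(value, bool):  # check before numbers: bool is an int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported value: {value!r}")


class ToySchema:
    """First document to use a field fixes its kind; mismatches fail later."""

    def __init__(self):
        self.fields = {}

    def index(self, doc):
        kinds = {name: json_kind(v) for name, v in doc.items()}
        # Validate every field first so a rejected doc leaves no trace.
        for name, kind in kinds.items():
            known = self.fields.get(name)
            if known is not None and known != kind:
                raise ValueError(f"field {name!r} is {known}, got {kind}")
        for name, kind in kinds.items():
            self.fields.setdefault(name, kind)
```

Indexing {"name": "John Doe"} and then {"name": {"first": "Daniel", "second": "Dennett"}} raises ValueError on the second call, mirroring the example above.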

If you would like to use indexing which goes beyond these limitations then you could use SIREn - it is an open source semi-structured information retrieval engine which is implemented as a plugin for both Solr and Elasticsearch. (Disclaimer: I worked for the company that develops SIREn)
