Using Solr search index as a database - is this "wrong"?

Question

My team is working with a third-party CMS that uses Solr as a search index. I've noticed that the authors seem to be using Solr as a database of sorts, in that each document returned contains two fields:

  1. The Solr document id (basically a class name and database id)
  2. An XML representation of the entire object

So basically it runs a search against Solr, downloads the XML representation of the object, and then instantiates the object from the XML rather than looking it up in the database using the id.

My gut feeling tells me this is bad practice. Solr is a search index, not a database, so it makes more sense to me to execute our complex searches against Solr, get the document ids, and then pull the corresponding rows out of the database.

Is the current implementation perfectly sound, or is there data to support the idea that this is ripe for refactoring?

When I say "XML representation", I mean one stored field that contains an XML string of all of the object's properties, not multiple stored fields.
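To make the two options concrete, here is a minimal sketch that compares them side by side. It does not call Solr at all: the `hit` dictionary stands in for one search result, and the `Article` class, the `Article:42` id format, the XML layout, and the `article` table are all assumptions made up for illustration, not the CMS's actual schema.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Article:
    id: int
    title: str


# Hypothetical search hit: a Solr-style document id plus one stored XML field.
hit = {
    "id": "Article:42",
    "xml": "<article><id>42</id><title>Stale title</title></article>",
}


def from_stored_xml(hit: dict) -> Article:
    """CMS approach: rebuild the object from the XML stored in the index."""
    root = ET.fromstring(hit["xml"])
    return Article(id=int(root.findtext("id")), title=root.findtext("title"))


def from_database(hit: dict, conn: sqlite3.Connection) -> Article:
    """Alternative: use the index only to find ids, then read the row."""
    _class_name, db_id = hit["id"].split(":", 1)
    row = conn.execute(
        "SELECT id, title FROM article WHERE id = ?", (int(db_id),)
    ).fetchone()
    return Article(id=row[0], title=row[1])


# In-memory database standing in for the CMS's real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO article VALUES (42, 'Current title')")

print(from_stored_xml(hit))      # reflects whatever was indexed last
print(from_database(hit, conn))  # reflects what the database holds now
```

The first function saves a database round trip per hit; the second always reads the authoritative row. The sample data is deliberately out of sync to show that the two can disagree if the index has not been refreshed since the last write.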

Recommended answer

Yes, you can use SOLR as a database, but there are some really serious caveats:

  1. SOLR's most common access pattern, queries over HTTP, does not respond particularly well to batch querying. Furthermore, SOLR does not stream data, so you can't lazily iterate through millions of records at a time. This means you have to be very thoughtful when you design large-scale data access patterns with SOLR.

  2. Although SOLR performance scales horizontally (more machines, more cores, etc.) as well as vertically (more RAM, better machines, etc.), its querying capabilities are severely limited compared to those of a mature RDBMS. That said, there are some excellent functions, like the field stats queries, which are quite convenient.

  3. Developers who are used to relational databases will often run into problems when they use the same DAO design patterns in a SOLR paradigm, because of the way SOLR uses filters in queries. There will be a learning curve for developing the right approach to building an application that uses SOLR for part of its large queries or stateful modifications.

  4. The "enterprisey" tools for advanced session management and stateful entities that many advanced web frameworks (Ruby, Hibernate, etc.) offer will have to be thrown out completely.

  5. Relational databases are meant to deal with complex data and relationships, so they come with state-of-the-art metrics and automated analysis tools. In SOLR, I've found myself writing such tools and manually stress-testing a lot, which can be a time sink.

  6. Joining: this is the big killer. Relational databases support methods for building and optimizing views and queries that join tuples based on simple predicates. In SOLR, there aren't any robust methods for joining data across indices.

  7. Resiliency: For high availability, SolrCloud uses a distributed file system underneath (i.e. HCFS). This model is quite different from that of a relational database, which usually achieves resiliency using masters and slaves, RAID, and so on. So you have to be ready to provide the resiliency infrastructure SOLR requires if you want it to be cloud-scalable and resilient.
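The joining caveat is easier to picture with a small sketch. In SQL the pairing below is a single `JOIN`; with two separate indexes the application typically ends up doing it by hand. This is only an illustration under stated assumptions: the two lists stand in for results already fetched from two hypothetical indexes, and every id and field name is invented.

```python
from collections import defaultdict

# Hypothetical results from two separate indexes (plain dicts, no Solr call).
authors = [
    {"id": "Author:1", "name": "Ada"},
    {"id": "Author:2", "name": "Linus"},
]
articles = [
    {"id": "Article:10", "author_id": "Author:1", "title": "Indexing basics"},
    {"id": "Article:11", "author_id": "Author:1", "title": "Faceting"},
    {"id": "Article:12", "author_id": "Author:3", "title": "Orphaned"},
]


def join_in_application(authors, articles):
    """Hash join done by hand: group articles by author id, then pair them up."""
    by_author = defaultdict(list)
    for article in articles:
        by_author[article["author_id"]].append(article["title"])
    # Inner-join semantics: authors without articles are dropped, and so are
    # articles whose author is not in the first result set.
    return [
        (author["name"], title)
        for author in authors
        for title in by_author.get(author["id"], [])
    ]


print(join_in_application(authors, articles))
```

Both result sets have to be pulled into application memory first, which is where the batch-access caveat from the first point comes back in.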

That said, there are plenty of obvious advantages to SOLR for certain tasks (see http://wiki.apache.org/solr/WhyUseSolr): loose queries are much easier to run and return meaningful results. Indexing is done as a matter of default, so most arbitrary queries run pretty effectively (unlike an RDBMS, where you often have to optimize and denormalize after the fact).
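As a rough illustration of what a "loose" query looks like on the wire, the sketch below builds a select URL in which `q` carries a fuzzy text match and each `fq` parameter adds a filter. The host, the core name `cms`, and the field names `title`, `type`, and `published` are assumptions for the example; only the URL is built, no request is sent.

```python
from urllib.parse import urlencode

# Assumed host, core name ("cms") and field names; adjust for a real install.
BASE_URL = "http://localhost:8983/solr/cms/select"


def build_select_url(text, filters, rows=10):
    """Build a select URL: `q` holds the loose text query, each `fq` narrows it."""
    params = [("q", text), ("rows", str(rows)), ("wt", "json")]
    params += [("fq", f) for f in filters]  # one fq parameter per filter
    return BASE_URL + "?" + urlencode(params)


# The trailing "~" asks for a fuzzy match, so the misspelling can still hit.
url = build_select_url("title:databse~", ["type:Article", "published:true"])
print(url)
```

Keeping the filters in `fq` rather than folding them into `q` is the pattern that tends to trip up DAO code written with SQL `WHERE` clauses in mind.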

Conclusion: Even though you CAN use SOLR as an RDBMS, you may find (as I have) that there is ultimately "no free lunch": the cost savings of super-cool Lucene text searches and high-performance, in-memory indexing are often paid for with less flexibility and the adoption of new data access workflows.
