Using elasticsearch as a central data repository

Question

We are currently using elasticsearch to index and perform searches on about 10M documents. It works fine and we are happy with its performance. My colleague who initiated the use of elasticsearch is convinced that it can be used as the central data repository and other data systems (e.g. SQL Server, Hadoop/Hive) can have data pushed to them. I didn't have any arguments against it because my knowledge of both is too limited. However, I am concerned.

I do know that data in elasticsearch is stored in a manner that is efficient for text searching. Hadoop stores data just as a file system would, but in a manner that makes it efficient to scale and replicate blocks over multiple data nodes. Therefore, in my mind it seems more beneficial to use Hadoop (as it is more agnostic w.r.t. its view on data) as a central data repository, and then push data from Hadoop to SQL, elasticsearch, etc.

I've read a few articles on Hadoop and elasticsearch use cases and it seems conventional to use Hadoop as the central data repository. However, I can't find anything that would suggest that elasticsearch wouldn't be a decent alternative.

Please help!

Recommended answer

I'd highly discourage most users from using elasticsearch as their primary datastore. It will work great until your cluster melts down due to a network partition. Even settings such as minimum_master_nodes that the ES pros always set won't save you. See this excellent analysis by Aphyr in his Call Me Maybe series: http://aphyr.com/posts/317-call-me-maybe-elasticsearch

eliasah is right: it depends on your use case, but if your data (and job) is important to you, stay away.

Keep the golden record of your data in something really focused on persistence, and sync your data out to search from there. It adds extra complexity and resources, but will result in a better night's rest :)
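To make the "golden record plus derived search index" idea concrete, here is a minimal sketch. It is only an illustration under assumptions: sqlite3 (in memory, just so the snippet runs anywhere) stands in for whatever durable store you pick, a plain dict stands in for the elasticsearch index, and the names save and rebuild_index are made up for this example.

```python
import sqlite3


def open_store():
    # Stand-in for the system of record. A real setup would use a durable
    # database; ":memory:" is used here only so the sketch is self-contained.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
    return conn


def save(conn, doc_id, body):
    # Every write goes to the golden record first, inside a transaction.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO docs (id, body) VALUES (?, ?)",
            (doc_id, body),
        )


def rebuild_index(conn):
    # Hypothetical stand-in for "sync out to search": build a tiny inverted
    # index (term -> set of document ids) purely from the golden record.
    index = {}
    for doc_id, body in conn.execute("SELECT id, body FROM docs ORDER BY id"):
        for term in set(body.lower().split()):
            index.setdefault(term, set()).add(doc_id)
    return index


conn = open_store()
save(conn, 1, "elasticsearch is a search engine")
save(conn, 2, "kafka is a durable log")

index = rebuild_index(conn)
index.clear()                 # simulate losing the search cluster
index = rebuild_index(conn)   # nothing is lost: the index is derived data
```

The point of the design is that the search index is disposable: if it is corrupted or lost, it can be regenerated from the store that owns the data.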

There are plenty of ways to go about this, and if elasticsearch does everything you need, you can look into Kafka for persisting all the events going into a cluster, which would allow replaying them if things go wrong. I like this approach as it provides an async ingestion pipeline into elasticsearch that also does the persistence.
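The replay idea can be sketched without any real Kafka or elasticsearch client. In this rough illustration, EventLog is a hypothetical in-memory stand-in for a Kafka topic (append-only, readable from any offset), a dict stands in for the search index, and the event shape (op, doc_id, body) is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Hypothetical stand-in for a Kafka topic: append-only, and reading
    events does not remove them, so they can be read again later."""
    events: list = field(default_factory=list)

    def append(self, event):
        self.events.append(event)
        return len(self.events) - 1  # offset of the appended event

    def read_from(self, offset=0):
        return iter(self.events[offset:])


def apply_event(index, event):
    # Apply one ingestion event to the (stand-in) search index.
    op, doc_id, body = event
    if op == "index":
        index[doc_id] = body
    elif op == "delete":
        index.pop(doc_id, None)


def replay(log, offset=0):
    # Rebuild an index from scratch by re-applying events in log order.
    index = {}
    for event in log.read_from(offset):
        apply_event(index, event)
    return index


log = EventLog()
log.append(("index", "a", "first version"))
log.append(("index", "b", "another doc"))
log.append(("index", "a", "second version"))
log.append(("delete", "b", None))

live = replay(log)      # normal ingestion
rebuilt = replay(log)   # after the search cluster was lost
```

Because the log keeps every event in order, replaying from offset 0 yields the same index state as the original ingestion; with a real topic this only holds as long as retention keeps the events you need.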
