Elastic Search Indexing the Internet

Question

This is mostly a design pattern question for Elastic Search.

If I wanted to index the Internet with Elastic Search, what would be the most efficient way to organize such a task?

@kimchy talks about different patterns and Rafal Kuc discusses scaling massive clusters, but I didn't get a sense of how to organize an index of the internet after watching these.

I think logically you could organize such an effort by creating a new index for each domain. So you could shard heavily on indexes like Stackoverflow.com but maybe have as few as 1 shard for indexes like momandpopsite.com.

Does that look efficient to you, ES community? I'm not sure, because we could very quickly get into millions of indexes, not to mention their individual shards. Now I'm wondering whether this type of design carries a lot of overhead and becomes bloated. (That is, does this pattern's structure create too much overhead?)

I know this question has to be theoretical because resources are not specified. But if you could use your imagination and try to stick purely to a design strategy -- how would you index the world wide web? Let's say there are 275 million domains. What is the most efficient design pattern for indexing the internet using Elastic Search?

Recommended answer

An index per domain (so 275 million indexes) is not feasible. Indexes do have an overhead, and I've lost the reference, but I don't think you want more than ~100 indexes on a single "normal" server.
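To put rough numbers on that, here is a back-of-the-envelope sketch. It takes the 275 million domains from the question and treats the ~100 indexes per server figure above as a hard ceiling, which is only an assumption for illustration; `servers_needed` is a hypothetical helper, not part of any Elasticsearch API.

```python
import math

DOMAINS = 275_000_000      # domain count given in the question
INDEXES_PER_SERVER = 100   # rough, unreferenced ceiling quoted above (assumption)


def servers_needed(index_count: int, per_server: int) -> int:
    """Servers required if each one can host at most `per_server` indexes."""
    return math.ceil(index_count / per_server)


# One index per domain means one index for each of the 275 million domains.
print(servers_needed(DOMAINS, INDEXES_PER_SERVER))  # prints 2750000
```

Under those assumptions an index-per-domain layout would need about 2.75 million servers before storing a single document, which is why the per-domain design falls over.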

To get more sites into a single index, you would want to introduce routing and views, but I would imagine that a single index for everything would also introduce unneeded overhead. I'm guessing, but the routing rule lookup might become incredibly large, etc. So you would want to find some way of splitting things across indexes. At such a high volume, you can't design it all on paper, so I would advise PoC work to determine what kind of performance you get for different sized indexes. You would then look to use aliases to map correctly to the underlying index.
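One way this could look, as a minimal sketch only: hash each domain into a fixed pool of shared indexes, then give each domain a filtered alias with a routing value so that queries against the alias only touch that domain's documents. The bucket count, the `web-NNN` index names, the `domain` field, and all function names are hypothetical, and reading "views" as filtered aliases is my interpretation. The code only builds the JSON body for the `_aliases` endpoint and does not talk to a cluster. The `term` filter assumes `domain` is stored as an exact, non-analyzed value.

```python
import hashlib
import json

NUM_BUCKETS = 64  # hypothetical pool size; choose it from PoC measurements


def bucket_index(domain: str, num_buckets: int = NUM_BUCKETS) -> str:
    """Map a domain to one of a fixed set of shared indexes (hypothetical names)."""
    digest = hashlib.sha1(domain.lower().encode("utf-8")).hexdigest()
    return f"web-{int(digest, 16) % num_buckets:03d}"


def alias_action(domain: str) -> dict:
    """Build one 'add' action: a filtered, routed alias for a single domain."""
    key = domain.lower()
    return {
        "add": {
            "index": bucket_index(key),
            "alias": key,        # alias named after the domain (assumption)
            "routing": key,      # keeps a domain's documents on one shard
            "filter": {"term": {"domain": key}},  # hypothetical field name
        }
    }


def aliases_body(domains) -> dict:
    """Request body for POST /_aliases covering all given domains."""
    return {"actions": [alias_action(d) for d in domains]}


if __name__ == "__main__":
    body = aliases_body(["Stackoverflow.com", "momandpopsite.com"])
    print(json.dumps(body, indent=2))
```

A very large site could still be given its own heavily sharded index and an alias pointing at it, so callers address every domain the same way regardless of where it lives. Whether tens of millions of aliases are themselves manageable is exactly the kind of thing the PoC work would need to measure.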

For further reading: https://groups.google.com/forum/#!searchin/elasticsearch/index$20per$20user/elasticsearch/i-G5NlP1VeY/PK9vVP0myAgJ

https://groups.google.com/forum/#!msg/elasticsearch/9L5cWIAib94/K7zdHEW-4P0J
