ELK stack for storing metering data
Question
In our project we're using an ELK stack to store logs in a centralized place. However, I've noticed that recent versions of Elasticsearch support various aggregations. In addition, Kibana 4 offers nice graphical ways to build graphs. Even recent versions of Grafana can now work with an Elasticsearch 2 datasource.
So, does all this mean that the ELK stack can now be used to store metering information gathered inside the system, or can it still not be considered a serious competitor to existing solutions such as Graphite, InfluxDB and so forth? If so, does anyone use ELK for metering in production? Could you please share your experience?
Just to clarify the notions: I consider metering data to be something that can be aggregated and shown in a graph "over time", as opposed to regular log messages, where the main use case is searching.
Thanks a lot in advance!
Answer
Yes, you can use Elasticsearch to store and analyze time-series data.
To be more precise, it depends on your use case. For example, in my use case (financial instrument price tick history data, still in development) I am able to insert 40,000 documents/sec (~125-byte documents with 11 fields each: 1 timestamp plus strings and decimals, i.e. 5 MB/s of useful data) for 14 hours/day, on a single node (a big modern server with 192 GB of RAM) backed by a corporate SAN (which is itself backed by spinning disks, not SSDs!). I have stored up to 1 TB of data so far, but I expect that 2-4 TB would also work on a single node.
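The figures above can be sanity-checked with a little arithmetic. This snippet only reproduces the numbers quoted in this answer; it counts raw source bytes and ignores indexing overhead, replicas and compression, so the actual on-disk size will differ.

```python
# Back-of-the-envelope check of the ingest figures quoted above.
# Counts raw source bytes only (no index overhead, replicas or compression).
DOCS_PER_SEC = 40_000   # observed insert rate
DOC_BYTES = 125         # approximate size of one tick document
HOURS_PER_DAY = 14      # hours of ingestion per day

bytes_per_sec = DOCS_PER_SEC * DOC_BYTES
bytes_per_day = bytes_per_sec * HOURS_PER_DAY * 3600

print(f"{bytes_per_sec / 1e6:.1f} MB/s")                  # 5.0 MB/s
print(f"{bytes_per_day / 1e9:.0f} GB/day")                # 252 GB/day
print(f"{1e12 / bytes_per_day:.1f} days to reach 1 TB")   # 4.0 days to reach 1 TB
```

In other words, at this rate roughly four trading days of raw data add up to 1 TB.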
All of this is with default config file settings, except for ES_HEAP_SIZE, which is set to 30 GB. I suspect it would be possible to get significantly better write performance on that hardware with some tuning (e.g. I find it strange that iostat reports device utilization of only 25-30%, as if Elasticsearch were capping it or conserving I/O bandwidth for reads or merges... but it could also be that %util is an unreliable metric for SAN devices).
Query performance is also fine: queries and Kibana graphs return quickly as long as you restrict the result dataset by time and/or other fields.
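To illustrate what restricting the dataset means, here is a minimal sketch of a search body that filters on a time range and a symbol before bucketing the results with a `date_histogram` aggregation. The field names `timestamp`, `symbol` and `price` are hypothetical, and `fixed_interval` assumes Elasticsearch 7.2 or later (older versions used `interval`).

```python
import json

def price_histogram_query(symbol, start, end, bucket="1m"):
    """Search body that narrows the data by time range and symbol first,
    then averages the (hypothetical) price field per time bucket."""
    return {
        "size": 0,  # return only the aggregation, not individual hits
        "query": {
            "bool": {
                # Filter clauses are not scored and can be cached, which
                # keeps time-restricted queries cheap.
                "filter": [
                    {"range": {"timestamp": {"gte": start, "lt": end}}},
                    {"term": {"symbol": symbol}},
                ]
            }
        },
        "aggs": {
            "over_time": {
                "date_histogram": {"field": "timestamp", "fixed_interval": bucket},
                "aggs": {"avg_price": {"avg": {"field": "price"}}},
            }
        },
    }

body = price_histogram_query("EURUSD", "2016-03-01T08:00:00Z", "2016-03-01T09:00:00Z")
print(json.dumps(body, indent=2))
```

The body would be sent to something like `GET /ticks-*/_search`; the `ticks-*` index pattern is likewise an assumption. Widening the time range without coarsening `bucket` is what makes such queries slow.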
In this case you would not use Logstash to load your data, but rather bulk inserts of big batches directly into Elasticsearch: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html
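A minimal sketch of such a bulk load using only the Python standard library. It assumes Elasticsearch 7 or later (older versions also expect a `_type` in each action line); the node URL, index name and batch size of 5,000 are illustrative assumptions, not tuned values.

```python
import json
import urllib.request

def batches(docs, size=5000):
    """Group an iterable of documents into lists of at most `size` items."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def build_bulk_body(index, docs):
    """Serialize documents into the newline-delimited _bulk format:
    one action line followed by one source line per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"

def send_bulk(base_url, body):
    """POST one bulk body and fail loudly if any item was rejected."""
    req = urllib.request.Request(
        base_url + "/_bulk",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        result = json.loads(resp.read())
    # _bulk returns HTTP 200 even when individual items fail,
    # so the per-request "errors" flag has to be checked.
    if result.get("errors"):
        raise RuntimeError("some documents in the batch were rejected")
    return result

# Usage (requires a running node; read_ticks() is a hypothetical source):
# for batch in batches(read_ticks(), size=5000):
#     send_bulk("http://localhost:9200", build_bulk_body("ticks-2016.03.01", batch))
```

Batching matters because each HTTP round trip has a fixed cost; the right batch size depends on document size and should be found by measurement.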
You also need to define a mapping (https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html) to make sure Elasticsearch parses your data the way you want (numbers, dates, etc.) and creates the desired level of indexing.
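As an illustration, an explicit mapping for tick documents might look like the sketch below. The field names are hypothetical, and the typeless `mappings` layout assumes Elasticsearch 7 or later (Elasticsearch 2.x nested the properties under a document type and used `string` with `not_analyzed` instead of `keyword`).

```python
import json

# Hypothetical field names for a price-tick document.
TICK_MAPPING = {
    "mappings": {
        "dynamic": "strict",  # reject documents containing unmapped fields
        "properties": {
            "timestamp": {"type": "date"},
            "symbol": {"type": "keyword"},  # exact-match filtering, no text analysis
            "venue": {"type": "keyword"},
            "price": {"type": "double"},
            "volume": {"type": "long"},
        },
    }
}

print(json.dumps(TICK_MAPPING, indent=2))
```

This body could be sent with `PUT /ticks-2016.03.01` when creating an index, or placed in an index template so that every daily index picks it up automatically. Mapping identifiers as `keyword` rather than analyzed text avoids indexing work that metering queries never use.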
Other recommended practices for this use case are to use a separate index for each day (or month/week, depending on your insert rate) and to make sure each index is created with just enough shards to hold one day of data. By default, new indexes are created with 5 shards (Elasticsearch 7.0 later changed this default to 1), and shard performance starts degrading once a shard grows beyond a certain size, usually a few tens of GB, although this may differ for your use case, so you need to measure and experiment.
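The sketch below shows one way to name per-day indexes and pick a primary shard count from the expected daily volume. The `ticks` prefix and the 30 GB per-shard target are assumptions standing in for "a few tens of GB"; as noted above, the right target has to be measured for your data.

```python
import math
from datetime import datetime, timezone

def daily_index(prefix, ts):
    """Name of the per-day index for a timezone-aware timestamp (UTC days)."""
    return f"{prefix}-{ts.astimezone(timezone.utc):%Y.%m.%d}"

def shards_for(expected_bytes_per_day, target_shard_bytes=30 * 10**9):
    """Just enough primary shards to keep each one under the (assumed) target size."""
    return max(1, math.ceil(expected_bytes_per_day / target_shard_bytes))

def index_settings(expected_bytes_per_day):
    """Settings body for creating one daily index."""
    return {
        "settings": {
            "number_of_shards": shards_for(expected_bytes_per_day),
            # On a single node replicas could never be allocated anyway.
            "number_of_replicas": 0,
        }
    }

print(daily_index("ticks", datetime(2016, 3, 1, 23, 30, tzinfo=timezone.utc)))
print(index_settings(252 * 10**9))
```

With the ~252 GB/day of raw data from the example above, this yields 9 primary shards per daily index; since on-disk size differs from raw size, the estimate should be corrected after observing a few real indexes.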
Using Elasticsearch aliases (https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html) helps when dealing with multiple indexes and is generally recommended best practice.
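For example, an alias such as `ticks-recent` (a hypothetical name) can always point at the last few daily indexes. The `_aliases` endpoint applies all actions of one request atomically, so a small helper that builds its body is enough to roll the window forward each day. This is a sketch; the index and alias names are illustrative.

```python
import json

def alias_actions(alias, add=(), remove=()):
    """Body for POST /_aliases. All actions in one request are applied
    atomically, so readers never see a half-updated set of indexes."""
    actions = [{"remove": {"index": i, "alias": alias}} for i in remove]
    actions += [{"add": {"index": i, "alias": alias}} for i in add]
    return {"actions": actions}

# Roll a 7-day window forward: add today's index, drop the one that fell out.
body = alias_actions("ticks-recent", add=["ticks-2016.03.08"], remove=["ticks-2016.03.01"])
print(json.dumps(body, indent=2))
```

Dashboards and queries can then target `ticks-recent` and never need to know which concrete daily indexes exist.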