Distributed logs in Cassandra

Question

I am looking for a way to store application logs in Cassandra.

I have a three-node setup (Node 1, Node 2 and Node 3). My web application runs as a cluster on all three nodes behind a load balancer, so logs are generated on every node.

Cassandra also runs on all three nodes, and all three web application instances dump their logs into the Cassandra cluster, which is partitioned by day.

Problems with this approach:
1) I am using my web application to write the data to Cassandra.
2) For every daily partition, the amount of data is very high.

So is there a better approach for this?

Is this a good design approach?

Recommended answer

Storing logs in Cassandra is a debatable choice, because analyzing that data becomes difficult, though doable. ELK (Elasticsearch, Logstash, Kibana) or Splunk are more popular choices for log analysis because of their native text search support and dashboards.

Having said that, let's look at the problems at hand.

1) I am using my web application to write the data to Cassandra.

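Suggestions that come to mind:

  • Are the writes done asynchronously? That is recommended.
  • What consistency level is used for these writes? The higher the consistency level, the higher the latency seen by the web app, since it waits longer on C* (assuming synchronous writes). Remember that C* can still have RF = 3 while you write at consistency level ONE.
  • What happens if the C* cluster goes down? Does the web app go down with it?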
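The suggestions above (asynchronous writes, and a web app that survives a Cassandra outage) can be sketched as a small in-process buffer. This is only a minimal illustration, not the answer's own implementation: `AsyncLogWriter`, `sink` and `max_pending` are hypothetical names, and `sink` stands in for whatever actually performs the insert (for example a driver call issued at a low consistency level such as ONE).

```python
import queue
import threading

_STOP = object()  # sentinel that tells the worker thread to exit


class AsyncLogWriter:
    """Hypothetical helper: buffers log records and writes them off the request path."""

    def __init__(self, sink, max_pending=10_000):
        # `sink` is an assumed callable that performs the real write (e.g. to Cassandra).
        self._sink = sink
        self._queue = queue.Queue(maxsize=max_pending)
        self.dropped = 0  # records rejected because the buffer was full (best-effort count)
        self.failed = 0   # records whose write raised an exception
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def log(self, record):
        """Called by the web app; returns immediately instead of waiting on the database."""
        try:
            self._queue.put_nowait(record)
            return True
        except queue.Full:
            # Shed load rather than block requests when the sink is slow or down.
            self.dropped += 1
            return False

    def _drain(self):
        while True:
            record = self._queue.get()
            try:
                if record is _STOP:
                    return
                self._sink(record)
            except Exception:
                # A failing cluster must not take the web app down with it.
                self.failed += 1
            finally:
                self._queue.task_done()

    def close(self):
        """Flush everything already queued, then stop the worker."""
        self._queue.put(_STOP)
        self._worker.join()


if __name__ == "__main__":
    written = []
    writer = AsyncLogWriter(written.append)
    writer.log({"level": "INFO", "msg": "request served"})
    writer.close()
    print(written)
```

The design choice here is to drop records when the buffer is full rather than block the request thread. Whether losing log lines is acceptable depends on the application, so treat the drop policy as an assumption to revisit.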

2) For every daily partition, the amount of data is very high.

  • There are two problems here: fat partitions, and the same node being hit for the entire day (resulting in a hot spot). The workload isn't being distributed across the whole cluster.
  • Partitions can be sized hourly instead of daily. But that only shrinks the window during which one node is hit from a day to an hour. It is still a hot spot for that hour.
  • You could partition at the "second" level to get a uniform distribution of data across nodes without creating huge partitions (depending on how chatty the app is). But this is where the merits of C* for log monitoring become questionable.
  • What queries would C* have to answer? How would I aggregate the second-level partitions and answer the various questions that come up during typical log analysis?
  • Revisit the design starting from the log analysis questions (queries) that this C* database would have to answer. The answers should then line up automagically.
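To make the trade-off in the list above concrete, here is a minimal Python sketch of time-bucketed partition keys. It assumes that the partition key is simply a formatted time bucket and reads the "second" level as per-second buckets, which is an interpretation of the answer. The names `partition_key`, `partitions_for_range` and the bucket formats are hypothetical. The sketch shows that a finer bucket spreads writes across more partitions but multiplies the number of partitions a range query has to read and aggregate.

```python
from datetime import datetime, timedelta

# Hypothetical bucket formats and step sizes, chosen only for illustration.
BUCKET_FORMATS = {
    "day": "%Y-%m-%d",
    "hour": "%Y-%m-%dT%H",
    "second": "%Y-%m-%dT%H:%M:%S",
}
BUCKET_STEPS = {
    "day": timedelta(days=1),
    "hour": timedelta(hours=1),
    "second": timedelta(seconds=1),
}


def truncate(ts, granularity):
    """Round a timestamp down to the start of its bucket."""
    if granularity == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if granularity == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if granularity == "second":
        return ts.replace(microsecond=0)
    raise ValueError(f"unknown granularity: {granularity}")


def partition_key(ts, granularity):
    """Partition key that a log record written at `ts` would land in."""
    return truncate(ts, granularity).strftime(BUCKET_FORMATS[granularity])


def partitions_for_range(start, end, granularity):
    """Keys of every partition a query over [start, end) would have to read."""
    step = BUCKET_STEPS[granularity]
    current = truncate(start, granularity)
    keys = []
    while current < end:
        keys.append(partition_key(current, granularity))
        current += step
    return keys


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 10, 0, 0)
    end = datetime(2024, 1, 1, 11, 0, 0)
    for granularity in ("day", "hour", "second"):
        count = len(partitions_for_range(start, end, granularity))
        print(granularity, count)  # day 1, hour 1, second 3600
```

Reading one hour of logs touches a single partition with daily or hourly buckets, but 3600 partitions with per-second buckets. That fan-out is the aggregation cost the answer asks you to weigh against the queries you actually need.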
