ClickHouse Kafka Performance


Problem Description


Following the example from the documentation: https://clickhouse.yandex/docs/en/table_engines/kafka/

I created a table with Kafka Engine and a materialized view that pushes data to a MergeTree table.

Here is the structure of my tables:

CREATE TABLE games (
    UserId UInt32,
    ActivityType UInt8,
    Amount Float32,
    CurrencyId UInt8,
    Date String
  ) ENGINE = Kafka('XXXX.eu-west-1.compute.amazonaws.com:9092,XXXX.eu-west-1.compute.amazonaws.com:9092,XXXX.eu-west-1.compute.amazonaws.com:9092', 'games', 'click-1', 'JSONEachRow', '3');


CREATE TABLE tests.games_transactions (
    day Date,
    UserId UInt32,
    Amount Float32,
    CurrencyId UInt8,
    timevalue DateTime,
    ActivityType UInt8
 ) ENGINE = MergeTree(day, (day, UserId), 8192);


  CREATE MATERIALIZED VIEW tests.games_consumer TO tests.games_transactions
    AS SELECT toDate(replaceRegexpOne(Date,'\\..*','')) as day, UserId, Amount, CurrencyId, toDateTime(replaceRegexpOne(Date,'\\..*','')) as timevalue, ActivityType
    FROM default.games;
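
For context, each Kafka message here would be a single JSON object per line, as JSONEachRow expects. The values below are made up; note the fractional seconds in Date, which the replaceRegexpOne(Date, '\\..*', '') in the view strips off before the toDate/toDateTime conversion:

  {"UserId": 42, "ActivityType": 1, "Amount": 9.99, "CurrencyId": 3, "Date": "2018-04-01 12:00:00.123456"}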

In the Kafka topic I am getting around 150 messages per second.

Everything works fine, apart from the fact that the data in the table are updated with a big delay, definitely not in real time.

It seems that the data are sent from Kafka to the table only once 65536 new messages are ready to be consumed in Kafka.

Should I set some particular configuration?

I tried to change the configurations from the cli:

SET max_insert_block_size=1048
SET max_block_size=655
SET stream_flush_interval_ms=750

But there was no improvement.

Should I change any particular configuration?
Should I have changed the above configurations before creating the tables?

Solution

There is an issue about this on the ClickHouse GitHub: https://github.com/yandex/ClickHouse/issues/2169.

Basically you need to set max_block_size (http://clickhouse-docs.readthedocs.io/en/latest/settings/settings.html#max-block-size) before the table is created; otherwise it will not work.

I used the solution of overriding users.xml:

<yandex>
    <profiles>
        <default>
           <max_block_size>100</max_block_size>
        </default>
    </profiles>
</yandex>
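
To verify the override was actually picked up (assuming you connect as the default user), you can read the setting back from the server:

  SELECT name, value
  FROM system.settings
  WHERE name = 'max_block_size'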

I deleted my table and database and recreated them. It worked for me. Now my tables get updated every 100 records.
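
As a quick sanity check that the materialized view is flushing, you can repeatedly count the rows in the target table and watch it grow in steps:

  SELECT count()
  FROM tests.games_transactions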
