ClickHouse Kafka Performance
Problem description
Following the example from the documentation: https://clickhouse.yandex/docs/en/table_engines/kafka/
I created a table with Kafka Engine and a materialized view that pushes data to a MergeTree table.
Here is the structure of my tables:
CREATE TABLE games (
UserId UInt32,
ActivityType UInt8,
Amount Float32,
CurrencyId UInt8,
Date String
) ENGINE = Kafka('XXXX.eu-west-1.compute.amazonaws.com:9092,XXXX.eu-west-1.compute.amazonaws.com:9092,XXXX.eu-west-1.compute.amazonaws.com:9092', 'games', 'click-1', 'JSONEachRow', '3');
CREATE TABLE tests.games_transactions (
day Date,
UserId UInt32,
Amount Float32,
CurrencyId UInt8,
timevalue DateTime,
ActivityType UInt8
) ENGINE = MergeTree(day, (day, UserId), 8192);
CREATE MATERIALIZED VIEW tests.games_consumer TO tests.games_transactions
AS SELECT toDate(replaceRegexpOne(Date,'\\..*','')) as day, UserId, Amount, CurrencyId, toDateTime(replaceRegexpOne(Date,'\\..*','')) as timevalue, ActivityType
FROM default.games;
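The materialized view above uses `replaceRegexpOne(Date, '\\..*', '')` to strip everything from the first dot before converting the string to a date/time. A minimal Python sketch of the same transform, using a hypothetical JSONEachRow message matching the `games` schema (the exact `Date` format with fractional seconds is an assumption inferred from the regex, not stated in the question):

```python
import json
import re

# Hypothetical message in the JSONEachRow format the Kafka engine expects
# for the `games` table. The fractional seconds in Date are an assumption,
# based on the replaceRegexpOne(Date, '\\..*', '') call in the view.
message = json.dumps({
    "UserId": 42,
    "ActivityType": 1,
    "Amount": 9.99,
    "CurrencyId": 3,
    "Date": "2018-04-20 15:30:00.123456",
})

row = json.loads(message)
# Equivalent of replaceRegexpOne(Date, '\\..*', ''): drop the fractional
# part so toDate()/toDateTime() can parse the value.
clean = re.sub(r"\..*", "", row["Date"])
print(clean)  # 2018-04-20 15:30:00
```

The regex is greedy from the first dot onward, so everything after (and including) the dot is removed in one replacement.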
In the Kafka topic I am getting around 150 messages per second.
Everything works, except that the data in the table is updated with a big delay, definitely not in real time.
It seems that the data is sent from Kafka to the table only once 65536 new messages are ready to consume in Kafka.
Should I set some particular configuration?
I tried to change the configurations from the cli:
SET max_insert_block_size=1048
SET max_block_size=655
SET stream_flush_interval_ms=750
But there was no improvement.
Should I change any particular configuration?
Should I have changed the above configurations before creating the tables?
Solution
There is an issue for this on the ClickHouse GitHub: https://github.com/yandex/ClickHouse/issues/2169.
Basically you need to set max_block_size (http://clickhouse-docs.readthedocs.io/en/latest/settings/settings.html#max-block-size) before the table is created, otherwise it will not work.
I used the solution with overriding users.xml:
<yandex>
<profiles>
<default>
<max_block_size>100</max_block_size>
</default>
</profiles>
</yandex>
I deleted my table and db and recreated them. It worked for me. Now my tables get updated every 100 records.
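A quick back-of-the-envelope check of why this helps, using the numbers from the question and answer (about 150 messages per second on the topic, the 65536-message threshold observed before tuning, and the new block size of 100):

```python
# Approximate time to fill one block, i.e. how often data reaches the table.
msg_rate = 150          # messages per second arriving on the Kafka topic
default_block = 65536   # block size observed before tuning
tuned_block = 100       # max_block_size set in users.xml

print(round(default_block / msg_rate, 1))  # 436.9  (seconds, roughly 7 minutes)
print(round(tuned_block / msg_rate, 2))    # 0.67   (seconds, near real time)
```

So with the default the table lagged by minutes, while a block size of 100 fills in well under a second at this message rate.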