Apache NiFi for Importing Data from RDBMS to HDFS - Performance Comparison with Sqoop
Question
We are exploring Apache NiFi as a general-purpose data ingestion tool for our enterprise requirements. One typical data ingestion requirement is moving data from RDBMS systems to HDFS.

I was able to build an RDBMS-to-HDFS data movement flow in NiFi using the GenerateTableFetch and ExecuteSQL processors, and everything worked fine for smaller tables. But I couldn't test the flow on bigger tables, as I was using a standalone distribution.

Has anyone done a performance comparison of NiFi and Sqoop for similar requirements?
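For context, GenerateTableFetch partitions a source table into a series of paged SELECT statements that a downstream processor then executes. The sketch below illustrates that pagination pattern in standalone Python, using the built-in sqlite3 module as a stand-in RDBMS; the table name, page size, and LIMIT/OFFSET style are illustrative assumptions, not NiFi's exact generated SQL.

```python
import sqlite3

def paged_queries(table, row_count, page_size):
    """Yield paged SELECT statements, mimicking the partitioned
    fetches that GenerateTableFetch hands to ExecuteSQL."""
    for offset in range(0, row_count, page_size):
        yield f"SELECT * FROM {table} LIMIT {page_size} OFFSET {offset}"

# Stand-in RDBMS: an in-memory SQLite table with 10 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(i, f"row-{i}") for i in range(10)])

# Each page becomes one result set, analogous to one FlowFile.
pages = [conn.execute(sql).fetchall()
         for sql in paged_queries("events", row_count=10, page_size=4)]
print([len(p) for p in pages])  # → [4, 4, 2]
```

The point of the split is parallelism: each generated statement is independent, so in NiFi the pages can be fetched concurrently across the cluster, which is where the comparison with Sqoop's mapper-based parallel import becomes interesting.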
Answer
ExecuteSQL and ExecuteSQLRecord are the better choices. The former just automatically converts result sets into an Avro sequence; the latter gives you more freedom in how you write the output (JSON, CSV, etc.). One nice thing about ExecuteSQL is that you can chain it with MergeRecord to combine multiple modest-sized result pages into a much bigger block of data, and MergeRecord can use the ParquetWriter to give you ready-made Parquet for insertion into HDFS.
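The ExecuteSQL → MergeRecord pattern described above boils down to accumulating many small result pages and emitting a block only once it crosses a size threshold. A toy sketch of that batching logic in plain Python (the threshold, record shape, and flush-at-end behavior are illustrative; the real MergeRecord operates on NiFi FlowFiles via record readers/writers and also supports time- and size-based bin limits):

```python
def merge_pages(pages, min_records):
    """Accumulate small result pages into larger blocks,
    loosely mimicking MergeRecord's bin-filling behavior."""
    batch, blocks = [], []
    for page in pages:
        batch.extend(page)
        if len(batch) >= min_records:  # bin is full: emit a merged block
            blocks.append(batch)
            batch = []
    if batch:                          # flush whatever remains at the end
        blocks.append(batch)
    return blocks

# Five pages of 4 records each, merged into blocks of at least 10 records.
pages = [[{"id": i * 4 + j} for j in range(4)] for i in range(5)]
blocks = merge_pages(pages, min_records=10)
print([len(b) for b in blocks])  # → [12, 8]
```

Merging before writing matters for HDFS and Parquet in particular: many small files hurt both NameNode memory and columnar compression, so batching pages into larger blocks before the writer runs is usually the right design.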