Apache NiFi For Importing Data From RDBMS to HDFS - Performance Comparison with SQOOP
Question
We are exploring Apache NiFi as a general purpose data ingestion tool for our enterprise requirements.
One typical data ingestion requirement is moving data from RDBMS systems to HDFS.
I was able to build an RDBMS-to-HDFS data movement flow in NiFi using the GenerateTableFetch and ExecuteSQL processors provided by NiFi, and everything worked fine for smaller tables.
But I couldn't test the flow for bigger tables, as I was using a standalone distribution.
Has anyone done a performance comparison of NiFi with SQOOP for similar requirements?
Answer
ExecuteSQL and ExecuteSQLRecord are a better choice. The former will just automatically convert result sets into an Avro sequence. The latter gives you more freedom in how you write the output (JSON, CSV, etc.). One nice thing about ExecuteSQL is that you can chain it with MergeRecord to combine multiple modest-sized result pages into a much bigger block of data, and MergeRecord can use the ParquetWriter to give you ready-made Parquet for insertion into HDFS.
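As a rough illustration (not part of the original answer), the chain described above could be wired up along these lines; the controller-service names reflect common NiFi conventions, and the property values shown are placeholders to tune for your workload:

```
GenerateTableFetch                 # emits paged SQL queries per source table
  └─> ExecuteSQLRecord             # runs each query
        Record Writer: AvroRecordSetWriter
  └─> MergeRecord                  # bins many small result pages into one large file
        Record Reader: AvroReader
        Record Writer: ParquetRecordSetWriter
        Minimum Number of Records: 100000   (placeholder)
        Max Bin Age: 5 min                  (placeholder)
  └─> PutHDFS                      # lands the merged Parquet file
        Directory: /data/ingest/${table.name}   (placeholder path)
```

With this layout, MergeRecord governs the size of the files that reach HDFS, so you avoid the small-files problem regardless of the page size GenerateTableFetch uses.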