Using Apache NiFi to write CSV files by contents of column
Question
I have an Apache NiFi flow where I read in a massive .csv file. Here's a sample .csv:
school, date, city
Vanderbilt, xxxx, xxxx
Georgetown, xxxx, xxxx
Duke, xxxx, xxxx
Vanderbilt, xxxx, xxxx
I want to use NiFi to read the file and then output a separate .csv file for each school name. That is, there would be one .csv file containing the two Vanderbilt records (two lines total, because there are two records), one file for Georgetown, and one file for Duke.
I've used GetFile to pull in my file (works, verified), then SplitText (line split count = 1 and header line count = 1), then ExtractText, but my configuration for that one is very wrong. Lastly, I have PutFile, which writes to where I need it to go. Thanks.
Recommended answer
Take a look at NiFi's record processing capabilities. You will want to use PartitionRecord to partition on the school field, which will produce exactly what you are describing.
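To make the idea of partitioning concrete, here is a minimal Python sketch of the grouping that PartitionRecord is expected to perform on the sample data: rows with the same school value end up together, each group with its own header. This is only an illustration of the semantics, not NiFi code; the function name partition_by_column is hypothetical. In NiFi itself, the usual setup (an assumption about your flow, not something stated in the answer) is a record reader and writer such as CSVReader and CSVRecordSetWriter, plus a user-defined property named school whose value is the RecordPath /school.

```python
import csv
import io
from collections import defaultdict


def partition_by_column(csv_text, column):
    """Group CSV rows by the value of `column`.

    Returns a dict mapping each distinct value to CSV text
    (header plus the matching rows). Hypothetical helper that
    only mimics what a partition-by-field step produces.
    """
    # skipinitialspace handles the "school, date, city" style in the sample
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    fieldnames = reader.fieldnames

    groups = defaultdict(list)
    for row in reader:
        groups[row[column]].append(row)

    outputs = {}
    for key, rows in groups.items():
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        outputs[key] = buf.getvalue()
    return outputs


sample = """school, date, city
Vanderbilt, xxxx, xxxx
Georgetown, xxxx, xxxx
Duke, xxxx, xxxx
Vanderbilt, xxxx, xxxx
"""

partitions = partition_by_column(sample, "school")
for school, text in partitions.items():
    # One output per school, analogous to one FlowFile per partition
    print(f"{school}.csv: {len(text.splitlines()) - 1} record(s)")
```

In a NiFi flow, each partition would arrive at PutFile as its own FlowFile carrying a school attribute, which could then be used to set the filename (for example with UpdateAttribute) so that files for different schools do not overwrite each other.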