Extract filename and store the name in a new column in a CSV file
Question
I want to extract the file name and store it in one of the existing columns of the CSV file. How can I do this? Which processor should I use, and with what configuration? For example, I have a file named 'FE_CHRGRSIM_20171207150616_CustRec.csv'; I want to extract 'FE_CHRGRSIM_20171207150616' and store this value under an existing column in the same CSV file. Please help. TIA
Recommended answer
Usually the "real" file name is available as an attribute on the flow file called "filename". You can use UpdateRecord with a Replacement Value Strategy of "Literal Value": add a user-defined property called /filename and set its value to ${filename:substringBeforeLast('.')}. You'll need to make sure that the "filename" field is in your schema (added either by UpdateRecord or manually). If you don't know your CSV schema ahead of time, you can use InferAvroSchema, which will try to figure it out.
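To make the effect of that expression concrete, here is a minimal Python sketch that imitates it outside NiFi. It is not NiFi code: the helper names `substring_before_last` and `set_filename_column` are hypothetical, and the sketch assumes the CSV has a header row and that the target column is called "filename".

```python
import csv
import io


def substring_before_last(value: str, delimiter: str) -> str:
    """Imitate the Expression Language call substringBeforeLast(delimiter).

    Returns everything before the last occurrence of the delimiter, or the
    whole value when the delimiter does not occur.
    """
    head, sep, _tail = value.rpartition(delimiter)
    return head if sep else value


def set_filename_column(csv_text: str, filename: str, column: str = "filename") -> str:
    """Write the extension-less file name into `column` for every record.

    Assumes (for illustration) that the first CSV line is a header row.
    The column is appended to the header if it is not already present.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or [])
    if column not in fieldnames:
        fieldnames.append(column)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()

    value = substring_before_last(filename, ".")
    for row in reader:
        row[column] = value  # overwrite or fill the target column
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    name = "FE_CHRGRSIM_20171207150616_CustRec.csv"
    print(substring_before_last(name, "."))  # strips only the extension
    print(substring_before_last(name, "_"))  # strips the trailing _CustRec.csv
    print(set_filename_column("id,filename\n1,\n2,\n", name))
```

Note that splitting on '.' keeps the "_CustRec" suffix. If the goal is exactly 'FE_CHRGRSIM_20171207150616' as in the question, splitting on the last '_' instead, for example ${filename:substringBeforeLast('_')}, appears to be what is needed.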
If UpdateRecord and the schema setup don't seem to be working for you, an alternative (since it's CSV) is to use ReplaceText: match the entire line, then replace it with the matched line followed by ,${filename:substringBeforeLast('.')}. That should add the file name (with the extension removed) as the last column in the outgoing CSV.
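The line-by-line replacement can be sketched in Python as follows. This only imitates the idea and is not the ReplaceText processor itself; the function name `append_filename_to_lines` is hypothetical, and the sketch assumes "\n" line endings and no quoted fields that span multiple lines.

```python
import re


def append_filename_to_lines(csv_text: str, filename: str) -> str:
    """Append the extension-less file name as a last column on every line.

    Assumes plain "\n" line endings and that no quoted field contains a
    line break. Empty lines are left untouched.
    """
    # Drop everything from the last '.' onward; keep the name as-is if no '.'
    base = filename.rsplit(".", 1)[0]
    # (?m) lets ^ and $ anchor at each line; the lambda avoids any special
    # handling of backslashes or group references inside the file name.
    return re.sub(r"(?m)^(.+)$", lambda m: f"{m.group(1)},{base}", csv_text)


if __name__ == "__main__":
    sample = "id,amount\n1,10\n2,20\n"
    print(append_filename_to_lines(sample, "FE_CHRGRSIM_20171207150616_CustRec.csv"))
```

One consequence of matching every line is that a header row, if present, also receives the file name as its last field instead of a column name, so this approach fits best when the CSV has no header or the header is handled separately.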