Transform date format inside CSV using Apache NiFi

Problem description

I need to modify a CSV file in an Apache NiFi environment.

My CSV file looks like this:

Advertiser ID,Campaign Start Date,Campaign End Date,Campaign Name
10730729,1/29/2020 3:00:00 AM,2/20/2020 3:00:00 AM,Nestle
40376079,2/1/2020 3:00:00 AM,4/1/2020 3:00:00 AM,Heinz
...

I want to transform the dates with AM/PM values into a simple date format, for example from 1/29/2020 3:00:00 AM to 2020-01-29, in every row. I read about the UpdateRecord processor, but there is a problem. As you can see, the CSV headers contain spaces, and I can't reference these fields with either Replacement Value Strategy (Literal Value or Record Path Value).

Any ideas on how to solve this problem? Maybe I should somehow rename the headers, e.g. from Advertiser ID to advertiser_id?

Recommended answer

You don't actually need to do the transformation yourself; you can let your Readers and Writers handle it for you. For the CSV Reader to recognize the dates, though, you need to define a schema for your rows. Your schema would look something like this (I've removed the spaces from the column names because Avro field names cannot contain them):

{
    "type": "record",
    "name": "ExampleCSV",
    "namespace": "Stackoverflow",
    "fields": [
        {"name": "AdvertiserID", "type": "string"},
        {"name": "CampaignStartDate", "type" : {"type": "long", "logicalType" : "timestamp-micros"}},
        {"name": "CampaignEndDate", "type" : {"type": "long", "logicalType" : "timestamp-micros"}},
        {"name": "CampaignName", "type": "string"}
    ]
}

To configure the reader, set the following properties:

  • Schema Access Strategy = Use 'Schema Text' Property
  • Schema Text = (the schema shown above)
  • Treat First Line as Header = True
  • Timestamp Format = "MM/dd/yyyy hh:mm:ss a"
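
To see what the reader's Timestamp Format pattern is meant to match, here is a minimal Python sketch. It is not NiFi code: `READ_FORMAT` and `parse_campaign_timestamp` are hypothetical names, and the `strptime` directives are only an assumed Python counterpart of the Java-style pattern "MM/dd/yyyy hh:mm:ss a" (with an English/C locale for the AM/PM marker).

```python
from datetime import datetime

# Assumed Python counterpart of the Java-style pattern "MM/dd/yyyy hh:mm:ss a".
# %I is the 12-hour clock and %p the AM/PM marker (locale dependent).
READ_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def parse_campaign_timestamp(text: str) -> datetime:
    """Parse a value such as '1/29/2020 3:00:00 AM' into a datetime."""
    return datetime.strptime(text, READ_FORMAT)


if __name__ == "__main__":
    parsed = parse_campaign_timestamp("1/29/2020 3:00:00 AM")
    print(parsed.isoformat())
```

The sample values from the question use single-digit months and hours; Python's `strptime` accepts them with these directives, but whether a given NiFi version is equally lenient is worth checking against your own data.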

Additionally, if you don't want to, or are unable to, change the upstream system to remove the spaces, you can set this property to ignore the CSV header:

  • Ignore CSV Header Column Names = True
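
The effect of ignoring the header can be pictured as: skip the first line and name the columns by position from the schema instead. A rough Python sketch of that idea follows; it is an illustration only, not how NiFi implements it, and `SCHEMA_FIELDS` and `read_with_schema_names` are hypothetical names.

```python
import csv
import io

# Field names taken from the schema in the answer, in column order.
SCHEMA_FIELDS = ["AdvertiserID", "CampaignStartDate", "CampaignEndDate", "CampaignName"]


def read_with_schema_names(csv_text: str) -> list:
    """Skip the header line and map each row's values onto the schema names."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # discard the original header (the one with spaces)
    return [dict(zip(SCHEMA_FIELDS, row)) for row in reader]


if __name__ == "__main__":
    sample = (
        "Advertiser ID,Campaign Start Date,Campaign End Date,Campaign Name\n"
        "10730729,1/29/2020 3:00:00 AM,2/20/2020 3:00:00 AM,Nestle\n"
    )
    for record in read_with_schema_names(sample):
        print(record)
```

Because the mapping is positional, the order of the fields in the schema has to match the order of the columns in the file.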

Then, in your CSVRecordSetWriter service, you can specify the following:

  • Schema Access Strategy = Inherit Record Schema
  • Timestamp Format = "yyyy-MM-dd"
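
On the writing side, the timestamp fields are rendered with the writer's pattern and the header comes from the schema's field names. A small Python sketch of that step, under the assumption that "%Y-%m-%d" is the Python counterpart of "yyyy-MM-dd" (`WRITE_FORMAT` and `write_records` are hypothetical names, not NiFi APIs):

```python
import csv
import io
from datetime import datetime

# Assumed Python counterpart of the Java-style pattern "yyyy-MM-dd".
WRITE_FORMAT = "%Y-%m-%d"


def write_records(records: list) -> str:
    """Write records as CSV, formatting datetime values with WRITE_FORMAT."""
    if not records:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(records[0].keys()), lineterminator="\n")
    writer.writeheader()  # header uses the schema names, without spaces
    for record in records:
        writer.writerow({
            name: value.strftime(WRITE_FORMAT) if isinstance(value, datetime) else value
            for name, value in record.items()
        })
    return out.getvalue()


if __name__ == "__main__":
    rows = [{
        "AdvertiserID": "10730729",
        "CampaignStartDate": datetime(2020, 1, 29, 3, 0, 0),
        "CampaignEndDate": datetime(2020, 2, 20, 3, 0, 0),
        "CampaignName": "Nestle",
    }]
    print(write_records(rows), end="")
```

Note that the time-of-day part is simply dropped by the output pattern, which is what the question asks for.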

You can use UpdateRecord or ConvertRecord (or any other processor that lets you specify both a reader and a writer), and it will do the conversion for you. The difference between the two is that UpdateRecord requires you to specify a user-defined property, so if this is the only change you need, just use ConvertRecord. If you have other transformations, use UpdateRecord and make those changes at the same time.

Caveat: This will rewrite the file using the new column names (in my example, the ones without spaces), so keep that in mind for downstream usage.
