My Regex for File Filter Attribute in NiFi GetFile Processor is Failing

Question

I have a list of files to copy to HDFS.

The file names are as follows:

  1. Sample-11072016
  2. Sample-11082016
  3. Sample-11062016
  4. Sample-11062016
  5. Denodo-09082016
  6. Denodo-09122016
  7. Denodo-11082016
  8. Denodo-11072016

Now I am trying to write a regex that picks up today's Sample file. The digits after the file name are the date, as in:

Sample-11082016 is the file for the date 11/08/2016.

The regex I tried is [Sample]-(0-9){8}. Since I am only checking for 8 digits, this regex would return the Sample files for all dates. Could you please suggest how to find the file with today's date? The problem is that the file name Sample stays constant while the date keeps changing. I have to write a regex that picks only the file with today's date.

I am pretty new to regex. Is it possible to write a regex that checks whether the date is today's date?

Any suggestions would help. NiFi regex rules are the same as Java regex rules. The regex should be used in the File Filter attribute of the GetFile processor.

Regards,

Sai_PB.

Recommended answer

You're almost there on the regex. By putting "Sample" between square brackets ('[' and ']'), you're saying "the first character should match one of these characters", so [Sample] matches a single character rather than the whole word. Here is a link that explains it a bit more in depth (see the "Character Classes" section).

Also, by putting "0-9" in parentheses, you're saying "capture this group that matches the characters '0-9' exactly", that is, the literal text 0-9 rather than any digit. This is where you want the square brackets.
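To make the two points above concrete, here is a minimal sketch in plain Java (outside NiFi) that compiles the pattern from the question unchanged and checks what it accepts. The class and method names are made up for illustration. It uses `Matcher.matches()`, which requires the whole string to match.

```java
import java.util.regex.Pattern;

final class OriginalPatternDemo {
    // The pattern from the question, unchanged.
    static final Pattern ORIGINAL = Pattern.compile("[Sample]-(0-9){8}");

    // True only if the whole name matches the original pattern.
    static boolean matchesWhole(String name) {
        return ORIGINAL.matcher(name).matches();
    }

    // Builds a name of the shape the original pattern describes:
    // one character from [Sample], a hyphen, then the literal text "0-9" eight times.
    static String literalGroupName() {
        StringBuilder sb = new StringBuilder("S-");
        for (int i = 0; i < 8; i++) {
            sb.append("0-9");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole(literalGroupName()));  // true
        System.out.println(matchesWhole("Sample-11082016"));   // false
    }
}
```

In this sketch the pattern accepts the odd-looking name but rejects a real file name such as Sample-11082016.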

So the regex you should be using is "Sample-[0-9]{8}" (you can use "\d" instead of "[0-9]", but I wanted to keep as much of your initial regex as possible).
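As a quick check of the corrected pattern, the following sketch runs it against the file names listed in the question. It is plain Java rather than NiFi itself, and it assumes the filter has to match the entire file name (hence `matches()`). The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class FileFilterDemo {
    // The corrected pattern from the answer.
    static final Pattern FILE_FILTER = Pattern.compile("Sample-[0-9]{8}");

    // Keeps only the names that match the filter in full.
    static List<String> select(List<String> names) {
        return names.stream()
                .filter(name -> FILE_FILTER.matcher(name).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "Sample-11072016", "Sample-11082016", "Sample-11062016",
                "Denodo-09082016", "Denodo-09122016",
                "Denodo-11082016", "Denodo-11072016");
        // Prints [Sample-11072016, Sample-11082016, Sample-11062016]
        System.out.println(select(names));
    }
}
```

The pattern still selects the Sample files for every date, which is why the answer also relies on the processor's file age setting.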

You can test your regex using this website.

To solve the second problem of picking up only the current day's file, you should be able to use the regex above as the File Filter. Then adjust the "Scheduling Strategy" so the processor runs once a day (after the day's file is expected to have been written). Lastly, set "Maximum File Age" to "24h" (adjust as necessary to be sure only the latest file qualifies). With these settings the processor runs once per day and picks up only a file that matches the filter and is no more than a day old.
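The combined effect of the name filter and the age limit can be sketched in plain Java as follows. This is an illustration only, not NiFi's implementation. The class name, the `accept` method, and the timestamps are made up, and the file's last-modified time is assumed to be the basis for its age.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.regex.Pattern;

final class RecentFileCheck {
    static final Pattern FILE_FILTER = Pattern.compile("Sample-[0-9]{8}");
    // Mirrors the "24h" value suggested for Maximum File Age.
    static final Duration MAX_AGE = Duration.ofHours(24);

    // Accepts a file only if its name matches the filter and it is at most MAX_AGE old.
    static boolean accept(String name, Instant lastModified, Instant now) {
        boolean nameMatches = FILE_FILTER.matcher(name).matches();
        Duration age = Duration.between(lastModified, now);
        boolean freshEnough = age.compareTo(MAX_AGE) <= 0;
        return nameMatches && freshEnough;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2016-11-08T12:00:00Z");
        // Written 2 hours ago: accepted.
        System.out.println(accept("Sample-11082016", now.minus(Duration.ofHours(2)), now));
        // Written 30 hours ago: rejected by the age limit.
        System.out.println(accept("Sample-11072016", now.minus(Duration.ofHours(30)), now));
        // Fresh, but the name does not match: rejected by the filter.
        System.out.println(accept("Denodo-11082016", now.minus(Duration.ofHours(2)), now));
    }
}
```

Note that the selection depends on when the file was written, not on the date in its name, so a file that arrives late or is rewritten could be picked up on the wrong day.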
