不符合年份模式的抓取日期YYYY-01-01 [英] Grepping dates that do not satisfy year-pattern YYYY-01-01

查看:70
本文介绍了不符合年份模式的抓取日期YYYY-01-01的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要匿名化元数据文件中的出生日期,并编辑月和日字段,例如,我需要将1976-05-25转换为1976-01-01.为了备份,我首先需要测试文件是否包含未编辑的生日.我通常使用grep进行这些测试,像这样

if grep -E PATTERN $file > /dev/null; then cp $file /backups/; fi 

但是,我很难找到一种美观而优雅的方式来完成此任务.我已经尝试过

grep -E '([12][09][0-9][0-9])-(^(01))-(^(01))'

,但不接受例如2001-10-11或任何其他日期.

我当然也可以按照以下方式做一些事情

([12][09][0-9][0-9]-0[0-9]-0[^1]|[12][09][0-9][0-9]-0[0-9]-1[0-9]|...)

但这太复杂且容易出错.

当然,我不希望它接受格式为YYYY-01-01的日期以避免重复备份.

什么是在单个模式中重复显示这些日期的简单方法(读作:优雅)?

解决方案

好吧,无论内容如何,​​我都可能会对其进行备份,但这是因为我有更多的磁盘空间来担心这样的事情:-)

但是,一种方法可能是相反的看待它.计算整个文件中的行,然后使用-01-01计算仅包含模式的行.

如果它们相同,则所有日期都是-01-01品种,不需要备份.

请注意,您需要注意每行是否有多个日期,但是在这种情况下,您可以使用其他过滤器来获取您感兴趣的数据.

例如,考虑文件infile:

2009-01-01 A very good year
2010-02-01 A moderately good year
2011-01-01 A better year
2012-12-31 Not so good
2013-01-01 Back to normal

您可以在所需格式的行的开头检测日期并将其计数,然后将其与完整文件进行比较:

if [[ $(wc -l <infile) -ne $(grep -E '^[0-9]{4}-01-01' infile | wc -l) ]]
then
    echo File needs backing up
fi


另一种可能性是使用-v选项排除 01-01模式:

pax> grep -Ev '[0-9]{4}-01-01' infile
2010-02-01 A moderately good year
2012-12-31 Not so good

这相对容易从if语句中检测出来:

if [[ ! -z "$(grep -Ev '^[0-9]{4}-01-01' infile)" ]] ; then
    echo File needs backing up
fi

I need to anonymize birth dates in metadata files and redact the month and day fields, e.g., I need to convert 1976-05-25 into 1976-01-01. For backup purposes, I first need to test whether a file contains a non-redacted birth date. I ususally use grep for these tests, like this

if grep -E PATTERN $file > /dev/null; then cp $file /backups/; fi 

However, I struggle to find a nice and elegant pattern for this task. I've tried

grep -E '([12][09][0-9][0-9])-(^(01))-(^(01))'

but it does not accept, e.g., 2001-10-11 or any other date.

I could of course also do something along the lines of

([12][09][0-9][0-9]-0[0-9]-0[^1]|[12][09][0-9][0-9]-0[0-9]-1[0-9]|...)

but this is too complicated and error prone.

Of course, I do not want it to accept dates of the form YYYY-01-01 to avoid a double-backup.

What is a simple (read: elegant) way to grep these dates in a single pattern?

解决方案

Well, I would probably just back it up regardless of content but that's because I have more disk space than time to worry about things like this :-)

However, one approach could be to look at it in reverse. Count the lines in the full file then count the lines containing just the pattern with -01-01.

If they're the same then all the dates are of the -01-01 variety and no backup is needed.

Just be aware you need to watch out if there are multiple dates per line but, in that case, you could use other filters to get just the data you're interested in.

As an example, consider the file infile:

2009-01-01 A very good year
2010-02-01 A moderately good year
2011-01-01 A better year
2012-12-31 Not so good
2013-01-01 Back to normal

You can detect dates at the start of the line of the format you want and count them, comparing that to the full file:

if [[ $(wc -l <infile) -ne $(grep -E '^[0-9]{4}-01-01' infile | wc -l) ]]
then
    echo File needs backing up
fi


One other possibility would be to exclude the 01-01 patterns with the -v option:

pax> grep -Ev '[0-9]{4}-01-01' infile
2010-02-01 A moderately good year
2012-12-31 Not so good

This is relatively easy to detect from an if statement:

if [[ ! -z "$(grep -Ev '^[0-9]{4}-01-01' infile)" ]] ; then
    echo File needs backing up
fi

这篇关于不符合年份模式的抓取日期YYYY-01-01的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆