RegEx: get the last match of a date format from a string inside a Google Sheets cell

Problem description

My goal is to use a regex in Google Sheets (the REGEXEXTRACT function) to extract a date string and the characters that follow it. The string is the last line of the cell and starts with a date in the format "yyyy-MM-DD" followed by ":". The regular expression I currently have looks like:

\d{4}-\d{2}-\d{2}:.+

This works, but it returns the first match. Instead, I want to start at the end of the cell and extract the last match when there are multiple date strings, because the entries in the cell are stored in ascending date order.

Sample cell:

2020-05-20: Status update blah blah
2020-05-27: PO Issued blah blah

Requested result: the string that starts with the last date, plus the characters that follow it, i.e. "2020-05-27: PO Issued blah blah". However, I always get the first match, which in the example above is "2020-05-20: Status update blah blah".

Also, I'm doing this in Google Sheets using REGEXEXTRACT(), which shouldn't make a difference to the regex, but I wanted to mention it.

Edit: I found out that Sheets uses RE2, so I guess it does make a difference.

Solution

You may use

=REGEXEXTRACT(A1, "(?m)^\d{4}-\d{2}-\d{2}:.*\z")
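To try the formula outside of Sheets, here is a minimal Python sketch of the same extraction. It assumes Python's re module as a stand-in for RE2 and therefore uses \Z, which in Python matches only at the very end of the string (the meaning RE2 gives to \z). The helper name regexextract is hypothetical and only approximates REGEXEXTRACT for patterns with at most one capture group.

```python
import re

# Python's \Z means "end of string only", like RE2's \z in the Sheets formula.
LAST_DATED_LINE = re.compile(r"(?m)^\d{4}-\d{2}-\d{2}:.*\Z")


def regexextract(text, pattern):
    """Hypothetical helper approximating REGEXEXTRACT for simple patterns:
    return the first capture group if the pattern has one, otherwise the
    whole match; return None when nothing matches."""
    m = re.search(pattern, text)
    if m is None:
        return None
    return m.group(1) if m.re.groups else m.group(0)


cell = "2020-05-20: Status update blah blah\n2020-05-27: PO Issued blah blah"
print(regexextract(cell, LAST_DATED_LINE))  # 2020-05-27: PO Issued blah blah
```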

See the RE2 regex demo and the Google Sheets screenshot:

The (?m)^\d{4}-\d{2}-\d{2}:.*\z regex matches:

  • (?m) - a MULTILINE modifier that makes ^ match the start of a line and $ match the end of a line
  • ^ - start of a line
  • \d{4}-\d{2}-\d{2}:.* - 4 digits, -, 2 digits, -, 2 digits, :, and then the rest of the line, since . does not match line break characters by default
  • \z - the very end of the string (it is not affected by the (?m) modifier).
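The role of the end-of-string anchor in the list above can be checked with a small Python sketch, again assuming Python's \Z as the equivalent of RE2's \z. Under (?m), $ is already satisfied at the end of the first line, so only the end-of-string anchor forces the match onto the final line, and it also means nothing is returned when the final line is not dated.

```python
import re

cell = (
    "2020-05-20: Status update blah blah\n"
    "2020-05-27: PO Issued blah blah"
)

# Under (?m), $ matches at the end of every line, so the first dated line wins.
with_dollar = re.search(r"(?m)^\d{4}-\d{2}-\d{2}:.*$", cell).group(0)

# The end-of-string anchor (\Z in Python, \z in RE2) only matches at the very end,
# so the only dated line that can match is the final line of the text.
last = re.search(r"(?m)^\d{4}-\d{2}-\d{2}:.*\Z", cell).group(0)

# If the final line does not start with a date, the anchored pattern finds nothing.
no_match = re.search(r"(?m)^\d{4}-\d{2}-\d{2}:.*\Z", cell + "\nfollow-up note")

print(with_dollar)  # 2020-05-20: Status update blah blah
print(last)         # 2020-05-27: PO Issued blah blah
print(no_match)     # None
```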

Note that the (?s).*\n(\d{4}-\d{2}-\d{2}:.*) pattern I suggested in the top comment below the question will also work: it captures everything from the last line that starts with a date to the end of the cell (see the regex demo).
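The behavior of that alternative pattern can be sketched in Python as well. Two consequences follow from the pattern itself: because (?s) also lets the captured .* cross line breaks, any lines after the last dated line end up in the capture, and because the pattern requires a \n before the date, a cell holding a single dated line does not match. The function name last_dated_tail is hypothetical.

```python
import re

# Greedy (?s).* runs to the end of the text, then backtracks to the last "\n"
# that is followed by a date; group 1 captures from there to the end.
ALT = re.compile(r"(?s).*\n(\d{4}-\d{2}-\d{2}:.*)")


def last_dated_tail(text):
    """Hypothetical helper: return group 1 of the alternative pattern, or None."""
    m = ALT.search(text)
    return m.group(1) if m else None


cell = "2020-05-20: Status update blah blah\n2020-05-27: PO Issued blah blah"
print(last_dated_tail(cell))                       # 2020-05-27: PO Issued blah blah
print(repr(last_dated_tail(cell + "\nfollow-up note")))
print(last_dated_tail("2020-05-20: only one line"))  # None
```

The second call returns '2020-05-27: PO Issued blah blah\nfollow-up note', which is the difference from the \z version: that one requires the dated line to be the final line, while this one keeps whatever follows the last dated line.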
