Text with \n matching in regex and OpenRefine

Question

I'm trying to filter text that contains newlines in OpenRefine.

The input is:

Them Spanish girls love me like I'm Aventura
I'm the man, y'all don't get it, do ya?
Type of money, everybody acting like they knew ya
Go Uptown, New York City, bitch
Them Spanish girls love me like I'm Aventura
Tell Uncle Luke I'm out in Miami, too
Them Spanish girls love me like I'm Aventura

The expected result is:

Type of money, everybody acting like they knew ya
Go Uptown, New York City, bitch
Them Spanish girls love me like I'm Aventura

I'm trying to get the line containing the keyword together with the lines before and after it.
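As a baseline for checking any regex, the goal (the keyword line plus its neighbors) can be sketched in plain Python without a regex at all. This is a different technique from the one the question asks about; the helper name context_lines and its before/after parameters are hypothetical, and a plain substring test stands in for the \b word boundaries.

```python
def context_lines(text, keyword, before=1, after=1):
    """Hypothetical helper: for every line containing `keyword`, return that
    line joined with up to `before` lines above it and `after` lines below it."""
    lines = text.split("\n")
    blocks = []
    for i, line in enumerate(lines):
        if keyword in line:  # plain substring test, no word boundaries
            start = max(0, i - before)           # clamp at the first line
            end = min(len(lines), i + after + 1)  # clamp at the last line
            blocks.append("\n".join(lines[start:end]))
    return blocks


text = "\n".join([
    "Them Spanish girls love me like I'm Aventura",
    "I'm the man, y'all don't get it, do ya?",
    "Type of money, everybody acting like they knew ya",
    "Go Uptown, New York City, bitch",
    "Them Spanish girls love me like I'm Aventura",
    "Tell Uncle Luke I'm out in Miami, too",
    "Them Spanish girls love me like I'm Aventura",
])

print("\n".join(context_lines(text, "New York")))
```

Because the window is clamped, a keyword on the first or last line still returns whatever neighbors exist instead of failing.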

My code to do this with a standard regex looks like this:

/((.*\n){2})^.*\b(New York)\b.*((.*\n){3})/m

But that doesn't work in OpenRefine. I tried the following, but it only returns null:

value.match(/.*(\New York)/.*)

Does anyone have an idea how I could do this? I really need to keep the line breaks, so I can't do a replace(/\n/,'') before the match.

Recommended answer

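The brand-new OpenRefine 3 has a find() function that is much more user-friendly than match(). match() only succeeds when the regex matches the entire cell value, and then returns just the capture groups (otherwise null); find() returns every substring that matches.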
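That difference explains the null in the question. Here is a rough Python analogy, assuming re.fullmatch approximates GREL match() and re.finditer approximates GREL find(). GREL itself runs on Java regex, so this is an illustration rather than OpenRefine code.

```python
import re

text = "Type of money\nGo Uptown, New York City\nThem Spanish girls"

# Rough analogue of GREL match(): the pattern must cover the WHOLE string.
# '.' does not match '\n', so a multi-line cell can never match fully.
whole = re.fullmatch(r".*(New York).*", text)
print(whole)  # None

# Rough analogue of GREL find(): every matching substring is returned.
found = [m.group(0) for m in re.finditer(r".*New York.*", text)]
print(found)  # ['Go Uptown, New York City']
```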

I think this regex should do the trick:

value.find(/(.*\n){1}.+New York.+(\n.*){1}/).join('\n')
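To sanity-check that pattern outside OpenRefine, here is a minimal Python sketch that applies the same regex to the sample input. It assumes Python's re behaves like Java's regex engine for this input; it does here, because in both engines '.' stops at '\n' and the input contains no other line terminators.

```python
import re

text = "\n".join([
    "Them Spanish girls love me like I'm Aventura",
    "I'm the man, y'all don't get it, do ya?",
    "Type of money, everybody acting like they knew ya",
    "Go Uptown, New York City, bitch",
    "Them Spanish girls love me like I'm Aventura",
    "Tell Uncle Luke I'm out in Miami, too",
    "Them Spanish girls love me like I'm Aventura",
])

# Same pattern as the GREL call: a whole line ending in '\n', a line that
# contains "New York", then '\n' plus the following line ({1} is redundant).
pattern = re.compile(r"(.*\n){1}.+New York.+(\n.*){1}")

# re.findall would return only the capture groups here, so take group(0)
# from finditer to get whole matches, which is what GREL find() returns.
matches = [m.group(0) for m in pattern.finditer(text)]
result = "\n".join(matches)
print(result)
```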

If for some reason you prefer to stay in OpenRefine 2.8, Python/Jython offers an alternative:

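import re
# '.' does not match '\n', so each '.+' stays on one line:
# the line before, the line containing "New York", and the line after
matches = re.findall(r".+?\n.+New York.+\n.+", value)
return "\n".join(matches)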
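The bare return works because OpenRefine runs the Jython expression as the body of a function with value bound to the cell. To try the same logic in ordinary Python, the body can be wrapped in a function. This is a minimal sketch that assumes that wrapping; the name jython_expression is hypothetical.

```python
import re


def jython_expression(value):
    # Hypothetical wrapper: body copied from the Jython expression above.
    # The pattern has no capture groups, so findall returns whole matches.
    matches = re.findall(r".+?\n.+New York.+\n.+", value)
    return "\n".join(matches)


text = "\n".join([
    "Them Spanish girls love me like I'm Aventura",
    "I'm the man, y'all don't get it, do ya?",
    "Type of money, everybody acting like they knew ya",
    "Go Uptown, New York City, bitch",
    "Them Spanish girls love me like I'm Aventura",
    "Tell Uncle Luke I'm out in Miami, too",
    "Them Spanish girls love me like I'm Aventura",
])

print(jython_expression(text))
print(repr(jython_expression("x\nNew York City\ny")))  # ''
```

The second call shows a limitation that the GREL pattern shares. The '.+' before "New York" needs at least one character ahead of the keyword on its line, and the pattern also needs a line before and after it. A line that starts with "New York" therefore yields no match.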
