Python: Using regex stored in CSV
Question
I am testing a small Python script, part of which I will use in a larger script. Basically, I am trying to look up a field in a CSV file (the field contains a regex) and use it in a regex test. The reason, part of a very weird use case, is that maintaining a CSV file will be easier than maintaining the script. Is there something I am missing in the following?
test.csv:
field0,field1,field2
foo,bar,"\d+\.\d+"
bar,foo,"\w+"
test.py (extra prints used for testing):
import sys
import re
import csv
input = sys.argv[1]
print input
reader = csv.reader(open('test.csv','rb'), delimiter=',', quotechar="\"")
for row in reader:
    print row
    value = row[0]
    print value
    if value in input:
        regex = row[2]
        print regex
        pat = re.compile(regex)
        test = re.match(pat,input)
        out = test.group(1)
        print out
If I pass a value like "foo blah 38902462986.328946239846" to the script, I would expect it to detect that the input contains foo and then use the regex \d+\.\d+ to extract 38902462986.328946239846. However, when I run the script I get the following:
foo blah 0920390239.90239029
['field0', 'field1', 'field2']
field0
['foo', 'bar', '\\d+\\.\\d+']
foo
\d+\.\d+
Traceback (most recent call last):
  File "reg.py", line 19, in <module>
    out = test.group(1)
AttributeError: 'NoneType' object has no attribute 'group'
Not sure what's going on really.
P.S. Python is a big world and I am still learning.
Recommended answer
According to the docs, re.match only matches at the beginning of the input string. You need to use re.search instead. Also, there is no need to compile the patterns if you don't reuse them afterwards. Just say test = re.search(regex, input).
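As a minimal sketch of that difference (written for Python 3, whereas the question's script is Python 2; the variable names anchored and scanned are only illustrative), the same pattern can be tried with both functions on the sample input:

```python
import re

text = "foo blah 38902462986.328946239846"
pattern = r"\d+\.\d+"

# re.match only attempts a match at position 0, and the text starts with "foo".
anchored = re.match(pattern, text)

# re.search scans forward and returns the first place where the pattern matches.
scanned = re.search(pattern, text)

print(anchored)          # no match object is returned here
print(scanned.group(0))  # the whole matched substring
```

This is why the question's script ends up calling .group on None: re.match never gets past the leading "foo".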
The regular expressions in your example don't have any capture groups, so test.group(1) is going to fail even if there's a match in the input.
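To illustrate the point about groups, here is a small Python 3 sketch. The second pattern, with parentheses added, is an assumption for demonstration only and does not appear in the question's CSV file:

```python
import re

text = "foo blah 38902462986.328946239846"

# No parentheses in the pattern: only group 0 (the whole match) exists.
plain = re.search(r"\d+\.\d+", text)
whole = plain.group(0)

# Asking for group 1 on a pattern without capture groups raises IndexError.
try:
    plain.group(1)
    group_one_missing = False
except IndexError:
    group_one_missing = True

# Hypothetical variant: wrapping the number in parentheses makes group(1) available.
captured = re.search(r"blah (\d+\.\d+)", text)
number = captured.group(1)

print(whole, group_one_missing, number)
```

So either the patterns in the CSV need parentheses around the part to extract, or the script should read group(0) (equivalently, the start/end slice) instead of group(1).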
import sys
import re
import csv
input = 'foo blah 38902462986.328946239846'
reader = csv.reader(open('test.csv','rb'), delimiter=',', quotechar="\"")
for row in reader:
    value = row[0]
    if value in input:
        regex = row[2]
        test = re.search(regex, input)
        print input[test.start():test.end()]
Prints:
38902462986.328946239846
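The script above is Python 2. A possible Python 3 adaptation is sketched below under a few assumptions that are not part of the original answer: the helper name extract_matches is hypothetical, csv.DictReader is used so columns can be read by header name, the CSV content is held in memory so the sketch runs without a file, and a None check is added so that a row whose regex does not match is skipped rather than raising AttributeError.

```python
import csv
import io
import re

# In-memory copy of test.csv so the sketch is self-contained; with a real file,
# open('test.csv', newline='') could be passed instead.
CSV_TEXT = r'''field0,field1,field2
foo,bar,"\d+\.\d+"
bar,foo,"\w+"
'''


def extract_matches(text, csv_file):
    """Hypothetical helper: return (key, matched text) for each applicable row."""
    results = []
    reader = csv.DictReader(csv_file)  # the first row supplies the column names
    for row in reader:
        if row["field0"] in text:
            found = re.search(row["field2"], text)
            if found is not None:  # re.search returns None when nothing matches
                results.append((row["field0"], found.group(0)))
    return results


matches = extract_matches("foo blah 38902462986.328946239846", io.StringIO(CSV_TEXT))
print(matches)
```

Because the header row is consumed by DictReader, it is never tested against the input, unlike in the plain csv.reader loop.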