Python: Using regex stored in CSV

Question

I am just testing out a small Python script, part of which I will use in a larger script. Basically, I am trying to look up a field in a CSV file (the field contains a regex) and use it in a regex test. The reason (part of a very weird use case) is that it will be easier to maintain a CSV file than the script. Is there something I am missing in the following?

test.csv:

field0,field1,field2
foo,bar,"\d+\.\d+"
bar,foo,"\w+"

test.py (extra prints used for testing):

import sys
import re
import csv

input = sys.argv[1]
print input

reader = csv.reader(open('test.csv','rb'), delimiter=',', quotechar="\"")
for row in reader:
        print row
        value = row[0]
        print value
        if value in input:
                regex = row[2]
                print regex

                pat = re.compile(regex)
                test = re.match(pat,input)
                out = test.group(1)
                print out

If I pass a value like "foo blah 38902462986.328946239846" to the script, I would expect it to detect that the value contains foo and then use the regex \d+\.\d+ to extract 38902462986.328946239846. However, when I run the script I get the following:

foo blah 0920390239.90239029
['field0', 'field1', 'field2']
field0
['foo', 'bar', '\\d+\\.\\d+']
foo
\d+\.\d+
Traceback (most recent call last):
  File "reg.py", line 19, in <module>
    out = test.group(1)
AttributeError: 'NoneType' object has no attribute 'group'

I'm not really sure what's going on.

P.S. Python is a big world, and I'm still learning.
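
One detail in the output above that can look like the culprit is the doubled backslashes in ['foo', 'bar', '\\d+\\.\\d+']. That is only how a list prints its strings (repr escapes each backslash); csv.reader does not treat backslashes specially by default, so the pattern arrives intact, as the printed line \d+\.\d+ also shows. A small check, assuming Python 3 and reading the same CSV contents from an in-memory string (the data variable is hypothetical, standing in for test.csv):

```python
import csv
import io

# Same contents as test.csv, read from memory so the check is self-contained
data = r'''field0,field1,field2
foo,bar,"\d+\.\d+"
bar,foo,"\w+"
'''

rows = list(csv.reader(io.StringIO(data)))
regex = rows[1][2]

print(rows[1])     # ['foo', 'bar', '\\d+\\.\\d+']  (repr doubles each backslash)
print(regex)       # \d+\.\d+
print(len(regex))  # 8: each backslash is a single character
```

So the CSV side works; the failure comes from how the pattern is applied.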

Recommended answer

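According to the docs, re.match only matches at the beginning of the input string. Your input starts with "foo", not a digit, so re.match returns None, and calling .group() on None raises the AttributeError you see. You need re.search, which scans the whole string. Also, there's no need to compile the pattern if you don't reuse it afterwards; just write test = re.search(regex, input).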
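
A quick check with the pattern and input from the question (Python 3 syntax assumed) shows the difference. re.search also accepts the pattern as a plain string, so the explicit re.compile call is optional here:

```python
import re

# Input and pattern taken from the question
text = 'foo blah 38902462986.328946239846'
pattern = r'\d+\.\d+'

# re.match only tries position 0; 'f' is not a digit, so there is no match
print(re.match(pattern, text))  # None

# re.search scans forward until it finds a match anywhere in the string
m = re.search(pattern, text)
print(m.group(0))  # 38902462986.328946239846
```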

The regular expressions in your example don't have any capture groups, so test.group(1) will fail with IndexError: no such group, even when there is a match in the input. Use test.group(0) (or test.group()) to get the whole match.
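
The difference between group 0 and group 1 can be seen with the question's input (a sketch, Python 3 assumed). The parenthesised pattern at the end is only an illustration of how a CSV-stored regex could define a group if you really want group(1):

```python
import re

text = 'foo blah 38902462986.328946239846'

m = re.search(r'\d+\.\d+', text)
print(m.group(0))  # whole match: 38902462986.328946239846

try:
    m.group(1)  # the pattern contains no (...) group
except IndexError as e:
    print('IndexError:', e)

# Wrapping part of the pattern in parentheses creates group 1
m2 = re.search(r'(\d+)\.\d+', text)
print(m2.group(1))  # 38902462986
```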

import re
import csv

input = 'foo blah 38902462986.328946239846'

with open('test.csv') as f:
    reader = csv.reader(f, delimiter=',', quotechar='"')
    for row in reader:
        value = row[0]
        if value in input:
            regex = row[2]
            # re.search scans the whole string; re.match only tries position 0
            test = re.search(regex, input)
            if test:  # search returns None when nothing matches
                print(test.group(0))

Prints:

38902462986.328946239846
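
Since the question mentions reusing this in a larger script, where the same CSV patterns would be applied to many inputs, compiling each pattern once when the file is loaded is a reasonable choice there. The following is a minimal sketch assuming Python 3; load_rules, extract and CSV_TEXT are hypothetical names, and the in-memory CSV stands in for open('test.csv', newline='') in a real script:

```python
import csv
import io
import re

# Hypothetical rule file in the question's layout: keyword, unused field, regex
CSV_TEXT = r'''field0,field1,field2
foo,bar,"\d+\.\d+"
bar,foo,"\w+"
'''


def load_rules(f):
    """Read (keyword, compiled pattern) pairs from a CSV file object."""
    reader = csv.reader(f)
    next(reader)  # skip the field0,field1,field2 header row
    return [(row[0], re.compile(row[2])) for row in reader if len(row) >= 3]


def extract(rules, text):
    """Return the match of the first rule whose keyword is in text and whose
    pattern matches somewhere in text, or None if no rule applies."""
    for keyword, pattern in rules:
        if keyword in text:
            m = pattern.search(text)
            if m:
                return m.group(0)
    return None


rules = load_rules(io.StringIO(CSV_TEXT))
print(extract(rules, 'foo blah 38902462986.328946239846'))  # 38902462986.328946239846
print(extract(rules, 'nothing relevant here'))              # None
```

Keeping the patterns compiled in a list means the CSV is read and parsed only once, while the script logic stays independent of the rules themselves.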