How do I search for a pattern within a text file using Python, combining regex and string/file operations, and store instances of the pattern?

Question

So essentially I'm looking for a 4-digit code enclosed in angle brackets within a text file. I know that I need to open the text file and then parse it line by line, but I'm not sure of the best way to structure my code after "for line in file".

I think I could either split, strip, or partition each line, but I also wrote a regex that I compiled, and since that returns a match object I don't think I can use it with those string-based operations. I'm also not sure whether my regex is greedy enough...
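
On the concern about mixing a match object with string operations: a match object is not a string, but its group() method returns an ordinary str, so string methods work on that result. The sketch below shows this, plus a regex-free str.partition alternative for comparison. The sample line is made up, and the partition version assumes at most one <...> pair per line, a simplification that the regex approach in the answer does not need.

```python
import re

line = "header <2048> trailer"

# A compiled pattern's search() returns a match object (or None), not a str...
match = re.compile(r"<(\d{4,5})>").search(line)
# ...but group() hands back a plain str, so string methods work on it.
code = match.group(1) if match else None

# Regex-free alternative with str.partition, assuming at most one <...>
# pair per line (a simplification, not the answer's approach).
_, _, rest = line.partition("<")
inner, _, _ = rest.partition(">")
code_via_partition = inner if inner.isdigit() and len(inner) in (4, 5) else None
```

The partition version gets awkward as soon as a line can hold several codes, which is where finditer() or findall() is the better fit.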

I'd like to store all instances of those found hits as strings within either a tuple or a list.

Here is my regex:

regex = re.compile("(<(\d{4,5})>)?")

I don't think I need to include much more code, considering it's fairly basic so far.

Answer

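import re

# Match a 4- or 5-digit number between angle brackets; group 1 captures only the digits
pattern = re.compile(r"<(\d{4,5})>")

with open('test.txt') as f:
    for i, line in enumerate(f, start=1):
        for match in pattern.finditer(line):
            # group(1) is the captured number without the angle brackets
            print('Found on line %s: %s' % (i, match.group(1)))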
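
The question asks to store the hits rather than print them. A minimal sketch, assuming the same pattern as above: find_codes is a hypothetical helper name, and it accepts an open file or any iterable of strings. Because the pattern has exactly one capture group, findall() returns just the captured digits as strings.

```python
import re

# Same pattern as the answer: group 1 captures just the digits.
PATTERN = re.compile(r"<(\d{4,5})>")


def find_codes(lines):
    """Return every bracketed 4- or 5-digit code in `lines` as a list of str.

    `find_codes` is a hypothetical helper; `lines` may be an open file
    or any iterable of strings.
    """
    codes = []
    for line in lines:
        # With exactly one capture group, findall() returns that group's text.
        codes.extend(PATTERN.findall(line))
    return codes


# Usage with a file (assumes test.txt exists, as in the answer):
# with open("test.txt") as f:
#     codes = find_codes(f)
# codes_tuple = tuple(codes)  # if a tuple is preferred over a list
```

Collecting into a list first and converting with tuple() at the end avoids rebuilding an immutable tuple on every hit.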

A couple of notes about the regex:

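  • You don't need the ? at the end: it makes the whole group optional, so the pattern also matches the empty string at every position where there is no code. You also don't need the outer (...) if you only want the number itself rather than the number with its angle brackets
  • It matches either 4 or 5 digits between the angle brackets; use \d{4} if the code must be exactly 4 digits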
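
The notes above can be checked on a small, made-up sample string. The sketch compares the question's pattern, the same pattern with the outer group kept but ? dropped, the answer's pattern, and an exactly-4-digit variant.

```python
import re

text = "id <1234> and <56789>"

# The question's pattern: the trailing ? makes the whole group optional, so
# finditer() also yields zero-length matches wherever no code is present.
optional = re.compile(r"(<(\d{4,5})>)?")
empty_hits = [m for m in optional.finditer(text) if m.group(0) == ""]

# Keeping the outer (...) adds a second group, so findall() returns tuples.
pairs = re.findall(r"(<(\d{4,5})>)", text)
# -> [('<1234>', '1234'), ('<56789>', '56789')]

# The answer's pattern: brackets required, one group for the digits only.
codes = re.findall(r"<(\d{4,5})>", text)
# -> ['1234', '56789']

# If the code must be exactly 4 digits, pin the quantifier.
four_only = re.findall(r"<(\d{4})>", text)
# -> ['1234']
```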

Update: It's important to understand that what a regex matches and what it captures can be quite different. The regex in my snippet above matches the pattern including the angle brackets, but its capture group holds only the inner number, without the brackets.
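
The difference between match and capture is visible directly on a match object: group(0) is the whole match, group(1) is the first capture group, and span() gives each one's position. The sample string here is made up for illustration.

```python
import re

m = re.search(r"<(\d{4,5})>", "code <4321> here")

whole = m.group(0)   # the full match, brackets included: '<4321>'
inner = m.group(1)   # only the capture group: '4321'
spans = (m.span(0), m.span(1))  # ((5, 11), (6, 10))
```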

More about regular expressions in Python can be found in the official Regular Expression HOWTO.