Using ^ to match beginning of line in Python regex


Question

I'm trying to extract publication years from ISI-style data exported from Thomson Reuters Web of Science. The "Publication Year" line looks like this (at the very beginning of a line):

PY 2015

For the script I'm writing I have defined the following regex function:

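import re

# Read the whole export file into one string
with open('savedrecs.txt') as f:
    wosrecords = f.read()

def findyears():
    # Matches "PY " followed by four digits anywhere in the text
    result = re.findall(r'PY (\d\d\d\d)', wosrecords)
    print(result)

findyears()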

This, however, gives false positive results because the pattern may appear elsewhere in the data.
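To make the false positive concrete, here is a minimal sketch using a small, made-up string in the same tagged "XX value" layout (invented for illustration, not real Web of Science output). The second title happens to contain the text "PY 1999" mid-line, and the unanchored pattern picks it up:

```python
import re

# Hypothetical sample records; the second title contains "PY 1999" mid-line
sample = (
    "PT J\n"
    "TI A study of tagged records\n"
    "PY 2015\n"
    "ER\n"
    "PT J\n"
    "TI Notes on PY 1999 formats\n"
    "PY 2016\n"
    "ER\n"
)

# Unanchored pattern from the question: matches anywhere in the text
years = re.findall(r'PY (\d\d\d\d)', sample)
print(years)  # ['2015', '1999', '2016']
```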

So I want to match the pattern only at the beginning of a line. Normally I would use ^ for this, but r'^PY (\d\d\d\d)' finds no matches. On the other hand, using \n seems to do what I want, but that might lead to further complications.
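A short sketch of both behaviours, again on a hypothetical sample string: without any flags, ^ can only match at offset 0 of the whole string, and a \n-based pattern (one possible complication) silently skips a PY line that sits at the very start of the input because nothing precedes it:

```python
import re

# Hypothetical sample where a PY line is the very first line of the string
sample = "PY 2014\nTI Notes on PY 1999 formats\nPY 2016\n"

# Without re.MULTILINE, ^ anchors only at the start of the whole string
no_flag = re.findall(r'^PY (\d\d\d\d)', sample)

# Anchoring on a preceding newline misses a PY line at position 0
newline_based = re.findall(r'\nPY (\d\d\d\d)', sample)

print(no_flag)        # ['2014']
print(newline_based)  # ['2016']
```

In a real export file the first line is usually not a PY line, so the flagless ^ version would return an empty list, which matches the symptom described in the question.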

Answer

re.findall(r'^PY (\d\d\d\d)', wosrecords, flags=re.MULTILINE)

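should work. By default, ^ matches only at the very start of the string; with re.MULTILINE (also available as re.M) it additionally matches immediately after every newline, i.e. at the start of each line.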
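A minimal sketch of the fix wrapped in a reusable function. The name find_years and the sample string are hypothetical, chosen for illustration; in the question's script you would pass wosrecords instead. \d{4} is simply a shorter spelling of \d\d\d\d:

```python
import re

def find_years(text):
    # re.MULTILINE lets ^ match at the start of the string and after every newline
    return re.findall(r'^PY (\d{4})', text, flags=re.MULTILINE)

# Hypothetical sample text (not real Web of Science output)
sample = "PY 2014\nTI Notes on PY 1999 formats\nPY 2016\n"
result = find_years(sample)
print(result)  # ['2014', '2016']

# Equivalent: the inline (?m) flag at the start of the pattern
inline = re.findall(r'(?m)^PY (\d{4})', sample)
print(inline)  # ['2014', '2016']
```

The inline (?m) form is handy when the pattern string is passed somewhere that does not expose a flags argument.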
