Using ^ to match beginning of line in Python regex
Problem description
I'm trying to extract publication years from ISI-style data exported from the Thomson-Reuters Web of Science. The line for "Publication Year" looks like this (at the very beginning of a line):
PY 2015
For the script I'm writing, I have defined the following regex function:
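import re

# Read the whole export into a single string
with open('savedrecs.txt') as f:
    wosrecords = f.read()

def findyears():
    # Matches "PY <four digits>" anywhere in the text, not only at line starts
    result = re.findall(r'PY (\d\d\d\d)', wosrecords)
    print(result)

findyears()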
This, however, gives false positives because the pattern may appear elsewhere in the data.
So I want to match the pattern only at the beginning of a line. Normally I would use ^ for this purpose, but r'^PY (\d\d\d\d)' fails to match my results. On the other hand, using \n seems to do what I want, but that might lead to further complications for me.
Recommended answer
re.findall(r'^PY (\d\d\d\d)', wosrecords, flags=re.MULTILINE)
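should work. By default, ^ matches only at the very start of the string; with re.MULTILINE it also matches immediately after each newline.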
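To make the difference concrete, here is a small sketch that runs the variants from the question against an in-memory string. The sample text is made up for illustration (the real savedrecs.txt is not shown), and the variable names are hypothetical.

```python
import re

# Hypothetical sample data standing in for the Web of Science export.
# The second line contains "COPY 1999", which produces a false positive.
sample = (
    "PY 2015\n"
    "AB Cited in COPY 1999 of the report\n"
    "PY 2014\n"
)

# No anchor: matches "PY dddd" anywhere, including inside "COPY 1999".
unanchored = re.findall(r'PY (\d\d\d\d)', sample)

# ^ without flags: only the very start of the whole string can match.
default_caret = re.findall(r'^PY (\d\d\d\d)', sample)

# Leading \n: needs a newline before "PY", so a first-line record is missed.
newline_prefix = re.findall(r'\nPY (\d\d\d\d)', sample)

# ^ with re.MULTILINE: matches at the start of every line.
multiline = re.findall(r'^PY (\d\d\d\d)', sample, flags=re.MULTILINE)

print(unanchored)      # ['2015', '1999', '2014']
print(default_caret)   # ['2015']
print(newline_prefix)  # ['2014']
print(multiline)       # ['2015', '2014']
```

In a file whose first line is not a PY line, the plain ^ version would find nothing at all, which is consistent with what the question describes. The \n workaround skips a record on the very first line, which is probably the kind of complication the question was worried about.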