Python regex to match text in single quotes, ignoring escaped quotes (and tabs/newlines)

Views: 234 | Posted: 2021/7/6 19:27:07 | Tags: python regex

Question

Given a text file in which the strings I want to match are delimited by single quotes, but may contain zero or one escaped single quote as well as zero or more unescaped tab and newline characters, I want to match the text only. Example:

menu_item = 'casserole';
menu_item = 'meat 
            loaf';
menu_item = 'Tony\'s magic pizza';
menu_item = 'hamburger';
menu_item = 'Dave\'s famous pizza';
menu_item = 'Dave\'s lesser-known
    gyro';

I want to grab only the text (and spaces), ignoring the tabs/newlines. I don't actually care whether the escaped quote appears in the results, as long as it doesn't affect the match:

casserole
meat loaf
Tonys magic pizza
hamburger
Daves famous pizza
Dave\'s lesser-known gyro # quote is okay if necessary.

I have managed to create a regex that almost does it. It handles the escaped quotes, but not the newlines:

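import re

# Attempted pattern: copes with the escaped quote, but not with the newlines.
menuPat = r"menu_item = \'(.*)(\\\')?(\t|\n)*(.*)\'"
for line in inFP.readlines():  # inFP is the already-opened input file
    m = re.search(menuPat, line)
    if m is not None:
        print(m.group())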

There are definitely a ton of regular expression questions out there, but most of them use Perl, and if one of them does what I want, I couldn't figure it out :) And since I'm using Python, I don't care if the match is spread across multiple groups; it's easy to recombine them.

Some answers have said to just parse the text with code. While I'm sure I could do that, I'm so close to having a working regex :) And it seems like it should be doable.

Update: I just realized that I am calling Python's readlines() to get each line, which obviously splits the multi-line items apart before they are passed to the regex. I'm looking at rewriting that part, but any suggestions on it would also be very helpful.
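
For what it's worth, here is one way this could be approached. It is a minimal sketch under a few assumptions, not an accepted answer. It assumes the whole file fits in memory, so the text can be read in one piece (for example with inFP.read() instead of readlines()), and it assumes every item has the form menu_item = '...'. The names MENU_PAT, extract_menu_items and sample are made up for illustration. The pattern treats a backslash plus the following character as one unit, so an escaped quote never ends the match. The negated character class also matches newlines, so items may span several lines.

```python
import re

# A quoted body is a run of either "any character except a quote or a
# backslash" (this includes tabs and newlines) or "a backslash followed by
# any character" (this covers the escaped quote \').
MENU_PAT = re.compile(r"menu_item = '((?:[^'\\]|\\.)*)'", re.DOTALL)


def extract_menu_items(text):
    """Return the quoted menu items found in text, cleaned up."""
    items = []
    for raw in MENU_PAT.findall(text):
        # Collapse each run of whitespace (tabs, newlines, indentation)
        # into a single space.
        item = re.sub(r"\s+", " ", raw)
        # Turn the escaped quote \' back into a plain quote.
        item = item.replace("\\'", "'")
        items.append(item)
    return items


# Hypothetical sample standing in for the contents of the input file.
sample = r"""menu_item = 'casserole';
menu_item = 'meat
            loaf';
menu_item = 'Tony\'s magic pizza';
menu_item = 'hamburger';
menu_item = 'Dave\'s famous pizza';
menu_item = 'Dave\'s lesser-known
    gyro';
"""

for item in extract_menu_items(sample):
    print(item)
```

With a real file, the same function could be called as extract_menu_items(inFP.read()). This version keeps the apostrophe (Tony's rather than Tonys); dropping it instead would only require changing the replacement string in the replace() call to an empty string.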
