Using Python to split a string with a delimiter, while ignoring the delimiter and escaped quotes inside quotes

Question

I am trying to split a string based on the location of a delimiter (I am trying to remove comments from Fortran code). I can split using ! in the following string:

import re

x = '''print "hi!" ! Remove me'''
pattern = '''(?:[^!"]|"[^"]*")+'''
y = re.search(pattern, x)
print(y.group())  # the part of the line before the comment

However, this fails if the string contains escaped quotes, e.g.

z = r'''print "h\"i!" ! Remove me'''  # raw string, so the backslash is kept
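
To make the failure concrete, here is a minimal sketch that runs the pattern from the question against a line containing an escaped quote. It assumes the line is written as a raw string so that the backslash actually reaches the regex; the variable name `kept` is only illustrative.

```python
import re

# The pattern from the question: plain characters or simple "..." runs.
pattern = r'(?:[^!"]|"[^"]*")+'

# Raw string so the backslash really is part of the text being scanned.
z = r'print "h\"i!" ! Remove me'

kept = re.search(pattern, z).group()
print(kept)  # print "h\"i
```

The escaped quote is taken as the end of the string literal, so the `!` inside the string is then treated as the start of the comment and the line is cut too early.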

Can the regex be modified to handle escaped quotes? Or should I not even be using regexes for this sort of problem?

Recommended answer

Here's a proven regex (from Mastering Regular Expressions) for matching double-quoted string literals which may contain backslash-escaped quotes:

r'"[^"\\]*(?:\\.[^"\\]*)*"'

Within the delimiting quotes, it consumes any pair of characters that starts with a backslash without bothering to identify the second character; that allows it to handle escaped backslashes and other escape sequences with no extra hassle. It's also as efficient as can be in the absence of possessive quantifiers and atomic groups, which Python's re module only gained in version 3.11.
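
As a small illustration of that behaviour, the sketch below runs the string-literal pattern on its own. The sample text is made up for the demonstration: one literal contains an escaped quote, the other ends with an escaped backslash.

```python
import re

# String-literal pattern quoted above.
string_literal = re.compile(r'"[^"\\]*(?:\\.[^"\\]*)*"')

# Hypothetical sample text; a raw string keeps every backslash as typed.
sample = r'a = "h\"i!" // "tail\\" ! done'

# No capturing groups in the pattern, so findall returns whole matches.
literals = string_literal.findall(sample)
print(literals)
```

Both literals should come back whole: the backslash-plus-character pairs are consumed together, so neither the escaped quote nor the escaped backslash ends the match early.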

The full regex for your application would be:

r'^((?:[^!"]+|"[^"\\]*(?:\\.[^"\\]*)*")*)!.*$'

This matches only lines that contain comments, and captures everything preceding the comment in group #1. The capture may be zero-length, for lines that start with !. This regex is intended for use with sub rather than search, as shown here:

import re

pattern = r'^((?:[^!"]+|"[^"\\]*(?:\\.[^"\\]*)*")*)!.*$'

x = '''print "hi!" ! Remove me'''
y = re.sub(pattern, r'\1', x)
print(y)

See it in action on ideone.com
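
For a slightly broader check, here is a sketch that applies the same pattern to a few short, made-up lines, including the escaped-quote case from the question, a line that is nothing but a comment, and a line with no comment at all. The helper name `strip_comment` is hypothetical and not part of the original answer.

```python
import re

# Full pattern from the answer: group 1 is everything before the comment.
pattern = re.compile(r'^((?:[^!"]+|"[^"\\]*(?:\\.[^"\\]*)*")*)!.*$')

def strip_comment(line):
    """Return the line without its trailing ! comment (illustrative helper)."""
    return pattern.sub(r'\1', line)

# Short sample lines; raw strings keep the backslashes intact.
lines = [
    r'print "hi!" ! Remove me',
    r'print "h\"i!" ! Remove me',
    r'! whole-line comment',
    r'x = 1',
]
stripped = [strip_comment(line) for line in lines]
for s in stripped:
    print(repr(s))
```

Whitespace before the `!` is kept in the result, so a trailing `rstrip()` may be wanted. Lines without a comment are returned unchanged, but because `[^!"]+` sits inside a repeated group, a failed match can backtrack heavily on long lines; checking `'!' in line` before calling `sub` is one cheap way to skip most of those cases.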

DISCLAIMER: This answer is not about FORTRAN, only about code that follows the rules specified in the question. I've never worked with FORTRAN, and every reference I've found in the last hour or so seems to describe a completely different language. Meh!
