Removing hash comments that are not inside quotes


Question

I am using Python to go through a file and remove any comments. A comment is defined as a hash and anything to the right of it, as long as the hash isn't inside double quotes. I currently have a solution, but it seems sub-optimal:

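import re

filelines = []
r = re.compile('(".*?")')  # split each line into quoted and unquoted tokens
for line in f:  # f is an open file object
    m = r.split(line)
    nline = ''
    for token in m:
        # A hash in an unquoted token starts the comment
        if token.find('#') != -1 and token[0] != '"':
            nline += token[:token.find('#')]
            break
        else:
            nline += token
    filelines.append(nline)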

Is there a way to find the first hash not within quotes without for loops (i.e., through regular expressions)?

Examples:

' "Phone #":"555-1234" ' -> ' "Phone #":"555-1234" '
' "Phone "#:"555-1234" ' -> ' "Phone "'
'#"Phone #":"555-1234" ' -> ''
' "Phone #":"555-1234" #Comment' -> ' "Phone #":"555-1234" '
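The four examples can double as a quick regression check. The sketch below wraps the loop-based approach from the question in a function and asserts each expected result; the names `strip_comment_loop`, `QUOTED`, and `EXAMPLES` are hypothetical and not part of the original post.

```python
import re

# Same splitting pattern as in the question: captures each quoted token.
QUOTED = re.compile('(".*?")')

def strip_comment_loop(line):
    """Hypothetical wrapper around the question's loop-based approach."""
    result = ''
    for token in QUOTED.split(line):
        pos = token.find('#')
        # A hash in a token that is not a quoted string starts the comment.
        if pos != -1 and not token.startswith('"'):
            result += token[:pos]
            break
        result += token
    return result

# (input, expected output) pairs taken from the examples in the question.
EXAMPLES = [
    (' "Phone #":"555-1234" ', ' "Phone #":"555-1234" '),
    (' "Phone "#:"555-1234" ', ' "Phone "'),
    ('#"Phone #":"555-1234" ', ''),
    (' "Phone #":"555-1234" #Comment', ' "Phone #":"555-1234" '),
]

for source, expected in EXAMPLES:
    assert strip_comment_loop(source) == expected
```

The same `EXAMPLES` list can be reused to compare any regex-based replacement against the loop version.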

Edit: Here is a pure regex solution created by user2357112. I tested it, and it works great:

filelines = []
r = re.compile('(?:"[^"]*"|[^"#])*(#)')
for line in f:
    m = r.match(line)
    if m is not None:
        # Keep everything before the first unquoted hash
        filelines.append(line[:m.start(1)])
    else:
        # No unquoted hash: keep the line unchanged
        filelines.append(line)

See his reply for more details on how this regex works.

Edit 2: Here's a version of user2357112's code that I modified to account for escape characters (\"). This code also eliminates the 'if' by including a check for end of string ($):

filelines = []
r = re.compile(r'(?:"(?:[^"\\]|\\.)*"|[^"#])*(#|$)')
for line in f:
    m = r.match(line)
    filelines.append(line[:m.start(1)])
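One case worth keeping in mind: if a line contains an unterminated double quote, the pattern cannot match at all, so `r.match(line)` returns `None` and `m.start(1)` raises an `AttributeError`. A minimal sketch of a guarded version follows; the helper name `strip_comment` and the choice to return such lines unchanged are assumptions, not part of the original code.

```python
import re

# Same escape-aware pattern as above: quoted strings (allowing backslash
# escapes) or non-quote, non-hash characters, then the first unquoted
# hash or the end of the string.
COMMENT_RE = re.compile(r'(?:"(?:[^"\\]|\\.)*"|[^"#])*(#|$)')

def strip_comment(line):
    """Return line without its unquoted hash comment (hypothetical helper)."""
    m = COMMENT_RE.match(line)
    if m is None:
        # Assumption: a line with an unterminated quote is left unchanged.
        return line
    return line[:m.start(1)]

# Usage: an escaped quote inside a string does not end the string.
cleaned = strip_comment(r'"a \" # b" # trailing comment')
assert cleaned == r'"a \" # b" '
```

Whether an unbalanced line should be kept, truncated, or reported as an error depends on the file format, so the fallback branch is the part most likely to need adjusting.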


Recommended answer

r'''(?:        # Non-capturing group
      "[^"]*"  # A quote, followed by not-quotes, followed by a quote
      |        # or
      [^"#]    # not a quote or a hash
    )          # end group
    *          # Match quoted strings and not-quote-not-hash characters until...
    (\#)       # the comment begins! (hash escaped, since a bare # starts a comment under re.VERBOSE)
'''

This is a verbose regex, designed to operate on a single line, so make sure to use the re.VERBOSE flag and feed it one line at a time. It'll capture the first unquoted hash as group 1 if there is one, so you can use match.start(1) to get the index. It doesn't handle backslash escapes, so a backslash-escaped quote inside a string will not be recognized. This is untested.
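As a rough sketch of how the verbose pattern might be compiled and used: under `re.VERBOSE`, a `#` outside a character class begins a comment, so the hash in the capture group is written as `\#` here. The names `UNQUOTED_HASH` and `comment_start` are hypothetical.

```python
import re

UNQUOTED_HASH = re.compile(r'''
    (?:          # non-capturing group
      "[^"]*"    # a double-quoted string
      |          # or
      [^"#]      # any character that is neither a quote nor a hash
    )*           # repeated as often as possible
    (\#)         # group 1: the first unquoted hash, escaped for re.VERBOSE
''', re.VERBOSE)

def comment_start(line):
    """Return the index of the first unquoted hash, or None (hypothetical helper)."""
    m = UNQUOTED_HASH.match(line)
    return m.start(1) if m else None

# Usage: slice the line at the reported index when a comment is present.
line = ' "Phone #":"555-1234" #Comment'
idx = comment_start(line)
stripped = line if idx is None else line[:idx]
assert stripped == ' "Phone #":"555-1234" '
```

Returning the index rather than the stripped line keeps the helper close to the `match.start(1)` usage described above; callers decide what to do when no comment is found.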
