How do I get the raw representation of a string in Python?


Question

I am making a class that relies heavily on regular expressions.

Let's say my class looks like this:

class Example:
    def __init__(self, regex):
        self.regex = regex

    def __repr__(self):
        return 'Example({})'.format(repr(self.regex.pattern))

And let's say I use it like this:

import re

example = Example(re.compile(r'\d+'))

If I call repr(example), I get 'Example('\\\\d+')', but I want 'Example(r'\\d+')'. (The backslashes are doubled here because that is how the interactive interpreter displays the result; printed, the two read Example('\\d+') and Example(r'\d+').) I suppose I could implement __repr__ to return "r'{}'".format(regex.pattern), but that doesn't sit well with me. In the unlikely event that the Python Software Foundation someday changes the syntax of raw string literals, my code won't reflect that. That's hypothetical, though. My main concern is whether this always works. I can't think of an edge case off the top of my head. Is there a more formal way of doing this?

EDIT: Nothing seems to appear in the Format Specification Mini-Language, the printf-style String Formatting guide, or the string module.

Solution

The problem with a raw string representation is that you cannot represent everything in a portable manner (i.e. without using control characters). For example, if your string contained a line break, you would have to literally break the string onto the next line, because a line break cannot be written as an escape inside a raw string.
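To make the line-break point concrete, here is a small sketch. The variable names are illustrative and not part of the original answer. It shows that r"\n" is not a line feed, and that a raw literal can only contain one when the source text itself spans lines, which requires triple quotes.

```python
import ast

newline = "\n"            # one character: a line feed
raw_backslash_n = r"\n"   # two characters: a backslash followed by 'n'

# Source text of a triple-quoted raw literal. The "\n" in this normal
# string puts a real line feed inside the literal's source text.
multiline_source = "r'''foobar \n'''"
evaluated = ast.literal_eval(multiline_source)

print(len(newline), len(raw_backslash_n))
print(evaluated == "foobar \n")
```

A raw literal with an embedded line feed is legal, but it no longer fits on one line, which is usually not what you want from a repr.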

That said, the practical way to get a raw string representation is the one you already gave:

"r'{}'".format(regex.pattern)

Raw string literals apply no escape processing: the literal ends at the quotation character it started with, and a backslash placed before that quotation character keeps it from ending the literal (the backslash itself stays in the string). As a consequence, a raw literal cannot end in an odd number of backslashes. For example, you cannot write a string consisting of a single backslash as a raw literal: r"\" is a SyntaxError, and r"\\" yields two backslashes (the same string as "\\\\").
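These rules can be checked directly. The sketch below uses a hypothetical helper, parses(), built on the built-in compile() to ask whether a piece of source text is a valid expression. The helper name and the file-name argument are assumptions made for illustration.

```python
def parses(source):
    """Return True if *source* compiles as a Python expression."""
    try:
        compile(source, "raw-literal-check", "eval")
    except SyntaxError:
        return False
    return True

# Source text r"\" : the backslash stops the quote from closing the literal.
lone_backslash_parses = parses('r"\\"')

# Source text r"\\" : valid, and the value is two backslash characters.
two_backslashes = r"\\"

# Source text r"\"" : valid, but the backslash stays in the value.
escaped_quote = r"\""

print(lone_backslash_parses)
print(two_backslashes == "\\\\")
print(escaped_quote == '\\"')
```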

If you really want to do this, you should use a wrapper like:

def rawstr(s):
    """
    Return a raw string literal (r'...' or r"...") for the string *s* if
    one exists. If *s* contains characters that a raw literal cannot hold,
    or otherwise cannot be written as a raw string, return the default
    repr() result instead.
    """
    # Control characters (newline, tab, ...) would have to appear literally.
    if any(ord(ch) < 32 for ch in s):
        return repr(s)

    # An odd number of trailing backslashes would escape the closing quote.
    if (len(s) - len(s.rstrip("\\"))) % 2 == 1:
        return repr(s)

    # Pick a quote character that does not occur in s.
    pattern = "r'{0}'"
    if '"' in s:
        if "'" in s:
            return repr(s)  # both quote characters are present
    elif "'" in s:
        pattern = 'r"{0}"'

    return pattern.format(s)

Tests:

>>> test1 = "\\"
>>> test2 = "foobar \n"
>>> test3 = r"a \valid rawstring"
>>> test4 = "foo \\\\\\"
>>> test5 = r"foo \\"
>>> test6 = r"'"
>>> test7 = r'"'
>>> print(rawstr(test1))
'\\'
>>> print(rawstr(test2))
'foobar \n'
>>> print(rawstr(test3))
r'a \valid rawstring'
>>> print(rawstr(test4))
'foo \\\\\\'
>>> print(rawstr(test5))
r'foo \\'
>>> print(rawstr(test6))
r"'"
>>> print(rawstr(test7))
r'"'
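The tests above compare printed output by eye. A stricter check, sketched here as one possible approach, is to evaluate the produced literal with ast.literal_eval and compare it to the input. The names round_trips, naive_raw, and samples are hypothetical. The sample strings mirror the test values above, and the formatters exercised are the plain format from the question and the built-in repr; a function such as rawstr could be passed in the same way.

```python
import ast

def round_trips(formatter, s):
    """Return True if evaluating formatter(s) as a literal gives back *s*."""
    try:
        return ast.literal_eval(formatter(s)) == s
    except (SyntaxError, ValueError):
        return False

def naive_raw(s):
    # The plain format from the question, with no edge-case handling.
    return "r'{}'".format(s)

samples = ["\\", "foobar \n", r"a \valid rawstring", "foo \\\\\\",
           r"foo \\", "'", '"']

# Strings for which each formatter produces a literal that does not
# evaluate back to the original value.
naive_failures = [s for s in samples if not round_trips(naive_raw, s)]
repr_failures = [s for s in samples if not round_trips(repr, s)]

print(naive_failures)
print(repr_failures)
```

With these samples, the plain format fails on the strings with a trailing odd run of backslashes, the line feed, and the single quote, while repr round-trips all of them.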
