Possible to make custom string literal prefixes in Python?

Question

Let's say I have a custom class derived from str that implements/overrides some methods:

class mystr(str):
    # just an example for a custom method:
    def something(self):
        return "anything"

Currently I have to create instances of mystr manually by passing a string to the constructor:

ms1 = mystr("my string")

s = "another string"
ms2 = mystr(s)

This is not too bad, but it led to the idea that it would be cool to use a custom string prefix similar to b'bytes string' or r'raw string' or u'unicode string'.

Is it somehow possible in Python to create/register such a custom string literal prefix like m so that a literal m'my string' results in a new instance of mystr?
Or are those prefixes hard-coded into the Python interpreter?

Recommended answer

Those prefixes are hardcoded in the interpreter; you can't register more prefixes.

What you could do, however, is preprocess your Python files by using a custom source codec. This is a rather neat hack, one that requires you to register a custom codec and to understand and apply source code transformations.

Python allows you to specify the encoding of source code with a special comment at the top:

# coding: utf-8

would tell Python that the source code is encoded in UTF-8, and Python will decode the file accordingly before parsing. Python looks up the codec for this in the codecs module registry, and you can register your own codecs.
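As a rough illustration of the registry lookup described above, here is a minimal sketch of registering a codec search function with codecs.register. The codec name mystrdemo, the @@GREETING@@ marker, and the transform() helper are all made up for this example. A codec meant to be used as a real source encoding would also need incremental and stream decoders, which this sketch leaves out.

```python
import codecs

CODEC_NAME = "mystrdemo"  # hypothetical codec name, chosen for this sketch


def transform(source):
    # Placeholder for a real source-to-source transformation:
    # swap a made-up marker for a Python string literal.
    return source.replace("@@GREETING@@", "'hello'")


def decode(data, errors="strict"):
    # Decode the raw bytes as UTF-8 first, then rewrite the decoded text.
    text, consumed = codecs.utf_8_decode(bytes(data), errors, True)
    return transform(text), consumed


def search(name):
    # Search functions are called with the normalized (lowercase) codec name
    # and return None for names they do not handle.
    if name != CODEC_NAME:
        return None
    utf8 = codecs.lookup("utf-8")
    return codecs.CodecInfo(
        name=CODEC_NAME,
        encode=utf8.encode,  # encoding is left as plain UTF-8
        decode=decode,
    )


codecs.register(search)

decoded = codecs.decode(b"x = @@GREETING@@\n", CODEC_NAME)
```

Once the search function is registered, any lookup of that codec name goes through search(), so the transformation runs as part of decoding.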

The pyxl project uses this trick to parse HTML syntax out of Python files and replace it with actual Python code that builds that HTML, all in a 'decoding' step. See the codec package in that project, where the register module registers a custom codec search function that transforms source code before Python actually parses and compiles it. A custom .pth file is installed into your site-packages directory to load this registration step at Python startup time. Another project that uses the same technique, in its case to parse out Ruby-style string formatting, is interpy.

All you have to do then is build a similar codec that parses a Python source file (tokenizing it, perhaps with the tokenize module) and replaces string literals carrying your custom prefix with mystr(<string literal>) calls. Any file you want processed this way you mark with # coding: yourcustomcodec.
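The transformation step alone could look something like the sketch below. It is only one possible approach, not the code pyxl or interpy use: it relies on the tokenize module splitting m'my string' into a NAME token m immediately followed by a STRING token, and rewrites that pair into a mystr(...) call. The function name rewrite_m_prefix is made up for this example, and f-strings and other edge cases are not handled.

```python
import io
import tokenize


class mystr(str):
    # Same example class as in the question.
    def something(self):
        return "anything"


def rewrite_m_prefix(source):
    """Rewrite every m'...' literal in source into mystr('...')."""
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if (
            tok.type == tokenize.NAME
            and tok.string == "m"
            and nxt is not None
            and nxt.type == tokenize.STRING
            and tok.end == nxt.start  # no whitespace between m and the quote
        ):
            # Emit mystr(<string literal>) in place of the two tokens.
            out.extend([
                (tokenize.NAME, "mystr"),
                (tokenize.OP, "("),
                (tokenize.STRING, nxt.string),
                (tokenize.OP, ")"),
            ])
            i += 2
        else:
            out.append((tok.type, tok.string))
            i += 1
    # With (type, string) pairs, untokenize produces valid code,
    # although the original spacing is not preserved.
    return tokenize.untokenize(out)


namespace = {"mystr": mystr}
exec(rewrite_m_prefix("ms = m'my string'\nplain = 'm'\n"), namespace)
```

After the exec call, namespace["ms"] is a mystr instance while namespace["plain"] stays an ordinary str. In a real codec, this function would be called from the decode step on the decoded source text.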

I'll leave that part as an exercise for the reader. Good luck!

Note that the result of this transformation is then compiled into bytecode, which is cached. Your transformation only has to run once per source code revision; all other imports of a module using your codec will load the cached bytecode.
