cpython的字符串实习的规则是什么? [英] What are the rules for cpython's string interning?

查看:92
本文介绍了cpython的字符串实习的规则是什么?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在python 3.5中,是否可以预测什么时候我们将得到一个实习字符串或什么时候我们将得到一个副本?在阅读了有关此问题的一些堆栈溢出答案后,我发现这一个最有用但仍不全面。比我看过 Python文档,但是不能保证实习默认情况下

In python 3.5, is it possible to predict when we will get an interned string or when we will get a copy? After reading a few Stack Overflow answers on this issue I've found this one the most helpful but still not comprehensive. Than I looked at Python docs, but the interning is not guaranteed by default


通常 ,Python程序中使用的名称会自动被屏蔽,

Normally, the names used in Python programs are automatically interned, and the dictionaries used to hold module, class or instance attributes have interned keys.

所以,我的问题是关于内部 intern()条件,即决策(无论是否为内部字符串文字):为什么同一段代码可以在一个系统上运行而不能在另一个系统上运行,以及该程序的作者制定了哪些规则在提到的主题上回答表示

So, my question is about inner intern() conditions, i.e. decision-making (whether to intern string literal or not): why the same piece of code works on one system and not on another one and what rules did author of the answer on mentioned topic mean when saying


发生这种情况的规则非常复杂

the rules for when this happens are quite convoluted


推荐答案

您认为存在规则

唯一的实习规则是 intern的返回值已被拘留。其他一切都取决于谁决定某个代码段应该或不应该进行实习。例如, left PyCodeNew

The only rule for interning is that the return value of intern is interned. Everything else is up to the whims of whoever decided some piece of code should or shouldn't do interning. For example, "left" gets interned by PyCodeNew:

/* Intern selected string constants */
for (i = PyTuple_GET_SIZE(consts); --i >= 0; ) {
    PyObject *v = PyTuple_GetItem(consts, i);
    if (!all_name_chars(v))
        continue;
    PyUnicode_InternInPlace(&PyTuple_GET_ITEM(consts, i));
}

此处的规则是<$ c $中的字符串对象如果Python代码对象的c> co_consts 仅由Python标识符中合法的ASCII字符组成,则会被屏蔽。 被拘留,但 as,df 不会,而即使标识符不能以数字开头,也会插入 1234 。尽管标识符可以包含非ASCII字符,但此检查仍会拒绝此类字符。 实际标识符永远不会通过此代码;他们无条件地被禁闭了几行,无论是否为ASCII。这段代码可能会发生变化,还有很多其他代码可以进行内部或类似内部的事情。

The "rule" here is that a string object in the co_consts of a Python code object gets interned if it consists purely of ASCII characters that are legal in a Python identifier. "left" gets interned, but "as,df" wouldn't be, and "1234" would be interned even though an identifier can't start with a digit. While identifiers can contain non-ASCII characters, such characters are still rejected by this check. Actual identifiers don't ever pass through this code; they get unconditionally interned a few lines up, ASCII or not. This code is subject to change, and there's plenty of other code that does interning or interning-like things.

向我们索要字符串实习的规则就像问气象学家,婚礼上是否下雨有什么规则。我们可以告诉您很多有关其工作原理的信息,但对您而言并没有太大用处,您总会感到惊讶。

Asking us for the "rules" for string interning is like asking a meteorologist what the rules are for whether it rains on your wedding. We can tell you quite a lot about how it works, but it won't be much use to you, and you'll always get surprises.

这篇关于cpython的字符串实习的规则是什么?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆