What exactly do "u" and "r" string flags do, and what are raw string literals?

Problem description

While asking this question, I realized I didn't know much about raw strings. For somebody claiming to be a Django trainer, this sucks.

I know what an encoding is, and I know what u'' alone does since I understand what Unicode is.

  • But what does r'' do exactly? What kind of string does it result in?

  • And above all, what the heck does ur'' do?

  • Finally, is there any reliable way to go back from a Unicode string to a simple raw string?

  • Ah, and by the way, if your system and your text editor charset are set to UTF-8, does u'' actually do anything?

Solution

There's not really any "raw string"; there are raw string literals, which are exactly the string literals marked by an 'r' before the opening quote.

A "raw string literal" is a slightly different syntax for a string literal, in which a backslash, \, is taken as meaning "just a backslash" (except when it comes right before a quote that would otherwise terminate the literal) -- no "escape sequences" to represent newlines, tabs, backspaces, form-feeds, and so on. In normal string literals, each backslash must be doubled up to avoid being taken as the start of an escape sequence.
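The difference can be sketched as follows. This assumes Python 3 rather than the Python 2.* the answer targets; the backslash rules shown here are the same in both.

```python
# Normal literal: \n is an escape sequence for one newline character.
normal = 'a\nb'
# Raw literal: the backslash and the 'n' stay as two separate characters.
raw = r'a\nb'

print(len(normal))  # 3
print(len(raw))     # 4

# Doubling the backslash in a normal literal spells the same string.
assert raw == 'a\\nb'

# The "except" case: a backslash before a quote keeps the literal open,
# but the backslash itself remains in the resulting string.
quoted = r'it\'s'
print(len(quoted))  # 5
assert quoted == "it\\'s"
```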

This syntax variant exists mostly because the syntax of regular expression patterns is heavy with backslashes (but never at the end, so the "except" clause above doesn't matter) and it looks a bit better when you avoid doubling up each of them -- that's all. It also gained some popularity to express native Windows file paths (with backslashes instead of regular slashes like on other platforms), but that's very rarely needed (since normal slashes mostly work fine on Windows too) and imperfect (due to the "except" clause above).
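A small illustration of both uses, again assuming Python 3. The pattern, the sample text, and the Windows path are made up for the example.

```python
import re

# The same regex pattern written twice: raw literal vs. doubled backslashes.
pattern_raw = r'\d+\.\d+'
pattern_escaped = '\\d+\\.\\d+'
assert pattern_raw == pattern_escaped

match = re.search(pattern_raw, 'version 3.14 released')
print(match.group(0))  # 3.14

# Windows-style path (hypothetical, for illustration only).
path = r'C:\temp\new'
assert path == 'C:\\temp\\new'

# A raw literal cannot end in a single backslash: writing the path with a
# trailing backslash inside r'...' is a syntax error. One workaround is to
# append the final backslash from a normal literal.
folder = r'C:\temp' + '\\'
print(folder)  # C:\temp\
```

The last lines show why raw literals are an imperfect fit for paths: the trailing separator has to be spelled some other way.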

r'...' is a byte string (in Python 2.*), ur'...' is a Unicode string (again, in Python 2.*), and any of the other three kinds of quoting also produces exactly the same types of strings (so for example r'...', r'''...''', r"...", r"""...""" are all byte strings, and so on).
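The same point can be checked in Python 3, with one caveat: the prefixes map differently there. As far as this sketch assumes, r'...' gives a str (Unicode), rb'...' gives bytes, and the Python 2 spelling ur'...' is no longer accepted.

```python
# Python 3 sketch: r'' yields str (Unicode text), rb'' yields bytes.
quotings = [r'\d', r'''\d''', r"\d", r"""\d"""]
assert all(type(s) is str for s in quotings)
assert len(set(quotings)) == 1  # all four quoting styles spell one string

byte_quotings = [rb'\d', rb'''\d''', rb"\d", rb"""\d"""]
assert all(type(b) is bytes for b in byte_quotings)

# The r prefix leaves no trace on the object: a raw and a non-raw literal
# with the same content have the same type and compare equal.
assert type(r'\d') is type('\\d')
assert r'\d' == '\\d'
print(type(r'\d').__name__, type(rb'\d').__name__)  # str bytes
```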

Not sure what you mean by "going back" - there are no intrinsic back and forward directions, because there's no raw string type; it's just an alternative syntax to express perfectly normal string objects, byte or Unicode as they may be.
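To make that concrete, here is a hedged Python 3 sketch. It shows that a raw literal and its escaped equivalent are the same kind of object, and that the conversion which does exist is between text and bytes through an encoding (UTF-8 is chosen here only as an example).

```python
# Raw vs. non-raw is only a matter of spelling in the source code.
raw = r'tab\there'
plain = 'tab\\there'
assert raw == plain and type(raw) is type(plain)

# What does exist is conversion between text and bytes via an encoding.
text = 'ciao'
data = text.encode('utf-8')
assert isinstance(data, bytes)
assert data.decode('utf-8') == text

# To view a string in its escaped, source-like form, repr() can be used.
print(repr('a\nb'))  # 'a\nb'
```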

And yes, in Python 2.*, u'...' is of course always distinct from just '...' -- the former is a unicode string, the latter is a byte string. What encoding the literal might be expressed in is a completely orthogonal issue.
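The orthogonality of encodings can be illustrated with a short Python 3 sketch: one Unicode string, two byte representations. The sample word and the two encodings are arbitrary choices for the example.

```python
# One Unicode string; '\xe0' is the character "a with grave accent".
text = 'ci\xe0o'

# Two different encodings give two different byte sequences...
utf8 = text.encode('utf-8')
latin1 = text.encode('latin-1')
print(len(text), len(utf8), len(latin1))  # 4 5 4
assert utf8 != latin1

# ...yet both decode back to the very same Unicode string.
assert utf8.decode('utf-8') == latin1.decode('latin-1') == text
```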

E.g., consider (Python 2.6):

>>> import sys
>>> sys.getsizeof('ciao')
28
>>> sys.getsizeof(u'ciao')
34

The Unicode object of course takes more memory space (very small difference for a very short string, obviously ;-).
