Display width of unicode strings in Python
How can I determine the display width of a Unicode string in Python 3.x, and is there a way to use that information to align those strings with str.format()?
Motivating example: Printing a table of strings to the console. Some of the strings contain non-ASCII characters.
>>> for title in d.keys():
...     print("{:<20} | {}".format(title, d[title]))
...
zootehni-            | zooteh.
zootekni-            | zootek.
zoothèque           | zooth.
zooveterinar-        | zoovet.
zoovetinstitut-      | zoovetinst.
母                    | 母母
>>> s = 'è'
>>> len(s)
2
>>> [ord(c) for c in s]
[101, 768]
>>> import unicodedata
>>> unicodedata.name(s[1])
'COMBINING GRAVE ACCENT'
>>> s2 = '母'
>>> len(s2)
1
As can be seen, str.format() simply takes the number of code points in the string (len(s)) as its width, leading to skewed columns in the output. Searching through the unicodedata module, I have not found anything suggesting a solution.
Unicode normalization can fix the problem for è, but not for Asian characters, which often have larger display width. Similarly, zero-width unicode characters exist (e.g. zero-width space for allowing line breaks within words). You can't work around these issues with normalization, so please do not suggest "normalize your strings".
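To make the limits of normalization concrete, here is a minimal sketch using only the standard unicodedata module. The variable names are illustrative, and the remarks about how wide each character looks assume a typical terminal font.

```python
import unicodedata

# 'e' followed by COMBINING GRAVE ACCENT: two code points, one visible column.
decomposed = "e\u0300"
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))  # 2 1

# str.format() pads by code points, so the decomposed form gets one space
# fewer than the composed form for the same field width.
print(repr("{:<4}|".format(decomposed)))
print(repr("{:<4}|".format(composed)))

# Normalization does not help here: both strings keep their length, although
# the first usually occupies two columns and the second shows only 'ab'.
wide = "\u6bcd"          # the CJK character used in the example table
zero_width = "a\u200bb"  # contains ZERO WIDTH SPACE
print(len(unicodedata.normalize("NFC", wide)))
print(len(unicodedata.normalize("NFC", zero_width)))
```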
Edit: Added info about normalization.
Edit 2: My original dataset also has some European combining characters that don't result in a single code point even after normalization:
zwemwater            | zwemw.
zwia̢z-              | zw.
>>> s3 = 'a\u0322' # The 'a + combining retroflex hook below' from zwiaz
>>> len(unicodedata.normalize('NFC', s3))
2
You have several options:
Some consoles support escape sequences for pixel-exact positioning of the cursor. Might cause some overprinting, though.
Historical note: This approach was used in the Amiga terminal to display images in a console window by printing a line of text and then advancing the cursor down by one pixel. The leftover pixels of the text line slowly built an image.
Create a table in your code which contains the real (pixel) widths of all Unicode characters in the font that is used in the console / terminal window. Use a UI framework and a small Python script to generate this table.
Then add code which calculates the real width of the text using this table. The result might not be a multiple of the character width in the console, though. Together with pixel-exact cursor movement, this might solve your issue.
Note: You'll have to add special handling for ligatures (fi, fl) and composites. Alternatively, you can load a UI framework without opening a window and use the graphics primitives to calculate the string widths.
Use the tab character (\t) to indent. But that will only help if your shell actually uses the real text width to place the cursor. Many terminals will simply count characters.
Create an HTML file with a table and look at it in a browser.
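The width-table option above could be approximated in terminal columns rather than pixels. The sketch below is not the per-font table described in the answer: it swaps in the Unicode character properties exposed by unicodedata as a rough stand-in. The helper names display_width and pad_left_aligned are hypothetical, and the sketch assumes a terminal that draws wide East Asian characters in two columns, draws combining and format characters in zero columns, and treats "ambiguous" width characters as narrow. Real terminals and fonts may disagree.

```python
import unicodedata

def display_width(text):
    """Estimate the number of terminal columns `text` occupies (heuristic)."""
    width = 0
    for ch in text:
        # Combining marks are drawn on top of the preceding character.
        if unicodedata.combining(ch):
            continue
        # Format characters (such as ZERO WIDTH SPACE) and other non-spacing
        # or enclosing marks are assumed to take no column of their own.
        if unicodedata.category(ch) in ("Cf", "Mn", "Me"):
            continue
        # Wide and fullwidth East Asian characters are assumed to take two columns.
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width

def pad_left_aligned(text, columns):
    """Pad `text` with spaces up to `columns` estimated display columns."""
    return text + " " * max(0, columns - display_width(text))

# Hypothetical sample rows modelled on the table in the question.
rows = {"zoothe\u0300que": "zooth.", "zwia\u0322z-": "zw.", "\u6bcd": "\u6bcd\u6bcd"}
for title, abbreviation in rows.items():
    print(pad_left_aligned(title, 20) + " | " + abbreviation)
```

Padding by hand replaces the "{:<20}" format spec, because str.format() only counts code points.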