如何控制包含东亚字符的 Unicode 字符串的填充 [英] How to control padding of Unicode string containing east Asia characters

查看：34 发布时间：2021/8/31 18:36:28 python unicode string-formatting

本文介绍了如何控制包含东亚字符的 Unicode 字符串的填充的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我得到了三个 UTF-8 字符串:

你好，世界你好，世界你好，世rld

我只想要前 10 个 ascii-char-width 以便将括号放在一列中:

[你好，你][你好，世][你好，世r]

在控制台:

width('世界')==width('世界')width('世')==width('wor') #'世'后面有一个空格

一个中文字符是三个字节，但在控制台显示时只有2个ascii字符宽度:

<预><代码>>>>bytes("你好，世界", encoding='utf-8')b'你好，\xe4\xb8\x96\xe7\x95\x8c'

python 的 format() 在 UTF-8 字符混合时没有帮助

<预><代码>>>>for s in ['[{0:<{1}.{1}}]'.format(s, 10) for s in ['hello, world', 'hello, world', 'hello, worldrld']]:... 印刷)...[你好，我][你好，世界][你好，世rl]

它不漂亮:

 -----------歌曲-----------|1:蝴蝶||2:心之城||3: 支持你的爱人 ||4:根生的种子||5:鸽子歌(CUCURRUCUCU PALO||6:林地之间 ||7:显微镜||8: 在你看来 ||9:肖邦离别曲||10:西行(魔戒王者再临主题曲)(INTO ||X 11:深陷爱河||X 12:钟爱大地(THE MO RUN AIR ||X 13:时光流逝||X 14: 卡农 ||X 15:舒伯特小夜曲(SERENADE) ||X 16:甜蜜的摇篮曲(Sweet Lullaby|---------------------------

所以，我想知道是否有标准的方法来做 UTF-8 填充人员?

解决方案

当尝试将 ASCII 文本与固定宽度字体的中文对齐时，有一组可打印 ASCII 字符的全角版本.下面我做了一个ASCII到全角版本的翻译表:

# 编码:utf8# 全角版本(空格与 ! 至 ~ 不连续)空格 = '\N{表意空格}'EXCLA = '\N{全宽感叹号}'波浪号 = '\N{全宽波浪号}'# ASCII 和全角字符的字符串(顺序相同)west = ''.join(chr(i) for i in range(ord(' '),ord('~')))东 = 空格 + ''.join(chr(i) for i in range(ord(EXCLA),ord(TILDE)))# 构建翻译表full = str.maketrans(西，东)数据 = '''\蝴蝶(一首歌)心之城(另一首歌)支持你的爱人(又一首歌)根生的种子鸽子歌林地之间勉强在你看来肖邦离别曲西行(魔戒王者再临主题曲)(Into something)深陷爱河钟爱大地时光流逝卡农舒伯特小夜曲(SERENADE)甜蜜的摇篮曲(Sweet Lullaby)'''# 用全角替换ASCII字符，并创建一个歌曲列表.data = data.translate(full).rstrip().split('\n')# 翻译每个可打印的行.print(' ----------歌曲-----------'.translate(full))对于我，枚举(数据)中的歌曲:line = '|{:4}: {:20.20}|'.format(i+1,song)打印(行.翻译(全))print(' --------------------------'.translate(full))

输出

 －－－－－－－－－－－－Songgs－－－－－－－－－－－－｜ １: 蝴蝶(Ａ songg) ｜｜ ２: 心之城(另外一个歌声) ｜｜ ３: 支持你的爱人(Yett anoother s｜｜ ４: 根生的种子 ｜｜ ５: 鸽子歌(CUCUURRUCUCUCU palo｜｜ ６: 林地 ｜｜ ７: 妈妈 】也而 】】】而 】 了 了 了 了｜ ８: 在你｜ ９: 肖邦离别曲 ｜｜ １０: 西行(魔戒王者再临主题曲)(Ｉｎｔｏ ｓ｜｜ １１: 深陷爱河 ｜｜ １２: 钟大地 ｜｜ １３: 时间流逝 ｜｜ １４: 卡农 ｜｜ １５: 舒伯特小夜曲(ＳＥＲＥＮＡＤＥ) ｜｜ １６: 甜蜜的摇篮曲(Ｓｗｅｔ Ｌｕｌａｂｙ｜－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－

它不是太漂亮，但排列整齐.

I got three UTF-8 stings:

hello, world
hello, 世界
hello, 世rld

I only want the first 10 ascii-char-width so that the bracket in one column:

[hello, wor]
[hello, 世 ]
[hello, 世r]

In console:

width('世界')==width('worl')
width('世 ')==width('wor')  #a white space behind '世'

One chinese char is three bytes, but it only 2 ascii chars width when displayed in console:

>>> bytes("hello, 世界", encoding='utf-8')
b'hello, \xe4\xb8\x96\xe7\x95\x8c'

python's format() doesn't help when UTF-8 chars mixed in

>>> for s in ['[{0:<{1}.{1}}]'.format(s, 10) for s in ['hello, world', 'hello, 世界', 'hello, 世rld']]:
...    print(s)
...
[hello, wor]
[hello, 世界 ]
[hello, 世rl]

It's not pretty:

 -----------Songs-----------
|    1: 蝴蝶                  |
|    2: 心之城                 |
|    3: 支持你的爱人              |
|    4: 根生的种子               |
|    5: 鸽子歌(CUCURRUCUCU PALO|
|    6: 林地之间                |
|    7: 蓝光                  |
|    8: 在你眼里                |
|    9: 肖邦离别曲               |
|   10: 西行( 魔戒王者再临主题曲)(INTO |
| X 11: 深陷爱河                |
| X 12: 钟爱大地(THE MO RUN AIR |
| X 13: 时光流逝                |
| X 14: 卡农                  |
| X 15: 舒伯特小夜曲(SERENADE)    |
| X 16: 甜蜜的摇篮曲(Sweet Lullaby|
 ---------------------------

So, I wonder if there is a standard way to do the UTF-8 padding staff?

解决方案

When trying to line up ASCII text with Chinese in fixed-width font, there is a set of full width versions of the printable ASCII characters. Below I made a translation table of ASCII to full width version:

# coding: utf8

# full width versions (SPACE is non-contiguous with ! through ~)
SPACE = '\N{IDEOGRAPHIC SPACE}'
EXCLA = '\N{FULLWIDTH EXCLAMATION MARK}'
TILDE = '\N{FULLWIDTH TILDE}'

# strings of ASCII and full-width characters (same order)
west = ''.join(chr(i) for i in range(ord(' '),ord('~')))
east = SPACE + ''.join(chr(i) for i in range(ord(EXCLA),ord(TILDE)))

# build the translation table
full = str.maketrans(west,east)

data = '''\
蝴蝶(A song)
心之城(Another song)
支持你的爱人(Yet another song)
根生的种子
鸽子歌(Cucurrucucu palo whatever)
林地之间
蓝光
在你眼里
肖邦离别曲
西行（魔戒王者再临主题曲）(Into something)
深陷爱河
钟爱大地
时光流逝
卡农
舒伯特小夜曲(SERENADE)
甜蜜的摇篮曲(Sweet Lullaby)
'''

# Replace the ASCII characters with full width, and create a song list.
data = data.translate(full).rstrip().split('\n')

# translate each printable line.
print(' ----------Songs-----------'.translate(full))
for i,song in enumerate(data):
    line = '|{:4}: {:20.20}|'.format(i+1,song)
    print(line.translate(full))
print(' --------------------------'.translate(full))

Output

　－－－－－－－－－－Ｓｏｎｇｓ－－－－－－－－－－－
｜　　　１：　蝴蝶（Ａ　ｓｏｎｇ）　　　　　　　　　　｜
｜　　　２：　心之城（Ａｎｏｔｈｅｒ　ｓｏｎｇ）　　　｜
｜　　　３：　支持你的爱人（Ｙｅｔ　ａｎｏｔｈｅｒ　ｓ｜
｜　　　４：　根生的种子　　　　　　　　　　　　　　　｜
｜　　　５：　鸽子歌（Ｃｕｃｕｒｒｕｃｕｃｕ　ｐａｌｏ｜
｜　　　６：　林地之间　　　　　　　　　　　　　　　　｜
｜　　　７：　蓝光　　　　　　　　　　　　　　　　　　｜
｜　　　８：　在你眼里　　　　　　　　　　　　　　　　｜
｜　　　９：　肖邦离别曲　　　　　　　　　　　　　　　｜
｜　　１０：　西行（魔戒王者再临主题曲）（Ｉｎｔｏ　ｓ｜
｜　　１１：　深陷爱河　　　　　　　　　　　　　　　　｜
｜　　１２：　钟爱大地　　　　　　　　　　　　　　　　｜
｜　　１３：　时光流逝　　　　　　　　　　　　　　　　｜
｜　　１４：　卡农　　　　　　　　　　　　　　　　　　｜
｜　　１５：　舒伯特小夜曲（ＳＥＲＥＮＡＤＥ）　　　　｜
｜　　１６：　甜蜜的摇篮曲（Ｓｗｅｅｔ　Ｌｕｌｌａｂｙ｜
　－－－－－－－－－－－－－－－－－－－－－－－－－－

It's not overly pretty, but it lines up.

这篇关于如何控制包含东亚字符的 Unicode 字符串的填充的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

如何控制包含东亚字符的 Unicode 字符串的填充 [英] How to control padding of Unicode string containing east Asia characters

问题描述

输出

Output

相关文章

Python最新文章

热门教程

热门工具

登录关闭

如何控制包含东亚字符的 Unicode 字符串的填充 [英] How to control padding of Unicode string containing east Asia characters

问题描述

输出

Output

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

登录关闭