What is the equivalent of perluniprops in Python?


Question

In Perl, there is the perluniprops index of Unicode 7 properties (http://perldoc.perl.org/perluniprops.html), which lets me do the following to pad opening and closing punctuation with spaces:

s/(\p{Open_Punctuation})/ $1 /g;
s/(\p{Close_Punctuation})/ $1 /g;

What is the full list of opening and closing punctuation characters that get padded when using Perl? And what is the equivalent in Python?

Related question: Padding multiple character with space - python. This question was split off from it because an answerer there suggested it should be asked separately.

Solution

Are you asking how to determine the corresponding closing punctuation for a given opening punctuation character? Unicode's Open_Punctuation and Close_Punctuation categories do not define this. In fact, there isn't even a 1:1 relationship:

$ unichars '\p{Open_Punctuation}' | wc -l
75

$ unichars '\p{Close_Punctuation}' | wc -l
73
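
For comparison, roughly the same counts can be obtained from Python's standard unicodedata module, where Open_Punctuation and Close_Punctuation correspond to the general categories Ps and Pe. This is only a minimal sketch: the helper name chars_in_category is hypothetical, and the totals depend on the Unicode version bundled with the interpreter, so they may differ from the Perl figures above.

```python
import sys
import unicodedata

def chars_in_category(category):
    """Return every character whose Unicode general category equals `category`."""
    return [chr(cp) for cp in range(sys.maxunicode + 1)
            if unicodedata.category(chr(cp)) == category]

open_punct = chars_in_category("Ps")   # Perl: \p{Open_Punctuation}
close_punct = chars_in_category("Pe")  # Perl: \p{Close_Punctuation}

# Print the Unicode version used by this interpreter along with both counts.
print(unicodedata.unidata_version, len(open_punct), len(close_punct))
```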

However, it should be relatively easy for you to build your own mapping.

$ unichars '\p{Open_Punctuation}' | cat
 (  U+0028 LEFT PARENTHESIS
 [  U+005B LEFT SQUARE BRACKET
 {  U+007B LEFT CURLY BRACKET
 ༺  U+0F3A TIBETAN MARK GUG RTAGS GYON
 ༼  U+0F3C TIBETAN MARK ANG KHANG GYON
 ᚛  U+169B OGHAM FEATHER MARK
 ‚  U+201A SINGLE LOW-9 QUOTATION MARK
 „  U+201E DOUBLE LOW-9 QUOTATION MARK
 ⁅  U+2045 LEFT SQUARE BRACKET WITH QUILL
 ⁽  U+207D SUPERSCRIPT LEFT PARENTHESIS
 ₍  U+208D SUBSCRIPT LEFT PARENTHESIS
 ⌈  U+2308 LEFT CEILING
 ⌊  U+230A LEFT FLOOR
 〈 U+2329 LEFT-POINTING ANGLE BRACKET
 ❨  U+2768 MEDIUM LEFT PARENTHESIS ORNAMENT
 ❪  U+276A MEDIUM FLATTENED LEFT PARENTHESIS ORNAMENT
 ❬  U+276C MEDIUM LEFT-POINTING ANGLE BRACKET ORNAMENT
 ❮  U+276E HEAVY LEFT-POINTING ANGLE QUOTATION MARK ORNAMENT
 ❰  U+2770 HEAVY LEFT-POINTING ANGLE BRACKET ORNAMENT
 ❲  U+2772 LIGHT LEFT TORTOISE SHELL BRACKET ORNAMENT
 ❴  U+2774 MEDIUM LEFT CURLY BRACKET ORNAMENT
 ⟅  U+27C5 LEFT S-SHAPED BAG DELIMITER
 ⟦  U+27E6 MATHEMATICAL LEFT WHITE SQUARE BRACKET
 ⟨  U+27E8 MATHEMATICAL LEFT ANGLE BRACKET
 ⟪  U+27EA MATHEMATICAL LEFT DOUBLE ANGLE BRACKET
 ⟬  U+27EC MATHEMATICAL LEFT WHITE TORTOISE SHELL BRACKET
 ⟮  U+27EE MATHEMATICAL LEFT FLATTENED PARENTHESIS
 ⦃  U+2983 LEFT WHITE CURLY BRACKET
 ⦅  U+2985 LEFT WHITE PARENTHESIS
 ⦇  U+2987 Z NOTATION LEFT IMAGE BRACKET
 ⦉  U+2989 Z NOTATION LEFT BINDING BRACKET
 ⦋  U+298B LEFT SQUARE BRACKET WITH UNDERBAR
 ⦍  U+298D LEFT SQUARE BRACKET WITH TICK IN TOP CORNER
 ⦏  U+298F LEFT SQUARE BRACKET WITH TICK IN BOTTOM CORNER
 ⦑  U+2991 LEFT ANGLE BRACKET WITH DOT
 ⦓  U+2993 LEFT ARC LESS-THAN BRACKET
 ⦕  U+2995 DOUBLE LEFT ARC GREATER-THAN BRACKET
 ⦗  U+2997 LEFT BLACK TORTOISE SHELL BRACKET
 ⧘  U+29D8 LEFT WIGGLY FENCE
 ⧚  U+29DA LEFT DOUBLE WIGGLY FENCE
 ⧼  U+29FC LEFT-POINTING CURVED ANGLE BRACKET
 ⸢  U+2E22 TOP LEFT HALF BRACKET
 ⸤  U+2E24 BOTTOM LEFT HALF BRACKET
 ⸦  U+2E26 LEFT SIDEWAYS U BRACKET
 ⸨  U+2E28 LEFT DOUBLE PARENTHESIS
 ⹂  U+2E42 DOUBLE LOW-REVERSED-9 QUOTATION MARK
 〈 U+3008 LEFT ANGLE BRACKET
 《 U+300A LEFT DOUBLE ANGLE BRACKET
 「 U+300C LEFT CORNER BRACKET
 『 U+300E LEFT WHITE CORNER BRACKET
 【 U+3010 LEFT BLACK LENTICULAR BRACKET
 〔 U+3014 LEFT TORTOISE SHELL BRACKET
 〖 U+3016 LEFT WHITE LENTICULAR BRACKET
 〘 U+3018 LEFT WHITE TORTOISE SHELL BRACKET
 〚 U+301A LEFT WHITE SQUARE BRACKET
 〝 U+301D REVERSED DOUBLE PRIME QUOTATION MARK
 ﴿  U+FD3F ORNATE RIGHT PARENTHESIS
 ︗ U+FE17 PRESENTATION FORM FOR VERTICAL LEFT WHITE LENTICULAR BRACKET
 ︵ U+FE35 PRESENTATION FORM FOR VERTICAL LEFT PARENTHESIS
 ︷ U+FE37 PRESENTATION FORM FOR VERTICAL LEFT CURLY BRACKET
 ︹ U+FE39 PRESENTATION FORM FOR VERTICAL LEFT TORTOISE SHELL BRACKET
 ︻ U+FE3B PRESENTATION FORM FOR VERTICAL LEFT BLACK LENTICULAR BRACKET
 ︽ U+FE3D PRESENTATION FORM FOR VERTICAL LEFT DOUBLE ANGLE BRACKET
 ︿ U+FE3F PRESENTATION FORM FOR VERTICAL LEFT ANGLE BRACKET
 ﹁ U+FE41 PRESENTATION FORM FOR VERTICAL LEFT CORNER BRACKET
 ﹃ U+FE43 PRESENTATION FORM FOR VERTICAL LEFT WHITE CORNER BRACKET
 ﹇ U+FE47 PRESENTATION FORM FOR VERTICAL LEFT SQUARE BRACKET
 ﹙ U+FE59 SMALL LEFT PARENTHESIS
 ﹛ U+FE5B SMALL LEFT CURLY BRACKET
 ﹝ U+FE5D SMALL LEFT TORTOISE SHELL BRACKET
 （ U+FF08 FULLWIDTH LEFT PARENTHESIS
 ［ U+FF3B FULLWIDTH LEFT SQUARE BRACKET
 ｛ U+FF5B FULLWIDTH LEFT CURLY BRACKET
 ｟ U+FF5F FULLWIDTH LEFT WHITE PARENTHESIS
 ｢  U+FF62 HALFWIDTH LEFT CORNER BRACKET

$ unichars '\p{Close_Punctuation}' | cat
 )  U+0029 RIGHT PARENTHESIS
 ]  U+005D RIGHT SQUARE BRACKET
 }  U+007D RIGHT CURLY BRACKET
 ༻  U+0F3B TIBETAN MARK GUG RTAGS GYAS
 ༽  U+0F3D TIBETAN MARK ANG KHANG GYAS
 ᚜  U+169C OGHAM REVERSED FEATHER MARK
 ⁆  U+2046 RIGHT SQUARE BRACKET WITH QUILL
 ⁾  U+207E SUPERSCRIPT RIGHT PARENTHESIS
 ₎  U+208E SUBSCRIPT RIGHT PARENTHESIS
 ⌉  U+2309 RIGHT CEILING
 ⌋  U+230B RIGHT FLOOR
 〉 U+232A RIGHT-POINTING ANGLE BRACKET
 ❩  U+2769 MEDIUM RIGHT PARENTHESIS ORNAMENT
 ❫  U+276B MEDIUM FLATTENED RIGHT PARENTHESIS ORNAMENT
 ❭  U+276D MEDIUM RIGHT-POINTING ANGLE BRACKET ORNAMENT
 ❯  U+276F HEAVY RIGHT-POINTING ANGLE QUOTATION MARK ORNAMENT
 ❱  U+2771 HEAVY RIGHT-POINTING ANGLE BRACKET ORNAMENT
 ❳  U+2773 LIGHT RIGHT TORTOISE SHELL BRACKET ORNAMENT
 ❵  U+2775 MEDIUM RIGHT CURLY BRACKET ORNAMENT
 ⟆  U+27C6 RIGHT S-SHAPED BAG DELIMITER
 ⟧  U+27E7 MATHEMATICAL RIGHT WHITE SQUARE BRACKET
 ⟩  U+27E9 MATHEMATICAL RIGHT ANGLE BRACKET
 ⟫  U+27EB MATHEMATICAL RIGHT DOUBLE ANGLE BRACKET
 ⟭  U+27ED MATHEMATICAL RIGHT WHITE TORTOISE SHELL BRACKET
 ⟯  U+27EF MATHEMATICAL RIGHT FLATTENED PARENTHESIS
 ⦄  U+2984 RIGHT WHITE CURLY BRACKET
 ⦆  U+2986 RIGHT WHITE PARENTHESIS
 ⦈  U+2988 Z NOTATION RIGHT IMAGE BRACKET
 ⦊  U+298A Z NOTATION RIGHT BINDING BRACKET
 ⦌  U+298C RIGHT SQUARE BRACKET WITH UNDERBAR
 ⦎  U+298E RIGHT SQUARE BRACKET WITH TICK IN BOTTOM CORNER
 ⦐  U+2990 RIGHT SQUARE BRACKET WITH TICK IN TOP CORNER
 ⦒  U+2992 RIGHT ANGLE BRACKET WITH DOT
 ⦔  U+2994 RIGHT ARC GREATER-THAN BRACKET
 ⦖  U+2996 DOUBLE RIGHT ARC LESS-THAN BRACKET
 ⦘  U+2998 RIGHT BLACK TORTOISE SHELL BRACKET
 ⧙  U+29D9 RIGHT WIGGLY FENCE
 ⧛  U+29DB RIGHT DOUBLE WIGGLY FENCE
 ⧽  U+29FD RIGHT-POINTING CURVED ANGLE BRACKET
 ⸣  U+2E23 TOP RIGHT HALF BRACKET
 ⸥  U+2E25 BOTTOM RIGHT HALF BRACKET
 ⸧  U+2E27 RIGHT SIDEWAYS U BRACKET
 ⸩  U+2E29 RIGHT DOUBLE PARENTHESIS
 〉 U+3009 RIGHT ANGLE BRACKET
 》 U+300B RIGHT DOUBLE ANGLE BRACKET
 」 U+300D RIGHT CORNER BRACKET
 』 U+300F RIGHT WHITE CORNER BRACKET
 】 U+3011 RIGHT BLACK LENTICULAR BRACKET
 〕 U+3015 RIGHT TORTOISE SHELL BRACKET
 〗 U+3017 RIGHT WHITE LENTICULAR BRACKET
 〙 U+3019 RIGHT WHITE TORTOISE SHELL BRACKET
 〛 U+301B RIGHT WHITE SQUARE BRACKET
 〞 U+301E DOUBLE PRIME QUOTATION MARK
 〟 U+301F LOW DOUBLE PRIME QUOTATION MARK
 ﴾  U+FD3E ORNATE LEFT PARENTHESIS
 ︘ U+FE18 PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET
 ︶ U+FE36 PRESENTATION FORM FOR VERTICAL RIGHT PARENTHESIS
 ︸ U+FE38 PRESENTATION FORM FOR VERTICAL RIGHT CURLY BRACKET
 ︺ U+FE3A PRESENTATION FORM FOR VERTICAL RIGHT TORTOISE SHELL BRACKET
 ︼ U+FE3C PRESENTATION FORM FOR VERTICAL RIGHT BLACK LENTICULAR BRACKET
 ︾ U+FE3E PRESENTATION FORM FOR VERTICAL RIGHT DOUBLE ANGLE BRACKET
 ﹀ U+FE40 PRESENTATION FORM FOR VERTICAL RIGHT ANGLE BRACKET
 ﹂ U+FE42 PRESENTATION FORM FOR VERTICAL RIGHT CORNER BRACKET
 ﹄ U+FE44 PRESENTATION FORM FOR VERTICAL RIGHT WHITE CORNER BRACKET
 ﹈ U+FE48 PRESENTATION FORM FOR VERTICAL RIGHT SQUARE BRACKET
 ﹚ U+FE5A SMALL RIGHT PARENTHESIS
 ﹜ U+FE5C SMALL RIGHT CURLY BRACKET
 ﹞ U+FE5E SMALL RIGHT TORTOISE SHELL BRACKET
 ） U+FF09 FULLWIDTH RIGHT PARENTHESIS
 ］ U+FF3D FULLWIDTH RIGHT SQUARE BRACKET
 ｝ U+FF5D FULLWIDTH RIGHT CURLY BRACKET
 ｠ U+FF60 FULLWIDTH RIGHT WHITE PARENTHESIS
 ｣  U+FF63 HALFWIDTH RIGHT CORNER BRACKET
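
One possible way to build such a mapping in Python is to pair characters by name. The sketch below is a heuristic, not something Unicode prescribes: it assumes that an opening character whose name contains LEFT pairs with the character whose name has RIGHT in the same place, and it silently skips everything that does not fit that pattern. The names build_bracket_map and bracket_map are hypothetical.

```python
import sys
import unicodedata

def build_bracket_map():
    """Pair each Ps character with a Pe character by swapping LEFT -> RIGHT in its name.

    Heuristic only: characters without a LEFT/RIGHT counterpart are skipped.
    """
    pairs = {}
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) != "Ps":
            continue
        name = unicodedata.name(ch, "")
        if "LEFT" not in name:
            continue  # e.g. low-9 quotation marks, Tibetan and Ogham marks
        try:
            partner = unicodedata.lookup(name.replace("LEFT", "RIGHT"))
        except KeyError:
            continue  # no character with the swapped name exists
        if unicodedata.category(partner) == "Pe":
            pairs[ch] = partner
    return pairs

bracket_map = build_bracket_map()
print(bracket_map["("], bracket_map["["], bracket_map["{"])  # prints: ) ] }
```

Characters left out of the map, such as the low-9 quotation marks, would still need to be paired by hand if they matter for your text.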

After installing unichars (via cpan Unicode::Tussle), the list can be captured from Python:

>>> import subprocess
>>> # Raw string, so that the backslashes in \p and \n reach the shell unchanged.
>>> cmd = r"unichars '\p{Open_Punctuation}' | cut -f2 -d' ' | tr -d '\n'"
>>> open_punct = subprocess.check_output(cmd, shell=True).decode('utf8')
Smartmatch is experimental at /usr/local/bin/unichars line 546.
>>> print(open_punct)
([{༺༼᚛‚„⁅⁽₍〈❨❪❬❮❰❲❴⟅⟦⟨⟪⟬⟮⦃⦅⦇⦉⦋⦍⦏⦑⦓⦕⦗⧘⧚⧼⸢⸤⸦⸨〈《「『【〔〖〘〚〝﴾︗︵︷︹︻︽︿﹁﹃﹇﹙﹛﹝([{⦅「
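
If shelling out to Perl is not an option, the padding itself can be approximated with the standard library alone. The following is a sketch under the assumption that the general categories Ps and Pe are an acceptable stand-in for \p{Open_Punctuation} and \p{Close_Punctuation}. The names category_class and pad_brackets are hypothetical, and the exact character set depends on the Unicode version of the Python build.

```python
import re
import sys
import unicodedata

def category_class(*categories):
    """Build a regex character class matching all characters in the given general categories."""
    chars = (chr(cp) for cp in range(sys.maxunicode + 1)
             if unicodedata.category(chr(cp)) in categories)
    return "[" + "".join(re.escape(c) for c in chars) + "]"

# Ps = Open_Punctuation, Pe = Close_Punctuation
BRACKETS = re.compile("(" + category_class("Ps", "Pe") + ")")

def pad_brackets(text):
    """Surround every opening or closing punctuation character with spaces."""
    return BRACKETS.sub(r" \1 ", text)

print(pad_brackets("f(x)[0]"))
```

The built-in re module has no \p{...} syntax, which is why the character class is generated here. The third-party regex package accepts \p{Ps} and \p{Pe} directly, if adding a dependency is acceptable.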
