Are there character collections for all international full stop punctuations?

Question

I am trying to parse utf-8 strings into "bite sized" segments. For example, I would like to break down a text into "sentences".

Is there a comprehensive collection of characters (or regex) that correspond to end of sentences in all languages? I'm looking for something that would capture the Latin period, exclamation and interrogation marks, the Chinese and Japanese full stop, etc.

Something like the above but for the equivalent of a comma would be great too.

Recommended answer

I haven’t encountered any compilation of such information, and I would expect collecting it to be a major effort. For some widely used languages, you could get the information from The Chicago Manual of Style. There is some information about punctuation marks commonly used in different languages at http://unicode.org/repos/cldr-tmp/trunk/diff/by_type/misc.exemplarCharacters-other.html but it covers just a small set of languages and does not distinguish sentence-terminating characters.

Using just characters won’t be enough, since e.g. in English, the full stop "." occurs in many contexts where it does not terminate a sentence, as in "e.g." or in "1.5".
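
To make that limitation concrete, here is a minimal Python sketch of the character-only approach. The character list is a small hand-picked sample and an assumption of this example, not a comprehensive or authoritative collection, and `naive_split` is a hypothetical helper name.

```python
import re

# Hand-picked, NON-exhaustive sample of sentence-ending characters.
# This list is an assumption for illustration, not an authoritative set.
SENTENCE_END_CHARS = (
    ".!?"      # Latin full stop, exclamation mark, question mark
    "\u3002"   # IDEOGRAPHIC FULL STOP (Chinese, Japanese)
    "\uff01"   # FULLWIDTH EXCLAMATION MARK
    "\uff1f"   # FULLWIDTH QUESTION MARK
    "\u0964"   # DEVANAGARI DANDA
    "\u06d4"   # ARABIC FULL STOP
    "\u061f"   # ARABIC QUESTION MARK
    "\u0589"   # ARMENIAN FULL STOP
    "\u1362"   # ETHIOPIC FULL STOP
)

_CLASS = re.escape(SENTENCE_END_CHARS)

# A run of non-terminator characters followed by any terminators after it.
_SEGMENT_RE = re.compile("[^" + _CLASS + "]+[" + _CLASS + "]*")


def naive_split(text):
    """Split text after every terminator character, ignoring context."""
    pieces = (m.strip() for m in _SEGMENT_RE.findall(text))
    return [p for p in pieces if p]


if __name__ == "__main__":
    # Works where the terminator is unambiguous.
    print(naive_split("你好。再见！"))
    # Breaks on abbreviations and decimal numbers.
    print(naive_split("It costs 1.5 dollars, e.g. at noon."))
```

The second call splits the single English sentence into four fragments, cutting inside "1.5" and "e.g.", which is exactly why a character set alone is not enough and some context-aware rule is needed on top of it.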
