Java: how to check if character belongs to a specific unicode block?

Views: 881 Published: 2018/12/12 18:19:24 Tags: java regex unicode char


Problem description


I need to identify what natural language my input belongs to. The goal is to distinguish between Arabic and English words in a mixed input, where the input is Unicode and is extracted from XML text nodes. I have noticed the class Character.UnicodeBlock. Is it related to my problem? How can I get it to work?
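As a rough illustration of how Character.UnicodeBlock could be applied here, the sketch below checks each code point against the Arabic-related blocks. The class and method names (ArabicBlockCheck, isArabic, containsArabic) are made up for this example, and the choice of which blocks count as "Arabic" is an assumption that may need adjusting for the actual input.

```java
import java.lang.Character.UnicodeBlock;

// Hypothetical helper, not part of the original question.
final class ArabicBlockCheck {

    // True if the code point lies in one of the Arabic-related Unicode blocks.
    // UnicodeBlock.of returns null for code points outside any defined block,
    // which simply compares unequal to every constant below.
    static boolean isArabic(int codePoint) {
        UnicodeBlock block = UnicodeBlock.of(codePoint);
        return block == UnicodeBlock.ARABIC
                || block == UnicodeBlock.ARABIC_SUPPLEMENT
                || block == UnicodeBlock.ARABIC_PRESENTATION_FORMS_A
                || block == UnicodeBlock.ARABIC_PRESENTATION_FORMS_B;
    }

    // True if at least one code point of the text is Arabic.
    // Iterating code points (not chars) keeps supplementary characters intact.
    static boolean containsArabic(String text) {
        return text.codePoints().anyMatch(ArabicBlockCheck::isArabic);
    }
}
```

Each word extracted from an XML text node could be passed to containsArabic to decide which group it belongs to. This assumes Java 8 or later for String.codePoints().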

Edit:

The Character.UnicodeBlock approach was useful for Arabic, but apparently doesn't work for English (or other European languages), because the BASIC_LATIN Unicode block covers symbols and non-printable characters as well as letters. So for now I am using the matches() method of the String object with the regex "[A-Za-z]+" instead. I can live with it, but perhaps someone can suggest a nicer/faster way.
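One possible alternative to the regex is a per-character test, sketched below. The class and method names (LatinWordCheck, isAsciiLetters, isLatinLetters) are hypothetical. The first method is meant to behave like matches("[A-Za-z]+"). The second is a wider variant that assumes accented Latin letters should also count, using Character.UnicodeScript (Java 7+) together with Character.isLetter. Whether either is measurably faster has not been benchmarked here.

```java
// Hypothetical helper, not part of the original question.
final class LatinWordCheck {

    // Same result as word.matches("[A-Za-z]+"), without compiling a regex:
    // the word must be non-empty and consist only of ASCII letters.
    static boolean isAsciiLetters(String word) {
        if (word.isEmpty()) {
            return false;
        }
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            boolean letter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
            if (!letter) {
                return false;
            }
        }
        return true;
    }

    // Wider variant (an assumption about what "English or other European
    // languages" should accept): every code point must be a letter whose
    // Unicode script is LATIN, so digits, symbols and control characters
    // from the BASIC_LATIN block are rejected.
    static boolean isLatinLetters(String word) {
        return !word.isEmpty() && word.codePoints().allMatch(cp ->
                Character.isLetter(cp)
                        && Character.UnicodeScript.of(cp) == Character.UnicodeScript.LATIN);
    }
}
```

If the regex is kept instead, precompiling it once with java.util.regex.Pattern.compile and reusing the Pattern avoids recompiling on every String.matches() call.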
