How can I remove Arabic punctuation from a String in Java?


Question

I am working on an Arabic dictionary and I am getting sentences like
String original = "'أَبَنَ فُلانًا: عَابَه ورَمَاه بخَلَّة سَوء.'"; from my database, but I can't process the sentence without first removing the accents (diacritics) and punctuation.

I tried using:

import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.regex.Pattern;

public static String deAccent(String str) {
    String nfdNormalizedString = Normalizer.normalize(str, Normalizer.Form.NFD); 
    Pattern pattern = Pattern.compile("\\p{InCombiningDiacriticalMarks}+");
    return pattern.matcher(nfdNormalizedString).replaceAll("");
} 

but it did not work.

Recommended answer

Why don't you just go for the Unicode punctuation (P) and non-spacing mark (Mn) categories?

I'm not sure of your expected result, as it's not posted (and I can't read Arabic :)), but try this code:

String input = "'أَبَنَ فُلانًا: عَابَه ورَمَاه بخَلَّة سَوء.'";
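// Needs java.util.regex.Pattern and java.util.regex.Matcher.
// \p{P} = any punctuation, \p{Mn} = non-spacing marks (e.g. Arabic vowel signs)
Pattern p = Pattern.compile("[\\p{P}\\p{Mn}]");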
Matcher m = p.matcher(input);
while (m.find()) {
    System.out.println("found: " + m.group());
}
m.reset();
System.out.println("Replaced: " + m.replaceAll(" "));

Output:

found: '
found: َ
found: َ
found: َ
found: ُ
found: ً
found: :
found: َ
found: َ
found: َ
found: َ
found: َ
found: ّ
found: َ
found: َ
found: .
found: '
Replaced:  أ ب ن  ف لان ا  ع اب ه ور م اه بخ ل  ة س وء  

I suppose this is not your desired final result, but I hope it's something you can work with.
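If the goal is to drop the matched characters instead of replacing each one with a space, the same character class can be used with an empty replacement. The following is only a minimal sketch under that assumption: the class name `ArabicCleaner` and the method `strip` are hypothetical names chosen for illustration, and the whitespace collapsing step is an extra that the answer above does not include.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class ArabicCleaner {
    // \p{P}  = any Unicode punctuation character
    // \p{Mn} = non-spacing marks, which covers the Arabic vowel signs
    private static final Pattern PUNCT_AND_MARKS =
            Pattern.compile("[\\p{P}\\p{Mn}]+");
    private static final Pattern WHITESPACE_RUNS = Pattern.compile("\\s+");

    // Deletes punctuation and non-spacing marks, then collapses whitespace.
    static String strip(String text) {
        String withoutMarks = PUNCT_AND_MARKS.matcher(text).replaceAll("");
        return WHITESPACE_RUNS.matcher(withoutMarks).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String original = "'أَبَنَ فُلانًا: عَابَه ورَمَاه بخَلَّة سَوء.'";
        System.out.println(strip(original));
    }
}
```

Because the marks are deleted and not replaced with spaces, the base letters of each word stay joined, which is probably closer to what a dictionary lookup needs.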

Also, this is a gold mine of information on the Unicode categories. I believe most of them are applicable in a Java Pattern.
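To see which category or block a given character belongs to, the standard `Character` class can be queried directly. The sketch below (the class name `CategoryProbe` and its helper methods are hypothetical) suggests why the `\p{InCombiningDiacriticalMarks}` attempt in the question leaves the Arabic vowel signs in place: a sign such as ARABIC FATHA (U+064E) is a non-spacing mark, but it sits in the Arabic block, not in the Combining Diacritical Marks block.

```java
// Hypothetical probe class for inspecting Unicode properties of a code point.
final class CategoryProbe {
    // True if the code point has general category Mn (non-spacing mark).
    static boolean isNonSpacingMark(int codePoint) {
        return Character.getType(codePoint) == Character.NON_SPACING_MARK;
    }

    // True if the code point lies in the Combining Diacritical Marks block.
    static boolean inCombiningDiacriticalMarks(int codePoint) {
        return Character.UnicodeBlock.of(codePoint)
                == Character.UnicodeBlock.COMBINING_DIACRITICAL_MARKS;
    }

    public static void main(String[] args) {
        int fatha = 0x064E; // ARABIC FATHA
        int acute = 0x0301; // COMBINING ACUTE ACCENT

        System.out.println("fatha is Mn: " + isNonSpacingMark(fatha));
        System.out.println("fatha in Combining Diacritical Marks: "
                + inCombiningDiacriticalMarks(fatha));
        System.out.println("acute is Mn: " + isNonSpacingMark(acute));
        System.out.println("acute in Combining Diacritical Marks: "
                + inCombiningDiacriticalMarks(acute));
    }
}
```

Matching on the category (`\p{Mn}`) instead of the block is what lets a single pattern cover marks from any script.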
