Removing Hebrew "niqqud" using R

Views: 32 Published: 2021/9/6 19:10:06 Tags: regex r text unicode hebrew

Problem description

I have been struggling to remove niqqud (diacritical signs used to represent vowels or to distinguish between alternative pronunciations of letters of the Hebrew alphabet). For instance, I have this variable: sample1 <- "הֻסְמַק"

I cannot find an effective way to remove the signs below the letters.

I tried str_replace_all(sample1, "[^[:alnum:]]", "") and also gsub('[:punct:]','',sample1).

No success... :-( Any ideas?

Recommended answer

You can use the \p{M} Unicode category to match diacritics with a Perl-like regex, and remove all of them in one go with gsub, like this:

sample1 <- "הֻסְמַק"
gsub("\\p{M}", "", sample1, perl = TRUE)

Result: [1] "הסמק"
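The same call can be wrapped in a small helper so that it applies to a whole character vector. This is only a sketch: the name strip_marks is hypothetical (it is not part of the original answer), and the sample word is written with \u escapes on the assumption that its points are qubuts, sheva, and patah.

```r
# Hypothetical helper (not from the original answer): remove every Unicode
# combining mark from each element of a character vector.
strip_marks <- function(x) {
  gsub("\\p{M}", "", x, perl = TRUE)
}

# The sample word written with \u escapes so the script does not depend on
# the file encoding (assumed points: qubuts, sheva, patah).
sample1 <- "\u05d4\u05bb\u05e1\u05b0\u05de\u05b7\u05e7"

# gsub is vectorised: plain ASCII passes through unchanged and NA stays NA.
words <- c(sample1, "abc", NA)
stripped <- strip_marks(words)

nchar(sample1)      # 7 code points: 4 letters + 3 marks
nchar(stripped[1])  # 4 code points: letters only
```

Counting code points with nchar before and after is a quick way to confirm that only the marks were removed and the base letters were kept.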

See the demo.

\p{M} or \p{Mark}: a character intended to be combined with another character (e.g. accents, umlauts, enclosing boxes, etc.).

See more at Regular-Expressions.info, "Unicode Categories".
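Because \p{M} covers combining marks from every script, it also strips accents from non-Hebrew text in the same string. If that is not wanted, one possible variation is to list the Hebrew points explicitly. The sketch below assumes that "niqqud" means U+05B0 to U+05BD, U+05BF, U+05C1, U+05C2, and U+05C7; that selection and the name strip_niqqud are illustrative choices, not something the original answer specifies.

```r
# Hypothetical helper: remove only an assumed set of Hebrew points
# (U+05B0-U+05BD, U+05BF, U+05C1, U+05C2, U+05C7), leaving other marks alone.
strip_niqqud <- function(x) {
  gsub("[\u05b0-\u05bd\u05bf\u05c1\u05c2\u05c7]", "", x, perl = TRUE)
}

# "cafe" with a combining acute accent (U+0301), followed by the pointed
# Hebrew sample word written with \u escapes.
mixed <- "cafe\u0301 \u05d4\u05bb\u05e1\u05b0\u05de\u05b7\u05e7"

only_hebrew <- strip_niqqud(mixed)                 # acute accent is kept
all_marks <- gsub("\\p{M}", "", mixed, perl = TRUE)  # acute accent is removed too

nchar(mixed)        # 13 code points
nchar(only_hebrew)  # 10 code points
nchar(all_marks)    # 9 code points
```

The explicit class is narrower and easier to audit, while \p{M} is shorter and also catches marks such as cantillation accents that the assumed list leaves in place.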
