JavaScript: remove Arabic text diacritics dynamically

Question

How can I remove Arabic diacritics dynamically? I'm designing a CHM e-book with multiple HTML pages that contain Arabic text, but sometimes the search engine fails to highlight some Arabic words because of their diacritics. Is it possible, when the page loads, to use a JavaScript function that strips the diacritics from the Arabic text? There must be an option to enable them again, so I don't want to remove them from the HTML physically, only temporarily.

The thing is, I don't know where to start or which function to use.

Thanks :)

For example:

Text : الْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ
converted to : الحمد لله رب العالمين 
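
A minimal sketch of the temporary, reversible stripping asked for here, assuming "diacritics" means the Arabic tashkeel block U+064B to U+065F plus superscript alif (U+0670) and tatweel (U+0640). It only changes the text nodes the browser displays and remembers their original contents, so the HTML file is never rewritten. The names `hideDiacritics` and `showDiacritics` are hypothetical, and the code sticks to `var` and plain arrays on the assumption that the CHM viewer runs an old Internet Explorer engine.

```javascript
// Arabic tashkeel (U+064B-U+065F), superscript alif (U+0670) and tatweel (U+0640).
var TASHKEEL_RE = /[\u064B-\u065F\u0670\u0640]/g;

// Text nodes that were modified, and their original contents (same index).
var strippedNodes = [];
var originalTexts = [];

// Call `callback` for every text node (nodeType 3) under `node`, depth first.
function forEachTextNode(node, callback) {
  if (node.nodeType === 3) {
    callback(node);
    return;
  }
  var children = node.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    forEachTextNode(children[i], callback);
  }
}

// Remove diacritics from the displayed text only; calling it twice is harmless
// because already-stripped nodes no longer change and are not recorded again.
function hideDiacritics(root) {
  forEachTextNode(root, function (textNode) {
    var stripped = textNode.nodeValue.replace(TASHKEEL_RE, "");
    if (stripped !== textNode.nodeValue) {
      strippedNodes.push(textNode);
      originalTexts.push(textNode.nodeValue);
      textNode.nodeValue = stripped;
    }
  });
}

// Restore exactly the text each node had before hideDiacritics() ran.
function showDiacritics() {
  for (var i = 0; i < strippedNodes.length; i++) {
    strippedNodes[i].nodeValue = originalTexts[i];
  }
  strippedNodes = [];
  originalTexts = [];
}
```

In a page you might call `hideDiacritics(document.body)` from `window.onload` and wire `showDiacritics()` to a link or button that turns the diacritics back on.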

Recommended answer

Try this:

Text : الْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ
converted to : الحمد لله رب العالمين 

http://www.suhailkaleem.com/2009/08/26/remove-diacritics-from-arabic-text-quran/

The code is C#, not JavaScript, though. I'm still trying to figure out how to achieve this in JavaScript.

Apparently it's very easy in JavaScript. The diacritics are stored as separate characters (their own Unicode code points), so they can be removed quite easily.

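```javascript
// Arabic code points (decimal); some are kept for reference only.
var CHARCODE_SHADDA = 1617;            // U+0651
var CHARCODE_SUKOON = 1618;            // U+0652
var CHARCODE_SUPERSCRIPT_ALIF = 1648;  // U+0670
var CHARCODE_TATWEEL = 1600;           // U+0640 (kashida)
var CHARCODE_ALIF = 1575;              // U+0627

// True if `letter` is a tashkeel mark, a superscript alif or a tatweel.
function isCharTashkeel(letter)
{
    if (typeof letter === "undefined" || letter === null)
        return false;

    var code = letter.charCodeAt(0);
    // 1611-1631 (U+064B-U+065F): fathatan, dammatan, kasratan, fatha, damma,
    // kasra, shadda, sukoon, maddah (1619, "~") and the other marks in that block.
    return code === CHARCODE_TATWEEL ||
           code === CHARCODE_SUPERSCRIPT_ALIF ||
           (code >= 1611 && code <= 1631);
}

// Return a copy of `input` with every tashkeel character removed.
function stripTashkeel(input)
{
    var output = [];  // collect kept characters, then join once
    for (var i = 0; i < input.length; i++)
    {
        var letter = input.charAt(i);
        if (!isCharTashkeel(letter))
            output.push(letter);
    }
    return output.join("");
}
```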
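
The same filtering can be written as a single regular-expression replace. This is a sketch, assuming the character set is the tashkeel block U+064B to U+065F (which includes fathatan, U+064B), superscript alif U+0670 and tatweel U+0640; `stripTashkeelRegex` is a hypothetical name, not part of the answer above.

```javascript
// Tashkeel block U+064B-U+065F, superscript alif U+0670, tatweel U+0640.
var TASHKEEL_PATTERN = /[\u064B-\u065F\u0670\u0640]/g;

// One native replace call instead of a character-by-character loop.
// String.prototype.replace starts from index 0 for a global regex,
// so reusing the shared pattern object is safe.
function stripTashkeelRegex(input) {
  return String(input).replace(TASHKEEL_PATTERN, "");
}
```

For example, `stripTashkeelRegex("الْحَمْدُ")` keeps only the base letters ا ل ح م د.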

Here is another way to do it, using BuckData: http://qurandev.github.com/

Advantages:

- Buckwalter ("Buck") text uses less bandwidth.
- In JavaScript you can search through the entire Buckwalter Quran text in one pass.
- It is more intuitive than searching in Arabic script.
- Converting Buckwalter to Arabic and Arabic to Buckwalter is a simple JS call. Play with the live sample here: http://jsfiddle.net/BrxJP/
- You can strip all vowels from Buckwalter text in a few milliseconds. Why do this? So you can search in JavaScript while ignoring tashkeel differences (fathah, dammah, kasrah), which leads to more hits.
- Regular expressions on Buckwalter text allow further optimizations.
- All searches can run locally: http://qurandev.appspot.com

How was the data generated? With a simple one-to-one mapping, using http://corpus.quran.com/java/buckwalter.jsp
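
The Buckwalter route can be sketched like this: transliterate Arabic into Buckwalter's one-character-per-letter ASCII scheme, strip the vowel symbols with a regex, and map back if needed. The table is a partial subset written for illustration (hamza forms and several other characters are omitted and pass through unchanged); it is not the BuckData files or the qurandev code, and all function names are hypothetical.

```javascript
// Partial Arabic -> Buckwalter table: common letters plus tashkeel only.
var ARABIC_TO_BUCK = {
  "\u0627": "A", "\u0628": "b", "\u0629": "p", "\u062A": "t", "\u062B": "v",
  "\u062C": "j", "\u062D": "H", "\u062E": "x", "\u062F": "d", "\u0630": "*",
  "\u0631": "r", "\u0632": "z", "\u0633": "s", "\u0634": "$", "\u0635": "S",
  "\u0636": "D", "\u0637": "T", "\u0638": "Z", "\u0639": "E", "\u063A": "g",
  "\u0640": "_", "\u0641": "f", "\u0642": "q", "\u0643": "k", "\u0644": "l",
  "\u0645": "m", "\u0646": "n", "\u0647": "h", "\u0648": "w", "\u0649": "Y",
  "\u064A": "y",
  // tashkeel: tanween, short vowels, shadda, sukoon, superscript alif
  "\u064B": "F", "\u064C": "N", "\u064D": "K", "\u064E": "a", "\u064F": "u",
  "\u0650": "i", "\u0651": "~", "\u0652": "o", "\u0670": "`"
};

// Reverse table for Buckwalter -> Arabic (the mapping is one-to-one).
var BUCK_TO_ARABIC = {};
for (var key in ARABIC_TO_BUCK) {
  if (Object.prototype.hasOwnProperty.call(ARABIC_TO_BUCK, key)) {
    BUCK_TO_ARABIC[ARABIC_TO_BUCK[key]] = key;
  }
}

// Map each character through `table`; unknown characters are kept as-is.
function transliterate(text, table) {
  var out = [];
  for (var i = 0; i < text.length; i++) {
    var ch = text.charAt(i);
    out.push(Object.prototype.hasOwnProperty.call(table, ch) ? table[ch] : ch);
  }
  return out.join("");
}

function toBuckwalter(arabic) { return transliterate(arabic, ARABIC_TO_BUCK); }
function fromBuckwalter(buck) { return transliterate(buck, BUCK_TO_ARABIC); }

// No consonant uses these ASCII symbols, so one regex removes the vowels,
// tanween, shadda, sukoon and superscript alif (and tatweel "_").
function stripBuckVowels(buck) {
  return buck.replace(/[aiuoFNK~`_]/g, "");
}
```

A tashkeel-insensitive search could then compare `stripBuckVowels(toBuckwalter(query))` against the stripped Buckwalter text of each page.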
