Count the number of characters in foreign-language text

Question

Is there an optimal way to count characters in non-English text? For example, the English word "Mother" has 6 letters. The same word typed in Tamil (மதர்) has three letters (ம + த + ர்), but the system treats the last letter (ர்) as two characters (ர + ் = ர்). Is there any way to count the number of real characters?

One clue: moving the keyboard cursor through the word (மதர்) steps over only 3 letters, not the 4 characters the system counts. Is there a way to build a solution on this? Any help would be greatly appreciated.

Recommended answer

Update

Back from lunch =) I'm afraid the previous approach won't work well with every foreign language, so I added another fiddle with a possible alternative:

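// Holds every Unicode nonspacing mark (category Mn) in escape() form, e.g. "%u0BCD".
// The 1280 entries are elided here, as in the original post.
var UnicodeNsm = [ /* Array of 1280 escaped nonspacing marks */ ];

// Counts the UTF-16 code units of str that are not nonspacing marks.
function countNSMString(str) {
    var chars = str.split("");
    var count = 0;
    for (var i = 0, ilen = chars.length; i < ilen; i++) {
        if (UnicodeNsm.indexOf(escape(chars[i])) == -1) {
            count++;
        }
    }
    return count;
}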

var English = "Mother";  
var Tamil = "மதர்";
var Vietnamese = "mẹ"
var Hindi = "मां"

function logL (str) {    
      console.log(str + " has " + countNSMString(str) + " visible Characters and " + str.length + " normal Characters" ); //"மதர் has 3 visible Characters"
}

logL(English) //"Mother has 6 visible Characters and 6 normal Characters"
logL(Tamil) //"மதர் has 3 visible Characters and 4 normal Characters"
logL(Vietnamese) //"mẹ has 2 visible Characters and 3 normal Characters"
logL(Hindi) //"मां has 1 visible Characters and 3 normal Characters"

This simply checks each character of the string against the list of Unicode NSM (nonspacing mark) characters and leaves any match out of the count. It should work for most languages, not only Tamil, and an array with 1280 elements shouldn't be much of a performance issue.
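The same check can be sketched without the lookup array. This is a minimal sketch, not the code from the fiddle: it assumes a JavaScript engine that supports Unicode property escapes in regular expressions, and the name `countWithoutNsm` is made up for illustration.

```javascript
// Matches any code point in the Unicode general category Mn (Nonspacing_Mark).
const NONSPACING_MARK = /\p{Mn}/u;

// Hypothetical helper: counts the code points of str that are not nonspacing marks.
function countWithoutNsm(str) {
  let count = 0;
  // for...of walks the string by code point, not by UTF-16 code unit.
  for (const ch of str) {
    if (!NONSPACING_MARK.test(ch)) {
      count++;
    }
  }
  return count;
}

console.log(countWithoutNsm("Mother")); // 6
console.log(countWithoutNsm("\u0BAE\u0BA4\u0BB0\u0BCD")); // 3 (மதர்)
```

Like the array-based version, this only skips category Mn. Spacing combining marks (category Mc) are still counted as separate characters.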

Here is a list of the Unicode NSMs: http://www.fileformat.info/info/unicode/category/Mn/list.htm

Here is the corresponding JSBin.

After experimenting a bit with string operations, it turns out that String.indexOf returns the same index for "ர்" and for "ர", meaning:

"ர்ரர".indexOf("ர்") == "ர்ரர".indexOf("ர" + "்") // true

but

"ர்ரர".indexOf("ர") == "ர்ரர".indexOf("ர" + "ர") // false

I took this opportunity and tried something like this:

//ர்

var char = "ரர்ர்ரர்்";
var char2 = "ரரர்ர்ரர்்";    
var char3 = "ர்ரர்ர்ரர்்";

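// Heuristic: treat chars[i] and chars[i + 1] as one visible character when the
// pair first occurs at the same index as the first occurrence of chars[i].
function countStr(str) {
    var chars = str.split("");
    var count = 0;
    for (var i = 0, ilen = chars.length; i < ilen; i++) {
        // Only build a pair when a next code unit exists.
        if (i + 1 < ilen && str.indexOf(chars[i]) == str.indexOf(chars[i] + chars[i + 1])) {
            i += 1; // skip the second half of the pair
        }
        count++;
    }
    return count;
}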


console.log("--");

console.log(countStr(char)); //6

console.log(countStr(char2)); //7

console.log(countStr(char3)); //7

This seems to work for the strings above. It may need some adjustment, as I don't know much about encodings, but maybe it's a point you can start from.

Here's the JSBin.
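The cursor clue from the question matches what Unicode calls grapheme clusters: the units a user perceives as single characters. The following is a minimal sketch, assuming a runtime that provides `Intl.Segmenter` (it is not part of the original answer). The function name `countGraphemes` and the locale arguments are chosen only for illustration.

```javascript
// Hypothetical helper: counts user-perceived characters (grapheme clusters).
function countGraphemes(str, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  // segment() returns an iterable with one entry per grapheme cluster.
  return [...segmenter.segment(str)].length;
}

const tamil = "\u0BAE\u0BA4\u0BB0\u0BCD"; // மதர்

console.log(countGraphemes("Mother", "en")); // 6
console.log(countGraphemes(tamil, "ta"));    // 3
console.log(tamil.length);                   // 4 UTF-16 code units
```

Because the segmenter follows the Unicode segmentation rules rather than a single character category, it needs no hand-maintained mark list. The exact clusters it reports can still vary with the Unicode version the runtime ships.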
