How to turn a Persian (Farsi) paragraph into its list of words in JavaScript

Question

I am trying to build an object from a paragraph that maps each word to its frequency.

var pattern = /\w+/g,
//the farsi paragraph
    string = "من امروز در مورد مهر خروج مشمولین اطلاعات جدیدی از سفارت ایران در مالزی گرفتم",
    matchedWords = string.match( pattern );

/* The Array.prototype.reduce method assists us in producing a single value from an
   array. In this case, we're going to use it to output an object with results. */
var counts = matchedWords.reduce(function ( stats, word ) {

    /* `stats` is the object that we'll be building up over time.
       `word` is each individual entry in the `matchedWords` array */
    if ( stats.hasOwnProperty( word ) ) {
        /* `stats` already has an entry for the current `word`.
           As a result, let's increment the count for that `word`. */
        stats[ word ] = stats[ word ] + 1;
    } else {
        /* `stats` does not yet have an entry for the current `word`.
           As a result, let's add a new entry, and set count to 1. */
        stats[ word ] = 1;
    }

    /* Because we are building up `stats` over numerous iterations,
       we need to return it for the next pass to modify it. */
    return stats;

}, {})

var dict = []; // create an empty array
// this for loop turns the counts object into a list of { text, size } entries
for (var i in counts) {
    dict.push({ text: i, size: counts[i] });
}

/* let's print and see if you can solve your problem */

console.log(dict);

The code was originally written for an English paragraph, but I need to use it for a Farsi one. I know I should use something other than /\w+/g in:

var pattern = /\w+/g,

but I don't know what.

Recommended answer

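In your regex, use the character class for "any character except whitespace", which is \S. Whitespace here includes newlines, tabs, and spaces. In JavaScript, \w matches only ASCII letters, digits, and the underscore, so it never matches Persian letters; \S+ instead matches every run of non-whitespace characters, which also means punctuation attached to a word stays part of that word.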

var pattern = /\S+/g,
//the farsi paragraph
    string = "من امروز در مورد مهر خروج مشمولین اطلاعات جدیدی از سفارت ایران در مالزی گرفتم",
    matchedWords = string.match( pattern );

/* The Array.prototype.reduce method assists us in producing a single value from an
   array. In this case, we're going to use it to output an object with results. */
var counts = matchedWords.reduce(function ( stats, word ) {

    /* `stats` is the object that we'll be building up over time.
       `word` is each individual entry in the `matchedWords` array */
    if ( stats.hasOwnProperty( word ) ) {
        /* `stats` already has an entry for the current `word`.
           As a result, let's increment the count for that `word`. */
        stats[ word ] = stats[ word ] + 1;
    } else {
        /* `stats` does not yet have an entry for the current `word`.
           As a result, let's add a new entry, and set count to 1. */
        stats[ word ] = 1;
    }

    /* Because we are building up `stats` over numerous iterations,
       we need to return it for the next pass to modify it. */
    return stats;

}, {})

var dict = []; // create an empty array
// this for loop turns the counts object into a list of { text, size } entries
for (var i in counts) {
    dict.push({ text: i, size: counts[i] });
}

/* let's print and see if you can solve your problem */

console.log(dict);
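
If the paragraph contains punctuation, \S+ will count "امروز" and "امروز،" as different words. The following is a minimal sketch of one possible alternative, not part of the original answer: it assumes an environment that supports Unicode property escapes (the `u` flag with `\p{...}`), and the function name `countWords` is hypothetical. It matches runs of letters, combining marks, and the zero-width non-joiner (U+200C) that appears inside some Persian words; digits are deliberately left out of this pattern.

```javascript
// Hypothetical helper (not from the original answer): counts word frequencies
// in Persian text, or any other text made of Unicode letters.
function countWords(text) {
    // \p{L}  = any Unicode letter
    // \p{M}  = combining marks (for example Arabic-script diacritics)
    // \u200C = zero-width non-joiner, used inside some Persian words
    // The `u` flag is required for \p{...} escapes to work.
    var pattern = /[\p{L}\p{M}\u200C]+/gu;

    // String.prototype.match returns null when nothing matches,
    // so fall back to an empty array.
    var words = text.match(pattern) || [];

    // A Map keeps insertion order and avoids clashes with Object.prototype keys.
    var counts = new Map();
    words.forEach(function (word) {
        counts.set(word, (counts.get(word) || 0) + 1);
    });

    // Convert to the same { text, size } shape used in the code above.
    return Array.from(counts, function (entry) {
        return { text: entry[0], size: entry[1] };
    });
}

var sample = "من امروز، در مورد مهر خروج. من در مالزی";
console.log(countWords(sample));
```

With this sample, the Persian comma and the full stop are dropped, so "من" and "در" are each counted twice and every other word once. Whether to also keep digits (for example by adding `\p{N}` to the character class) depends on what should count as a word in your data.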
