How do I implement my text-correction algorithm for replacing words in a text?


Problem description

Help me create a new function, or change the function correct(), so that it handles the input text in a case-insensitive manner.

Example usage of correct():

$text = "Точик ТОЧИК точик ТоЧиК тоЧИК";

$text = correct($text, $base_words);
echo "$text";

Expected result

Input: Точик ТОЧИК точик ТоЧиК тоЧИК
Output: Тоҷик ТОҶИК тоҷик ТоҶиК тоҶИК

All the arrays and functions are listed below so you can easily copy them:

$default_words = array
(
    'бур',
    'кори',
    'давлати',
    'забони',
    'фанни'
);

$base_words = array
(
    "точик"    => "тоҷик",
    "точики"   => "тоҷики",
    "точикон"  => "тоҷикон",
    "чахонгир" => "ҷаҳонгир",
    "галат"    => "ғалат",
    "уктам"    => "ӯктам",
);

$base_special_words = array
(
    "кори хатти"     => "кори хаттӣ",
    "хатти аз"       => "хаттӣ аз",
    "забони точики"  => "забони тоҷикӣ",
    "точики барои"   => "тоҷикӣ барои",
    "забони давлати" => "забони давлатӣ",
    "давлати дар"    => "давлатӣ дар",
    "микёси чахони"  => "миқёси ҷаҳонӣ",
);


function correct($request, $dictionary)
{
    $search  = array("ғ","ӣ","ҷ","ҳ","қ","ӯ","Ғ","Ӣ","Ҷ","Ҳ","Қ","Ӯ");
    $replace = array("г","и","ч","х","к","у","Г","И","Ч","Х","К","У");
    $request = str_replace($search, $replace, $request); // map the special Tajik letters to their basic Cyrillic counterparts

    $result = preg_replace_callback("/\pL+/u", function ($m) use ($dictionary) {
        $word = mb_strtolower($m[0]);
        if (isset($dictionary[$word])) {
            $repl = $dictionary[$word];
            // Check for some common upper/lower case patterns
            // 1. All lower case
            if ($word === $m[0]) return $repl;
            // 2. All upper case
            if (mb_strtoupper($word) === $m[0]) return mb_strtoupper($repl);
            // 3. Only the first letter is upper case
            if (mb_convert_case($word, MB_CASE_TITLE) === $m[0]) return mb_convert_case($repl, MB_CASE_TITLE);
            // Otherwise: decide for each character whether it should be upper or lower case
            $mixed = array();
            for ($i = 0, $len = mb_strlen($word); $i < $len; ++$i) {
                $mixed[] = mb_substr($word, $i, 1) === mb_substr($m[0], $i, 1)
                    ? mb_substr($repl, $i, 1)
                    : mb_strtoupper(mb_substr($repl, $i, 1));
            }
            return implode("", $mixed);
        }
        return $m[0]; // Not in the dictionary: leave the word unchanged
    }, $request);


    return $result;
}


Question

How do I properly correct the input text?

Input

Кори хатти аз фанни забони точики барои забони давлати дар микёси чахони.

Output

Кори хаттӣ аз фанни забони тоҷикӣ барои забони давлатӣ дар миқёси ҷаҳонӣ.

Most likely the text needs to be fixed step by step using the three arrays. My algorithm did not give suitable results, so I created an array of two-word phrases ($base_special_words).

Step 1. Create a temp array from those elements of $base_special_words that occur in the sentence. The temp array looks like this:

$temp_for_base_special_words = array
(
    "кори хатти",
    "хатти аз",
    "забони точики",
    "точики барои",
    "забони давлати",
    "давлати дар",
    "микёси чахони",   
);
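A minimal sketch of how that temp array could be built, assuming the phrases should be matched as whole words and without regard to case. The function name find_phrases() is made up for this illustration and is not part of the question's code.

```php
<?php
// Hypothetical helper: returns the keys of $dictionary that occur in $text,
// matched case-insensitively and only on word boundaries.
function find_phrases(string $text, array $dictionary): array
{
    $found = [];
    foreach (array_keys($dictionary) as $phrase) {
        // (?<!\pL) and (?!\pL) stop a match from starting or ending inside a longer word;
        // the "u" flag makes the "i" flag work for Cyrillic letters.
        $pattern = '/(?<!\pL)' . preg_quote($phrase, '/') . '(?!\pL)/iu';
        if (preg_match($pattern, $text)) {
            $found[] = $phrase;
        }
    }
    return $found;
}

$base_special_words = [
    "кори хатти"     => "кори хаттӣ",
    "хатти аз"       => "хаттӣ аз",
    "забони точики"  => "забони тоҷикӣ",
    "точики барои"   => "тоҷикӣ барои",
    "забони давлати" => "забони давлатӣ",
    "давлати дар"    => "давлатӣ дар",
    "микёси чахони"  => "миқёси ҷаҳонӣ",
];

$sentence = "Кори хатти аз фанни забони точики барои забони давлати дар микёси чахони.";
$temp_for_base_special_words = find_phrases($sentence, $base_special_words);
print_r($temp_for_base_special_words);
```

Note that overlapping phrases such as "кори хатти" and "хатти аз" are both reported here, so the later replacement step still has to decide which one wins.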

All of these phrases occur in the sentence. Then we cut out the phrases that are in the temp array. The text before cutting (with a second sentence appended for the later steps) looks like this:

Кори хатти аз фанни забони точики барои забони давлати дар микёси чахони. Точик мард аст.

Cut part of the sentence:

Кори хатти аз забони точики барои забони давлати дар микёси чахони

Sentence after cutting:

фанни. Точик мард аст.

Step 2.

Then the remaining part of the sentence is checked against the $default_words array, and the words from that array that occur in the sentence are cut out.

фанни. Точик мард аст.

Cut part:

фанни

Sentence after cutting:

. Точик мард аст.

Array with the cut words:

$temp_for_default_words = array("фанни");

Step 3.

From the rest of the sentence, cut out the words that are present in the $base_words array.

. Точик мард аст.

Cut part:

Точик

Sentence after cutting:

. мард аст.

Array with the cut words:

$temp_for_base_words = array ("точик");

The rest of the sentence must be temporarily cut out and hidden so that it is not processed.

. мард аст.

Finally, apply the replacements from the dictionaries using the three new arrays, and put the hidden part back.

Use each value in $temp_for_base_special_words as a key into $base_special_words, and replace that key with its value in the input text.

Use each value in $temp_for_default_words as a key into $base_default_words, and replace that key with its value in the input text. ($base_default_words is not defined above; presumably it refers to $default_words, whose entries stay unchanged.)

Use each value in $temp_for_base_words as a key into $base_words, and replace that key with its value in the input text.
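The three steps can be approximated in a single pass, which is only one possible reading of the algorithm described above. In this sketch the phrases, the unchanged words and the single words are merged into one pattern with the longest entries tried first, so a word that was consumed as part of a phrase is never looked at again (this plays the role of "cutting and hiding"). The names transfer_case() and correct_in_order() are invented for the example, it assumes PHP 7.4+ with mbstring, and it skips the letter normalization that correct() performs first.

```php
<?php
mb_internal_encoding('UTF-8');

// Copy the upper/lower case pattern of $matched onto $replacement, character by character.
function transfer_case(string $matched, string $replacement): string
{
    $out = '';
    $len = mb_strlen($replacement);
    for ($i = 0; $i < $len; $i++) {
        $src = mb_substr($matched, $i, 1);      // '' if $matched is shorter
        $dst = mb_substr($replacement, $i, 1);
        $isUpper = $src !== '' && $src !== mb_strtolower($src);
        $out .= $isUpper ? mb_strtoupper($dst) : $dst;
    }
    return $out;
}

// Hypothetical single-pass corrector (not the asker's original function).
function correct_in_order(string $text, array $phrases, array $protected, array $words): string
{
    // Protected words map to themselves: they are consumed but left unchanged.
    $dictionary = $phrases + array_combine($protected, $protected) + $words;

    // Longest keys first, so a phrase wins over a single word at the same position.
    $keys = array_keys($dictionary);
    usort($keys, fn($a, $b) => mb_strlen($b) <=> mb_strlen($a));

    $alternatives = implode('|', array_map(fn($k) => preg_quote($k, '/'), $keys));
    $pattern = '/(?<!\pL)(?:' . $alternatives . ')(?!\pL)/iu';

    return preg_replace_callback($pattern, function ($m) use ($dictionary) {
        $key = mb_strtolower($m[0]);
        return transfer_case($m[0], $dictionary[$key] ?? $m[0]);
    }, $text);
}

$default_words = ['бур', 'кори', 'давлати', 'забони', 'фанни'];
$base_words = [
    "точик" => "тоҷик", "точики" => "тоҷики", "точикон" => "тоҷикон",
    "чахонгир" => "ҷаҳонгир", "галат" => "ғалат", "уктам" => "ӯктам",
];
$base_special_words = [
    "кори хатти" => "кори хаттӣ", "хатти аз" => "хаттӣ аз",
    "забони точики" => "забони тоҷикӣ", "точики барои" => "тоҷикӣ барои",
    "забони давлати" => "забони давлатӣ", "давлати дар" => "давлатӣ дар",
    "микёси чахони" => "миқёси ҷаҳонӣ",
];

$input = "Кори хатти аз фанни забони точики барои забони давлати дар микёси чахони. Точик мард аст.";
$corrected = correct_in_order($input, $base_special_words, $default_words, $base_words);
echo $corrected, "\n";
```

Because each match is replaced as it is found and the scan continues after it, overlapping phrases such as "кори хатти" and "хатти аз" cannot both fire on the same word.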

Recommended answer

What @ctwheels wanted to tell you is to use str_ireplace if you want to correct words case-insensitively.

<?php
     // Map of wrong word => correct word. The entry below is only a placeholder; use your own dictionary.
     $YourArrayWithCorrectWord = array("dolor" => "DOLOR");
     $test = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.";
     $word = explode(" ", $test); // Split the text into individual words (see the documentation listed below)
     foreach ($word as $key => $value) {
         // The lookup is case-sensitive, and a word with attached punctuation ("amet,") will not match
         if (array_key_exists($value, $YourArrayWithCorrectWord)) {
             $word[$key] = $YourArrayWithCorrectWord[$value]; // Replace the wrong word with the correct one
         }
     }

     $TestCorrect = implode(" ", $word); // Join the words back into one text
?>

If there is something you don't understand, write to me.

I hope this helps.

Documentation: see the PHP manual pages for explode, implode and array_key_exists.

P.S. This method has the problem that it cannot correct phrases of two or more words together.
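One way around that limitation is to replace directly in the whole string instead of word by word. The sketch below uses preg_replace_callback with the "iu" flags rather than str_ireplace, because as far as I know str_ireplace does not fold the case of multi-byte UTF-8 letters such as Cyrillic (on PHP 8.2+ it folds ASCII only). The function name replace_ignoring_case() is made up for this example, and the replacement is inserted exactly as written in the dictionary, so the original capitalization is not preserved.

```php
<?php
// Hypothetical helper: replace every dictionary key (a word or a phrase) in $text,
// ignoring case and matching only on word boundaries.
function replace_ignoring_case(string $text, array $dictionary): string
{
    foreach ($dictionary as $wrong => $right) {
        $pattern = '/(?<!\pL)' . preg_quote($wrong, '/') . '(?!\pL)/iu';
        // A callback returns $right literally, so "$" or "\" in it is not interpreted.
        $text = preg_replace_callback($pattern, fn($m) => $right, $text);
    }
    return $text;
}

$phrases = ["забони точики" => "забони тоҷикӣ"];
$fixed = replace_ignoring_case("ЗАБОНИ ТОЧИКИ барои", $phrases);
echo $fixed, "\n";
```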
