How to remove accents and tilde in a C++ std::string


Question


I have a problem with a string in C++ that contains several words in Spanish. This means that I have a lot of words with accents and tildes. I want to replace them with their unaccented counterparts. Example: I want to replace the word "había" with "habia". I tried replacing it directly with the replace method of the string class, but I could not get that to work.

I'm using this code:

for (it= dictionary.begin(); it != dictionary.end(); it++)
{
    strMine=(it->first);
    found=toReplace.find_first_of(strMine);
    while (found!=std::string::npos)
    {
        strAux=(it->second);
        toReplace.erase(found,strMine.length());
        toReplace.insert(found,strAux);
        found=toReplace.find_first_of(strMine,found+1);
    }
}

Where dictionary is a map like this (with more entries):

dictionary.insert ( std::pair<std::string,std::string>("á","a") );
dictionary.insert ( std::pair<std::string,std::string>("é","e") );
dictionary.insert ( std::pair<std::string,std::string>("í","i") );
dictionary.insert ( std::pair<std::string,std::string>("ó","o") );
dictionary.insert ( std::pair<std::string,std::string>("ú","u") );
dictionary.insert ( std::pair<std::string,std::string>("ñ","n") );

and the toReplace string is:

std::string toReplace="á-é-í-ó-ú-ñ-á-é-í-ó-ú-ñ";

I obviously must be missing something, but I can't figure it out. Is there any library I can use?

Thanks,
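A likely cause of the failure, assuming the source file and the strings are UTF-8 encoded: `std::string::find_first_of` does not search for a substring. It returns the position of the first byte that equals *any* byte of its argument. In UTF-8, "á" is the two bytes 0xC3 0xA1, and "é", "í", "ó", "ú" and "ñ" all begin with the same 0xC3 byte, so searching for "á" also stops at the start of "é" and the loop then erases the wrong character. The sketch below keeps the same dictionary idea but uses `std::string::find`, which matches the whole multi-byte key, and resumes searching after the inserted replacement. The function name `replace_all` is hypothetical.

```cpp
#include <map>
#include <string>

// Replace every occurrence of each dictionary key with its value.
// Assumes the keys and the text use the same encoding (e.g. both UTF-8).
std::string replace_all(std::string text,
                        const std::map<std::string, std::string>& dictionary) {
    for (const auto& entry : dictionary) {
        const std::string& from = entry.first;
        const std::string& to = entry.second;
        if (from.empty()) continue;  // an empty key would match everywhere

        // find() looks for the complete key, unlike find_first_of(),
        // which stops at any single byte that appears in the key.
        std::string::size_type pos = text.find(from);
        while (pos != std::string::npos) {
            text.replace(pos, from.length(), to);
            // Continue after the inserted text so it is never rescanned.
            pos = text.find(from, pos + to.length());
        }
    }
    return text;
}
```

With the question's dictionary, calling `replace_all(toReplace, dictionary)` should return the string with each listed accented letter replaced by its plain counterpart. Note that this only handles the characters you list explicitly.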

Solution

First, this is a really bad idea: you’re mangling somebody’s language by removing letters. Although the extra dots in words like "naïve" seem superfluous to people who only speak English, there are literally thousands of writing systems in the world in which such distinctions are very important. Writing software to mutilate someone’s speech puts you squarely on the wrong side of the tension between using computers as means to broaden the realm of human expression vs. tools of oppression.

What is the reason you’re trying to do this? Is something further down the line choking on the accents? Many people would love to help you solve that.

That said, ICU (libicu) can do this for you. Open the ICU transform demo; copy and paste your Spanish text into the "Input" box; enter

NFD; [:M:] remove; NFC

as "Compound 1" and click transform.

(With help from slide 9 of Unicode Transforms in ICU. Slides 29-30 show how to use the API.)
