C++ Strip non-ASCII Characters from string


Question

Before you get started: yes, I know this is a duplicate question, and yes, I have looked at the posted solutions. My problem is that I could not get them to work.

bool invalidChar (char c)
{ 
    return !isprint((unsigned)c); 
}
void stripUnicode(string & str)
{
    str.erase(remove_if(str.begin(),str.end(), invalidChar), str.end()); 
}

I tested this method on "Prusæus, Ægyptians," and it did nothing. I also tried isalnum in place of isprint.

The real problem occurs in another section of my program, where I convert string->wstring->string. The string->wstring conversion balks if there are Unicode characters in the string.

References:

How can you strip non-ASCII characters from a string? (in C#)

How to strip all non-alphanumeric characters from a string in C++?

I would still like to remove all non-ASCII characters regardless, but in case it helps, here is where I am crashing:

// Convert to wstring
wchar_t* UnicodeTextBuffer = new wchar_t[ANSIWord.length()+1];
wmemset(UnicodeTextBuffer, 0, ANSIWord.length()+1);
mbstowcs(UnicodeTextBuffer, ANSIWord.c_str(), ANSIWord.length());
wWord = UnicodeTextBuffer; //CRASH

Error dialog

MSVC++ Debug Library

Debug Assertion Failed!

Program: //myproject

File: f:\dd\vctools\crt_bld\self_x86\crt\src\isctype.c

Line: //above

Expression: (unsigned)(c+1) <= 256

Further compounding the matter: the .txt file I am reading from is ANSI encoded, so everything in it should be valid.
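The assertion expression is consistent with a character classification function receiving a negative value. One likely cause is the cast in invalidChar: (unsigned)c turns a negative char into a very large unsigned int rather than a value in 0..255, while isprint is only defined for values representable as unsigned char (or EOF). A minimal sketch of the usual fix, casting through unsigned char, might look like this. The names notPrintable and stripNonPrintable are hypothetical and not part of the original code.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical variant of the question's predicate.
bool notPrintable(char c)
{
    // Convert to unsigned char first: isprint expects a value in 0..255 (or EOF).
    return !std::isprint(static_cast<unsigned char>(c));
}

// Hypothetical variant of the question's stripUnicode.
void stripNonPrintable(std::string& str)
{
    // Erase-remove idiom: drop every character the predicate rejects.
    str.erase(std::remove_if(str.begin(), str.end(), notPrintable), str.end());
}
```

In the default "C" locale this removes control characters as well as bytes above 0x7F. If the program has selected a locale in which some of those bytes count as printable, they are kept, so this is not a strict ASCII filter.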

Solution:

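#include <algorithm>
#include <string>

// True for any char outside the 7-bit ASCII range (0..127).
bool invalidChar(char c)
{
    return !(c >= 0 && c < 128);
}

// Removes every non-ASCII byte from str in place (erase-remove idiom).
void stripUnicode(std::string& str)
{
    str.erase(std::remove_if(str.begin(), str.end(), invalidChar), str.end());
}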

If someone else would like to copy/paste this as an answer, I can check this question off.

For future reference: try the __isascii and iswascii functions.

Recommended answer

Solution:

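#include <algorithm>
#include <string>

// True for any char outside the 7-bit ASCII range (0..127).
bool invalidChar(char c)
{
    return !(c >= 0 && c < 128);
}

// Removes every non-ASCII byte from str in place (erase-remove idiom).
void stripUnicode(std::string& str)
{
    str.erase(std::remove_if(str.begin(), str.end(), invalidChar), str.end());
}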

For future reference: try the __isascii and iswascii functions.
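As a usage sketch, the same range check can be packaged as a function that returns a cleaned copy. The name strippedOfNonAscii is hypothetical. Comparing through unsigned char keeps the check independent of whether plain char is signed on the target platform.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: returns a copy of `input` with every byte above 0x7F removed.
std::string strippedOfNonAscii(std::string input)
{
    input.erase(std::remove_if(input.begin(), input.end(),
                               [](char c) {
                                   // Viewed as unsigned char, ASCII is exactly 0..127.
                                   return static_cast<unsigned char>(c) > 127;
                               }),
                input.end());
    return input;
}
```

Assuming a single-byte ANSI encoding such as Windows-1252, where æ is 0xE6 and Æ is 0xC6, calling this on "Prus\xE6us, \xC6gyptians," should return "Prusus, gyptians,". With UTF-8 input the result is the same, because every byte of a multi-byte sequence is above 0x7F.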
