C++ Strip non-ASCII Characters from string
Question
Before you get started: yes, I know this is a duplicate question, and yes, I have looked at the posted solutions. My problem is that I could not get them to work.
bool invalidChar (char c)
{
    return !isprint((unsigned)c);
}
void stripUnicode(string & str)
{
    str.erase(remove_if(str.begin(),str.end(), invalidChar), str.end());
}
I tested this method on "Prusæus, Ægyptians," and it did nothing.
I also tried using isalnum in place of isprint.
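A likely contributor to the trouble with the predicate above: `(unsigned)c` converts to `unsigned int`, so a negative `char` (for example the byte for æ on a platform where `char` is signed) turns into a very large value instead of something in 0..255. The `<cctype>` functions are only defined for `unsigned char` values and `EOF`, which matches the range check in the debug assertion quoted further down. A minimal sketch of a corrected predicate follows; the names `notPrintableAscii` and `stripNonPrintableAscii` are hypothetical and not from the original post.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical replacement for invalidChar: true for anything that is
// not a printable 7-bit ASCII character.
bool notPrintableAscii(char c)
{
    // Cast to unsigned char (not unsigned int) so the value is always 0..255.
    const unsigned char uc = static_cast<unsigned char>(c);
    // The explicit > 127 test keeps the result independent of the locale,
    // since isprint may accept high bytes in some single-byte locales.
    return uc > 127 || !std::isprint(uc);
}

// Erase-remove idiom, same shape as stripUnicode in the question.
void stripNonPrintableAscii(std::string& str)
{
    str.erase(std::remove_if(str.begin(), str.end(), notPrintableAscii), str.end());
}
```

Note that this variant also removes control characters such as newlines, because they are not printable. If only the non-ASCII bytes should go, the plain range check is enough.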
The real problem occurs in another section of my program, where I convert string -> wstring -> string. The string -> wstring conversion balks if the string contains Unicode characters.
I would still like to remove all non-ASCII chars regardless, but in case it helps, here is where I am crashing:
// Convert to wstring
wchar_t* UnicodeTextBuffer = new wchar_t[ANSIWord.length()+1];
wmemset(UnicodeTextBuffer, 0, ANSIWord.length()+1);
mbstowcs(UnicodeTextBuffer, ANSIWord.c_str(), ANSIWord.length());
wWord = UnicodeTextBuffer; //CRASH
Error dialog:
MSVC++ Debug Library
Debug Assertion Failed!
Program: //myproject
File: f:\dd\vctools\crt_bld\self_x86\crt\src\isctype.c
Line: //above
Expression: (unsigned)(c+1) <= 256
Further compounding the matter: the .txt file I am reading from is ANSI encoded. Everything within it should be valid.
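Separately from stripping characters, the conversion itself can be made to fail softly. `mbstowcs` returns `(size_t)-1` when it meets a byte sequence that is not valid in the current C locale, so checking that value avoids using a half-filled buffer. The sketch below assumes a hypothetical helper named `toWide`; it uses a `std::vector` for the buffer so nothing leaks, whereas the `new wchar_t[]` buffer above is never freed.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: converts a multibyte string to a wide string using
// the current C locale. Returns false if the input cannot be converted.
bool toWide(const std::string& in, std::wstring& out)
{
    // One wide char per input byte is an upper bound, plus the terminator.
    std::vector<wchar_t> buf(in.size() + 1, L'\0');
    const std::size_t n = std::mbstowcs(buf.data(), in.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) {
        return false; // invalid multibyte sequence for this locale
    }
    out.assign(buf.data(), n);
    return true;
}
```

Whether a byte above 127 counts as valid depends on the locale set with `setlocale`, so ANSI-encoded input may still need the matching locale to be selected before the call.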
Solution:
bool invalidChar (char c)
{
    return !(c>=0 && c <128);
}
void stripUnicode(string & str)
{
    str.erase(remove_if(str.begin(),str.end(), invalidChar), str.end());
}
If someone else would like to copy/paste this as an answer, I can check this question off.
For future reference: try using the __isascii and iswascii functions.
Recommended answer
Solution:
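#include <algorithm>  // remove_if
#include <string>
using namespace std;

// True for any char outside the 7-bit ASCII range 0..127.
bool invalidChar (char c)
{
    return !(c>=0 && c <128);
}
// Erase-remove idiom: drop every char that invalidChar flags.
void stripUnicode(string & str)
{
    str.erase(remove_if(str.begin(),str.end(), invalidChar), str.end());
}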
For future reference: try using the __isascii and iswascii functions.
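For anyone who prefers a version that returns a new string, here is a small variant of the same idea. It is only a sketch: the name `asciiOnly` is hypothetical, and it needs C++11 for the lambda. The single comparison on `unsigned char` gives the same result whether `char` is signed or unsigned.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: returns a copy of `in` with every byte above 127 removed.
std::string asciiOnly(std::string in)
{
    in.erase(std::remove_if(in.begin(), in.end(),
                            [](char c) { return static_cast<unsigned char>(c) > 127; }),
             in.end());
    return in;
}
```

With the test string from the question stored as single-byte ANSI text, `asciiOnly` would leave "Prusus, gyptians". If the text were UTF-8 instead, both bytes of each æ or Æ are above 127, so they are dropped in the same way.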