Remove Unicode characters from an ISO-encoded string in C#


Question

I have data encoded in UTF-8. While converting the UTF-8 data to ISO-8859-1, certain characters get broken. I need to remove all the broken Unicode characters present in the ISO-encoded data. I would like to do this in C#. Please suggest a solution.

Current code is as follows:

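// nonunicodeBytes holds the raw input data (UTF-8 bytes).
Encoding iso = Encoding.GetEncoding("ISO-8859-1");
Encoding utf8 = Encoding.UTF8;
// Decode the raw UTF-8 bytes into a .NET string.
strRawJobtext = utf8.GetString(nonunicodeBytes);
// Re-encode as ISO-8859-1; characters outside that character set are lost at this step.
nonunicodeBytes = Encoding.Convert(utf8, iso, utf8.GetBytes(strRawJobtext));
strRawJobtext = iso.GetString(nonunicodeBytes);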





Thanks,
Ruthra Vijayakumar

Recommended answer

Please see my comment to the question. Your input can be in any encoding, but if your input is UTF-8, it potentially covers the entire Unicode character repertoire. This means that you should use only Unicode for further processing, nothing else. You can store the data in any of the UTFs; they are all equivalent, but UTF-8 is preferable in almost all cases. With other encodings, you may lose characters.

—SA
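
If ISO-8859-1 output really cannot be avoided, one possible approach is to let the encoder drop whatever it cannot represent instead of substituting a placeholder. The following is only a minimal sketch under that assumption, not a replacement for the advice above: the class name Latin1Filter and the method KeepLatin1Only are hypothetical names chosen for illustration, and the sample input in Main is invented. It relies on the Encoding.GetEncoding overload that accepts fallback objects, with an EncoderReplacementFallback whose replacement string is empty.

```csharp
using System;
using System.Text;

public static class Latin1Filter
{
    // Hypothetical helper: decodes UTF-8 bytes and drops every character
    // that ISO-8859-1 cannot represent.
    public static string KeepLatin1Only(byte[] utf8Bytes)
    {
        // Decode the raw UTF-8 input into a .NET string.
        string text = Encoding.UTF8.GetString(utf8Bytes);

        // An empty replacement string makes the encoder emit nothing for
        // unmappable characters, so they are removed rather than replaced.
        Encoding iso = Encoding.GetEncoding(
            "ISO-8859-1",
            new EncoderReplacementFallback(string.Empty),
            new DecoderReplacementFallback(string.Empty));

        // Round-trip through ISO-8859-1 bytes; only representable characters survive.
        byte[] isoBytes = iso.GetBytes(text);
        return iso.GetString(isoBytes);
    }

    public static void Main()
    {
        // Invented sample: e-acute is in ISO-8859-1, the euro sign and the CJK characters are not.
        byte[] input = Encoding.UTF8.GetBytes("Caf\u00E9 \u20AC5 \u4E2D\u6587 ok");
        Console.WriteLine(KeepLatin1Only(input));
    }
}
```

Note that this discards data permanently: any character outside ISO-8859-1 is gone after the call. Keeping the text as a Unicode string and storing it as UTF-8, as the answer recommends, avoids the loss altogether.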

