Special Characters are getting converted to "?" on importing from CSV


Question

Hi,

I am facing an issue while importing special characters from an Excel sheet.

The special characters include (¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÜÝÞßàáâãäåæçèéêëìíîïðñòóôö÷øùúûüýþÿ)


I am using StreamReader to import this text, and no encoding is passed to the StreamReader, so it defaults to UTF-8.
After the import, the special characters are displayed as "?".
When Encoding.Default is passed as a parameter to the StreamReader, the special characters display properly.
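A likely explanation for the "?": Excel's plain "CSV (Comma delimited)" export typically saves in the system's ANSI code page (Windows-1252 on many Western systems) rather than UTF-8, so a character like "é" is stored as the single byte 0xE9, which is not valid UTF-8 on its own. StreamReader's default UTF-8 decoder then substitutes U+FFFD, which many controls and consoles render as "?". The minimal sketch below decodes the same bytes two ways. It assumes a Latin-1-compatible source file and uses ISO-8859-1 (code page 28591) only because it is built into every .NET runtime; `DecodeDemo` is a hypothetical name.

```csharp
using System;
using System.Text;

public static class DecodeDemo
{
    // Decodes the same raw bytes two ways to show where the "?" comes from.
    public static (string AsUtf8, string AsLatin1) DecodeBoth(byte[] raw)
    {
        // UTF-8 (StreamReader's default): a lone 0xE9 is an incomplete sequence,
        // so the decoder substitutes U+FFFD (often displayed as "?").
        string asUtf8 = Encoding.UTF8.GetString(raw);

        // ISO-8859-1: every byte maps directly to U+0000..U+00FF, so 0xE9 becomes "é".
        string asLatin1 = Encoding.GetEncoding(28591).GetString(raw);

        return (asUtf8, asLatin1);
    }
}
```

For example, `DecodeBoth(new byte[] { 0x63, 0x61, 0x66, 0xE9 })` returns "caf\uFFFD" for UTF-8 and "café" for ISO-8859-1.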


But I came across this MSDN page, which says that using Encoding.Default is not advisable.

https://msdn.microsoft.com/en-us/library/system.text.encoding.default(v=vs.110).aspx

Can Encoding.Default differ between two systems, and will it handle special characters from other languages?
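On the first part of this question: Encoding.Default is not a fixed encoding. On .NET Framework it returns the system's active ANSI code page, which comes from the machine's regional settings and can therefore differ between two systems; on .NET Core and .NET 5+ it always returns UTF-8. A small sketch for checking what a given machine and runtime actually give you (`DefaultEncodingInfo` is a hypothetical helper):

```csharp
using System;
using System.Text;

public static class DefaultEncodingInfo
{
    // Reports what Encoding.Default resolves to on the current machine and runtime.
    // .NET Framework: the system's active ANSI code page (e.g. 1252 on many Western installs).
    // .NET Core / .NET 5+: always UTF-8 (code page 65001).
    public static string Describe()
    {
        Encoding enc = Encoding.Default;
        return $"{enc.WebName} (code page {enc.CodePage})";
    }
}
```

Calling `Console.WriteLine(DefaultEncodingInfo.Describe());` on .NET 5+ should print "utf-8 (code page 65001)", which is also why code that "works" with Encoding.Default on .NET Framework can start producing "?" again after moving to a newer runtime.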

My code looks like this:

FileStream fs = new FileStream(sFileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
StreamReader sw = new StreamReader(fs);
String sbuf = sw.ReadLine();

When I read the CSV file containing the special characters mentioned above, the sbuf value shows "?".

It works fine when I create the StreamReader like this:
StreamReader sw = new StreamReader(fs, Encoding.Default);

I just want to know whether there will be any issues if I use Encoding.Default.

How does it behave if characters from a different code page appear? Does this result in data loss?
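On the data-loss question, the two directions behave differently. A minimal sketch using ISO-8859-1 as a stand-in for any single-byte code page (the class and method names are hypothetical): encoding text that contains a character outside the code page replaces it with "?", which cannot be undone, while decoding bytes with the wrong single-byte code page never throws but silently produces wrong characters.

```csharp
using System;
using System.Text;

public static class CodePageLossDemo
{
    // ISO-8859-1 with an explicit "?" replacement fallback, so the behaviour
    // does not depend on the runtime's default fallback.
    static readonly Encoding Latin1 = Encoding.GetEncoding(
        28591, new EncoderReplacementFallback("?"), new DecoderReplacementFallback("?"));

    // Writing: a character the code page cannot represent (e.g. Greek Omega)
    // becomes "?" (0x3F) and the original character is lost.
    public static string RoundTripThroughLatin1(string text) =>
        Latin1.GetString(Latin1.GetBytes(text));

    // Reading: UTF-8 bytes decoded as a single-byte code page do not fail;
    // each multi-byte sequence turns into several wrong characters instead.
    public static string MisreadUtf8AsLatin1(string text) =>
        Latin1.GetString(Encoding.UTF8.GetBytes(text));
}
```

For example, `RoundTripThroughLatin1("Ωé")` returns "?é", and `MisreadUtf8AsLatin1("é")` returns "Ã©".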


Can someone share their thoughts on this?

Recommended answer

First of all, those characters are not special; I don't really know what a "special character" is, probably it's just someone's fantasy. A character can be "special" only in relation to some specific format of something, not by itself.

Now, forget all the "default" rubbish and find out the real encoding of the text file you are reading. In some (many) cases of Unicode-based encodings (UTFs), the text file has a BOM indicating what it is, and text editors will then show you the encoding in their "Save As" dialog. Please see: http://unicode.org/faq/utf_bom.html.
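The BOM check can also be done in code. A minimal sketch of sniffing the first bytes of a file for the common byte order marks (`BomSniffer` and `DetectBom` are hypothetical names; UTF-32 big-endian is omitted for brevity). StreamReader already performs this detection by default, in both StreamReader(Stream) and StreamReader(Stream, Encoding), so a BOM in the file overrides the encoding you pass; a helper like this is mainly useful for logging or for deciding what to do when there is no BOM.

```csharp
using System;
using System.Text;

public static class BomSniffer
{
    // Returns the encoding indicated by a byte order mark at the start of `head`,
    // or null if no recognised BOM is present. UTF-32 LE must be checked before
    // UTF-16 LE because both start with FF FE.
    public static Encoding DetectBom(byte[] head)
    {
        if (head.Length >= 4 && head[0] == 0xFF && head[1] == 0xFE && head[2] == 0x00 && head[3] == 0x00)
            return Encoding.UTF32;            // UTF-32 little-endian
        if (head.Length >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF)
            return Encoding.UTF8;
        if (head.Length >= 2 && head[0] == 0xFF && head[1] == 0xFE)
            return Encoding.Unicode;          // UTF-16 little-endian
        if (head.Length >= 2 && head[0] == 0xFE && head[1] == 0xFF)
            return Encoding.BigEndianUnicode; // UTF-16 big-endian
        return null;                          // no BOM: encoding must be determined another way
    }
}
```

A null result is exactly the situation the next paragraph deals with.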

However, what to do if there is no BOM or the encoding is not Unicode? My "secret weapon" is any good Web browser. Rename your input file to *.HTML and open it. If it is not readable, use your browser's View / Character Encoding menu to quickly find the right option.
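A common programmatic counterpart to the browser trick is a heuristic: try decoding as strict UTF-8 and, if that fails, fall back to the single-byte code page you expect. This is a sketch under that assumption, not real detection (`EncodingGuesser` is a hypothetical name): pure ASCII files are valid in both and the guess does not matter for them, and a sample cut in the middle of a multi-byte character can wrongly fail the UTF-8 check.

```csharp
using System;
using System.Text;

public static class EncodingGuesser
{
    // Strict UTF-8: throws DecoderFallbackException on invalid bytes
    // instead of silently substituting U+FFFD.
    static readonly Encoding StrictUtf8 = new UTF8Encoding(false, true);

    // If the sample is valid UTF-8, assume UTF-8; otherwise assume the
    // caller-supplied fallback code page (e.g. Windows-1252).
    public static Encoding Guess(byte[] sample, Encoding fallback)
    {
        try
        {
            StrictUtf8.GetString(sample);
            return Encoding.UTF8;
        }
        catch (DecoderFallbackException)
        {
            return fallback;
        }
    }
}
```

Passing the whole file (or a large prefix) as `sample` makes the guess more reliable than checking only the first few bytes.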

And then use the proper encoding when you instantiate StreamReader:
https://msdn.microsoft.com/en-us/library/system.io.streamreader.streamreader%28v=vs.110%29.aspx (pick one with an Encoding argument),
https://msdn.microsoft.com/en-us/library/system.text.encoding%28v=vs.110%29.aspx.
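Applied to the code in the question, a minimal sketch targeting .NET Core 3.0+ / .NET 5+ (the helper names are hypothetical). It assumes the CSV was saved as Windows-1252; if your file turns out to use something else, pass that encoding instead. On .NET Framework, Encoding.GetEncoding(1252) works directly and the provider registration line can be dropped.

```csharp
using System;
using System.IO;
using System.Text;

public static class CsvReaderExample
{
    // Reads the first line using an explicitly chosen encoding instead of
    // StreamReader's UTF-8 default or the machine-dependent Encoding.Default.
    public static string ReadFirstLine(string path, Encoding encoding)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        using (var reader = new StreamReader(fs, encoding)) // a BOM in the file, if present, still wins
        {
            return reader.ReadLine();
        }
    }

    // Windows-1252 is not built into .NET Core / .NET 5+, so register the
    // code-pages provider before requesting it. (On .NET Framework, 1252 is
    // available without this step, and CodePagesEncodingProvider needs the
    // System.Text.Encoding.CodePages package.)
    public static Encoding Windows1252()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1252);
    }
}
```

Usage: `string sbuf = CsvReaderExample.ReadFirstLine(sFileName, CsvReaderExample.Windows1252());`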

As simple as that.

—SA


Use the code below:

text = Regex.Replace(cell.Text, @"[^\u0020-\u007E]", string.Empty);

You need to add the namespace:
using System.Text.RegularExpressions;
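Note that this approach does not decode anything: it deletes every character outside printable ASCII (U+0020 to U+007E), including the accented letters and symbols listed in the question, so it only fits if ASCII-only output is acceptable. A small sketch showing the effect (`AsciiFilter` is a hypothetical name):

```csharp
using System;
using System.Text.RegularExpressions;

public static class AsciiFilter
{
    // Removes every character outside printable ASCII (U+0020 space to U+007E tilde).
    // Letters such as "é" and symbols such as "©" are deleted, not converted.
    public static string StripNonPrintableAscii(string input) =>
        Regex.Replace(input, @"[^\u0020-\u007E]", string.Empty);
}
```

For example, `StripNonPrintableAscii("Café © 2015")` returns "Caf  2015" (with two spaces where "©" was).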

