Get CSV Data from Clipboard (pasted from Excel) that contains accented characters

SCENARIO

  • My users will copy cells from Excel (thus placing them on the clipboard)
  • And my application will retrieve those cells from the clipboard

THE PROBLEM

  • My code retrieves the CSV format from the clipboard
  • However, if the original Excel content contains characters like ä (a with umlaut), then the retrieved CSV string doesn't have the correct characters (ä ends up showing as a "square" for me)
  • In comparison, if my code retrieves the Unicode text format from the clipboard everything works fine: the ä is preserved in the string retrieved from the clipboard

SOURCE CODE - ORIGINAL - WITH THE PROBLEM

[STAThread]
static void Main(string[] args)
{
    var fmt_csv = System.Windows.Forms.DataFormats.CommaSeparatedValue;

    // read the CSV
    var dataobject = System.Windows.Forms.Clipboard.GetDataObject();
    var stream = (System.IO.Stream)dataobject.GetData(fmt_csv);
    var enc = new System.Text.UTF8Encoding();
    var reader = new System.IO.StreamReader(stream,enc);
    string data_csv = reader.ReadToEnd();

    // read the unicode string
    string data_string = System.Windows.Forms.Clipboard.GetText();
}

THE RESULTS WHEN RUNNING THE SAMPLE CODE

  • Repro steps: Enter some text in Excel (I used the word "doppelgänger" plus some numbers) and simply hit Ctrl-C to copy it to the clipboard and then run the code above.
  • data_csv is set to "doppelg�nger,1\r\n2,3\r\n\0"
  • data_string is set to "doppelgänger\t1\r\n2\t3\r\n"
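
The stray character in data_csv is consistent with single-byte text being decoded as UTF-8. The following is a minimal sketch of that effect, not a capture of what Excel actually writes: the byte array is built by hand and assumes "ä" arrives as the single byte 0xE4 (its value in Windows-1252 and Latin-1). The class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class DecodeDemo
{
    // Hand-built sample (an assumption): "gän" as single-byte text, with ä stored as 0xE4.
    public static readonly byte[] Sample = { 0x67, 0xE4, 0x6E };

    // Decodes the bytes as UTF-8, the way the original code does.
    public static string AsUtf8(byte[] bytes)
    {
        return new UTF8Encoding().GetString(bytes);
    }

    // Decodes the bytes as Latin-1 (ISO-8859-1), which agrees with Windows-1252 for 0xE4.
    public static string AsLatin1(byte[] bytes)
    {
        return Encoding.GetEncoding("iso-8859-1").GetString(bytes);
    }

    public static void Main()
    {
        // 0xE4 announces a three-byte UTF-8 sequence, but 'n' (0x6E) is not a
        // continuation byte, so the decoder substitutes U+FFFD for 0xE4.
        Console.WriteLine(AsUtf8(Sample) == "g\uFFFDn");

        // A single-byte decoder maps 0xE4 straight to ä (U+00E4).
        Console.WriteLine(AsLatin1(Sample) == "g\u00E4n");
    }
}
```

U+FFFD is the Unicode replacement character, which many fonts render as a square or a question mark in a box.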

QUESTION

  • What can I do to get the correct characters?

COMMENTS

  • Yes, I know I could work around this problem by using the Unicode text. But I actually want to understand what is going on with the CSV
  • Using or not using the UTF-8 encoding when retrieving the stream makes no difference in the results

THE ANSWER

After looking at the comments, and paying close attention to what Excel was putting on the clipboard for CSV, it seemed reasonable that Excel might be placing the contents using a "legacy" encoding instead of UTF-8. So I tried using the Windows-1252 code page as the encoding, and it worked. See the code below.

SOURCE CODE - WITH THE ANSWER

[STAThread]
static void Main(string[] args)
{
    var fmt_csv = System.Windows.Forms.DataFormats.CommaSeparatedValue;

    //read the CSV
    var dataobject = System.Windows.Forms.Clipboard.GetDataObject();
    var stream = (System.IO.Stream)dataobject.GetData(fmt_csv);
    var enc = System.Text.Encoding.GetEncoding(1252);
    var reader = new System.IO.StreamReader(stream,enc);
    string data_csv= reader.ReadToEnd();

    //read the Unicode String
    string data_string = System.Windows.Forms.Clipboard.GetText();
}
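
Two details may matter when reusing this fix. On .NET Core 3.0 and later (including .NET 5+), Encoding.GetEncoding(1252) throws NotSupportedException unless the code-pages provider is registered first, and the CSV string in the results above ends with a trailing "\0". The sketch below handles both, assuming a .NET Core 3.0+ target. The helper name ReadClipboardCsv is hypothetical, and 1252 is the answer's choice for a Western European system, not a value confirmed for every locale. A MemoryStream stands in for the clipboard stream so the logic can be tried without Excel.

```csharp
using System;
using System.IO;
using System.Text;

public static class ClipboardCsv
{
    // Hypothetical helper: decodes a CSV clipboard stream with the given
    // code page and strips any trailing NUL characters.
    public static string ReadClipboardCsv(Stream stream, int codePage)
    {
        // Makes legacy code pages such as 1252 available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding enc = Encoding.GetEncoding(codePage);

        using (var reader = new StreamReader(stream, enc))
        {
            return reader.ReadToEnd().TrimEnd('\0');
        }
    }

    public static void Main()
    {
        // Stand-in for the clipboard data: "ä,1\r\n" plus a NUL, as Windows-1252 bytes.
        byte[] bytes = { 0xE4, 0x2C, 0x31, 0x0D, 0x0A, 0x00 };
        string csv = ReadClipboardCsv(new MemoryStream(bytes), 1252);
        Console.WriteLine(csv == "\u00E4,1\r\n");
    }
}
```

In the Windows Forms program, the stream returned by dataobject.GetData(fmt_csv) would be passed in place of the MemoryStream.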

SOLUTION

Excel stores the string on the clipboard using the Unicode character encoding. The reason you get a square when you try to read the string in ANSI is that there is no representation for that character in your system's ANSI codepage. You should just use Unicode. If you're going to be dealing with localization issues, then ANSI is just more trouble than it's worth.

Edit: Joel Spolsky wrote an excellent introduction to character encodings, which is definitely worth checking out: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
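
For the Unicode route recommended above, the text Excel places on the clipboard is tab-separated, as data_string in the question shows. Below is a minimal sketch of splitting such text into rows and cells. The helper ParseTabSeparated is hypothetical, and it assumes no cell contains a tab or a line break.

```csharp
using System;
using System.Collections.Generic;

public static class ClipboardText
{
    // Hypothetical helper: splits tab-separated clipboard text into rows of cells.
    // Empty lines, including the one after the final "\r\n", are dropped.
    public static List<string[]> ParseTabSeparated(string text)
    {
        var rows = new List<string[]>();
        string[] lines = text.Split(new[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string line in lines)
        {
            rows.Add(line.Split('\t'));
        }
        return rows;
    }

    public static void Main()
    {
        // In the Windows Forms program this string would come from
        // System.Windows.Forms.Clipboard.GetText(System.Windows.Forms.TextDataFormat.UnicodeText).
        string text = "doppelg\u00E4nger\t1\r\n2\t3\r\n";

        List<string[]> rows = ParseTabSeparated(text);
        Console.WriteLine(rows.Count == 2);
        Console.WriteLine(rows[0][0] == "doppelg\u00E4nger");
    }
}
```

Because the string is already Unicode when it leaves the clipboard API, no code page has to be chosen on this path.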
