Get CSV Data from Clipboard (pasted from Excel) that contains accented characters

Problem description

SCENARIO

  • My users will copy cells from Excel (thus placing it into the clipboard)
  • And my application will retrieve those cells from the clipboard

THE PROBLEM

  • My code retrieves the CSV format from the clipboard
  • However, if the original Excel content contains characters like ä (a with umlaut), the retrieved CSV string doesn't have the correct characters (ä ends up showing as a "square" for me)
  • In comparison, if my code retrieves the Unicode text format from the clipboard everything works fine: the ä is preserved in the string retrieved from the clipboard

SOURCE CODE - ORIGINAL - WITH THE PROBLEM

[STAThread]
static void Main(string[] args)
{
    var fmt_csv = System.Windows.Forms.DataFormats.CommaSeparatedValue;

    // read the CSV
    var dataobject = System.Windows.Forms.Clipboard.GetDataObject();
    var stream = (System.IO.Stream)dataobject.GetData(fmt_csv);
    var enc = new System.Text.UTF8Encoding();
    var reader = new System.IO.StreamReader(stream, enc);
    string data_csv = reader.ReadToEnd();

    // read the Unicode string
    string data_string = System.Windows.Forms.Clipboard.GetText();
}

THE RESULTS WHEN RUNNING THE SAMPLE CODE

  • Repro steps: Enter some text in Excel (I used the word "doppelgänger" plus some numbers) and simply hit Ctrl-C to copy it to the clipboard and then run the code above.
  • data_csv is set to "doppelg�nger,1\r\n2,3\r\n\0"
  • data_string is set to "doppelgänger\t1\r\n2\t3\r\n"
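
To see why the ä turns into a replacement character, here is a minimal sketch of the decoding step in isolation. It assumes Excel wrote the text as single-byte Western European data, so ä is the lone byte 0xE4. The class and member names are made up for illustration, and ISO-8859-1 stands in for Windows-1252 (the two agree on this byte) because it needs no extra setup on any .NET runtime.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the original question.
static class DecodeDemo
{
    // "gän" as a single-byte Western European encoding stores it:
    // 'g' = 0x67, 'ä' = 0xE4, 'n' = 0x6E.
    public static readonly byte[] Sample = { 0x67, 0xE4, 0x6E };

    // In UTF-8, 0xE4 announces a three-byte sequence, but 0x6E is not a
    // continuation byte, so the decoder substitutes U+FFFD for 0xE4.
    public static string AsUtf8(byte[] bytes)
    {
        return new UTF8Encoding().GetString(bytes);
    }

    // ISO-8859-1 maps 0xE4 straight to 'ä' (as Windows-1252 does).
    public static string AsLatin1(byte[] bytes)
    {
        return Encoding.GetEncoding("iso-8859-1").GetString(bytes);
    }
}
```

With these bytes, DecodeDemo.AsUtf8 yields "g\uFFFDn" and DecodeDemo.AsLatin1 yields "gän", which is the same pattern seen in data_csv above.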

QUESTION

  • What can I do to get the correct characters?

COMMENTS

  • Yes, I know I could work around this problem by using the Unicode text. But I actually want to understand what is going on with the CSV
  • Using or not using the UTF-8 encoding when retrieving the stream makes no difference in the results

THE ANSWER

After looking at the comments, and paying close attention to what Excel was putting on the clipboard for CSV, it seemed reasonable that Excel might be placing the contents using a "legacy" encoding instead of UTF-8. So I tried using the Windows 1252 codepage as the encoding and it worked. See the code below.

SOURCE CODE - WITH THE ANSWER

[STAThread]
static void Main(string[] args)
{
    var fmt_csv = System.Windows.Forms.DataFormats.CommaSeparatedValue;

    // read the CSV; Excel appears to write it in a legacy single-byte code page, not UTF-8
    var dataobject = System.Windows.Forms.Clipboard.GetDataObject();
    var stream = (System.IO.Stream)dataobject.GetData(fmt_csv);
    var enc = System.Text.Encoding.GetEncoding(1252); // Windows-1252 (Western European)
    var reader = new System.IO.StreamReader(stream, enc);
    string data_csv = reader.ReadToEnd();

    // read the Unicode string
    string data_string = System.Windows.Forms.Clipboard.GetText();
}
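
The decoding step can also be pulled into a small helper, which makes it easy to swap the encoding and to drop the trailing "\0" that showed up in data_csv. This is only a sketch: ClipboardCsv and Decode are hypothetical names, and the right code page is an assumption (1252 worked here, but other systems may differ).

```csharp
using System.IO;
using System.Text;

// Hypothetical helper, not part of the original answer.
static class ClipboardCsv
{
    // Decodes the raw CSV stream with the caller-supplied encoding and
    // strips any trailing NUL terminator ("...\r\n\0" in the question).
    public static string Decode(Stream stream, Encoding encoding)
    {
        using (var reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd().TrimEnd('\0');
        }
    }
}
```

On .NET Framework, ClipboardCsv.Decode(stream, Encoding.GetEncoding(1252)) would take the place of the reader lines above. On .NET Core and .NET 5+, code page 1252 is only available after calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).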

Solution

Excel stores the string on the clipboard using the Unicode character encoding. The reason you get a square when you try to read the string in ANSI is that there is no representation for that character in your system's ANSI codepage. You should just use Unicode. If you're going to be dealing with localization issues, then ANSI is just more trouble than it's worth.

Edit: Joel Spolsky wrote an excellent introduction to character encodings, which is definitely worth checking out: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
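
If you follow the advice to use the Unicode text format, the data arrives tab-separated rather than comma-separated (see data_string in the question). A rough sketch of splitting it into rows and cells follows. ClipboardText and ParseTabSeparated are hypothetical names, and the parser assumes no cell contains an embedded tab or line break, which Excel would quote.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original answer.
static class ClipboardText
{
    // Splits Excel-style clipboard text ("a\tb\r\nc\td\r\n") into rows of cells.
    public static List<string[]> ParseTabSeparated(string text)
    {
        var rows = new List<string[]>();

        // Excel terminates the last row with "\r\n"; drop that terminator so it
        // does not produce a spurious empty row.
        if (text.EndsWith("\r\n", StringComparison.Ordinal))
        {
            text = text.Substring(0, text.Length - 2);
        }
        if (text.Length == 0)
        {
            return rows;
        }

        foreach (var line in text.Split(new[] { "\r\n" }, StringSplitOptions.None))
        {
            rows.Add(line.Split('\t'));
        }
        return rows;
    }
}
```

Usage would look like ClipboardText.ParseTabSeparated(System.Windows.Forms.Clipboard.GetText()), called from an STA thread as in the question's code.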
