Get CSV Data from Clipboard (pasted from Excel) that contains accented characters
SCENARIO
- My users will copy cells from Excel (thus placing it into the clipboard)
- And my application will retrieve those cells from the clipboard
THE PROBLEM
- My code retrieves the CSV format from the clipboard
- However, if the original Excel content contains characters like ä (a with umlaut), then the retrieved CSV string doesn't have the correct characters (ä ends up showing as a "square" for me)
- In comparison, if my code retrieves the Unicode text format from the clipboard everything works fine: the ä is preserved in the string retrieved from the clipboard
SOURCE CODE - ORIGINAL - WITH THE PROBLEM
[STAThread]
static void Main(string[] args)
{
    var fmt_csv = System.Windows.Forms.DataFormats.CommaSeparatedValue;

    // read the CSV
    var dataobject = System.Windows.Forms.Clipboard.GetDataObject();
    var stream = (System.IO.Stream)dataobject.GetData(fmt_csv);
    var enc = new System.Text.UTF8Encoding();
    var reader = new System.IO.StreamReader(stream, enc);
    string data_csv = reader.ReadToEnd();

    // read the unicode string
    string data_string = System.Windows.Forms.Clipboard.GetText();
}
THE RESULTS WHEN RUNNING THE SAMPLE CODE
- Repro steps: Enter some text in Excel (I used the word "doppelgänger" plus some numbers) and simply hit Ctrl-C to copy it to the clipboard and then run the code above.
- data_csv is set to "doppelg�nger,1\r\n2,3\r\n\0"
- data_string is set to "doppelgänger\t1\r\n2\t3\r\n"
QUESTION
- What can I do to get the correct characters?
COMMENTS
- Yes, I know I could work around this problem by using the Unicode text. But I actually want to understand what is going on with the CSV.
- Using or not using the UTF-8 encoding when retrieving the stream makes no difference in the results.
THE ANSWER
After looking at the comments, and paying close attention to what Excel was putting on the clipboard for CSV, it seemed reasonable that Excel might be placing the contents using a "legacy" encoding instead of UTF-8. So I tried using the Windows-1252 code page as the encoding, and it worked. See the code below.
SOURCE CODE - WITH THE ANSWER
[STAThread]
static void Main(string[] args)
{
    var fmt_csv = System.Windows.Forms.DataFormats.CommaSeparatedValue;

    // read the CSV, decoding the bytes as Windows-1252 instead of UTF-8
    var dataobject = System.Windows.Forms.Clipboard.GetDataObject();
    var stream = (System.IO.Stream)dataobject.GetData(fmt_csv);
    var enc = System.Text.Encoding.GetEncoding(1252);
    var reader = new System.IO.StreamReader(stream, enc);
    string data_csv = reader.ReadToEnd();

    // read the Unicode string
    string data_string = System.Windows.Forms.Clipboard.GetText();
}
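The mismatch can be reproduced without Excel or the clipboard. The sketch below assumes the CSV payload is encoded in code page 1252, as it apparently was on the answerer's machine; whether Excel always uses 1252 or follows the system's ANSI code page is not established here. `ReadCsv` is a hypothetical helper, not part of any library, and the byte array only simulates what the clipboard stream would contain. The `RegisterProvider` call is needed on .NET Core / .NET 5+ to make code page 1252 available; on .NET Framework, `GetEncoding(1252)` works without it and that line can be dropped.

```csharp
using System;
using System.IO;
using System.Text;

static class CsvClipboardDecodingSketch
{
    // Hypothetical helper: decode a CSV clipboard stream with an explicit
    // encoding and drop the trailing NUL terminator seen in the question.
    public static string ReadCsv(Stream stream, Encoding encoding)
    {
        using (var reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd().TrimEnd('\0');
        }
    }

    static void Main()
    {
        // Needed on .NET Core / .NET 5+ so that code page 1252 is available.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding cp1252 = Encoding.GetEncoding(1252);

        // Simulated clipboard bytes (assumption: Excel wrote them as 1252).
        // In this encoding, ä is the single byte 0xE4.
        byte[] raw = cp1252.GetBytes("doppelg\u00E4nger,1\r\n2,3\r\n\0");

        // 0xE4 followed by 'n' is not a valid UTF-8 sequence, so the UTF-8
        // decoder substitutes U+FFFD, which many fonts render as a square.
        string asUtf8 = ReadCsv(new MemoryStream(raw), new UTF8Encoding());
        string as1252 = ReadCsv(new MemoryStream(raw), cp1252);

        Console.WriteLine(asUtf8.Contains("\uFFFD"));                    // True
        Console.WriteLine(as1252.StartsWith("doppelg\u00E4nger,1",
                                            StringComparison.Ordinal)); // True
    }
}
```

In the clipboard code above, the same helper could be called with the stream returned by `dataobject.GetData(fmt_csv)`.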
Excel stores the string on the clipboard using the Unicode character encoding. The reason you get a square when you try to read the string in ANSI is that there is no representation for that character in your system's ANSI codepage. You should just use Unicode. If you're going to be dealing with localization issues, then ANSI is just more trouble than it's worth.
Edit: Joel Spolsky wrote an excellent introduction to character encodings, which is definitely worth checking out: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
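For the "just use Unicode" route, a minimal sketch is shown below. It assumes the tab-separated layout from the question's `data_string` output. `ParseTabSeparated` is a hypothetical helper that ignores quoted cells containing tabs or line breaks and drops empty rows, so treat it as an illustration rather than a complete parser.

```csharp
using System;
using System.Collections.Generic;

static class UnicodeClipboardSketch
{
    // Hypothetical helper: split tab-separated Unicode text into rows of cells.
    public static List<string[]> ParseTabSeparated(string text)
    {
        var rows = new List<string[]>();
        string[] lines = text.Split(new[] { "\r\n" },
                                    StringSplitOptions.RemoveEmptyEntries);
        foreach (string line in lines)
        {
            rows.Add(line.Split('\t'));
        }
        return rows;
    }

    static void Main()
    {
        // In a Windows Forms app on an STA thread the text would come from:
        //   string text = System.Windows.Forms.Clipboard.GetText(
        //       System.Windows.Forms.TextDataFormat.UnicodeText);
        // Here the value reported in the question is used instead.
        string text = "doppelg\u00E4nger\t1\r\n2\t3\r\n";

        List<string[]> rows = ParseTabSeparated(text);
        Console.WriteLine(rows.Count);  // 2
        Console.WriteLine(rows[0][1]);  // 1
    }
}
```

Because the Unicode text format arrives as a .NET string, no code page has to be guessed, which is the main appeal of this route over the CSV format.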