Parse Delimited CSV in .NET

Question

I have a text file that is in a comma separated format, delimited by " on most fields. I am trying to get that into something I can enumerate through (Generic Collection, for example). I don't have control over how the file is output nor the character it uses for the delimiter.

In this case, the fields are separated by a comma and text fields are enclosed in " marks. The problem I am running into is that some fields contain quotation marks themselves (e.g. 8" Tray), and the text after the embedded quote is mistakenly picked up as the next field. Numeric fields don't have quotes around them, but they do start with a + or a - sign (indicating a positive or negative number).

I was thinking of a RegEx, but my skills aren't that great so hopefully someone can come up with some ideas I can try. There are about 19,000 records in this file, so I am trying to do it as efficiently as possible. Here are a couple of example rows of data:

"00","000000112260 ","Pie Pumpkin ","RET","6.99 "," ","ea ",+0000000006.99000
"00","000000304078 ","Pie Apple caramel ","RET","9.99 "," ","ea ",+0000000009.99000
"00","StringValue here","8" Tray of Food ","RET","6.99 "," ","ea ",-00000000005.3200

There are a lot more fields, but you can get the picture....

I am using VB.NET and I have a generic List set up to accept the data. I have tried using CSVReader and it seems to work well until you hit a record like the 3rd one (with a quote inside the text field). If I could somehow get it to handle the additional quotes, then the CSVReader option would work great.

Thanks!

Recommended answer

From here:

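// Requires: using System.IO; using System.Text; using System.Text.RegularExpressions;
// GetFileEncoding is a helper that is not shown in this answer.
Encoding fileEncoding = GetFileEncoding(csvFile);
// Get rid of all double quotes except those used as field delimiters,
// i.e. keep only the quotes that sit next to a comma or a line break.
string fileContents = File.ReadAllText(csvFile, fileEncoding);
string fixedContents = Regex.Replace(fileContents, @"([^\^,\r\n])""([^$,\r\n])", @"$1$2");
using (CsvReader csv =
       new CsvReader(new StringReader(fixedContents), true))
{
       // ... parse the CSV
}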
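
To see what the replacement does before wiring it into a CSV reader, here is a minimal, self-contained sketch that applies the same pattern to one row. The class name `QuoteFixer`, the method `RemoveStrayQuotes`, and the shortened sample row are made up for illustration; only the regular expression comes from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteFixer
{
    // Same pattern as in the answer: a double quote is treated as stray when
    // neither neighbour is a comma or a line break. Quotes at the very start
    // or end of the text have no neighbour on one side and never match.
    private static readonly Regex StrayQuote =
        new Regex(@"([^\^,\r\n])""([^$,\r\n])");

    // Hypothetical helper name: drops the stray quote, keeps both neighbours.
    public static string RemoveStrayQuotes(string text)
    {
        return StrayQuote.Replace(text, "$1$2");
    }

    public static void Main()
    {
        // Shortened version of the third sample row from the question.
        string row = "\"00\",\"StringValue here\",\"8\" Tray of Food \",\"RET\",-00000000005.3200";

        Console.WriteLine(RemoveStrayQuotes(row));
        // prints: "00","StringValue here","8 Tray of Food ","RET",-00000000005.3200
    }
}
```

Note that this is a heuristic rather than a real CSV repair. An embedded quote that happens to sit directly next to a comma looks exactly like a field delimiter and is left alone, and a doubled quote inside a field (`""`, the usual CSV escape) is reduced to a single quote, so the approach only suits files like this one where neither case occurs.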
