Parsing a semicolon-delimited file

Question

I have a CSV file, but the delimiter is a semicolon (;) and every column is enclosed in double quotes. Some values also contain a ; themselves, for example as part of an HTML entity such as &amp;

I am using TextFieldParser to parse the file. This is the sample data:

"A001";"RT:This is a tweet"; "http://www.whatever.com/test/module&amp;one"

For the example above, I get more columns/fields than I should:

Field[0] = "A001"
Field[1] = "RT:This is a tweet"
Field[2] = "http://www.whatever.com/test/module&amp"
Field[3] = "one"

This is my code. What changes are needed to handle this scenario?

using (var parser = new TextFieldParser(fileName))
{
    parser.TextFieldType = FieldType.Delimited;
    parser.SetDelimiters(";");
    parser.TrimWhiteSpace = true;
    parser.HasFieldsEnclosedInQuotes = false;

    int rowIndex = 0;
    PropertyInfo[] properties = typeof(TwitterData).GetProperties();
    while (parser.PeekChars(1) != null)
    {
        var cleanFieldRowCells = parser.ReadFields().Select(
            f => f.Trim(new[] { ' ', '"' }));

        var twitter = new TwitterData();
        int index = 0;
        foreach (string c in cleanFieldRowCells)
        {
            string str = c;

            if (properties[index].PropertyType == typeof(DateTime))
            {
                string twitterDateTemplate = "ddd MMM dd HH:mm:ss +ffff yyyy";
                DateTime createdAt = DateTime.ParseExact(str, twitterDateTemplate, new System.Globalization.CultureInfo("en-AU"));
                properties[index].SetValue(twitter, createdAt);
            }
            else
            {
                properties[index].SetValue(twitter, str);
            }

            index++;
        }
    }
}

-Alan-

Recommended answer

Using your sample strings and setting the HasFieldsEnclosedInQuotes property to true works for me. With the property set to false, the quotes are treated as ordinary characters, so every ; splits a field, including the ones inside quoted values.

using System;
using System.IO;
using System.Linq;
using Microsoft.VisualBasic.FileIO;

// Two sample rows; the doubled quotes are escapes inside the verbatim string.
string LINES = @"
    ""A001"";""RT:This is a tweet""; ""http://www.whatever.com/test/module&amp;one""
    ""A001"";""RT: Test1 ; Test2"";""test.com"";   
";
using (var sr = new StringReader(LINES))
{
    using (var parser = new TextFieldParser(sr))
    {
        parser.TextFieldType = FieldType.Delimited;
        parser.SetDelimiters(";");
        parser.TrimWhiteSpace = true;
        // Treat a ; inside double quotes as data, not as a delimiter.
        parser.HasFieldsEnclosedInQuotes = true;

        while (parser.PeekChars(1) != null)
        {
            // Strip any leftover spaces or quotes from each field.
            var cleanFieldRowCells = parser.ReadFields().Select(
                f => f.Trim(new[] { ' ', '"' })).ToArray();
            Console.WriteLine("New Line");
            for (int i = 0; i < cleanFieldRowCells.Length; ++i)
            {
                Console.WriteLine(
                    "Field[{0}] = [{1}]", i, cleanFieldRowCells[i]
                );
            }
            Console.WriteLine("{0}", new string('=', 40));
        }
    }
}

Output:

New Line
Field[0] = [A001]
Field[1] = [RT:This is a tweet]
Field[2] = [http://www.whatever.com/test/module&amp;one]
========================================
New Line
Field[0] = [A001]
Field[1] = [RT: Test1 ; Test2]
Field[2] = [test.com]
Field[3] = []
========================================
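
To see the effect of the setting in isolation, the following is a minimal sketch that parses the question's sample row twice, once with HasFieldsEnclosedInQuotes off and once with it on. The class name QuoteHandlingDemo and the helper ParseLine are hypothetical names introduced here for illustration; they do not appear in the question or the answer. It assumes the project can reference the Microsoft.VisualBasic assembly, which provides TextFieldParser.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical demo class, not part of the original question or answer.
public static class QuoteHandlingDemo
{
    // Parses one ';'-delimited line and returns its fields.
    // fieldsInQuotes toggles TextFieldParser.HasFieldsEnclosedInQuotes.
    public static string[] ParseLine(string line, bool fieldsInQuotes)
    {
        using (var reader = new StringReader(line))
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(";");
            parser.TrimWhiteSpace = true;
            parser.HasFieldsEnclosedInQuotes = fieldsInQuotes;

            // ReadFields returns null when there is no data left to read.
            return parser.ReadFields() ?? Array.Empty<string>();
        }
    }

    public static void Main()
    {
        // The sample row from the question, including the ';' inside "&amp;".
        const string line =
            "\"A001\";\"RT:This is a tweet\"; \"http://www.whatever.com/test/module&amp;one\"";

        foreach (bool quoted in new[] { false, true })
        {
            string[] fields = ParseLine(line, quoted);
            Console.WriteLine(
                "HasFieldsEnclosedInQuotes = {0}: {1} fields", quoted, fields.Length);
            for (int i = 0; i < fields.Length; ++i)
            {
                Console.WriteLine("  Field[{0}] = [{1}]", i, fields[i]);
            }
        }
    }
}
```

With the setting off, the row should split into four fields, as reported in the question; with it on, the URL should stay in one piece as the third field, matching the output above.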
