处理CSV文件中的逗号 [英] Dealing with commas in a CSV file

查看:154
本文介绍了处理CSV文件中的逗号的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在寻找如何处理正在创建,然后由我们的客户上传的csv文件的建议,并且可能在值(如公司名称)中有逗号。

I am looking for suggestions on how to handle a csv file that is being created, then uploaded by our customers, and that may have a comma in a value, like a company name.

我们正在查看的一些想法是:引用的标识符(值,值,等)或使用|而不是逗号。最大的问题是,我们必须使它容易,否则客户不会这样做。

Some of the ideas we are looking at are: quoted Identifiers (value "," values ","etc) or using a | instead of a comma. The biggest problem is that we have to make it easy, or the customer won't do it.

推荐答案

正如其他人所说,您需要转义包含引号的值。这里有一个C♯中的一个CSV读取器支持引用的值,包括嵌入的引号和回车符。

As others have said, you need to escape values that include quotes. Here’s a little CSV reader in C♯ that supports quoted values, including embedded quotes and carriage returns.

顺便说一下,这是单元测试的代码。我现在发布,因为这个问题似乎出现了很多,其他人可能不想要一个完整的库,当简单的CSV支持将做。

By the way, this is unit-tested code. I’m posting it now because this question seems to come up a lot and others may not want an entire library when simple CSV support will do.

您可以使用它如下:

using System;
public class test
{
    public static void Main()
    {
        using ( CsvReader reader = new CsvReader( "data.csv" ) )
        {
            foreach( string[] values in reader.RowEnumerator )
            {
                Console.WriteLine( "Row {0} has {1} values.", reader.RowIndex, values.Length );
            }
        }
        Console.ReadLine();
    }
}

请注意,您可以使用 Csv.Escape 函数写入有效的CSV。

Here are the classes. Note that you can use the Csv.Escape function to write valid CSV as well.

using System.IO;
using System.Text.RegularExpressions;

public sealed class CsvReader : System.IDisposable
{
    public CsvReader( string fileName ) : this( new FileStream( fileName, FileMode.Open, FileAccess.Read ) )
    {
    }

    public CsvReader( Stream stream )
    {
        __reader = new StreamReader( stream );
    }

    public System.Collections.IEnumerable RowEnumerator
    {
        get {
            if ( null == __reader )
                throw new System.ApplicationException( "I can't start reading without CSV input." );

            __rowno = 0;
            string sLine;
            string sNextLine;

            while ( null != ( sLine = __reader.ReadLine() ) )
            {
                while ( rexRunOnLine.IsMatch( sLine ) && null != ( sNextLine = __reader.ReadLine() ) )
                    sLine += "\n" + sNextLine;

                __rowno++;
                string[] values = rexCsvSplitter.Split( sLine );

                for ( int i = 0; i < values.Length; i++ )
                    values[i] = Csv.Unescape( values[i] );

                yield return values;
            }

            __reader.Close();
        }
    }

    public long RowIndex { get { return __rowno; } }

    public void Dispose()
    {
        if ( null != __reader ) __reader.Dispose();
    }

    //============================================


    private long __rowno = 0;
    private TextReader __reader;
    private static Regex rexCsvSplitter = new Regex( @",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))" );
    private static Regex rexRunOnLine = new Regex( @"^[^""]*(?:""[^""]*""[^""]*)*""[^""]*$" );
}

public static class Csv
{
    public static string Escape( string s )
    {
        if ( s.Contains( QUOTE ) )
            s = s.Replace( QUOTE, ESCAPED_QUOTE );

        if ( s.IndexOfAny( CHARACTERS_THAT_MUST_BE_QUOTED ) > -1 )
            s = QUOTE + s + QUOTE;

        return s;
    }

    public static string Unescape( string s )
    {
        if ( s.StartsWith( QUOTE ) && s.EndsWith( QUOTE ) )
        {
            s = s.Substring( 1, s.Length - 2 );

            if ( s.Contains( ESCAPED_QUOTE ) )
                s = s.Replace( ESCAPED_QUOTE, QUOTE );
        }

        return s;
    }


    private const string QUOTE = "\"";
    private const string ESCAPED_QUOTE = "\"\"";
    private static char[] CHARACTERS_THAT_MUST_BE_QUOTED = { ',', '"', '\n' };
}

这篇关于处理CSV文件中的逗号的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆