处理 CSV 文件中的逗号 [英] Dealing with commas in a CSV file
问题描述
我正在寻找有关如何处理 csv 文件的建议,该文件正在创建,然后由我们的客户上传,该文件的值中可能包含逗号,例如公司名称.
I am looking for suggestions on how to handle a csv file that is being created, then uploaded by our customers, and that may have a comma in a value, like a company name.
我们正在考虑的一些想法是:引用标识符(值、"值、"等)或使用 |而不是逗号.最大的问题是我们必须让它变得容易,否则客户不会这样做.
Some of the ideas we are looking at are: quoted Identifiers (value "," values ","etc) or using a | instead of a comma. The biggest problem is that we have to make it easy, or the customer won't do it.
推荐答案
正如其他人所说,您需要对包含引号的值进行转义.这是 C♯ 中的一个小型 CSV 阅读器,它支持引用值,包括嵌入的引号和回车.
As others have said, you need to escape values that include quotes. Here’s a little CSV reader in C♯ that supports quoted values, including embedded quotes and carriage returns.
顺便说一下,这是经过单元测试的代码.我现在发布它是因为这个问题似乎经常出现,而当简单的 CSV 支持可以做到时,其他人可能不想要整个库.
By the way, this is unit-tested code. I’m posting it now because this question seems to come up a lot and others may not want an entire library when simple CSV support will do.
您可以按如下方式使用它:
You can use it as follows:
using System;
public class test
{
public static void Main()
{
using ( CsvReader reader = new CsvReader( "data.csv" ) )
{
foreach( string[] values in reader.RowEnumerator )
{
Console.WriteLine( "Row {0} has {1} values.", reader.RowIndex, values.Length );
}
}
Console.ReadLine();
}
}
这是课程.请注意,您也可以使用 Csv.Escape
函数来编写有效的 CSV.
Here are the classes. Note that you can use the Csv.Escape
function to write valid CSV as well.
using System.IO;
using System.Text.RegularExpressions;
public sealed class CsvReader : System.IDisposable
{
public CsvReader( string fileName ) : this( new FileStream( fileName, FileMode.Open, FileAccess.Read ) )
{
}
public CsvReader( Stream stream )
{
__reader = new StreamReader( stream );
}
public System.Collections.IEnumerable RowEnumerator
{
get {
if ( null == __reader )
throw new System.ApplicationException( "I can't start reading without CSV input." );
__rowno = 0;
string sLine;
string sNextLine;
while ( null != ( sLine = __reader.ReadLine() ) )
{
while ( rexRunOnLine.IsMatch( sLine ) && null != ( sNextLine = __reader.ReadLine() ) )
sLine += "
" + sNextLine;
__rowno++;
string[] values = rexCsvSplitter.Split( sLine );
for ( int i = 0; i < values.Length; i++ )
values[i] = Csv.Unescape( values[i] );
yield return values;
}
__reader.Close();
}
}
public long RowIndex { get { return __rowno; } }
public void Dispose()
{
if ( null != __reader ) __reader.Dispose();
}
//============================================
private long __rowno = 0;
private TextReader __reader;
private static Regex rexCsvSplitter = new Regex( @",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))" );
private static Regex rexRunOnLine = new Regex( @"^[^""]*(?:""[^""]*""[^""]*)*""[^""]*$" );
}
public static class Csv
{
public static string Escape( string s )
{
if ( s.Contains( QUOTE ) )
s = s.Replace( QUOTE, ESCAPED_QUOTE );
if ( s.IndexOfAny( CHARACTERS_THAT_MUST_BE_QUOTED ) > -1 )
s = QUOTE + s + QUOTE;
return s;
}
public static string Unescape( string s )
{
if ( s.StartsWith( QUOTE ) && s.EndsWith( QUOTE ) )
{
s = s.Substring( 1, s.Length - 2 );
if ( s.Contains( ESCAPED_QUOTE ) )
s = s.Replace( ESCAPED_QUOTE, QUOTE );
}
return s;
}
private const string QUOTE = """;
private const string ESCAPED_QUOTE = """";
private static char[] CHARACTERS_THAT_MUST_BE_QUOTED = { ',', '"', '
' };
}
这篇关于处理 CSV 文件中的逗号的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!