Removing extra commas from CSV file in Perl
Question
I have multiple CSV files. Each has a different number of entries and roughly 300 lines.
The first line in each file holds the data labels:
Person_id, person_name, person_email, person_address, person_recruitmentID, person_comments... etc
The rest of the lines in each file contain the data:
"0001", "bailey", "123 fake, street", "bailey@mail.com", "0001", "this guy doesnt know how to get rid of, commas!"... etc
I want to get rid of the commas that are between quotation marks. I'm currently going through the Text::CSV documentation, but it's a slow process.
Recommended answer
A good CSV parser will have no trouble with this since commas are inside the quoted fields, so you can simply parse the file with it.
A really nice module is Text::CSV_XS. The wrapper Text::CSV uses it by default when it is installed, and otherwise falls back to its pure-Perl backend, Text::CSV_PP. The only thing in your data that needs attention is the spaces after the separating commas. They aren't allowed by the CSV spec, so the example below turns on the allow_whitespace option to handle them.
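To show what allow_whitespace changes, here is a minimal sketch that parses the sample row from the question twice: once with a strict parser and once with the option turned on. The trailing "... etc" is dropped from the row. The sketch assumes Text::CSV is installed, and the variable names are illustrative only.

```perl
use strict;
use warnings;
use feature 'say';
use Text::CSV;

# Sample row from the question, minus the trailing "... etc".
# q{} keeps Perl from interpolating "@mail" and leaves the double quotes alone.
my $row = q{"0001", "bailey", "123 fake, street", "bailey@mail.com", "0001", "this guy doesnt know how to get rid of, commas!"};

# Strict parser: the space before each opening quote makes the row invalid CSV.
my $strict = Text::CSV->new({ binary => 1 })
    or die "Cannot use CSV: " . Text::CSV->error_diag();
my $strict_ok = $strict->parse($row) ? 1 : 0;
say "strict parse ok? $strict_ok";

# Lenient parser: allow_whitespace drops the blanks around each separator.
my $csv = Text::CSV->new({ binary => 1, allow_whitespace => 1 })
    or die "Cannot use CSV: " . Text::CSV->error_diag();
$csv->parse($row) or die "Parse failed: " . $csv->error_diag();
my @fields = $csv->fields;

say scalar(@fields), " fields";
say for @fields;    # the commas inside quotes stay inside their fields
```

The strict parser rejects the row because it finds a quote character inside what it treats as an unquoted field (the one that starts with a space). With allow_whitespace, the same row splits into six fields, and "123 fake, street" keeps its comma.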
If you really do need to remove the commas for further work, do it as the parser hands you each line.
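```perl
use warnings;
use strict;
use feature 'say';

use Text::CSV;

my $file = 'commas_in_fields.csv';

# binary => 1 accepts non-ASCII characters and embedded newlines;
# allow_whitespace => 1 skips the spaces after each separating comma
my $csv = Text::CSV->new( { binary => 1, allow_whitespace => 1 } )
    or die "Cannot use CSV: " . Text::CSV->error_diag ();

open my $fh, '<', $file or die "Can't open $file: $!";

my @headers = @{ $csv->getline($fh) };   # if there is a separate header line

while (my $line = $csv->getline($fh)) {  # returns arrayref
    tr/,//d for @$line;                   # delete commas from each field
    say "@$line";
}
# getline also returns undef on a parse error, so report it unless we hit end of file
$csv->eof or $csv->error_diag();
close $fh;
```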
This uses tr on $_ in the for loop, which changes the elements of the array directly, for conciseness. Inside the loop $_ is an alias to each element, so deleting commas from $_ deletes them from the array itself.
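To make the aliasing visible, here is a small sketch in plain Perl with no CSV module. It compares the in-place loop with the non-destructive tr///r form (Perl 5.14 or later), which returns an edited copy and leaves the original untouched. The arrays are hypothetical stand-ins for the arrayref that getline returns.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical field values, standing in for what $csv->getline returns.
my @kept   = ('123 fake, street', 'rid of, commas!');
my @edited = @kept;                      # an independent copy of the same values

# tr///r returns an edited copy of $_; @kept itself is not modified.
my @clean = map { tr/,//dr } @kept;

# In a for loop $_ aliases each element, so @edited changes in place.
tr/,//d for @edited;

say "kept:   @kept";     # kept:   123 fake, street rid of, commas!
say "edited: @edited";   # edited: 123 fake street rid of commas!
```

Use the in-place loop when you no longer need the original fields, and the /r form when you want to keep them.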
I'd like to repeat and emphasize what others have explained: do not parse CSV by hand, since only trouble awaits; use a library. This is very much like parsing XML and similar formats: no regexes, please; use a library.
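To show the kind of trouble hand parsing runs into, here is a short sketch that applies the most obvious hand-rolled approach, splitting on every comma, to the question's sample row (with the trailing "... etc" dropped). It uses only core Perl.

```perl
use strict;
use warnings;
use feature 'say';

# Sample row from the question; q{} avoids interpolating "@mail".
my $row = q{"0001", "bailey", "123 fake, street", "bailey@mail.com", "0001", "this guy doesnt know how to get rid of, commas!"};

# Splitting on every comma ignores the quoting rules entirely.
my @naive = split /,/, $row;

say scalar @naive;   # 8, although the row has only 6 fields
say $naive[2];       # ' "123 fake': the address field was cut in half
```

Fixing this with a cleverer regex means handling quotes, escaped quotes, and embedded newlines yourself, which is exactly what the CSV modules already do.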