Removing extra commas from a CSV file in Perl

Question

I have multiple CSV files, each with a different number of entries and roughly 300 lines each.

The first line in each file holds the data labels:

Person_id, person_name, person_email, person_address, person_recruitmentID, person_comments... etc

The remaining lines in each file contain the data:

"0001", "bailey", "123 fake, street", "bailey@mail.com", "0001", "this guy doesnt know how to get rid of, commas!"... etc

I want to get rid of the commas that sit between quotation marks. I'm currently going through the Text::CSV documentation, but it's a slow process.

Recommended answer

A good CSV parser will have no trouble with this, since the commas are inside quoted fields, so you can simply parse the file with it.

A really nice module is Text::CSV_XS, which the wrapper Text::CSV loads by default when it is installed. The only thing to address in your data is the spaces between fields: they aren't part of the CSV spec, so the example below sets the allow_whitespace option to handle them.

If you really must remove the commas for further work, do that as the parser hands you each line.

use warnings;
use strict;
use feature 'say';

use Text::CSV;

my $file = 'commas_in_fields.csv';

my $csv = Text::CSV->new( { binary => 1, allow_whitespace => 1 } ) 
    or die "Cannot use CSV: " . Text::CSV->error_diag (); 

open my $fh, '<', $file or die "Can't open $file: $!";

my @headers = @{ $csv->getline($fh) };   # if there is a separate header line

while (my $line = $csv->getline($fh)) {  # returns arrayref
    tr/,//d for @$line;                  # delete commas from each field
    say "@$line";
}

$csv->eof or $csv->error_diag;           # report a parse error, if any
close $fh;

This uses tr on $_ in the for loop, where $_ is an alias for each array element, so the fields are changed in place. It is written this way for conciseness.
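To see the aliasing on its own, here is a minimal sketch that needs only core Perl. The row contents are made up for illustration and merely have the shape of the arrayref that getline returns; the $deleted counter is an addition that is not in the code above.

```perl
use strict;
use warnings;

# Hypothetical row, shaped like the arrayref that getline() returns.
my $row = [ '0001', 'bailey', '123 fake, street', 'no commas here' ];

# In a for loop $_ is an alias for each element, so tr/,//d edits the
# array itself. tr returns the number of characters it deleted.
my $deleted = 0;
$deleted += tr/,//d for @$row;

print "$_\n" for @$row;
print "deleted $deleted comma(s)\n";
```

After the loop the third element reads "123 fake street": the original array was modified, no copy was made.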

I'd like to repeat and emphasize what others have explained: do not parse CSV by hand, since only trouble awaits; use a library. This is much like parsing XML and similar formats: no regex please, use a library.
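As a small illustration of the kind of trouble meant here, the sketch below splits a shortened version of the sample record from the question on every comma. It uses only core Perl, and the variable names are chosen for this example.

```perl
use strict;
use warnings;

# The sample record from the question, shortened to four fields.
# Single quotes keep Perl from interpolating the @ in the address.
my $line = '"0001", "bailey", "123 fake, street", "bailey@mail.com"';

# Naive approach: split on every comma, quoted or not.
my @naive = split /,/, $line;

printf "expected 4 fields, got %d\n", scalar @naive;
print "[$_]\n" for @naive;
```

The split cuts the quoted address in two and leaves the quotes and leading spaces attached to each piece, which is the bookkeeping a CSV parser takes care of.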
