Removing extra commas from a CSV file in Perl


Question

I have multiple CSV files, each with a different number of entries and roughly 300 lines.

The first line in each file contains the data labels:

Person_id, person_name, person_email, person_address, person_recruitmentID, person_comments... etc

The rest of the lines in each file contain the data:

"0001", "bailey", "123 fake, street", "bailey@mail.com", "0001", "this guy doesnt know how to get rid of, commas!"... etc

I want to get rid of the commas that are between quotation marks. I'm currently going through the Text::CSV documentation, but it's a slow process.

Recommended answer

A good CSV parser will have no trouble with this, since the commas are inside quoted fields, so you can simply parse the file with it.

A really nice module is Text::CSV_XS, which is loaded by default (when installed) if you use the wrapper Text::CSV. The only thing to address in your data is the spaces between fields, since they aren't part of the CSV spec, so I use the allow_whitespace option for that in the example below.

If you really must remove the commas for further work, do that as the parser hands you each line.

use warnings;
use strict;
use feature 'say';

use Text::CSV;

my $file = 'commas_in_fields.csv';

my $csv = Text::CSV->new( { binary => 1, allow_whitespace => 1 } ) 
    or die "Cannot use CSV: " . Text::CSV->error_diag (); 

open my $fh, '<', $file or die "Can't open $file: $!";

my @headers = @{ $csv->getline($fh) };   # if there is a separate header line

while (my $line = $csv->getline($fh)) {  # returns arrayref
    tr/,//d for @$line;                  # delete commas from each field
    say "@$line";
}
$csv->eof or $csv->error_diag();         # report a parse error if we stopped early
close $fh;

For conciseness, this uses tr on $_ in the for loop, which changes the elements of the array in place.
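
To make the in-place behaviour concrete, here is a minimal sketch using only core Perl. The helper name `strip_commas` and the sample row are assumptions for illustration, not part of the answer; the row stands in for the array reference that `getline` would return.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): deletes commas from every
# field of a row, modifying the array behind the reference in place.
sub strip_commas {
    my ($row) = @_;
    tr/,//d for @$row;   # $_ is an alias for each element, not a copy
    return $row;
}

# Made-up sample row, shaped like what the parser hands back.
my $line = [ '0001', 'bailey', '123 fake, street', 'get rid of, commas!' ];
strip_commas($line);

print join(' | ', @$line), "\n";   # 0001 | bailey | 123 fake street | get rid of commas!
```

The more explicit spelling `for my $field (@$line) { $field =~ tr/,//d }` should behave the same way; the short form just relies on `tr` operating on `$_` by default.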

I'd like to repeat and emphasize what others have explained: do not parse CSV by hand, since only trouble awaits; use a library. This is very much akin to parsing XML and similar formats: no regex please, but libraries.
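
As a small illustration of the kind of trouble meant here, the sketch below splits a shortened version of the question's sample row on every comma. It uses only core Perl, and the variable names are made up for the example.

```perl
use strict;
use warnings;

# Shortened sample row from the question: 4 fields, one with an embedded comma.
# Single quotes keep Perl from interpolating @mail.
my $row = '"0001", "bailey", "123 fake, street", "bailey@mail.com"';

# Naive approach: split on every comma, ignoring the quotes.
my @naive = split /,/, $row;

# The quoted address is cut in two, so the count is off by one.
printf "naive split found %d pieces\n", scalar @naive;   # naive split found 5 pieces
```

A quote-aware regex can paper over this one case, but escaped quotes and embedded newlines tend to break such patterns as well, which is why the answer recommends a parser.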
