Parse fixed-width files
Question
I have a lot of text files with fixed-width fields:
<c>     <c>       <c>
Dave    Thomas    123 Main
Dan     Anderson  456 Center
Wilma   Rainbow   789 Street
The rest of the files are in a similar format, where each <c> marks the beginning of a column, but the column and spacing widths vary from file to file and are not known in advance. What's the best way to parse these files?
I tried using Text::CSV, but since there's no delimiter it's hard to get a consistent result (unless I'm using the module wrong):
my $csv = Text::CSV->new();
$csv->sep_char(' ');
while (<FILE>) {
    if ($csv->parse($_)) {
        my @columns = $csv->fields();
        print $columns[1] . "\n";
    }
}
Recommended answer
As user604939 mentions, unpack is the tool to use for fixed-width fields. However, unpack needs to be passed a template to work with. Since you say your fields can change width, the solution is to build this template from the first line of your file:
my @template = map {'A'.length}        # convert each to 'A##'
               <DATA> =~ /(\S+\s*)/g;  # split first line into segments
$template[-1] = 'A*';                  # set the last segment to be slurpy
my $template = "@template";
print "template: $template\n";

my @data;
while (<DATA>) {
    push @data, [unpack $template, $_]
}

use Data::Dumper;
print Dumper \@data;
__DATA__
<c>     <c>       <c>
Dave    Thomas    123 Main
Dan     Anderson  456 Center
Wilma   Rainbow   789 Street
This prints:
template: A8 A10 A*
$VAR1 = [
          [
            'Dave',
            'Thomas',
            '123 Main'
          ],
          [
            'Dan',
            'Anderson',
            '456 Center'
          ],
          [
            'Wilma',
            'Rainbow',
            '789 Street'
          ]
        ];
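
Since there are many files to process, it may help to wrap the same idea in subroutines that work on any open filehandle instead of DATA. The following is only a sketch under some assumptions: the helper names template_from_header and parse_fixed_width are made up for illustration, every file is assumed to start with a <c> header line, and blank lines are skipped. The demo reads the sample data from an in-memory filehandle; a real file opened with open would be passed the same way.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a header line such as "<c>     <c>       <c>"
# into an unpack template such as "A8 A10 A*".
sub template_from_header {
    my ($header) = @_;
    chomp $header;
    # Each segment is a column marker plus the padding that follows it,
    # so its length is the width of that column.
    my @parts = map { 'A' . length($_) } $header =~ /(\S+\s*)/g;
    die "no columns found in header\n" unless @parts;
    $parts[-1] = 'A*';    # the last column takes the rest of the line
    return join ' ', @parts;
}

# Hypothetical helper: read the header from $fh, then unpack every
# remaining non-blank line with the template built from it.
sub parse_fixed_width {
    my ($fh) = @_;
    my $header = <$fh>;
    return [] unless defined $header;    # empty file
    my $template = template_from_header($header);
    my @rows;
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;        # skip blank lines (an assumption)
        push @rows, [ unpack $template, $line ];
    }
    return \@rows;
}

# Demo on an in-memory filehandle holding the sample data.
my $sample = join "\n",
    '<c>     <c>       <c>',
    'Dave    Thomas    123 Main',
    'Dan     Anderson  456 Center',
    'Wilma   Rainbow   789 Street',
    '';
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
my $rows = parse_fixed_width($fh);
close $fh;

print join('|', @$_), "\n" for @$rows;
```

Because the template is rebuilt from each file's own header, files with different column widths can go through the same subroutine without any per-file configuration.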