How to extract one column from multiple files, and paste those columns into one file?


Question

I want to extract the 5th column from multiple files, named in numerical order, and paste those columns in sequence, side by side, into one output file.

The file names look like this:

sample_problem1_part1.txt
sample_problem1_part2.txt

sample_problem2_part1.txt
sample_problem2_part2.txt

sample_problem3_part1.txt
sample_problem3_part2.txt
......

Each problem (1, 2, 3, ...) has two part files (part1, part2). Every file has the same number of lines. The contents look like this:

sample_problem1_part1.txt
1 1 20 20 1
1 7 21 21 2
3 1 22 22 3
1 5 23 23 4
6 1 24 24 5
2 9 25 25 6
1 0 26 26 7

sample_problem1_part2.txt
1 1 88 88 8
1 1 89 89 9
2 1 90 90 10
1 3 91 91 11
1 1 92 92 12
7 1 93 93 13
1 5 94 94 14

sample_problem2_part1.txt
1 4 330 30 a
3 4 331 31 b
1 4 332 32 c
2 4 333 33 d
1 4 334 34 e
1 4 335 35 f
9 4 336 36 g

The output should look like this (in the order problem1_part1, problem1_part2, problem2_part1, problem2_part2, problem3_part1, problem3_part2, and so on):

1 8 a ...
2 9 b ...
3 10 c ...
4 11 d ...
5 12 e ...
6 13 f ...
7 14 g ...

I am currently using:

 paste sample_problem1_part1.txt sample_problem1_part2.txt > \
     sample_problem1_partall.txt
 paste sample_problem2_part1.txt sample_problem2_part2.txt > \
     sample_problem2_partall.txt
 paste sample_problem3_part1.txt sample_problem3_part2.txt > \
     sample_problem3_partall.txt

Then:

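for i in $(find . -name "sample_problem*_partall.txt")
do
    # Rename sample_... to extracted_col_... for the output file
    l=$(echo "$i" | sed 's/sample_/extracted_col_/')
    # Columns 5 and 10 of the pasted file are column 5 of part1 and part2
    awk '{print $5, $10}' "$i" > "$l"
done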

And finally:

paste extracted_col_problem1_partall.txt \
      extracted_col_problem2_partall.txt \
      extracted_col_problem3_partall.txt > \
    extracted_col_problemall_partall.txt

This works fine with a few files, but it is a crazy method when the number of files is large (over 4000). Could anyone help me with a simpler solution that can handle many files? Thanks!

Recommended answer

Here's one way using awk and a sorted glob of files:

awk '{ a[FNR] = (a[FNR] ? a[FNR] FS : "") $5 } END { for(i=1;i<=FNR;i++) print a[i] }' $(ls -1v *)

Results:

1 8 a
2 9 b
3 10 c
4 11 d
5 12 e
6 13 f
7 14 g

Explanation:

  • For each line of input of each input file:

  • Use the line number within the current file (FNR) as the array key, and store the value of column 5 under it.

(a[FNR] ? a[FNR] FS : "") is a ternary operation that builds up the array's value as a record. It simply asks whether the array already holds a value for this line number. If so, it takes that value followed by the default field separator (FS), and the fifth column is appended to it. Otherwise, if nothing has been stored for the line number yet, nothing is prepended and the element is simply set to the fifth column.
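To see the ternary in isolation, here is a minimal sketch that applies the same accumulation to a single variable instead of an array element. The variable names `r` and `joined` are only illustrative and do not come from the answer; the three input values stand in for the fifth column of three files.

```shell
# Feed three values, one per line, and join them the same way a[FNR]
# grows as each file contributes its fifth column.
joined=$(printf '%s\n' 1 8 a |
    awk '{ r = (r ? r FS : "") $1 } END { print r }')

# First line: r is empty, so nothing is prepended.
# Later lines: r is non-empty, so r and FS are prepended.
echo "$joined"
```

One consequence of testing the value itself: if a slot is still empty when the next value arrives (for example because a file has a blank fifth column), no separator is added for it.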

At the end of the script:

  • Use a C-style loop to iterate through the array, printing each of its values.
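
The whole approach can be tried end to end on a few throwaway files. The sketch below is only an illustration under some assumptions: it relies on the `-v` (version sort) option of GNU `ls`, as the answer does, it assumes file names without whitespace, and it narrows the answer's `*` to the pattern `sample_problem*_part[0-9]*.txt` so that leftover files such as `sample_problem1_partall.txt` are not read. The scratch directory and the output name `extracted_col_all.txt` are made up for the demo.

```shell
# Work in a scratch directory so only the demo files are present.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Tiny stand-ins for the real inputs (two lines each, five columns).
printf '1 1 20 20 1\n1 7 21 21 2\n'   > sample_problem1_part1.txt
printf '1 1 88 88 8\n1 1 89 89 9\n'   > sample_problem1_part2.txt
printf '1 4 330 30 a\n3 4 331 31 b\n' > sample_problem2_part1.txt
printf '5 5 50 50 x\n5 5 51 51 y\n'   > sample_problem10_part1.txt
# A leftover intermediate file that the narrower pattern must skip.
printf 's s s s SKIP\ns s s s SKIP\n' > sample_problem1_partall.txt

# Same awk program as in the answer; only the file list differs.
# ls -1v puts problem2 before problem10, which a plain glob would not.
awk '{ a[FNR] = (a[FNR] ? a[FNR] FS : "") $5 }
     END { for (i = 1; i <= FNR; i++) print a[i] }' \
    $(ls -1v sample_problem*_part[0-9]*.txt) > extracted_col_all.txt

result=$(cat extracted_col_all.txt)
echo "$result"
```

Writing the result to a file whose name does not match the input pattern keeps a second run from reading its own output.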
