Extract columns from a file based on header selected from another file


Question

I have the following problem that I want to solve in awk. I have one large comma-separated text table consisting of 100k rows and 5k columns. The first row is a header and the first column is a record ID. I also have a second text file that contains a subset of the headers in the first file. I want to extract all the columns of the first file whose header appears in the list given in the second file. Here is an example of the inputs and the desired output:

DATA.TXT

   ID, head1, head2, head3, head4  
    1, 25.5, 1364.0, 22.5, 13.2  
    2, 10.1, 215.56, 1.15, 22.2  

LIST.TXT

head1  
head4  

Desired output:

ID, head1, head4  
1, 25.5, 13.2  
2, 10.1, 22.2

Can anybody give me some advice on how to solve this problem in awk, or otherwise through Unix scripting? Thanks in advance for any help!

Recommended answer

I have an idea, but since I'm not experienced in shell programming (and don't know awk), this probably reinvents some wheels in a ridiculous way:

$ cat DATA.TXT 
ID, head1, head2, head3, head4
1, 25.5, 1364.0, 22.5, 13.2
2, 10.1, 215.56, 1.15, 22.2

$ cat LIST.TXT 
head1
head4

$ # -x and -F make grep match whole header names literally, so head1 does not also select head10
$ cols=($(sed '1!d;s/, /\n/g' DATA.TXT | grep -nxFf LIST.TXT | sed 's/:.*$//'))

$ cut -d ',' -f 1$(printf ",%s" "${cols[@]}") DATA.TXT 
ID, head1, head4
1, 25.5, 13.2
2, 10.1, 22.2

P.S. I used some very basic ideas about bash arrays from this and this answer.
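
Since the question asks for awk specifically, here is a minimal sketch of how the same selection might look in a single awk pass. It is not part of the original answer. It assumes the fields are separated by exactly ", " as in the sample, that no field is quoted or contains a comma, and that LIST.TXT holds one header name per line with no trailing whitespace. The array names `want` and `keep` and the temporary directory are illustrative only.

```shell
#!/bin/sh
# Sketch: select columns by header name with awk.
# Assumptions: separator is exactly ", ", no quoted fields,
# one header name per line in LIST.TXT.

# Recreate the sample inputs from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'ID, head1, head2, head3, head4' \
              '1, 25.5, 1364.0, 22.5, 13.2' \
              '2, 10.1, 215.56, 1.15, 22.2' > "$tmp/DATA.TXT"
printf '%s\n' 'head1' 'head4' > "$tmp/LIST.TXT"

out=$(awk -F', ' -v OFS=', ' '
    # First file (LIST.TXT): remember every wanted header name.
    NR == FNR { want[$0]; next }

    # Header row of the data file: record which column numbers to keep.
    FNR == 1 {
        n = 1
        keep[1] = 1                      # always keep the ID column
        for (i = 2; i <= NF; i++)
            if ($i in want) keep[++n] = i
    }

    # Every row, header included: print only the kept columns.
    {
        line = $(keep[1])
        for (j = 2; j <= n; j++) line = line OFS $(keep[j])
        print line
    }
' "$tmp/LIST.TXT" "$tmp/DATA.TXT")

printf '%s\n' "$out"
rm -rf "$tmp"
```

On real files the scratch setup can be dropped and the awk program run directly as `awk ... LIST.TXT DATA.TXT`. The header list is read once into an array and the table is streamed row by row, so memory use should not grow with the number of rows. Header names are compared exactly, so `head1` would not also pick up a column named `head10`.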
