bash? - combining files into CSVs

Question

I know (see here) that you can use paste to combine multiple files into a .csv file if each file holds one column:

paste -d "," column1.dat column2.dat column3.dat ... > myDat.csv

will result in a myDat.csv like

column1,   column2,   column3, ...
c1-1,      c2-1,      c3-1,    ...
c1-2,      c2-2,      c3-2,    ...
...        ...        ...

(Without the tabs; I only inserted them to make it more readable.)
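
As a minimal sketch of the single-column case described above: the file names follow the question, while the temporary directory and the sample contents are assumptions made up purely for illustration.

```shell
# Hypothetical sample data: one column per file, header on the first line.
tmpdir=$(mktemp -d)
printf 'column1\nc1-1\nc1-2\n' > "$tmpdir/column1.dat"
printf 'column2\nc2-1\nc2-2\n' > "$tmpdir/column2.dat"
printf 'column3\nc3-1\nc3-2\n' > "$tmpdir/column3.dat"

# paste glues line N of every input file together, separated by the -d delimiter.
paste -d "," "$tmpdir/column1.dat" "$tmpdir/column2.dat" "$tmpdir/column3.dat" \
    > "$tmpdir/myDat.csv"
cat "$tmpdir/myDat.csv"
```

Because paste matches lines purely by position, it only works when row N of every file describes the same measurement.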

What if I have multiple measurements instead?

For example:

file1.dat has the format <xvalue> <y1value>

file2.dat has the format <xvalue> <y2value>

file3.dat has the format <xvalue> <uvalue> <vvalue>

and I ultimately want a CSV like

<xvalue>, <y1value>, <y2value>, <empty column>, <uvalue>, <vvalue>

?

How can I combine the files now?

Edit

Note that although each file is sorted (or can be sorted if it is not), the files don't necessarily contain the same xvalues on the same lines.

If a file doesn't have an xvalue that another file does have, its corresponding column entry should be blank.

(Actually, I think dropping the rows for xvalues that aren't present in all files should also work.)

Recommended answer

OK, here is my solution in GNU awk. It tries to lean towards a more generic solution and handles the extra empty column with external tools. It is written for GNU awk because it uses true multidimensional arrays (arrays of arrays), but it could probably be adapted to other awks fairly easily.

The program joins fields, expecting the first field of each file to be the key column. If it does not find an existing key to join to, it creates a new key, and fields that do not exist for a key are printed as empty on output (notice keys x_3, x_4 and x_5 in the data files below).

First, the data files:

$ cat file[123].dat             # 3 files, separated by empty lines for clarity
x_1 y1_1
x_2 y1_2
x_3 y1_3

x_1 y2_1
x_2 y2_2
x_4 y2_4

x_1 u_1 v_1
x_2 u_2 v_2
x_5 u_5 v_5

And the code:

$ cat program.awk
BEGIN { OFS=", " }
FNR==1 { f++ }                                # counter of files
{
    a[0][$1]=$1                               # reset the key for every record 
    for(i=2;i<=NF;i++)                        # for each non-key element
        a[f][$1]=a[f][$1] $i ( i==NF?"":OFS ) # combine them to array element
}
END {                                         # in the end
    for(i in a[0])                            # go thru every key
        for(j=0;j<=f;j++)                     # and all related array elements
            printf "%s%s", a[j][i], (j==f?ORS:OFS)
}                                             # output them, nonexistent will output empty

Usage and output:

$ awk -f program.awk \
file1.dat \
file2.dat \
<(grep -h . file[123].dat|cut -d\  -f 1|sort|uniq) \
file3.dat 
x_1, y1_1, y2_1, , u_1, v_1
x_2, y1_2, y2_2, , u_2, v_2
x_3, y1_3, , , 
x_4, , y2_4, , 
x_5, , , , u_5, v_5
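
In the output above, the rows for x_3 and x_4 have one field fewer than the others, because a missing file3.dat entry contributes a single empty string no matter how many value columns that file has. If a fixed column count matters, one possible variation (a sketch, not the answer's program) is to remember how many value columns each file has and pad accordingly. The names `keys`, `width`, `val` and `keys.dat` are made up for this example, the separator is a bare "," instead of ", ", and a keys-only file is assumed to stand for one empty column. It uses SUBSEP-style subscripts, so it should not need GNU awk.

```shell
# Sample data copied from the answer; tmpdir and keys.dat are illustrative.
tmpdir=$(mktemp -d)
printf 'x_1 y1_1\nx_2 y1_2\nx_3 y1_3\n' > "$tmpdir/file1.dat"
printf 'x_1 y2_1\nx_2 y2_2\nx_4 y2_4\n' > "$tmpdir/file2.dat"
printf 'x_1 u_1 v_1\nx_2 u_2 v_2\nx_5 u_5 v_5\n' > "$tmpdir/file3.dat"

# All keys, one per line; this keys-only file produces the empty column.
cut -d ' ' -f 1 "$tmpdir"/file[123].dat | LC_ALL=C sort -u > "$tmpdir/keys.dat"

result=$(awk '
BEGIN { OFS = "," }
FNR == 1 { f++ }                               # f = index of the current input file
{
    keys[$1] = 1                               # remember every key seen
    if (NF - 1 > width[f]) width[f] = NF - 1   # widest row seen in file f
    for (i = 2; i <= NF; i++)
        val[f, i - 1, $1] = $i                 # value column i-1 of file f for this key
}
END {
    for (k in keys) {
        line = k
        for (j = 1; j <= f; j++) {
            w = (width[j] > 0 ? width[j] : 1)  # keys-only file: one empty column
            for (c = 1; c <= w; c++)
                line = line OFS val[j, c, k]   # missing entries expand to ""
        }
        print line
    }
}' "$tmpdir/file1.dat" "$tmpdir/file2.dat" "$tmpdir/keys.dat" "$tmpdir/file3.dat" |
    LC_ALL=C sort)                             # for-in order is unspecified, so sort
printf '%s\n' "$result"
```

With the sample data, every row then carries six fields, including the rows for x_3, x_4 and x_5.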

The empty column after file2.dat is produced by an empty field: all the keys are gathered and passed to the program as another "file" (using process substitution, <()), which keeps the program more generic:

$ grep -h . file[123].dat|cut -d\  -f 1|sort|uniq
x_1
x_2
x_3
x_4
x_5
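
The question also mentions that dropping rows whose xvalue is not present in every file would be acceptable. For that case, a different technique from the awk program above may be enough: the POSIX join utility, which performs an inner join on the first field of sorted inputs. The sketch below reuses the sample data from the answer; the temporary directory and the final awk step that inserts the empty column are assumptions for illustration.

```shell
# Sample data copied from the answer; all three files are sorted on the key.
tmpdir=$(mktemp -d)
printf 'x_1 y1_1\nx_2 y1_2\nx_3 y1_3\n' > "$tmpdir/file1.dat"
printf 'x_1 y2_1\nx_2 y2_2\nx_4 y2_4\n' > "$tmpdir/file2.dat"
printf 'x_1 u_1 v_1\nx_2 u_2 v_2\nx_5 u_5 v_5\n' > "$tmpdir/file3.dat"

# join keeps only keys present in both inputs; "-" reads the first join's output.
# The last awk step rewrites each row as CSV with an empty fourth column.
joined=$(LC_ALL=C join "$tmpdir/file1.dat" "$tmpdir/file2.dat" |
    LC_ALL=C join - "$tmpdir/file3.dat" |
    awk 'BEGIN { OFS = "," } { print $1, $2, $3, "", $4, $5 }')
printf '%s\n' "$joined"
```

Only x_1 and x_2 survive here, since they are the only keys that occur in all three files.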
