How to merge rows from the same column using unix tools

Question

I have a text file that looks like the following:

1000000    45    M    This is a line        This is another line  Another line
                      that breaks into      that also breaks      that has a blank
                      multiple rows         into multiple rows -  row below.
                                            How annoying!

1000001    50    F    I am another          I am well behaved.    
                      column that has
                      text spanning
                      multiple rows

I would like to convert this into a csv file that looks like:

1000000, 45, M, This is a line that breaks into multiple rows, This is another line that also breaks into multiple rows - How annoying!
1000001, 50, F, I am another column that has text spanning multiple rows, I am well behaved.

The text file output comes from a program that was written in 1984, and I have no way to modify the output. I want it in csv format so that I can convert it to Excel as painlessly as possible. I am not sure where to start, and rather than reinvent the wheel, was hoping someone could point me in the right direction. Thanks!

== Edit ==

I've modified the text file to have \n between rows - maybe this will be helpful?

== Edit 2 ==

I've modified the text file to have a blank row between records.

Answer

Using GNU awk:

gawk '
    BEGIN { FIELDWIDTHS="11 6 5 22 22" }   # id, age, sex, text1, text2
    length($1) == 11 {                     # ignore empty/short separator rows
        if ($1 ~ /[^[:blank:]]/) {         # ID present: a new record starts
            if (f1) print_line()           # emit the previous record first
            f1=$1; f2=$2; f3=$3; f4=$4; f5=$5
        }
        else {                             # continuation row: append its text
            f4 = f4" "$4; f5 = f5" "$5
        }
    }
    function rtrim(str) {
        sub(/[[:blank:]]+$/, "", str)
        return str
    }
    function print_line() {
        # collapse runs of 2+ blanks to one space, then double any inner quotes
        gsub(/[[:blank:]]{2,}/, " ", f4); gsub(/"/, "&&", f4)
        gsub(/[[:blank:]]{2,}/, " ", f5); gsub(/"/, "&&", f5)
        printf "%s,%s,%s,\"%s\",\"%s\"\n", rtrim(f1), rtrim(f2), rtrim(f3), rtrim(f4), rtrim(f5)
    }
    END { if (f1) print_line() }           # emit the last record
' file

1000000,45,M,"This is a line that breaks into multiple rows","This is another line that also breaks into multiple rows - How annoying!"
1000001,50,F,"I am another column that has text spanning multiple rows","I am well behaved."
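FIELDWIDTHS is a gawk extension; mawk, BusyBox awk and BSD awk treat it as an ordinary variable, so the script above would not split the columns there. A minimal sketch of the same record-merging idea for any POSIX awk uses substr() instead. The shell function name fixed_to_csv is hypothetical, and the column positions (1-11, 12-17, 18-22, 23-44, 45-66) are simply derived from the answer's FIELDWIDTHS="11 6 5 22 22"; they would need adjusting for a different layout. Like the gawk version, it ignores anything past column 66 (the third text column in the sample).

```shell
#!/bin/sh
# Sketch: portable version of the gawk answer for any POSIX awk.
# Assumed layout (from FIELDWIDTHS="11 6 5 22 22"):
#   id 1-11, age 12-17, sex 18-22, text1 23-44, text2 45-66
fixed_to_csv() {
    awk '
        function trim(s) {                  # strip leading and trailing blanks
            sub(/^[ \t]+/, "", s)
            sub(/[ \t]+$/, "", s)
            return s
        }
        function squeeze(s) {               # collapse blank runs, then trim
            gsub(/[ \t]+/, " ", s)
            return trim(s)
        }
        function q(s) {                     # CSV-quote: double inner quotes, wrap
            gsub(/"/, "\"\"", s)
            return "\"" s "\""
        }
        function flush() {                  # print the buffered record, if any
            if (id != "")
                print id "," age "," sex "," q(squeeze(t1)) "," q(squeeze(t2))
        }
        /^[ \t]*$/ { next }                 # skip empty or all-blank rows
        {
            head = substr($0, 1, 11)
            if (head ~ /[^ \t]/) {          # ID column filled: a new record starts
                flush()
                id  = trim(head)
                age = trim(substr($0, 12, 6))
                sex = trim(substr($0, 18, 5))
                t1  = substr($0, 23, 22)
                t2  = substr($0, 45, 22)
            } else {                        # continuation row: append its text
                t1 = t1 " " substr($0, 23, 22)
                t2 = t2 " " substr($0, 45, 22)
            }
        }
        END { flush() }                     # emit the last record
    '
}
```

Usage would be along the lines of `fixed_to_csv < file > out.csv` (placeholder file names). substr() simply returns an empty string when a short line ends before a column, so continuation rows that only fill the first text column need no special handling.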

I've quoted the last 2 columns in case they contain commas, and doubled any potential inner double quotes.
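The quoting rule used here (wrap the field in double quotes and write each embedded double quote as two) is the convention RFC 4180 describes for CSV. A small sketch isolating just that step; the function name csv_quote_lines is hypothetical, and it treats each input line as one raw field value:

```shell
#!/bin/sh
# Hypothetical helper: read one raw field value per line and print it as a
# quoted CSV field, using the same rule as the answer above.
csv_quote_lines() {
    awk '
        function csv_quote(s) {
            gsub(/"/, "\"\"", s)    # each embedded quote becomes two quotes
            return "\"" s "\""      # then wrap the whole value in quotes
        }
        { print csv_quote($0) }
    '
}

# Usage: one value containing a comma, one containing inner quotes.
printf '%s\n' 'Smith, John' 'He said "hi"' | csv_quote_lines
```

RFC 4180 allows any field to be quoted, so quoting the text columns unconditionally, as the answer does, is simpler than first checking whether a value contains a comma or a quote.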
