Splitting a large, complex one-column file into several columns with awk
Problem description
I have a text file produced by some commercial software that looks like the sample below. It consists of bracket-delimited sections, each containing several million elements, but the exact count varies from one case to another.
(1
2
3
...
)
(11
22
33
...
)
(111
222
333
...
)
I need to achieve the following output:
1; 11; 111
2; 22; 222
3; 33; 333
... ... ...
I found a complicated way to do it:
Perform sed operations to get
1
2
3
...
#
11
22
33
...
#
111
222
333
...
Use awk as follows to split the file into several sub-files:
awk -v RS="#" '{print > ("splitted-" NR ".txt")}'
Remove the blank (whitespace-only) lines from the sub-files, again with sed:
sed -i '/^[[:space:]]*$/d' splitted*.txt
Join everything together:
paste splitted*.txt > out.txt
Add a field separator (defined in my bash script):
awk -v sep=$my_sep 'BEGIN{OFS=sep}{$1=$1; print }' out.txt > formatted.txt
I feel this is crappy, as I loop over millions of lines several times. Even though the run time is quite OK (~80 sec), I'd like to find a pure awk solution but can't get there. Something like:
awk 'BEGIN{RS="(\\n)"; OFS=";"} { print something } '
I found some related questions, especially this one, "row to column conversion with awk", but it assumes a constant number of lines between brackets, which I can't rely on.
Any help would be appreciated.
Recommended answer
With GNU awk for multi-char RS and true multi-dimensional arrays:
$ cat tst.awk
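BEGIN {
    RS  = "(\\s*[()]\\s*)+"    # a record is the text between runs of brackets
    OFS = ";"
}
NR>1 {                         # record 1 is the empty text before the first "("
    cell[NR][1]                # create the subarray so split() can fill it
    split($0,cell[NR])
}
END {
    # NF still holds the field count of the last record, used as the row count
    for (rowNr=1; rowNr<=NF; rowNr++) {
        for (colNr=2; colNr<=NR; colNr++) {
            printf "%6s%s", cell[colNr][rowNr], (colNr<NR ? OFS : ORS)
        }
    }
}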
$ awk -f tst.awk file
     1;    11;   111
     2;    22;   222
     3;    33;   333
   ...;   ...;   ...
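If GNU awk is not available, roughly the same result can be had with a line-by-line approach in portable awk. The following is only a sketch under a few assumptions that are not in the original answer: each opening bracket starts a line, each closing bracket ends one, the file names `sample.txt` and `out.txt` and the variable `sep` are made up for the example, and all values fit in memory at once (as they must for the script above too). It also tracks the longest section rather than relying on the last one, so sections of unequal length would be padded with empty cells.

```shell
# Build a small sample file in the bracketed format from the question.
printf '%s\n' '(1' 2 3 ')' '(11' 22 33 ')' '(111' 222 333 ')' > sample.txt

# One pass over the input: "(" starts a new column, ")" closes it, and every
# other value is stored at cell[row, column]. No GNU extensions are used.
awk -v sep="; " '
{
    line = $0
    if (line ~ /^[ \t]*\(/) {            # opening bracket: move to next column
        ncol++
        row = 0
        sub(/^[ \t]*\(/, "", line)
    }
    sub(/\)[ \t]*$/, "", line)           # drop a closing bracket, if present
    gsub(/^[ \t]+|[ \t]+$/, "", line)    # trim surrounding blanks
    if (line == "") next                 # bracket-only or blank line
    cell[++row, ncol] = line
    if (row > nrow) nrow = row           # remember the longest section
}
END {
    for (r = 1; r <= nrow; r++) {
        out = ""
        for (c = 1; c <= ncol; c++)
            out = out (c > 1 ? sep : "") cell[r, c]
        print out
    }
}' sample.txt > out.txt

cat out.txt
```

Passing the separator as a quoted `-v sep="; "` keeps the embedded space intact; it can be swapped for whatever separator the calling script defines.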