Improving a bash script for CSV files
Problem description
I have a bunch of CSV files in a folder, all with the same structure: more than 2k columns, the first of which is an ID.
I need to do the following for each file. For each odd column n (except the first column):
- If the value of column n is 0 in all rows, delete column n and also column n-1.
- If the value of column n is 100 in all rows, delete column n.
- Print the indexes of the removed columns.
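To make the rules concrete, here is a minimal sketch that only lists the column indexes the rules would remove, without rewriting the file. It assumes plain comma-separated data with no header row and no quoted fields containing commas; list_removed is a hypothetical function name. It tests "every row is 0" and "every row is 100" directly with per-column flags rather than column sums, and it only looks at odd columns from 3 onward, as the rules describe.

```shell
#!/bin/sh
# Sketch (hypothetical helper): print, one per line, the column indexes that
# the stated rules would remove from the CSV file $1. Assumes plain
# comma-separated data: no header row, no quoted fields containing commas.
list_removed() {
  awk 'BEGIN { FS = "," }
  {
    for (i = 1; i <= NF; i++) {
      # each flag is set to 1 on the first row and cleared by any mismatch
      if (NR == 1) { zero[i] = 1; hundred[i] = 1 }
      if ($i + 0 != 0)   zero[i] = 0
      if ($i + 0 != 100) hundred[i] = 0
    }
    if (NF > maxnf) maxnf = NF
  }
  END {
    # test only the odd columns n = 3, 5, 7, ...; column 1 (the ID) is never touched
    for (n = 3; n <= maxnf; n += 2) {
      if (zero[n]) { removed[n - 1]; removed[n] }   # all 0: drop n-1 and n
      else if (hundred[n]) removed[n]               # all 100: drop n only
    }
    for (i = 2; i <= maxnf; i++)
      if (i in removed) print i
  }' "$1"
}
```

For the sample input in this question, list_removed file should print 2, 3, 4, 5, 7, 8, 9 and 11, one per line.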
I have the following code:
for f in *.csv; do
    awk 'BEGIN { FS=OFS="," }
    NR==1 {
        for (i=3; i<=NF; i+=2)
            a[i]
    }
    FNR==NR {
        for (i=1; i<=NF; i++)
            sums[i] += $i;
        ++r;
        next
    }
    {
        for (i=1; i<=NF; i++)
            if (sums[i] > 0 && sums[i+1]>0 && sums[i] != 100*r)
                printf "%s%s", (i>1)?OFS:"", $i;
            else print "removed index: " i > "removed.index"
        print ""
    }' "$f" "$f" > "new_$f"
done
For some reason, the ID column (the first column) is being removed.
Input:
23232,0,0,5,0,1,100,3,0,33,100
21232,0,0,5,0,1,100,3,0,33,100
23132,0,0,5,0,1,100,3,0,33,100
23212,0,0,5,0,1,100,3,0,33,100
24232,0,0,5,0,1,100,3,0,33,100
27232,0,0,5,0,1,100,3,0,33,100
Current (wrong) output:
,1,33
,1,33
,1,33
,1,33
,1,33
,1,33
Expected output:
23232,1,33
21232,1,33
23132,1,33
23212,1,33
24232,1,33
27232,1,33
Can anyone check what the issue is?
Recommended answer
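You need to exclude the first column from the logic that checks the previous column for 0. In the question's script, column i is dropped whenever column i+1 sums to 0 (the sums[i+1]>0 test), so for i=1 the all-zero column 2 causes the ID column to be dropped. The code below prints $1 unconditionally and starts the checks at column 3: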
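awk 'BEGIN { FS = OFS = ","; out = ARGV[1] ".removed.index" }
# first pass over the file: sum every column and count the rows
FNR == NR {
    for (i = 1; i <= NF; i++)
        sums[i] += $i;
    ++r;
    next
}
# first line of the second pass: decide which columns to drop (column 1 is never checked)
FNR == 1 {
    for (i = 3; i <= NF; i++) {
        if (sums[i] == 0) {
            # column sums to 0 (all zeros, assuming no negative values): drop it and the previous column
            if ((i-1) in sums) {
                delete sums[i-1];
                print "removed index: " (i-1) > out
            }
            delete sums[i];
            print "removed index: " i > out
        } else if (sums[i] == 100*r) {
            # column sums to 100*rows (all 100, assuming no value exceeds 100): drop only this column
            delete sums[i];
            print "removed index: " i > out
        }
    }
}
# every line of the second pass: always print the ID, then the surviving columns
{
    printf "%s", $1
    for (i = 2; i <= NF; i++)
        if (i in sums)
            printf "%s%s", OFS, $i;
    printf "%s", ORS
}
END { close(out) }' file file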
Output:
23232,1,33
21232,1,33
23132,1,33
23212,1,33
24232,1,33
27232,1,33
The removed indexes are also written to file.removed.index:
cat file.removed.index
removed index: 2
removed index: 3
removed index: 4
removed index: 5
removed index: 7
removed index: 8
removed index: 9
removed index: 11
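The question ran its awk program over every CSV in the folder. A minimal sketch of doing the same with the answer's program, assuming POSIX sh and keeping the question's new_ prefix for outputs; prune_dir is a hypothetical helper name. It runs in a subshell so the cd does not leak, and it skips files that already start with new_ so a second run does not reprocess its own output.

```shell
#!/bin/sh
# Sketch (hypothetical helper): run the answer's awk program on every *.csv in
# directory $1, writing new_<name> and <name>.removed.index next to each input.
prune_dir() (
  cd "$1" || exit 1
  for f in *.csv; do
    [ -e "$f" ] || continue               # the glob matched nothing
    case $f in new_*) continue ;; esac    # skip output of an earlier run
    awk 'BEGIN { FS = OFS = ","; out = ARGV[1] ".removed.index" }
    # first pass: per-column sums and row count
    FNR == NR { for (i = 1; i <= NF; i++) sums[i] += $i; ++r; next }
    # second pass, first line: pick the columns to drop (never column 1)
    FNR == 1 {
      for (i = 3; i <= NF; i++) {
        if (sums[i] == 0) {
          if ((i - 1) in sums) { delete sums[i - 1]; print "removed index: " (i - 1) > out }
          delete sums[i]; print "removed index: " i > out
        } else if (sums[i] == 100 * r) {
          delete sums[i]; print "removed index: " i > out
        }
      }
    }
    # second pass, every line: ID first, then the surviving columns
    {
      printf "%s", $1
      for (i = 2; i <= NF; i++) if (i in sums) printf "%s%s", OFS, $i
      printf "%s", ORS
    }
    END { close(out) }' "$f" "$f" > "new_$f"
  done
)
```

Usage: prune_dir /path/to/folder. Each input x.csv produces new_x.csv and x.csv.removed.index; the index file does not match *.csv, so it is never picked up as input.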