Replace empty spaces in a column with a character
Question
My file looks like this:
Scenario 1 0.20 0.00 0.00 r
Scenario 2 0.08 0.34 & 0.34 r
Scenario 3 6 12.95
Scenario 4 0.00 0.08 0.00 0.00 & 0.35 r
Scenario 5 0.07 0.08 & 0.42 r
Scenario 6 6 8.70
Scenario 7 0.00 0.07 0.00 0.00 & 0.42 r
Scenario 8 0.31 0.28 & 0.70 f
Scenario 9 5 5.06
My objective: replace the columns that have empty cells/spaces/missing values with "-" (there are 8 fields in total).
The problem I'm facing when using awk for this is that the field separator keeps changing from line to line.
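To make the problem concrete, here is a small sketch (not part of the original attempt) that counts the fields awk sees with its default whitespace splitting, using three of the lines above. The helper name count_fields is made up for illustration.

```shell
# Count whitespace-separated fields per input line (default awk splitting).
# count_fields is a hypothetical helper name used only in this sketch.
count_fields() {
  awk '{ print NF }'
}

printf '%s\n' \
  'Scenario 1 0.20 0.00 0.00 r' \
  'Scenario 3 6 12.95' \
  'Scenario 4 0.00 0.08 0.00 0.00 & 0.35 r' | count_fields
```

The counts differ from line to line (6, 4 and 9), and nothing in them says which columns are the blank ones. Only the character positions carry that information.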
What I've done so far: I've extracted the lines that share a certain field pattern and placed them in different files. For example, I have placed Scenarios 3, 6 and 9 in one file and the rest in another to make the data easier to work on. What I have now is:
File 1:
Scenario 3 6 12.95
Scenario 6 6 8.70
Scenario 9 5 5.06
File 2:
Scenario 1 0.20 0.00 0.00 r
Scenario 2 0.08 0.34 & 0.34 r
Scenario 4 0.00 0.08 0.00 0.00 & 0.35 r
Scenario 5 0.07 0.08 & 0.42 r
Scenario 7 0.00 0.07 0.00 0.00 & 0.42 r
Scenario 8 0.31 0.28 & 0.70 f
Expected output:
Scenario 1 - - 0.20 - 0.00 0.00 r
Scenario 2 - - 0.08 - 0.34 & 0.34 r
Scenario 3 6 12.95 - - - -
Scenario 4 - 0.00 0.08 0.00 0.00 & 0.35 r
Scenario 5 - - 0.07 - 0.08 & 0.42 r
Scenario 6 6 8.70 - - - -
Scenario 7 - 0.00 0.07 0.00 0.00 & 0.42 r
Scenario 8 - - 0.31 - 0.28 & 0.70 f
Scenario 9 5 5.06 - - - -
Case 1 (using awk with FIELDWIDTHS):
$ awk 'BEGIN { FIELDWIDTHS="37 3 7 7 7 9 9 "} {for(i=1;i<=NF;++i){printf $i"|"};print""}' main1.txt
| I_BLENDER_0/R_137/CLK (SDFFX2_HVT) | | | 0.20 | | 0.00 | 0.00 r
| I_BLENDER_0/R_137/Q (SDFFX2_HVT) | | | 0.08 | | 0.34 & | 0.34 r
| I_BLENDER_0/n2757 (net) | 6 | 12.95|
| I_BLENDER_0/U4847/A1 (AND2X1_LVT) | | 0.00 | 0.08 | 0.00 | 0.00 & | 0.35 r
| I_BLENDER_0/U4847/Y (AND2X1_LVT) | | | 0.07 | | 0.08 & | 0.42 r
| I_BLENDER_0/n2616 (net) | 6 | 8.70 |
| I_BLENDER_0/U1/A4 (NAND4X0_HVT) | | 0.00 | 0.07 | 0.00 | 0.00 & | 0.42 r
| I_BLENDER_0/U1/Y (NAND4X0_HVT) | | | 0.31 | | 0.28 & | 0.70 f
Case 2 (using sed):
$ sed "s/^\(.\{,36\}\)$/\1`echo -$_{1..30}|tr -d '-'`/;
s/^\(.\{38\}\) /\1-/;
s/^\(.\{43\}\) /\1-/;
s/^\(.\{50\}\) /\1-/;
s/^\(.\{57\}\) /\1-/;
s/^\(.\{64\}\) /\1-/;
s/^\(.\{73\}\) /\1-/;
s/ *$//"
I_BLENDER_0/R_137/CLK (SDFFX2_HVT) - - 0.20 - 0.00 0.00 r
I_BLENDER_0/R_137/Q (SDFFX2_HVT) - - 0.08 - 0.34 & 0.34 r
I_BLENDER_0/n2757 (net) 6 12.95
I_BLENDER_0/U4847/A1 (AND2X1_LVT) - 0.00 0.08 0.00 0.00 & 0.35 r
I_BLENDER_0/U4847/Y (AND2X1_LVT) - - 0.07 - 0.08 & 0.42 r
I_BLENDER_0/n2616 (net) 6 8.70
Recommended answer
To do this, you can make use of FIELDWIDTHS in GNU awk:
Basically, we split your lines into constant-width fields. The following shows that the lines are split correctly:
$ awk 'BEGIN{ FIELDWIDTHS="13 25 2 7 7 7 9 9"}
{for(i=1;i<=NF;++i){printf "%s|", $i};print""}' file
Scenario 1 | | | | 0.20 | | 0.00 | 0.00 r|
Scenario 2 | | | | 0.08 | | 0.34 & | 0.34 r|
Scenario 3 | | 6 | 12.95| ||||
Scenario 4 | | | 0.00 | 0.08 | 0.00 | 0.00 & | 0.35 r|
Scenario 5 | | | | 0.07 | | 0.08 & | 0.42 r|
Scenario 6 | | 6 | 8.70 |||||
Scenario 7 | | | 0.00 | 0.07 | 0.00 | 0.00 & | 0.42 r|
Scenario 8 | | | | 0.31 | | 0.28 & | 0.70 f|
Scenario 9 | | 5 | 5.06 |||||
So all we need to do is replace the empty fields with a dash where needed.
$ awk 'BEGIN{ FIELDWIDTHS="13 24 3 7 7 7 9 9"}
{s=$1$2}
{s=s ($3~/^[[:blank:]]*$/?" - ":$3)}
{s=s ($4~/^[[:blank:]]*$/?" - ":$4)}
{s=s ($5~/^[[:blank:]]*$/?" - ":$5)}
{s=s ($6~/^[[:blank:]]*$/?" - ":$6)}
{s=s ($7~/^[[:blank:]]*$/?" - ":$7)}
{s=s ($8~/^[[:blank:]]*$/?" - ":$8)}
{print s}' file
This gives:
Scenario 1 - - 0.20 - 0.00 0.00 r
Scenario 2 - - 0.08 - 0.34 & 0.34 r
Scenario 3 6 12.95 - - - -
Scenario 4 - 0.00 0.08 0.00 0.00 & 0.35 r
Scenario 5 - - 0.07 - 0.08 & 0.42 r
Scenario 6 6 8.70 - - - -
Scenario 7 - 0.00 0.07 0.00 0.00 & 0.42 r
Scenario 8 - - 0.31 - 0.28 & 0.70 f
Scenario 9 5 5.06 - - - -
Remarks:
- It would be better to use the real format that was used to write these files.
- I always leave an extra space before the fields to account for possible minus signs.
- It looks like the floats were written with the format %-5.2f. This is why the number 12.95 is not aligned (%6.2f would have been better).
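The last remark can be checked with a short sketch. It prints a few values with both formats, using brackets to make the padding visible. The values 0.20 and 12.95 come from the sample; -0.07 is a made-up value added to show the minus sign, and show_formats is a hypothetical helper name.

```shell
# Compare left-justified %-5.2f with right-justified %6.2f.
# show_formats is a hypothetical helper; -0.07 is an invented test value.
show_formats() {
  awk 'BEGIN {
    n = split("0.20 12.95 -0.07", v, " ")
    for (i = 1; i <= n; i++)
      printf "[%-5.2f] [%6.2f]\n", v[i], v[i]
  }'
}

show_formats
# [0.20 ] [  0.20]
# [12.95] [ 12.95]
# [-0.07] [ -0.07]
```

With %-5.2f, a five-character value such as 12.95 fills the whole field and the padding moves to the right of shorter values, so the decimal points do not line up. With %6.2f the decimal points stay in the same column and there is still room for a sign.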
Note: if you play around a bit, you can actually make it shorter, but you somewhat lose the feeling for what is going on.
awk 'BEGIN{ FIELDWIDTHS="13 23 5 7 7 7 9 9"}
{for(i=3;i<=NF;++i)$i=$i~/^[[:blank:]]*$/?" -":$i}
{printf "%-13s%-23s%-5s%-7s%-7s%-7s%-9s%-9s\n",$1,$2,$3,$4,$5,$6,$7,$8}' file
Or even shorter:
awk 'BEGIN{ FIELDWIDTHS="36 5 7 7 7 9 9"; split(FIELDWIDTHS,a)}
{for(i=1;i<=NF;++i) printf "%-*s",a[i], ($i~/^ *$/?" -":$i); print ""}' file
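FIELDWIDTHS is a GNU awk extension. If gawk is not available, the same idea can be sketched with substr(), which any POSIX awk provides. The following is an illustrative variant, not the answer's original code: the helper names make_sample and fill_blanks, the column widths "11 3 6 6 2" and the three-line sample are assumptions made up for the example, since the real widths depend on your file.

```shell
# make_sample: print a small, invented fixed-width sample
# (columns of width 11, 3, 6, 6 and a final flag column).
make_sample() {
  printf '%-11s%-3s%-6s%-6s%s\n' 'Scenario 1' ''  '0.20' '0.00' 'r'
  printf '%-11s%-3s%s\n'         'Scenario 3' '6' '12.95'
  printf '%-11s%-3s%-6s%-6s%s\n' 'Scenario 4' ''  ''     '0.35' 'f'
}

# fill_blanks WIDTHS: read fixed-width lines on stdin and print them with
# every blank (or missing) column after the first replaced by "-".
fill_blanks() {
  awk -v widths="$1" '
    BEGIN { n = split(widths, w, " ") }
    {
      out = ""; pos = 1
      for (i = 1; i <= n; i++) {
        cell = substr($0, pos, w[i])   # short or empty on short lines
        pos += w[i]
        if (i > 1 && cell ~ /^ *$/) cell = "-"
        out = out sprintf("%-" w[i] "s", cell)   # pad back to column width
      }
      sub(/ +$/, "", out)              # drop trailing padding
      print out
    }'
}

make_sample | fill_blanks "11 3 6 6 2"
```

Because substr() returns an empty string past the end of a line, short lines such as the "Scenario 3" row get their missing trailing columns filled with "-" as well.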