如何通过多种条件将一列拆分为另一列? [英] how to split one column into another by multiple conditions?

查看:91
本文介绍了如何通过多种条件将一列拆分为另一列?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我必须创建一个awk脚本来实现以下转换:

I have to create an awk script to achieve the following transformation:

  1. 列顺序是随机的
  2. 没有固定的结构.其实这是一个大问题
  3. FLAG列必须在FLAG1FLAG2
  4. 中拆分 在以下条件下填充
  5. FLAG1FLAG2:

  1. Column order is random
  2. There is no fixed structure. actually this is a big issue
  3. The FLAG column must be splitt in FLAG1 and FLAG2
  4. FLAG1 and FLAG2 are filled given the following conditions:

if the VAL is ":" then NUM is null
if the VAL is ":" and FLAG "c" then NUM is null and FLAG1 is "c"
if the VAL is ":" and FLAG "u" then NUM is null and FLAG2 is "u"
if the VAL is "14,385" and FLAG "d" then NUM is "14385" and FLAG(both) is null
if the VAL is "14,385" and FLAG "du" then NUM is "14385" and FLAG2 is "u"
if the VAL is ":" and FLAG "cd" then NUM is null and FLAG1 is "c"
if the VAL is ":" and FLAG "bc" then NUM is null and FLAG1 is "c" and FLAG2 is "b"
if the VAL is ":" and FLAG "z" then NUM is 0 and FLAG2 is "z"

csv输入文件是:

"PRIM",  "TRD",   "GTR",   "VAL",   "FLAG"
"TPP",   "T5-78", "HT",    ":",   c
"TCP",   "T5-78", "HT",    "12,385",  c
"TZP",   "T5-78", "HT",    ":",   z
"TNP",   "T5-78", "HT",    ":",   z
"TNP",   "T5-78", "HT",    ":",   cd
"TNP",   "T5-78", "HT",    ":",   du
"TNP",   "T5-78", "HT",    "12,524,652",  dfg

输出的.dat文件应如下所示:

The output .dat file should look like this:

PRIM    TRD GTR NUM FLAG1   FLAG2
TPP T5-78   HT  null    c   null
TCP T5-78   HT  12385   c   null
TZP T5-78   HT  0   null    z
TNP T5-78   HT  0   null    z
TNP T5-78   HT  null    c   null
TNP T5-78   HT  null    null    u
TNP T5-78   HT  12524652    null    dfg

我尝试的代码无法正常工作,因为仅满足前三个要求,而第四个则无法工作.

The code I have tried doesn't work properly as only the first 3 requirements are met while the 4th just doesn't work.

BEGIN {
      FS=","; OFS="\t";
      a["PRIM"]=1;a["TRD"]=1;a["GTR"]=1;a["VAL"]=1;a["FLAG"]=1;
    }
    NR==1 {

   { $a["VAL"] = "NUMB" ; $a["FLAG"] = "FLAG1" ; $5 = "FLAG2" ; print ; next }
    $a["VAL"]=="12,385" && $a["FLAG"] == "d"  { $a["VAL"] = "14385" ; $a["FLAG"] = $5 = "" }
    $a["VAL"]=="12,385" && $a["FLAG"] == "du" { $a["VAL"] = "14385" ; $a["FLAG"] = "" ; $9 = "u" }
    $a["VAL"] != ":" { print ; next }
    $a["FLAG"] == "z" { $a["VAL"] = "0" ; $a["FLAG"] = "" ; $5 = "z" }
     $a["FLAG"] != "z" { $a["VAL"] = "" }

        $NF=substr($NF,1,length($NF)-1);
        for(i=1;i<=NF;i++) if($i in a) a[$i]=i;
    }
    {   print $a["PRIM"],$a["TRD"],$a["GTR"],NR==1?"NUM":$a["VAL"],
        NR==1?"FLAG1"OFS"FLAG2":($a["FLAG"]?""OFS$a["FLAG"]:$a["FLAG"]);


这是我认为可以使用的最新代码.现在我无法解决的问题是最后一个值(FLAG2)打印在第二行上.我试图放OFS,但不能解决问题.您能告诉我在这种情况下怎么了.


Here is the latest code which I think it will work. the problem now which I cannot solve is that the last value (FLAG2) is printed on the second line. I tried to put OFS but it doesn't fix the problem. Could you please tell me what is wrong in this case.

BEGIN {
FS=","; 
OFS="\t";
a["PRIM"]=1;
a["TRD"]=1;
a["GTR"]=1;
a["VAL"]=1;
a["FLAG"]=1;
a["FLAG1"]=1;
a["FLAG2"]=1;
}

NR==1 {
    $NF=substr($NF,1,length($NF)-1);
    for(i=1;i<=NF;i++) 
#if($i in a) 
a[$i]=i;

a["FLAG1"] = i;
a["FLAG2"]=i;
a["FLAG1"] = a["FLAG"];  # just for testing and it is ok
a["FLAG2"] = a["FLAG"];  # just for testing and it is ok

}

{   

print $a["PRIM"],$a["TRD"],$a["GTR"],NR==1?"NUM":$a["VAL"],
    NR==1?"FLAG1":$a["FLAG1"],NR==1?"FLAG2":$a["FLAG2"];

}

输出类似于

PRIM    TRD GTR NUM FLAG1   FLAG2
TPP T5-78   HT  null    c
   null
TCP T5-78   HT  12385   c
   null
TZP T5-78   HT  0   null
    z


经过这么多建议,这是我的最后一个版本,但是仍然不成功...现在,当我添加if语句来满足上述要求时,什么也没有发生.我相信if语句要么不正确,要么放在正确的位置. 如果NR> 1是灾难性的,则打印这些值. 您能告诉我我的脚本出了什么问题吗?我不得不承认,我是3天前开始使用这个awk的,到目前为止很痛苦...问题是我应该从上周开始就完成此脚本


After so many advices this is my last version but it is still unsuccesful... Now when I'm adding if statements for fulfilling the requirements above, nothing happens. I believe the if statements either are not correct nor putted in the right place. printing the values if NR>1 is a catastropha.. could you pls tell me what is wrong with my script? I have to admit that I started with this awk 3 days ago and so far it is painful...the problem is that I have should have finished this script since last week

BEGIN {
FS=",";
OFS="\t";

a["PRIM"]=1;
a["TRD"]=1;
a["GTR"]=1;
a["VAL"]=1;
a["FLAG"]=1;
a["FLAG1"]=1;
a["FLAG2"]=1;
}

NR==1 {

$NF=substr($NF,1,length($NF)-1);
    for(i=1;i<=NF;i++)
#if($i in a)
a[$i]=i;

#a["FLAG1"] = a[i];
#a["FLAG2"]=a[i];

a["FLAG1"] = a["FLAG"];
a["FLAG2"] = a["FLAG"];
}

{
#initialisation of the new flags
a["FLAG1"]=="";
a["FLAG2"]=="";
}

#MY IF STATEMENTS GO HERE   - TEST MODE   

a["FLAG"] == "cd"   {a["FLAG1"]= "c"}
a["FLAG"] == "du"   {a["FLAG2"]= "u"}

{  
#print header
print $a["PRIM"],$a["TRD"],$a["GTR"],NR==1?"NUM":$a["VAL"], NR==1?"FLAG1":$a["FLAG1"],NR==1?"FLAG2":$a["FLAG2"];
}

#print content
NR>1
{
    for(j=1;j<=NF;j++)
#if($i in a)
a[$j]=j;

#a["FLAG1"] = a[i];
#a["FLAG2"]=a[i];

a["FLAG1"] = a["FLAG"];
a["FLAG2"] = a["FLAG"];
}
#MY IF STATEMENTS GO HERE   - TEST MODE   

a["FLAG"] == "cd"   {a["FLAG1"]= "c"}
a["FLAG"] == "du"   {a["FLAG2"]= "u"}

{
print $a["PRIM"],$a["TRD"],$a["GTR"],$a["VAL"], $a["FLAG1"], $a["FLAG2"]
}

推荐答案

这要求所有输入字段具有双引号.

$ echo '"PRIM",  "TRD",   "GTR",   "VAL",   "FLAG"
"TPP",   "T5-78", "HT",    ":",   "c"
"TCP",   "T5-78", "HT",    "12,385",  "c"
"TZP",   "T5-78", "HT",    ":",   "z"
"TNP",   "T5-78", "HT",    ":",   "z"
"TNP",   "T5-78", "HT",    ":",   "cd"
"TNP",   "T5-78", "HT",    ":",   "du"
"TNP",   "T5-78", "HT",    "12,524,652",  "dfg"' | 
awk -F '",[ \t]*"' '
    { sub(/^"/, "", $1); sub(/"$/, "", $NF)}
    NR == 1 {
        for (i=1; i<=NF; i++) col[$i] = i
        print "PRIM TRD GTR NUM FLAG1 FLAG2"
        next
    } 
    {
        f = $col["FLAG"] 
        v = $col["VAL"]; gsub(/,/, "", v) 
        num = "null"; flag1 = "null"; flag2 = "null"
    }
    v == ":"      &&  f == "c"   {flag1 = "c"}
    v == ":"      &&  f == "u"   {flag2 = "u"} 
    v == "14385"  &&  f == "d"   {num = $4}
    v == "14385"  &&  f == "du"  {num = $4; flag2 = "u"}
    v == ":"      &&  f == "cd"  {flag1 = "c"}
    v == ":"      &&  f == "bc"  {flag1 = "c"; flag2 = "b"}
    v == ":"      &&  f == "z"   {num = 0; flag2 = "z"}
    {print $col["PRIM"],$col["TRD"],$col["GTR"],num,flag1,flag2}
'

PRIM TRD GTR NUM FLAG1 FLAG2
TPP T5-78 HT null c null
TCP T5-78 HT null null null
TZP T5-78 HT null null z
TNP T5-78 HT null null z
TNP T5-78 HT null c null
TNP T5-78 HT null null null
TNP T5-78 HT null null null

我的输出与您的输出看起来不一样.查看您的规格,并确保样本输入足以涵盖这些规格.

My output does not look like yours. Review your specifications and make sure the sample input is sufficient to cover them.

这篇关于如何通过多种条件将一列拆分为另一列?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆