awk set elements in array


Problem description

I have a large .csv file to process, and its elements are arranged randomly like this:

xxxxxx,xx,MLOCAL,MREMOTE,33222,56,22/10/2012,18/10/2012
xxxxxx,xx,MREMOTE,MLOCAL,33222,56,22/10/2012,18/10/2012
xxxxxx,xx,MLOCAL,341993,22/10/2012
xxxxxx,xx,MREMOTE,9356828,08/10/2012
xxxxxx,xx,LOCAL,REMOTE,19316,15253,22/10/2012,22/10/2012
xxxxxx,xx,REMOTE,LOCAL,1865871,383666,22/10/2012,22/10/2012
xxxxxx,xx,REMOTE,1180306134,19/10/2012

where the fields LOCAL, REMOTE, MLOCAL or MREMOTE appear as follows:

  1. when they appear as a pair (for example MLOCAL/MREMOTE or LOCAL/REMOTE): if the 3rd field is MLOCAL and the 4th field is MREMOTE, then the 5th and 7th fields hold the value and date of MLOCAL, and the 6th and 8th fields hold the value and date of MREMOTE
  2. when only one of them appears (a single type in the 3rd field), the 4th and 5th fields hold the value and date of field 3.

Now, I have split these rows using:

nawk 'BEGIN{

while (getline < "'"$filedata"'")
split($0,ft,",");
name=ft[1];
ID=ft[2]
 ?=ft[3]
 ?=ft[4]
....................

but because I can't find a pattern for the 3rd and 4th fields, I'm stuck: I can't assign variable names to the array elements in order to use them for further processing.

I tried to use a "case" statement, but it doesn't work in awk or nawk (it works as expected only in gawk). I also tried this:

if ( ft[3] == "MLOCAL" && ft[4]!= "MREMOTE" )
{
        MLOCAL=ft[3];
        MLOCAL_qty=ft[4];
        MLOCAL_TIMESTAMP=ft[5];
}
else if ( ft[3] == MLOCAL && ft[4] == MREMOTE )
{
        MLOCAL=ft[3];
        MREMOTE=ft[4];
        MOCAL_qty=ft[5];
        MREMOTE_qty=ft[6];
        MOCAL_TIMESTAMP=ft[7];
        MREMOTE_TIMESTAMP=ft[8];
}
else if ( ft[3] == MREMOTE && ft[4] != MOCAL )
{
        MREMOTE=ft[3];
        MREMOTE_qty=ft[4];
        MREMOTE_TIMESTAMP=ft[5];
..........................................

but that doesn't work either.
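A side note on the comparison chain above: in awk, a bare word such as `MLOCAL` is a variable name, not a string, so an unset `MLOCAL` compares as the empty string. Only the first branch quotes its constants. The following is a minimal sketch of the difference, not a full fix; `compare_demo` is a hypothetical helper name introduced only for this illustration, and the sample line is taken from the input above.

```sh
# compare_demo is a hypothetical helper name used only for this illustration.
compare_demo() {
    printf '%s\n' 'xxxxxx,xx,MLOCAL,341993,22/10/2012' |
    awk -F, '{
        unquoted = ($3 == MLOCAL)     # MLOCAL is an unset awk variable here, i.e. ""
        quoted   = ($3 == "MLOCAL")   # compares against the literal text MLOCAL
        print unquoted, quoted
    }'
}

compare_demo   # prints: 0 1
```

The quoted form is the one that matches the field, which suggests quoting every type name in the `else if` branches as well.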

So, if you have any idea how to handle this, I would be grateful for a hint on how to find a pattern that covers all of the situations above.

EDIT

I don't know how to thank you for all this help. What I have to do is actually more complex than what I wrote above. I'll try to describe it as simply as I can so that I don't confuse you. My output should look like the following:

NAME,UNIQUE_ID,VOLUME_ALOCATED,MLOCAL_VALUE,MLOCAL_TIMESTMP,MLOCAL_limit,LOCAL_VALUE,LOCAL_TIMESTAMP,LOCAL_limit,MREMOTE_VALUE,MREMOTE_TIMESTAMP,REMOTE_VALUE,REMOTE_TIMESTAMP

(where MLOCAL_limit and LOCAL_limit are the difference between VOLUME_ALOCATED and MLOCAL_VALUE or LOCAL_VALUE, respectively)

So, in my output file, the fields should be arranged like this:

4th field = MLOCAL_VALUE
5th field = MLOCAL_TIMESTMP
7th field = LOCAL_VALUE
8th field = LOCAL_TIMESTAMP
10th field = MREMOTE_VALUE
11th field = MREMOTE_TIMESTAMP
12th field = REMOTE_VALUE
13th field = REMOTE_TIMESTAMP

Now, an example would be this. For the following input:

name,ID,VOLUME_ALLOCATED,MLOCAL,MREMOTE,33222,56,22/10/2012,18/10/2012

name,ID,VOLUME_ALLOCATED,REMOTE,234455,19/12/2012

I should process this line and the output should be this:

name,ID,VOLUME_ALLOCATED,33222,22/10/2012,MLOCAL_LIMIT,,,,56,18/10/2012,,

The 7th, 8th, 9th, 12th and 13th fields are empty because there is no information for LOCAL_VALUE, LOCAL_TIMESTAMP, LOCAL_limit, REMOTE_VALUE and REMOTE_TIMESTAMP.

OR

name,ID,VOLUME_ALLOCATED,,,,,,,,,234455,19/12/2012

The 4th, 5th, 6th, 7th, 8th, 9th, 10th and 11th fields should be empty because there is no information for MLOCAL_VALUE, MLOCAL_TIMESTAMP, MLOCAL_LIMIT, LOCAL_VALUE, LOCAL_TIMESTAMP, LOCAL_LIMIT, MREMOTE_VALUE and MREMOTE_TIMESTAMP.

VOLUME_ALLOCATED is retrieved from another csv file (called "info.csv") based on the ID field, which is processed earlier in the script, like this:

info.csv

VOLUME_ALLOCATED,ID,CLIENT
5242881,64,subscriber
567743,24,visitor

data.csv

NAME,64,MLOCAL,341993,23/10/2012
NAME,24,LOCAL$REMOTE,2347$4324,19/12/2012$18/12/2012
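The lookup by ID described above can be sketched with awk's common two-file idiom, in which `NR == FNR` is true only while the first file is being read. This is a minimal sketch under assumptions: `lookup_demo` is a hypothetical helper name, the two files have exactly the layout shown above, and info.csv is not empty. Note that the ID of the current data line has to be read before it is used as the lookup key.

```sh
# lookup_demo is a hypothetical helper; $1 = info file, $2 = data file.
lookup_demo() {
    awk '
    BEGIN { FS = OFS = "," }
    NR == FNR { volume[$2] = $1; client[$2] = $3; next }  # first file: build the lookup tables
    { print $2, volume[$2], client[$2] }                  # second file: the ID is already in $2
    ' "$1" "$2"
}

# Sample data copied from the question, written to a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 'VOLUME_ALLOCATED,ID,CLIENT' '5242881,64,subscriber' '567743,24,visitor' > "$dir/info.csv"
printf '%s\n' 'NAME,64,MLOCAL,341993,23/10/2012' 'NAME,24,LOCAL$REMOTE,2347$4324,19/12/2012$18/12/2012' > "$dir/data.csv"
lookup_demo "$dir/info.csv" "$dir/data.csv"
```

The header row of info.csv is stored under the key "ID", which is harmless as long as no data line uses that text as its ID.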

Now, my code is this:

    #! /usr/bin/bash

input="info.csv"
filedata="data.csv"
outfile="out"

nawk 'BEGIN{
while (getline < "'"$input"'")
{
split($0,ft,",");
volume=ft[1];
id=ft[2];
client=ft[3];

key=id;
volumeArr[key]=volume;
clientArr[key]=client;
}
close("'"$input"'");

while (getline < "'"$filedata"'")
{
gsub(/\$/,","); # substitute the $ separator with comma
split($0,ft,",");
volume=volumeArr[id]; # Get the volume from the volumeArr, using "id" as key
segment=clientArr[id]; # Get the client mode from the clientArr, using "id" as key
NAME=ft[1];
id=ft[2];

here I'm stuck, I can't find the right way to set the rest of the fields since I don't know how to handle the 3rd and 4th fields.

? =ft[3];
? =ft[4];

Sorry if I've confused you, but this is my current situation. Thanks

Solution

You didn't provide the expected output for your sample input, but here's a start that shows how to get the values for the 2 different formats of input line:

$ cat tst.awk
BEGIN{ FS=","; OFS="\t" }
{
   delete value       # or use split("",value) if your awk can't delete arrays
   if ($4 ~ /LOCAL|REMOTE/) {
      value[$3] = $5
      date[$3]  = $7
      value[$4] = $6
      date[$4]  = $8
   }
   else {
      value[$3] = $4
      date[$3]  = $5
   }

   print
   for (type in value) {
      printf "%15s%15s%15s\n", type, value[type], date[type]
   }
}
$ awk -f tst.awk file
xxxxxx,xx,MLOCAL,MREMOTE,33222,56,22/10/2012,18/10/2012
        MREMOTE             56     18/10/2012
         MLOCAL          33222     22/10/2012
xxxxxx,xx,MREMOTE,MLOCAL,33222,56,22/10/2012,18/10/2012
        MREMOTE          33222     22/10/2012
         MLOCAL             56     18/10/2012
xxxxxx,xx,MLOCAL,*341993,22/10/2012*
         MLOCAL        *341993    22/10/2012*
xxxxxx,xx,MREMOTE,9356828,08/10/2012
        MREMOTE        9356828     08/10/2012
xxxxxx,xx,LOCAL,REMOTE,19316,15253,22/10/2012,22/10/2012
         REMOTE          15253     22/10/2012
          LOCAL          19316     22/10/2012
xxxxxx,xx,REMOTE,LOCAL,1865871,383666,22/10/2012,22/10/2012
         REMOTE        1865871     22/10/2012
          LOCAL         383666     22/10/2012
xxxxxx,xx,REMOTE,1180306134,19/10/2012
         REMOTE     1180306134     19/10/2012

and if you post the expected output we could help you more.
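Building on the `value`/`date` arrays above, the 13-field layout described in the question's EDIT could be produced along the following lines. This is only a sketch under stated assumptions, not a confirmed solution: `build_report` is a hypothetical helper name, the limits are assumed to be the allocated volume minus the value, and the files are assumed to have the info.csv and data.csv layouts shown in the question.

```sh
# build_report is a hypothetical helper; $1 = info file, $2 = data file.
build_report() {
    awk '
    BEGIN { FS = OFS = "," }
    NR == FNR { volume[$2] = $1; next }      # info.csv: VOLUME_ALLOCATED,ID,CLIENT
    {
        gsub(/\$/, ",")                      # turn the $ separators into commas; awk re-splits $0
        split("", value); split("", date)    # portable way to empty both arrays
        if ($4 ~ /LOCAL|REMOTE/) {           # pair layout: two types, two values, two dates
            value[$3] = $5; date[$3] = $7
            value[$4] = $6; date[$4] = $8
        } else {                             # single layout: one type, one value, one date
            value[$3] = $4; date[$3] = $5
        }
        vol = volume[$2]
        # Assumption: limit = allocated volume minus the value; empty when the type is absent.
        mlimit = ("MLOCAL" in value) ? vol - value["MLOCAL"] : ""
        llimit = ("LOCAL"  in value) ? vol - value["LOCAL"]  : ""
        print $1, $2, vol,
              value["MLOCAL"], date["MLOCAL"], mlimit,
              value["LOCAL"], date["LOCAL"], llimit,
              value["MREMOTE"], date["MREMOTE"],
              value["REMOTE"], date["REMOTE"]
    }' "$1" "$2"
}
```

Called as `build_report info.csv data.csv`, it prints one 13-field line per data line and leaves the fields of any absent type empty. The `in` tests run before the `print` because merely referencing `value["LOCAL"]` would create that element.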
