awk set elements in array
Question
I have a large .csv file to process, and my elements are arranged randomly like this:
```
xxxxxx,xx,MLOCAL,MREMOTE,33222,56,22/10/2012,18/10/2012
xxxxxx,xx,MREMOTE,MLOCAL,33222,56,22/10/2012,18/10/2012
xxxxxx,xx,MLOCAL,341993,22/10/2012
xxxxxx,xx,MREMOTE,9356828,08/10/2012
xxxxxx,xx,LOCAL,REMOTE,19316,15253,22/10/2012,22/10/2012
xxxxxx,xx,REMOTE,LOCAL,1865871,383666,22/10/2012,22/10/2012
xxxxxx,xx,REMOTE,1180306134,19/10/2012
```
where the fields LOCAL, REMOTE, MLOCAL or MREMOTE are laid out like this:
- when they appear as a pair (LOCAL/REMOTE), e.g. the 3rd field is MLOCAL and the 4th field is MREMOTE, then the 5th and 7th fields hold the value and date of MLOCAL, and the 6th and 8th fields hold the value and date of MREMOTE;
- when they appear alone (only LOCAL or only REMOTE), then the 4th and 5th fields hold the value and date of the name in field 3.

Now, I have split these rows using:
```
nawk 'BEGIN{ while (getline < "'"$filedata"'")
    split($0,ft,",");
    name=ft[1];
    ID=ft[2]
    ?=ft[3]
    ?=ft[4]
    ....................
```
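The two layouts can be told apart by looking at the 4th field: in the pair layout it is itself a location name, in the single layout it is a value. A minimal sketch, assuming the only names are LOCAL, REMOTE, MLOCAL and MREMOTE, that lets awk read the file itself with `-F','` (instead of a `getline` loop in `BEGIN`) and prints one `NAME VALUE DATE` line per location. The function name `split_locations` is hypothetical:

```shell
# Hypothetical helper: print "NAME VALUE DATE" for every location on each line.
# Assumes the location names are exactly LOCAL, REMOTE, MLOCAL or MREMOTE.
split_locations() {
    awk -F',' '
    {
        if ($4 ~ /^M?(LOCAL|REMOTE)$/) {
            # pair layout: names in $3/$4, values in $5/$6, dates in $7/$8
            print $3, $5, $7
            print $4, $6, $8
        } else {
            # single layout: name in $3, value in $4, date in $5
            print $3, $4, $5
        }
    }' "$@"
}

# usage: split_locations data.csv   (or pipe lines into it)
```

Once each location is on its own line, the rest of the processing no longer has to care which of the two layouts the record came from.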
but because I can't find a pattern for the 3rd and 4th fields, I'm stuck: I don't know how to assign variable names to each of the array elements so I can use them for further processing.
I tried to use a "case" statement, but it doesn't work in awk or nawk (only gawk behaves as expected). I also tried this:
```
if ( ft[3] == "MLOCAL" && ft[4] != "MREMOTE" ) {
    MLOCAL=ft[3];
    MLOCAL_qty=ft[4];
    MLOCAL_TIMESTAMP=ft[5];
}
else if ( ft[3] == MLOCAL && ft[4] == MREMOTE ) {
    MLOCAL=ft[3];
    MREMOTE=ft[4];
    MOCAL_qty=ft[5];
    MREMOTE_qty=ft[6];
    MOCAL_TIMESTAMP=ft[7];
    MREMOTE_TIMESTAMP=ft[8];
}
else if ( ft[3] == MREMOTE && ft[4] != MOCAL ) {
    MREMOTE=ft[3];
    MREMOTE_qty=ft[4];
    MREMOTE_TIMESTAMP=ft[5];
..........................................
```
but that doesn't work either.
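One likely reason the chain above misbehaves: without quotes, `MLOCAL`, `MREMOTE` and `MOCAL` in the second and third conditions are awk variables, not strings, so they are compared against whatever those variables happen to hold (initially the empty string); note also the `MOCAL` typo. `switch`/`case` is a gawk extension, but an `if`/`else if` chain on quoted string literals works in any POSIX awk, including nawk. A sketch under those assumptions; the helper `assign()` and the lowercase `*_qty`/`*_ts` variables are hypothetical names, and the variables are cleared on every line so values don't leak into the next record:

```shell
# Hypothetical helper: set one *_qty/*_ts pair per location name, then print them.
named_fields() {
    awk -F',' '
    # compare against quoted literals, never bare (variable) names
    function assign(name, qty, ts) {
        if (name == "MLOCAL")       { mlocal_qty  = qty; mlocal_ts  = ts }
        else if (name == "MREMOTE") { mremote_qty = qty; mremote_ts = ts }
        else if (name == "LOCAL")   { local_qty   = qty; local_ts   = ts }
        else if (name == "REMOTE")  { remote_qty  = qty; remote_ts  = ts }
    }
    {
        # reset values left over from the previous line
        mlocal_qty = mlocal_ts = mremote_qty = mremote_ts = ""
        local_qty = local_ts = remote_qty = remote_ts = ""
        if ($4 == "MLOCAL" || $4 == "MREMOTE" || $4 == "LOCAL" || $4 == "REMOTE") {
            assign($3, $5, $7)    # pair layout
            assign($4, $6, $8)
        } else {
            assign($3, $4, $5)    # single layout
        }
        print "MLOCAL=" mlocal_qty "@" mlocal_ts, "MREMOTE=" mremote_qty "@" mremote_ts,
              "LOCAL=" local_qty "@" local_ts, "REMOTE=" remote_qty "@" remote_ts
    }' "$@"
}
```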
So, if you have any idea how to handle this, I would be grateful for a hint on finding a pattern that covers all the situations above.
EDIT
I don't know how to thank you for all this help. What I have to do is more complex than what I wrote above; I'll try to describe it as simply as I can, otherwise I'll get you pretty confused. My output should look like this:
```
NAME,UNIQUE_ID,VOLUME_ALOCATED,MLOCAL_VALUE,MLOCAL_TIMESTMP,MLOCAL_limit,LOCAL_VALUE,LOCAL_TIMESTAMP,LOCAL_limit,MREMOTE_VALUE,MREMOTE_TIMESTAMP,REMOTE_VALUE,REMOTE_TIMESTAMP
```
(where MLOCAL_limit and LOCAL_limit are VOLUME_ALOCATED minus MLOCAL_VALUE or LOCAL_VALUE, respectively). So, in my output file, the field positions should be as follows (the 6th and 9th fields hold the two limits):
- 4th field = MLOCAL_VALUE
- 5th field = MLOCAL_TIMESTMP
- 7th field = LOCAL_VALUE
- 8th field = LOCAL_TIMESTAMP
- 10th field = MREMOTE_VALUE
- 11th field = MREMOTE_TIMESTAMP
- 12th field = REMOTE_VALUE
- 13th field = REMOTE_TIMESTAMP
For example, for the following input:
```
name,ID,VOLUME_ALLOCATED,MLOCAL,MREMOTE,33222,56,22/10/2012,18/10/2012
name,ID,VOLUME_ALLOCATED,REMOTE,234455,19/12/2012
```
I should process these lines, and the output should be:
```
name,ID,VOLUME_ALLOCATED,33222,22/10/2012,MLOCAL_LIMIT,,,,56,18/10/2012,,
```
The 7th, 8th, 9th, 12th and 13th fields are empty because there is no info for LOCAL_VALUE, LOCAL_TIMESTAMP, LOCAL_limit, REMOTE_VALUE and REMOTE_TIMESTAMP.
OR
```
name,ID,VOLUME_ALLOCATED,,,,,,,,,234455,19/12/2012
```
The 4th, 5th, 6th, 7th, 8th, 9th, 10th and 11th fields should be empty because there is no info about MLOCAL_VALUE, MLOCAL_TIMESTAMP, MLOCAL_LIMIT, LOCAL_VALUE, LOCAL_TIMESTAMP, LOCAL_LIMIT, MREMOTE_VALUE and MREMOTE_TIMESTAMP.
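Once VOLUME_ALLOCATED sits in field 3 (as in the example input above), the location names start at field 4, and the output is just the 13 slots filled from two arrays keyed by name. A minimal sketch, assuming a limit is VOLUME_ALLOCATED minus the matching value and is left empty when that value is absent; the function name `to_report` is hypothetical, and `split("", arr)` is used to empty the arrays because whole-array `delete` is not available in every awk:

```shell
# Hypothetical helper: turn NAME,ID,VOLUME,<names>,<values>,<dates>
# into the 13-column layout described above.
# Assumption: *_limit = VOLUME_ALLOCATED - value, empty when the value is missing.
to_report() {
    awk -F',' -v OFS=',' '
    {
        split("", val); split("", ts)        # empty the arrays (portable)
        if ($5 ~ /^M?(LOCAL|REMOTE)$/) {     # pair: names $4/$5, values $6/$7, dates $8/$9
            val[$4] = $6; ts[$4] = $8
            val[$5] = $7; ts[$5] = $9
        } else {                             # single: name $4, value $5, date $6
            val[$4] = $5; ts[$4] = $6
        }
        mlimit = ("MLOCAL" in val) ? $3 - val["MLOCAL"] : ""
        llimit = ("LOCAL" in val) ? $3 - val["LOCAL"] : ""
        print $1, $2, $3,
              val["MLOCAL"], ts["MLOCAL"], mlimit,
              val["LOCAL"], ts["LOCAL"], llimit,
              val["MREMOTE"], ts["MREMOTE"],
              val["REMOTE"], ts["REMOTE"]
    }' "$@"
}
```

For example, `NAME,64,5242881,MLOCAL,MREMOTE,33222,56,22/10/2012,18/10/2012` (with the numeric volume 5242881 in place of the literal VOLUME_ALLOCATED) comes out as `NAME,64,5242881,33222,22/10/2012,5209659,,,,56,18/10/2012,,`.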
VOLUME_ALLOCATED is retrieved from another csv file (called "info.csv") based on the ID field, which is processed earlier in the script. For example, info.csv:

```
VOLUME_ALLOCATED,ID,CLIENT
5242881,64,subscriber
567743,24,visitor
```
data.csv (paired names, values and dates are joined with `$`):

```
NAME,64,MLOCAL,341993,23/10/2012
NAME,24,LOCAL$REMOTE,2347$4324,19/12/2012$18/12/2012
```
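The lookup in info.csv and the `$` separators in data.csv can both be handled in a single awk run with the common two-file idiom `NR == FNR` (true only while the first file is being read), instead of `getline` loops in `BEGIN`. A minimal sketch, assuming IDs in info.csv are unique; a header line, if present, only adds a harmless entry under the key `ID`, and the idiom misbehaves if info.csv is empty. The function name `join_volume` is hypothetical. It rewrites each data line as NAME,ID,VOLUME_ALLOCATED,... so that the location names start in field 4:

```shell
# Hypothetical helper: usage  join_volume info.csv data.csv
# Prints each data.csv line as NAME,ID,VOLUME_ALLOCATED,<rest>, with "$" turned into ",".
join_volume() {
    awk -F',' -v OFS=',' '
    NR == FNR { volume[$2] = $1; next }   # 1st file (info.csv): VOLUME_ALLOCATED by ID
    {
        gsub(/\$/, ",")                   # LOCAL$REMOTE -> LOCAL,REMOTE; $0 is re-split
        $2 = $2 OFS volume[$2]            # insert the volume right after the ID
        print
    }' "$1" "$2"
}
```

With the sample files above, the second data line becomes `NAME,24,567743,LOCAL,REMOTE,2347,4324,19/12/2012,18/12/2012`.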
Now, my code is this:
```
#! /usr/bin/bash
input="info.csv"
filedata="data.csv"
outfile="out"

nawk 'BEGIN{
    while (getline < "'"$input"'") {
        split($0,ft,",");
        volume=ft[1];
        id=ft[2];
        client=ft[3];
        key=id;
        volumeArr[key]=volume;
        clientArr[key]=client;
    }
    close("'"$input"'");
    while (getline < "'"$filedata"'") {
        gsub(/\$/,",");            # substitute the $ separator with comma
        split($0,ft,",");
        NAME=ft[1];
        id=ft[2];                  # read the ID of this line first, then use it as a key
        volume=volumeArr[id];      # get the volume from volumeArr, using "id" as key
        segment=clientArr[id];     # get the client mode from clientArr, using "id" as key
```
Here I'm stuck: I can't find the right way to set the rest of the fields, since I don't know how to handle the 3rd and 4th fields.
```
        ? =ft[3];
        ? =ft[4];
```
Sorry if this is confusing, but this is my current situation. Thanks!
Solution

You didn't provide the expected output for your sample input, but here's a start that shows how to get the values from the 2 different formats of input line:
```
$ cat tst.awk
BEGIN{ FS=","; OFS="\t" }
{
    delete value   # or use split("",value) if your awk can't delete arrays
    if ($4 ~ /LOCAL|REMOTE/) {
        value[$3] = $5
        date[$3] = $7
        value[$4] = $6
        date[$4] = $8
    }
    else {
        value[$3] = $4
        date[$3] = $5
    }
    print
    for (type in value) {
        printf "%15s%15s%15s\n", type, value[type], date[type]
    }
}
$ awk -f tst.awk file
xxxxxx,xx,MLOCAL,MREMOTE,33222,56,22/10/2012,18/10/2012
        MREMOTE             56     18/10/2012
         MLOCAL          33222     22/10/2012
xxxxxx,xx,MREMOTE,MLOCAL,33222,56,22/10/2012,18/10/2012
        MREMOTE          33222     22/10/2012
         MLOCAL             56     18/10/2012
xxxxxx,xx,MLOCAL,341993,22/10/2012
         MLOCAL         341993     22/10/2012
xxxxxx,xx,MREMOTE,9356828,08/10/2012
        MREMOTE        9356828     08/10/2012
xxxxxx,xx,LOCAL,REMOTE,19316,15253,22/10/2012,22/10/2012
         REMOTE          15253     22/10/2012
          LOCAL          19316     22/10/2012
xxxxxx,xx,REMOTE,LOCAL,1865871,383666,22/10/2012,22/10/2012
         REMOTE        1865871     22/10/2012
          LOCAL         383666     22/10/2012
xxxxxx,xx,REMOTE,1180306134,19/10/2012
         REMOTE     1180306134     19/10/2012
```
If you post the expected output, we can help you more.
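In the answer's output, MREMOTE is printed before MLOCAL because `for (type in value)` visits array keys in an unspecified order. If a fixed order matters (it does for the column layout in the EDIT), one option is to loop over an explicit list of names. A minimal sketch, assuming the names are only MLOCAL, MREMOTE, LOCAL and REMOTE; the function name `ordered_values` is hypothetical, and `split("", ...)` replaces `delete` for awks without whole-array deletion:

```shell
# Hypothetical helper: same parsing as tst.awk, printed in a fixed name order.
ordered_values() {
    awk -F',' '
    BEGIN { n = split("MLOCAL MREMOTE LOCAL REMOTE", names, " ") }
    {
        split("", value); split("", date)     # clear both arrays portably
        if ($4 ~ /LOCAL|REMOTE/) {
            value[$3] = $5; date[$3] = $7
            value[$4] = $6; date[$4] = $8
        } else {
            value[$3] = $4; date[$3] = $5
        }
        print
        for (i = 1; i <= n; i++)              # fixed order instead of "for (type in value)"
            if (names[i] in value)
                printf "%15s%15s%15s\n", names[i], value[names[i]], date[names[i]]
    }' "$@"
}
```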