Extract data between tags <t></t>
Problem description
I have data like the sample below. How can I print the data between the two tags? I want the output in comma-separated (CSV) format.
My approach was to convert the data to a horizontal format, cut it after every 4th column, and then convert it back to vertical.
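The approach above (collect the values, cut them into column blocks, then turn each block into a column) can be sketched as a single awk program. This is a hedged sketch, not the accepted answer: it assumes the values are stored column by column (header first, then the data rows), that every column has the same number of entries (4 in the sample: a header plus 3 values), and that each value sits in a plain `<t>...</t>` element without attributes. The helper name `t_columns_to_csv` and the `ncols=3` setting (for NAME, AGE and COURSE) are hypothetical. Because it loops over every `<t>` element on a line, it should not matter whether the file is pretty-printed or stored on one line.

```shell
#!/bin/sh
# Hypothetical helper: collect all <t> values, then transpose column blocks into CSV rows.
# Assumptions: values are stored column by column, every column has the same
# number of entries, and <t> carries no attributes. ncols=3 is a hypothetical setting.
t_columns_to_csv() {
  awk -v ncols=3 -v OFS=',' '
  {
    line = $0
    # A line may hold any number of <t>...</t> elements, so loop over them.
    while (match(line, /<t>[^<]*<\/t>/)) {
      val[++n] = substr(line, RSTART + 3, RLENGTH - 7)   # strip "<t>" and "</t>"
      line = substr(line, RSTART + RLENGTH)              # continue after this element
    }
  }
  END {
    if (n == 0 || n % ncols) {
      print "expected a multiple of " ncols " values, got " n > "/dev/stderr"
      exit 1
    }
    rows = n / ncols                       # header row plus data rows
    for (r = 1; r <= rows; r++) {
      out = val[r]
      for (c = 1; c < ncols; c++)
        out = out OFS val[r + c * rows]    # same row, next column block
      print out
    }
  }' "$@"
}
```

For the sample file shown below, `t_columns_to_csv abc.xml` should print `"NAME","AGE","COURSE"` followed by one row per person; the double quotes are part of the XML text itself.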
Data in the XML file:
<?xml version="1.0" encoding="UTF-8" standalone="true"?>
-
<sst uniqueCount="12" count="12"
xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
-
<si>
<t>"NAME"</t>
</si>
-
<si>
<t>"Vikas"</t>
</si>
-
<si>
<t>"Vijay"</t>
</si>
-
<si>
<t>"Vilas"</t>
</si>
-
<si>
<t>"AGE"</t>
</si>
-
<si>
<t>"24"</t>
</si>
-
<si>
<t>"34"</t>
</si>
-
<si>
<t>"35"</t>
</si>
-
<si>
<t>"COURSE"</t>
</si>
-
<si>
<t>"MCA"</t>
</si>
-
<si>
<t>"MMS"</t>
</si>
-
<si>
<t>"MBA"</t>
</si>
</sst>
I tried the command below, but it does not work:
awk '/<t/{flag=1;next}/<t/{flag=0}flag' abc.xml
I also tried the command below, but it prints all the data on a single line:
awk -F'(</*t>|</*t>)' 'NF>1{for(i=2;i<NF; i=i+2) printf("%s%s", $i, (i+1==NF)?ORS:OFS)}' OFS=',' demo.xml
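Why the first command prints the wrong lines: both of its patterns are `/<t/`, so the first rule fires on every line containing `<t`, sets the flag and calls `next`. The value lines are therefore never printed, while every other line after the first `<t>` is. A start/stop flag suits blocks spread over several lines, but here each opening and closing tag share one line, so splitting the line on `<` and `>` is enough. A minimal sketch, assuming each `<t>...</t>` element sits on its own line with no attributes (the helper name `t_values` is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: print the text of every <t>...</t> line, one value per line.
# Assumes one <t> element per line and no attributes on <t>.
t_values() {
  # With < and > as field separators, <t>"NAME"</t> splits into
  # "" | t | "NAME" | /t | "", so the value is field 3.
  awk -F'[<>]' '/<t>/{ print $3 }' "$@"
}
```

Piping the result through `paste -d, - - - -` groups every 4 values into one line, which gives the "horizontal" per-column form described in the question.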
I want the following output:
NAME,AGE,Course
Vikas,"25",MCA
Prabhash,"34",MBA
Arjun,"21",MMS
Recommended answer
Based only on the samples you have shown, you could try the following:
awk -v OFS="," '
!NF || /^-$/{ next }
/<t>"COURSE"<\/t>/{
  foundAge=foundName=""
  foundCourse=1
  count=0
}
/<t>"AGE"<\/t>/{
  foundAge=1
  foundName=foundCourse=""
  count=0
}
/<t>"NAME"<\/t>/{
  foundName=1
  count=0
}
foundAge && match($0,/>[^<]*/){
  age[++count]=substr($0,RSTART+1,RLENGTH-1)
}
foundName && match($0,/>[^<]*/){
  name[++count]=substr($0,RSTART+1,RLENGTH-1)
}
foundCourse && match($0,/>[^<]*/){
  course[++count]=substr($0,RSTART+1,RLENGTH-1)
}
END{
  for(k=1;k<=count;k++){
    if(name[k]){
      print name[k],age[k],course[k]
    }
  }
}
' Input_file
Explanation: here is the same program with a detailed explanation of each step.
awk -v OFS="," ' ##Start the awk program.
!NF || /^-$/{ next } ##Skip empty lines and lines consisting only of "-".
/<t>"COURSE"<\/t>/{ ##If the line holds <t>"COURSE"</t>, do the following.
  foundAge=foundName="" ##Clear the foundAge and foundName flags.
  foundCourse=1 ##Set the foundCourse flag.
  count=0 ##Reset count for the COURSE column.
}
/<t>"AGE"<\/t>/{ ##If the line holds <t>"AGE"</t>, do the following.
  foundAge=1 ##Set the foundAge flag.
  foundName=foundCourse="" ##Clear the foundName and foundCourse flags.
  count=0 ##Reset count for the AGE column.
}
/<t>"NAME"<\/t>/{ ##If the line holds <t>"NAME"</t>, do the following.
  foundName=1 ##Set the foundName flag.
  count=0 ##Reset count for the NAME column.
}
foundAge && match($0,/>[^<]*/){ ##While foundAge is set, match() finds the text between ">" and the next "<".
  age[++count]=substr($0,RSTART+1,RLENGTH-1) ##Store that text (without the ">") in age, indexed by count.
}
foundName && match($0,/>[^<]*/){ ##Same check for the NAME column.
  name[++count]=substr($0,RSTART+1,RLENGTH-1) ##Store the text in name, indexed by count.
}
foundCourse && match($0,/>[^<]*/){ ##Same check for the COURSE column.
  course[++count]=substr($0,RSTART+1,RLENGTH-1) ##Store the text in course, indexed by count.
}
END{ ##Start the END block of the program.
  for(k=1;k<=count;k++){ ##Walk through the indexes collected for the last column.
    if(name[k]){ ##Only indexes with a non-empty name hold real values.
      print name[k],age[k],course[k] ##Print the matching name, age and course, comma-separated.
    }
  }
}
' Input_file ##Name of the input file.
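One detail the explanation leaves implicit: the `match($0,/>[^<]*/)` test also succeeds on `<si>` and `</si>` lines, where it captures an empty string, so `count` advances on every line inside a column. Because each column has the same layout, the header and the values still land on matching indexes (1, 4, 7 and 10 in the sample), and the `if(name[k])` check in the END block drops the empty entries. The hypothetical helper `show_extract` below applies the same `match()`/`substr()` pair to each input line so you can see what it extracts:

```shell
#!/bin/sh
# Hypothetical helper: apply the answer's match()/substr() pair to each line
# and print what it extracts, or "no match" when match() returns 0.
show_extract() {
  awk '{
    if (match($0, />[^<]*/))
      printf "%s -> [%s]\n", $0, substr($0, RSTART + 1, RLENGTH - 1)
    else
      printf "%s -> no match\n", $0
  }' "$@"
}
```

For example, `printf '%s\n' '<si>' '<t>"24"</t>' '</si>' | show_extract` prints `<si> -> []`, `<t>"24"</t> -> ["24"]` and `</si> -> []`.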
Based on the OP's comment, if all values are needed on a single line, try the following:
awk -v OFS="," '
!NF || /^-$/{ next }
/<t>"COURSE"<\/t>/{
  foundAge=foundName=""
  foundCourse=1
  count=0
}
/<t>"AGE"<\/t>/{
  foundAge=1
  foundName=foundCourse=""
  count=0
}
/<t>"NAME"<\/t>/{
  foundName=1
  count=0
}
foundAge && match($0,/>[^<]*/){
  age[++count]=substr($0,RSTART+1,RLENGTH-1)
}
foundName && match($0,/>[^<]*/){
  name[++count]=substr($0,RSTART+1,RLENGTH-1)
}
foundCourse && match($0,/>[^<]*/){
  course[++count]=substr($0,RSTART+1,RLENGTH-1)
}
END{
  for(k=1;k<=count;k++){
    if(name[k]){
      nameVal=(nameVal?nameVal OFS:"")name[k]
      ageVal=(ageVal?ageVal OFS:"")age[k]
      courseVal=(courseVal?courseVal OFS:"")course[k]
    }
  }
  print nameVal,ageVal,courseVal
}
' Input_file
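For the sample shown, where the columns appear in the order NAME, AGE, COURSE, this one-line output is simply every `<t>` value in document order joined by commas. Under that assumption, a shorter sketch can skip the flags and arrays. The helper name `t_values_one_line` is hypothetical, and it expects plain `<t>` elements without attributes; several elements may share one line, so it should also cope with a file stored on a single line.

```shell
#!/bin/sh
# Hypothetical helper: join every <t>...</t> value, in document order, with commas.
# Assumes plain <t> elements (no attributes); several may share one line.
t_values_one_line() {
  awk -v OFS=',' '
  {
    line = $0
    while (match(line, /<t>[^<]*<\/t>/)) {
      # Drop the 3-character "<t>" and the 4-character "</t>".
      value = substr(line, RSTART + 3, RLENGTH - 7)
      out = (n ? out OFS : "") value
      n++
      line = substr(line, RSTART + RLENGTH)   # continue after this element
    }
  }
  END { if (n) print out }' "$@"
}
```

If the column order in the file differs from the order you want in the output, the flag-based answer above is the better fit.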