Shell scripting - split XML into multiple files

Question

I am trying to split a big XML file into multiple files, and have used the following code in an AWK script.

/<fileItem>/ {
        rfile="fileItem" count ".xml"
        print "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" > rfile
        print $0 > rfile
        getline
        while ($0 !~ "<\/fileItem>" ) {
                print > rfile
                getline
        }
        print $0 > rfile
        close(rfile)
        count++
}

The code above generates a list of XML files with names such as "fileItem_1", "fileItem_2", "fileItem_3", etc.

However, I would like the file name to be something like "item_XXXXX", where XXXXX is taken from a node inside the XML, as depicted below:

<fileItem>
<id>12345</id>
<name>XXXXX</name>
</fileItem>

So, basically, I want the "id" node to be the file name. Can anyone please help me with this?

Recommended answer

I would not use getline. (I even read in an AWK book that its use is not recommended.) I think using global variables to track state is even simpler. (Expressions with global variables may be used in patterns, too.)

The script could look like this:

test-split-xml.awk:

/<fileItem>/ {
  collect = 1 ; buffer = "" ; file = "fileItem_"count".xml"
  ++count
}

collect > 0 {
  if (buffer != "") buffer = buffer"\n"
  buffer = buffer $0
}

collect > 0 && /<name>.+<\/name>/ {
  # cut "...<name>"
  i = index($0, "<name>") ; file = substr($0, i + 6)
  # cut "</name>..."
  i = index(file, "</name>") ; file = substr(file, 1, i - 1)
  file = file".xml"
}

/<\/fileItem>/ {
  collect = 0;
  print file
  print "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" >file
  print buffer >file
  close(file)  # release the file handle; a big input can produce many files
}

I prepared some sample data for a small test:

test-split-xml.xml:

<?xml version="1.0" encoding="UTF-8"?>
<top>
  <some>
    <fileItem>
      <id>1</id>
      <name>X1</name>
    </fileItem>
  </some>
  <fileItem>
    <id>2</id>
    <name>X2</name>
  </fileItem>
  <fileItem>
    <id>2</id>
    <!--name>X2</name-->
  </fileItem>
  <any> other input </any>
</top>

... and got the following output:

$ awk -f test-split-xml.awk test-split-xml.xml
X1.xml
X2.xml
fileItem_2.xml

$ more X1.xml 
<?xml version="1.0" encoding="UTF-8"?>
    <fileItem>
      <id>1</id>
      <name>X1</name>
    </fileItem>

$ more X2.xml
<?xml version="1.0" encoding="UTF-8"?>
  <fileItem>
    <id>2</id>
    <name>X2</name>
  </fileItem>

$ more fileItem_2.xml 
<?xml version="1.0" encoding="UTF-8"?>
  <fileItem>
    <id>2</id>
    <!--name>X2</name-->
  </fileItem>

$
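The question asks for the id node rather than the name node, with an "item_" prefix. Under the same assumptions as the script above (every tag on its own line), a variant might look like the following sketch. The input file name items.xml, the second sample item, and the fallback name item_unknown_N.xml (used when an item has no id) are made up for illustration.

```shell
# Work in a scratch directory so the generated files are easy to find.
cd "$(mktemp -d)"

# Hypothetical sample input, modelled on the snippet in the question.
cat > items.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<top>
  <fileItem>
    <id>12345</id>
    <name>XXXXX</name>
  </fileItem>
  <fileItem>
    <id>67890</id>
    <name>YYYYY</name>
  </fileItem>
</top>
EOF

awk '
# Start of an item: reset the buffer and pick a fallback file name.
/<fileItem>/ { collect = 1; buffer = ""; file = "item_unknown_" (++n) ".xml" }

# While inside an item, append every line to the buffer.
collect { buffer = (buffer == "" ? "" : buffer "\n") $0 }

# An <id> line inside an item decides the file name.
collect && /<id>.+<\/id>/ {
  id = $0
  sub(/.*<id>/, "", id)     # drop everything up to and including <id>
  sub(/<\/id>.*/, "", id)   # drop </id> and everything after it
  file = "item_" id ".xml"
}

# End of an item: write the buffer and close the file.
collect && /<\/fileItem>/ {
  collect = 0
  print "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" > file
  print buffer > file
  close(file)
  print file
}
' items.xml
```

With this input the script should report item_12345.xml and item_67890.xml. The id is used as the file name unchanged, so an id containing characters such as "/" would need extra handling.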

The comment by tripleee is reasonable. Such processing should be limited to personal use, because different (and legal) formattings of an XML file could cause this script to produce wrong results.
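To make the formatting risk concrete, here is a small sketch that feeds three legal spellings of the start tag to the same pattern the script uses. The type attribute is a hypothetical example and does not come from the question's data.

```shell
# Three well-formed start tags; the line-based pattern only recognises the first.
result=$(printf '%s\n' \
  '<fileItem>' \
  '<fileItem type="a">' \
  '<fileItem >' |
  awk '/<fileItem>/ { hits++ } END { print hits " of " NR " start tags matched" }')

echo "$result"
```

The same kind of mismatch occurs when a whole item sits on a single line or when one element spans several lines. For input that is not under your control, a real XML parser is the safer choice.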

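As you will notice, there is no next in the whole script. This is intentional: a single input line may have to be handled by more than one rule. For example, the line containing <fileItem> both starts a new item and is appended to the buffer.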
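The effect of leaving out next can be seen in a minimal sketch. The three-line input and the flag name on are made up for illustration; the rules mirror the start/collect/stop structure of the script above.

```shell
input='start
middle
end'

# Without next: the "start" line triggers the first rule and then also the second.
without_next=$(printf '%s\n' "$input" | awk '
/start/ { on = 1 }
on      { printf "[%s]", $0 }
/end/   { on = 0 }')

# With next in the first rule: the "start" line never reaches the second rule.
with_next=$(printf '%s\n' "$input" | awk '
/start/ { on = 1; next }
on      { printf "[%s]", $0 }
/end/   { on = 0 }')

echo "without next: $without_next"
echo "with next:    $with_next"
```

Without next, all three lines are collected. With next, the opening line is lost, which in the XML script would mean losing the <fileItem> start tag.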
