Advanced insert into XML files

Problem description

Please take a look at this previous question of mine. I'm trying to achieve something similar to it, but this time with more advanced criteria.

Put simply, I need to add a <SALESMAN> node (XML tag) to each <RECORD>, directly after its <NETTOTAL> element. The node's text content is the 8-digit number taken from the same record's <ITEMDESC> in the same XML file. Those numbers are extracted and stored in an array for later processing, as you will see in the script below.
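As a minimal sketch, assuming every <ITEMDESC> carries the code as 8 digits in parentheses (as in the sample below), a capture group can pull the number out of a single value directly, without splitting the whole file on brackets. $itemDesc and $code are illustrative names, not part of the original script:

```powershell
# Sample <ITEMDESC> text taken from the question's XML.
$itemDesc = 'Salesperson: firstName1 lastName1 (43700006)'

# Capture exactly eight digits between parentheses; -match fills $Matches.
$code = $null
if ($itemDesc -match '\((\d{8})\)') {
    $code = $Matches[1]
}

"Salesperson code: $code"
```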

The existing script runs, but I suspect the loop logic is wrong. I need it to pick one XML tag with its corresponding 8-digit number and place it under each parent, rather than picking, looping over, and placing the exact same child everywhere.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<EXPORT>
    <IMPORTMODEL>NEX</IMPORTMODEL>
    <SESSION>1000061</SESSION>
    <CUSTORDERS>
        <RECORD CODE="NX0100103">
            <VATMODE>X</VATMODE>
            <INPUTDATE>26/07/2017</INPUTDATE>
            <NETTOTAL>97.40</NETTOTAL>
            <DOCLINES>
                <LINE>
                    <LINETYPE>M</LINETYPE>
                    <ITEMDESC>Salesperson: firstName1 lastName1 (43700006)</ITEMDESC>
                </LINE>
            </DOCLINES>
        </RECORD>
        <RECORD CODE="NX0100104">
            <VATMODE>X</VATMODE>
            <INPUTDATE>26/07/2017</INPUTDATE>
            <NETTOTAL>38.20</NETTOTAL>
            <DOCLINES>
                <LINE>
                    <LINETYPE>M</LINETYPE>
                    <ITEMDESC>Salesperson: firstName2 lastName2 (43100015)</ITEMDESC>
                </LINE>
            </DOCLINES>
        </RECORD>
        <RECORD CODE="NX0100105">
            <VATMODE>X</VATMODE>
            <INPUTDATE>26/07/2017</INPUTDATE>
            <NETTOTAL>63.00</NETTOTAL>
            <DOCLINES>
                <LINE>
                    <LINETYPE>M</LINETYPE>
                    <ITEMDESC>Salesperson: firstName3 lastName3 (43100014)</ITEMDESC>
                </LINE>
            </DOCLINES>
        </RECORD>
        <RECORD CODE="NX0100106">
            <VATMODE>X</VATMODE>
            <INPUTDATE>26/07/2017</INPUTDATE>
            <NETTOTAL>55.00</NETTOTAL>
            <DOCLINES>
                <LINE>
                    <LINETYPE>M</LINETYPE>
                    <ITEMDESC>Salesperson: firstName2 lastName2 (43100015)</ITEMDESC>
                </LINE>
            </DOCLINES>
        </RECORD>
    </CUSTORDERS>
</EXPORT>

Expected result

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<EXPORT>
    <IMPORTMODEL>NEX</IMPORTMODEL>
    <SESSION>1000061</SESSION>
    <CUSTORDERS>
        <RECORD CODE="NX0100103">
            <VATMODE>X</VATMODE>
            <INPUTDATE>26/07/2017</INPUTDATE>
            <NETTOTAL>97.40</NETTOTAL>
            <SALESMAN>43700006</SALESMAN>
            <DOCLINES>
                <LINE>
                    <LINETYPE>M</LINETYPE>
                    <ITEMDESC>Salesperson: firstName1 lastName1 (43700006)</ITEMDESC>
                </LINE>
            </DOCLINES>
        </RECORD>
        <RECORD CODE="NX0100104">
            <VATMODE>X</VATMODE>
            <INPUTDATE>26/07/2017</INPUTDATE>
            <NETTOTAL>38.20</NETTOTAL>
            <SALESMAN>43100015</SALESMAN>
            <DOCLINES>
                <LINE>
                    <LINETYPE>M</LINETYPE>
                    <ITEMDESC>Salesperson: firstName2 lastName2 (43100015)</ITEMDESC>
                </LINE>
            </DOCLINES>
        </RECORD>
        <RECORD CODE="NX0100105">
            <VATMODE>X</VATMODE>
            <INPUTDATE>26/07/2017</INPUTDATE>
            <NETTOTAL>63.00</NETTOTAL>
            <SALESMAN>43100014</SALESMAN>
            <DOCLINES>
                <LINE>
                    <LINETYPE>M</LINETYPE>
                    <ITEMDESC>Salesperson: firstName3 lastName3 (43100014)</ITEMDESC>
                </LINE>
            </DOCLINES>
        </RECORD>
        <RECORD CODE="NX0100106">
            <VATMODE>X</VATMODE>
            <INPUTDATE>26/07/2017</INPUTDATE>
            <NETTOTAL>55.00</NETTOTAL>
            <SALESMAN>43100015</SALESMAN>
            <DOCLINES>
                <LINE>
                    <LINETYPE>M</LINETYPE>
                    <ITEMDESC>Salesperson: firstName2 lastName2 (43100015)</ITEMDESC>
                </LINE>
            </DOCLINES>
        </RECORD>
    </CUSTORDERS>
</EXPORT>

Script

$xmlFilesLocation = "C:\XML_dumping"

cd $xmlFilesLocation

$netTotalRegEx = "(<NETTOTAL>\d{1,30}\.\d{1,2}<\/NETTOTAL>)"
$salesManRegEx = "(<SALESMAN>\d{8}<\/SALESMAN>)"

$beginTag = "`t`t`t<SALESMAN>"
$endTag = "</SALESMAN>"

$files = Get-ChildItem -Path $xmlFilesLocation -Filter *.xml

$numberOfFiles = (Get-ChildItem -Path $xmlFilesLocation -Filter *.xml | Measure-Object).Count

# First, loop through all files separately to check if <SALESMAN>[code]</SALESMAN> exists, and skip if true
for ($i=1; $i -le $numberOfFiles; $i++) {
    $content = (Get-Content $files[$i - 1] -Raw)

    # Skip file if <SALESMAN>[code]</SALESMAN> is detected in it
    if ($content -match $salesManRegEx) { break }
}

# Then, loop through all files (again) separately to check if <SALESMAN>[code]</SALESMAN> is missing, and process if true
for ($j=1; $j -le $numberOfFiles; $j++) {
    $content = (Get-Content $files[$j - 1] -Raw)

    # If <SALESMAN>[code]</SALESMAN> is missing in the file
    if ($content -notmatch $salesManRegEx) {
        $contentArray = @()

        # Hold all the content, but split from the brackets
        $contentArray = $content
        $contentArray = $contentArray.Split("()")
        # Now split by line to extract the salesman codes into an array.
        # Example: [43700006, 43100015, 43100014, 43100015]
        $contentArray = $contentArray.Split("")

        for ($k=1; $k -le $contentArray.Length; $k++) {
            # if the salesman code is found...
            if ($contentArray[$k] -match "^\d{8}$") {
                if ($content -notmatch $salesManRegEx) {
                    # Construct the full tag
                    $fullSalesManTag = $beginTag + $contentArray[$k] + $endTag

                    # ...then replace in $content the regular expression with $fullSalesManTag and insert it directly underneath NETTOTAL line
                    $content= [regex]::Replace($content, $netTotalRegEx, ('$1' + "`n" + "$fullSalesManTag"))

                    $content | Out-File -Encoding UTF8 $files[$j - 1]
                }
            }
        }
    }
}

Current Output

The output shows that only one code from the array, 43700006 (the first one found), is added, and it is inserted under every <NETTOTAL>. I understand why this is happening, but I can't wrap my head around how to correct the logic.
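The behavior can be reproduced in isolation. [regex]::Replace rewrites every match in its input, so a single call inserts one code after every <NETTOTAL> at once; after that, $content matches $salesManRegEx, so the script's `-notmatch` guard skips every later code. This sketch reuses the question's two regular expressions on a cut-down two-record string and omits the tab and newline from the replacement:

```powershell
# Two <NETTOTAL> elements, as in the question's file (other markup omitted).
$content = "<NETTOTAL>97.40</NETTOTAL>`n<NETTOTAL>38.20</NETTOTAL>"
$netTotalRegEx = "(<NETTOTAL>\d{1,30}\.\d{1,2}<\/NETTOTAL>)"
$salesManRegEx = "(<SALESMAN>\d{8}<\/SALESMAN>)"

# One Replace call with the first code found...
$content = [regex]::Replace($content, $netTotalRegEx, ('$1' + "<SALESMAN>43700006</SALESMAN>"))

# ...inserts it after BOTH <NETTOTAL> elements.
$count = [regex]::Matches($content, '<SALESMAN>43700006</SALESMAN>').Count
"SALESMAN tags inserted: $count"

# From now on the guard is false, so no other code is ever inserted.
"Guard still true: $($content -notmatch $salesManRegEx)"
```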

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<EXPORT>
    <IMPORTMODEL>NEX</IMPORTMODEL>
    <SESSION>1000061</SESSION>
    <CUSTORDERS>
        <RECORD CODE="NX0100103">
            <VATMODE>X</VATMODE>
            <INPUTDATE>26/07/2017</INPUTDATE>
            <NETTOTAL>97.40</NETTOTAL>
            <SALESMAN>43700006</SALESMAN>
            <DOCLINES>
                <LINE>
                    <LINETYPE>M</LINETYPE>
                    <ITEMDESC>Salesperson: firstName1 lastName1 (43700006)</ITEMDESC>
                </LINE>
            </DOCLINES>
        </RECORD>
        <RECORD CODE="NX0100104">
            <VATMODE>X</VATMODE>
            <INPUTDATE>26/07/2017</INPUTDATE>
            <NETTOTAL>38.20</NETTOTAL>
            <SALESMAN>43700006</SALESMAN>
            <DOCLINES>
                <LINE>
                    <LINETYPE>M</LINETYPE>
                    <ITEMDESC>Salesperson: firstName2 lastName2 (43100015)</ITEMDESC>
                </LINE>
            </DOCLINES>
        </RECORD>
        <RECORD CODE="NX0100105">
            <VATMODE>X</VATMODE>
            <INPUTDATE>26/07/2017</INPUTDATE>
            <NETTOTAL>63.00</NETTOTAL>
            <SALESMAN>43700006</SALESMAN>
            <DOCLINES>
                <LINE>
                    <LINETYPE>M</LINETYPE>
                    <ITEMDESC>Salesperson: firstName3 lastName3 (43100014)</ITEMDESC>
                </LINE>
            </DOCLINES>
        </RECORD>
        <RECORD CODE="NX0100106">
            <VATMODE>X</VATMODE>
            <INPUTDATE>26/07/2017</INPUTDATE>
            <NETTOTAL>55.00</NETTOTAL>
            <SALESMAN>43700006</SALESMAN>
            <DOCLINES>
                <LINE>
                    <LINETYPE>M</LINETYPE>
                    <ITEMDESC>Salesperson: firstName2 lastName2 (43100015)</ITEMDESC>
                </LINE>
            </DOCLINES>
        </RECORD>
    </CUSTORDERS>
</EXPORT>

Recommended answer

Do not parse XML with regex. Every time you do, a rainbow unicorn dies.

But seriously, in most cases regular expressions are the wrong tool for working with XML files. If you're interested, the answers to this question (thanks to kjhughes for the link) discuss the issues with the regex approach in depth.

Use a proper XML parser and a couple of XPath expressions to extract the salesperson ID and add it as a new node:

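$xmlfile = 'C:\path\to\your.xml'

# Load the file with a real XML parser instead of treating it as text.
[xml]$xml = Get-Content $xmlfile

# Handle each <RECORD> separately, so every record gets its own salesperson ID.
$xml.SelectNodes('//RECORD') | ForEach-Object {
  # Take the digits in parentheses from this record's <ITEMDESC>.
  $id = $_.SelectSingleNode('.//ITEMDESC').'#text' -replace '.*\((\d+)\).*', '$1'

  # The new node goes directly after this record's <NETTOTAL>.
  $sibling = $_.SelectSingleNode('./NETTOTAL')

  $node = $xml.CreateElement('SALESMAN')
  $node.InnerText = $id
  # [void] discards the node returned by InsertAfter so it is not echoed to the console.
  [void]$_.InsertAfter($node, $sibling)
}

$xml.Save($xmlfile)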
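The question's script processes every *.xml file in C:\XML_dumping and skips files that already contain <SALESMAN>. A sketch applying the same XML-parser approach to that setting follows. Add-SalesmanNode is a hypothetical helper name, and the sketch assumes the code is always 8 digits in parentheses inside <ITEMDESC>. It checks for an existing <SALESMAN> per record rather than per file, so running it again does not add duplicates:

```powershell
# Hypothetical helper; not part of the original answer.
function Add-SalesmanNode {
    param([Parameter(Mandatory)][string]$Folder)

    Get-ChildItem -Path $Folder -Filter *.xml -File | ForEach-Object {
        # Use the full path: XmlDocument.Load/Save resolve relative paths
        # against the .NET working directory, not PowerShell's current location.
        $path = $_.FullName
        $xml = New-Object System.Xml.XmlDocument
        $xml.Load($path)
        $changed = $false

        foreach ($record in $xml.SelectNodes('//RECORD')) {
            # Skip records that already have a <SALESMAN> element.
            if ($null -ne $record.SelectSingleNode('./SALESMAN')) { continue }

            $netTotal = $record.SelectSingleNode('./NETTOTAL')
            $desc = $record.SelectSingleNode('.//ITEMDESC')
            if ($null -eq $netTotal -or $null -eq $desc) { continue }

            # Use the 8-digit code from this record's own <ITEMDESC>.
            if ($desc.InnerText -match '\((\d{8})\)') {
                $node = $xml.CreateElement('SALESMAN')
                $node.InnerText = $Matches[1]
                [void]$record.InsertAfter($node, $netTotal)
                $changed = $true
            }
        }

        # Only rewrite files that actually changed.
        if ($changed) { $xml.Save($path) }
    }
}

# Usage with the folder from the question:
# Add-SalesmanNode -Folder 'C:\XML_dumping'
```

As with $xml.Save in the answer, saving re-serializes the document, so whitespace in the rewritten files may differ slightly from the originals.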
