XSLT: Merging two log files with different structure and time-representation

Question

As Dimitre Novatchev requested, I have created a new question, because some parts of the old question changed.

(Link to the old question: Merging two different XML log files (trace and messages) using date and timestamps?)

I need to merge two XML log files (up to 700MB). One log file contains a trace with position updates. The other log file contains the received messages. There can be multiple received messages without a position update in between, and vice versa.

Both logs have timestamps including milliseconds (123 in this example):

  • The trace log uses <date> (e.g. 14.7.2012 11:08:07.123)
  • The message log uses a Unix timestamp in <timeStamp> (e.g. 1342264087123)
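To cross-check the relationship between the two formats outside of XSLT, here is a minimal Python sketch. It assumes the trace date is written as day.month.year followed by a 24-hour time with milliseconds, and that the Unix value counts milliseconds since 1970-01-01 UTC. The function names and the `utc_offset_hours` parameter are hypothetical and only serve the illustration; with an offset of 0, the two example values from the list above map onto each other.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def trace_date_to_millis(text, utc_offset_hours=0):
    """Convert 'D.M.YYYY HH:MM:SS.mmm' to Unix milliseconds.

    utc_offset_hours is an assumed fixed offset of the trace clock from UTC.
    """
    date_part, time_part = text.split(" ")
    day, month, year = (int(p) for p in date_part.split("."))
    clock, _, frac = time_part.partition(".")
    hour, minute, second = (int(p) for p in clock.split(":"))
    # Pad or cut the fraction to exactly three digits (milliseconds).
    millis = int(frac.ljust(3, "0")[:3]) if frac else 0
    local = datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc)
    delta = local - EPOCH - timedelta(hours=utc_offset_hours)
    return (delta.days * 86400 + delta.seconds) * 1000 + millis


def millis_to_trace_date(millis, utc_offset_hours=0):
    """Convert Unix milliseconds back to the trace log's date format."""
    seconds, ms = divmod(millis, 1000)
    moment = EPOCH + timedelta(seconds=seconds, hours=utc_offset_hours)
    # Day and month are not zero-padded, matching '14.7.2012'.
    return f"{moment.day}.{moment.month}.{moment.year} {moment:%H:%M:%S}.{ms:03d}"


print(trace_date_to_millis("14.7.2012 11:08:07.123"))
print(millis_to_trace_date(1342264087123))
```

Keeping the millisecond part as an integer avoids floating-point rounding when values that differ by only a few milliseconds are compared.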

The message log also contains other <timeStamp> elements, but only the one at the path messageList/Message/originator/originatorPosition/timeStamp is relevant.

The following structures are slightly simplified: additional content such as "acceleration" is left out. This additional content just needs to be copied together with the rest of the messages/items.

The structure of the position trace is as follows:

<itemList>
    <item>
        <date>14.7.2012 12:13:05.123</date>
        <FilteredPosition>
            <Latitude>51.12235</Latitude>
            <Longitude>9.347214</Longitude>
        </FilteredPosition>
    </item>
    <item>
        <date>14.7.2012 12:13:07.456</date>
        <FilteredPosition>
            <Latitude>51.12235</Latitude>
            <Longitude>9.347214</Longitude>
        </FilteredPosition>
    </item>
</itemList>

The structure of the message log is as follows:

<messageList>
    <Message>
        <messageId>1234</messageId>
        <originator>
            <originatorPosition>
                <nodeId>2345</nodeId>
                <timeStamp>1342264087061</timeStamp>
            </originatorPosition>
            <senderPosition>
                <nodeId>2345</nodeId>
                <timeStamp>1342264087234</timeStamp>
            </senderPosition>
            <medium></medium>
        </originator>
        <MessagePayload>
           <generationTime>
              <timeStamp>1342264087</timeStamp>
              <milliSec>42</milliSec>
           </generationTime>
        </MessagePayload>
    </Message>
    <Message>
        <messageId>1234</messageId>
        <originator>
            <originatorPosition>
                <nodeId>2345</nodeId>
                <timeStamp>1342264088064</timeStamp>
            </originatorPosition>
            <senderPosition>
                <nodeId>2345</nodeId>
                <timeStamp>1342264088254</timeStamp>
            </senderPosition>
            <medium></medium>
        </originator>
        <MessagePayload>
           <generationTime>
              <timeStamp>1342264088</timeStamp>
              <milliSec>42</milliSec>
           </generationTime>
        </MessagePayload>
    </Message>
</messageList>

When merging, the timestamps should be read and compared, which means converting between the "date" format (e.g. "14.7.2012 11:08:07.123") and the Unix "timestamp", both including milliseconds. All positions and messages should then be added in the right order.

The position data can simply be copied as it is. Each message, however, should be placed inside <item> tags, a <date> tag should be added (based on the message's Unix time in milliseconds), and the <Message> tag should be replaced by an <m:Message type="received"> tag. The items are placed within the root <itemList>, just as in the position trace.

The result could look like this:

<itemList>
    <item>
        <date>14.7.2012 12:13:05.123</date>
        <FilteredPosition>
            <Latitude>51.12235</Latitude>
            <Longitude>9.347214</Longitude>
        </FilteredPosition>
    </item>
    <item>
        <date>14.7.2012 12:13:07.061</date>
        <m:Message type="received">
            <messageId>1234</messageId>
            <originator>
                <originatorPosition>
                    <nodeId>2345</nodeId>
                    <timeStamp>1342264087061</timeStamp>
                </originatorPosition>
                <senderPosition>
                    <nodeId>2345</nodeId>
                    <timeStamp>1342264087234</timeStamp>
                </senderPosition>
                <medium></medium>
            </originator>
            <MessagePayload>
               <generationTime>
                  <timeStamp>1342264087</timeStamp>
                  <milliSec>63</milliSec>
               </generationTime>
            </MessagePayload>
        </m:Message>
    </item>
    <item>
        <date>14.7.2012 12:13:07.456</date>
        <FilteredPosition>
            <Latitude>51.12235</Latitude>
            <Longitude>9.347214</Longitude>
        </FilteredPosition>
    </item>
    <item>
        <date>14.7.2012 12:13:08.064</date>
        <m:Message type="received">
            <messageId>1234</messageId>
            <originator>
                <originatorPosition>
                    <nodeId>2345</nodeId>
                    <timeStamp>1342264088064</timeStamp>
                </originatorPosition>
                <senderPosition>
                    <nodeId>2345</nodeId>
                    <timeStamp>1342264088254</timeStamp>
                </senderPosition>
                <medium></medium>
            </originator>
            <MessagePayload>
               <generationTime>
                  <timeStamp>1342264088</timeStamp>
                  <milliSec>70</milliSec>
               </generationTime>
            </MessagePayload>
        </m:Message>
    </item>
</itemList>
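The wrapping step described above can also be sketched in Python with the standard library's ElementTree, purely as an illustration of the target shape. The helper name `wrap_message` is hypothetical, the namespace URI is the same placeholder that the answer's stylesheet binds to the `m` prefix, and the date string is passed in already formatted (here copied from the expected result).

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI for the m: prefix (an assumption, not a real schema).
M_NS = "http://www.example.com/"
ET.register_namespace("m", M_NS)


def wrap_message(message, date_text):
    """Build <item><date/><m:Message type="received">...</m:Message></item>."""
    item = ET.Element("item")
    ET.SubElement(item, "date").text = date_text
    wrapped = ET.SubElement(item, f"{{{M_NS}}}Message", {"type": "received"})
    # Reuse the original child elements unchanged (ElementTree does not copy them).
    wrapped.extend(list(message))
    return item


message = ET.fromstring(
    "<Message><messageId>1234</messageId>"
    "<originator><originatorPosition><nodeId>2345</nodeId>"
    "<timeStamp>1342264087061</timeStamp></originatorPosition></originator>"
    "</Message>"
)
item = wrap_message(message, "14.7.2012 12:13:07.061")
print(ET.tostring(item, encoding="unicode"))
```

For files of several hundred megabytes, building whole trees in memory like this may not be practical; the sketch only shows the element structure, not a streaming strategy.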

The position log file also contains some <item> elements that have no timestamp (and no "FilteredPosition"). These items can be ignored and do not need to be copied.
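Skipping the undated items amounts to selecting only those <item> elements that have a <date> child. A small Python sketch of that filter follows, assuming ElementTree's limited XPath support; the `<status>` element is a made-up stand-in for whatever the undated items really contain.

```python
import xml.etree.ElementTree as ET

trace = ET.fromstring(
    "<itemList>"
    "<item><date>14.7.2012 12:13:05.123</date><FilteredPosition/></item>"
    "<item><status>hypothetical item without a date</status></item>"
    "<item><date>14.7.2012 12:13:07.456</date><FilteredPosition/></item>"
    "</itemList>"
)

# The predicate [date] keeps only <item> elements that have a <date> child.
dated_items = trace.findall("item[date]")
dates = [item.findtext("date") for item in dated_items]
print(dates)
```

The same predicate, `item[date]`, is valid XPath, so it can be used in an XSLT match pattern or select expression as well.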

I'd appreciate any help with the XSLT code, as I'm quite new to this topic... :-/

Recommended answer

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:m="http://www.example.com/"
    exclude-result-prefixes="xs"
    version="2.0">

    <xsl:output indent="yes" method="xml"/>

    <!-- The two source-documents. -->
    <xsl:variable name="doc1" select="doc('log1.xml')"/>
    <xsl:variable name="doc2" select="doc('log2.xml')"/>

    <!-- Timezone adjustment -->
    <xsl:variable name="timezoneAdjustment" select="1"/>

    <!-- Root template to start the transformation. -->
    <xsl:template match="/">
        <!-- Transform and collect all the elements -->
        <xsl:variable name="data" as="node()*">
            <xsl:apply-templates select="$doc1/itemList/item"/>
            <xsl:apply-templates select="$doc2/messageList/Message"/>
        </xsl:variable>
        <!-- Sort by the timestamp, and discard the wrapper. -->
        <itemList>
            <xsl:for-each select="$data">
                <xsl:sort select="@timestamp" data-type="number"/>
                <xsl:copy-of select="item"/>
            </xsl:for-each>
        </itemList>
    </xsl:template>

    <!--
        Template to transform <item> elements in the first format.
        It just parses the date, and adds a wrapper with the timestamp.
    -->
    <xsl:template match="item[date]">
        <xsl:variable name="dateTimeString" select="date" as="xs:string"/>
        <xsl:variable name="datePart" select="substring-before($dateTimeString,' ')"/>
        <xsl:variable name="day" select="xs:integer(substring-before($datePart,'.'))"/>
        <xsl:variable name="month" select="xs:integer(substring-before(substring-after($datePart,'.'),'.'))"/>
        <xsl:variable name="year" select="xs:integer(substring-after(substring-after($datePart,'.'),'.'))"/>
        <xsl:variable name="timePart" select="substring-after($dateTimeString,' ')"/>
        <xsl:variable name="reformatted" select="concat(format-number($year,'0000'),'-',format-number($month,'00'),'-',format-number($day,'00'),'T',$timePart)"/>
        <xsl:variable name="timestamp" select="( xs:dateTime($reformatted) - xs:dateTime('1970-01-01T00:00:00') - $timezoneAdjustment * xs:dayTimeDuration('PT1H') ) div xs:dayTimeDuration('PT0.001S')"/>
        <wrapper timestamp="{$timestamp}">
            <xsl:copy-of select="self::*"/>
        </wrapper>
    </xsl:template>

    <!--
        Template to transform <Message> elements in the second log format.
        It generates an item with the date, and wraps it with the timestamp.
    -->
    <xsl:template match="Message[originator/originatorPosition/timeStamp]">
        <xsl:variable name="timestamp" select="originator/originatorPosition/timeStamp" as="xs:integer"/>
        <xsl:variable name="date" select="xs:dateTime('1970-01-01T00:00:00') + $timezoneAdjustment * xs:dayTimeDuration('PT1H') + $timestamp * xs:dayTimeDuration('PT0.001S')"/>
        <wrapper timestamp="{$timestamp}">
            <item>
                <date>
                    <xsl:value-of select="format-dateTime($date,'[D01].[M01].[Y0001] [H01]:[m01]:[s01].[f001]')"/>
                </date>
                <m:Message type="received">
                    <xsl:copy-of select="*"/>
                </m:Message>
            </item>
        </wrapper>
    </xsl:template>

</xsl:stylesheet>

I added a variable for the timezone adjustment for Messages.

I fixed the attribute names, so the items now sort correctly.
