XSLT: Merging two log files with different structure and time-representation


Question

As requested by Dimitre Novatchev, I created a new question because some parts of the old question changed.

I need to merge two XML log files (up to 700 MB). One log file contains a trace with position updates; the other contains the received messages. There can be several received messages with no position update in between, and vice versa.

Both logs have timestamps with millisecond precision (123 in these examples):

  • The trace log uses <date> (e.g. 14.7.2012 11:08:07.123)
  • The message log uses a Unix timestamp in milliseconds, <timeStamp> (e.g. 1342264087123)
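
To make the relation between the two formats concrete, here is a minimal XSLT 2.0 sketch that turns the Unix timestamp from the example into the trace log's date format. The function name f:millis-to-dateTime and its namespace are made up for this illustration, and the sketch assumes that both logs use UTC. It builds the duration from its lexical form rather than by multiplication, so no rounding is involved.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="http://www.example.com/functions"
    exclude-result-prefixes="xs f"
    version="2.0">

    <xsl:output method="text"/>

    <!-- Hypothetical helper: Unix time in milliseconds to xs:dateTime (UTC assumed). -->
    <xsl:function name="f:millis-to-dateTime" as="xs:dateTime">
        <xsl:param name="millis" as="xs:integer"/>
        <!-- Build a duration such as PT1342264087.123S from whole seconds and milliseconds. -->
        <xsl:variable name="sinceEpoch" as="xs:dayTimeDuration"
            select="xs:dayTimeDuration(concat('PT', $millis idiv 1000, '.',
                    format-number($millis mod 1000, '000'), 'S'))"/>
        <xsl:sequence select="xs:dateTime('1970-01-01T00:00:00Z') + $sinceEpoch"/>
    </xsl:function>

    <!-- Any XML input triggers this template; the input itself is not read. -->
    <xsl:template match="/">
        <!-- Same layout as the trace log: day.month.year hours:minutes:seconds.milliseconds -->
        <xsl:value-of select="format-dateTime(f:millis-to-dateTime(1342264087123),
                              '[D1].[M1].[Y0001] [H01]:[m01]:[s01].[f001]')"/>
    </xsl:template>

</xsl:stylesheet>
```

Run against any XML input, this should print 14.7.2012 11:08:07.123, which matches the two example values above.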

The message log also contains other <timeStamp> elements, but only the one at the path messageList/Message/originator/originatorPosition/timeStamp is relevant.

The following structures are slightly simplified: additional content such as "acceleration" is left out. This additional content just needs to be copied along with the rest of each message or item.

The position trace is structured as follows:

<itemList>
    <item>
        <date>14.7.2012 12:13:05.123</date>
        <FilteredPosition>
            <Latitude>51.12235</Latitude>
            <Longitude>9.347214</Longitude>
        </FilteredPosition>
    </item>
    <item>
        <date>14.7.2012 12:13:07.456</date>
        <FilteredPosition>
            <Latitude>51.12235</Latitude>
            <Longitude>9.347214</Longitude>
        </FilteredPosition>
    </item>
</itemList>

The message log is structured as follows:

<messageList>
    <Message>
        <messageId>1234</messageId>
        <originator>
            <originatorPosition>
                <nodeId>2345</nodeId>
                <timeStamp>1342264087061</timeStamp>
            </originatorPosition>
            <senderPosition>
                <nodeId>2345</nodeId>
                <timeStamp>1342264087234</timeStamp>
            </senderPosition>
            <medium></medium>
        </originator>
        <MessagePayload>
           <generationTime>
              <timeStamp>1342264087</timeStamp>
              <milliSec>42</milliSec>
           </generationTime>
        </MessagePayload>
    </Message>
    <Message>
        <messageId>1234</messageId>
        <originator>
            <originatorPosition>
                <nodeId>2345</nodeId>
                <timeStamp>1342264088064</timeStamp>
            </originatorPosition>
            <senderPosition>
                <nodeId>2345</nodeId>
                <timeStamp>1342264088254</timeStamp>
            </senderPosition>
            <medium></medium>
        </originator>
        <MessagePayload>
           <generationTime>
              <timeStamp>1342264088</timeStamp>
              <milliSec>42</milliSec>
           </generationTime>
        </MessagePayload>
    </Message>
</messageList>

When merging, the timestamps should be read (which means converting and comparing "date" and "timeStamp" values, including milliseconds, using the format "14.7.2012 11:08:07.123"), and all positions and messages should be added in chronological order.

The position data can be added as it is. Each message, however, should be placed inside an <item> element, a <date> element should be added (based on the message's Unix time in milliseconds), and the <Message> element should be replaced by <m:Message type="received">. The items are placed inside the root <itemList>, just as in the position trace.

The result could look like this:

<itemList>
    <item>
        <date>14.7.2012 12:13:05.123</date>
        <FilteredPosition>
            <Latitude>51.12235</Latitude>
            <Longitude>9.347214</Longitude>
        </FilteredPosition>
    </item>
    <item>
        <date>14.7.2012 12:13:07.061</date>
        <m:Message type="received">
            <messageId>1234</messageId>
            <originator>
                <originatorPosition>
                    <nodeId>2345</nodeId>
                    <timeStamp>1342264087061</timeStamp>
                </originatorPosition>
                <senderPosition>
                    <nodeId>2345</nodeId>
                    <timeStamp>1342264087234</timeStamp>
                </senderPosition>
                <medium></medium>
            </originator>
            <MessagePayload>
               <generationTime>
                  <timeStamp>1342264087</timeStamp>
                  <milliSec>42</milliSec>
               </generationTime>
            </MessagePayload>
        </m:Message>
    </item>
    <item>
        <date>14.7.2012 12:13:07.456</date>
        <FilteredPosition>
            <Latitude>51.12235</Latitude>
            <Longitude>9.347214</Longitude>
        </FilteredPosition>
    </item>
    <item>
        <date>14.7.2012 12:13:08.064</date>
        <m:Message type="received">
            <messageId>1234</messageId>
            <originator>
                <originatorPosition>
                    <nodeId>2345</nodeId>
                    <timeStamp>1342264088064</timeStamp>
                </originatorPosition>
                <senderPosition>
                    <nodeId>2345</nodeId>
                    <timeStamp>1342264088254</timeStamp>
                </senderPosition>
                <medium></medium>
            </originator>
            <MessagePayload>
               <generationTime>
                  <timeStamp>1342264088</timeStamp>
                  <milliSec>42</milliSec>
               </generationTime>
            </MessagePayload>
        </m:Message>
    </item>
</itemList>

The position log file also contains some <item> elements that have no timestamp (and no "FilteredPosition"). These items can be ignored and do not need to be copied.
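
In XSLT, this rule can be expressed with an empty template. The following is only a sketch, assuming the trace log is the input document and has the structure shown above; it copies dated items and silently drops the rest.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">

    <xsl:output indent="yes" method="xml"/>

    <!-- Rebuild the root element and process its items one by one. -->
    <xsl:template match="/itemList">
        <itemList>
            <xsl:apply-templates select="item"/>
        </itemList>
    </xsl:template>

    <!-- Items that carry a <date> are copied unchanged. -->
    <xsl:template match="item[date]">
        <xsl:copy-of select="."/>
    </xsl:template>

    <!-- Empty template: items without a <date> produce no output at all. -->
    <xsl:template match="item[not(date)]"/>

</xsl:stylesheet>
```

Without the empty template, XSLT's built-in rules would apply to the undated items and emit their text content, so stating the exclusion explicitly keeps the intent visible.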

I'd appreciate any help with the XSLT code, as I'm quite new to this topic... :-/

Recommended answer

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:m="http://www.example.com/"
    exclude-result-prefixes="xs"
    version="2.0">

    <xsl:output indent="yes" method="xml"/>

    <!-- The two source-documents. -->
    <xsl:variable name="doc1" select="doc('log1.xml')"/>
    <xsl:variable name="doc2" select="doc('log2.xml')"/>

    <!-- Timezone adjustment: hours by which the local time in the trace log is ahead of UTC. -->
    <xsl:variable name="timezoneAdjustment" select="1"/>

    <!-- Root template to start the transformation. -->
    <xsl:template match="/">
        <!-- Transform and collect all the elements -->
        <xsl:variable name="data" as="node()*">
            <xsl:apply-templates select="$doc1/itemList/item"/>
            <xsl:apply-templates select="$doc2/messageList/Message"/>
        </xsl:variable>
        <!-- Sort by the timestamp, and discard the wrapper. -->
        <itemList>
            <xsl:for-each select="$data">
                <xsl:sort select="@timestamp" data-type="number"/>
                <xsl:copy-of select="item"/>
            </xsl:for-each>
        </itemList>
    </xsl:template>

    <!--
        Template to transform <item> elements in the first format.
        It just parses the date, and adds a wrapper with the timestamp.
    -->
    <xsl:template match="item[date]">
        <xsl:variable name="dateTimeString" select="date" as="xs:string"/>
        <xsl:variable name="datePart" select="substring-before($dateTimeString,' ')"/>
        <xsl:variable name="day" select="xs:integer(substring-before($datePart,'.'))"/>
        <xsl:variable name="month" select="xs:integer(substring-before(substring-after($datePart,'.'),'.'))"/>
        <xsl:variable name="year" select="xs:integer(substring-after(substring-after($datePart,'.'),'.'))"/>
        <xsl:variable name="timePart" select="substring-after($dateTimeString,' ')"/>
        <xsl:variable name="reformatted" select="concat(format-number($year,'0000'),'-',format-number($month,'00'),'-',format-number($day,'00'),'T',$timePart)"/>
        <xsl:variable name="timestamp" select="( xs:dateTime($reformatted) - xs:dateTime('1970-01-01T00:00:00') - $timezoneAdjustment * xs:dayTimeDuration('PT1H') ) div xs:dayTimeDuration('PT0.001S')"/>
        <wrapper timestamp="{$timestamp}">
            <xsl:copy-of select="self::*"/>
        </wrapper>
    </xsl:template>

    <!--
        Template to transform <Message> elements in the second log format.
        It generates an item with the date, and wraps it with the timestamp.
    -->
    <xsl:template match="Message[originator/originatorPosition/timeStamp]">
        <xsl:variable name="timestamp" select="originator/originatorPosition/timeStamp" as="xs:integer"/>
        <xsl:variable name="date" select="xs:dateTime('1970-01-01T00:00:00') + $timezoneAdjustment * xs:dayTimeDuration('PT1H') + $timestamp * xs:dayTimeDuration('PT0.001S')"/>
        <wrapper timestamp="{$timestamp}">
            <item>
                <date>
                    <xsl:value-of select="format-dateTime($date,'[D1].[M1].[Y0001] [H01]:[m01]:[s01].[f001]')"/>
                </date>
                <m:Message type="received">
                    <xsl:copy-of select="*"/>
                </m:Message>
            </item>
        </wrapper>
    </xsl:template>

</xsl:stylesheet>

Edit: I added a variable for the timezone adjustment of messages.
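
If the offset differs between recordings, the variable could be declared as a stylesheet parameter instead, so the caller can override it without editing the stylesheet. This is a minimal sketch, assuming the offset is a whole number of hours. The sample dateTime is only there to make the effect visible, and how a parameter is passed depends on the XSLT processor.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="2.0">

    <xsl:output method="text"/>

    <!-- Offset in hours; 1 is the default and can be overridden by the caller. -->
    <xsl:param name="timezoneAdjustment" as="xs:integer" select="1"/>

    <!-- Any XML input triggers this template; the input itself is not read. -->
    <xsl:template match="/">
        <!-- Sample value for illustration: the example timestamp as a UTC dateTime. -->
        <xsl:variable name="utc" select="xs:dateTime('2012-07-14T11:08:07.123')"/>
        <!-- Shift the value by the configured number of hours. -->
        <xsl:value-of select="$utc + $timezoneAdjustment * xs:dayTimeDuration('PT1H')"/>
    </xsl:template>

</xsl:stylesheet>
```

With the default of 1, the sample value is shifted to 2012-07-14T12:08:07.123.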

Edit: Fixed the attribute names so that the items sort correctly.
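
The attribute name matters because the sort key is read from each wrapper. If the name in the select expression does not match the attribute that was written, every key is empty and the sort has nothing to order by. The sketch below isolates that sorting step with two hand-written wrappers; the timestamps 1000 and 900 and the item contents are invented for illustration.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">

    <xsl:output indent="yes" method="xml"/>

    <!-- Any XML input triggers this template; the input itself is not read. -->
    <xsl:template match="/">
        <!-- Two wrappers, deliberately listed out of order. -->
        <xsl:variable name="data" as="element(wrapper)*">
            <wrapper timestamp="1000"><item>later</item></wrapper>
            <wrapper timestamp="900"><item>earlier</item></wrapper>
        </xsl:variable>
        <itemList>
            <xsl:for-each select="$data">
                <!-- Numeric comparison puts 900 before 1000; a text comparison would not. -->
                <xsl:sort select="@timestamp" data-type="number"/>
                <!-- Emit only the wrapped item, discarding the wrapper. -->
                <xsl:copy-of select="item"/>
            </xsl:for-each>
        </itemList>
    </xsl:template>

</xsl:stylesheet>
```

The result lists the "earlier" item first. Sorting numerically also keeps the order right if the timestamps ever differ in their number of digits.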
