XSLT: Merging two log files with different structure and time-representation
Question
As asked by Dimitre Novatchev I created a new question, as some parts of the old question changed.
(Link to the old question: Merging two different XML log files (trace and messages) using date and timestamp?)
I need to merge two XML log files (up to 700MB). One log file contains a trace with position updates. The other log file contains the received messages. There can be multiple received messages without a position update in between, and the other way round.
Both logs have timestamps including milliseconds (123 in this example):
- The trace log uses <date> (e.g. 14.7.2012 11:08:07.123)
- The message log uses a Unix timestamp in milliseconds, <timeStamp> (e.g. 1342264087123)
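To make the relationship between the two representations concrete, here is a minimal XSLT 2.0 sketch that converts the example <date> value into Unix milliseconds. It assumes the trace time is in UTC (the question does not state the time zone) and that the time part always has two-digit hours; the variable names are illustrative only.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">
  <xsl:output method="text"/>
  <!-- Runs against any input document; the input itself is not read. -->
  <xsl:template match="/">
    <!-- Example value from the question, in D.M.YYYY HH:MM:SS.mmm form -->
    <xsl:variable name="raw" select="'14.7.2012 11:08:07.123'"/>
    <!-- Split the date part into day, month and year -->
    <xsl:variable name="d" select="tokenize(substring-before($raw, ' '), '\.')"/>
    <!-- Rebuild as ISO 8601 so that xs:dateTime() accepts it -->
    <xsl:variable name="iso"
        select="concat($d[3], '-',
                       format-number(number($d[2]), '00'), '-',
                       format-number(number($d[1]), '00'),
                       'T', substring-after($raw, ' '))"/>
    <!-- Milliseconds since the epoch, assuming the trace time is UTC -->
    <xsl:value-of
        select="(xs:dateTime($iso) - xs:dateTime('1970-01-01T00:00:00'))
                div xs:dayTimeDuration('PT0.001S')"/>
  </xsl:template>
</xsl:stylesheet>
```

Under the UTC assumption this yields 1342264087123, the Unix timestamp given for the same instant in the list above. If the trace was recorded in local time, a fixed offset has to be subtracted first.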
There are also other <timeStamp> elements in the message log, but only the one at the path messageList/Message/originator/originatorPosition/timeStamp is relevant.
The following structures are slightly simplified, as additional content like "acceleration" etc. is left out. This additional content just needs to be copied together with the rest of the messages/items.
The structure of the position trace is as follows:
<itemList>
<item>
<date>14.7.2012 12:13:05.123</date>
<FilteredPosition>
<Latitude>51.12235</Latitude>
<Longitude>9.347214</Longitude>
</FilteredPosition>
</item>
<item>
<date>14.7.2012 12:13:07.456</date>
<FilteredPosition>
<Latitude>51.12235</Latitude>
<Longitude>9.347214</Longitude>
</FilteredPosition>
</item>
</itemList>
The structure of the message log is as follows:
<messageList>
<Message>
<messageId>1234</messageId>
<originator>
<originatorPosition>
<nodeId>2345</nodeId>
<timeStamp>1342264087061</timeStamp>
</originatorPosition>
<senderPosition>
<nodeId>2345</nodeId>
<timeStamp>1342264087234</timeStamp>
</senderPosition>
<medium></medium>
</originator>
<MessagePayload>
<generationTime>
<timeStamp>1342264087</timeStamp>
<milliSec>42</milliSec>
</generationTime>
</MessagePayload>
</Message>
<Message>
<messageId>1234</messageId>
<originator>
<originatorPosition>
<nodeId>2345</nodeId>
<timeStamp>1342264088064</timeStamp>
</originatorPosition>
<senderPosition>
<nodeId>2345</nodeId>
<timeStamp>1342264088254</timeStamp>
</senderPosition>
<medium></medium>
</originator>
<MessagePayload>
<generationTime>
<timeStamp>1342264088</timeStamp>
<milliSec>42</milliSec>
</generationTime>
</MessagePayload>
</Message>
</messageList>
When merging, the timestamps should be read (which means converting and comparing "date" and "timestamp", including milliseconds, in the format "14.7.2012 11:08:07.123") and all positions and messages should be added in the right order.
The position data can just be added as it is. However, each message should be placed inside <item> tags, a <date> tag should be added (based on the message's Unix time with milliseconds) and the <Message> tag should be replaced by <m:Message type="received"> tags. The items are placed within the root <itemList>, just as in the position trace.
The result might look like this:
<itemList>
<item>
<date>14.7.2012 12:13:05.123</date>
<FilteredPosition>
<Latitude>51.12235</Latitude>
<Longitude>9.347214</Longitude>
</FilteredPosition>
</item>
<item>
<date>14.7.2012 12:13:07.061</date>
<m:Message type="received">
<messageId>1234</messageId>
<originator>
<originatorPosition>
<nodeId>2345</nodeId>
<timeStamp>1342264087061</timeStamp>
</originatorPosition>
<senderPosition>
<nodeId>2345</nodeId>
<timeStamp>1342264087234</timeStamp>
</senderPosition>
<medium></medium>
</originator>
<MessagePayload>
<generationTime>
<timeStamp>1342264087</timeStamp>
<milliSec>42</milliSec>
</generationTime>
</MessagePayload>
</m:Message>
</item>
<item>
<date>14.7.2012 12:13:07.456</date>
<FilteredPosition>
<Latitude>51.12235</Latitude>
<Longitude>9.347214</Longitude>
</FilteredPosition>
</item>
<item>
<date>14.7.2012 12:13:08.064</date>
<m:Message type="received">
<messageId>1234</messageId>
<originator>
<originatorPosition>
<nodeId>2345</nodeId>
<timeStamp>1342264088064</timeStamp>
</originatorPosition>
<senderPosition>
<nodeId>2345</nodeId>
<timeStamp>1342264088254</timeStamp>
</senderPosition>
<medium></medium>
</originator>
<MessagePayload>
<generationTime>
<timeStamp>1342264088</timeStamp>
<milliSec>42</milliSec>
</generationTime>
</MessagePayload>
</m:Message>
</item>
</itemList>
The position log file also contains some <item> elements that have no timestamp (and no "FilteredPosition"). These items can be ignored and do not need to be copied.
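In XSLT terms, one common way to skip such items is an empty template rule. The sketch below assumes that "no timestamp" means the <item> has no <date> child; it copies a position log unchanged except for those items, and is only meant to show the pattern in isolation.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <!-- Identity rule: copy everything that has no more specific rule -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Empty rule: items without a <date> child produce no output -->
  <xsl:template match="item[not(date)]"/>
</xsl:stylesheet>
```

The empty rule wins over the identity rule because a pattern with a predicate has a higher default priority than node().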
I'd appreciate any help with the XSLT code, as I'm quite new to this topic... :-/
Recommended answer
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:m="http://www.example.com/"
exclude-result-prefixes="xs"
version="2.0">
<xsl:output indent="yes" method="xml"/>
<!-- The two source-documents. -->
<xsl:variable name="doc1" select="doc('log1.xml')"/>
<xsl:variable name="doc2" select="doc('log2.xml')"/>
<!-- Timezone adjustment -->
<xsl:variable name="timezoneAdjustment" select="1"/>
<!-- Root template to start the transformation. -->
<xsl:template match="/">
<!-- Transform and collect all the elements -->
<xsl:variable name="data" as="node()*">
<xsl:apply-templates select="$doc1/itemList/item"/>
<xsl:apply-templates select="$doc2/messageList/Message"/>
</xsl:variable>
<!-- Sort by the timestamp, and discard the wrapper. -->
<itemList>
<xsl:for-each select="$data">
<xsl:sort select="@timestamp" data-type="number"/>
<xsl:copy-of select="item"/>
</xsl:for-each>
</itemList>
</xsl:template>
<!--
Template to transform <item> elements in the first format.
It just parses the date, and adds a wrapper with the timestamp.
-->
<xsl:template match="item[date]">
<xsl:variable name="dateTimeString" select="date" as="xs:string"/>
<xsl:variable name="datePart" select="substring-before($dateTimeString,' ')"/>
<xsl:variable name="day" select="xs:integer(substring-before($datePart,'.'))"/>
<xsl:variable name="month" select="xs:integer(substring-before(substring-after($datePart,'.'),'.'))"/>
<xsl:variable name="year" select="xs:integer(substring-after(substring-after($datePart,'.'),'.'))"/>
<xsl:variable name="timePart" select="substring-after($dateTimeString,' ')"/>
<xsl:variable name="reformatted" select="concat(format-number($year,'0000'),'-',format-number($month,'00'),'-',format-number($day,'00'),'T',$timePart)"/>
<xsl:variable name="timestamp" select="( xs:dateTime($reformatted) - xs:dateTime('1970-01-01T00:00:00') - $timezoneAdjustment * xs:dayTimeDuration('PT1H') ) div xs:dayTimeDuration('PT0.001S')"/>
<wrapper timestamp="{$timestamp}">
<xsl:copy-of select="self::*"/>
</wrapper>
</xsl:template>
<!--
Template to transform <Message> elements in the second log format.
It generates an item with the date, and wraps it with the timestamp.
-->
<xsl:template match="Message[originator/originatorPosition/timeStamp]">
<xsl:variable name="timestamp" select="originator/originatorPosition/timeStamp" as="xs:integer"/>
<xsl:variable name="date" select="xs:dateTime('1970-01-01T00:00:00') + $timezoneAdjustment * xs:dayTimeDuration('PT1H') + $timestamp * xs:dayTimeDuration('PT0.001S')"/>
<wrapper timestamp="{$timestamp}">
<item>
<date>
<xsl:value-of select="format-dateTime($date,'[D1].[M1].[Y0001] [H01]:[m01]:[s01].[f001]')"/>
</date>
<m:Message type="received">
<xsl:copy-of select="*"/>
</m:Message>
</item>
</wrapper>
</xsl:template>
</xsl:stylesheet>
I added a variable for the timezone adjustment of Messages.
Fixed the attribute names, so the items will sort correctly.
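The one-hour offset is hard-coded in the stylesheet. As a sketch, the same value could instead be declared as a stylesheet parameter so that it can be supplied when the transformation is run. How a parameter is passed depends on the processor; the default of 1 mirrors the variable above, and the literal timestamp is just the first message from the question.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">
  <xsl:output method="text"/>
  <!-- Offset from UTC in whole hours; can be overridden by the caller -->
  <xsl:param name="timezoneAdjustment" select="1" as="xs:integer"/>
  <!-- Runs against any input document; the input itself is not read. -->
  <xsl:template match="/">
    <!-- Unix time in milliseconds of the first example message -->
    <xsl:variable name="timestamp" select="1342264087061" as="xs:integer"/>
    <!-- Epoch + offset + elapsed milliseconds -->
    <xsl:variable name="date"
        select="xs:dateTime('1970-01-01T00:00:00')
                + $timezoneAdjustment * xs:dayTimeDuration('PT1H')
                + $timestamp * xs:dayTimeDuration('PT0.001S')"/>
    <!-- Same layout as the trace log: day.month.year hours:minutes:seconds.millis -->
    <xsl:value-of
        select="format-dateTime($date, '[D1].[M1].[Y0001] [H01]:[m01]:[s01].[f001]')"/>
  </xsl:template>
</xsl:stylesheet>
```

The example timestamp corresponds to 11:08:07.061 UTC on 14 July 2012, so with the default offset the formatted hour is 12. The same offset must be used in both templates, otherwise trace items and messages no longer sort against each other correctly.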