xslt-processor gives back only a small subset of the requested/matched tags


Problem description

I have an extremely large XML file from the field of geoinformatics. I got it from a German subsite of the OpenStreetMap project: a geographical engineering site that delivers a weekly snapshot of OpenStreetMap for a certain area. I took germany.osm.bz2 from here: http://ftp5.gwdg.de/pub/misc/openstreetmap/download.geofabrik.de/

To do some tests with XSLT, I want to run a query that finds certain entities. Take restaurants as an example: we want to find all the restaurants in the area.

We can run that directly on the bz2-compressed file that we downloaded, for example with the following command:

bzcat germany.osm.bz2 | xsltproc restaurants.xslt - > restaurants.csv

Instead, I split the file with xml_split, which is a great Perl tool from CPAN.

The problem: with the following XSLT stylesheet I only get poor results. The files are not parsed completely, and I only get a small subset of the information when I run the code on an XML file. See the stylesheet below, followed by a small data chunk from the file I parsed. If you want to check it, just fetch the small dataset; note that it is one of the split files.

You can get it here: https://rapidshare.com/#!download|643p12|2523227518|germany-001.xml|100000

Note the important lines: xmlns:xml_split="http://xmltwig.com/xml_split" and this one:

 <xsl:for-each select="xml_split:root/node/tag[@k='amenity' and @v='restaurant']">

Note: you can run a little test and see how long the parsing takes:

time xsltproc restaurants.xslt germany-001.xml > restaurants-001.csv

real    0m0.308s
user    0m0.283s
sys     0m0.022s

Here is the XSLT stylesheet that contains the code for parsing (called atest3.xslt):

<xsl:stylesheet version = '1.0'
        xmlns="http://www.w3.org/1999/xhtml"
        xmlns:xml_split="http://xmltwig.com/xml_split"
        xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>

    <xsl:output method="text" encoding="UTF-8"/>
    <xsl:template match="/">

            <xsl:for-each select="xml_split:root/node/tag[@k='amenity' and @v='restaurant']">
            <xsl:value-of select="../@id"/>
            <xsl:text>&#x09;</xsl:text>
            <xsl:value-of select="../@lat"/>
            <xsl:text>&#x09;</xsl:text>
            <xsl:value-of select="../@lon"/>
            <xsl:text>&#x09;</xsl:text>
            <xsl:for-each select="../tag[@k='name']">
                <xsl:value-of select="@v"/>
            </xsl:for-each>
            <xsl:text>&#x0A;</xsl:text>
        <xsl:value-of select="./tag[@k = 'cuisine']/@v"/>
        <xsl:text>&#x09;</xsl:text>
        <xsl:value-of select="./tag[@k = 'wheelchair']/@v"/>
        <xsl:text>&#x09;</xsl:text>
        <xsl:value-of select="./tag[@k = 'website']/@v"/>
        <xsl:text>&#x09;</xsl:text>
        <xsl:value-of select="./tag[@k = 'addr:country']/@v"/>
        <xsl:text>&#x09;</xsl:text>
        <xsl:value-of select="./tag[@k = 'addr:city']/@v"/>
        <xsl:text>&#x09;</xsl:text>        
        <xsl:value-of select="./tag[@k = 'addr:street']/@v"/>
        <xsl:text>&#x09;</xsl:text>
        <xsl:value-of select="./tag[@k = 'addr:housenumber']/@v"/>
        <xsl:text>&#x0A;</xsl:text>
    </xsl:for-each>
    </xsl:template>

</xsl:stylesheet>

And below is a data chunk from the XML file that we parsed:

<node id="52768810" lat="48.2044749" lon="11.3249434" version="7" changeset="9490517" user="wheelmap_visitor" uid="290680" timestamp="2011-10-07T20:24:46Z">
    <tag k="addr:city" v="Olching" />
    <tag k="addr:country" v="DE" />
    <tag k="addr:housenumber" v="72" />
    <tag k="addr:postcode" v="82140" />
    <tag k="addr:street" v="Hauptstraße" />
    <tag k="amenity" v="restaurant" />
    <tag k="cuisine" v="mexican" />
    <tag k="email" v="info@cantina-olching.de" />
    <tag k="name" v="La Cantina" />
    <tag k="opening_hours" v="Mo-Su 17:00-01:00" />
    <tag k="phone" v="+49 (8142) 444393" />
    <tag k="website" v="http://www.cantina-olching.com/" />
    <tag k="wheelchair" v="no" />
</node>

See the results. Unfortunately, some parts are missing:

51923772    49.0812534  8.5637183   Zur Talschänke

52040576    49.4635433  12.4287292  Emil-Kemmer-Haus

52141326    49.4144243  12.4143153  Gasthaus Plecher

52623232    48.9293634  8.2722549   Korfu

52664989    49.0435133  8.3919370   Restaurant Zentrum

52754898    49.3243828  12.3618662  Gasthaus Irlbacher

52762875    49.0099641  8.2528132   Langasthof Stober

52765672    50.0082768  9.2139632   Wirtshaus im Frohnrad

52768810    48.2044749  11.3249434  La Cantina

52768816    48.2051698  11.3257964  Indian Palace

52768826    48.2073264  11.3276147  Dorfstub'n

52768830    48.2075968  11.3281055  Le Candele

52774284    49.0319471  8.2888353   Zum Anker

It is a problem that I get these results. I've tried a lot, but at the moment I am clueless as to why I get so little output; it is totally contrary to the tags I have in the XSLT stylesheet. Any ideas and hints will be greatly appreciated.

By the way: in the end I want to process approximately 5000 files that are the result of the split, and subsequently collect all the results in a MySQL database.

You can get the original file here: http://ftp5.gwdg.de/pub/misc/openstreetmap/download.geofabrik.de (germany.osm.bz2, 01-Apr-2012 14:51, 1.7G)

And here is the split one: https://rapidshare.com/#!download|643p12|2523227518|germany-001.xml|100000

I have to refactor the code, so the question is: how can I get the results into MySQL in an efficient way?

Update: thanks to the first answer in this thread I started to refactor the code, but I still do not get better results, so I have to try again. Lots of changes were suggested. I did a quick walkthrough of the XSLT stylesheet; the first attempt at refactoring gave me some odd results. I will try again: I will go through all the stylesheet code, take a closer look for errors, and finally refactor the whole XSLT file. Any pointers, suggestions, or code snippets are very welcome. Greetings, zero

Recommended answer

It looks like your ./tag[@k = '???']/@v XPath should be ../tag[@k='???']/@v, because your context node is the originally matched tag element, not the node element. A tag element has no tag children, so every ./tag[...] lookup returns nothing.

You should consider changing your context node to make this code clearer and avoid errors like this:

<xsl:for-each select="xml_split:root/node[tag[@k='amenity' and @v='restaurant']]">

Then you can use XPaths like select="@id" and select="tag[@k='addr:country']/@v".
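With the node element as the context node, the stylesheet from the question could look roughly like the sketch below. It assumes the split file keeps the xml_split:root wrapper shown in the question and that one tab-separated line per restaurant is wanted (the original emitted a line break after the name); adjust the separators if a different layout is needed.

```xml
<xsl:stylesheet version="1.0"
        xmlns:xml_split="http://xmltwig.com/xml_split"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="text" encoding="UTF-8"/>

    <xsl:template match="/">
        <!-- Select node elements that have a restaurant tag; inside the loop
             the context node is the node element, not the tag element. -->
        <xsl:for-each select="xml_split:root/node[tag[@k='amenity' and @v='restaurant']]">
            <!-- Attributes of the node itself -->
            <xsl:value-of select="@id"/>
            <xsl:text>&#x09;</xsl:text>
            <xsl:value-of select="@lat"/>
            <xsl:text>&#x09;</xsl:text>
            <xsl:value-of select="@lon"/>
            <xsl:text>&#x09;</xsl:text>
            <!-- Child tag lookups; a missing tag yields an empty field -->
            <xsl:value-of select="tag[@k='name']/@v"/>
            <xsl:text>&#x09;</xsl:text>
            <xsl:value-of select="tag[@k='cuisine']/@v"/>
            <xsl:text>&#x09;</xsl:text>
            <xsl:value-of select="tag[@k='wheelchair']/@v"/>
            <xsl:text>&#x09;</xsl:text>
            <xsl:value-of select="tag[@k='website']/@v"/>
            <xsl:text>&#x09;</xsl:text>
            <xsl:value-of select="tag[@k='addr:country']/@v"/>
            <xsl:text>&#x09;</xsl:text>
            <xsl:value-of select="tag[@k='addr:city']/@v"/>
            <xsl:text>&#x09;</xsl:text>
            <xsl:value-of select="tag[@k='addr:street']/@v"/>
            <xsl:text>&#x09;</xsl:text>
            <xsl:value-of select="tag[@k='addr:housenumber']/@v"/>
            <xsl:text>&#x0A;</xsl:text>
        </xsl:for-each>
    </xsl:template>

</xsl:stylesheet>
```

It can be run the same way as in the question, for example with xsltproc against germany-001.xml.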

But you should consider refactoring this code to make better use of templates instead of for-each.
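One possible shape for such a template-based version is sketched below. It is only an illustration: the named template field and its key parameter are hypothetical helpers introduced here, and the one-line-per-restaurant, tab-separated layout is an assumption rather than something the question specifies.

```xml
<xsl:stylesheet version="1.0"
        xmlns:xml_split="http://xmltwig.com/xml_split"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="text" encoding="UTF-8"/>

    <!-- Hand only the restaurant nodes to the templates below -->
    <xsl:template match="/">
        <xsl:apply-templates
            select="xml_split:root/node[tag[@k='amenity' and @v='restaurant']]"/>
    </xsl:template>

    <!-- One output line per selected node element -->
    <xsl:template match="node">
        <xsl:value-of select="@id"/>
        <xsl:text>&#x09;</xsl:text>
        <xsl:value-of select="@lat"/>
        <xsl:text>&#x09;</xsl:text>
        <xsl:value-of select="@lon"/>
        <xsl:call-template name="field"><xsl:with-param name="key" select="'name'"/></xsl:call-template>
        <xsl:call-template name="field"><xsl:with-param name="key" select="'cuisine'"/></xsl:call-template>
        <xsl:call-template name="field"><xsl:with-param name="key" select="'wheelchair'"/></xsl:call-template>
        <xsl:call-template name="field"><xsl:with-param name="key" select="'website'"/></xsl:call-template>
        <xsl:call-template name="field"><xsl:with-param name="key" select="'addr:country'"/></xsl:call-template>
        <xsl:call-template name="field"><xsl:with-param name="key" select="'addr:city'"/></xsl:call-template>
        <xsl:call-template name="field"><xsl:with-param name="key" select="'addr:street'"/></xsl:call-template>
        <xsl:call-template name="field"><xsl:with-param name="key" select="'addr:housenumber'"/></xsl:call-template>
        <xsl:text>&#x0A;</xsl:text>
    </xsl:template>

    <!-- Hypothetical helper: writes a tab, then the v attribute of the tag
         whose k equals $key (empty if the node has no such tag).
         call-template keeps the caller's context node, so tag[...] is
         still evaluated relative to the node element. -->
    <xsl:template name="field">
        <xsl:param name="key"/>
        <xsl:text>&#x09;</xsl:text>
        <xsl:value-of select="tag[@k=$key]/@v"/>
    </xsl:template>

</xsl:stylesheet>
```

Keeping the field lookup in one place means a new column is a single call-template line, and the relative path to the tag children is written only once.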
