How to obtain Element values from a KML by using lxml


Problem description

My problem is very similar to the one found here:

How to pull data from KML/XML?

The answer to the above question is to use Nokogiri to fix the format.

I wonder if there is a way to solve a similar problem without fixing the format first.

How can I get the values of the dict, so that I can get 'FM2' and 'FM3' from the Element SimpleData below?

Here is my kml:

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2" xmlns:kml="http://www.opengis.net/kml/2.2" xmlns:atom="http://www.w3.org/2005/Atom">
<Document>
    <name>Test.kml</name>
    <open>1</open>
    <Schema name="test" id="S_test_SSSSSIIIDSDDDDDISSSDSSSDD">
        <SimpleField type="string" name="ID"> <displayName>&lt;b&gt;ID&lt;/b&gt;</displayName>
        </SimpleField>
        <SimpleField type="string" name="cname"><displayName>&lt;b&gt;cname&lt;/b&gt;</displayName>
        </SimpleField>
    </Schema>
    <Style id="falseColor01">
        <BalloonStyle>
            <text><![CDATA[<table border="0"><tr> 
            <td><b>ID</b></td><td>$[test/ID]</td></tr>
            <tr><td><b>cname</b></td><td>$[test/cname]</td></tr>
            </table>]]></text>
        </BalloonStyle>
        <LineStyle>
            <color>ffffff00</color>
            <width>3</width>
        </LineStyle>
        <PolyStyle>
            <color>ffffff00</color>
            <colorMode>random</colorMode>
            <fill>0</fill>
        </PolyStyle>
    </Style>
    <StyleMap id="falseColor0">
        <Pair>
            <key>normal</key>
            <styleUrl>#falseColor00</styleUrl>
        </Pair>
        <Pair>
            <key>highlight</key>
            <styleUrl>#falseColor01</styleUrl>
        </Pair>
    </StyleMap>
    <Style id="falseColor00">
      <BalloonStyle>   
      </BalloonStyle>
        <LineStyle>
            <color>ffffff00</color>
            <width>3</width>
        </LineStyle>
        <PolyStyle>
            <color>ffffff00</color>
            <colorMode>random</colorMode>
            <fill>0</fill>
        </PolyStyle>
    </Style>
    <Folder id="layer 0">
        <name>Test_1</name>
        <open>1</open>
        <Placemark>
            <styleUrl>#falseColor0</styleUrl>
            <ExtendedData>
                <SchemaData schemaUrl="#S_test_SSSSSIIIDSDDDDDISSSDSSSDD">
                    <SimpleData name="ID">FM2</SimpleData>
                    <SimpleData name="cname">FM2</SimpleData>
                </SchemaData>
            </ExtendedData>
            <Polygon>
                <outerBoundaryIs>
                    <LinearRing>
                        <coordinates>150.889999,-32.17281600000001,0 
                        </coordinates>
                    </LinearRing>
                </outerBoundaryIs>
            </Polygon>
        </Placemark>
        <Placemark>
            <styleUrl>#falseColor0</styleUrl>
            <ExtendedData>
                <SchemaData schemaUrl="#S_test_SSSSSIIIDSDDDDDISSSDSSSDD">
                    <SimpleData name="ID">FM3</SimpleData>
                    <SimpleData name="cname">FM3</SimpleData>
                </SchemaData>
            </ExtendedData>
            <Polygon>
                <outerBoundaryIs>
                    <LinearRing>
                        <coordinates>150.90104,-32.15662800000001,0
                        </coordinates>
                    </LinearRing>
                </outerBoundaryIs>
            </Polygon>
        </Placemark>
    </Folder>
</Document>
</kml>

My aim is to obtain the element values (for example 'FM2') from the SimpleData elements whose name attribute is 'ID'.

I'm trying to use lxml etree. My code is:

from lxml import etree as ET

tree = ET.parse(kml_file)  # kml_file: path to the KML shown above
root = tree.getroot()

for Document in root:
    for Folder in Document:
        for Placemark in Folder:
            for ExtendedData in Placemark:
                for SchemaData in ExtendedData:
                    for SimpleData in SchemaData:
                        print(SimpleData.attrib)

and the output is: {'name': 'ID'} {'name': 'cname'}

How can I get the values of the dict, so that I can get 'FM2' and 'FM3'?

I have spent hours trying to solve the problem. Any help would be much appreciated.

Solution

One of the issues you're having is that when you do for x in y you're iterating over all children of the current element.

So when you do this:

for Folder in Document:
    ...

you're not just iterating over Folder elements; you're also iterating over name, open, Schema, Style, and StyleMap (ignoring the namespace for now).
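To make this concrete, here is a minimal sketch. The KML string is a hypothetical, trimmed-down stand-in for the file in the question, and the variable names are only illustrative. It lists the local names of every child that plain iteration visits, then shows lxml's iterchildren() with a namespace-qualified tag filter as one way to visit only the Folder elements.

```python
from lxml import etree

KML_NS = "http://www.opengis.net/kml/2.2"

# Hypothetical, trimmed-down stand-in for the KML in the question.
KML = b"""<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <name>Test.kml</name>
    <open>1</open>
    <Schema name="test" id="S_test"/>
    <Style id="falseColor01"/>
    <Folder id="layer 0"><name>Test_1</name></Folder>
  </Document>
</kml>"""

root = etree.fromstring(KML)
document = root[0]  # the single Document child of kml

# Plain iteration yields every child element, whatever its tag.
all_children = [etree.QName(child).localname for child in document]

# iterchildren() accepts a tag filter; the tag must include the namespace.
folders = list(document.iterchildren(f"{{{KML_NS}}}Folder"))

print(all_children)
print([folder.get("id") for folder in folders])
```

The filter only helps one level at a time, so the nesting stays as deep as before; that is part of why XPath ends up being the more convenient tool here.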

You could still get what you want by testing the name attribute value and then returning the element's text...

for Document in root:
    for Folder in Document:
        for Placemark in Folder:
            for ExtendedData in Placemark:
                for SchemaData in ExtendedData:
                    for SimpleData in SchemaData:
                        if SimpleData.get("name") == "ID":
                            print(SimpleData.text)

but I would not recommend it.

Instead consider using XPath 1.0 with lxml's xpath() function.

This will allow you to directly target the elements you're interested in.

For this example I'm going to use the full path instead of the // abbreviated syntax. I'll also use a predicate to test the attribute value.

At first glance you would think that the XPath to all of the SimpleData elements with a name attribute value of "ID" would be:

/kml/Document/Folder/Placemark/ExtendedData/SchemaData/SimpleData[@name='ID']

but this is not the case. Notice the xmlns="http://www.opengis.net/kml/2.2" declaration on the root (kml) element. It means that element and all of its descendant elements are in the default namespace http://www.opengis.net/kml/2.2 (unless declared otherwise on those elements), and an unprefixed name in XPath 1.0 only matches elements that are in no namespace.
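A small sketch of the difference, using a hypothetical cut-down version of the KML (one Placemark only). The unprefixed path matches nothing, while the same path with a prefix bound to the KML namespace finds the element. The prefix name kml is an arbitrary choice for the sketch; only the namespace URI has to match the document.

```python
from lxml import etree

# Hypothetical, cut-down version of the KML from the question.
KML = b"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Folder>
      <Placemark>
        <ExtendedData>
          <SchemaData>
            <SimpleData name="ID">FM2</SimpleData>
            <SimpleData name="cname">FM2</SimpleData>
          </SchemaData>
        </ExtendedData>
      </Placemark>
    </Folder>
  </Document>
</kml>"""

root = etree.fromstring(KML)

# Unprefixed names only match elements in no namespace, so this is empty.
unqualified = root.xpath(
    "/kml/Document/Folder/Placemark/ExtendedData/SchemaData"
    "/SimpleData[@name='ID']"
)

# Binding a prefix to the namespace URI makes the same path match.
ns = {"kml": "http://www.opengis.net/kml/2.2"}
qualified = root.xpath(
    "/kml:kml/kml:Document/kml:Folder/kml:Placemark/kml:ExtendedData"
    "/kml:SchemaData/kml:SimpleData[@name='ID']",
    namespaces=ns,
)

print(unqualified)
print([el.text for el in qualified])
```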

To illustrate, if you added a print(f"In Folder element \"{Folder.tag}\"...") to your for Folder in Document loop, you'd see:

In Folder element "{http://www.opengis.net/kml/2.2}name"...
In Folder element "{http://www.opengis.net/kml/2.2}open"...
In Folder element "{http://www.opengis.net/kml/2.2}Schema"...
In Folder element "{http://www.opengis.net/kml/2.2}Style"...
In Folder element "{http://www.opengis.net/kml/2.2}StyleMap"...
In Folder element "{http://www.opengis.net/kml/2.2}Style"...
In Folder element "{http://www.opengis.net/kml/2.2}Folder"...

There are a few ways to handle namespaces in lxml, but I prefer to declare them in a dictionary and pass them with the namespaces argument.
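For comparison, here is a sketch of two of the other options, again on a hypothetical trimmed-down KML string. The first spells out the namespace in the tag name ("Clark notation", the same {uri}name form that Element.tag reports) and filters with iter(). The second passes a prefix mapping to findall(), which understands a limited path syntax rather than full XPath.

```python
from lxml import etree

KML_NS = "http://www.opengis.net/kml/2.2"

# Hypothetical, trimmed-down stand-in for the KML in the question.
KML = b"""<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Folder>
      <Placemark>
        <ExtendedData><SchemaData>
          <SimpleData name="ID">FM2</SimpleData>
          <SimpleData name="cname">FM2</SimpleData>
        </SchemaData></ExtendedData>
      </Placemark>
      <Placemark>
        <ExtendedData><SchemaData>
          <SimpleData name="ID">FM3</SimpleData>
          <SimpleData name="cname">FM3</SimpleData>
        </SchemaData></ExtendedData>
      </Placemark>
    </Folder>
  </Document>
</kml>"""

root = etree.fromstring(KML)

# Option 1: Clark notation, i.e. "{namespace-uri}local-name".
simple_data_tag = f"{{{KML_NS}}}SimpleData"
ids_iter = [
    el.text for el in root.iter(simple_data_tag) if el.get("name") == "ID"
]

# Option 2: findall() with a prefix mapping and an attribute predicate.
ids_findall = [
    el.text
    for el in root.findall(
        ".//kml:SimpleData[@name='ID']", namespaces={"kml": KML_NS}
    )
]

print(ids_iter)
print(ids_findall)
```

Both lists should contain the same values; which style reads best is largely a matter of taste.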

Here's a full example...

from lxml import etree

ns = {"kml": "http://www.opengis.net/kml/2.2"}

tree = etree.parse("test.kml")

for simple_data in tree.xpath("/kml:kml/kml:Document/kml:Folder/kml:Placemark/kml:ExtendedData/kml:SchemaData/kml:SimpleData[@name='ID']", namespaces=ns):
    print(simple_data.text)

Print Output...

FM2
FM3
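If you want every SimpleData value of a Placemark rather than just the ID, one possible variation is to collect a dictionary per Placemark. This is only a sketch: it uses the abbreviated // syntax instead of the full path, a hypothetical trimmed-down KML string, and a records list that is not part of the original code.

```python
from lxml import etree

ns = {"kml": "http://www.opengis.net/kml/2.2"}

# Hypothetical, trimmed-down stand-in for the KML in the question.
KML = b"""<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Folder>
      <Placemark>
        <ExtendedData><SchemaData>
          <SimpleData name="ID">FM2</SimpleData>
          <SimpleData name="cname">FM2</SimpleData>
        </SchemaData></ExtendedData>
      </Placemark>
      <Placemark>
        <ExtendedData><SchemaData>
          <SimpleData name="ID">FM3</SimpleData>
          <SimpleData name="cname">FM3</SimpleData>
        </SchemaData></ExtendedData>
      </Placemark>
    </Folder>
  </Document>
</kml>"""

root = etree.fromstring(KML)

records = []
for placemark in root.xpath("//kml:Placemark", namespaces=ns):
    # The leading "." keeps the search inside the current Placemark.
    fields = placemark.xpath(".//kml:SimpleData", namespaces=ns)
    # Map each field's name attribute to its text content.
    records.append({field.get("name"): field.text for field in fields})

for record in records:
    print(record)
```

The // form is shorter but searches the whole tree, so the full path may still be preferable when the same element name can appear in places you do not want to match.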
