How to pull data from KML/XML?


Question

I have some data I converted to XML from a KML file, and I am curious how to use PHP or Ruby to get back things like the neighborhood names and coordinates. I know how to do it when the values have a tag around them, like so:

<cities>
  <neighborhood>Gotham</neighborhood>
</cities>

Unfortunately, the data is formatted as:

<SimpleData name="neighborhd">Colgate Center</SimpleData>

instead of

<neighborhd>Colgate Center</neighborhd>

Here is the KML source:

How can I use PHP or Ruby to pull data from something like this? I installed some Ruby gems for parsing XML data, but XML is something I haven't worked with much.

Recommended answer

Your XML is invalid, but Nokogiri will attempt to fix it up.

Here's how to check for invalid XML/XHTML/HTML and how to rewrite the section you want.

Here is the setup:

require 'nokogiri'

doc = Nokogiri.XML(<<EOT)
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.2" xmlns:atom="http://www.w3.org/2005/Atom">
  <Document>
    <Schema name="Sample_Neighborhoods_Samples" id="Sample_Neighborhoods_Samples">
      <SimpleField type="int" name="nid"/>
      <SimpleField type="string" name="neighborhd"/>
      <SimpleField type="string" name="place"/>
      <SimpleField type="string" name="placecode"/>
      <SimpleField type="string" name="nbr_type"/>
      <SimpleField type="string" name="po_name"/>
      <SimpleField type="string" name="metro"/>
      <SimpleField type="string" name="country"/>
      <SimpleField type="string" name="state"/>
      <SimpleField type="string" name="statefips"/>
      <SimpleField type="string" name="county"/>
      <SimpleField type="string" name="countyfips"/>
      <SimpleField type="string" name="mcd"/>
      <SimpleField type="string" name="mcdfips"/>
      <SimpleField type="string" name="cbsa"/>
      <SimpleField type="string" name="cbsacode"/>
      <SimpleField type="string" name="cbsatype"/>
      <SimpleField type="double" name="cenlat"/>
      <SimpleField type="double" name="cenlon"/>
      <SimpleField type="int" name="color"/>
      <SimpleField type="string" name="ncs_code"/>
      <SimpleField type="string" name="release"/>
    </Schema>
    <Style id="KMLSTYLER_6">
      <LabelStyle>
        <scale>1.0</scale>
      </LabelStyle>
      <LineStyle>
        <colorMode>normal</colorMode>
      </LineStyle>
      <PolyStyle>
        <color>7f4080ff</color>
        <colorMode>random</colorMode>
      </PolyStyle>
    </Style>
    <name>Sample_Neighborhoods_NYC</name>
    <visibility>1</visibility>
    <Folder id="kml_ft_Sample_Neighborhoods_Samples">
      <name>Sample_Neighborhoods_Samples</name>
      <Folder id="kml_ft_Sample_Neighborhoods_Samples_Sample_Neighborhoods_NYC">
        <name>Sample_Neighborhoods_NYC</name>
        <Placemark id="kml_1">
          <name>Colgate Center</name>
          <Snippet> </Snippet>
          <styleUrl>#KMLSTYLER_6</styleUrl>
          <ExtendedData>
            <SchemaData schemaUrl="#Sample_Neighborhoods_Samples">
              <SimpleData name="nid">7086</SimpleData>
              <SimpleData name="neighborhd">Colgate Center</SimpleData>
              <SimpleData name="place">Jersey City</SimpleData>
              <SimpleData name="placecode">36000</SimpleData>
              <SimpleData name="nbr_type">S</SimpleData>
              <SimpleData name="po_name">JERSEY CITY</SimpleData>
              <SimpleData name="metro">New York City, NY</SimpleData>
              <SimpleData name="country">USA</SimpleData>
              <SimpleData name="state">NJ</SimpleData>
              <SimpleData name="statefips">34</SimpleData>
              <SimpleData name="county">Hudson</SimpleData>
              <SimpleData name="countyfips">34017</SimpleData>
              <SimpleData name="mcd">Jersey City</SimpleData>
              <SimpleData name="mcdfips">36000</SimpleData>
              <SimpleData name="cbsa">New York-Northern New Jersey-Long Island, NY-NJ-PA</SimpleData>
              <SimpleData name="cbsacode">35620</SimpleData>
              <SimpleData name="cbsatype">Metro</SimpleData>
              <SimpleData name="cenlat">40.7145135000001</SimpleData>
              <SimpleData name="cenlon">-74.0343385</SimpleData>
              <SimpleData name="color">1</SimpleData>
              <SimpleData name="ncs_code">40910000</SimpleData>
              <SimpleData name="release">1.12.2</SimpleData>
            </SchemaData>
          </ExtendedData>
          <Polygon>
            <outerBoundaryIs>
              <LinearRing>
                <coordinates>-74.036628,40.712211,0 -74.0357779999999,40.7120810000001,0                     -74.035535,40.7122010000001,0 -74.0348299999999,40.71209,0 -74.034903,40.711804,0 -74.033761,40.7116560000001,0 -74.0334089999999,40.7121090000001,0 -74.032996,40.7141330000001,0 -74.0331899999999,40.7141790000001,0 -74.032656,40.7162500000001,0 -74.032231,40.716194,0 -74.032049,40.716908,0 -74.033871,40.7170370000001,0 -74.035629,40.7173710000001,0 -74.035669,40.7171650000001,0 -74.036009,40.715335,0 -74.036325,40.713625,0 -74.036482,40.7123580000001,0 -74.036628,40.712211,0 </coordinates>
              </LinearRing>
            </outerBoundaryIs>
          </Polygon>
        </Placemark>
        <Placemark id="kml_2">
          <name>Colgate Center</name>
          <Snippet> </Snippet>
          <ExtendedData>
EOT

Here's how to see if there are errors. Any time errors is not empty, the parser found a problem with the document.

puts doc.errors

Here's one way to find the SimpleData nodes throughout a document. I prefer CSS selectors over XPath for readability. Sometimes XPath is the better choice because it allows finer-grained searches. It is worth learning both.

doc.search('ExtendedData SimpleData').each do |simple_data|
  node_name = simple_data['name']
  puts "<%s>%s</%s>" % [node_name, simple_data.text.strip, node_name]
end

Here is the output after running:

Premature end of data in tag ExtendedData line 87
Premature end of data in tag Placemark line 84
Premature end of data in tag Folder line 44
Premature end of data in tag Folder line 42
Premature end of data in tag Document line 3
Premature end of data in tag kml line 2
<nid>7086</nid>
<neighborhd>Colgate Center</neighborhd>
<place>Jersey City</place>
<placecode>36000</placecode>
<nbr_type>S</nbr_type>
<po_name>JERSEY CITY</po_name>
<metro>New York City, NY</metro>
<country>USA</country>
<state>NJ</state>
<statefips>34</statefips>
<county>Hudson</county>
<countyfips>34017</countyfips>
<mcd>Jersey City</mcd>
<mcdfips>36000</mcdfips>
<cbsa>New York-Northern New Jersey-Long Island, NY-NJ-PA</cbsa>
<cbsacode>35620</cbsacode>
<cbsatype>Metro</cbsatype>
<cenlat>40.7145135000001</cenlat>
<cenlon>-74.0343385</cenlon>
<color>1</color>
<ncs_code>40910000</ncs_code>
<release>1.12.2</release>

I'm not trying to modify the DOM, but it's easy to do:

doc.search('ExtendedData SimpleData').each do |simple_data|
  node_name = simple_data['name']
  simple_data.replace("<%s>%s</%s>" % [node_name, simple_data.text.strip, node_name])
end

puts doc.to_xml

After running, this is the affected section:

<ExtendedData>
  <SchemaData schemaUrl="#Sample_Neighborhoods_Samples">
    <nid>7086</nid>
    <neighborhd>Colgate Center</neighborhd>
    <place>Jersey City</place>
    <placecode>36000</placecode>
    <nbr_type>S</nbr_type>
    <po_name>JERSEY CITY</po_name>
    <metro>New York City, NY</metro>
    <country>USA</country>
    <state>NJ</state>
    <statefips>34</statefips>
    <county>Hudson</county>
    <countyfips>34017</countyfips>
    <mcd>Jersey City</mcd>
    <mcdfips>36000</mcdfips>
    <cbsa>New York-Northern New Jersey-Long Island, NY-NJ-PA</cbsa>
    <cbsacode>35620</cbsacode>
    <cbsatype>Metro</cbsatype>
    <cenlat>40.7145135000001</cenlat>
    <cenlon>-74.0343385</cenlon>
    <color>1</color>
    <ncs_code>40910000</ncs_code>
    <release>1.12.2</release>
  </SchemaData>
</ExtendedData>
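The question also asks about coordinates, which the steps above do not touch. In KML, the text of a coordinates element is a whitespace-separated list of lon,lat[,alt] tuples, so plain string handling is enough once the text is in hand (for example from doc.at_css('coordinates').text). The helper name parse_coordinates below is hypothetical, and the input is shortened from the sample Placemark. Treat it as a sketch.

```ruby
# Hypothetical helper: turn the text of a KML <coordinates> element into
# an array of hashes. Tuples are "lon,lat[,alt]" separated by whitespace.
def parse_coordinates(raw)
  # String#split with no argument splits on any run of whitespace and
  # ignores leading and trailing blanks.
  raw.split.map do |tuple|
    lon, lat, alt = tuple.split(',').map(&:to_f)
    { lon: lon, lat: lat, alt: alt } # alt is nil when the tuple omits it
  end
end

# Shortened from the sample Placemark, including its irregular spacing.
raw = '-74.036628,40.712211,0 -74.0357779999999,40.7120810000001,0       -74.035535,40.7122010000001,0 '

points = parse_coordinates(raw)
points.each { |pt| puts format('%.6f, %.6f', pt[:lat], pt[:lon]) }
```

Note that KML puts longitude first, so the values need swapping before being passed to anything that expects latitude, longitude order.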
