Python Alexa result parsing with lxml.etree


Problem description

I am using the Alexa API from AWS, but I find it difficult to parse the result to get what I want.

The Alexa API returns a tree object of type <type 'lxml.etree._ElementTree'>.

I use this code to print the tree:

from lxml import etree
root = tree.getroot()  # tree is the object returned by the Alexa API
print(etree.tostring(root))

I get the following:

<aws:UrlInfoResponse xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/"><aws:Response xmlns:aws="http://awis.amazonaws.com/doc/2005-07-11"><aws:OperationRequest><aws:RequestId>ccf3f263-ab76-ab63-db99-244666044e85</aws:RequestId></aws:OperationRequest><aws:UrlInfoResult><aws:Alexa>

  <aws:ContentData>
    <aws:DataUrl type="canonical">google.com/</aws:DataUrl>
    <aws:SiteData>
      <aws:Title>Google</aws:Title>
      <aws:Description>Enables users to search the world's information, including webpages, images, and videos. Offers unique features and search technology.</aws:Description>
      <aws:OnlineSince>15-Sep-1997</aws:OnlineSince>
    </aws:SiteData>
    <aws:LinksInCount>3453627</aws:LinksInCount>
  </aws:ContentData>
  <aws:TrafficData>
    <aws:DataUrl type="canonical">google.com/</aws:DataUrl>
    <aws:Rank>1</aws:Rank>
  </aws:TrafficData>
</aws:Alexa></aws:UrlInfoResult><aws:ResponseStatus xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/"><aws:StatusCode>Success</aws:StatusCode></aws:ResponseStatus></aws:Response></aws:UrlInfoResponse>

I use root.find('LinksInCount').text to get the value of the element, but it does not work.

I want to know how to get the text 3453627 of aws:LinksInCount.

Recommended answer

You are facing two challenges:

  • the document uses XML namespaces
  • two different namespaces share the same namespace prefix

You see the "aws:" prefix everywhere, but it is bound to two different namespaces:

xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/"
xmlns:aws="http://awis.amazonaws.com/doc/2005-07-11"

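Reusing the same namespace prefix in XML is completely legal. The rule is that the innermost declaration in scope wins: a prefix redeclared on an element applies to that element and its descendants until it is redeclared again. In the document below, aws:LinksInCount therefore belongs to http://awis.amazonaws.com/doc/2005-07-11.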

xmlstr = """
<?xml version="1.0"?>
<aws:UrlInfoResponse xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/">
  <aws:Response xmlns:aws="http://awis.amazonaws.com/doc/2005-07-11">
    <aws:OperationRequest>
      <aws:RequestId>ccf3f263-ab76-ab63-db99-244666044e85</aws:RequestId>
    </aws:OperationRequest>
    <aws:UrlInfoResult>
      <aws:Alexa>
        <aws:ContentData>
          <aws:DataUrl type="canonical">google.com/</aws:DataUrl>
          <aws:SiteData>
            <aws:Title>Google</aws:Title>
            <aws:Description>Enables users to search the world's information, including webpages, images, and videos. Offers unique features and search technology.</aws:Description>
            <aws:OnlineSince>15-Sep-1997</aws:OnlineSince>
          </aws:SiteData>
          <aws:LinksInCount>3453627</aws:LinksInCount>
        </aws:ContentData>
        <aws:TrafficData>
          <aws:DataUrl type="canonical">google.com/</aws:DataUrl>
          <aws:Rank>1</aws:Rank>
        </aws:TrafficData>
      </aws:Alexa>
    </aws:UrlInfoResult>
    <aws:ResponseStatus xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/">
      <aws:StatusCode>Success</aws:StatusCode>
    </aws:ResponseStatus>
  </aws:Response>
</aws:UrlInfoResponse>
"""
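To see the scoping rule in action, here is a minimal sketch that prints the fully qualified tag of each element. It uses a trimmed-down stand-in for the response (the variable names small and resolved are just for illustration), and the import fallback to the standard library is only there so the sketch also runs where lxml is not installed. Both libraries report tags in the form {namespace-uri}localname.

```python
try:
    from lxml import etree
except ImportError:  # assumption: fall back to the stdlib API, which matches for this use
    import xml.etree.ElementTree as etree

# Trimmed-down stand-in for the response: one prefix, two namespace URIs.
small = (
    '<aws:UrlInfoResponse xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/">'
    '<aws:Response xmlns:aws="http://awis.amazonaws.com/doc/2005-07-11">'
    '<aws:LinksInCount>3453627</aws:LinksInCount>'
    '<aws:ResponseStatus xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/">'
    '<aws:StatusCode>Success</aws:StatusCode>'
    '</aws:ResponseStatus>'
    '</aws:Response>'
    '</aws:UrlInfoResponse>'
)

root = etree.fromstring(small)

# Map each local name to the namespace URI the parser resolved for it.
resolved = {}
for element in root.iter():
    uri, local = element.tag[1:].split("}", 1)  # tag looks like "{uri}local"
    resolved[local] = uri
    print(local, "->", uri)
```

The outer element and aws:ResponseStatus resolve to the alexa.amazonaws.com URI, while everything else under aws:Response, including aws:LinksInCount, resolves to the awis.amazonaws.com URI.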

The next challenge is how to search for namespaced elements.

I prefer using XPath. In the XPath expression you can use whatever prefix you like, but you have to tell the xpath call which namespace each prefix stands for. This is done with a namespaces dictionary:

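from lxml import etree

# xmlstr starts with a newline; strip it so the XML declaration comes first
doc = etree.fromstring(xmlstr.strip())

# Map the prefix used in the XPath expression to the namespace of aws:LinksInCount
namespaces = {"aws": "http://awis.amazonaws.com/doc/2005-07-11"}
texts = doc.xpath("//aws:LinksInCount/text()", namespaces=namespaces)
print(texts[0])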
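If you would rather stay with find, as in the question, the same namespace mapping applies. The following is a sketch rather than part of the original answer: it shows why root.find('LinksInCount') returns None (the name is unqualified, and a bare name only matches direct children) and two ways that should work instead. The names small, AWIS_NS, missing, by_prefix and by_clark are illustrative, and the import fallback is only there so the sketch runs without lxml installed.

```python
try:
    from lxml import etree
except ImportError:  # assumption: fall back to the stdlib API, which matches for this use
    import xml.etree.ElementTree as etree

AWIS_NS = "http://awis.amazonaws.com/doc/2005-07-11"

# Trimmed-down stand-in for the response shown in the question.
small = (
    '<aws:UrlInfoResponse xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/">'
    '<aws:Response xmlns:aws="http://awis.amazonaws.com/doc/2005-07-11">'
    '<aws:UrlInfoResult><aws:Alexa><aws:ContentData>'
    '<aws:LinksInCount>3453627</aws:LinksInCount>'
    '</aws:ContentData></aws:Alexa></aws:UrlInfoResult>'
    '</aws:Response>'
    '</aws:UrlInfoResponse>'
)
root = etree.fromstring(small)

# Unqualified name, direct children only: nothing matches, so .text would raise.
missing = root.find("LinksInCount")

# Prefix resolved through a namespaces dictionary, searching all descendants.
by_prefix = root.find(".//aws:LinksInCount", namespaces={"aws": AWIS_NS})

# Clark notation: the namespace URI in braces, no prefix mapping needed.
by_clark = root.find(".//{%s}LinksInCount" % AWIS_NS)

print(missing)
print(by_prefix.text)
print(by_clark.text)
```

Unlike xpath, which returns a list, find returns a single element or None, so check the result before reading .text when the element may be absent.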
