Python Alexa result parsing with lxml.etree
Problem description
I am using the Alexa API from AWS, but I find it difficult to parse the result to get what I want.
The Alexa API returns an lxml ElementTree object (<type 'lxml.etree._ElementTree'>).
I use this code to print the tree:
from lxml import etree
# `tree` is the lxml.etree._ElementTree object returned by the Alexa API call
root = tree.getroot()
print(etree.tostring(root))
I get the following:
<aws:UrlInfoResponse xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/"><aws:Response xmlns:aws="http://awis.amazonaws.com/doc/2005-07-11"><aws:OperationRequest><aws:RequestId>ccf3f263-ab76-ab63-db99-244666044e85</aws:RequestId></aws:OperationRequest><aws:UrlInfoResult><aws:Alexa>
<aws:ContentData>
<aws:DataUrl type="canonical">google.com/</aws:DataUrl>
<aws:SiteData>
<aws:Title>Google</aws:Title>
<aws:Description>Enables users to search the world's information, including webpages, images, and videos. Offers unique features and search technology.</aws:Description>
<aws:OnlineSince>15-Sep-1997</aws:OnlineSince>
</aws:SiteData>
<aws:LinksInCount>3453627</aws:LinksInCount>
</aws:ContentData>
<aws:TrafficData>
<aws:DataUrl type="canonical">google.com/</aws:DataUrl>
<aws:Rank>1</aws:Rank>
</aws:TrafficData>
</aws:Alexa></aws:UrlInfoResult><aws:ResponseStatus xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/"><aws:StatusCode>Success</aws:StatusCode></aws:ResponseStatus></aws:Response></aws:UrlInfoResponse>
I use root.find('LinksInCount').text to get the value of the element, but it does not work.
I want to know how to get the text 3453627 of aws:LinksInCount.
Recommended answer
You are running into two challenges:
- using XML namespaces
- two different namespaces sharing the same namespace prefix
You see the "aws:" prefix, but it is used for two different namespaces:
xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/"
xmlns:aws="http://awis.amazonaws.com/doc/2005-07-11"
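Reusing the same prefix for different namespaces is completely legal in XML. The rule is that the innermost declaration in scope wins: a prefix binding applies to the element that declares it and to its descendants, until a nested element redeclares it. Because aws:LinksInCount sits inside aws:Response, its "aws:" prefix refers to http://awis.amazonaws.com/doc/2005-07-11.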
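To check this scoping rule in practice, you can ask lxml which namespace URI each element actually resolved to. The document below is a made-up example (the example.com URIs are placeholders), not Alexa output; etree.QName(el).namespace reports the URI behind each element's "aws:" prefix.

```python
from lxml import etree

# Hypothetical document: "aws" is bound to one URI on the outer element
# and rebound to a different URI on a nested element.
xml = b"""<aws:Outer xmlns:aws="http://example.com/first">
  <aws:Inner xmlns:aws="http://example.com/second">
    <aws:Leaf>42</aws:Leaf>
  </aws:Inner>
  <aws:Sibling>7</aws:Sibling>
</aws:Outer>"""

root = etree.fromstring(xml)

# Map each element's local name to the namespace URI its prefix resolved to.
resolved = {etree.QName(el).localname: etree.QName(el).namespace for el in root.iter()}

for name, uri in resolved.items():
    print(name, uri)
```

Leaf resolves to the second URI because it is nested inside Inner, while Sibling, outside Inner, keeps the first. The same mechanism puts LinksInCount, nested inside aws:Response, into the awis namespace.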
xmlstr = """
<?xml version="1.0"?>
<aws:UrlInfoResponse xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/">
<aws:Response xmlns:aws="http://awis.amazonaws.com/doc/2005-07-11">
<aws:OperationRequest>
<aws:RequestId>ccf3f263-ab76-ab63-db99-244666044e85</aws:RequestId>
</aws:OperationRequest>
<aws:UrlInfoResult>
<aws:Alexa>
<aws:ContentData>
<aws:DataUrl type="canonical">google.com/</aws:DataUrl>
<aws:SiteData>
<aws:Title>Google</aws:Title>
<aws:Description>Enables users to search the world's information, including webpages, images, and videos. Offers unique features and search technology.</aws:Description>
<aws:OnlineSince>15-Sep-1997</aws:OnlineSince>
</aws:SiteData>
<aws:LinksInCount>3453627</aws:LinksInCount>
</aws:ContentData>
<aws:TrafficData>
<aws:DataUrl type="canonical">google.com/</aws:DataUrl>
<aws:Rank>1</aws:Rank>
</aws:TrafficData>
</aws:Alexa>
</aws:UrlInfoResult>
<aws:ResponseStatus xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/">
<aws:StatusCode>Success</aws:StatusCode>
</aws:ResponseStatus>
</aws:Response>
</aws:UrlInfoResponse>
"""
The next challenge is how to search for namespaced elements.
I prefer using xpath. In the XPath expression you can use whatever prefix you like, but you have to tell the xpath call which namespace URI each prefix means. This is done with the namespaces dictionary:
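from lxml import etree
# strip() drops the leading newline so the XML declaration comes first
doc = etree.fromstring(xmlstr.strip())
# The prefix name is our choice; the URI must be the one LinksInCount lives in
namespaces = {"aws": "http://awis.amazonaws.com/doc/2005-07-11"}
texts = doc.xpath("//aws:LinksInCount/text()", namespaces=namespaces)
print(texts[0])  # 3453627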
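The original root.find('LinksInCount') returns None for two reasons: a bare tag name in find() matches only direct children, and only elements with no namespace, whereas LinksInCount is several levels down and lives in the awis namespace. If you would rather stay with find()/findtext() than switch to XPath, here is a sketch using lxml's path syntax: start the path with ".//" to search all descendants, and either pass a prefix mapping or spell the tag as {namespace-uri}localname. The local-name() XPath at the end is a namespace-agnostic alternative; it avoids hard-coding the URI but could also match same-named elements from other namespaces. The XML is a trimmed copy of the response above, keeping only the path to LinksInCount.

```python
from lxml import etree

# Trimmed copy of the Alexa response: only the elements on the path to LinksInCount.
xml = b"""<aws:UrlInfoResponse xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/">
<aws:Response xmlns:aws="http://awis.amazonaws.com/doc/2005-07-11">
<aws:UrlInfoResult><aws:Alexa><aws:ContentData>
<aws:LinksInCount>3453627</aws:LinksInCount>
</aws:ContentData></aws:Alexa></aws:UrlInfoResult>
</aws:Response>
</aws:UrlInfoResponse>"""

root = etree.fromstring(xml)
AWIS = "http://awis.amazonaws.com/doc/2005-07-11"

# What the question tried: a bare tag only matches direct children with no namespace.
print(root.find("LinksInCount"))  # None

# findtext() with a prefix mapping; ".//" searches all descendants.
by_prefix = root.findtext(".//aws:LinksInCount", namespaces={"aws": AWIS})

# The same lookup in {namespace-uri}localname notation, no mapping needed.
by_uri = root.findtext(".//{%s}LinksInCount" % AWIS)

# Namespace-agnostic XPath: matches on the local name only.
by_local_name = root.xpath("string(//*[local-name()='LinksInCount'])")

print(by_prefix, by_uri, by_local_name)
```

If you start from the tree object returned by the API, get the root element with tree.getroot() first, as in the question, and call these methods on it.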