Find All Elements Given Namespaced Attribute
Question
If I have something like this:
<p>blah</p>
<p foo:bar="something">blah</p>
<p foo:xxx="something">blah</p>
How would I get BeautifulSoup to select elements that have an attribute in the foo namespace?
For example, I would like the second and third p elements returned.
Recommended answer
BeautifulSoup (both version 3 and 4) does not appear to treat the namespace prefix as anything special. It simply treats a namespaced attribute as an ordinary attribute that happens to have a colon in its name.
So to find <p> elements with attributes in the foo namespace, you just have to loop through all the attribute keys and check whether each name starts with the prefix, for example attr.startswith('foo:'):
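from bs4 import BeautifulSoup  # BeautifulSoup 4; find_all is the bs4 spelling
content = '''\
<p>blah</p>
<p foo:bar="something">blah</p>
<p foo:xxx="something">blah</p>'''
# Name the parser explicitly so the result does not depend on what is installed.
soup = BeautifulSoup(content, 'html.parser')
for p in soup.find_all('p'):
    for attr in p.attrs:
        # The prefix is part of the attribute name, so match on 'foo:'.
        if attr.startswith('foo:'):
            print(p)
            break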
yields
<p foo:bar="something">blah</p>
<p foo:xxx="something">blah</p>
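The same check can also be packaged as a filter function, since find_all in BeautifulSoup 4 accepts a callable in place of a tag name. This is a minimal sketch, assuming bs4 is installed; the helper name has_prefixed_attr is hypothetical and not part of BeautifulSoup:

```python
from bs4 import BeautifulSoup

content = '''\
<p>blah</p>
<p foo:bar="something">blah</p>
<p foo:xxx="something">blah</p>'''


def has_prefixed_attr(prefix):
    """Build a find_all filter matching tags with a `prefix:` attribute.

    Hypothetical helper, not part of BeautifulSoup.
    """
    marker = prefix + ':'

    def matches(tag):
        # tag.attrs maps attribute names to values; the prefix is plain text here.
        return any(name.startswith(marker) for name in tag.attrs)

    return matches


soup = BeautifulSoup(content, 'html.parser')
matched = soup.find_all(has_prefixed_attr('foo'))
for tag in matched:
    print(tag)
```

Note that this filter matches any tag with a foo: attribute, not only <p>. It is still a textual prefix match, so it cannot tell two prefixes bound to the same namespace URI apart from two unrelated ones.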
With lxml you can search by XPath, which does have syntax support for searching for attributes by namespace:
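import lxml.etree as ET
content = '''\
<root xmlns:foo="bar">
<p>blah</p>
<p foo:bar="something">blah</p>
<p foo:xxx="something">blah</p></root>'''
root = ET.XML(content)
# @foo:* matches any attribute in the namespace bound to the foo prefix.
for p in root.xpath('p[@foo:*]', namespaces={'foo': 'bar'}):
    # encoding='unicode' returns str rather than bytes; with_tail=False drops the trailing newline.
    print(ET.tostring(p, encoding='unicode', with_tail=False))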
yields
<p xmlns:foo="bar" foo:bar="something">blah</p>
<p xmlns:foo="bar" foo:xxx="something">blah</p>
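If lxml is not available, a namespace-aware check is also possible with the standard library, filtering in Python rather than in XPath. The sketch below relies on xml.etree.ElementTree storing namespaced attribute names as '{uri}local'; the helper in_namespace and the constant FOO_NS are illustrative names, not part of any library:

```python
import xml.etree.ElementTree as ET

content = '''\
<root xmlns:foo="bar">
<p>blah</p>
<p foo:bar="something">blah</p>
<p foo:xxx="something">blah</p></root>'''

FOO_NS = 'bar'  # the namespace URI bound to the foo prefix in content


def in_namespace(element, uri):
    """Return True if any attribute of element belongs to namespace uri."""
    # ElementTree stores namespaced attribute names as '{uri}local'.
    return any(name.startswith('{' + uri + '}') for name in element.attrib)


root = ET.fromstring(content)
matches = [p for p in root.findall('p') if in_namespace(p, FOO_NS)]
for p in matches:
    print(sorted(p.attrib))
```

Because the comparison is against the namespace URI rather than the prefix, it keeps working if the document binds the same URI to a different prefix.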