XHTML namespace issues with cssselect in lxml

Problem description

I have problems using cssselect with XHTML (or XML with namespaces). Although the documentation explains how to use namespaces in cssselect, I do not understand it (see the cssselect documentation on namespaces).

My input XHTML string:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>Teststylesheet</title>
  <style type="text/css">
  /*<![CDATA[*/
  ol{margin:0;padding:0}
  /*]]>*/
  </style>
</head>
<body>
</body>
</html>

My Python script:

from lxml import etree
from lxml.cssselect import CSSSelector

parser = etree.XMLParser()
tree = etree.fromstring(xhtmlstring, parser).getroottree()
for style in CSSSelector("style")(tree):
    print("HAVE CSS!")

The Python script never prints "HAVE CSS!". Using etree.HTMLParser instead of etree.XMLParser works, but I really want to use the XMLParser and keep everything in the XHTML (namespaces, structure).

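To see why the selector matches nothing, here is a minimal sketch that uses only lxml's own XPath support. It drops the DOCTYPE from the question's input for brevity. It prints the tag names that XMLParser produces and compares an unprefixed query with a prefixed one. The prefix "x" is an arbitrary name chosen for this illustration.

```python
from lxml import etree

# Reduced version of the question's input (DOCTYPE omitted for brevity).
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Teststylesheet</title>
<style type="text/css">ol{margin:0;padding:0}</style></head>
<body></body>
</html>"""

root = etree.fromstring(XHTML, etree.XMLParser())

# With XMLParser every element inherits the default namespace, so lxml
# stores each tag in Clark notation: "{namespace-uri}localname".
tags = [el.tag for el in root.iter()]
for tag in tags:
    print(tag)

# An unprefixed name test only matches elements in *no* namespace...
plain = root.xpath("//style")
# ...while binding a prefix (here "x") to the XHTML namespace finds the element.
prefixed = root.xpath("//x:style", namespaces={"x": "http://www.w3.org/1999/xhtml"})
print(plain, prefixed)
```

CSSSelector("style") compiles to a similar unprefixed XPath expression, which is why it behaves like `plain` here. HTMLParser does not apply XML namespaces, so its tags are plain names like "style" and the selector matches.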

Can anybody help me with this namespace problem?

Recommended answer

The doc string for cssselect.CSSSelector (version 2.0) shows how to use namespaces:

class CSSSelector(etree.XPath):
    """ ...
    To use CSS namespaces, you need to pass a prefix-to-namespace
    mapping as ``namespaces`` keyword argument::

        >>> rdfns = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
        >>> select_ns = cssselect.CSSSelector('root > rdf|Description',
        ...                                   namespaces={'rdf': rdfns})

        >>> rdf = etree.XML((
        ...     '<root xmlns:rdf="%s">'
        ...       '<rdf:Description>blah</rdf:Description>'
        ...     '</root>') % rdfns)
        >>> [(el.tag, el.text) for el in select_ns(rdf)]
        [('{http://www.w3.org/1999/02/22-rdf-syntax-ns#}Description', 'blah')]
    """


If you've tried this but your version of cssselect.CSSSelector does not have a namespaces parameter, then your version of lxml may need to be upgraded.

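To check which lxml you have and whether its CSSSelector accepts the namespaces keyword, a small probe like the following may help. The selector "x|a" and the URI "urn:example" are placeholder values used only for this test, and treating a TypeError as "unsupported" is an assumption about how an older signature would fail.

```python
from lxml import etree

# etree.__version__ is a string; etree.LXML_VERSION is a tuple of ints.
print("lxml version:", etree.__version__, etree.LXML_VERSION)

try:
    from lxml.cssselect import CSSSelector
except ImportError:
    # On lxml >= 3.0, lxml.cssselect needs the separate "cssselect" package.
    CSSSelector = None
    supports_ns = None
    print("lxml.cssselect is unavailable; install the 'cssselect' package")
else:
    try:
        # Placeholder prefix and URI, used only to probe the keyword argument.
        CSSSelector("x|a", namespaces={"x": "urn:example"})
        supports_ns = True
    except TypeError:
        supports_ns = False
    print("CSSSelector accepts namespaces:", supports_ns)
```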