Dealing with a colon in BeautifulSoup CSS selectors


Question

Input HTML:

<div style="display: flex">
    <div class="half" style="font-size: 0.8em;width: 33%;"> apple </div>
    <div class="half" style="font-size: 0.8em;text-align: center;width: 28%;"> peach </div>
    <div class="half" style="font-size: 0.8em;text-align: right;width: 33%;" title="nofruit"> cucumber </div>
</div>

Desired output: all div elements directly under <div style="display: flex">.

I've tried to locate the parent div with the following CSS selector:

div[style="display: flex"]

This throws an error:

>>> soup.select('div[style="display: flex"]')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/user/.virtualenvs/so/lib/python2.7/site-packages/bs4/element.py", line 1400, in select
    'Only the following pseudo-classes are implemented: nth-of-type.')
NotImplementedError: Only the following pseudo-classes are implemented: nth-of-type.

It looks like BeautifulSoup tries to interpret the colon as pseudo-class syntax.

I've tried to follow the advice in "Handling a colon in an element ID in a CSS selector" (http://stackoverflow.com/questions/122238/handling-a-colon-in-an-element-id-in-a-css-selector), but it still throws errors:

>>> soup.select('div[style="display\: flex"]')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/user/.virtualenvs/so/lib/python2.7/site-packages/bs4/element.py", line 1400, in select
    'Only the following pseudo-classes are implemented: nth-of-type.')
NotImplementedError: Only the following pseudo-classes are implemented: nth-of-type.
>>> soup.select('div[style="display\3A flex"]')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/user/.virtualenvs/so/lib/python2.7/site-packages/bs4/element.py", line 1426, in select
    'Unsupported or invalid CSS selector: "%s"' % token)
ValueError: Unsupported or invalid CSS selector: "div[style="displayA"
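A side note on the second attempt: the error message shows `displayA` because, in a regular Python string literal, `\3` is an octal escape for the control character with code 3, so the backslash never reaches the selector parser. The sketch below only demonstrates that Python-level behaviour; the variable names are illustrative, and whether a raw string alone would have satisfied the old selector parser is not something the traceback above confirms.

```python
# Regular (non-raw) literal, as typed in the session above:
# "\3" is read by Python as an octal escape, i.e. chr(3), followed by "A".
plain = 'div[style="display\3A flex"]'

# Raw literal: the backslash is kept, so a CSS parser would actually
# see the escape sequence "\3A " (the CSS escape for a colon).
raw = r'div[style="display\3A flex"]'

print(repr(plain))  # shows \x03 where the backslash sequence was
print(repr(raw))    # shows the backslash preserved
```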

The question:

What is the correct way to use or escape a colon in attribute values in BeautifulSoup CSS selectors?

Note that I can work around it with a partial attribute match:

soup.select("div[style$=flex]")

or with find_all() (http://www.crummy.com/software/BeautifulSoup/bs4/doc/#find-all):

soup.find_all("div", style="display: flex")
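For completeness, here is a minimal sketch of how the find_all() style of workaround could be extended to produce the desired output (the direct child divs). It assumes the `html.parser` backend and uses illustrative variable names; `recursive=False` limits the search to direct children.

```python
from bs4 import BeautifulSoup

html = """
<div style="display: flex">
    <div class="half" style="font-size: 0.8em;width: 33%;"> apple </div>
    <div class="half" style="font-size: 0.8em;text-align: center;width: 28%;"> peach </div>
    <div class="half" style="font-size: 0.8em;text-align: right;width: 33%;" title="nofruit"> cucumber </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Attribute filters passed as keyword arguments are compared as plain
# strings, so the colon needs no escaping here.
parent = soup.find("div", style="display: flex")

# recursive=False restricts the search to direct children of the parent.
children = parent.find_all("div", recursive=False)
texts = [child.get_text(strip=True) for child in children]
print(texts)
```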

Also note that I understand that using style to locate elements is far from a good location technique, but the question itself is meant to be generic and the provided HTML is just an example.

Answer

Update: the issue is now fixed in BeautifulSoup 4.5.0; upgrade if needed:

pip install --upgrade beautifulsoup4
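After upgrading, the original selector should work without any escaping. The sketch below assumes BeautifulSoup 4.5.0 or later and the `html.parser` backend; the `> div` child combinator is an addition here to select the direct children asked for in the question, and the variable names are illustrative.

```python
from bs4 import BeautifulSoup

html = """
<div style="display: flex">
    <div class="half" style="font-size: 0.8em;width: 33%;"> apple </div>
    <div class="half" style="font-size: 0.8em;text-align: center;width: 28%;"> peach </div>
    <div class="half" style="font-size: 0.8em;text-align: right;width: 33%;" title="nofruit"> cucumber </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# The colon sits inside a quoted attribute value, so it is treated as
# part of the value and not as the start of a pseudo-class.
children = soup.select('div[style="display: flex"] > div')
fruits = [div.get_text(strip=True) for div in children]
print(fruits)
```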


Old answer:

Created an issue at the BeautifulSoup issue tracker:

  • Dealing with a colon in BeautifulSoup CSS selectors

I will update the answer if there are any updates on the Launchpad issue.
