ValueError: Unsupported or invalid CSS selector: "unit-4" in Python


Problem description

I am trying out web scraping with Python and Beautiful Soup in order to fetch the URL of a product from a shopping site.

Here is my simple code:

import requests
from bs4 import BeautifulSoup

root_url = 'http://www.flipkart.com'
index_url = root_url + '/tablets'

def get_item_url():
    response = requests.get(index_url)
    soup = BeautifulSoup(response.text)
    # The space-separated class names in this selector are what trigger the ValueError.
    return [a.attrs.get('href') for a in soup.select('div.product-unit unit-4 browse-product-section a[href^=/digiflip-pro-et701-tablet]')]


print(get_item_url())

Running this program produces the following error:

File "C:\Python27\lib\site-packages\bs4\element.py", line 1300, in select
'Unsupported or invalid CSS selector: "%s"' % token)
ValueError: Unsupported or invalid CSS selector: "unit-4"

How can I solve this error?

Recommended answer

Looking at the Beautiful Soup documentation, I see that a space between two parts of a select() string searches for tags beneath other tags. So your select() is looking for a tag called unit-4 somewhere beneath div.product-unit, and then for a browse-product-section tag underneath unit-4. A bare name such as unit-4 is read as a tag name, not a class, and the version of Beautiful Soup in your traceback does not accept it as one, so it raises the error.
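To make the "tags beneath other tags" behaviour concrete, here is a minimal sketch. The HTML snippet is made up for illustration (it is not the real Flipkart page), and it assumes a reasonably recent Beautiful Soup with the built-in html.parser.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only, not the real shopping page.
HTML = """
<div class="product-unit">
  <p><a href="/digiflip-pro-et701-tablet/p/example">Tablet</a></p>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# "div.product-unit a": any <a> tag anywhere beneath a <div class="product-unit">.
nested_links = [a["href"] for a in soup.select("div.product-unit a")]

# "div.product-unit span": a <span> tag beneath that div; there is none here.
spans = soup.select("div.product-unit span")

print(nested_links)
print(spans)
```

In both selectors the word after the space is interpreted as a tag name, which is why a class name written there without a leading dot does not match what you expect.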

The names in your select() are actually HTML classes that are present on that web page, so you would have to prefix each of them with a . to match them (and drop the spaces between them if they all belong to the same div). However, I think what you're really looking for is something more like:
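As a rough sketch of the dotted form, the snippet below chains the three class names onto a single div. The markup and hrefs are invented for illustration, and the example assumes a recent Beautiful Soup; it is not a claim about how the real page is structured.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the first div carries all three classes.
HTML = """
<div class="product-unit unit-4 browse-product-section">
  <a href="/digiflip-pro-et701-tablet/p/example">Tablet</a>
</div>
<div class="product-unit">
  <a href="/other-product/p/example">Other</a>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Leading dots and no spaces: one <div> that has all three classes at once.
selector = "div.product-unit.unit-4.browse-product-section a"
hrefs = [a.get("href") for a in soup.select(selector)]

print(hrefs)
```

Only the link inside the first div is returned, because the second div lacks two of the three classes.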

return [a.attrs.get('href') for a in soup.select('div.product-unit a[href^="/digiflip-pro-et701-tablet"]')]

which looks for an href underneath that div and returns:

['/digiflip-pro-et701-tablet/p/itme27y5v2ws5cfm?pid=TABDWMDPGHPNYND7', '/digiflip-pro-et701-tablet/p/itme27y5v2ws5cfm?pid=TABDWMDPGHPNYND7', '/digiflip-pro-et701-tablet/p/itme27y5v2ws5cfm?pid=TABDWMDPGHPNYND7&ref=70ca8997-80a5-412d-9b94-6d5fb55f1277', '/digiflip-pro-et701-tablet/p/itme27y5v2ws5cfm?pid=TABDWGBMYSEMWHUY&ref=70ca8997-80a5-412d-9b94-6d5fb55f1277']

Incidentally, you could also replace div.product-unit with div.unit-4, or even use select('a[href^="/digiflip-pro-et701-tablet"]') by itself, and you'd get the same output.
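The equivalence of those variants can be checked on a small, made-up page. This is only a sketch: the markup, the hrefs, and the helper name hrefs_for are assumptions for illustration, and the attribute value is quoted because recent Beautiful Soup versions expect quotes around values that contain characters such as /.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only.
HTML = """
<div class="product-unit unit-4 browse-product-section">
  <a href="/digiflip-pro-et701-tablet/p/example-1">Tablet</a>
  <a href="/digiflip-pro-et701-tablet/p/example-2">Tablet again</a>
  <a href="/help">Help</a>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Matches <a> tags whose href starts with the given prefix.
PREFIX = 'a[href^="/digiflip-pro-et701-tablet"]'


def hrefs_for(selector):
    """Return the href of every tag matched by the selector (hypothetical helper)."""
    return [a.get("href") for a in soup.select(selector)]


by_product_unit = hrefs_for("div.product-unit " + PREFIX)
by_unit_4 = hrefs_for("div.unit-4 " + PREFIX)
by_link_only = hrefs_for(PREFIX)

print(by_product_unit == by_unit_4 == by_link_only)
```

On a real page the three selectors only agree if every matching link actually sits inside such a div, so the shortest form is the least specific one.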
