使用Find_All函数会返回意外的结果集 [英] Using Find_All function returns an unexpected result set
问题描述
我正在使用python 3.8.2和bs4 BeautifulSoup.我试图找到一个标记的所有实例,并在结果集中列出每个实例,每行一个.但是,返回的结果集包含的行数多于网站的原始内容.这是因为结果集的第一行包含标记的所有实例.接下来的行包含除第一个实例之外的所有实例,第三行包含除第一个和第二个实例之外的所有实例,依此类推,以此类推,并包含结果集的其余部分.
I am using python 3.8.2 and bs4 BeautifulSoup. I am trying to find all instances of a tag and have each one listed in the result set, one per row. However the result set that is returned contains more lines than the original scrape of the website. This is because the first row of the result set contains all instances of the tag. The following row contains all instances except the first instance, the third contains all instances except the first and the second and so on and so forth with the remainder of the result set.
这是代码:
from bs4 import BeautifulSoup
import requests
url = "https://www.sainsburys.co.uk/shop/gb/groceries/drinks/seeall"
html_content = requests.get(url, timeout=5)
soup = BeautifulSoup(html_content.text)
test_1 = soup.find('ul',{"class": "productLister gridView"})
test = test_1.find_all("li", attrs={"class": "gridItem"})
如何获取它,以使<li class: "gridItem">
的每个实例仅被单独列出,每行一个.
How do I get it so that each instance of <li class: "gridItem">
is only listed by itself, one per row.
谢谢
推荐答案
该网站加载了JavaScript
事件,该事件会在页面加载后动态呈现其数据.
The website is loaded with JavaScript
event which render it's data dynamically once the page loads.
requests
库将无法即时渲染JavaScript
.因此您可以使用selenium
或requests_html
.确实有很多模块可以做到这一点.
requests
library will not be able to render JavaScript
on the fly. so you can use selenium
or requests_html
. and indeed there's a lot of modules which can do that.
现在,我们在表上确实还有另一个选项,可以跟踪从何处渲染数据.我能够找到 XHR 请求,该请求用于从back-end
中检索数据API
并将其呈现给用户端.
Now, we do have another option on the table, to track from where the data is rendered. I were able to locate the XHR request which is used to retrieve the data from the back-end
API
and render it to the users side.
您可以通过打开开发人员工具来获取
XHR
请求. 并检查网络并检查发出的XHR/JS
请求取决于呼叫的类型,例如fetch
You can get the
XHR
request by open Developer-Tools and check Network and checkXHR/JS
requests made depending of the type of call such asfetch
下面您可以实现自己的目标:
Below you can achieve your goal:
请注意以下几点:
Note the following:
-
website
持有3068item
- 我已经使用
parameter
"pageSize": "120"
将每页的项目增加为 - 所以
3068 / 120
=假设26
,这意味着每页120个项目,共26页. - 因此,您需要从
(0, 3120, 120)
循环(这意味着0 > 120 > 240
),依此类推,使用参数"beginIndex": "0"
,您将在for
循环下递增.
120
website
holding 3068item
- I've increased the items per page to be
120
usingparameter
"pageSize": "120"
- So
3068 / 120
= let's say26
, Which means 120 item per page for 26 pages. - So you will need to loop from
(0, 3120, 120)
which means0 > 120 > 240
and so on, Using parameter"beginIndex": "0"
which you will increment underfor
loop.
由于您没有为我们提供最终目标,因此您可以实现自己的目标.但我相信您的目标是name
或price
(网址,img)或其他内容.你会找到它的.
Below you can achieve your goal, since you didn't provided us your end goal. but i believe your target is name
or price
(url, img) or whatever. you will find it.
import requests
from bs4 import BeautifulSoup
params = {
"langId": "44",
"storeId": "10151",
"catalogId": "10241",
"categoryId": "12192",
"parent_category_rn": "",
"top_category": "12192",
"pageSize": "120",
"orderBy": "FAVOURITES_FIRST",
"searchTerm": "",
"catSeeAll": "true",
"beginIndex": "0",
"categoryFacetId1": "12192",
"categoryFacetId2": "",
"requesttype": "ajax"
}
def main(url):
with requests.Session() as req:
r = req.post(url, params=params).json()
for item in r[5]['productLists']:
for nest in item['products']:
soup = BeautifulSoup(nest['result'], 'html.parser')
target = soup.find("div", class_="productNameAndPromotions")
name = target.h3.a.text.strip()
url = target.h3.a.get("href")
img = f"https"+target.h3.a.img.get("src")
price = soup.find(
"p", class_="pricePerUnit").get_text(strip=True)
print(name, price, img, url)
main("https://www.sainsburys.co.uk/webapp/wcs/stores/servlet/gb/groceries/drinks/AjaxApplyFilterSearchResultView")
简要输出名称和价格:
Sainsbury's British Semi Skimmed Milk 2.27L (4 pint) £1.10/unit
Sainsbury's British Semi Skimmed Milk 1.13L (2 pint) 80p/unit
Sainsbury's British Whole Milk 2.27L (4 pint) £1.10/unit
Cravendale Purefilter Semi Skimmed Milk 2L £1.90/unit
Sainsbury's British Skimmed Milk 2.27L (4 pint) £1.10/unit
Sainsbury's British Semi Skimmed Milk, SO Organic 2.27L (4 pint) £1.80/unit
Sainsbury's Sparkling Water, Basics 2L 25p/unit
Sainsbury's British Skimmed Milk 1.13L (2 pint) 80p/unit
Sainsbury's 100% Pure Squeezed Smooth Orange Juice, Not From Concentrate 1L £1.30/unit
Sainsbury's Water, Basics 2L 25p/unit
Sainsbury's British Whole Milk 1.13L (2 pint) 80p/unit
Sainsbury's Smooth Pure Orange Juice 1L 95p/unit
Pepsi Max 2L £1.90/unit
Sainsbury's Caledonian Still Water 4x2L £1.50/unit
Highland Spring Still Water 12x500ml £3.00/unit
Sainsbury's 100% Pressed Apple Juice, Not From Concentrate 1L £1.30/unit
Sainsbury's British Whole Milk, SO Organic 2.27L (4 Pint) £1.80/unit
Lactofree Semi Skimmed Lactose Free Fresh Dairy Drink 1L £1.50/unit
Diet Coke 8x330ml £4.00/unit
Alpro Roasted Almond Unsweetened UHT Drink 1L £1.80/unit
Robinsons Orange Squash No Added Sugar 1L £1.65/unit
Sainsbury's Soda Water 1L 60p/unit
Sainsbury's Caledonian Sparkling Water 4x2L £1.60/unit
Tropicana Smooth Orange Juice 950ml £2.45/unit
Sainsbury's Diet Indian Tonic Water 1L 60p/unit
Sainsbury's Pure Apple Juice 1L 95p/unit
Robinsons Apple & Blackcurrant Squash No Added Sugar 1L £1.65/unit
Sainsbury's Sparkling Flavoured Water, Lemon & Lime 1L 50p/unit
Sainsbury's Conegliano Prosecco, Taste the Difference 75cl £8.00/unit
Sainsbury's Unsweetened Soya Drink 1L 90p/unit
Sainsbury's British Semi Skimmed Milk, SO Organic 1.13L (2 pint) £1.15/unit
Sainsbury's Caledonian Sparkling Water 6x500ml £1.50/unit
Sainsbury's Apple & Blackcurrant Squash, No Added Sugar 1.5L £1.00/unit
Highland Spring Still Water 6x1.5L £3.00/unit
Alpro Roasted Almond Unsweetened Fresh Drink 1L £1.85/unit
Sainsbury's Semi Skimmed Long Life Milk 1L 90p/unit
Tropicana Smooth Orange Juice 1.6L £2.50/unit
Sainsbury's 100% Pure Squeezed Orange Juice with Bits, Not From Concentrate 1L £1.30/unit
Cravendale Purefilter Semi Skimmed Milk 1L £1.15/unit
Sainsbury's Caledonian Still Water Sports Cap 6x500ml £1.50/unit
Sainsbury's Double Strength Orange Squash, No Added Sugar 1.5L £1.00/unit
Diet Coke 18x330ml £7.00/unit
Sainsbury's Indian Tonic Water 1L 60p/unit
Sainsbury's Pure Orange Juice 1L 85p/unit
Sainsbury's Pure Apple Juice 6x200ml £1.50/unit
Buxton Still Natural Mineral Water 8x500ml £2.00/unit
Sainsbury's Whole Long Life Milk 1L £1.05/unit
Cravendale Purefilter Skimmed Milk 2L £1.90/unit
Sainsbury's Sparkling Flavoured Water, Blackcurrant & Cherry 1L 50p/unit
Innocent Smooth Orange Juice 1.35L £3.00/unit
Alpro Original Soya Fresh Drink 1L £1.55/unit
Sainsbury's Still Flavoured Water, Strawberry & Kiwi 1L 50p/unit
Sainsbury's British Filtered Semi Skimmed Milk 2L £1.35/unit
Sainsbury's Sparkling Flavoured Water, Mango & Passionfruit 1L 50p/unit
Sainsbury's Caledonian Still Water 5L £1.10/unit
McGuigan Estate Merlot 75cl £5.10/unit
Schweppes Slimline Tonic Water 1L £1.50/unit
PG tips Pyramid Tea Bags x240 696g £4.50/unit
Sainsbury's Sparkling Flavoured Water, Strawberry & Kiwi 1L 50p/unit
Sainsbury's Caledonian Sparkling Water 2L 55p/unit
Sainsbury's Sweetened Soya Drink 1L 90p/unit
Sainsbury's 100% Pure Squeezed Smooth Orange Juice, Not From Concentrate 1.75L £2.10/unit
Sainsbury's Diet Lemonade 2L 60p/unit
Sainsbury's Apple & Mango Juice, Not From Concentrate 1L £1.30/unit
Robinsons Summer Fruits Squash No Added Sugar 1L £1.65/unit
Sainsbury's 100% Pure Squeezed Pineapple Juice, Not From Concentrate 1L £1.30/unit
Clearsprings Sauvignon Blanc 75cl £5.50/unit
Phantom River Sauvignon Blanc 75cl £5.00/unit
Nestle Pure Life Still Spring Water 12x500ml £2.50/unit
Buxton Sparkling Natural Mineral Water 8x500ml £2.10/unit
Brancott Estate Sauvignon Blanc 75cl £6.75/unit
Schweppes Slimline Lemonade 2L £1.30/unit
McGuigan Estate South Australian Shiraz 75cl £5.10/unit
Coca-Cola Zero Sugar 8x330ml £4.00/unit
Villa Maria Private Bin Sauvignon Blanc 75cl £9.25/unit
Diet Coke Caffeine Free 8x330ml £4.00/unit
Sainsbury's British Skimmed Milk, SO Organic 1.13L (2 pint) £1.15/unit
Sainsbury's Kids Caledonian Still Water 6x300ml £1.10/unit
Canti Prosecco 75cl £7.50/unit
Oatly Enriched with Calcium Oat UHT Drink 1L £1.50/unit
Sainsbury's Pure Orange Juice 6x200ml £1.50/unit
Sainsbury's Still Flavoured Water, Lemon & Lime 1L 50p/unit
Valdo Prosecco Marca Oro 75cl £8.50/unit
Oyster Bay Sauvignon Blanc 75cl £8.00/unit
Ribena Blackcurrant Squash 850ml £2.30/unit
Volvic Mineral Water 6x1.5L £3.40/unit
Campo Viejo Rioja Tempranillo 75cl £6.75/unit
Nescafé Azera Americano Instant Coffee 100g £4.60/unit
Tropicana Orange Juice Original 950ml £2.45/unit
Sainsbury's Double Strength Orange & Mango Squash, No Added Sugar 1.5L £1.00/unit
Robinsons Lemon Squash No Added Sugar 1L £1.65/unit
Schweppes Lemonade 2L £1.30/unit
Robinsons Orange & Pineapple Squash No Added Sugar 1L £1.65/unit
Sainsbury's Diet Indian Tonic with Lime 1L 60p/unit
St Helen's Farm Semi Skimmed Goats Milk 1L £1.80/unit
Sainsbury's Double Strength Orange, Lemon & Pineapple Squash, No Added Sugar 1.5L £1.00/unit
Sainsbury's Double Strength Summerfruits Squash, No Added Sugar 1.5L £1.00/unit
Alpro Oat UHT Drink 1L £1.80/unit
Innocent Smooth Orange Juice 900ml £1.50/unit
Sainsbury's British Whole Milk, SO Organic 1.13L (2 pint) £1.15/unit
Sainsbury's Skimmed Long Life Milk 1L 80p/unit
Nescafé Gold Blend Instant Coffee 200g £7.00/unit
Highland Spring Still Water Sports Cap 12x330ml £3.00/unit
Sainsbury's Cava Brut 75cl £6.00/unit
Alpro Light Unsweetened Soya Fresh Drink 1L £1.55/unit
Sainsbury's Caledonian Still Water 2L 50p/unit
Koko Coconut UHT Drink 1L £1.50/unit
Sainsbury's House Pinot Grigio 75cl £4.50/unit
Sainsbury's Cola Zero 2L 45p/unit
St Helen's Farm Whole Goats Milk 1L £1.80/unit
Sainsbury's Double Strength Cherries & Berries Squash, No Added Sugar 1.5L £1.00/unit
Sainsbury's Lemonade 2L 60p/unit
Sainsbury's Pure Orange Juice With Bits 1L 85p/unit
Sainsbury's Pinot Grigio, Taste the Difference 75cl £6.00/unit
Schweppes Tonic Water 1L £1.50/unit
Sainsbury's Cranberry Juice Drink 1L 85p/unit
Nescafé Gold Blend Instant Coffee Refill 150g £3.50/unit
Sainsbury's Gold Roast Instant Coffee 200g £3.15/unit
Sainsbury's Pure Orange Juice with Juicy Bits 1L 95p/unit
Edizione 789 Di Mondelli Prosecco 75cl £6.25/unit
这篇关于使用Find_All函数会返回意外的结果集的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!