Using the find_all function returns an unexpected result set

Problem description

I am using Python 3.8.2 and BeautifulSoup (bs4). I am trying to find all instances of a tag and have each one listed in the result set, one per row. However, the result set that is returned contains more lines than the original scrape of the website. The first row of the result set contains all instances of the tag, the second row contains all instances except the first, the third contains all instances except the first and second, and so on through the rest of the result set.

Here is the code:

from bs4 import BeautifulSoup
import requests

url = "https://www.sainsburys.co.uk/shop/gb/groceries/drinks/seeall"

html_content = requests.get(url, timeout=5)
soup = BeautifulSoup(html_content.text, "html.parser")

test_1 = soup.find('ul',{"class": "productLister gridView"})

test = test_1.find_all("li", attrs={"class": "gridItem"})

How do I get each instance of <li class="gridItem"> listed by itself, one per row?

Thanks.

Recommended answer

The website uses JavaScript to render its data dynamically once the page has loaded.

The requests library cannot render JavaScript, so you can use selenium or requests_html instead. There are many other modules that can do this as well.
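To illustrate the point, here is a minimal offline sketch. The HTML string is a hypothetical stand-in for what a server might send (it is not the real Sainsbury's markup): the list is empty in the static HTML, and a script would fill it in only inside a browser. Parsing the static text finds no items.

```python
from bs4 import BeautifulSoup

# Hypothetical page: the <ul> is empty in the HTML the server sends.
# The script would add items only when a browser executes it.
static_html = """
<html><body>
<ul class="productLister gridView"></ul>
<script>
  var li = document.createElement("li");
  li.className = "gridItem";
  document.querySelector("ul").appendChild(li);
</script>
</body></html>
"""

# BeautifulSoup only parses the text; it never runs the script.
soup = BeautifulSoup(static_html, "html.parser")
items = soup.find_all("li", class_="gridItem")
print(len(items))  # 0
```

This is the same situation requests is in: it receives the text before any script has run.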

There is another option: track down where the data comes from. I was able to locate the XHR request that retrieves the data from the back-end API before it is rendered on the user's side.

You can find the XHR request by opening the browser's Developer Tools, selecting the Network tab, and inspecting the XHR/JS requests that are made. Which filter to look under depends on the type of call, such as fetch.

Note the following:

  1. The website holds 3068 items.
  2. I've increased the items per page to 120 using the parameter "pageSize": "120".
  3. 3068 / 120 is about 25.6, so rounding up gives 26 pages of up to 120 items each.
  4. You therefore need to loop over range(0, 3120, 120), that is 0, 120, 240 and so on, and pass each value as the parameter "beginIndex", which you increment inside the for loop.
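The arithmetic in the list above can be sketched as follows. The helper names page_offsets and page_params are hypothetical, introduced only for illustration; the item count and page size are the figures quoted in the list.

```python
import math


def page_offsets(total_items, page_size):
    """Return the beginIndex value for every page."""
    pages = math.ceil(total_items / page_size)  # 3068 / 120 rounds up to 26
    return list(range(0, pages * page_size, page_size))


def page_params(base_params, total_items, page_size):
    """Yield one copy of the query parameters per page."""
    for offset in page_offsets(total_items, page_size):
        params = dict(base_params)  # copy so the base dict is not mutated
        params["pageSize"] = str(page_size)
        params["beginIndex"] = str(offset)
        yield params


offsets = page_offsets(3068, 120)
print(len(offsets), offsets[:3], offsets[-1])  # 26 [0, 120, 240] 3000
```

Each dict produced by page_params could then be passed as the params argument of one request per page.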

Since you didn't state your end goal, I assume you are after the name, price, URL, or image of each product. The code below extracts all of them:

import requests
from bs4 import BeautifulSoup

params = {
    "langId": "44",
    "storeId": "10151",
    "catalogId": "10241",
    "categoryId": "12192",
    "parent_category_rn": "",
    "top_category": "12192",
    "pageSize": "120",
    "orderBy": "FAVOURITES_FIRST",
    "searchTerm": "",
    "catSeeAll": "true",
    "beginIndex": "0",
    "categoryFacetId1": "12192",
    "categoryFacetId2": "",
    "requesttype": "ajax"
}


def main(url):
    with requests.Session() as req:
        # The endpoint returns JSON; each product carries an HTML fragment.
        r = req.post(url, params=params).json()
        for item in r[5]['productLists']:
            for nest in item['products']:
                soup = BeautifulSoup(nest['result'], 'html.parser')
                target = soup.find("div", class_="productNameAndPromotions")
                name = target.h3.a.text.strip()
                # Named "link" so it does not shadow the url argument.
                link = target.h3.a.get("href")
                # Prefix kept from the original answer; check the src
                # format in the response and adjust it if needed.
                img = "https" + target.h3.a.img.get("src")
                price = soup.find(
                    "p", class_="pricePerUnit").get_text(strip=True)
                print(name, price, img, link)


main("https://www.sainsburys.co.uk/webapp/wcs/stores/servlet/gb/groceries/drinks/AjaxApplyFilterSearchResultView")
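
The extraction step above raises AttributeError if a fragment lacks one of the expected tags. A more defensive variant might look like the sketch below. The function name parse_product and the sample fragment are hypothetical; the fragment only imitates the class names used in the code above, and the real response may be shaped differently.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment that imitates the markup the code above expects.
sample = """
<div class="productNameAndPromotions">
  <h3><a href="https://example.com/milk">
    <img src="//example.com/milk.jpg" alt=""> Example Milk 1L
  </a></h3>
</div>
<p class="pricePerUnit">£1.10<span>/unit</span></p>
"""


def parse_product(fragment):
    """Return a dict of product fields, or None if the name block is missing."""
    soup = BeautifulSoup(fragment, "html.parser")
    target = soup.find("div", class_="productNameAndPromotions")
    if target is None or target.h3 is None or target.h3.a is None:
        return None
    link = target.h3.a
    img = link.find("img")
    price = soup.find("p", class_="pricePerUnit")
    return {
        "name": link.get_text(strip=True),
        "url": link.get("href"),
        "img": img.get("src") if img else None,
        "price": price.get_text(strip=True) if price else None,
    }


product = parse_product(sample)
print(product["name"], product["price"])  # Example Milk 1L £1.10/unit
```

Returning None for an unexpected fragment lets the caller skip it rather than abort the whole run.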

Abbreviated output showing name and price:

Sainsbury's British Semi Skimmed Milk 2.27L (4 pint) £1.10/unit
Sainsbury's British Semi Skimmed Milk 1.13L (2 pint) 80p/unit
Sainsbury's British Whole Milk 2.27L (4 pint) £1.10/unit
Cravendale Purefilter Semi Skimmed Milk 2L £1.90/unit
Sainsbury's British Skimmed Milk 2.27L (4 pint) £1.10/unit
Sainsbury's British Semi Skimmed Milk, SO Organic 2.27L (4 pint) £1.80/unit
Sainsbury's Sparkling Water, Basics 2L 25p/unit
Sainsbury's British Skimmed Milk 1.13L (2 pint) 80p/unit
Sainsbury's 100% Pure Squeezed Smooth Orange Juice, Not From Concentrate 1L £1.30/unit
Sainsbury's Water, Basics 2L 25p/unit
Sainsbury's British Whole Milk 1.13L (2 pint) 80p/unit
Sainsbury's Smooth Pure Orange Juice 1L 95p/unit
Pepsi Max 2L £1.90/unit
Sainsbury's Caledonian Still Water 4x2L £1.50/unit
Highland Spring Still Water 12x500ml £3.00/unit
Sainsbury's 100% Pressed Apple Juice, Not From Concentrate 1L £1.30/unit
Sainsbury's British Whole Milk, SO Organic 2.27L (4 Pint) £1.80/unit
Lactofree Semi Skimmed Lactose Free Fresh Dairy Drink 1L £1.50/unit
Diet Coke 8x330ml £4.00/unit
Alpro Roasted Almond Unsweetened UHT Drink 1L £1.80/unit
Robinsons Orange Squash No Added Sugar 1L £1.65/unit
Sainsbury's Soda Water 1L 60p/unit
Sainsbury's Caledonian Sparkling Water 4x2L £1.60/unit
Tropicana Smooth Orange Juice 950ml £2.45/unit
Sainsbury's Diet Indian Tonic Water 1L 60p/unit
Sainsbury's Pure Apple Juice 1L 95p/unit
Robinsons Apple & Blackcurrant Squash No Added Sugar 1L £1.65/unit
Sainsbury's Sparkling Flavoured Water, Lemon & Lime 1L 50p/unit
Sainsbury's Conegliano Prosecco, Taste the Difference 75cl £8.00/unit
Sainsbury's Unsweetened Soya Drink 1L 90p/unit
Sainsbury's British Semi Skimmed Milk, SO Organic 1.13L (2 pint) £1.15/unit
Sainsbury's Caledonian Sparkling Water 6x500ml £1.50/unit
Sainsbury's Apple & Blackcurrant Squash, No Added Sugar 1.5L £1.00/unit
Highland Spring Still Water 6x1.5L £3.00/unit
Alpro Roasted Almond Unsweetened Fresh Drink 1L £1.85/unit
Sainsbury's Semi Skimmed Long Life Milk 1L 90p/unit
Tropicana Smooth Orange Juice 1.6L £2.50/unit
Sainsbury's 100% Pure Squeezed Orange Juice with Bits, Not From Concentrate 1L £1.30/unit
Cravendale Purefilter Semi Skimmed Milk 1L £1.15/unit
Sainsbury's Caledonian Still Water Sports Cap 6x500ml £1.50/unit
Sainsbury's Double Strength Orange Squash, No Added Sugar 1.5L £1.00/unit
Diet Coke 18x330ml £7.00/unit
Sainsbury's Indian Tonic Water 1L 60p/unit
Sainsbury's Pure Orange Juice 1L 85p/unit
Sainsbury's Pure Apple Juice 6x200ml £1.50/unit
Buxton Still Natural Mineral Water 8x500ml £2.00/unit
Sainsbury's Whole Long Life Milk 1L £1.05/unit
Cravendale Purefilter Skimmed Milk 2L £1.90/unit
Sainsbury's Sparkling Flavoured Water, Blackcurrant & Cherry 1L 50p/unit
Innocent Smooth Orange Juice 1.35L £3.00/unit
Alpro Original Soya Fresh Drink 1L £1.55/unit
Sainsbury's Still Flavoured Water, Strawberry & Kiwi 1L 50p/unit
Sainsbury's British Filtered Semi Skimmed Milk 2L £1.35/unit
Sainsbury's Sparkling Flavoured Water, Mango & Passionfruit 1L 50p/unit
Sainsbury's Caledonian Still Water 5L £1.10/unit
McGuigan Estate Merlot 75cl £5.10/unit
Schweppes Slimline Tonic Water 1L £1.50/unit
PG tips Pyramid Tea Bags x240 696g £4.50/unit
Sainsbury's Sparkling Flavoured Water, Strawberry & Kiwi 1L 50p/unit
Sainsbury's Caledonian Sparkling Water 2L 55p/unit
Sainsbury's Sweetened Soya Drink 1L 90p/unit
Sainsbury's 100% Pure Squeezed Smooth Orange Juice, Not From Concentrate 1.75L £2.10/unit
Sainsbury's Diet Lemonade 2L 60p/unit
Sainsbury's Apple & Mango Juice, Not From Concentrate 1L £1.30/unit
Robinsons Summer Fruits Squash No Added Sugar 1L £1.65/unit
Sainsbury's 100% Pure Squeezed Pineapple Juice, Not From Concentrate 1L £1.30/unit
Clearsprings Sauvignon Blanc 75cl £5.50/unit
Phantom River Sauvignon Blanc 75cl £5.00/unit
Nestle Pure Life Still Spring Water 12x500ml £2.50/unit
Buxton Sparkling Natural Mineral Water 8x500ml £2.10/unit
Brancott Estate Sauvignon Blanc 75cl £6.75/unit
Schweppes Slimline Lemonade 2L £1.30/unit
McGuigan Estate South Australian Shiraz 75cl £5.10/unit
Coca-Cola Zero Sugar 8x330ml £4.00/unit
Villa Maria Private Bin Sauvignon Blanc 75cl £9.25/unit
Diet Coke Caffeine Free 8x330ml £4.00/unit
Sainsbury's British Skimmed Milk, SO Organic 1.13L (2 pint) £1.15/unit
Sainsbury's Kids Caledonian Still Water 6x300ml £1.10/unit
Canti Prosecco 75cl £7.50/unit
Oatly Enriched with Calcium Oat UHT Drink 1L £1.50/unit
Sainsbury's Pure Orange Juice 6x200ml £1.50/unit
Sainsbury's Still Flavoured Water, Lemon & Lime 1L 50p/unit
Valdo Prosecco Marca Oro 75cl £8.50/unit
Oyster Bay Sauvignon Blanc 75cl £8.00/unit
Ribena Blackcurrant Squash 850ml £2.30/unit
Volvic Mineral Water 6x1.5L £3.40/unit
Campo Viejo Rioja Tempranillo 75cl £6.75/unit
Nescafé Azera Americano Instant Coffee 100g £4.60/unit
Tropicana Orange Juice Original 950ml £2.45/unit
Sainsbury's Double Strength Orange & Mango Squash, No Added Sugar 1.5L £1.00/unit  
Robinsons Lemon Squash No Added Sugar 1L £1.65/unit
Schweppes Lemonade 2L £1.30/unit
Robinsons Orange & Pineapple Squash No Added Sugar 1L £1.65/unit
Sainsbury's Diet Indian Tonic with Lime 1L 60p/unit
St Helen's Farm Semi Skimmed Goats Milk 1L £1.80/unit
Sainsbury's Double Strength Orange, Lemon & Pineapple Squash, No Added Sugar 1.5L £1.00/unit
Sainsbury's Double Strength Summerfruits Squash, No Added Sugar 1.5L £1.00/unit    
Alpro Oat UHT Drink 1L £1.80/unit
Innocent Smooth Orange Juice 900ml £1.50/unit
Sainsbury's British Whole Milk, SO Organic 1.13L (2 pint) £1.15/unit
Sainsbury's Skimmed Long Life Milk 1L 80p/unit
Nescafé Gold Blend Instant Coffee 200g £7.00/unit
Highland Spring Still Water Sports Cap 12x330ml £3.00/unit
Sainsbury's Cava Brut 75cl £6.00/unit
Alpro Light Unsweetened Soya Fresh Drink 1L £1.55/unit
Sainsbury's Caledonian Still Water 2L 50p/unit
Koko Coconut UHT Drink 1L £1.50/unit
Sainsbury's House Pinot Grigio 75cl £4.50/unit
Sainsbury's Cola Zero 2L 45p/unit
St Helen's Farm Whole Goats Milk 1L £1.80/unit
Sainsbury's Double Strength Cherries & Berries Squash, No Added Sugar 1.5L £1.00/unit
Sainsbury's Lemonade 2L 60p/unit
Sainsbury's Pure Orange Juice With Bits 1L 85p/unit
Sainsbury's Pinot Grigio, Taste the Difference 75cl £6.00/unit
Schweppes Tonic Water 1L £1.50/unit
Sainsbury's Cranberry Juice Drink 1L 85p/unit
Nescafé Gold Blend Instant Coffee Refill 150g £3.50/unit
Sainsbury's Gold Roast Instant Coffee 200g £3.15/unit
Sainsbury's Pure Orange Juice with Juicy Bits 1L 95p/unit
Edizione 789 Di Mondelli Prosecco 75cl £6.25/unit
