Using the find_all function returns an unexpected result set

Question

I am using Python 3.8.2 and BeautifulSoup (bs4). I am trying to find all instances of a tag and have each one listed in the result set, one per row. However, the returned result set contains more rows than the original scrape of the website. The first row of the result set contains all instances of the tag, the second row contains all instances except the first, the third contains all instances except the first and second, and so on for the rest of the result set.

Here is the code:

from bs4 import BeautifulSoup
import requests

url = "https://www.sainsburys.co.uk/shop/gb/groceries/drinks/seeall"

html_content = requests.get(url, timeout=5)
soup = BeautifulSoup(html_content.text)

test_1 = soup.find('ul',{"class": "productLister gridView"})

test = test_1.find_all("li", attrs={"class": "gridItem"})

How do I get it so that each instance of <li class="gridItem"> is listed by itself, one per row?

Thanks

Answer

The website relies on JavaScript, which renders its data dynamically once the page loads.

The requests library cannot render JavaScript, so you could use selenium or requests_html instead. Many other modules can do this as well.

There is another option: trace where the data actually comes from. I was able to locate the XHR request that the page uses to retrieve the data from the back-end API before rendering it on the client side.

To find the XHR request, open Developer Tools, switch to the Network tab, and inspect the XHR/JS requests. Which filter to use depends on the type of call, for example fetch.
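
After finding the request in the Network tab, you can copy its URL and turn the query string into a params dict instead of retyping it. The following is a minimal sketch that uses only the standard library's urllib.parse. The shortened URL is a hypothetical example built from a few of the parameters used later in this answer. It is not a captured request. The helper name params_from_url is also hypothetical.

```python
from urllib.parse import urlsplit, parse_qsl


def params_from_url(request_url):
    """Split a copied request URL into its base URL and a params dict (hypothetical helper)."""
    parts = urlsplit(request_url)
    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # keep_blank_values=True keeps empty parameters such as "searchTerm=".
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    return base, params


# Hypothetical, shortened example of a URL copied from the Network tab.
copied = ("https://www.sainsburys.co.uk/webapp/wcs/stores/servlet/gb/groceries/"
          "drinks/AjaxApplyFilterSearchResultView"
          "?langId=44&storeId=10151&pageSize=120&searchTerm=&beginIndex=0")

base, params = params_from_url(copied)
print(base)
print(params)
```

The resulting dict can be passed directly as params= to requests, as the code below does.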

Note the following:

1. The website holds 3068 items.
2. I increased the number of items per page to 120 using the parameter "pageSize": "120".
3. 3068 / 120 ≈ 25.6, which rounds up to 26 pages of up to 120 items each.
4. So you need to loop over range(0, 3120, 120), that is 0, 120, 240, and so on up to 3000. Inside a for loop, set the parameter "beginIndex" (initially "0") to each of these values.
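
The page arithmetic above can be sketched with the standard library. The code rounds the page count up with math.ceil, so the last partial page is included. It then builds one copy of the parameters per page. The helper names page_offsets and page_params are hypothetical. The item count 3068 is the figure reported in the answer and may have changed since.

```python
import math

TOTAL_ITEMS = 3068  # item count reported in the answer; may differ today
PAGE_SIZE = 120     # value sent as "pageSize"


def page_offsets(total, page_size):
    """Return the "beginIndex" value for every page needed to cover `total` items."""
    pages = math.ceil(total / page_size)  # 3068 / 120 -> 26 pages
    return list(range(0, pages * page_size, page_size))


def page_params(base_params, offset):
    """Copy the base params and set "beginIndex" for one page (values are strings)."""
    p = dict(base_params)
    p["beginIndex"] = str(offset)
    return p


offsets = page_offsets(TOTAL_ITEMS, PAGE_SIZE)
print(len(offsets), offsets[:3], offsets[-1])  # 26 [0, 120, 240] 3000
```

Note that pages * page_size is 3120, which is the same range(0, 3120, 120) described in the list above. In the answer's main, you could wrap the req.post(...) call in "for offset in offsets:" and pass page_params(params, offset) as the params.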

Since you didn't tell us your end goal, the code below is only a starting point. I believe your target is the name or price (or the URL, image, or something else). You will find it in the parsed product HTML.

    import requests
    from urllib.parse import urljoin
    from bs4 import BeautifulSoup
    
    params = {
        "langId": "44",
        "storeId": "10151",
        "catalogId": "10241",
        "categoryId": "12192",
        "parent_category_rn": "",
        "top_category": "12192",
        "pageSize": "120",
        "orderBy": "FAVOURITES_FIRST",
        "searchTerm": "",
        "catSeeAll": "true",
        "beginIndex": "0",
        "categoryFacetId1": "12192",
        "categoryFacetId2": "",
        "requesttype": "ajax"
    }
    
    
    def main(url):
        with requests.Session() as req:
            # Call the AJAX endpoint the page itself uses; "beginIndex" selects
            # the page and "pageSize" the number of items per page.
            r = req.post(url, params=params, timeout=10).json()
            # The response is a JSON array; this code assumes element 5 holds "productLists".
            for item in r[5]['productLists']:
                for nest in item['products']:
                    # Each product arrives as an HTML fragment in nest['result'].
                    soup = BeautifulSoup(nest['result'], 'html.parser')
                    target = soup.find("div", class_="productNameAndPromotions")
                    name = target.h3.a.text.strip()
                    product_url = target.h3.a.get("href")
                    # Resolve a protocol-relative ("//...") or relative src to an absolute URL.
                    img = urljoin(url, target.h3.a.img.get("src"))
                    price = soup.find(
                        "p", class_="pricePerUnit").get_text(strip=True)
                    print(name, price, img, product_url)
    
    
    main("https://www.sainsburys.co.uk/webapp/wcs/stores/servlet/gb/groceries/drinks/AjaxApplyFilterSearchResultView")
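
The code above reads the response as r[5]. That works only as long as the endpoint keeps returning the product data at that position. The sketch below is slightly more defensive. It assumes, as the answer's code implies, that the response is a JSON array in which one object has a "productLists" key. The helper name find_product_lists and the sample payload are hypothetical, and the sample is trimmed down to its shape only.

```python
import json


def find_product_lists(payload):
    """Return "productLists" from the first object in the JSON array that has it.

    Hypothetical helper; assumes `payload` is the decoded JSON array from the endpoint.
    """
    for element in payload:
        if isinstance(element, dict) and "productLists" in element:
            return element["productLists"]
    raise KeyError("no element with a 'productLists' key in the response")


# Hypothetical, heavily trimmed payload: only the structure matters here.
sample = json.loads("""
[
  {"other": 1}, {}, {}, {}, {},
  {"productLists": [{"products": [{"result": "<li>...</li>"}]}]}
]
""")

for product_list in find_product_lists(sample):
    for product in product_list["products"]:
        print(product["result"])
```

In main, "for item in r[5]['productLists']:" would then become "for item in find_product_lists(r):".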
    

Brief output for name and price:

    Sainsbury's British Semi Skimmed Milk 2.27L (4 pint) £1.10/unit
    Sainsbury's British Semi Skimmed Milk 1.13L (2 pint) 80p/unit
    Sainsbury's British Whole Milk 2.27L (4 pint) £1.10/unit
    Cravendale Purefilter Semi Skimmed Milk 2L £1.90/unit
    Sainsbury's British Skimmed Milk 2.27L (4 pint) £1.10/unit
    Sainsbury's British Semi Skimmed Milk, SO Organic 2.27L (4 pint) £1.80/unit
    Sainsbury's Sparkling Water, Basics 2L 25p/unit
    Sainsbury's British Skimmed Milk 1.13L (2 pint) 80p/unit
    Sainsbury's 100% Pure Squeezed Smooth Orange Juice, Not From Concentrate 1L £1.30/unit
    Sainsbury's Water, Basics 2L 25p/unit
    Sainsbury's British Whole Milk 1.13L (2 pint) 80p/unit
    Sainsbury's Smooth Pure Orange Juice 1L 95p/unit
    Pepsi Max 2L £1.90/unit
    Sainsbury's Caledonian Still Water 4x2L £1.50/unit
    Highland Spring Still Water 12x500ml £3.00/unit
    Sainsbury's 100% Pressed Apple Juice, Not From Concentrate 1L £1.30/unit
    Sainsbury's British Whole Milk, SO Organic 2.27L (4 Pint) £1.80/unit
    Lactofree Semi Skimmed Lactose Free Fresh Dairy Drink 1L £1.50/unit
    Diet Coke 8x330ml £4.00/unit
    Alpro Roasted Almond Unsweetened UHT Drink 1L £1.80/unit
    Robinsons Orange Squash No Added Sugar 1L £1.65/unit
    Sainsbury's Soda Water 1L 60p/unit
    Sainsbury's Caledonian Sparkling Water 4x2L £1.60/unit
    Tropicana Smooth Orange Juice 950ml £2.45/unit
    Sainsbury's Diet Indian Tonic Water 1L 60p/unit
    Sainsbury's Pure Apple Juice 1L 95p/unit
    Robinsons Apple & Blackcurrant Squash No Added Sugar 1L £1.65/unit
    Sainsbury's Sparkling Flavoured Water, Lemon & Lime 1L 50p/unit
    Sainsbury's Conegliano Prosecco, Taste the Difference 75cl £8.00/unit
    Sainsbury's Unsweetened Soya Drink 1L 90p/unit
    Sainsbury's British Semi Skimmed Milk, SO Organic 1.13L (2 pint) £1.15/unit
    Sainsbury's Caledonian Sparkling Water 6x500ml £1.50/unit
    Sainsbury's Apple & Blackcurrant Squash, No Added Sugar 1.5L £1.00/unit
    Highland Spring Still Water 6x1.5L £3.00/unit
    Alpro Roasted Almond Unsweetened Fresh Drink 1L £1.85/unit
    Sainsbury's Semi Skimmed Long Life Milk 1L 90p/unit
    Tropicana Smooth Orange Juice 1.6L £2.50/unit
    Sainsbury's 100% Pure Squeezed Orange Juice with Bits, Not From Concentrate 1L £1.30/unit
    Cravendale Purefilter Semi Skimmed Milk 1L £1.15/unit
    Sainsbury's Caledonian Still Water Sports Cap 6x500ml £1.50/unit
    Sainsbury's Double Strength Orange Squash, No Added Sugar 1.5L £1.00/unit
    Diet Coke 18x330ml £7.00/unit
    Sainsbury's Indian Tonic Water 1L 60p/unit
    Sainsbury's Pure Orange Juice 1L 85p/unit
    Sainsbury's Pure Apple Juice 6x200ml £1.50/unit
    Buxton Still Natural Mineral Water 8x500ml £2.00/unit
    Sainsbury's Whole Long Life Milk 1L £1.05/unit
    Cravendale Purefilter Skimmed Milk 2L £1.90/unit
    Sainsbury's Sparkling Flavoured Water, Blackcurrant & Cherry 1L 50p/unit
    Innocent Smooth Orange Juice 1.35L £3.00/unit
    Alpro Original Soya Fresh Drink 1L £1.55/unit
    Sainsbury's Still Flavoured Water, Strawberry & Kiwi 1L 50p/unit
    Sainsbury's British Filtered Semi Skimmed Milk 2L £1.35/unit
    Sainsbury's Sparkling Flavoured Water, Mango & Passionfruit 1L 50p/unit
    Sainsbury's Caledonian Still Water 5L £1.10/unit
    McGuigan Estate Merlot 75cl £5.10/unit
    Schweppes Slimline Tonic Water 1L £1.50/unit
    PG tips Pyramid Tea Bags x240 696g £4.50/unit
    Sainsbury's Sparkling Flavoured Water, Strawberry & Kiwi 1L 50p/unit
    Sainsbury's Caledonian Sparkling Water 2L 55p/unit
    Sainsbury's Sweetened Soya Drink 1L 90p/unit
    Sainsbury's 100% Pure Squeezed Smooth Orange Juice, Not From Concentrate 1.75L £2.10/unit
    Sainsbury's Diet Lemonade 2L 60p/unit
    Sainsbury's Apple & Mango Juice, Not From Concentrate 1L £1.30/unit
    Robinsons Summer Fruits Squash No Added Sugar 1L £1.65/unit
    Sainsbury's 100% Pure Squeezed Pineapple Juice, Not From Concentrate 1L £1.30/unit
    Clearsprings Sauvignon Blanc 75cl £5.50/unit
    Phantom River Sauvignon Blanc 75cl £5.00/unit
    Nestle Pure Life Still Spring Water 12x500ml £2.50/unit
    Buxton Sparkling Natural Mineral Water 8x500ml £2.10/unit
    Brancott Estate Sauvignon Blanc 75cl £6.75/unit
    Schweppes Slimline Lemonade 2L £1.30/unit
    McGuigan Estate South Australian Shiraz 75cl £5.10/unit
    Coca-Cola Zero Sugar 8x330ml £4.00/unit
    Villa Maria Private Bin Sauvignon Blanc 75cl £9.25/unit
    Diet Coke Caffeine Free 8x330ml £4.00/unit
    Sainsbury's British Skimmed Milk, SO Organic 1.13L (2 pint) £1.15/unit
    Sainsbury's Kids Caledonian Still Water 6x300ml £1.10/unit
    Canti Prosecco 75cl £7.50/unit
    Oatly Enriched with Calcium Oat UHT Drink 1L £1.50/unit
    Sainsbury's Pure Orange Juice 6x200ml £1.50/unit
    Sainsbury's Still Flavoured Water, Lemon & Lime 1L 50p/unit
    Valdo Prosecco Marca Oro 75cl £8.50/unit
    Oyster Bay Sauvignon Blanc 75cl £8.00/unit
    Ribena Blackcurrant Squash 850ml £2.30/unit
    Volvic Mineral Water 6x1.5L £3.40/unit
    Campo Viejo Rioja Tempranillo 75cl £6.75/unit
    Nescafé Azera Americano Instant Coffee 100g £4.60/unit
    Tropicana Orange Juice Original 950ml £2.45/unit
    Sainsbury's Double Strength Orange & Mango Squash, No Added Sugar 1.5L £1.00/unit  
    Robinsons Lemon Squash No Added Sugar 1L £1.65/unit
    Schweppes Lemonade 2L £1.30/unit
    Robinsons Orange & Pineapple Squash No Added Sugar 1L £1.65/unit
    Sainsbury's Diet Indian Tonic with Lime 1L 60p/unit
    St Helen's Farm Semi Skimmed Goats Milk 1L £1.80/unit
    Sainsbury's Double Strength Orange, Lemon & Pineapple Squash, No Added Sugar 1.5L £1.00/unit
    Sainsbury's Double Strength Summerfruits Squash, No Added Sugar 1.5L £1.00/unit    
    Alpro Oat UHT Drink 1L £1.80/unit
    Innocent Smooth Orange Juice 900ml £1.50/unit
    Sainsbury's British Whole Milk, SO Organic 1.13L (2 pint) £1.15/unit
    Sainsbury's Skimmed Long Life Milk 1L 80p/unit
    Nescafé Gold Blend Instant Coffee 200g £7.00/unit
    Highland Spring Still Water Sports Cap 12x330ml £3.00/unit
    Sainsbury's Cava Brut 75cl £6.00/unit
    Alpro Light Unsweetened Soya Fresh Drink 1L £1.55/unit
    Sainsbury's Caledonian Still Water 2L 50p/unit
    Koko Coconut UHT Drink 1L £1.50/unit
    Sainsbury's House Pinot Grigio 75cl £4.50/unit
    Sainsbury's Cola Zero 2L 45p/unit
    St Helen's Farm Whole Goats Milk 1L £1.80/unit
    Sainsbury's Double Strength Cherries & Berries Squash, No Added Sugar 1.5L £1.00/unit
    Sainsbury's Lemonade 2L 60p/unit
    Sainsbury's Pure Orange Juice With Bits 1L 85p/unit
    Sainsbury's Pinot Grigio, Taste the Difference 75cl £6.00/unit
    Schweppes Tonic Water 1L £1.50/unit
    Sainsbury's Cranberry Juice Drink 1L 85p/unit
    Nescafé Gold Blend Instant Coffee Refill 150g £3.50/unit
    Sainsbury's Gold Roast Instant Coffee 200g £3.15/unit
    Sainsbury's Pure Orange Juice with Juicy Bits 1L 95p/unit
    Edizione 789 Di Mondelli Prosecco 75cl £6.25/unit
    
