BeautifulSoup: Scraping Steam wishlist games - .findAll not returning nested divs visible in inspector
Question
So I am trying to scrape games off my Steam wishlist using BeautifulSoup. Ideally, I would like the name of each game, the link to its Steam store page, and the currently listed price. The issue is that when I call soup.find_all("div", {"class": "wishlist_row"})
it returns an empty list, despite my being able to see in the inspector that there should be several of these divs - one for each game on my wishlist. Here is a condensed version of my current code:
from bs4 import BeautifulSoup
import requests

# browser-like header (condensed; I also tried spoofing a full user-agent)
header = {"User-Agent": "Mozilla/5.0"}

profile_id = "id/Zorro4"
url_base = "https://store.steampowered.com/wishlist/"
r = requests.get(url_base + profile_id + "#sort=order", headers=header)
data = r.text
soup = BeautifulSoup(data, features="lxml")
# find divs containing information about each game and its Steam price
divs = soup.findAll("div", {"class": "wishlist_row"})
print(divs)
>>> []
These divs are clearly visible in the inspector if I go to https://store.steampowered.com/wishlist/id/zorro4/#sort=order. I have tried:
- using html.parser instead of lxml
- spoofing the user-agent/headers
- using .find("div", {"class": "wishlist_row"}) instead of .find_all
- looking through related threads
I have noticed something odd that might help solve the problem but I am not sure what to make of it.
soup.find(id="wishlist_ctn")  # the div which should contain all the wishlist_row divs
>>> <div id="wishlist_ctn">\n</div>
This, as far as I know, should return
<div id="wishlist_ctn">...</div>
since the div contains more nested divs (the ones I'm looking for). I am not sure why it just returns a newline character. It's almost as though the contents of the wishlist_ctn div get lost during scraping. Any help would be super appreciated; I've been trying to solve this for the last couple of days with no success.

Answer
The data you see on the webpage is loaded dynamically via JavaScript/JSON. The URL from which the data is loaded is embedded inside the HTML page - we can use the re module to extract it.
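Why the extraction step below wraps the regex match in json.loads: the value assigned to g_strWishlistBaseURL in the page source appears to be a JSON-encoded string (with escaped slashes like \/), so capturing the quoted literal and decoding it yields a clean URL. A minimal offline sketch of just that step, using a made-up page fragment (the profile ID 123456 is hypothetical):

```python
import re
import json

# A hypothetical fragment of the wishlist page's inline JavaScript
html = r'var g_strWishlistBaseURL = "https:\/\/store.steampowered.com\/wishlist\/profiles\/123456\/";'

# The capture group keeps the surrounding quotes, so the match is a valid
# JSON string literal that json.loads can decode (unescaping each \/)
match = re.findall(r'g_strWishlistBaseURL = (".*?");', html)[0]
wishlist_url = json.loads(match)
print(wishlist_url)  # https://store.steampowered.com/wishlist/profiles/123456/
```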
This example prints the JSON data of the wishlist:
import re
import json
import requests

url = 'https://store.steampowered.com/wishlist/id/zorro4/#sort=order'

wishlist_url = json.loads(
    re.findall(r'g_strWishlistBaseURL = (".*?");', requests.get(url).text)[0]
)

data = requests.get(wishlist_url + 'wishlistdata/?p=0').json()
print(json.dumps(data, indent=4))
Prints:
{
    "50": {
        "name": "Half-Life: Opposing Force",
        "capsule": "https://steamcdn-a.akamaihd.net/steam/apps/50/header_292x136.jpg?t=1571756577",
        "review_score": 8,
        "review_desc": "Very Positive",
        "reviews_total": "5,383",
        "reviews_percent": 95,
        "release_date": "941443200",
        "release_string": "1 Nov, 1999",
        "platform_icons": "<span class=\"platform_img win\"></span><span class=\"platform_img mac\"></span><span class=\"platform_img linux\"></span>",
        "subs": [
            {
                "id": 32,

...and so on.
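From this JSON you can pull the three things the question asks for. The top-level keys are the app IDs, so the store page link can be built as https://store.steampowered.com/app/<app_id>. A sketch on a literal sample shaped like the output above - note the "price" field inside "subs" is an assumption, since the printed output is truncated before any price data:

```python
# Literal sample shaped like the wishlist JSON; the "price" value (in cents)
# inside "subs" is a hypothetical field name, not shown in the truncated output
data = {
    "50": {
        "name": "Half-Life: Opposing Force",
        "subs": [{"id": 32, "price": "499"}],
    }
}

for app_id, game in data.items():
    name = game["name"]
    link = "https://store.steampowered.com/app/" + app_id  # store page from the app ID
    subs = game.get("subs") or []
    # guard against free or unreleased games that may have no subs entries
    price = int(subs[0]["price"]) / 100 if subs else None
    print(name, link, price)  # Half-Life: Opposing Force https://store.steampowered.com/app/50 4.99
```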