How to scrape data off a page loaded via JavaScript

Views: 71 Posted: 2020/9/20 8:00:11 Tags: python-3.x beautifulsoup

Problem description

I want to scrape the comments off this page using BeautifulSoup: https://www.x....s.com/video_id/the-suburl

The comments are loaded on click via JavaScript. They are paginated, and each page also loads its comments on click. I want to fetch all comments, and for each comment get the poster's profile URL, the comment text, the number of likes, the number of dislikes, and the time posted (as shown on the page).

The comments can be a list of dictionaries.

How do I go about this?

Recommended answer

This script will print all comments found on the page:

import requests
from bs4 import BeautifulSoup


url = 'https://www.x......com/video_id/gggjggjj/'
# The second-to-last path segment of the page URL identifies the video
video_id = url.rsplit('/', maxsplit=2)[-2].replace('video', '')

# Endpoint the page itself calls when the comments are clicked open;
# the load_all flag presumably returns every comment in one response
u = 'https://www.x......com/threads/video/ggggjggl/{video_id}/0/0'.format(video_id=video_id)
comments = requests.post(u, data={'load_all': 1}).json()

posts = comments['posts']['posts']
for id_ in comments['posts']['ids']:
    post = posts[id_]
    print(post['date'])
    print(post['name'])
    print(post['url'])
    # The message is an HTML fragment, so strip the tags to get plain text
    print(BeautifulSoup(post['message'], 'html.parser').get_text())
    # ...etc.
    print('-'*80)
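Since the question asks for the comments as a list of dictionaries, the same loop can collect the fields instead of printing them. The following is a minimal sketch, not part of the original answer. It assumes the JSON response has the shape used by the script above (`posts.ids` plus `posts.posts` keyed by id, each with `date`, `name`, `url` and `message`). The names `html_to_text`, `comments_to_dicts` and the `sample` payload are hypothetical, and the sample values are made up for illustration. The field names for likes and dislikes are not shown in the answer, so they are left out here.

```python
from html.parser import HTMLParser

try:
    from bs4 import BeautifulSoup  # same parser the script above uses
except ImportError:  # fall back to the standard library if bs4 is not installed
    BeautifulSoup = None


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(fragment):
    """Strip the tags from an HTML fragment and return its text."""
    if BeautifulSoup is not None:
        return BeautifulSoup(fragment, 'html.parser').get_text()
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()
    return ''.join(parser.parts)


def comments_to_dicts(payload):
    """Turn the JSON payload into a list of dictionaries, one per comment."""
    posts = payload['posts']['posts']
    result = []
    for id_ in payload['posts']['ids']:
        post = posts[id_]
        result.append({
            'date': post['date'],
            'name': post['name'],
            'url': post['url'],
            'message': html_to_text(post['message']),
        })
    return result


# Hypothetical payload with the assumed shape; real responses may differ
sample = {
    'posts': {
        'ids': ['1', '2'],
        'posts': {
            '1': {'date': '1 day ago', 'name': 'alice', 'url': '/profiles/alice',
                  'message': '<p>Nice <b>video</b>!</p>'},
            '2': {'date': '2 days ago', 'name': 'bob', 'url': '/profiles/bob',
                  'message': 'Thanks for sharing'},
        },
    },
}

rows = comments_to_dicts(sample)
for row in rows:
    print(row)
```

With the real endpoint, `comments_to_dicts(comments)` would be called on the dictionary returned by `requests.post(...).json()` in the script above.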
