Selenium scraping JS loaded pages

Question

I'm trying to scrape some of the JS-loaded data from https://surviv.io/stats/player787, such as the total number of kills. Could someone tell me how I can scrape the JS-loaded data with Selenium? Thanks.

Here is some code:

from selenium import webdriver
browser = webdriver.Firefox()
browser.get('https://surviv.io/stats/player787')
b = browser.find_element_by_tag_name('tr')

The 'tr' that contains the data I want is not found by Selenium.

Recommended answer

The element isn't found because the page hasn't finished rendering when you look for it. You can add an explicit wait so that Selenium does not move on until the specified element is present.
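The idea behind such a wait can be sketched without a browser. This is only an illustration of the polling pattern, not Selenium's actual implementation: the names wait_until, WaitTimeout and row_is_rendered are made up for this example, and the "page" is simulated by a counter that starts returning a value on the third check.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never becomes truthy (hypothetical name)."""


def wait_until(condition, timeout=3.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value
    or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Simulated page: the row only "appears" on the third check.
checks = {"count": 0}


def row_is_rendered():
    checks["count"] += 1
    return "tr" if checks["count"] >= 3 else None


found = wait_until(row_is_rendered, timeout=2.0, poll_interval=0.1)
print(found)
```

With Selenium you would not write this loop yourself; WebDriverWait(...).until(...) plays the role of wait_until, and the expected condition plays the role of row_is_rendered.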

Also, if the data is in a <table> tag, let pandas do the parsing for you. Once you have the rendered HTML source, read_html uses an HTML parser such as lxml or BeautifulSoup under the hood to pull out the <table>, <th>, <tr>, and <td> tags, and returns them as a list of DataFrames:

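from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Selenium 4 takes the chromedriver path through a Service object;
# adjust the path to wherever chromedriver lives on your machine.
browser = webdriver.Chrome(service=Service('C:/chromedriver_win32/chromedriver.exe'))
browser.get('https://surviv.io/stats/player787')
delay = 3  # seconds to wait for the stats to render

try:
    # Block until the JS-rendered stats container is present in the DOM
    WebDriverWait(browser, delay).until(
        EC.presence_of_element_located((By.CLASS_NAME, 'player-stats-overview')))
    # read_html returns one DataFrame per <table>; the first one is the overview
    df = pd.read_html(StringIO(browser.page_source))[0]
    print(df.loc[0, 'Kills'])
except TimeoutException:
    print('Stats did not load within', delay, 'seconds')
finally:
    # quit() ends the whole browser session, not just the current window
    browser.quit()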

Output:

18884


print (df)
   Wins  Kills  Games  K/G
0   638  18884   8896  2.1
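If you want to check the parsing step on its own, without launching a browser, here is a minimal sketch that feeds a hand-written HTML string to read_html. The markup below is an assumption made up for this example (the real page's HTML is not shown in this thread); the numbers are copied from the output above. It needs pandas plus one of its HTML parsers (lxml, or beautifulsoup4 with html5lib) installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for browser.page_source after rendering
html = """
<div class="player-stats-overview">
  <table>
    <thead>
      <tr><th>Wins</th><th>Kills</th><th>Games</th><th>K/G</th></tr>
    </thead>
    <tbody>
      <tr><td>638</td><td>18884</td><td>8896</td><td>2.1</td></tr>
    </tbody>
  </table>
</div>
"""

# read_html returns a list with one DataFrame per <table> found
tables = pd.read_html(StringIO(html))
df = tables[0]

kills = int(df.loc[0, 'Kills'])
print(kills)
```

Wrapping the string in StringIO avoids the deprecation warning that recent pandas versions raise when read_html is given a literal HTML string.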
