Selenium: scraping JS-loaded pages
Question
I'm trying to scrape some of the JS-loaded data from https://surviv.io/stats/player787, such as the total number of kills. Could someone tell me how I can scrape JS-loaded data with Selenium? Thanks.
Here is some code:
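from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Firefox()
browser.get('https://surviv.io/stats/player787')
# Selenium 4 replaced find_element_by_tag_name() with find_element(By.TAG_NAME, ...)
b = browser.find_element(By.TAG_NAME, 'tr')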
The 'tr' element that contains the data I want is not found by Selenium.
Answer
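The reason it isn't found is that the page hasn't finished rendering when find_element runs: the stats are filled in by JavaScript after the initial HTML arrives. You can add an explicit wait so that Selenium does not move on until the specified element has been rendered.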
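To show what an explicit wait does, here is a minimal stdlib-only sketch of the polling idea: call a condition repeatedly until it returns something truthy or a timeout expires. It is a simplified illustration, not Selenium's WebDriverWait implementation; the names wait_until and row_rendered are hypothetical, and the "page" is simulated by a counter.

```python
import time


def wait_until(condition, timeout=3.0, poll=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Simplified illustration of an explicit wait (hypothetical helper,
    NOT Selenium's WebDriverWait code).
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value  # hand back whatever the condition found
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)  # wait a bit before checking again


# Simulated page: the <tr> "appears" on the third check
checks = {"n": 0}


def row_rendered():
    checks["n"] += 1
    return "<tr>...</tr>" if checks["n"] >= 3 else None


print(wait_until(row_rendered, timeout=3.0, poll=0.01))  # prints <tr>...</tr>
```

WebDriverWait follows the same pattern: until() keeps calling the expected condition with the driver, returns its truthy result, and raises TimeoutException once the timeout passes.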
Also, if the data is in a <table> tag, let pandas do the parsing for you. Once you have the rendered HTML source, pd.read_html pulls out the <table>, <th>, <tr>, and <td> tags (using lxml, or BeautifulSoup with html5lib as a fallback) and returns the tables as a list of DataFrames:
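from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Selenium 4 passes the driver path through a Service object
browser = webdriver.Chrome(service=Service('C:/chromedriver_win32/chromedriver.exe'))
try:
    browser.get('https://surviv.io/stats/player787')
    delay = 3  # seconds
    # Block until the stats container is in the DOM (or time out)
    WebDriverWait(browser, delay).until(
        EC.presence_of_element_located((By.CLASS_NAME, 'player-stats-overview')))
    # read_html returns a list of DataFrames, one per <table>
    df = pd.read_html(StringIO(browser.page_source))[0]
    print(df.loc[0, 'Kills'])
except TimeoutException:
    print('Stats did not load within', delay, 'seconds')
finally:
    browser.quit()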
Output:
18884

And print(df) shows the whole table:
   Wins  Kills  Games  K/G
0   638  18884   8896  2.1
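For a rough idea of what read_html does with the <table>, <th>, <tr>, and <td> tags, here is a stdlib-only stand-in built on html.parser.HTMLParser. It is a deliberately simplified sketch (no rowspan/colspan, nested tables, or type conversion), not pandas' actual parser. The class name TableRows is hypothetical, and the markup is invented for illustration; only the numbers come from the output above.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of <th>/<td> cells, one list per <tr>.

    Hypothetical, simplified stand-in for pandas.read_html.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])  # start a new row
        elif tag in ("th", "td") and self.rows:
            self.rows[-1].append("")  # start a new cell in the current row
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:  # ignore whitespace between tags
            self.rows[-1][-1] += data.strip()


# Hypothetical markup using the numbers from the answer's output
html = """
<table>
  <tr><th>Wins</th><th>Kills</th><th>Games</th><th>K/G</th></tr>
  <tr><td>638</td><td>18884</td><td>8896</td><td>2.1</td></tr>
</table>
"""
parser = TableRows()
parser.feed(html)
parser.close()
header, *body = parser.rows
records = [dict(zip(header, row)) for row in body]
print(records[0]["Kills"])  # prints 18884
```

Unlike read_html, this keeps every cell as a string; pandas also infers numeric dtypes, which is why df.loc[0, 'Kills'] above is a number.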