python selenium scrape the whole table
Problem description
The purpose of this code is to scrape a data table from some links and then turn it into a pandas data frame.
The problem is that this code only scrapes the first 7 rows, which are on the first page of the table, and I want to capture the whole table. When I tried to loop over the table pages, I got an error.
Here is the code:
from selenium import webdriver

urls = open(r"C:\Users\Sayed\Desktop\script\sample.txt").readlines()

for url in urls:
    driver = webdriver.Chrome(r"D:\Projects\Tutorial\Driver\chromedriver.exe")
    driver.get(url)
    for item in driver.find_element_by_xpath('//*[contains(@id,"showMoreHistory")]/a'):
        driver.execute_script("arguments[0].click();", item)
    for table in driver.find_elements_by_xpath('//*[contains(@id,"eventHistoryTable")]//tr'):
        data = [item.text for item in table.find_elements_by_xpath(".//*[self::td or self::th]")]
        print(data)
Here is the error:
Traceback (most recent call last):
  File "D:/Projects/Tutorial/ff.py", line 8, in <module>
    for item in driver.find_element_by_xpath('//*[contains(@id,"showMoreHistory")]/a'):
TypeError: 'WebElement' object is not iterable
Recommended answer
Check out the script below to get the whole table from that webpage. I've used a hardcoded delay in the script, which is not good practice. However, you can always define an Explicit Wait to make the code more robust:
import time

from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'https://www.investing.com/economic-calendar/investing.com-eur-usd-index-1155'

driver = webdriver.Chrome()
driver.get(url)

# find_element (singular) returns one WebElement: the "show more" link
item = driver.find_element(By.XPATH, '//*[contains(@id,"showMoreHistory")]/a')
driver.execute_script("arguments[0].click();", item)
time.sleep(2)  # hardcoded delay while the extra rows load

# find_elements (plural) returns a list with one WebElement per table row
for row in driver.find_elements(By.XPATH, '//*[contains(@id,"eventHistoryTable")]//tr'):
    data = [cell.text for cell in row.find_elements(By.XPATH, ".//*[self::td or self::th]")]
    print(data)

driver.quit()
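For context on the error in the question: `find_element_by_xpath` (singular) returns a single `WebElement`, and a `for` loop over a single non-iterable object raises `TypeError`, whereas the plural `find_elements_...` form returns a list. The sketch below reproduces that failure mode without a browser. `FakeElement` and `texts` are hypothetical stand-ins for illustration only, not part of Selenium.

```python
class FakeElement:
    """Hypothetical stand-in for a single WebElement (not a Selenium class)."""

    def __init__(self, text):
        self.text = text


def texts(result):
    """Collect .text from every element in an iterable of elements."""
    return [element.text for element in result]


# What a singular lookup hands back: one object
single = FakeElement("Show more")
# What a plural lookup hands back: a list of objects
many = [FakeElement("row 1"), FakeElement("row 2")]

try:
    texts(single)  # iterating over one object fails
    error_message = None
except TypeError as exc:
    error_message = str(exc)

row_texts = texts(many)  # iterating over a list works

print(error_message)
print(row_texts)
```

The same reasoning explains why the "show more" link is fetched once and clicked directly, while the table rows are fetched with the plural call and looped over.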
To get all the data by clicking the show more button until it is exhausted, and to use an Explicit Wait instead of the hardcoded delay, you can try the script below:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = 'https://www.investing.com/economic-calendar/investing.com-eur-usd-index-1155'

driver = webdriver.Chrome()
driver.get(url)
wait = WebDriverWait(driver, 10)

# Keep clicking "show more" until the link can no longer be found as visible
while True:
    try:
        item = wait.until(EC.visibility_of_element_located((By.XPATH, '//*[contains(@id,"showMoreHistory")]/a')))
        driver.execute_script("arguments[0].click();", item)
    except Exception:
        # The wait timed out or the click failed: stop loading more rows
        break

# Wait for the rows to be visible, then print the cell texts of each row
for row in wait.until(EC.visibility_of_all_elements_located((By.XPATH, '//*[contains(@id,"eventHistoryTable")]//tr'))):
    data = [cell.text for cell in row.find_elements(By.XPATH, ".//*[self::td or self::th]")]
    print(data)

driver.quit()
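The question also asks for a pandas data frame, while the scripts here only print each row. One possible way to bridge that gap is sketched below, assuming the first non-empty row holds the column headers. The helper names `rows_to_records` and `to_dataframe` and the sample values are made up for illustration; they do not come from the original answer or from the real page.

```python
def rows_to_records(rows):
    """Turn scraped rows (lists of cell strings) into a list of dicts.

    Assumption: the first non-empty row contains the column headers.
    """
    # Strip whitespace and drop rows that have no cells or only blank cells
    cleaned = [
        [cell.strip() for cell in row]
        for row in rows
        if any(cell.strip() for cell in row)
    ]
    if not cleaned:
        return []
    header, body = cleaned[0], cleaned[1:]
    # zip() stops at the shorter sequence, so ragged rows do not raise
    return [dict(zip(header, row)) for row in body]


def to_dataframe(rows):
    """Build a pandas DataFrame from the scraped rows (requires pandas)."""
    import pandas as pd  # imported lazily so rows_to_records works without pandas

    return pd.DataFrame(rows_to_records(rows))


# Made-up sample shaped like the printed `data` lists
sample_rows = [
    ["Release Date", "Actual", "Previous"],
    ["Jan 01, 2020", "1.0", "0.9"],
    [],
    ["Feb 01, 2020", "1.1", "1.0"],
]
records = rows_to_records(sample_rows)
print(records)
```

To use it with the scraping loop, append each `data` list to a list (for example `rows.append(data)`) instead of printing it, then call `to_dataframe(rows)` after the loop.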