How to fill an Excel file from Selenium scraping in a loop with Python


Question

I am trying to scrape a website that contains many pages. With Selenium I open each page in a second tab and launch my function to get the data. After that I close the tab, open the next one, and continue the extraction until the last page. My problem is that when I save my data in the Excel file, I find that it only keeps the latest information extracted from the last page (tab). Can you help me find my error?

import pandas as pd
from urllib.request import urlopen
from bs4 import BeautifulSoup

def scrape_client_infos(linksss):
    tds = []  # tds is the list that holds the scraped cell texts

    reader = pd.read_excel(r'C:\python projects\mada\db.xlsx')
    writer = pd.ExcelWriter(r'C:\python projects\mada\db.xlsx', engine='openpyxl')
    html = urlopen(linksss)
    soup = BeautifulSoup(html, 'html.parser')

    table = soup.find('table', attrs={'class': 'r2'})

    # scrape all the tr elements that contain text
    for tr in table.find_all('tr'):
        elem = tr.find('td').get_text()
        elem = elem.replace('\t', '')
        elem = elem.replace('\n', '')
        elem = elem.replace('\r', '')
        tds.append(elem)

    print(tds)

    # selecting the data that I need to save in Excel
    raw_data = {'sub_num': [tds[1]], 'id': [tds[0]], 'nationality': [tds[2]],
                'country': [tds[3]], 'city': [tds[3]], 'age': [tds[7]],
                'marital_status': [tds[6]], 'wayy': [tds[5]]}
    df = pd.DataFrame(raw_data, columns=['sub_num', 'id', 'nationality', 'country',
                                         'city', 'age', 'marital_status', 'wayy'])

    # save the data in the Excel file
    df.to_excel(writer, sheet_name='Sheet1', startrow=len(reader), header=False)
    writer.save()
    return soup

P.S.: I always want to fill the Excel file starting from the last line.

Solution

To append data to an existing Excel file with Pandas, you need to load the existing workbook and set the worksheets on the writer object; otherwise the writer recreates the file on every call, which is why only the last page survives.

Update the last section of your code:

# save the data in the Excel file
# (this replaces the final lines inside scrape_client_infos)
from openpyxl import load_workbook

path = r'C:\python projects\mada\db.xlsx'  # same workbook that was opened earlier
book = load_workbook(path)                 # load the existing workbook from disk
startrw = book['Sheet1'].max_row + 1       # start writing below the rows already in Sheet1
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)  # prevent overwrite
df.to_excel(writer, sheet_name='Sheet1', startrow=startrw, header=False)
writer.save()
return soup
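
For context, here is a minimal sketch of how the updated function might be driven from the Selenium loop described in the question. The driver setup, the links list, and the collect_links helper are assumptions for illustration only, not part of the original code:

from selenium import webdriver

# assumed setup: a Selenium driver and a list of detail-page links (hypothetical names)
driver = webdriver.Chrome()
links = collect_links(driver)  # hypothetical helper that gathers the page URLs

for link in links:
    # open the page in a second tab, as described in the question
    driver.execute_script("window.open(arguments[0]);", link)
    driver.switch_to.window(driver.window_handles[-1])

    scrape_client_infos(link)  # appends one row below the rows already in the workbook

    # close the tab and switch back to the first one
    driver.close()
    driver.switch_to.window(driver.window_handles[0])

With the workbook reloaded on every call, startrow always points below the existing rows, so each page adds a new line instead of overwriting the previous one.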
