Python scrape data from "div: class"
Problem description
I'm trying to scrape some financial data (the Key data) that is between <div> </div> tags. I then want to put it into an Excel file that already has other scraped data in it. The HTML looks like this:
<div class="tabElemNoBor resp overfH"><div class="clearfix">
<div class="tabTitleWhite"><div class="tabTitleLeftWhite"><b>Key data</b></div></div>
</div><!-- inner td --><div class="std_txt th_inner " style="padding-top:10px"> <style>
.fCCR { clear: both; line-height: 22px;}
.fCCR:nth-of-type(even) {background-color: #F6F6F6}
.fCCR:nth-of-type(odd) {background-color: #FFF}
.fCCT {display: inline-block;padding-left: 7px;}
.fCCV {display: inline-block;width:120px;float: right;text-align: right;padding-right: 7px;}
</style>
<div class="fCC">
<div class="fCCR"><div class="fCCT">Capitalization (USD)</div><div class="fCCV">45 016 163 291</div></div><div class="fCCR"><div class="fCCT">Net sales (USD)</div><div class="fCCV">27 753 973 000</div></div><div class="fCCR"><div class="fCCT">Number of employees</div><div class="fCCV">143 000</div></div><div class="fCCR"><div class="fCCT">Sales / Employee (USD)</div><div class="fCCV">194 084</div></div><div class="fCCR"><div class="fCCT">Free-Float</div><div class="fCCV">99,8%</div></div><div class="fCCR"><div class="fCCT">Free-Float capitalization (USD)</div><div class="fCCV">44 932 273 933</div></div><div class="fCCR"><div class="fCCT">Avg. Exchange 20 sessions (USD)</div><div class="fCCV">380 055 475</div></div><div class="fCCR"><div class="fCCT">Average Daily Capital Traded</div><div class="fCCV">0,84%</div></div></div>
</div></div>
I was thinking I could maybe use BeautifulSoup, but I'm not really sure what I would need to do.
I have tried the following:
for value in elem:
    print(value.text)
to try to get the top div tag and then print everything inside it, but it doesn't seem to work.
EDIT: URL https://www.marketscreener.com/DOLLAR-GENERAL-CORPORATIO-5699818/financials/
Any help appreciated.
Thanks
Print the table to the screen and save it to CSV:
import csv
import requests
from bs4 import BeautifulSoup

url = 'https://www.marketscreener.com/DOLLAR-GENERAL-CORPORATIO-5699818/financials/'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

all_data = []
for div in soup.select('div.fCCR'):
    all_data.append(div.get_text(strip=True, separator='|').split('|'))

all_data.insert(0, [div.find_previous('b').text])

# pretty print all data:
print(*all_data[0])
print('-' * 80)
for row in all_data[1:]:
    print(('{:<45}'*2).format(*row))

# save it to csv:
with open('data.csv', 'w', newline='') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    for row in all_data:
        spamwriter.writerow(row)
Prints:
Key data
--------------------------------------------------------------------------------
Capitalization (USD)                         45 016 163 291
Net sales (USD)                              27 753 973 000
Number of employees                          143 000
Sales / Employee (USD)                       194 084
Free-Float                                   99,8%
Free-Float capitalization (USD)              44 932 273 933
Avg. Exchange 20 sessions (USD)              380 055 475
Average Daily Capital Traded                 0,84%
The same rows are saved to data.csv, which can be opened in LibreOffice.
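Since the question mentions a file that already holds other scraped data, here is a minimal sketch of two follow-up steps that the answer does not cover: turning the scraped strings into numbers, and appending to an existing CSV file rather than overwriting it. The helper names parse_value and append_rows are hypothetical. The sketch assumes that spaces are thousands separators and that the comma is a decimal mark, as the printed values suggest, and that the target is a CSV file; a real .xlsx workbook would need a separate library.

```python
import csv


def parse_value(text):
    """Convert a scraped string such as '45 016 163 291' or '99,8%' to a number.

    Assumption: spaces (regular or non-breaking) separate thousands and the
    comma is the decimal mark. A trailing '%' is dropped, so '99,8%' becomes
    99.8 rather than 0.998.
    """
    cleaned = text.replace('\xa0', '').replace(' ', '')
    cleaned = cleaned.rstrip('%').replace(',', '.')
    return float(cleaned) if '.' in cleaned else int(cleaned)


def append_rows(path, rows):
    """Append (label, raw_value) rows to a CSV file, keeping what is already there."""
    # Mode 'a' appends; mode 'w' would truncate the existing file.
    with open(path, 'a', newline='', encoding='utf-8') as csvfile:
        writer = csv.writer(csvfile)
        for label, raw in rows:
            writer.writerow([label, parse_value(raw)])
```

With the all_data list built above, the call would be append_rows('data.csv', all_data[1:]), skipping the first row because it only holds the "Key data" title.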