Get content of table in BeautifulSoup
Problem description
I have the following table on a website which I am extracting with BeautifulSoup. This is the URL (I have also attached a picture).
Ideally I would like to have each company in one row in the CSV, however I am getting it split across different rows. Please see the picture attached.
I would like to have it like in field "D", but I am getting it in A1, A2, A3...
This is the code I am using to extract:
import csv
import requests
from bs4 import BeautifulSoup

def _writeInCSV(text):
    print "Writing in CSV File"
    with open('sara.csv', 'wb') as csvfile:
        #spamwriter = csv.writer(csvfile, delimiter='\t', quotechar='\n', quoting=csv.QUOTE_MINIMAL)
        spamwriter = csv.writer(csvfile, delimiter='\t', quotechar="\n")
        for item in text:
            spamwriter.writerow([item])

read_list = []
initial_list = []
url = "http://www.nse.com.ng/Issuers-section/corporate-disclosures/corporate-actions/closure-of-register"
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")
#gdata_even = soup.find_all("td", {"class": "ms-rteTableEvenRow-3"})
gdata_even = soup.find_all("td", {"class": "ms-rteTable-default"})
for item in gdata_even:
    print item.text.encode("utf-8")
    initial_list.append(item.text.encode("utf-8"))
    print ""

_writeInCSV(initial_list)
Can anyone help?
Answer
The idea here is to:
- read the header cells from the table
- read all the other rows from the table
- zip every data row's cells with the headers, producing a list of dictionaries
- use csv.DictWriter() to dump that list to CSV
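The zip step is the key move: pairing the header cells with each data row's cells yields one dictionary per company. A minimal stand-alone sketch of that step, with made-up header and cell values, in Python 3 syntax:

```python
# Hypothetical header and data-row cells, as they would come out of the table
headers = ["Company", "Closure of Register", "Dividend"]
cells = ["ACME Plc", "12/05/2016", "0.30"]

# zip() pairs each header with the matching cell; dict() turns the pairs into a row record
row = dict(zip(headers, cells))
print(row)  # {'Company': 'ACME Plc', 'Closure of Register': '12/05/2016', 'Dividend': '0.30'}
```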
Implementation:
import csv
from pprint import pprint
from bs4 import BeautifulSoup
import requests

url = "http://www.nse.com.ng/Issuers-section/corporate-disclosures/corporate-actions/closure-of-register"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

rows = soup.select("table.ms-rteTable-default tr")

# the first row holds the header cells
headers = [header.get_text(strip=True).encode("utf-8") for header in rows[0].find_all("td")]

# zip every data row's cells with the headers
data = [dict(zip(headers, [cell.get_text(strip=True).encode("utf-8") for cell in row.find_all("td")]))
        for row in rows[1:]]

# see what the data looks like at this point
pprint(data)

with open('sara.csv', 'wb') as csvfile:
    spamwriter = csv.DictWriter(csvfile, headers, delimiter='\t', quotechar="\n")
    for row in data:
        spamwriter.writerow(row)
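Note the code above is Python 2 (print statement, binary 'wb' file mode, explicit .encode() calls). A rough Python 3 sketch of the same header-zip approach, demonstrated on an invented inline HTML snippet rather than the live page (the company names and columns here are made up for illustration; only the "ms-rteTable-default" class matches the real table):

```python
import csv
import io
from bs4 import BeautifulSoup

# A small stand-in for the live page; the real table also uses class "ms-rteTable-default"
html = """
<table class="ms-rteTable-default">
  <tr><td>Company</td><td>Dividend</td></tr>
  <tr><td>ACME Plc</td><td>0.30</td></tr>
  <tr><td>Beta Ltd</td><td>1.25</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = soup.select("table.ms-rteTable-default tr")

# First row holds the headers, the rest are data rows
headers = [cell.get_text(strip=True) for cell in rows[0].find_all("td")]
data = [dict(zip(headers, [cell.get_text(strip=True) for cell in row.find_all("td")]))
        for row in rows[1:]]

# In Python 3, open text files with newline="" instead of binary "wb";
# a StringIO buffer stands in for the file here
buffer = io.StringIO()
writer = csv.DictWriter(buffer, headers, delimiter="\t")
writer.writeheader()
writer.writerows(data)
print(buffer.getvalue())
```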