BeautifulSoup, save scrape results in text file
Question
I'm trying to scrape data from a table with BeautifulSoup and save this to a file. I wrote this:
import urllib2
from bs4 import BeautifulSoup
url = "http://dofollow.netsons.org/table1.htm"
page = urllib2.urlopen(url).read()
soup = BeautifulSoup(page)
for tr in soup.find_all('tr')[2:]:
    tds = tr.find_all('td')
    print "%s, %s, %s" % (tds[0].text, tds[1].text, tds[2].text)
This works.
I then tried to write the results to a file but it is not working. :(
logfile = open("log.txt", 'a')
logfile.write("%s,%s,%s\n" % (tds[0].text, tds[1].text, tds[2].text))
logfile.close()
How can I save my results in a text file?
Answer
BeautifulSoup gives you Unicode data, which you need to encode before writing it to a file.
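A minimal sketch of that encoding step, runnable on its own (the sample strings and the temp-file path are placeholders standing in for the question's `tds[n].text` values, not data from the scraped table):

```python
import os
import tempfile

# Sample non-ASCII cell text standing in for tds[0].text, tds[1].text, tds[2].text
row = (u"caf\u00e9", u"M\u00fcnchen", u"na\u00efve")
line = u"%s, %s, %s\n" % row

path = os.path.join(tempfile.gettempdir(), "log.txt")
with open(path, "wb") as logfile:       # binary mode: the file expects bytes
    logfile.write(line.encode("utf8"))  # encode Unicode text to UTF-8 bytes first

with open(path, "rb") as f:
    assert f.read().decode("utf8") == line  # round-trips cleanly
```

Writing the Unicode string directly to a plain `open()` file is what raises the `UnicodeEncodeError` in the question's snippet once a non-ASCII character appears.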
It'll be easier if you use the io library, which lets you open a file object with transparent encoding:
import io
with io.open('log.txt', 'a', encoding='utf8') as logfile:
    for tr in soup.find_all('tr')[2:]:
        tds = tr.find_all('td')
        logfile.write(u"%s, %s, %s\n" % (tds[0].text, tds[1].text, tds[2].text))
The with statement takes care of closing the file object for you.
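A quick illustration of that guarantee (the file name here is arbitrary):

```python
import io
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "with_demo.txt")
with io.open(path, "w", encoding="utf8") as logfile:
    logfile.write(u"one line\n")
    assert not logfile.closed  # still open inside the with block

assert logfile.closed          # closed automatically when the block exits
```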
I used UTF8 as the codec, but you can pick any that can handle all codepoints used in the pages you are scraping.
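For example, a page containing the euro sign round-trips through UTF-8 but cannot be encoded as Latin-1 at all, which is why the codec choice matters (the sample string is illustrative, not from the scraped page):

```python
s = u"\u20ac 99,99"          # euro sign, outside Latin-1's repertoire

encoded = s.encode("utf8")   # UTF-8 can represent every Unicode codepoint
assert encoded.decode("utf8") == s

try:
    s.encode("latin-1")      # Latin-1 has no mapping for U+20AC
    raised = False
except UnicodeEncodeError:
    raised = True
assert raised
```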