BeautifulSoup, save scrape results in text file


Problem description


I'm trying to scrape data from a table with BeautifulSoup and save this to a file. I wrote this:

import urllib2
from bs4 import BeautifulSoup

url = "http://dofollow.netsons.org/table1.htm"

page = urllib2.urlopen(url).read()
soup = BeautifulSoup(page)

for tr in soup.find_all('tr')[2:]:
    tds = tr.find_all('td')
    print "%s, %s, %s" % (tds[0].text, tds[1].text, tds[2].text)

This works.

I then tried to write the results to a file but it is not working. :(

logfile = open("log.txt", 'a')             
logfile.write("%s,%s,%s\n" % (tds[0].text, tds[1].text, tds[2].text))   
logfile.close()

How can I save my results in a text file?

Answer


BeautifulSoup gives you Unicode data, which you need to encode before writing it to a file.
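For example, you could also encode each Unicode string to bytes yourself and write in binary mode; a minimal sketch, where `row` is a hypothetical stand-in for one scraped table row:

```python
# Alternative: encode the Unicode string manually, then write the
# resulting bytes to a file opened in binary append mode.
row = u"caf\xe9, 10, 20"  # sample Unicode data standing in for tds[n].text
with open("log.txt", "ab") as logfile:
    logfile.write((u"%s\n" % row).encode("utf8"))
```

This is more error-prone than `io.open` (every `write` call must remember to encode), which is why the transparent-encoding approach below is preferable.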


It'll be easier if you use the io library, which lets you open a file object with transparent encoding:

import io

with io.open('log.txt', 'a', encoding='utf8') as logfile:
    for tr in soup.find_all('tr')[2:]:
        tds = tr.find_all('td')
        logfile.write(u"%s, %s, %s\n" % (tds[0].text, tds[1].text, tds[2].text))


The with statement takes care of closing the file object for you.


I used UTF8 as the codec, but you can pick any that can handle all codepoints used in the pages you are scraping.
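As an aside, if you are on Python 3 the built-in `open` already accepts an `encoding` argument (`io.open` is the same function there), so no extra import is needed. A sketch with hypothetical sample rows standing in for the scraped cells:

```python
# Python 3: built-in open() handles encoding transparently.
# "rows" stands in for the (tds[0].text, tds[1].text, tds[2].text) tuples.
rows = [("caf\xe9", "10", "20"), ("data", "30", "40")]
with open("log3.txt", "w", encoding="utf8") as logfile:
    for a, b, c in rows:
        logfile.write("%s, %s, %s\n" % (a, b, c))
```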
