BeautifulSoup Error in file saving .txt

Problem description

from bs4 import BeautifulSoup
import requests
import os


url = "http://nos.nl/artikel/2093082-steeds-meer-nekklachten-bij-kinderen-door-gebruik-tablets.html"
r  = requests.get(url)
soup = BeautifulSoup(r.content.decode('utf-8', 'ignore'))
data = soup.find_all("article", {"class": "article"})

with open("data1.txt", "wb") as file:
   content=‘utf-8’
for item in data:
    content+='''{}\n{}\n\n{}\n{}'''.format( item.contents[0].find_all("time", {"datetime": "2016-03-16T09:50:30+0100"})[0].text,
                                            item.contents[0].find_all("a", {"class": "link-grey"})[0].text,
                                            item.contents[0].find_all("img", {"class": "media-full"})[0],
                                            item.contents[1].find_all("div", {"class": "article_textwrap"})[0].text,
                                            )
with open("data1.txt".format(file_name), "wb") as file:
    file.write(content)

I recently solved a UTF-8/Unicode problem, but now the script is not saving the output as a .txt file; in fact it is not saving anything at all. What do I need to do?

Recommended answer

If you want to write the data to the file as UTF-8, try codecs.open, like this:

from bs4 import BeautifulSoup
import requests
import codecs


url = "http://nos.nl/artikel/2093082-steeds-meer-nekklachten-bij-kinderen-door-gebruik-tablets.html"
r = requests.get(url)
# Name the parser explicitly so bs4 does not have to guess one.
soup = BeautifulSoup(r.content, "html.parser")
data = soup.find_all("article", {"class": "article"})

# codecs.open encodes every unicode string written to the file as UTF-8.
with codecs.open("data1.txt", "wb", "utf-8") as filen:
    for item in data:
        # Publication time
        filen.write(item.contents[0].find_all("time", {"datetime": "2016-03-16T09:50:30+0100"})[0].get_text())
        filen.write('\n')
        # Category link text
        filen.write(item.contents[0].find_all("a", {"class": "link-grey"})[0].get_text())
        filen.write('\n\n')
        # An <img> tag has no text content, so get_text() yields an empty string here.
        filen.write(item.contents[0].find_all("img", {"class": "media-full"})[0].get_text())
        filen.write('\n')
        # Article body text
        filen.write(item.contents[1].find_all("div", {"class": "article_textwrap"})[0].get_text())

I'm unsure about filen.write(item.contents[0].find_all("img", {"class": "media-full"})[0]), because that returned a Tag instance for me.
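
For reference, here is a minimal sketch of the same idea using only the standard library, assuming Python 3. Opening the file in text mode with an explicit encoding lets the file object do the UTF-8 encoding, whereas writing a str to a file opened with "wb" raises a TypeError. The save_text helper and the sample string are hypothetical stand-ins for the scraped fields and are not part of the original code.

```python
import io
import os
import tempfile


def save_text(path, text):
    """Write text to path as UTF-8 and return the number of characters written."""
    # Text mode plus an explicit encoding: the file object encodes the str itself.
    with io.open(path, "w", encoding="utf-8") as fh:
        return fh.write(text)


# Hypothetical sample standing in for the scraped article fields.
sample = u"16 maart 2016\nBinnenland\n\nNekklachten bij kinderen: caf\u00e9\n"

out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, "data1.txt")
written = save_text(path, sample)

# Read the file back to confirm the text survived the round trip.
with io.open(path, "r", encoding="utf-8") as fh:
    round_trip = fh.read()

# For contrast: binary mode accepts bytes only, so writing a str fails on Python 3.
binary_error = None
try:
    with open(os.path.join(out_dir, "binary.txt"), "wb") as fh:
        fh.write(sample)
except TypeError as exc:
    binary_error = exc
```

If the image should appear in the output, one option is to write str(tag) for its HTML or tag.get("src") for its URL rather than the Tag object itself. Which of the two is wanted depends on what the file is for.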
