BeautifulSoup错误文件保存.TXT [英] BeautifulSoup Error in file saving .txt
本文介绍了BeautifulSoup错误文件保存.TXT的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
from bs4 import BeautifulSoup
import requests
import os
url = "http://nos.nl/artikel/2093082-steeds-meer-nekklachten-bij-kinderen-door-gebruik-tablets.html"
r = requests.get(url)
soup = BeautifulSoup(r.content.decode('utf-8', 'ignore'))
data = soup.find_all("article", {"class": "article"})
with open("data1.txt", "wb") as file:
content=‘utf-8’
for item in data:
content+='''{}\n{}\n\n{}\n{}'''.format( item.contents[0].find_all("time", {"datetime": "2016-03-16T09:50:30+0100"})[0].text,
item.contents[0].find_all("a", {"class": "link-grey"})[0].text,
item.contents[0].find_all("img", {"class": "media-full"})[0],
item.contents[1].find_all("div", {"class": "article_textwrap"})[0].text,
)
with open("data1.txt".format(file_name), "wb") as file:
file.write(content)
最近解决的UTF / UNI code问题,但现在它不是将其保存为txt文件,也不保存它。我需要做什么呢?
Recently solved a utf/Unicode problem but now it isn't saving it as a .txt file nor saving it at all. What do I need to do?
推荐答案
如果你想写入数据为UTF-8的文件试试 codecs.open
这样的:
If you want to write the data as UTF-8 to the file try codecs.open
like:
from bs4 import BeautifulSoup
import requests
import os
import codecs
url = "http://nos.nl/artikel/2093082-steeds-meer-nekklachten-bij-kinderen-door-gebruik-tablets.html"
r = requests.get(url)
soup = BeautifulSoup(r.content)
data = soup.find_all("article", {"class": "article"})
with codecs.open("data1.txt", "wb", "utf-8") as filen:
for item in data:
filen.write(item.contents[0].find_all("time", {"datetime": "2016-03-16T09:50:30+0100"})[0].get_text())
filen.write('\n')
filen.write(item.contents[0].find_all("a", {"class": "link-grey"})[0].get_text())
filen.write('\n\n')
filen.write(item.contents[0].find_all("img", {"class": "media-full"})[0].get_text())
filen.write('\n')
filen.write(item.contents[1].find_all("div", {"class": "article_textwrap"})[0].get_text())
我不确定 filen.write(item.contents [0] .find_all(IMG,{级:媒体全})[0])
,因为返回的标签
实例给我。
I'm unsure about filen.write(item.contents[0].find_all("img", {"class": "media-full"})[0])
because that returned a Tag
instance for me.
这篇关于BeautifulSoup错误文件保存.TXT的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
查看全文