BeautifulSoup output to .txt file
Problem description
I am trying to export my data as a .txt file
from bs4 import BeautifulSoup
import requests
import os
os.getcwd()
'/home/folder'
os.mkdir("Probeersel6")
os.chdir("Probeersel6")
os.getcwd()
'/home/Desktop/folder'
os.mkdir("img") #now `folder`
url = "http://nos.nl/artikel/2093082-steeds-meer-nekklachten-bij-kinderen-door-gebruik-tablets.html"
r = requests.get(url)
soup = BeautifulSoup(r.content)
data = soup.find_all("article", {"class": "article"})
with open(""%s".txt", "wb" %(url)) as file:
    for item in data:
        print item.contents[0].find_all("time", {"datetime": "2016-03-16T09:50:30+0100"})[0].text
        print item.contents[0].find_all("a", {"class": "link-grey"})[0].text
        print "\n"
        print item.contents[0].find_all("img", {"class": "media-full"})[0]
        print "\n"
        print item.contents[1].find_all("div", {"class": "article_textwrap"})[0].text
        file.write()
What should be passed to:
file.write()
for this to work?
I am also trying to give the .txt file the same name as the URL. Should I do that with a string?
with open(""%s".txt", "wb" %(url)) as file:
url = "http://nos.nl/artikel/2093082-steeds-meer-nekklachten-bij-kinderen-door-gebruik-tablets.html"
Solution
You should pass your content to file.write. I would probably do something like this:
#!/usr/bin/python3
from bs4 import BeautifulSoup
import requests

url = 'http://nos.nl/artikel/2093082-steeds-meer-nekklachten-bij-kinderen-door-gebruik-tablets.html'
# Use the last URL path segment, without its extension, as the file name.
file_name = url.rsplit('/', 1)[1].rsplit('.')[0]

r = requests.get(url)
soup = BeautifulSoup(r.content, 'lxml')
data = soup.find_all('article', {'class': 'article'})

# Build one text block per article: time, link text, image tag, body text.
content = ''.join(
    '{}\n{}\n\n{}\n{}'.format(
        item.contents[0].find_all('time', {'datetime': '2016-03-16T09:50:30+0100'})[0].text,
        item.contents[0].find_all('a', {'class': 'link-grey'})[0].text,
        item.contents[0].find_all('img', {'class': 'media-full'})[0],
        item.contents[1].find_all('div', {'class': 'article_textwrap'})[0].text,
    )
    for item in data
)

# Text mode with an explicit encoding, so str content can be written directly.
with open('./{}.txt'.format(file_name), mode='wt', encoding='utf-8') as file:
    file.write(content)
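On the file name part of the question: the full URL cannot be used as a file name on most systems because it contains `/` characters, and in the original `open(...)` call the `%` operator is attached to the mode string `"wb"` rather than to the name template. A minimal sketch of one alternative is below, using only the standard library. The helper names `file_name_from_url` and `write_text` and the `sample` text are made up for illustration and stand in for the scraped content; this is not the only way to do it.

```python
import os
import tempfile
from pathlib import PurePosixPath
from urllib.parse import urlparse


def file_name_from_url(url):
    """Return the last path segment of the URL, minus its extension, plus '.txt'."""
    path = urlparse(url).path
    stem = PurePosixPath(path).stem
    return "{}.txt".format(stem)


def write_text(directory, name, content):
    """Write content as UTF-8 text to directory/name and return the full path."""
    target = os.path.join(directory, name)
    with open(target, mode="w", encoding="utf-8") as handle:
        handle.write(content)
    return target


url = "http://nos.nl/artikel/2093082-steeds-meer-nekklachten-bij-kinderen-door-gebruik-tablets.html"
name = file_name_from_url(url)

# Placeholder text standing in for the scraped article fields.
sample = "headline\nbody text\n"

# A temporary directory keeps the example from leaving files behind.
with tempfile.TemporaryDirectory() as tmp:
    saved = write_text(tmp, name, sample)
    with open(saved, encoding="utf-8") as handle:
        round_trip = handle.read()
```

In a real script you would pass the joined article text instead of `sample` and a directory of your choice instead of the temporary one.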