Saving content of a webpage using BeautifulSoup
Question
I'm trying to scrape a webpage with BeautifulSoup using the code below:
import urllib.request
from bs4 import BeautifulSoup

with urllib.request.urlopen("http://en.wikipedia.org//wiki//Markov_chain.htm") as url:
    s = url.read()

soup = BeautifulSoup(s)
with open("scraped.txt", "w", encoding="utf-8") as f:
    f.write(soup.get_text())
The problem is that it saves Wikipedia's main page instead of that specific article. Why doesn't the address work, and how should I change it?
Answer
The correct URL for the page is http://en.wikipedia.org/wiki/Markov_chain — the URL in the question has doubled slashes and a spurious .htm extension, so it does not resolve to the article:
>>> import urllib.request
>>> from bs4 import BeautifulSoup
>>> url = "http://en.wikipedia.org/wiki/Markov_chain"
>>> soup = BeautifulSoup(urllib.request.urlopen(url))
>>> soup.title
<title>Markov chain - Wikipedia, the free encyclopedia</title>
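To avoid typos like the doubled slashes above, the article URL can be built from the article title programmatically. As a minimal sketch (the `wiki_url` helper below is hypothetical, not part of the original answer, and assumes Wikipedia's standard `/wiki/<Title>` path scheme with spaces as underscores):

```python
from urllib.parse import quote

def wiki_url(title: str) -> str:
    # Hypothetical helper: build an English-Wikipedia article URL
    # from a title. Spaces become underscores, and quote() escapes
    # any characters that are not URL-safe.
    return "http://en.wikipedia.org/wiki/" + quote(title.replace(" ", "_"))

print(wiki_url("Markov chain"))
```

Constructing the URL this way keeps the path to a single, correctly escaped segment instead of hand-writing it into the string.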