Saving content of a webpage using BeautifulSoup


Problem description

I'm trying to scrape a webpage using BeautifulSoup using the code below:

import urllib.request
from bs4 import BeautifulSoup

with urllib.request.urlopen("http://en.wikipedia.org//wiki//Markov_chain.htm") as url:
    s = url.read()

soup = BeautifulSoup(s)

with open("scraped.txt", "w", encoding="utf-8") as f:
    f.write(soup.get_text())

The problem is that it saves the Wikipedia's main page instead of that specific article. Why the address doesn't work and how should I change it?

Solution

The correct URL for that page is http://en.wikipedia.org/wiki/Markov_chain (no doubled slashes, no ".htm" suffix):

>>> import urllib.request
>>> from bs4 import BeautifulSoup
>>> url = "http://en.wikipedia.org/wiki/Markov_chain"
>>> soup = BeautifulSoup(urllib.request.urlopen(url))
>>> soup.title
<title>Markov chain - Wikipedia, the free encyclopedia</title>
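For completeness, here is a minimal sketch of the original script with the corrected URL (keeping the question's output filename, scraped.txt); the explicit "html.parser" argument is optional but avoids BeautifulSoup's "no parser was explicitly specified" warning:

import urllib.request
from bs4 import BeautifulSoup

# Corrected article URL: single slashes and no ".htm" suffix
url = "http://en.wikipedia.org/wiki/Markov_chain"

with urllib.request.urlopen(url) as response:
    html = response.read()

# Parse the downloaded HTML with Python's built-in parser
soup = BeautifulSoup(html, "html.parser")

# Write the plain text of the article to a file
with open("scraped.txt", "w", encoding="utf-8") as f:
    f.write(soup.get_text())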

