Follow links with BeautifulSoup4
Question
I'm using Python to extract links from a page:
for link in soup.find_all('a', href=True):
    if 'http' in link['href']:
        links.append(link['href'])
How do I construct something that opens each link and extracts the text from, say, the "p" tags on the linked pages?
Answer
You can use requests to fetch the HTML for the collected links and then parse each page with BeautifulSoup.
import requests
from bs4 import BeautifulSoup

# collect links
links = []
for link in soup.find_all('a', href=True):
    if link['href'].startswith('http'):
        links.append(link['href'])

# visit the links and print the paragraph text
for link in links:
    response = requests.get(link)
    page_soup = BeautifulSoup(response.content, 'html.parser')
    for p in page_soup.find_all('p'):
        print(p.text)
Or, without iterating over the links twice:
import requests
from bs4 import BeautifulSoup

# visit each collected link and print its paragraph text in a single pass
for link in soup.find_all('a', href=True):
    if link['href'].startswith('http'):
        response = requests.get(link['href'])
        page_soup = BeautifulSoup(response.content, 'html.parser')
        for p in page_soup.find_all('p'):
            print(p.text)
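One caveat with the `startswith('http')` filter: it silently drops relative hrefs such as `/about` or `#section`, which are common on real pages. A minimal sketch of resolving them against the page's base URL with `urllib.parse.urljoin` (the `base_url` and the sample HTML here are made up for illustration; in practice the HTML would come from `response.content`):

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# hypothetical page snippet with an absolute, a relative, and a fragment link
html = """
<a href="https://example.com/a">A</a>
<a href="/relative">B</a>
<a href="#anchor">C</a>
"""
base_url = "https://example.com/page"  # assumed URL the snippet was fetched from

soup = BeautifulSoup(html, "html.parser")
links = []
for link in soup.find_all("a", href=True):
    absolute = urljoin(base_url, link["href"])  # resolve relative hrefs against base_url
    if absolute.startswith("http"):
        links.append(absolute)

print(links)
```

This keeps all three links instead of only the first one, at the cost of also resolving fragment-only anchors; filter out hrefs beginning with `#` first if you don't want those.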