BeautifulSoup HTML getting src link
Problem description
I'm making a small web crawler with Python 3.5.1 and the requests module that downloads all comics from a specific website. For now I'm experimenting with a single page. I parse the page using BeautifulSoup4 like this:
import webbrowser
import sys
import requests
import re
import bs4

res = requests.get('http://mangapark.me/manga/berserk/s5/c342')
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, 'html.parser')
for link in soup.find_all("a", class_="img-link"):
    if link:
        print(link)
    else:
        print('ERROR')
When I do print(link), I get the correct HTML parts I'm interested in, but when I try to get only the link in src using link.get('src'), it only prints None.
I tried getting the link using:
img = soup.find("img")["src"]
That worked, but I want all the src links, not just the first one. I have little experience with BeautifulSoup. Please point out what's going on. Thank you.
A sample of the HTML I'm interested in from the website:
<a class="img-link" href="#img2">
<img id="img-1" class="img"
rel="1" i="1" e="0" z="1"
title="Berserk ch.342 page 1" src="http://2.p.mpcdn.net/352582/687224/1.jpg"
width="960" _width="818" _heighth="1189"/>
</a>
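
In the sample above, the src attribute sits on the nested <img> tag, while the <a class="img-link"> tag only carries class and href, which would explain why link.get('src') returns None. The following is a minimal sketch of one way to collect every src, assuming the live page follows the same structure as the sample. It parses the sample markup from a string instead of fetching the page, and the names SAMPLE_HTML and collect_src_links are hypothetical, introduced only for illustration.

```python
import bs4

# Sample markup copied from the question. On the real site this string
# would come from requests.get(...).text instead.
SAMPLE_HTML = """
<a class="img-link" href="#img2">
<img id="img-1" class="img"
rel="1" i="1" e="0" z="1"
title="Berserk ch.342 page 1" src="http://2.p.mpcdn.net/352582/687224/1.jpg"
width="960" _width="818" _heighth="1189"/>
</a>
"""


def collect_src_links(html):
    """Return the src of every <img> nested inside an <a class="img-link">.

    Hypothetical helper: it assumes each image link wraps a single <img>,
    as in the sample markup.
    """
    soup = bs4.BeautifulSoup(html, 'html.parser')
    srcs = []
    for link in soup.find_all("a", class_="img-link"):
        img = link.find("img")  # descend from the <a> into its <img> child
        if img is not None and img.get("src"):
            srcs.append(img["src"])
    return srcs


soup = bs4.BeautifulSoup(SAMPLE_HTML, 'html.parser')
first_link = soup.find("a", class_="img-link")

# The <a> tag itself has no src attribute, so .get() falls back to None.
print(first_link.get("src"))

# Reading src from the nested <img> yields the image URL(s).
print(collect_src_links(SAMPLE_HTML))
```

If the structure holds, a CSS selector such as soup.select("a.img-link img") should be a more compact way to reach the same tags, with the src then read from each result.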