BeautifulSoup HTML getting src link

Views: 839 Published: 2020/9/20 6:17:27 Tags: python html python-3.x beautifulsoup html-parsing

Problem description

I'm making a small web crawler using Python 3.5.1 and the requests module, which downloads all comics from a specific website. I'm experimenting with one page. I parse the page using BeautifulSoup4 as shown below:

import webbrowser
import sys
import requests
import re
import bs4

res = requests.get('http://mangapark.me/manga/berserk/s5/c342')
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, 'html.parser')

for link in soup.find_all("a", class_="img-link"):
    if link:
        print(link)
    else:
        print('ERROR')

When I do print(link), I get the correct HTML parts I'm interested in, but when I try to get only the link in src using link.get('src'), it prints only None.

I tried getting the link using:

img = soup.find("img")["src"]

and it worked, but I want all the src links, not just the first one. I have little experience with BeautifulSoup. Please point out what's going on. Thank you.

The sample HTML part from the website I'm interested in is:

<a class="img-link" href="#img2">
    <img id="img-1" class="img"
          rel="1" i="1" e="0" z="1" 
          title="Berserk ch.342 page 1" src="http://2.p.mpcdn.net/352582/687224/1.jpg"
          width="960" _width="818" _heighth="1189"/>        
</a>

Recommended answer

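link.get('src') prints None because the a element itself has no src attribute; the attribute belongs to the img element nested inside it. I would use a CSS selector, for example soup.select("a.img-link img[src]").

Here, we are getting all of the img elements that have a src attribute and are located under an a element with the img-link class. It prints: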

http://2.p.mpcdn.net/352582/687224/1.jpg
http://2.p.mpcdn.net/352582/687224/2.jpg
http://2.p.mpcdn.net/352582/687224/3.jpg
http://2.p.mpcdn.net/352582/687224/4.jpg
...
http://2.p.mpcdn.net/352582/687224/20.jpg
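
The selector-based approach described above can be sketched as follows. This is a minimal illustration, not necessarily the exact snippet the answer used. It parses an inline SAMPLE_HTML string instead of fetching the live page; the second a element (its href and id values) is a hypothetical addition modelled on the sample HTML in the question, and the names SAMPLE_HTML and srcs are made up for this example.

```python
import bs4

# Hypothetical two-page sample modelled on the HTML shown in the question;
# the second <a> block is an assumption added for illustration.
SAMPLE_HTML = """
<a class="img-link" href="#img2">
    <img id="img-1" class="img" src="http://2.p.mpcdn.net/352582/687224/1.jpg"/>
</a>
<a class="img-link" href="#img3">
    <img id="img-2" class="img" src="http://2.p.mpcdn.net/352582/687224/2.jpg"/>
</a>
"""

soup = bs4.BeautifulSoup(SAMPLE_HTML, "html.parser")

# "a.img-link img[src]" matches img elements that carry a src attribute
# and are descendants of an a element with the class "img-link".
srcs = [img["src"] for img in soup.select("a.img-link img[src]")]

for src in srcs:
    print(src)
```

Collecting the URLs into a list rather than printing them directly makes it easier to pass them on to a download step later.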

If you still want to use find_all(), you would have to nest it:

for link in soup.find_all("a", class_="img-link"):
    for img in link.find_all("img", src=True):  # searching for img with src attribute
        print(img["src"])
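
To see why link.get('src') printed None in the question, it may help to inspect the a element and its nested img separately. The sketch below is only an illustration on a trimmed copy of the sample HTML from the question; the variable names (SAMPLE_HTML, link_src, link_href, img_src) are made up for this example.

```python
import bs4

# Trimmed copy of the sample HTML from the question.
SAMPLE_HTML = """
<a class="img-link" href="#img2">
    <img id="img-1" class="img" title="Berserk ch.342 page 1"
         src="http://2.p.mpcdn.net/352582/687224/1.jpg"/>
</a>
"""

soup = bs4.BeautifulSoup(SAMPLE_HTML, "html.parser")
link = soup.find("a", class_="img-link")

# Tag.get() returns None for an attribute the tag does not have;
# the a element only has class and href.
link_src = link.get("src")
link_href = link.get("href")

# The src attribute belongs to the img element nested inside the link.
img = link.find("img", src=True)
img_src = img["src"]

print(link_src)
print(link_href)
print(img_src)
```

Using get() on the link avoids a KeyError for the missing attribute, which is why the question saw None rather than an exception.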
