Image scraping program in Python not functioning as intended

Question

My code only returns an empty string, and I have no idea why.

import urllib2

def getImage(url):
    page = urllib2.urlopen(url)
    page = page.read() #Gives HTML to parse

    start = page.find('<a img=')
    end = page.find('>', start)

    img = page[start:end]

    return img

It would only return the first image it finds, so it's not a very good image scraper; that said, my primary goal right now is just to be able to find an image. I'm unable to.
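A minimal sketch of why the original function returns an empty string instead of raising an error. It reuses the question's logic but takes an HTML string rather than a URL so it runs without network access; the name `get_image_buggy` and the sample HTML are hypothetical, and the code is Python 3. Because `'<a img='` never appears, `find` returns -1; a start of -1 makes `page.find('>', -1)` search only the last character, and slicing from -1 up to either possible result yields an empty string:

```python
def get_image_buggy(page):
    # Same logic as the question's getImage, but on an HTML string
    start = page.find('<a img=')   # -1: real pages contain no "<a img=" text
    end = page.find('>', start)    # start=-1 means "search only the last character"
    return page[start:end]         # page[-1:len-1] or page[-1:-1]: both empty

html = '<html><body><img src="smiley.gif"></body></html>\n'
print(repr(get_image_buggy(html)))  # ''
```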

Recommended answer

You should use a library for this, and there are several available, but to answer your question by changing the code you showed us:

Your problem is that you are trying to find images, but images don't use the <a ...> tag. They use the <img ...> tag. Here is an example:

<img src="smiley.gif" alt="Smiley face" height="42" width="42">
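The answer does not name a specific library. As one option, here is a minimal sketch using Python 3's standard-library `html.parser` module that collects the attributes of every `<img>` tag, applied to the example tag above (the `ImgCollector` class name is hypothetical). The parser lowercases tag and attribute names, and its default `handle_startendtag` forwards self-closing `<img ... />` tags to `handle_starttag`, so neither case needs special handling:

```python
from html.parser import HTMLParser


class ImgCollector(HTMLParser):
    """Collects the attributes of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.images = []  # one dict of attributes per <img> tag, in document order

    def handle_starttag(self, tag, attrs):
        # tag arrives lowercased; attrs is a list of (name, value) pairs
        if tag == 'img':
            self.images.append(dict(attrs))


parser = ImgCollector()
parser.feed('<p><img src="smiley.gif" alt="Smiley face" height="42" width="42"></p>')
parser.close()
print(parser.images)
# [{'src': 'smiley.gif', 'alt': 'Smiley face', 'height': '42', 'width': '42'}]
```

To use it on a live page in Python 3, the HTML could come from `urllib.request.urlopen(url).read().decode('utf-8')`, assuming the page is UTF-8 encoded.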

What you should do is change your start = page.find('<a img=') line to start = page.find('<img '), and slice up to end+1 so the closing > is included, like so:

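import urllib2

def getImage(url):
    page = urllib2.urlopen(url)
    page = page.read()  # Gives HTML to parse

    # Images use the <img ...> tag, not <a img=...>
    start = page.find('<img ')
    if start == -1:
        return None  # no <img> tag on the page
    end = page.find('>', start)

    img = page[start:end+1]  # +1 keeps the closing '>'
    return img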
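The question also notes that this only returns the first image. A minimal sketch of the same string-search approach extended to return every `<img ...>` tag, written for Python 3 and taking an already-downloaded HTML string (the name `find_all_img_tags` and the sample HTML are hypothetical). Like the answer's version, it assumes lowercase `<img ` and no `>` inside attribute values; a real HTML parser avoids both limitations:

```python
def find_all_img_tags(page):
    """Return every '<img ...>' tag in an HTML string, in document order."""
    tags = []
    start = page.find('<img ')
    while start != -1:
        end = page.find('>', start)
        if end == -1:                      # unterminated tag at end of input: stop
            break
        tags.append(page[start:end + 1])   # +1 keeps the closing '>'
        start = page.find('<img ', end)    # resume searching after this tag
    return tags


html = '<p><img src="a.gif"> text <img src="b.png" alt="b"></p>'
print(find_all_img_tags(html))
# ['<img src="a.gif">', '<img src="b.png" alt="b">']
```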
