Image scraping program in Python not functioning as intended
Question
My code only returns an empty string, and I have no idea why.
import urllib2

def getImage(url):
    page = urllib2.urlopen(url)
    page = page.read() #Gives HTML to parse
    start = page.find('<a img=')
    end = page.find('>', start)
    img = page[start:end]
    return img
It would only return the first image it finds, so it's not a very good image scraper; that said, my primary goal right now is just to be able to find an image. I'm unable to.
Recommended answer
You should use a library for this, and there are several out there, but to answer your question by changing the code you showed us:
Your problem is that you are trying to find images, but images don't use the <a ...> tag. They use the <img ...> tag. Here is an example:
<img src="smiley.gif" alt="Smiley face" height="42" width="42">
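To see why the original code returned an empty string rather than raising an error, here is a minimal sketch of what the string operations do when the search text is missing. The sample page is a made-up one-line document, not the page from the question.

```python
# Hypothetical sample page; any HTML that lacks the literal text '<a img=' behaves the same way.
page = '<html><body><img src="smiley.gif" alt="Smiley face" height="42" width="42"></body></html>'

start = page.find('<a img=')   # str.find returns -1 when the substring is absent
end = page.find('>', start)    # a start of -1 means "begin searching at the last character"
img = page[start:end]          # the slice begins at the last character and stops at or before it

print(start)      # -1
print(repr(img))  # ''
```

Because find signals "not found" with -1 instead of an exception, the slice quietly comes out empty, which matches the symptom described in the question.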
What you should do is change your start = page.find('<a img=') line to start = page.find('<img '), and slice up to end+1 so that the closing > is included, like so:
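import urllib2  # Python 2 module, as in the question

def getImage(url):
    page = urllib2.urlopen(url)
    page = page.read() #Gives HTML to parse
    start = page.find('<img ')   # start of the first <img ...> tag
    end = page.find('>', start)  # first '>' after that point
    img = page[start:end+1]      # +1 so the closing '>' is included
    return img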
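As for the suggestion to use a library, here is a rough sketch of one way to collect every image rather than only the first. It is not the answerer's code: it assumes Python 3 and swaps in the standard library's html.parser and urllib.request in place of urllib2. The names ImgSrcParser, extract_img_srcs and get_images are made up for this illustration.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class ImgSrcParser(HTMLParser):
    """Hypothetical helper: collects the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag == 'img':
            src = dict(attrs).get('src')
            if src:
                self.sources.append(src)


def extract_img_srcs(html):
    """Return the src values of all <img> tags in an HTML string, in document order."""
    parser = ImgSrcParser()
    parser.feed(html)
    parser.close()
    return parser.sources


def get_images(url):
    """Download a page and return its image sources (assumes the URL serves HTML)."""
    with urlopen(url) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or 'utf-8'
        html = response.read().decode(charset, errors='replace')
    return extract_img_srcs(html)
```

Calling extract_img_srcs on a string that is already in memory avoids any network access, which makes the parsing step easy to try on its own. Note that the src values are returned as written in the page, so relative paths would still need to be resolved against the page URL.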