How do I parse every HTML file in a directory for images?
Problem description
I have a directory full of HTML files, each of which contains a clinical image of a psoriasis patient. I want to open each file, find the image, and save it in the same directory.
```python
import os, os.path
import Image
from BeautifulSoup import BeautifulSoup as bs

path = 'C:\Users\gokalraina\Desktop\derm images'

for root, dirs, files in path:
    for f in files:
        soup = bs(f)
        for image in soup.findAll("img"):
            print "Image: %(src)s" % image
            im = Image.open(image)
            im.save(path+image["src"], "JPEG")
```
I get this error:
```
Traceback (most recent call last):
  File "C:\Users\gokalraina\Desktop\modfile.py", line 7, in <module>
    for root, dirs, files in path:
ValueError: need more than 1 value to unpack
```
Even after googling the error, I have no clue what is wrong or whether I am doing this correctly. Please keep in mind that I am new to Python.
EDIT: After making the suggested changes to the program, I am still getting an error:
```
Traceback (most recent call last):
  File "C:\Users\gokalraina\Desktop\modfile.py", line 25, in <module>
    im = Image.open(image)
  File "C:\Python27\lib\site-packages\PIL\Image.py", line 1956, in open
    prefix = fp.read(16)
TypeError: 'NoneType' object is not callable
```
This is the revised code (thanks to nightcracker):

```python
import os, os.path
import Image
from BeautifulSoup import BeautifulSoup as bs

path = 'C:\Users\gokalraina\Desktop\derm images'

for root, dirs, files in os.walk(path):
    for f in files:
        soup = bs(open(os.path.join(root, f)).read())
        for image in soup.findAll("img"):
            print "Image: %(src)s" % image
            im = Image.open(image)
            im.save(path+image["src"], "JPEG")
```
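The TypeError most likely arises because `image` is a BeautifulSoup `Tag`, not a filename or a file object. `Image.open` calls `fp.read(16)` on anything that is not a string. BeautifulSoup 3 treats an unknown attribute lookup on a tag as a search for a child tag of that name, so `image.read` evaluates to `None`. What `Image.open` needs instead is a path built from the tag's `src` attribute.

The sketch below shows that step in Python 3 using only the standard library. Here `html.parser` stands in for BeautifulSoup. The names `ImgSrcCollector`, `img_sources` and `image_path` are hypothetical helpers, not part of either library. The sketch assumes each `src` is a relative, possibly percent-encoded path.

```python
import os
from html.parser import HTMLParser
from urllib.parse import unquote


class ImgSrcCollector(HTMLParser):
    """Collect the src of every <img> tag (stand-in for soup.findAll("img"))."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so <IMG SRC=...> matches too.
        # Self-closing <img ... /> tags also arrive here via handle_startendtag.
        src = dict(attrs).get("src")
        if tag == "img" and src:
            self.sources.append(src)


def img_sources(html_text):
    """Return the src strings of all <img> tags in an HTML document."""
    parser = ImgSrcCollector()
    parser.feed(html_text)
    parser.close()
    return parser.sources


def image_path(html_path, src):
    """Turn a relative, possibly %-encoded src into a path next to the HTML file."""
    return os.path.normpath(os.path.join(os.path.dirname(html_path), unquote(src)))
```

With a real file, you would then call `Image.open(image_path(html_path, src))`, passing a path string rather than the tag object.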
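You need to change this line:

```python
for root, dirs, files in path:
```

to

```python
for root, dirs, files in os.walk(path):
```

Looping over `path` directly iterates over the characters of the string, and a single character cannot be unpacked into the three names `root, dirs, files`. `os.walk(path)` instead yields one `(dirpath, dirnames, filenames)` triple per directory.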
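To see the difference between the two loops, here is a small Python 3 sketch using only the standard library. The helper names `demo_string_iteration` and `walk_sample_tree` are made up for illustration. The first helper loops over a plain string the way the original code did. The second builds a throwaway folder tree and returns what `os.walk` reports for it.

```python
import os
import tempfile


def demo_string_iteration(path):
    """Loop over a string as the original code did and return the error text."""
    try:
        for root, dirs, files in path:  # each item is a single character, e.g. "d"
            pass
    except ValueError as exc:
        return str(exc)
    return None  # only reached for an empty string


def walk_sample_tree():
    """Build a throwaway tree and return what os.walk reports, relative to its top."""
    with tempfile.TemporaryDirectory() as top:
        os.mkdir(os.path.join(top, "sub"))
        for rel in ("page1.html", os.path.join("sub", "page2.html")):
            open(os.path.join(top, rel), "w").close()
        return [
            (os.path.relpath(root, top), sorted(dirs), sorted(files))
            for root, dirs, files in os.walk(top)
        ]


print(demo_string_iteration("derm images"))
print(walk_sample_tree())
```

On Python 3, the first call reports "not enough values to unpack (expected 3, got 1)". Python 2 phrases the same failure as "need more than 1 value to unpack". The second call returns one triple per folder, starting with the top folder itself.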
Also note that `files` contains file names, not file objects, so you have to open each one yourself. This would be your fixed code:
```python
import os, os.path
import Image
from BeautifulSoup import BeautifulSoup as bs

path = 'C:\Users\gokalraina\Desktop\derm images'

# os.walk yields a (directory, subdirectories, file names) triple per folder
for root, dirs, files in os.walk(path):
    for f in files:
        # f is only a name: join it with its folder, then read the file's contents
        soup = bs(open(os.path.join(root, f)).read())
        for image in soup.findAll("img"):
            print "Image: %(src)s" % image
            im = Image.open(image)
            im.save(path+image["src"], "JPEG")
```