How do I parse every HTML file in a directory for images?

Problem description

I have a directory full of html files, each of which has a clinical image of a psoriasis patient in it. I want to open each file, find the image, and save it in the same directory.

import os, os.path
import Image
from BeautifulSoup import BeautifulSoup as bs

path = 'C:\Users\gokalraina\Desktop\derm images'

for root, dirs, files in path:
    for f in files:
        soup = bs(f)
        for image in soup.findAll("img"):
            print "Image: %(src)s" % image
            im = Image.open(image)
            im.save(path+image["src"], "JPEG")

I get this error:

 Traceback (most recent call last):
   File "C:\Users\gokalraina\Desktop\modfile.py", line 7, in <module>
     for root, dirs, files in path:
 ValueError: need more than 1 value to unpack

Even after googling the error, I have no clue what is wrong or whether I am doing this correctly. Please keep in mind that I am new to Python.

EDIT: After making the suggested changes to the program, I am still getting an error:

 Traceback (most recent call last):
   File "C:\Users\gokalraina\Desktop\modfile.py", line 25, in <module>
     im = Image.open(image)
   File "C:\Python27\lib\site-packages\PIL\Image.py", line 1956, in open
     prefix = fp.read(16)
 TypeError: 'NoneType' object is not callable

This is the revised code (thanks to nightcracker)

import os, os.path
import Image
from BeautifulSoup import BeautifulSoup as bs

path = 'C:\Users\gokalraina\Desktop\derm images'

for root, dirs, files in os.walk(path):
    for f in files:
        soup = bs(open(os.path.join(root, f)).read())
        for image in soup.findAll("img"):
            print "Image: %(src)s" % image
            im = Image.open(image)
            im.save(path+image["src"], "JPEG")

Solution

You need to change this line:

for root, dirs, files in path:

to

for root, dirs, files in os.walk(path):
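
The reason for the ValueError can be seen in a small sketch. It is written for Python 3 with only the standard library, and the directory and file names are made up for illustration. Iterating over a string yields single characters, which cannot be unpacked into three names, whereas os.walk yields one (root, dirs, files) tuple per directory:

```python
import os
import tempfile

# A plain string: looping over it yields one character at a time.
path = "derm images"  # hypothetical directory name, used only as a string here

try:
    for root, dirs, files in path:
        pass
except ValueError as exc:
    # A single character cannot be unpacked into three names.
    unpack_error = str(exc)

# os.walk yields a (root, dirs, files) tuple for each directory it visits.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "case1.html"), "w").close()
    walked = [(root, dirs, files) for root, dirs, files in os.walk(tmp)]

print(unpack_error)
print(walked[0][2])
```

The exact wording of the error message differs between Python 2 and Python 3, but the cause is the same.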

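Also note that files contains file names, not file objects, so each file has to be opened before it is parsed. Likewise, Image.open needs a path or a file object, not the img tag itself; passing the tag is what raises the TypeError shown in your edit. With those changes, your code would look like this:

import os, os.path
import Image
from BeautifulSoup import BeautifulSoup as bs

# Raw string so the backslashes in the Windows path are kept literally.
path = r'C:\Users\gokalraina\Desktop\derm images'

for root, dirs, files in os.walk(path):
    for f in files:
        # f is only a file name, so join it with root and read the file.
        soup = bs(open(os.path.join(root, f)).read())
        for image in soup.findAll("img"):
            print "Image: %(src)s" % image
            # image is a tag, not a file: open the file its src points to
            # (this assumes src is a path relative to the HTML file).
            im = Image.open(os.path.join(root, image["src"]))
            # Join with a separator; path + src would run the names together.
            im.save(os.path.join(path, os.path.basename(image["src"])), "JPEG")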
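
For readers on Python 3, the same walk-and-parse idea can be sketched with the standard library alone. This is not the code from the question: it swaps BeautifulSoup for html.parser so that it runs without extra packages, the names ImgSrcCollector and find_image_paths are hypothetical, and it assumes each src is a local path relative to the HTML file (remote URLs are skipped).

```python
import os
import tempfile
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_image_paths(directory):
    """Return (html_file, image_file) pairs for local images under directory."""
    pairs = []
    for root, dirs, files in os.walk(directory):
        for name in files:
            if not name.lower().endswith((".html", ".htm")):
                continue
            html_path = os.path.join(root, name)
            collector = ImgSrcCollector()
            with open(html_path, encoding="utf-8", errors="replace") as fh:
                collector.feed(fh.read())
            for src in collector.sources:
                if "://" in src:
                    continue  # remote image: would need a download step instead
                # Assumption: src is relative to the directory of the HTML file.
                image_path = os.path.normpath(os.path.join(root, src))
                pairs.append((html_path, image_path))
    return pairs


# Small demonstration with a made-up file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "case1.html"), "w", encoding="utf-8") as fh:
        fh.write('<html><body><img src="photos/lesion.jpg"></body></html>')
    found = find_image_paths(tmp)
    expected = (
        os.path.join(tmp, "case1.html"),
        os.path.normpath(os.path.join(tmp, "photos", "lesion.jpg")),
    )
```

Each image path returned this way is a plain string, so it can be passed to an image library's open function (for example Pillow's Image.open, which accepts a file name or file object) before saving a copy wherever it is needed.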
