How do I parse every html file in a directory for images?

Views: 241  Posted: 2016/8/5 19:18:53  Tags: python image jpeg beautifulsoup


Problem description

I have a directory full of html files, each of which has a clinical image of a psoriasis patient in it. I want to open each file, find the image, and save it in the same directory.

import os, os.path
import Image
from BeautifulSoup import BeautifulSoup as bs

path = 'C:\Users\gokalraina\Desktop\derm images'

for root, dirs, files in path:
    for f in files:
        soup = bs(f)
        for image in soup.findAll("img"):
            print "Image: %(src)s" % image
            im = Image.open(image)
            im.save(path+image["src"], "JPEG")

I get this error:

 Traceback (most recent call last):
   File "C:\Users\gokalraina\Desktop\modfile.py", line 7, in <module>
     for root, dirs, files in path:
 ValueError: need more than 1 value to unpack

Even after googling the error, I have no clue what is wrong or if I am doing this correctly. Please keep in mind that I am new to python.

EDIT: After making the suggested changes to the program, I am still getting an error:

 Traceback (most recent call last):
   File "C:\Users\gokalraina\Desktop\modfile.py", line 25, in <module>
     im = Image.open(image)
   File "C:\Python27\lib\site-packages\PIL\Image.py", line 1956, in open
     prefix = fp.read(16)
 TypeError: 'NoneType' object is not callable

This is the revised code (thanks to nightcracker)

 import os, os.path
 import Image
 from BeautifulSoup import BeautifulSoup as bs

 path = 'C:\Users\gokalraina\Desktop\derm images'

 for root, dirs, files in os.walk(path):
    for f in files:
       soup = bs(open(os.path.join(root, f)).read())
       for image in soup.findAll("img"):
          print "Image: %(src)s" % image
          im = Image.open(image)
          im.save(path+image["src"], "JPEG")

Solution

You need to change this line:

for root, dirs, files in path:

to

for root, dirs, files in os.walk(path):
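
To see why the original loop fails, here is a small sketch, assuming Python 3 (the question uses Python 2, where the error message is worded slightly differently). Iterating over a string yields one character at a time, and a single character cannot be unpacked into three names, whereas os.walk yields (root, dirs, files) tuples. The helper names list_files and unpack_error are made up for this illustration.

```python
import os
import tempfile


def list_files(top):
    """Return the paths of all files under top, relative to top."""
    found = []
    # os.walk yields one (root, dirs, files) tuple per directory it visits.
    for root, dirs, files in os.walk(top):
        for name in files:
            found.append(os.path.relpath(os.path.join(root, name), top))
    return sorted(found)


def unpack_error(path_string):
    """Repeat the question's mistake and return the resulting error message."""
    try:
        # Iterating over a str yields single characters, not 3-tuples.
        for root, dirs, files in path_string:
            pass
    except ValueError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        open(os.path.join(top, "a.html"), "w").close()
        print(list_files(top))
        print(unpack_error(top))
```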

Also note that files are file names, not objects, so this would be your fixed code:

import os, os.path
import Image
from BeautifulSoup import BeautifulSoup as bs

path = 'C:\Users\gokalraina\Desktop\derm images'

for root, dirs, files in os.walk(path):
    for f in files:
        soup = bs(open(os.path.join(root, f)).read())
        for image in soup.findAll("img"):
            print "Image: %(src)s" % image
            im = Image.open(image)
            im.save(path+image["src"], "JPEG")
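
Regarding the second traceback in the question: it most likely arises because Image.open is handed the BeautifulSoup tag itself rather than a file name or file object, so PIL ends up calling a read attribute that is not a real method. What Image.open needs is the value of the tag's src attribute, resolved against the folder of the HTML file. The sketch below shows that resolution step only. It assumes Python 3, swaps BeautifulSoup for the standard library's html.parser so it runs without third-party packages, and assumes every src is a relative local path rather than a URL. The names ImgSrcCollector and find_local_images are invented for this example.

```python
import os
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:  # skip <img> tags without a usable src
                self.sources.append(src)


def find_local_images(top):
    """Return (html_path, image_path) pairs for every <img> found under top."""
    pairs = []
    for root, dirs, files in os.walk(top):
        for name in files:
            if not name.lower().endswith((".html", ".htm")):
                continue
            html_path = os.path.join(root, name)
            with open(html_path, encoding="utf-8", errors="replace") as fh:
                collector = ImgSrcCollector()
                collector.feed(fh.read())
            for src in collector.sources:
                # Assumption: src is relative to the HTML file, not a URL.
                image_path = os.path.normpath(os.path.join(root, src))
                pairs.append((html_path, image_path))
    return pairs
```

With a resolved path in hand, Image.open(image_path) receives a file name it can open. For the output location, os.path.join is safer than path+image["src"], which glues the two strings together without a path separator.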
