Extract keywords from images using Python


Problem description

I'm still learning Python. I am currently working on Python code that extracts metadata (user-made keywords) from images. I have already tried Pillow and exif, but they exclude the user-made tags or keywords. With applist, I managed to extract the metadata packet including my keywords, but when I tried to parse it with ElementTree to extract the parts of interest, I obtained only empty data.

My XML file looks like this (after some manipulation):

<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP Core 4.4.0">
   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about=""
            xmlns:dc="http://purl.org/dc/elements/1.1/">
         <dc:description>
            <rdf:Seq>
               <rdf:li xml:lang="x-default">South Carolina, Olivyana, Kumasi</rdf:li>
            </rdf:Seq>
         </dc:description>
         <dc:subject>
            <rdf:Bag>
               <rdf:li>Kumasi</rdf:li>
               <rdf:li>Summer 2016</rdf:li>
               <rdf:li>Charlestone</rdf:li>
               <rdf:li>SC</rdf:li>
               <rdf:li>Beach</rdf:li>
               <rdf:li>Olivjana</rdf:li>
            </rdf:Bag>
         </dc:subject>
         <dc:title>
            <rdf:Seq>
               <rdf:li xml:lang="x-default">P1050365</rdf:li>
            </rdf:Seq>
         </dc:title>
      </rdf:Description>
      <rdf:Description rdf:about=""
            xmlns:aux="http://ns.adobe.com/exif/1.0/aux/">
         <aux:SerialNumber>F360908190331</aux:SerialNumber>
      </rdf:Description>
   </rdf:RDF>
</x:xmpmeta>

My code is as follows:

import xml.etree.ElementTree as ET
from PIL import Image, ExifTags
with Image.open("myfile.jpg") as im:
    for segment, content in im.applist:
        marker, body = content.split(b'\x00', 1)
        if segment == 'APP1' and marker == b'http://ns.adobe.com/xap/1.0/':
            data = body.decode('"utf-8"')
print (data)

At this point it wasn't possible to pass this to the parser, as there is an empty line that causes an error:

tree = ET.parse(data)

ValueError: embedded null byte

So after removing it, I saved the data in an XML file (the XML data above) and passed that to the parser, but obtained no data:

tree = ET.parse('mytags.xml')
tags = tree.findall('xmpmeta/RDF/Description/subject/Bags')
print (type(tags))
print (len(tags))

<class 'list'>
0

Interestingly, if I use the tags in the form they have in the XML file (i.e. 'x:xmpmeta'), I receive the following error:

SyntaxError: prefix 'x' not found in prefix map

Thanks for your help.

Fabio

Recommended answer

Focusing only on your XML parsing, not the PIL metadata work, three issues are causing your problem:

  1. You need to define the namespace prefixes when using findall, which you can do with the namespaces argument. Your XPath must then include the prefixes.
  2. When using findall, do not include the root in the path: the root is the starting point, so the path begins at its child and goes downward.
  3. There is no local name Bags (plural), only Bag, and matching it would return a list of length one. If you want its children, go one level deeper.

Consider the adjusted script:

import xml.etree.ElementTree as ET

tree = ET.parse('mytags.xml')

nmspdict = {'x':'adobe:ns:meta/',            
            'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
            'dc': 'http://purl.org/dc/elements/1.1/'}

tags = tree.findall('rdf:RDF/rdf:Description/dc:subject/rdf:Bag/rdf:li',
                    namespaces = nmspdict)

print (type(tags))
print (len(tags))

# <class 'list'>
# 6

for i in tags:
    print(i.text)
# Kumasi
# Summer 2016
# Charlestone
# SC
# Beach
# Olivjana
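
As a side note on the ValueError in the question: ET.parse expects a file name or file object, so passing it the decoded XML text makes Python treat that text as a path, which is a likely source of the "embedded null byte" message. The sketch below parses the packet in memory with ET.fromstring instead, so no intermediate file is needed. The helper name extract_keywords and the sample_packet bytes are hypothetical stand-ins for the body value taken from im.applist, and dropping null bytes assumes they are only padding.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used in the path below; the URIs come from the XMP packet.
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def extract_keywords(xmp_packet):
    """Return the dc:subject keywords from raw XMP bytes (hypothetical helper)."""
    # Assumption: any null bytes are padding and can simply be dropped.
    cleaned = xmp_packet.replace(b"\x00", b"").strip()
    # fromstring parses XML content directly and returns the root (x:xmpmeta).
    root = ET.fromstring(cleaned)
    items = root.findall(
        "rdf:RDF/rdf:Description/dc:subject/rdf:Bag/rdf:li", NAMESPACES
    )
    return [item.text for item in items]


# Shortened stand-in for the `body` bytes extracted in the question,
# with a trailing null byte to mimic the padding.
sample_packet = (
    b'<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    b'<rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">'
    b"<dc:subject><rdf:Bag>"
    b"<rdf:li>Kumasi</rdf:li><rdf:li>Beach</rdf:li>"
    b"</rdf:Bag></dc:subject>"
    b"</rdf:Description></rdf:RDF></x:xmpmeta>\x00"
)

keywords = extract_keywords(sample_packet)
print(keywords)
```

In the question's loop, this would be called as extract_keywords(body) once the APP1 segment with the XMP marker has been found.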
