Extract IPTC using ImageMagick without Entities but UTF-8

Question

I have an image containing IPTC data and use the following command to extract the IPTC as textual data:

convert image.jpg IPTCTEXT:iptc.txt

The problem is that the output seems to use entities for "special characters":

2#120#Caption="Beschreibung f&#195;&#188;r den Import aus IPTC"

It should actually read "für" here. But instead of getting the correct entity &#252; for the "ü" character, I get two entities (probably each of the two bytes of the UTF-8 encoded character was converted to an entity separately). I cannot parse these two entities correctly.

Is there any way to get the correct entity, or to disable the entities completely so that UTF-8 characters are returned?

Edit:
I tried parsing the entities using StringEscapeUtils.unescapeXml in Java, but I get two characters ("Ã¼") instead of "ü", because the two entities are unescaped separately.

Edit2:
Sample image here: http://fs1.directupload.net/images/150615/5eiv6wwf.jpg

Recommended answer

The most reliable metadata package is IMHO exiv2 (http://exiv2.org/; available in all Linux distros and for Windows; I am not sure about Mac binaries).

See http://paste.fedoraproject.org/232538/34459066/ for results. ImageMagick is not that great for metadata purposes, I am afraid.
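If the ImageMagick output has to be used anyway, one possible workaround is to post-process it. The sketch below is not part of the original answer. It assumes that the IPTC text in the image is UTF-8 and that each decimal entity in the IPTCTEXT output stands for a single byte, as the question suspects. Under that assumption, the entity values are collected as raw bytes and the whole sequence is decoded as UTF-8 in one step, rather than unescaping each entity into its own character. The class name IptcEntityDecoder and its decode method are hypothetical names chosen for illustration, and named entities such as &amp; are not handled.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: treats each decimal entity (&#NNN;) as one raw byte
// and decodes the resulting byte sequence as UTF-8.
final class IptcEntityDecoder {

    // Matches decimal numeric entities such as &#195;
    private static final Pattern ENTITY = Pattern.compile("&#(\\d{1,3});");

    static String decode(String escaped) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Matcher m = ENTITY.matcher(escaped);
        int last = 0;
        while (m.find()) {
            // Copy the literal text that precedes this entity.
            byte[] literal = escaped.substring(last, m.start())
                    .getBytes(StandardCharsets.UTF_8);
            bytes.write(literal, 0, literal.length);

            int value = Integer.parseInt(m.group(1));
            if (value <= 255) {
                // Assumption: the entity encodes exactly one byte.
                bytes.write(value);
            } else {
                // Not a byte value: keep the entity text unchanged.
                byte[] raw = m.group().getBytes(StandardCharsets.UTF_8);
                bytes.write(raw, 0, raw.length);
            }
            last = m.end();
        }
        // Copy whatever follows the last entity.
        byte[] tail = escaped.substring(last).getBytes(StandardCharsets.UTF_8);
        bytes.write(tail, 0, tail.length);

        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String line = "2#120#Caption=\"Beschreibung f&#195;&#188;r den Import aus IPTC\"";
        System.out.println(decode(line));
    }
}
```

Decoding the bytes together is the important design choice here: 195 and 188 are the two UTF-8 bytes of "ü" (0xC3 0xBC), so they only yield the right character when they are interpreted as one sequence. If the image stores its IPTC text in another encoding, the charset passed to the String constructor would need to change accordingly.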
