How to detect images in a document

Views: 162 | Published: 2016/5/21 13:56:48 | Tags: apache, apache-tika


Question

How can I detect images in a document, say a DOC, XLS, PPT or PDF?

I came across Apache Tika and am trying its command line option: http://tika.apache.org/1.2/gettingstarted.html

But I'm not quite sure how it will detect images.

Any help is appreciated.

Thanks

Recommended answer

You've said you want to use a command line solution and not write any Java code, so it's not going to be the prettiest way to do it... If you are happy to write a little bit of Java and create a new program to call from Python, then you can do it much more nicely!

The first thing to do is to have the Tika App extract any embedded resources from your file. Use the --extract option for this, and have the extraction occur in a special temp directory your app controls, e.g.:

$ java -jar tika.jar --extract ../testWORD_embedded_pdf.doc
Extracting 'image1.emf' (application/x-emf)
Extracting '_1402837031.pdf' (application/pdf)
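Since the answer mentions calling this from Python, here is a minimal sketch of how that might look. The function names `build_extract_command` and `run_extract` are made up for illustration. The sketch also assumes that `--extract` writes the embedded files into the current working directory, which is why the command is run with `cwd` set to the directory your app controls; check this against your Tika version.

```python
import subprocess
from pathlib import Path


def build_extract_command(tika_jar, document):
    """Return the argv list for the Tika App extraction command."""
    # Absolute paths, because the command is run from inside the extract dir.
    return [
        "java", "-jar", str(Path(tika_jar).resolve()),
        "--extract", str(Path(document).resolve()),
    ]


def run_extract(tika_jar, document, extract_dir):
    """Run the extraction inside extract_dir and return Tika's text output."""
    result = subprocess.run(
        build_extract_command(tika_jar, document),
        cwd=extract_dir,      # assumption: --extract writes into the cwd
        capture_output=True,  # needs Python 3.7+
        text=True,
        check=True,           # raise if java or Tika exits with an error
    )
    # Both streams are returned because it is not confirmed here which one
    # carries the "Extracting ..." lines.
    return result.stdout + result.stderr
```

Something like `run_extract("tika.jar", "../testWORD_embedded_pdf.doc", some_temp_dir)` would then return the "Extracting ..." lines as a single string, provided `java` is on the PATH.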

Grab the output of the extraction if you can, and parse it looking for images (but be aware that some images have an application/ prefix on their canonical MIME type!). You might need to run a second --detect step on a few of the files; I'm not sure, so test how the parsers get on with the extraction.
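Parsing that output could be sketched roughly as follows. The line format is taken from the sample output in this answer and may differ between Tika versions. The set of image-like `application/` types is an illustrative assumption, not a complete list, and `find_images` is a hypothetical helper name.

```python
import re

# Matches lines such as: Extracting 'image1.emf' (application/x-emf)
EXTRACT_LINE = re.compile(r"^Extracting '(?P<name>.+)' \((?P<mime>[^()]+)\)$")

# Assumption: a hand-picked set of image formats that are reported with an
# application/ prefix. Extend it after testing with your own documents.
IMAGE_LIKE_APPLICATION_TYPES = {"application/x-emf", "application/x-msmetafile"}


def is_image_type(mime):
    """True for image/* types and for the known application/ image types."""
    return mime.startswith("image/") or mime in IMAGE_LIKE_APPLICATION_TYPES


def find_images(tika_output):
    """Return (filename, mimetype) pairs for extracted resources that are images."""
    images = []
    for line in tika_output.splitlines():
        match = EXTRACT_LINE.match(line.strip())
        if match and is_image_type(match.group("mime")):
            images.append((match.group("name"), match.group("mime")))
    return images
```

With the two sample lines from the extraction output, this keeps `image1.emf` and skips the embedded PDF.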

Now, if there were images, they'll be in your temp dir. Process them as you want. Finally, zap the temp dir when you're done with the file!
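One way to make sure the directory always gets removed is to let Python's `tempfile.TemporaryDirectory` own it. This is only a sketch: `with_temp_extract_dir` and `list_extracted_files` are hypothetical helpers, and the `process` callback stands in for whatever extraction and image handling you do.

```python
import tempfile
from pathlib import Path


def list_extracted_files(extract_dir):
    """Return the sorted names of the regular files in extract_dir."""
    return sorted(p.name for p in Path(extract_dir).iterdir() if p.is_file())


def with_temp_extract_dir(process):
    """Create a temp dir, pass it to process(), then delete it.

    The directory and everything in it is removed when the with-block
    exits, even if process() raises an exception.
    """
    with tempfile.TemporaryDirectory(prefix="tika-extract-") as tmp:
        return process(Path(tmp))
```

Anything you need to keep from the extracted files has to be copied out or returned from `process`, because the directory no longer exists once the function returns.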
