Apache Tika Server - Request Header Parameters?

Question

The Apache Tika Server provides a REST API to extract text from a document. It is also possible to set specific request header parameters such as X-Tika-PDFOcrStrategy. For example:

$ curl -T test/Dokument01.pdf http://localhost:9998/tika --header "X-Tika-PDFOcrStrategy: ocr_only"

From a number of different documents about Tika, I collected these documented additional header parameters:

X-Tika-OCRLanguage: eng
X-Tika-PDFextractInlineImages: true | false
X-Tika-PDFOcrStrategy: ocr_only  |  ocr_and_text_extraction
X-Tika-OCRoutputType: hocr

But there seems to be no documentation on how to use the X-Tika-... header parameters, or on which parameters are supported and which are not.

For example, I wonder whether it is possible to override the ImageType mode or the DPI with something like:

X-Tika-PDFocrImageType: rgb
X-Tika-PDFocrDPI: 100

My question is: which header parameters are supported, and which naming convention do these parameters follow?

Recommended answer

The server contains code that handles the X-Tika-OCR and X-Tika-PDF headers. The suffixes and values of those headers are then mapped onto the TesseractOCRConfig and PDFParserConfig configuration objects via reflection.
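
The reflection step can be sketched roughly as follows. This is not Tika's actual code: HeaderToSetter, DemoPdfConfig, and the setOcrDPI setter on it are hypothetical stand-ins, and the sketch assumes a header suffix is turned into a setter name by upper-casing its first letter and prepending "set".

```java
import java.lang.reflect.Method;

// Illustrative sketch only; the real server-side handling may differ.
class HeaderToSetter {

    // Maps e.g. "X-Tika-PDFextractInlineImages: true" onto config.setExtractInlineImages(true).
    static void apply(Object config, String prefix, String header, String value) {
        if (!header.startsWith(prefix) || header.length() == prefix.length()) {
            throw new IllegalArgumentException("Not a " + prefix + " header: " + header);
        }
        // Assumption: suffix -> setter name by upper-casing the first letter.
        String suffix = header.substring(prefix.length());
        String setter = "set" + Character.toUpperCase(suffix.charAt(0)) + suffix.substring(1);

        for (Method m : config.getClass().getMethods()) {
            if (!m.getName().equals(setter) || m.getParameterCount() != 1) {
                continue;
            }
            // Convert the header's string value to the setter's parameter type.
            Class<?> type = m.getParameterTypes()[0];
            Object arg;
            if (type == String.class) {
                arg = value;
            } else if (type == boolean.class) {
                arg = Boolean.parseBoolean(value);
            } else if (type == int.class) {
                arg = Integer.parseInt(value);
            } else {
                continue; // parameter type not handled by this sketch
            }
            try {
                m.invoke(config, arg);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Could not call " + setter, e);
            }
            return;
        }
        throw new IllegalArgumentException("No supported setter: " + setter);
    }

    public static void main(String[] args) {
        DemoPdfConfig config = new DemoPdfConfig();
        apply(config, "X-Tika-PDF", "X-Tika-PDFextractInlineImages", "true");
        apply(config, "X-Tika-PDF", "X-Tika-PDFocrDPI", "100");
        System.out.println(config.isExtractInlineImages() + " " + config.getOcrDPI());
    }
}

// Hypothetical stand-in for a config class such as PDFParserConfig.
class DemoPdfConfig {
    private boolean extractInlineImages;
    private int ocrDPI = 300;

    public void setExtractInlineImages(boolean value) { this.extractInlineImages = value; }
    public boolean isExtractInlineImages() { return extractInlineImages; }
    public void setOcrDPI(int value) { this.ocrDPI = value; }
    public int getOcrDPI() { return ocrDPI; }
}
```

A header whose suffix has no matching setter is rejected in this sketch; how the real server reacts to an unknown header is not covered here.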

So, to see which X-Tika headers you can set, look up the options on the config class you want to adjust (TesseractOCRConfig for X-Tika-OCR, PDFParserConfig for X-Tika-PDF), build the header name from the prefix plus the setter name without its leading "set", and then send that header. If you are not sure what an option does or which values it takes, check the JavaDocs for the underlying setter method that will be called.

For example, setExtractInlineImages on PDFParserConfig maps to X-Tika-PDFextractInlineImages.
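
One way to look up the options is to list a config class's setters and derive candidate header names from them. The sketch below does that with plain reflection. HeaderNameLister and DemoOcrConfig are hypothetical names used only for illustration; against a real Tika installation you would pass the actual config class instead. It keeps the setter's capitalization (as in X-Tika-OCRLanguage), whereas the headers in the question use both upper and lower case for the first letter of the suffix.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Illustrative sketch: derive candidate header names from a class's public setters.
class HeaderNameLister {

    static List<String> candidateHeaders(Class<?> configClass, String prefix) {
        // TreeSet sorts the names and drops duplicates from overloaded setters.
        TreeSet<String> names = new TreeSet<>();
        for (Method m : configClass.getMethods()) {
            String name = m.getName();
            // Only single-argument setXxx methods are treated as options.
            if (name.startsWith("set") && name.length() > 3 && m.getParameterCount() == 1) {
                names.add(prefix + name.substring(3));
            }
        }
        return new ArrayList<>(names);
    }

    public static void main(String[] args) {
        for (String header : candidateHeaders(DemoOcrConfig.class, "X-Tika-OCR")) {
            System.out.println(header);
        }
    }
}

// Hypothetical stand-in for a config class such as TesseractOCRConfig.
class DemoOcrConfig {
    private String language = "eng";
    private String outputType = "txt";

    public void setLanguage(String language) { this.language = language; }
    public String getLanguage() { return language; }
    public void setOutputType(String outputType) { this.outputType = outputType; }
    public String getOutputType() { return outputType; }
}
```

The resulting names are only candidates: whether the server accepts a given header still depends on the setter's parameter type and on the Tika version in use, so the JavaDocs remain the reference.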
