Python: sending a file to Tika running as a service

Question

Referring to this question, I would like to send an MS Word (.doc) file to a Tika application running as a service. How can I do this?

There is this link on running Tika: http://mimi.kaktusteam.de/blog-posts/2013/02/running-apache-tika-in-server-mode/

But for the Python code that accesses it, I am not sure whether I should use sockets, urllib, or something else.

Recommended answer

For remote access to Tika, there are basically two methods available. One is the Tika JAXRS Server, which provides a full RESTful interface. The other is the simple Tika-App --server mode, which works only at the level of a raw network pipe.

For production use, you'll probably want the Tika JAXRS Server, as it's more fully featured. For simple testing and getting started, the Tika-App in --server mode ought to be fine.
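Since the question mentions urllib, here is a minimal sketch of how the JAXRS route might look from Python 3. It assumes the server accepts a PUT of the raw document at a /tika resource and listens on port 9998; both are assumptions about your setup rather than something stated above, so check them against the Tika server version you run. The function name extract_text is made up for illustration.

```python
import urllib.request


def extract_text(path, url="http://127.0.0.1:9998/tika"):
    """PUT the file at `path` to a Tika server and return the reply as text.

    The default URL (port 9998, /tika resource) is an assumption; adjust it
    to match the server you are actually running.
    """
    # Read the document as raw bytes; Tika detects the file type itself
    with open(path, "rb") as f:
        data = f.read()

    request = urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Accept": "text/plain"},  # ask for plain text back
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With a server running, something like extract_text("test.doc") should then return the extracted text as a string.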

For the Tika-App --server mode, just connect to the port that Tika-App is listening on, stream your document data to it, and read the resulting HTML back. For example, in one terminal run

$ java -jar tika-app-1.3.jar --server --port 1234

Then, in another terminal, run

$ nc 127.0.0.1 1234 < test.pdf

You'll then see the HTML generated from your test PDF.

From Python, you just need a simple socket call, much as netcat is doing there: send over the binary data, then read back your result. For example, try something like:

#!/usr/bin/env python3
import socket
import sys

# Where to connect
host = '127.0.0.1'
port = 1234

if len(sys.argv) < 2:
    print("Must give filename", file=sys.stderr)
    sys.exit(1)

filename = sys.argv[1]
# Status goes to stderr so that stdout carries only Tika's output
print("Sending %s to Tika on port %d" % (filename, port), file=sys.stderr)

# Connect to Tika
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((host, port))

# Open the file and stream it to Tika in chunks
with open(filename, 'rb') as f:
    while True:
        chunk = f.read(65536)
        if not chunk:
            # EOF
            break
        s.sendall(chunk)

# Tell Tika we have sent everything
s.shutdown(socket.SHUT_WR)

# Get the response; recv() returns bytes, so write them out unmodified
while True:
    chunk = s.recv(65536)
    if not chunk:
        # EOF
        break
    sys.stdout.buffer.write(chunk)

s.close()
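
If you would rather get the HTML back as a value than have it printed, the same exchange can be wrapped in a function. This is only a sketch under the same assumptions as the script above (Tika-App in --server mode on 127.0.0.1:1234); the name tika_convert and the 30 second timeout are illustrative choices, not part of Tika.

```python
import socket


def tika_convert(data, host="127.0.0.1", port=1234, timeout=30.0):
    """Send raw document bytes to a Tika-App --server port and return the reply bytes."""
    with socket.create_connection((host, port), timeout=timeout) as s:
        # Send the whole document, then half-close the socket so the
        # server sees end-of-input, as the script above does
        s.sendall(data)
        s.shutdown(socket.SHUT_WR)

        # Collect the response until the server closes its side
        chunks = []
        while True:
            chunk = s.recv(65536)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

A caller would read the document with open(filename, 'rb').read(), pass the bytes in, and decode the returned bytes as needed.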
