How to Access Hive via Python?


Problem description

https://cwiki.apache.org/confluence/display/Hive/HiveClient#HiveClient-Python appears to be outdated.

When I add this to /etc/profile:

export PYTHONPATH=$PYTHONPATH:/usr/lib/hive/lib/py

I can then do the imports as listed in the link, with the exception of from hive import ThriftHive, which actually needs to be:

from hive_service import ThriftHive

Next, the port in the example was 10000, which caused the program to hang when I tried it. Switching to 9083, the default Thrift port of the Hive metastore, stopped the hanging.

So I set it up like so:

from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
from hive_service import ThriftHive  # found via the PYTHONPATH entry above

try:
    # Open a buffered binary-protocol connection to the metastore node
    transport = TSocket.TSocket('<node-with-metastore>', 9083)
    transport = TTransport.TBufferedTransport(transport)
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    client = ThriftHive.Client(protocol)
    transport.open()
    # This is the call that fails with the error shown below
    client.execute("CREATE TABLE test(c1 int)")
    transport.close()
except Thrift.TException as tx:
    print(tx.message)

I received the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/hive/lib/py/hive_service/ThriftHive.py", line 68, in execute
    self.recv_execute()
  File "/usr/lib/hive/lib/py/hive_service/ThriftHive.py", line 84, in recv_execute
    raise x
thrift.Thrift.TApplicationException: Invalid method name: 'execute'

But inspecting the ThriftHive.py file shows that the Client class does define an execute method.

How may I use Python to access Hive?

Solution

I believe the easiest way is to use PyHive.

To install it, you'll need these libraries:

pip install sasl
pip install thrift
pip install thrift-sasl
pip install PyHive

Please note that although you install the library as PyHive, you import the module as pyhive, all lower-case.
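Because the pip names and the import names differ, a quick check that everything is importable can save some confusion. The following is a minimal sketch using only the standard library. The helper name missing_modules and the list REQUIRED_MODULES are made up for illustration and are not part of PyHive.

```python
import importlib.util

# Import names for the four pip packages listed above. The pip name and the
# import name differ for two of them: "thrift-sasl" imports as thrift_sasl
# and "PyHive" imports as pyhive.
REQUIRED_MODULES = ["sasl", "thrift", "thrift_sasl", "pyhive"]


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Not importable:", ", ".join(missing))
else:
    print("All PyHive dependencies are importable.")
```

If sasl shows up as missing, the SASL system library is the likely culprit.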

If you're on Linux, you may need to install SASL separately before running the above. Install the package libsasl2-dev with apt-get (or cyrus-sasl-devel with yum), or the equivalent from your distribution's package manager. For Windows there are some options on GNU.org, where you can download a binary installer. On a Mac, SASL should be available if you've installed the Xcode developer tools (xcode-select --install in Terminal).

After installation, you can connect to Hive like this:

from pyhive import hive
conn = hive.Connection(host="YOUR_HIVE_HOST", port=PORT, username="YOU")
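As for which PORT to use: in a default setup, HiveServer2 (the service that runs queries) listens on 10000, while 9083 is the metastore. The metastore does not serve queries, which would be consistent with the "Invalid method name: 'execute'" error in the question. If a connection hangs, it may help to first confirm that the port is reachable at all. Here is a minimal sketch using only the standard library. The helper name port_is_open is made up for illustration, and an open port does not prove which service is behind it.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (including timeouts and DNS
        # failures) when the connection cannot be established.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, port_is_open("YOUR_HIVE_HOST", 10000) returning False points at a firewall, a wrong host, or a HiveServer2 that is not running, rather than at the Python client.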

Now that you have the Hive connection, you have options for how to use it. You can query it directly:

cursor = conn.cursor()
cursor.execute("SELECT cool_stuff FROM hive_table")
for result in cursor.fetchall():
    # Each result is one row; replace print with your own handling
    print(result)
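For large result sets, fetchall() holds every row in memory at once. PyHive follows the Python DB-API, so one option is to pull rows in batches with fetchmany(). The sketch below is an illustration under assumptions, not PyHive's own recommendation. The helper name iter_rows is made up, and sqlite3 serves only as a stand-in so the example runs without a Hive cluster. With Hive you would pass the conn created above instead.

```python
import sqlite3  # stand-in only; with Hive, pass the PyHive connection instead


def iter_rows(conn, sql, batch_size=1000):
    """Yield rows one at a time, pulling `batch_size` rows from the cursor per call."""
    cursor = conn.cursor()
    try:
        cursor.execute(sql)
        while True:
            batch = cursor.fetchmany(batch_size)
            if not batch:  # an empty batch means the result set is exhausted
                break
            for row in batch:
                yield row
    finally:
        cursor.close()


# Demonstration against an in-memory SQLite database with made-up data.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE hive_table (cool_stuff INTEGER)")
demo.executemany("INSERT INTO hive_table VALUES (?)", [(i,) for i in range(5)])
rows = list(iter_rows(demo, "SELECT cool_stuff FROM hive_table ORDER BY cool_stuff", batch_size=2))
print(rows)  # [(0,), (1,), (2,), (3,), (4,)]
```

Because iter_rows is a generator, the cursor is closed once iteration finishes or the generator is discarded.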

...or use the connection to build a pandas DataFrame:

import pandas as pd
df = pd.read_sql("SELECT cool_stuff FROM hive_table", conn)
