How to Access Hive via Python?

Views: 620 | Published: 2018/5/31 18:26:11 | Tags: python hadoop hive


Problem description

https://cwiki.apache.org/confluence/display/Hive/HiveClient#HiveClient-Python appears to be outdated.

When I add this to /etc/profile:

    export PYTHONPATH=$PYTHONPATH:/usr/lib/hive/lib/py

I can then do the imports listed in the link, with the exception of "from hive import ThriftHive", which actually needs to be:

    from hive_service import ThriftHive

Next, the port in the example was 10000, which caused the program to hang when I tried it. The default Hive Thrift port is 9083, and using it stopped the hanging.

So I set it up like so:

    from thrift import Thrift
    from thrift.transport import TSocket
    from thrift.transport import TTransport
    from thrift.protocol import TBinaryProtocol
    from hive_service import ThriftHive

    try:
        transport = TSocket.TSocket('<node-with-metastore>', 9083)
        transport = TTransport.TBufferedTransport(transport)
        protocol = TBinaryProtocol.TBinaryProtocol(transport)
        client = ThriftHive.Client(protocol)
        transport.open()
        client.execute("CREATE TABLE test(c1 int)")
        transport.close()
    except Thrift.TException as tx:
        print('%s' % (tx.message))

I received the following error:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/hive/lib/py/hive_service/ThriftHive.py", line 68, in execute
        self.recv_execute()
      File "/usr/lib/hive/lib/py/hive_service/ThriftHive.py", line 84, in recv_execute
        raise x
    thrift.Thrift.TApplicationException: Invalid method name: 'execute'

But inspecting the ThriftHive.py file reveals the method execute within the Client class.

How may I use Python to access Hive?
Solution
I believe the easiest way is to use PyHive.

To install, you'll need these libraries:

    pip install sasl
    pip install thrift
    pip install thrift-sasl
    pip install PyHive

Please note that although you install the library as PyHive, you import the module as pyhive, all lower-case.

If you're on Linux, you may need to install SASL separately before running the above. Install the package libsasl2-dev using apt-get, or the equivalent package from yum or whatever package manager your distribution uses (on yum-based systems this is typically cyrus-sasl-devel). For Windows, there are some options on GNU.org where you can download a binary installer. On a Mac, SASL should be available if you've installed the Xcode developer tools (xcode-select --install in Terminal).
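Once the packages are installed, a quick check that they are importable can save some debugging later. The following is a minimal sketch, not part of the original answer; the helper name missing_modules is hypothetical, and the module names are the import names that correspond to the pip packages listed above.

```python
import importlib.util

# Import names (not pip names) of the packages installed above.
# The pip package "thrift-sasl" is imported as "thrift_sasl",
# and "PyHive" is imported as "pyhive".
REQUIRED_MODULES = ("sasl", "thrift", "thrift_sasl", "pyhive")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All PyHive dependencies are importable.")
```

If sasl shows up as missing, the system SASL development package described above is a likely cause, since the pip build needs its headers.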

After installation, you can connect to Hive like this:

    from pyhive import hive
    conn = hive.Connection(host="YOUR_HIVE_HOST", port=PORT, username="YOU")

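For PORT, HiveServer2 usually listens on 10000, whereas 9083 is normally the metastore's Thrift port. That would be consistent with the "Invalid method name: 'execute'" error in the question, though this is a guess about that particular cluster. The sketch below wraps the connection in small helpers so that cursors and connections get closed reliably. The names open_hive and fetch_all are hypothetical, and the default port of 10000 is an assumption rather than something the answer states.

```python
from contextlib import closing


def open_hive(host, port=10000, username=None, database="default"):
    """Open a PyHive connection (hypothetical helper; port 10000 is an assumed default)."""
    # Imported lazily so this module still loads where PyHive is not installed.
    from pyhive import hive
    return hive.Connection(host=host, port=port, username=username, database=database)


def fetch_all(conn, sql):
    """Run `sql` on any DB-API 2.0 connection and return all rows."""
    with closing(conn.cursor()) as cursor:
        cursor.execute(sql)
        return cursor.fetchall()


# Usage (needs a reachable HiveServer2; host and user are placeholders):
# with closing(open_hive("YOUR_HIVE_HOST", username="YOU")) as conn:
#     rows = fetch_all(conn, "SELECT cool_stuff FROM hive_table")
```

Because fetch_all only relies on the standard DB-API cursor methods, it can be tried out against any DB-API connection (for example an in-memory sqlite3 database) before pointing it at Hive.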
Now that you have the Hive connection, you have options for how to use it. You can just straight-up query:

    cursor = conn.cursor()
    cursor.execute("SELECT cool_stuff FROM hive_table")
    for result in cursor.fetchall():
        use_result(result)

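Note that fetchall() loads the entire result set into memory, which may be a problem for large Hive tables. As a sketch of one alternative, the generator below pulls rows in batches using the standard DB-API fetchmany() method. The name iter_rows and the batch size of 1000 are illustrative choices, not something from the original answer.

```python
def iter_rows(cursor, batch_size=1000):
    """Yield rows from an already-executed DB-API cursor, `batch_size` rows at a time."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            # An empty batch means the result set is exhausted.
            break
        for row in batch:
            yield row


# Usage with the connection from above (use_result is the answer's placeholder):
# cursor = conn.cursor()
# cursor.execute("SELECT cool_stuff FROM hive_table")
# for result in iter_rows(cursor):
#     use_result(result)
```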
...or use the connection to make a pandas DataFrame:

    import pandas as pd
    df = pd.read_sql("SELECT cool_stuff FROM hive_table", conn)
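If pandas is not available, or plain Python structures are enough, column-labelled rows can also be built directly from the cursor. This is a rough sketch under the assumption that the cursor follows DB-API 2.0, where cursor.description lists the column names; the helper name rows_as_dicts is hypothetical.

```python
def rows_as_dicts(cursor):
    """Return the remaining rows of an executed DB-API cursor as a list of dicts."""
    # Per DB-API 2.0, each entry of cursor.description is a sequence
    # whose first item is the column name.
    columns = [column[0] for column in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Usage with the connection from above:
# cursor = conn.cursor()
# cursor.execute("SELECT cool_stuff FROM hive_table")
# records = rows_as_dicts(cursor)
```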
