How to properly setup pyarrow for python 3.7 on Windows


Problem description


I've been trying to install pyarrow via pip (pip install pyarrow, and, as suggested by Yagav: py -3.7 -m pip install --user pyarrow) and conda (conda install -c conda-forge pyarrow; I also used conda install pyarrow), and by building the lib from source (using a conda environment and some magic which I don't really understand). Every time, the installation finishes with no errors, but it always ends with one and the same problem when I call:

import pyarrow as pa
fs = pa.hdfs.connect(host='my_host', user='my_user@my_host', kerb_ticket='path_to_kerb_ticket')


it fails with the following message:


Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\ProgramData\Anaconda3\lib\site-packages\pyarrow\hdfs.py", line 209, in connect
    extra_conf=extra_conf)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pyarrow\hdfs.py", line 37, in __init__
    _maybe_set_hadoop_classpath()
  File "C:\ProgramData\Anaconda3\lib\site-packages\pyarrow\hdfs.py", line 135, in _maybe_set_hadoop_classpath
    classpath = _hadoop_classpath_glob(hadoop_bin)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pyarrow\hdfs.py", line 162, in _hadoop_classpath_glob
    return subprocess.check_output(hadoop_classpath_args)
  File "C:\ProgramData\Anaconda3\lib\subprocess.py", line 395, in check_output
    **kwargs).stdout
  File "C:\ProgramData\Anaconda3\lib\subprocess.py", line 472, in run
    with Popen(*popenargs, **kwargs) as process:
  File "C:\ProgramData\Anaconda3\lib\subprocess.py", line 775, in __init__
    restore_signals, start_new_session)
  File "C:\ProgramData\Anaconda3\lib\subprocess.py", line 1178, in _execute_child
    startupinfo)
OSError: [WinError 193] %1 is not a valid win32 application


At first I thought there was a problem with the libhdfs.so from Hadoop 2.5.6, but it seems I was wrong about that. I guess the problem is not in pyarrow or subprocess, but in some system variables or dependencies.


I have also manually defined the system variables HADOOP_HOME, JAVA_HOME and KRB5CCNAME.
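These variables can also be seeded in-process before calling pa.hdfs.connect, since pyarrow reads them at connect time. A minimal sketch, using hypothetical install paths that you would replace with your own:

```python
import os

# Hypothetical locations -- replace with your actual Hadoop install,
# JDK install, and Kerberos ticket cache path.
os.environ.setdefault("HADOOP_HOME", r"C:\hadoop")
os.environ.setdefault("JAVA_HOME", r"C:\Program Files\Java\jdk1.8.0_202")
os.environ.setdefault("KRB5CCNAME", r"C:\temp\krb5cc_user")
```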

Recommended answer


Ok, I found it myself. As I suspected, the problem was in the system environment variables: there needs to be a CLASSPATH variable containing the paths to all the .jar files of the Hadoop client. You can get them by running hadoop classpath or hadoop classpath --glob in cmd.
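That step can be automated from Python before connecting. A minimal sketch (the helper names are my own, and it assumes the hadoop launcher is on PATH; on Windows that is hadoop.cmd):

```python
import os
import subprocess

def apply_classpath(raw: bytes) -> str:
    """Store raw `hadoop classpath --glob` output in the CLASSPATH env var."""
    os.environ["CLASSPATH"] = raw.decode("utf-8").strip()
    return os.environ["CLASSPATH"]

def set_hadoop_classpath(hadoop_bin: str = "hadoop") -> str:
    """Run `hadoop classpath --glob` and export its output as CLASSPATH."""
    raw = subprocess.check_output([hadoop_bin, "classpath", "--glob"])
    return apply_classpath(raw)
```

With CLASSPATH exported this way (or set once in the system environment settings), the connect call no longer fails while trying to discover the Hadoop jars.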

