UserWarning: pyarrow.open_stream is deprecated, please use pyarrow.ipc.open_stream warnings


Problem description

I am running Spark 2.4.2 locally through pyspark for an ML project in NLP. Part of the pre-processing steps in the pipeline involve pandas_udf functions optimized through pyarrow. Each time I operate on the pre-processed Spark dataframe, the following warning appears:

UserWarning: pyarrow.open_stream is deprecated, please use pyarrow.ipc.open_stream warnings.warn("pyarrow.open_stream is deprecated, please use "
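
For reference, a minimal sketch of the kind of Arrow-backed pandas_udf pre-processing step that produces the warning (the dataframe, column name and lower-casing function here are made up for illustration; any scalar pandas_udf behaves the same way):

from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import StringType

# Scalar pandas_udf: Spark ships the column to the Python workers as Arrow
# record batches, and reading those batches back is where pyspark's internal
# pyarrow.open_stream call (and hence the warning) happens.
@pandas_udf(StringType())
def to_lower(text):
    return text.str.lower()

df = spark.createDataFrame([("Some TEXT",), ("More Text",)], ["text"])
df.withColumn("text_clean", to_lower("text")).show()   # the warning shows up here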

I tried updating pyarrow but didn't manage to avoid the warning. My pyarrow version is 0.14. I was wondering about the implications of this warning and whether somebody has found a solution for it? Thank you very much in advance.
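
A quick way to confirm which versions the driver process is actually picking up (plain attribute checks, nothing project-specific; a mismatched worker environment can also make an upgrade appear to have no effect):

import pyarrow
import pyspark

# Both packages expose __version__ on the module.
print(pyspark.__version__, pyarrow.__version__)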

Spark session details:

from pyspark import SparkConf
from pyspark.sql import SparkSession

conf = SparkConf(). \
    setAppName('map'). \
    setMaster('local[*]'). \
    set('spark.yarn.appMasterEnv.PYSPARK_PYTHON', '~/anaconda3/bin/python'). \
    set('spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON', '~/anaconda3/bin/python'). \
    set('executor.memory', '8g'). \
    set('spark.executor.memoryOverhead', '16g'). \
    set('spark.sql.codegen', 'true'). \
    set('spark.yarn.executor.memory', '16g'). \
    set('yarn.scheduler.minimum-allocation-mb', '500m'). \
    set('spark.dynamicAllocation.maxExecutors', '3'). \
    set('spark.driver.maxResultSize', '0'). \
    set("spark.sql.execution.arrow.enabled", "true"). \
    set("spark.debug.maxToStringFields", '100')

spark = SparkSession.builder. \
    appName("map"). \
    config(conf=conf). \
    getOrCreate()

Answer

This warning is coming from your version of pyspark, which is using a deprecated function of pyarrow.
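
To illustrate what the deprecation itself refers to (the call in question happens inside pyspark's serializers, not in your own code): pyarrow moved the IPC stream reader under the pyarrow.ipc namespace, and the old top-level name now only warns and delegates. A self-contained sketch with a throwaway in-memory stream:

import pyarrow as pa
import pyarrow.ipc

# Build a tiny in-memory Arrow IPC stream just to have something to read back.
batch = pa.RecordBatch.from_arrays([pa.array([1, 2, 3])], ["x"])
sink = pa.BufferOutputStream()
writer = pa.RecordBatchStreamWriter(sink, batch.schema)
writer.write_batch(batch)
writer.close()
buf = sink.getvalue()

# reader = pa.open_stream(buf)      # old spelling, emits the UserWarning
reader = pa.ipc.open_stream(buf)    # recommended spelling, same reader type
print(reader.read_all())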

But everything works fine, so you can either simply ignore the warning for now, or update your pyspark version (in the latest releases the usage of the deprecated pyarrow function has been fixed).
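
If you stay on the current versions and just want the message silenced, one possible workaround (standard library only) is to filter that specific warning; note that this only affects warnings raised in the driver process, so messages coming from the Python workers may still reach stderr:

import warnings

# Ignore only this deprecation message, not UserWarnings in general.
warnings.filterwarnings(
    "ignore",
    message="pyarrow.open_stream is deprecated",
    category=UserWarning,
)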
