UserWarning: pyarrow.open_stream is deprecated, please use pyarrow.ipc.open_stream warnings
Problem description
I am running Spark 2.4.2 locally through pyspark for an ML project in NLP. Part of the pre-processing in the Pipeline uses pandas_udf functions optimized through pyarrow. Each time I operate on the pre-processed Spark dataframe, the following warning appears:
UserWarning: pyarrow.open_stream is deprecated, please use pyarrow.ipc.open_stream
  warnings.warn("pyarrow.open_stream is deprecated, please use "
I tried updating pyarrow but didn't manage to avoid the warning. My pyarrow version is 0.14. I would like to know what this warning implies and whether anybody has found a solution for it. Thank you very much in advance.
Spark session details:
from pyspark import SparkConf
from pyspark.sql import SparkSession

conf = SparkConf(). \
    setAppName('map'). \
    setMaster('local[*]'). \
    set('spark.yarn.appMasterEnv.PYSPARK_PYTHON', '~/anaconda3/bin/python'). \
    set('spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON', '~/anaconda3/bin/python'). \
    set('executor.memory', '8g'). \
    set('spark.executor.memoryOverhead', '16g'). \
    set('spark.sql.codegen', 'true'). \
    set('spark.yarn.executor.memory', '16g'). \
    set('yarn.scheduler.minimum-allocation-mb', '500m'). \
    set('spark.dynamicAllocation.maxExecutors', '3'). \
    set('spark.driver.maxResultSize', '0'). \
    set("spark.sql.execution.arrow.enabled", "true"). \
    set("spark.debug.maxToStringFields", '100')

spark = SparkSession.builder. \
    appName("map"). \
    config(conf=conf). \
    getOrCreate()
Recommended answer
This warning comes from your version of pyspark, which calls a deprecated pyarrow function.
Everything still works, though, so you can either ignore the warning for now or update your pyspark version (in the latest version they have fixed the usage of the deprecated pyarrow function).
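If you choose to ignore the warning, one option is to filter it with Python's standard warnings module. The snippet below is a minimal sketch, not something taken from pyspark itself: the helper name silence_pyarrow_open_stream_warning is hypothetical, and the message pattern is assumed from the warning text quoted in the question. A filter only affects the Python process in which it is installed, so warnings emitted by Spark's Python worker processes (where pandas_udf code runs) may still appear.

```python
import warnings


def silence_pyarrow_open_stream_warning():
    """Hypothetical helper: ignore only the pyarrow.open_stream deprecation warning."""
    warnings.filterwarnings(
        "ignore",
        # Regex matched against the start of the warning message
        # (pattern assumed from the warning text quoted in the question).
        message=r"pyarrow\.open_stream is deprecated",
        category=UserWarning,
    )


# Install the filter before creating the SparkSession or running any Arrow-backed code.
silence_pyarrow_open_stream_warning()
```

Matching on the message text keeps every other UserWarning visible, which is usually safer than ignoring the whole category.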