pandas UDF and pyarrow 0.15.0
Question
I have recently started getting a bunch of errors on a number of pyspark jobs running on EMR clusters. The errors are:
java.lang.IllegalArgumentException
at java.nio.ByteBuffer.allocate(ByteBuffer.java:334)
at org.apache.arrow.vector.ipc.message.MessageSerializer.readMessage(MessageSerializer.java:543)
at org.apache.arrow.vector.ipc.message.MessageChannelReader.readNext(MessageChannelReader.java:58)
at org.apache.arrow.vector.ipc.ArrowStreamReader.readSchema(ArrowStreamReader.java:132)
at org.apache.arrow.vector.ipc.ArrowReader.initialize(ArrowReader.java:181)
at org.apache.arrow.vector.ipc.ArrowReader.ensureInitialized(ArrowReader.java:172)
at org.apache.arrow.vector.ipc.ArrowReader.getVectorSchemaRoot(ArrowReader.java:65)
at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.read(ArrowPythonRunner.scala:162)
at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.read(ArrowPythonRunner.scala:122)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:406)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at org.apache.spark.sql.execution.python.ArrowEvalPythonExec$$anon$2.<init>(ArrowEvalPythonExec.scala:98)
at org.apache.spark.sql.execution.python.ArrowEvalPythonExec.evaluate(ArrowEvalPythonExec.scala:96)
at org.apache.spark.sql.execution.python.EvalPythonExec$$anonfun$doExecute$1.apply(EvalPythonExec.scala:127)...
They all seem to happen in apply functions of a pandas Series. The only change I found is that pyarrow was updated on Saturday (05/10/2019). Tests seem to work with 0.14.1.
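For context, the failing jobs follow the scalar pandas UDF pattern, where the UDF body calls Series.apply. A minimal pandas-only sketch of that pattern (the function name and values here are hypothetical illustrations, not taken from the failing jobs):

```python
import pandas as pd

# Hypothetical element-wise logic of the kind a scalar pandas UDF runs.
# Under pyspark this function would be wrapped with @pandas_udf, and the
# input/output Series would be shipped between the JVM and Python via
# Arrow IPC -- the step that raises the IllegalArgumentException above.
def add_tax(prices: pd.Series) -> pd.Series:
    return prices.apply(lambda p: round(p * 1.08, 2))

print(add_tax(pd.Series([10.0, 20.0])).tolist())  # [10.8, 21.6]
```

The Python-side logic itself is fine; the failure happens in the Arrow serialization layer between this code and the JVM.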
So my question is: does anyone know if this is a bug in the newly updated pyarrow, or is there some significant change that will make pandas UDFs hard to use in the future?
Answer
It's not a bug. We made an important protocol change in 0.15.0 that makes the default behavior of pyarrow incompatible with older versions of Arrow in Java -- your Spark environment seems to be using an older version.
Your options are:
- Set the environment variable ARROW_PRE_0_15_IPC_FORMAT=1 wherever Python is being used
- Downgrade to pyarrow < 0.15.0 for now
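For the first option, the variable must be set before the first Arrow IPC call, and it has to reach the executors as well as the driver. A minimal sketch of setting it from the driver script (the spark.executorEnv mechanism shown in the comment is standard Spark configuration, but exactly where the variable needs to live can depend on how your EMR cluster launches Python workers):

```python
import os

# Tell pyarrow >= 0.15.0 to emit the pre-0.15 IPC format that the older
# Arrow Java library bundled with Spark can read. For executors, pass it
# through Spark as well, e.g.:
#   spark-submit --conf spark.executorEnv.ARROW_PRE_0_15_IPC_FORMAT=1 ...
# or add "export ARROW_PRE_0_15_IPC_FORMAT=1" to conf/spark-env.sh.
os.environ["ARROW_PRE_0_15_IPC_FORMAT"] = "1"

print(os.environ["ARROW_PRE_0_15_IPC_FORMAT"])  # 1
```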
Hopefully the Spark community will be able to upgrade to 0.15.0 in Java soon so this issue goes away.
This is discussed at http://arrow.apache.org/blog/2019/10/06/0.15.0-release/