Saving dataframe to local file system results in empty results


Problem description

We are running Spark 2.3.0 on AWS EMR. The following DataFrame "df" is non-empty and of modest size:

scala> df.count
res0: Long = 4067

The following code works fine for writing df to HDFS and reading it back:

scala> df.repartition(1).write.mode("overwrite").parquet("/tmp/topVendors")

scala> val hdf = spark.read.parquet("/tmp/topVendors")
hdf: org.apache.spark.sql.DataFrame = [displayName: string, cnt: bigint]

scala> hdf.count
res4: Long = 4067

However, using the same code to write to a local parquet or csv file ends up with empty results:

df.repartition(1).write.mode("overwrite").parquet("file:///tmp/topVendors")

scala> val locdf = spark.read.parquet("file:///tmp/topVendors")
org.apache.spark.sql.AnalysisException: Unable to infer schema for Parquet. It must be specified manually.;
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$9.apply(DataSource.scala:207)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$9.apply(DataSource.scala:207)
  at scala.Option.getOrElse(Option.scala:121)

We can see why it fails:

 ls -l /tmp/topVendors
total 0
-rw-r--r-- 1 hadoop hadoop 0 Jul 30 22:38 _SUCCESS

So no parquet file is being written.

I have tried this maybe twenty times, for both csv and parquet, and on two different EMR servers: the same behavior is exhibited in all cases.

Is this an EMR-specific bug? A more general EC2 bug? Something else? This code works with Spark on macOS.

In case it matters, here is the versioning info:

Release label: emr-5.13.0
Hadoop distribution: Amazon 2.8.3
Applications: Spark 2.3.0, Hive 2.3.2, Zeppelin 0.7.3

Answer

That is not a bug; it is the expected behavior. Spark does not really support writes to non-distributed storage (it works in local mode only because the driver and the executors share a file system).
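
A minimal sketch of that parenthetical, assuming a standalone local[*] session (the session setup, column name, and paths below are illustrative, not from the original post): in local mode the driver and the executors run in one JVM on one machine, so a file:// destination really is shared storage and the round trip succeeds.

import org.apache.spark.sql.SparkSession

// Everything runs in a single JVM on one machine in local[*] mode.
val localSpark = SparkSession.builder()
  .master("local[*]")
  .appName("local-write-demo")
  .getOrCreate()

val demo = localSpark.range(0, 4067).toDF("id")

// A file:// path is shared between "driver" and "executors" here, so the write
// produces real part files and the read-back sees all rows.
demo.repartition(1).write.mode("overwrite").parquet("file:///tmp/topVendors_local")
localSpark.read.parquet("file:///tmp/topVendors_local").count()   // 4067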

A local path is not interpreted (only) as a path on the driver (that would require collecting the data) but as a local path on each executor. Therefore each executor writes its own chunk to its own local file system.
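
A quick, purely illustrative check of this (the hostname probe is my own addition, not part of the original answer) is to print which executor host holds each partition of df; that host's local disk is where a file:// output path for that partition would be resolved.

import java.net.InetAddress

// Report the executor hostname and row count for every partition of df.
df.rdd
  .mapPartitionsWithIndex { (idx, rows) =>
    Iterator((idx, InetAddress.getLocalHost.getHostName, rows.size))
  }
  .collect()
  .foreach { case (idx, host, n) => println(s"partition $idx: $n rows on $host") }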

Not only is the output not readable back (to load the data, each executor and the driver would have to see the same state of the file system), but depending on the commit algorithm, it might not even be finalized (i.e. moved out of the temporary directory).
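
One possible workaround, sketched here as an assumption rather than something from the original answer (and assuming the repartition(1) output is small): write to HDFS exactly as before, then copy the finished directory down to the driver's local file system with the Hadoop FileSystem API, which runs on the driver only.

import org.apache.hadoop.fs.{FileSystem, Path}

// Write to distributed storage first, as in the working HDFS example above.
df.repartition(1).write.mode("overwrite").parquet("/tmp/topVendors")

// Then pull the committed output down to one known machine: the driver.
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
fs.copyToLocalFile(new Path("/tmp/topVendors"), new Path("file:///tmp/topVendors"))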
