Saving dataframe to local file system results in empty results


Problem description

We are running Spark 2.3.0 on AWS EMR. The following DataFrame "df" is non-empty and of modest size:

scala> df.count
res0: Long = 4067

The following code works fine for writing df to HDFS and reading it back:

scala> df.repartition(1).write.mode("overwrite").parquet("/tmp/topVendors")

scala> val hdf = spark.read.parquet("/tmp/topVendors")
hdf: org.apache.spark.sql.DataFrame = [displayName: string, cnt: bigint]

scala> hdf.count
res4: Long = 4067

However, using the same code to write to a local parquet or csv file ends up with empty results:

df.repartition(1).write.mode("overwrite").parquet("file:///tmp/topVendors")

scala> val locdf = spark.read.parquet("file:///tmp/topVendors")
org.apache.spark.sql.AnalysisException: Unable to infer schema for Parquet. It must be specified manually.;
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$9.apply(DataSource.scala:207)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$9.apply(DataSource.scala:207)
  at scala.Option.getOrElse(Option.scala:121)

We can see why it fails:

 ls -l /tmp/topVendors
total 0
-rw-r--r-- 1 hadoop hadoop 0 Jul 30 22:38 _SUCCESS

So there is no parquet file being written.

I have tried this maybe twenty times, for both csv and parquet, and on two different EMR servers: the same behavior is exhibited in every case.

Is this an EMR-specific bug? A more general EC2 bug? Something else? This code works on Spark on macOS.

In case it matters - here is the versioning info:

Release label: emr-5.13.0
Hadoop distribution: Amazon 2.8.3
Applications: Spark 2.3.0, Hive 2.3.2, Zeppelin 0.7.3

Answer

That is not a bug; it is the expected behavior. Spark does not really support writes to non-distributed storage (it works in local mode only because the driver and executors share one file system).
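For contrast, a minimal sketch of the same write in local mode (assuming a spark-shell started with --master "local[*]", which is an assumption, not part of the question); there the driver and the single in-process executor see the same disk, so file:// paths behave as expected:

df.repartition(1).write.mode("overwrite").parquet("file:///tmp/topVendors")
val locdf = spark.read.parquet("file:///tmp/topVendors")
locdf.count  // matches df.count here, since everything runs in one JVM on one disk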

A local path is not interpreted (only) as a path on the driver (that would require collecting the data) but as a local path on each executor. Therefore each executor writes its own chunk to its own local file system.
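As a rough illustration of the "collecting the data" route mentioned above, the sketch below pulls all rows to the driver and writes them with plain JVM I/O. This is only sensible for small results (the question's DataFrame has ~4000 rows); the output path and the naive CSV formatting are illustrative, not taken from the question:

import java.io.{File, PrintWriter}

val rows = df.collect()                           // brings every row to the driver
val out  = new PrintWriter(new File("/tmp/topVendors.csv"))
try {
  out.println(df.columns.mkString(","))           // header row
  rows.foreach(r => out.println(r.mkString(","))) // naive CSV, no quoting or escaping
} finally {
  out.close()
}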

Not only is the output not readable back (to load the data, each executor and the driver would have to see the same state of the file system), but, depending on the commit algorithm, it might not even be finalized (moved out of the temporary directory).
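A minimal sketch of the usual workaround: write to distributed storage first (HDFS here, using the path from the question), then copy the committed output down to the driver's local disk with the Hadoop FileSystem API. The local destination path is illustrative:

import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

df.repartition(1).write.mode("overwrite").parquet("/tmp/topVendors")

val hconf = spark.sparkContext.hadoopConfiguration
val hdfs  = FileSystem.get(hconf)
val local = FileSystem.getLocal(hconf)

// Copy the finalized HDFS output directory to the driver's local file system.
FileUtil.copy(hdfs, new Path("/tmp/topVendors"),
              local, new Path("/tmp/topVendors-local"),
              false, hconf)  // false = keep the HDFS copy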

