spark access first n rows - take vs limit
Question
I want to access the first 100 rows of a Spark data frame and write the result back to a CSV file.
Why is take(100) basically instant, whereas
df.limit(100)
  .repartition(1)
  .write
  .mode(SaveMode.Overwrite)
  .option("header", true)
  .option("delimiter", ";")
  .csv("myPath")
takes forever? I do not want to obtain the first 100 records per partition, just any 100 records.
Why is take() so much faster than limit()?
Answer
This is because predicate pushdown is currently not supported in Spark; see this very good answer.
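If the goal is just to write any 100 rows quickly, a common workaround is to collect them to the driver with take and rebuild a small DataFrame from the result. This is only a sketch: it assumes a SparkSession named spark and the questioner's DataFrame df, and it is only viable when n is small enough to fit in driver memory.

import org.apache.spark.sql.{SaveMode, SparkSession}

// Collect just 100 rows to the driver; take scans as few partitions as possible.
val first100 = df.take(100)  // Array[Row]

// Rebuild a tiny DataFrame from those rows, reusing the original schema.
val small = spark.createDataFrame(
  spark.sparkContext.parallelize(first100.toSeq),
  df.schema
)

// Writing a DataFrame this small is cheap, so no repartition is needed.
small.write
  .mode(SaveMode.Overwrite)
  .option("header", true)
  .option("delimiter", ";")
  .csv("myPath")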
Actually, take(n) should take a really long time as well. However, I just tested it and got the same results as you: take is almost instantaneous regardless of dataset size, while limit takes a long time.