How to overwrite the RDD saveAsPickleFile(path) output if the path already exists in PySpark?
Question
How can I overwrite any existing output path when saving an RDD?
Test 1:
975078|56691|2.000|20171001_926_570_1322
975078|42993|1.690|20171001_926_570_1322
975078|46462|2.000|20171001_926_570_1322
975078|87815|1.000|20171001_926_570_1322
from pyspark.sql import Row

rdd = (sc.textFile('/home/administrator/work/test1')
         .map(lambda x: x.split("|")[:4])
         .map(lambda r: Row(user_code=r[0], item_code=r[1], qty=float(r[2]))))
rdd.coalesce(1).saveAsPickleFile("/home/administrator/work/foobar_seq1")
The first time it saves properly. Then I removed one line from the input file and tried to save the RDD to the same location, and it reports that the file already exists.
rdd.coalesce(1).saveAsPickleFile("/home/administrator/work/foobar_seq1")
For example, with a DataFrame we can overwrite an existing path:
df.coalesce(1).write.mode("overwrite").save(path)
Doing the same on an RDD object raises an error:
rdd.coalesce(1).write().overwrite().saveAsPickleFile(path)
Please help me resolve this.
Answer
You can save RDD files as shown below. Note: the code is in Scala, but the logic should be the same for Python. I am using Spark version 2.3.0.
val sconf = new SparkConf()
  .set("spark.hadoop.validateOutputSpecs", "false") // skip the "output directory already exists" check
  .setMaster("local[*]")
  .setAppName("test")
val scontext = new SparkContext(sconf)
val lines = scontext.textFile(s"${filePath}", 1)
println(lines.first)
lines.saveAsTextFile("C:\\Users\\...\\Desktop\\sample2")
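If you would rather not disable Hadoop's output validation globally, a common alternative is to delete the existing output directory yourself before saving. A minimal Python sketch, assuming a local filesystem path (the `save_overwriting` helper name is illustrative, not part of the PySpark API; for HDFS paths you would need the Hadoop FileSystem API instead of `shutil`):

```python
import os
import shutil

def save_overwriting(rdd, path):
    """Remove any previous output at `path`, then save the RDD.

    saveAsPickleFile refuses to write to an existing path, so we
    clear the directory first. Local filesystem only.
    """
    if os.path.exists(path):
        shutil.rmtree(path)
    rdd.saveAsPickleFile(path)
```

For example, `save_overwriting(rdd.coalesce(1), "/home/administrator/work/foobar_seq1")` can be re-run as often as needed without the "path already exists" error.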
Or, if you are working with a DataFrame, use:
DF.write.mode(SaveMode.Overwrite).parquet(path)
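Since the question is about PySpark, the Scala DataFrame line translates directly: in Python, `write` is a property and the save mode is passed as a string. A minimal sketch wrapped in a helper (the `overwrite_parquet` name is illustrative; `df.write.mode("overwrite").parquet(path)` is the actual PySpark API):

```python
def overwrite_parquet(df, path):
    # mode("overwrite") replaces any existing output at `path`
    # instead of raising a "path already exists" error
    df.write.mode("overwrite").parquet(path)
```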
Or for more information, see this.