Merge multiple small files into a few larger files in Spark

Views: 117 Published: 2021/11/14 21:42:54 Tags: scala hadoop apache-spark hive apache-spark-sql


Problem description

I am using Hive through Spark. My Spark code contains an INSERT INTO query against a partitioned table. The input data is 200+ GB. When Spark writes to the partitioned table, it produces very small files (a few KB each), so the output folder of the partitioned table now holds 5000+ small files. I want to merge these into a few larger files, perhaps a few files of about 200 MB each. I tried the Hive merge settings, but they don't seem to work.

// hiveContext is an existing org.apache.spark.sql.hive.HiveContext
// Enable dynamic partitioning for the insert
val result7A = hiveContext.sql("set hive.exec.dynamic.partition=true")
val result7B = hiveContext.sql("set hive.exec.dynamic.partition.mode=nonstrict")
// Hive merge settings that were tried (target about 256 MB per merged file)
val result7C = hiveContext.sql("SET hive.merge.size.per.task=256000000")
val result7D = hiveContext.sql("SET hive.merge.mapfiles=true")
val result7E = hiveContext.sql("SET hive.merge.mapredfiles=true")
val result7F = hiveContext.sql("SET hive.merge.sparkfiles = true")
// JSON SerDe jar needed to read the source table
val result7G = hiveContext.sql("set hive.aux.jars.path=c:\\Applications\\json-serde-1.1.9.3-SNAPSHOT-jar-with-dependencies.jar")
// Insert into the table, dynamically partitioned by date
val result8 = hiveContext.sql("INSERT INTO TABLE partition_table PARTITION (date) select a,b,c from partition_json_table")

The Hive settings above work when the query runs as a Hive MapReduce job, and they produce files of the specified size. Is there any option to do this in Spark or Scala?
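One possible direction, offered only as a sketch: in Spark the number of files written generally follows the number of partitions of the data being written, so a first step is to work out how many files a given target size implies. The helper below does that arithmetic. The names `TargetFileCount` and `targetFileCount` are hypothetical and not part of Spark or Hive, and the total output size is assumed to be known or estimated by the caller.

```scala
object TargetFileCount {
  // Hypothetical helper (not a Spark or Hive API): how many output files are
  // needed so that each file is roughly targetFileBytes in size.
  // totalBytes is an estimate of the output size supplied by the caller.
  def targetFileCount(totalBytes: Long, targetFileBytes: Long): Int = {
    require(totalBytes >= 0, "totalBytes must be non-negative")
    require(targetFileBytes > 0, "targetFileBytes must be positive")
    // Ceiling division, written so it cannot overflow Long.
    val quotient = totalBytes / targetFileBytes
    val files = if (totalBytes % targetFileBytes == 0) quotient else quotient + 1
    // Always write at least one file, and fail loudly if the count exceeds Int.
    Math.toIntExact(math.max(1L, files))
  }
}
```

The resulting count could then be passed to a DataFrame's `repartition(n)` or `coalesce(n)` before the write. For a dynamically partitioned insert like the one above, repartitioning by the partition column (here `date`) before inserting may also be worth trying, since rows for one table partition then land in the same Spark partition. Whether either approach fits depends on the Spark version and on how evenly the data is spread across dates.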
