Spark/Scala: flatten and flatMap are not working on a DataFrame

Question

I have a collection (parquetFiles) holding three DataFrames of the same type (same Parquet schema). They differ only in the content/values they contain:

(Image: nested structure)

I want to flatten this structure so that the three DataFrames are merged into one single DataFrame containing all of the content/values.

I tried flatten and flatMap, but I always get these errors:

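error: No implicit view available from org.apache.spark.sql.DataFrame => Traversable[U].
parquetsFiles.flatten
error: not enough arguments for method flatten: (implicit asTrav: org.apache.spark.sql.DataFrame => Traversable[U], implicit m: scala.reflect.ClassTag[U]) Unspecified value parameters asTrav, m.
parquetFiles.flatten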

I also converted it to a List and then tried flatten, which produces the same error. Do you have any idea how to solve this, or what the problem is here? Thanks, Alex
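To see why the compiler rejects flatten, here is a minimal plain-Scala sketch (no Spark needed), assuming Scala 2.12 or 2.13. Table is a hypothetical stand-in for DataFrame with a made-up unionAll method: like a DataFrame, it is not a Scala collection, so flatten has no inner collections to unpack, while reducing the list with a pairwise union works.

```scala
// Hypothetical stand-in for a DataFrame: it is NOT a Scala collection,
// so there is no implicit view Table => (some collection of U).
final case class Table(rows: List[String]) {
  // Made-up union for illustration: appends the other table's rows.
  def unionAll(other: Table): Table = Table(rows ++ other.rows)
}

object FlattenDemo {
  // flatten works on nested Scala collections: every element is itself a collection.
  val nested: List[List[String]] = List(List("a", "b"), List("c"), List("d"))
  val flat: List[String] = nested.flatten

  val tables: List[Table] = List(Table(List("a", "b")), Table(List("c")), Table(List("d")))

  // tables.flatten does not compile: flatten needs an implicit conversion from
  // each element to a collection (Table => IterableOnce[U] in Scala 2.13), and
  // none exists, the same kind of failure as DataFrame => Traversable[U].

  // Combining the elements pairwise with their own union method works instead.
  val merged: Table = tables.reduce((x, y) => x.unionAll(y))

  def main(args: Array[String]): Unit = {
    println(flat)        // List(a, b, c, d)
    println(merged.rows) // List(a, b, c, d)
  }
}
```

The point is that flatten (and flatMap) operate on Scala collections, whereas merging DataFrames needs the DataFrame's own union operation.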

Recommended answer

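It seems you want to combine these three DataFrames by stacking their rows (a union, not a join), and the unionAll function does exactly that. flatten cannot do it: it only works when each element is itself a Scala collection (hence the required implicit view DataFrame => Traversable[U] in the error), and a DataFrame is not one. You could do parquetFiles.reduce((x, y) => x.unionAll(y)) (note that reduce throws an exception on an empty list; if the list might be empty, use one of the folds instead of reduce).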
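A slightly fuller sketch of the answer's approach, assuming Spark 2.0 or later (SparkSession available; there, unionAll is deprecated in favor of union, which behaves the same and keeps duplicate rows). On Spark 1.x, keep unionAll as in the answer. The three small DataFrames are hypothetical stand-ins for the real Parquet data; with real files the list could be built with something like paths.map(spark.read.parquet(_)), where paths is a hypothetical list of file paths. mergeAllSafe shows the fold variant the answer mentions.

```scala
// A minimal sketch, assuming Spark 2.0+ (SparkSession, Dataset.union) on the classpath.
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.StructType

object UnionParquetFiles {
  // The answer's approach: pairwise union via reduce.
  // Throws UnsupportedOperationException if `parts` is empty.
  def mergeAll(parts: Seq[DataFrame]): DataFrame =
    parts.reduce((x, y) => x.union(y))

  // Fold variant: seeds the fold with an empty DataFrame of the shared schema,
  // so an empty `parts` yields an empty DataFrame rather than an exception.
  def mergeAllSafe(spark: SparkSession, schema: StructType, parts: Seq[DataFrame]): DataFrame = {
    val empty = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)
    parts.foldLeft(empty)((acc, df) => acc.union(df))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("union-demo").getOrCreate()
    import spark.implicits._

    // Hypothetical stand-ins for the three Parquet-backed DataFrames (same schema).
    val parquetFiles: Seq[DataFrame] = Seq(
      Seq((1, "a")).toDF("id", "value"),
      Seq((2, "b")).toDF("id", "value"),
      Seq((3, "c")).toDF("id", "value")
    )

    val merged = mergeAll(parquetFiles)
    merged.show()
    println(merged.count()) // 3

    // The fold variant also copes with an empty list.
    println(mergeAllSafe(spark, parquetFiles.head.schema, Seq.empty).count()) // 0

    spark.stop()
  }
}
```

Note that union matches columns by position, not by name. That is fine here because all three DataFrames share one Parquet schema; if the column order could differ, unionByName (Spark 2.3+) is the safer choice.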
