How to join two dataframes in Scala and select a few columns from the dataframes by their index?

Problem description

I have to join two dataframes, a task very similar to the one in "Joining two DataFrames in Spark SQL and selecting columns of only one".

However, I want to select only the second column from df2. In my task, I will call the join function on pairs of dataframes inside a reduce over a list of dataframes. The column names differ across the dataframes in this list, but in each case I want to keep the second column of df2.

I could not find anywhere how to select a dataframe's columns by their numeric index. Any help is appreciated!

Edit:

ANSWER

I figured out a solution. Here is one way to do this:

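import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Joins df1 and df2 on "time", keeping all of df1 plus the second column of df2.
def joinDFs(df1: DataFrame, df2: DataFrame): DataFrame = {
  val desiredDf2Col = df2.columns(1)  // name of the second column (index 1)
  df1.as("df1")
    .join(df2.as("df2"), col("df1.time") === col("df2.time"))
    .select(col("df1.*"), col(s"df2.$desiredDf2Col"))
}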

I can then apply this function in a reduce operation over a list of dataframes:

var listOfDFs: List[DataFrame] = List()
// Populate listOfDFs as you want here; reduceLeft throws on an empty list
val joinedDF = listOfDFs.reduceLeft((x, y) => joinDFs(x, y))
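To see what the reduce produces, here is a minimal self-contained sketch on toy data. The local SparkSession, the sample rows, and the column names a, b, c and unused are assumptions made for illustration. joinDFs follows the function above, written here with col(...) so it does not rely on spark.implicits._ being in scope.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object JoinBySecondColumn {
  // Keep every column of df1 and only the second column (index 1) of df2.
  def joinDFs(df1: DataFrame, df2: DataFrame): DataFrame = {
    val desiredDf2Col = df2.columns(1)
    df1.as("df1")
      .join(df2.as("df2"), col("df1.time") === col("df2.time"))
      .select(col("df1.*"), col(s"df2.$desiredDf2Col"))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local single-threaded session is enough for this toy example.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("join-by-index")
      .getOrCreate()
    import spark.implicits._

    // Toy dataframes sharing a "time" key; the other column names differ.
    val dfA = Seq((1, "a1"), (2, "a2")).toDF("time", "a")
    val dfB = Seq((1, "b1"), (2, "b2")).toDF("time", "b")
    val dfC = Seq((1, "c1", 0.1), (2, "c2", 0.2)).toDF("time", "c", "unused")

    val joined = List(dfA, dfB, dfC).reduceLeft((x, y) => joinDFs(x, y))

    // Only the second column of each right-hand dataframe is carried along.
    println(joined.columns.mkString(", "))  // prints: time, a, b, c

    spark.stop()
  }
}
```

Because the accumulated result is re-aliased as "df1" on every step, the join key df1.time always refers to the time column of the running result.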


Recommended answer

To select the second column of your dataframe, you can simply do:

val df3 = df2.select(df2.columns(1))

This first looks up the name of the second column (df2.columns is zero-indexed, so index 1 is the second column) and then selects it by name.
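The same lookup extends to several positions at once. The sketch below is one possible way to do it; selectByIndex is a hypothetical helper, not part of Spark, and the sample dataframe and its column names are assumptions. It also assumes column names contain no dots, since col would parse a dot as a nested field reference.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object SelectByIndex {
  // Hypothetical helper: select the columns at the given zero-based positions.
  def selectByIndex(df: DataFrame, indices: Seq[Int]): DataFrame =
    df.select(indices.map(i => col(df.columns(i))): _*)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session and a one-row toy dataframe.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("select-by-index")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, "x", 2.0, true)).toDF("time", "name", "value", "flag")

    // Pick the second and fourth columns by position.
    val picked = selectByIndex(df, Seq(1, 3))
    println(picked.columns.mkString(", "))  // prints: name, flag

    spark.stop()
  }
}
```

An out-of-range index fails with an ArrayIndexOutOfBoundsException when the column name is looked up, before Spark builds the query.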
