Joining two dataframes in Spark with Java


Question

First of all, thank you for taking the time to read my question.

My question is the following: in Spark with Java, I load the data from two CSV files into two dataframes.

These dataframes contain the following information.

Dataframe airport

Id | Name    | City
-----------------------
1  | Barajas | Madrid

Dataframe airport_city_state

City   | state
----------------
Madrid | España

I want to join these two dataframes so that the result looks like this:

Dataframe result

Id | Name    | City   | state
--------------------------
1  | Barajas | Madrid | España

where dfairport.city = dfairport_city_state.city

However, I cannot work out the syntax needed to do the join correctly. Here is some of the code showing how I created the variables:

// Load the CSVs; the loader has to be told that the files have a header and which delimiter they use
Dataset<Row> dfairport = Load.Csv(sqlContext, data_airport);
Dataset<Row> dfairport_city_state = Load.Csv(sqlContext, data_airport_city_state);


// Rename the columns of the CSV dataframes to match the columns in the database.
// Once the names match, the rows can be inserted.
// Note: withColumnRenamed returns a new Dataset. The results are not assigned here,
// so dfairport and dfairport_city_state keep their original column names.
dfairport
    .withColumnRenamed("leg_key", "id")
    .withColumnRenamed("leg_name", "name")
    .withColumnRenamed("leg_city", "city");

dfairport_city_state
    .withColumnRenamed("city", "ciudad")
    .withColumnRenamed("state", "estado");

Recommended answer

First, thank you very much for your response.

I tried both solutions, but neither of them worked. I get the following error: The method dfairport_city_state(String) is undefined for the type ETL_Airport

I cannot access a specific column of the dataframe for the join.
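An error of this shape most likely comes from writing `dfairport_city_state("city")`, which is Scala syntax. In Java the compiler reads it as a call to a method named `dfairport_city_state` on the enclosing class. A minimal sketch of the Java form, using `Dataset.col`, follows. The class name `ColumnAccessSketch`, the in-memory row, and the `local[1]` master are assumptions made for illustration and are not part of the original code.

```java
import java.util.Collections;
import java.util.List;

import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class ColumnAccessSketch {

    // Hypothetical one-row stand-in for the CSV-backed dfairport_city_state.
    static Dataset<Row> cityState(SparkSession spark) {
        StructType schema = new StructType()
                .add("city", DataTypes.StringType)
                .add("state", DataTypes.StringType);
        List<Row> rows = Collections.singletonList(RowFactory.create("Madrid", "España"));
        return spark.createDataFrame(rows, schema);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("column-access-sketch")
                .master("local[1]") // assumption: local run for the sketch
                .getOrCreate();

        Dataset<Row> dfairport_city_state = cityState(spark);

        // Scala style, which does not compile in Java:
        //   dfairport_city_state("city")
        // Java style: ask the Dataset for the column by name.
        Column city = dfairport_city_state.col("city");

        dfairport_city_state.select(city).show();
        spark.stop();
    }
}
```

The resulting `Column` can then be used in a join condition through methods such as `equalTo`.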

I have now managed to do the join. I am posting the solution here in case it helps someone else ;)

Thanks for everything and best regards

// Join the tables on the city they share
Dataset<Row> joined = dfairport.join(dfairport_city_state, dfairport.col("leg_city").equalTo(dfairport_city_state.col("city")));
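The join above keeps both `leg_city` and `city` in the output. To get the four-column result described in the question, the duplicate column can be dropped and the remaining ones renamed. The following is a self-contained sketch under stated assumptions: the class name `AirportJoinSketch`, the in-memory rows standing in for the CSV files, and the `local[1]` master are hypothetical, and the final column names are taken from the desired result table.

```java
import java.util.Collections;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class AirportJoinSketch {

    // Hypothetical stand-in for the airport CSV, with its original column names.
    static Dataset<Row> airports(SparkSession spark) {
        StructType schema = new StructType()
                .add("leg_key", DataTypes.IntegerType)
                .add("leg_name", DataTypes.StringType)
                .add("leg_city", DataTypes.StringType);
        return spark.createDataFrame(
                Collections.singletonList(RowFactory.create(1, "Barajas", "Madrid")), schema);
    }

    // Hypothetical stand-in for the airport_city_state CSV.
    static Dataset<Row> cityStates(SparkSession spark) {
        StructType schema = new StructType()
                .add("city", DataTypes.StringType)
                .add("state", DataTypes.StringType);
        return spark.createDataFrame(
                Collections.singletonList(RowFactory.create("Madrid", "España")), schema);
    }

    // Inner join on the city, then tidy the columns.
    static Dataset<Row> joinOnCity(Dataset<Row> airport, Dataset<Row> cityState) {
        return airport
                .join(cityState, airport.col("leg_city").equalTo(cityState.col("city")))
                .drop(cityState.col("city")) // remove the duplicate join column
                // Each rename returns a new Dataset, so the calls are chained
                // and the final result is returned.
                .withColumnRenamed("leg_key", "Id")
                .withColumnRenamed("leg_name", "Name")
                .withColumnRenamed("leg_city", "City");
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("airport-join-sketch")
                .master("local[1]") // assumption: local run for the sketch
                .getOrCreate();

        Dataset<Row> result = joinOnCity(airports(spark), cityStates(spark));
        result.show();
        spark.stop();
    }
}
```

`join` with a column condition performs an inner join by default, so airports whose city has no match are left out. Passing a join type as a third argument, for example `"left"`, would keep them with a null state.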
