Cross Join for calculation in Spark SQL
Question
I have a temporary view with only 1 record/value, and I want to use that value to calculate the age of the customers present in another big table (with 100 M rows). I used a CROSS JOIN clause, which is resulting in a performance issue.
Is there a better approach to implement this requirement that will perform better? Would a broadcast hint be suitable in this scenario? What is the recommended approach to tackle such scenarios?
Reference table (contains only 1 value):
create temporary view ref
as
select to_date(refdt, 'dd-MM-yyyy') as refdt --returns only 1 value
from tableA
where logtype = 'A';
Cust table (10 M rows):
custid | birthdt
A1234 | 20-03-1980
B3456 | 09-05-1985
C2356 | 15-12-1990
Query (calculate age w.r.t. birthdt):
select
a.custid,
a.birthdt,
cast((datediff(b.refdt, a.birthdt)/365.25) as int) as age
from cust a
cross join ref b;
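As a sanity check on the formula, the `datediff(...)/365.25` age calculation above can be reproduced in plain Python. This is only an illustration of the arithmetic; the birth dates are the sample `cust` rows from the question, and the reference date of 2020-01-01 is an assumed value (the actual `refdt` comes from `tableA`):

```python
from datetime import date

def age_on(refdt: date, birthdt: date) -> int:
    """Mirror Spark's cast((datediff(refdt, birthdt) / 365.25) as int):
    day difference divided by the average year length, truncated to int."""
    return int((refdt - birthdt).days / 365.25)

# Sample rows from the cust table; 2020-01-01 is an assumed reference date.
print(age_on(date(2020, 1, 1), date(1980, 3, 20)))   # A1234
print(age_on(date(2020, 1, 1), date(1985, 5, 9)))    # B3456
print(age_on(date(2020, 1, 1), date(1990, 12, 15)))  # C2356
```

Dividing by 365.25 rather than 365 keeps the result stable across leap years, at the cost of occasionally being a day off around birthdays.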
My question is: is there a better approach to implement this requirement?
Thanks
Answer
It's hard to work out exactly what your point is, but if you cannot use Scala or pyspark and dataframes with .cache etc., then instead of using a temporary view, just create a single-row table. My impression is that you are using Spark %sql in a notebook on, say, Databricks.
That is my suspicion.
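If dataframes are an option, a common pattern is to collect the single reference value to the driver and inject it as a literal, avoiding the join entirely. This is a sketch under assumptions, not part of the answer: the table and column names come from the question, and it presumes `ref` is registered as a table or view the session can read.

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# ref holds exactly one row, so collecting it to the driver is cheap.
ref_dt = spark.table("ref").first()["refdt"]

# Inject the scalar as a literal: no join, no shuffle over the 100 M rows.
result = spark.table("cust").select(
    "custid",
    "birthdt",
    (F.datediff(F.lit(ref_dt), F.col("birthdt")) / 365.25)
        .cast("int")
        .alias("age"),
)
```

The trade-off is one extra driver round-trip to fetch the scalar, which is negligible for a single-row lookup.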
That said, a broadcastjoin hint may well mean the optimizer only sends out 1 row. See https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/spark-sql-hint-framework.html#specifying-query-hints
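If you stay in pure SQL, the hint can be attached directly to the original query. A sketch, assuming a Spark version (2.2+) that honors the BROADCAST hint; with a cross join this typically resolves to a broadcast nested loop join, which is cheap when the broadcast side is a single row:

```sql
select /*+ BROADCAST(b) */
       a.custid,
       a.birthdt,
       cast((datediff(b.refdt, a.birthdt)/365.25) as int) as age
from cust a
cross join ref b;
```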