交叉联接以在Spark SQL中进行计算 [英] Cross Join for calculation in Spark SQL

查看:115
本文介绍了交叉联接以在Spark SQL中进行计算的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个仅包含1个记录/值的临时视图,并且我想使用该值来计算另一个大表(具有100 M行)中存在的客户的年龄.我使用了CROSS JOIN子句,这会导致性能问题.

I have a temporary view with only 1 record/value and I want to use that value to calculate the age of the customers present in another big table (with 100M rows). I used a CROSS JOIN clause, which is resulting in a performance issue.

是否有更好的方法来实现此要求,并且效果会更好? broadcast提示是否适合这种情况?建议如何解决这种情况?

Is there a better approach to implement this requirement which is will perform better ? Will a broadcast hint be suitable in this scenario ? What is the recommended approach to tackle such scenarios ?

参考表:(仅包含1值)

create temporary view ref
as
select to_date(refdt, 'dd-MM-yyyy') as refdt --returns only 1 value
from tableA
where logtype = 'A';

客户表(1000万行):

Cust table (10 M rows):

custid | birthdt
A1234  | 20-03-1980
B3456  | 09-05-1985
C2356  | 15-12-1990

查询(计算年龄birthdt):

select 
a.custid, 
a.birthdt, 
cast((datediff(b.ref_dt, a.birthdt)/365.25) as int) as age
from cust a
cross join ref b;

我的问题是-是否有更好的方法来实现此要求?

My question is - Is there a better approach to implement this requirement ?

谢谢

推荐答案

很难弄清楚你的意思,但是如果不能将Scalapysparkdataframes.cache等一起使用,那么我认为而不是使用temporary view,而只需创建一个单行table.我的印象是,您正在Databricks等笔记本上使用Spark%sql.

Hard to work out exactly your point, but if you cannot use Scala or pyspark and dataframes with .cache etc. then I think that instead of of using a temporary view, just create a single row table. My impression is you are using Spark %sql in a notebook on, say, Databricks.

这是我的怀疑.

broadcastjoin提示很可能意味着优化器只发出1行.参见 https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/spark-sql-hint-framework.html#specifying-query-hints

That said a broadcastjoin hint may well mean the optimizer only sends out 1 row. See https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/spark-sql-hint-framework.html#specifying-query-hints

这篇关于交叉联接以在Spark SQL中进行计算的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆