Spark: best approach for a look-up DataFrame to improve performance

Posted: 2021/12/31 18:02:16. Tags: scala, apache-spark, cassandra, datastax-enterprise


Dataframe A (millions of records) has, among its columns, create_date and modified_date.

Dataframe B (500 records) has the columns start_date and end_date.

Current approach:

select a.*, b.* from a join b on a.create_date between start_date and end_date
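For readers who want to reproduce the query, here is a minimal sketch of that approach in Scala. The sample rows, the `id` column, the local master and the object name are assumptions made for illustration; the real job reads far larger data (the tags suggest Cassandra). Calling `explain()` prints the physical plan, which shows which join strategy Spark picked for the range condition.

```scala
import org.apache.spark.sql.SparkSession

object CurrentApproachSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("current-approach-sketch")
      .master("local[*]") // local mode, for illustration only
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for Dataframe A (millions of rows in the real job).
    Seq((1, "2021-01-15"), (2, "2021-03-10"), (3, "2021-07-01"))
      .toDF("id", "create_date")
      .selectExpr("id", "to_date(create_date) AS create_date")
      .createOrReplaceTempView("a")

    // Hypothetical stand-in for Dataframe B (500 rows in the real job).
    Seq(("2021-01-01", "2021-01-31"), ("2021-03-01", "2021-03-31"))
      .toDF("start_date", "end_date")
      .selectExpr("to_date(start_date) AS start_date", "to_date(end_date) AS end_date")
      .createOrReplaceTempView("b")

    // The range join from the question, written with explicit table qualifiers.
    val joined = spark.sql(
      """SELECT a.*, b.*
        |FROM a JOIN b
        |  ON a.create_date BETWEEN b.start_date AND b.end_date""".stripMargin)

    joined.explain() // prints the physical plan, including the join strategy
    joined.show()

    spark.stop()
  }
}
```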

The above job takes half an hour or more to run.

How can I improve the performance?
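One direction that may be worth testing, offered as a sketch rather than a confirmed fix: since Dataframe B has only 500 rows, it can be marked with Spark's `broadcast` hint so that each executor receives a full copy and the large side is not shuffled for the range condition. The helper name `rangeJoin`, the sample rows and the local master below are hypothetical; whether this helps depends on the plan Spark currently produces, which `explain()` will show.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{broadcast, to_date}

object BroadcastRangeJoinSketch {
  // Hypothetical helper: joins each row of `a` to every range in `b` that
  // contains its create_date, hinting Spark to broadcast the small side.
  def rangeJoin(a: DataFrame, b: DataFrame): DataFrame =
    a.join(broadcast(b), a("create_date").between(b("start_date"), b("end_date")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("broadcast-range-join-sketch")
      .master("local[*]") // local mode, for illustration only
      .getOrCreate()
    import spark.implicits._

    // Small made-up samples standing in for Dataframe A and Dataframe B.
    val a = Seq((1, "2021-01-15"), (2, "2021-03-10"), (3, "2021-07-01"))
      .toDF("id", "create_date")
      .withColumn("create_date", to_date($"create_date"))

    val b = Seq(("2021-01-01", "2021-01-31"), ("2021-03-01", "2021-03-31"))
      .toDF("start_date", "end_date")
      .withColumn("start_date", to_date($"start_date"))
      .withColumn("end_date", to_date($"end_date"))

    val joined = rangeJoin(a, b)
    joined.explain() // check whether the plan uses a broadcast-based join
    joined.show()

    spark.stop()
  }
}
```

If the plan already shows a broadcast-based join without the hint, the bottleneck is likely elsewhere, for example in reading or partitioning Dataframe A.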