What is the meaning of the partitionColumn, lowerBound, upperBound, and numPartitions parameters?

Question

While fetching data from SQL Server via a JDBC connection in Spark, I found that I can set some parallelization parameters such as partitionColumn, lowerBound, upperBound, and numPartitions. I have gone through the Spark documentation but wasn't able to understand it.

Can anyone explain the meaning of these parameters?
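
For context, here is a minimal sketch of where these four options are typically supplied when reading over JDBC from PySpark. The option keys are the Spark JDBC option names; the helper name `jdbc_partition_options`, the connection URL, and the table and column names are hypothetical placeholders, not taken from the question.

```python
def jdbc_partition_options(url, table, column, lower_bound, upper_bound, num_partitions):
    """Build the option map for a partitioned Spark JDBC read (illustrative helper)."""
    return {
        "url": url,
        "dbtable": table,
        "partitionColumn": column,             # column used to split the read
        "lowerBound": str(lower_bound),        # used to compute the stride
        "upperBound": str(upper_bound),        # used to compute the stride
        "numPartitions": str(num_partitions),  # number of parallel queries
    }


options = jdbc_partition_options(
    "jdbc:sqlserver://localhost:1433;databaseName=mydb",  # hypothetical URL
    "dbo.orders",   # hypothetical table
    "order_id",     # hypothetical numeric column
    0,
    1000,
    10,
)

# With a live SparkSession named `spark` and the SQL Server JDBC driver on the
# classpath, the read itself would look like:
# df = spark.read.format("jdbc").options(**options).load()
```

The actual read is left as a comment because it needs a running Spark session and a reachable database.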

Recommended answer

Actually, the list above misses a couple of things, specifically the first and the last query.

Without them you would lose some data (the rows before lowerBound and those after upperBound). This is not clear from the example because the lower bound is 0.

The complete list should be:

SELECT * FROM table WHERE partitionColumn < 100
SELECT * FROM table WHERE partitionColumn BETWEEN 0 AND 100
SELECT * FROM table WHERE partitionColumn BETWEEN 100 AND 200
...
SELECT * FROM table WHERE partitionColumn > 9000
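
To make the point about the open-ended first and last queries concrete, here is a simplified sketch of how, as far as I understand Spark's JDBC source, the per-partition WHERE clauses are derived. The function name `partition_predicates` and the example values (lowerBound 0, upperBound 1000, numPartitions 10) are assumptions for illustration. The exact stride arithmetic differs between Spark versions, and the sketch assumes non-negative integer bounds. The ranges come out half-open (`>=` and `<`) rather than `BETWEEN`, and the first range also picks up NULLs.

```python
def partition_predicates(column, lower_bound, upper_bound, num_partitions):
    """Return one WHERE-clause string per partition (simplified sketch)."""
    if lower_bound >= upper_bound:
        raise ValueError("lower_bound must be smaller than upper_bound")
    if num_partitions < 2:
        return []  # a single partition reads the whole table unfiltered

    # The bounds only determine the stride; they never filter rows out.
    stride = upper_bound // num_partitions - lower_bound // num_partitions

    predicates = []
    current = lower_bound
    for i in range(num_partitions):
        lower = f"{column} >= {current}" if i != 0 else None
        current += stride
        upper = f"{column} < {current}" if i != num_partitions - 1 else None
        if lower is None:
            # First partition: open below, and it also collects NULL values.
            predicates.append(f"{upper} OR {column} IS NULL")
        elif upper is None:
            # Last partition: open above, so rows past upper_bound are kept.
            predicates.append(lower)
        else:
            predicates.append(f"{lower} AND {upper}")
    return predicates


for clause in partition_predicates("partitionColumn", 0, 1000, 10):
    print(f"SELECT * FROM table WHERE {clause}")
```

Because the first clause has no lower limit and the last has no upper limit, rows outside [lowerBound, upperBound) still get read. They simply land in the first or last partition, which can skew the load if many rows fall outside the bounds.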
