What is the meaning of the partitionColumn, lowerBound, upperBound, and numPartitions parameters?


Question

While fetching data from SQL Server via a JDBC connection in Spark, I found that I can set some parallelization parameters: partitionColumn, lowerBound, upperBound, and numPartitions. I have gone through the Spark documentation but wasn't able to understand it.

Can anyone explain the meaning of these parameters to me?

Recommended answer

It is quite simple:

  • partitionColumn is the column used to determine the partitions.
  • lowerBound and upperBound define the range of partitionColumn values that is split into partitions. Nominally this is the range matched by the following query:

SELECT * FROM table WHERE partitionColumn BETWEEN lowerBound AND upperBound

    Note that, according to the Spark documentation, the bounds only decide the partition stride and do not filter rows: rows outside the range are still read, as part of the first or last partition.

  • numPartitions determines the number of partitions to create. The range between lowerBound and upperBound is divided into numPartitions partitions, each with a stride equal to:

    upperBound / numPartitions - lowerBound / numPartitions
    

    For example, if:

    • lowerBound: 0
    • upperBound: 1000
    • numPartitions: 10

    The stride is equal to 100 and the partitions correspond to the following queries:

    • SELECT * FROM table WHERE partitionColumn < 100 OR partitionColumn IS NULL
    • SELECT * FROM table WHERE partitionColumn >= 100 AND partitionColumn < 200
    • ...
    • SELECT * FROM table WHERE partitionColumn >= 900

    Each range is half-open, so a boundary value such as 100 belongs to exactly one partition. The first and last partitions are open-ended, so no row is dropped.
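
The stride arithmetic and the per-partition WHERE clauses can be sketched in plain Python. This is only an illustration loosely modelled on the documented behaviour of Spark's JDBC source (bounds set the stride, ranges are half-open, the first and last partitions are open-ended); the exact logic varies between Spark versions. The function name `partition_predicates` is hypothetical, and the sketch assumes non-negative integer bounds and at least two partitions.

```python
def partition_predicates(column, lower_bound, upper_bound, num_partitions):
    """Return one WHERE-clause predicate per partition (illustrative only)."""
    # Assumptions of this sketch: integer bounds, lower < upper, >= 2 partitions.
    if num_partitions < 2:
        raise ValueError("this sketch needs at least 2 partitions")
    if lower_bound >= upper_bound:
        raise ValueError("lower_bound must be smaller than upper_bound")

    # Same formula as in the answer, using integer division.
    stride = upper_bound // num_partitions - lower_bound // num_partitions

    predicates = []
    current = lower_bound
    for i in range(num_partitions):
        # The first partition has no lower limit, the last has no upper limit.
        lower = f"{column} >= {current}" if i != 0 else None
        current += stride
        upper = f"{column} < {current}" if i != num_partitions - 1 else None

        if lower is None:
            # NULL values are assigned to the first partition.
            predicates.append(f"{upper} OR {column} IS NULL")
        elif upper is None:
            predicates.append(lower)
        else:
            predicates.append(f"{lower} AND {upper}")
    return predicates


if __name__ == "__main__":
    for p in partition_predicates("partitionColumn", 0, 1000, 10):
        print(f"SELECT * FROM table WHERE {p}")
```

In an actual Spark job these values are not computed by hand; they are passed as the JDBC reader options `partitionColumn`, `lowerBound`, `upperBound`, and `numPartitions`, and Spark issues one query per partition.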
