Spark JDBC pseudocolumn isn't working
Problem description
For my use case, I am trying to read one big Oracle table using Spark JDBC. Since I do not have an integer-type column in my table, I am using rownum as the partitionColumn.
Here is what my Spark query looks like (for testing I am using a table with only 22,000 rows):
val df = spark.read.jdbc(
  url = url,
  table = "(select * from table1) t",  // a subquery must be quoted and aliased
  columnName = "rownum",
  lowerBound = 0L,
  upperBound = 22000L,
  numPartitions = 3,
  connectionProperties = oracleProperties)
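For context, here is a simplified sketch of how Spark's JDBC reader turns the columnName/lowerBound/upperBound/numPartitions settings above into one WHERE clause per partition. PartitionPredicates is a hypothetical helper written for illustration, not Spark's actual source; the stride logic only approximates Spark's internal columnPartition behavior:

```scala
// Simplified sketch: derive one WHERE predicate per partition from the
// partition column, bounds, and partition count, the way Spark's JDBC
// reader does. This is an illustrative helper, not a Spark API.
object PartitionPredicates {
  def predicates(column: String, lower: Long, upper: Long, n: Int): Seq[String] = {
    // Integer-division stride over the [lower, upper) range
    val stride = upper / n - lower / n
    (0 until n).map { i =>
      val lo = lower + i * stride
      val hi = lo + stride
      if (i == 0) s"$column < $hi or $column is null"  // first partition also catches NULLs
      else if (i == n - 1) s"$column >= $lo"           // last partition is open-ended
      else s"$column >= $lo AND $column < $hi"
    }
  }

  def main(args: Array[String]): Unit =
    predicates("rownum", 0L, 22000L, 3).foreach(println)
}
```

Each predicate is appended to a separate query sent to the database, so every partition's row count depends on how Oracle evaluates rownum inside that partition's query.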
Ideally, it should return 3 partitions with roughly 7,000 rows each. But when I count the rows in each partition of the DataFrame, I can see that only one partition has rows while the others have 0.
// Count rows per partition (toDF needs spark.implicits._ in scope)
df.rdd.mapPartitionsWithIndex { case (i, rows) => Iterator((i, rows.size)) }.toDF().show()
Output:
+---+----+
| _1| _2 |
+---+----+
| 0 |7332|
| 1 | 0 |
| 2 | 0 |
+---+----+
Can you please suggest why it is only returning rows in one partition?
My source is an Oracle database. I am using the Oracle JDBC driver
oracle.jdbc.driver.OracleDriver
jar --> ojdbc7.jar