spark, scala & jdbc - how to limit number of records
Question
Is there a way to limit the number of records fetched from a JDBC source using Spark SQL 2.2.0?
I am dealing with the task of moving (and transforming) a large number of records, more than 200M, from one MS SQL Server table to another:
import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("co.smith.copydata")
  .getOrCreate()

val sourceData = spark
  .read
  .format("jdbc")
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .option("url", jdbcSqlConnStr)
  .option("dbtable", sourceTableName)
  .load()
  .take(limit)
While this works, it clearly loads all 200M records from the database first, taking its sweet 18 minutes, and only then returns the limited number of records I want for testing and development purposes.
Switching take(...) and load() around produces a compilation error: take is a Dataset method that only exists after load() has returned a DataFrame, and it collects rows to the driver rather than pushing a limit down to the source.
I appreciate there are ways to copy sample data into a smaller table, or to use SSIS or other ETL tools.
I am really curious whether there is a way to achieve my goal using Spark, SQL and JDBC.
Answer
To limit the number of downloaded rows, a SQL query can be used in the "dbtable" option instead of a table name, as described in the documentation: https://spark.apache.org/docs/latest/sql-programming-guide.html
In that query you can specify a "where" condition, for example using server-specific features to limit the number of rows (like "rownum" in Oracle, or "TOP" in SQL Server). Because the query runs on the database server, only the limited rows ever cross the wire.
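A minimal sketch of this approach against SQL Server: wrap a TOP-limited query as a derived table and pass it to "dbtable". The helper name limitedDbtable and the alias limited_src are my own; jdbcSqlConnStr and sourceTableName come from the question.

```scala
// Build a derived-table expression for the "dbtable" option.
// SQL Server syntax (TOP); on Oracle you would filter on ROWNUM instead.
// The alias ("limited_src") is required because Spark wraps the value of
// "dbtable" in its own SELECT ... FROM (...) query.
def limitedDbtable(table: String, limit: Int): String =
  s"(SELECT TOP $limit * FROM $table) AS limited_src"

// Usage inside a Spark application (assumes jdbcSqlConnStr and
// sourceTableName are defined as in the question):
// val sourceData = spark.read
//   .format("jdbc")
//   .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
//   .option("url", jdbcSqlConnStr)
//   .option("dbtable", limitedDbtable(sourceTableName, 100000))
//   .load()
```

With this, the TOP clause executes on the SQL Server side, so load() fetches only the limited rows instead of scanning the full 200M-row table.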