Does Apache Spark load the entire data from the target database?


Question

I want to use Apache Spark and connect to Vertica over JDBC.

In the Vertica database I have 100 million records, and the Spark code runs on another server.

When I run the query in Spark and monitor network usage, traffic between the two servers is very high.

It seems Spark loads all the data from the target server.

This is my code:

# Parentheses let the chained reader calls span several lines;
# url holds the Vertica JDBC connection string.
test_df = (spark.read.format("jdbc")
    .option("url", url).option("dbtable", "my_table")
    .option("user", "user").option("password", "pass")
    .load())

test_df.createOrReplaceTempView('tb')

data = spark.sql("select * from tb")

data.show()

When I run this, the result is returned after 2 minutes of very high network usage.

Does Spark load the entire data from the target database?

Solution

After your Spark job finishes, log on to the Vertica database with the same credentials the Spark job used, and run:

SELECT * FROM v_monitor.query_requests ORDER BY start_timestamp DESC LIMIT 10000;

This shows the queries the Spark job sent to the database, so you can see whether it pushed the count(*) down to the database or actually tried to retrieve the entire table across the network.
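The difference between a pushed-down count(*) and retrieving the whole table can be illustrated without Vertica or Spark. The sketch below is only a stand-in, not Spark's actual mechanism: it assumes Python's built-in sqlite3 module plays the "remote" database, uses a hypothetical in-memory `my_table`, and treats the number of rows returned by the cursor as a rough proxy for network traffic.

```python
import sqlite3


def make_db(n_rows: int) -> sqlite3.Connection:
    """Create an in-memory table standing in for the hypothetical my_table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, val TEXT)")
    conn.executemany(
        "INSERT INTO my_table (id, val) VALUES (?, ?)",
        ((i, f"row-{i}") for i in range(n_rows)),
    )
    conn.commit()
    return conn


def count_on_client(conn: sqlite3.Connection) -> tuple[int, int]:
    """No pushdown: pull every row to the client, then count locally.

    Returns (row count, rows transferred from the database).
    """
    rows = conn.execute("SELECT * FROM my_table").fetchall()
    return len(rows), len(rows)


def count_pushed_down(conn: sqlite3.Connection) -> tuple[int, int]:
    """Pushdown: the database computes COUNT(*) and returns a single row.

    Returns (row count, rows transferred from the database).
    """
    rows = conn.execute("SELECT COUNT(*) FROM my_table").fetchall()
    return rows[0][0], len(rows)


if __name__ == "__main__":
    conn = make_db(100_000)
    print(count_on_client(conn))    # (100000, 100000)
    print(count_pushed_down(conn))  # (100000, 1)
```

Both functions return the same count, but only the pushed-down version keeps the transfer at one row regardless of table size. That is the distinction the query log above lets you check for the real Spark job.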
