Does Apache Spark load the entire data from the target database?

Views: 86 | Published: 2019/9/2 13:01:05 | Tags: apache-spark, jdbc, vertica, pyspark-sql

Question

I want to use Apache Spark and connect to Vertica through JDBC.

The Vertica database holds 100 million records, and the Spark code runs on a different server.

When I run the query in Spark and monitor network usage, the traffic between the two servers is very high.

It looks as if Spark loads all the data from the target server.

Here is my code:

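# Read the Vertica table over JDBC; `spark` is the active SparkSession
# and `url` is the JDBC connection string.
test_df = (
    spark.read.format("jdbc")
    .option("url", url)
    .option("dbtable", "my_table")
    .option("user", "user")
    .option("password", "pass")
    .load()
)

test_df.createOrReplaceTempView("tb")

data = spark.sql("select * from tb")

data.show()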

When I run this, the result comes back after about 2 minutes of very high network usage.

Does Spark load the entire data from the target database?

Recommended answer

After your Spark job finishes, log on to the Vertica database with the same credentials that the Spark job used and run:

SELECT * FROM v_monitor.query_requests ORDER BY start_timestamp DESC LIMIT 10000;

This shows the queries that the Spark job sent to the database, so you can see whether it pushed an operation such as count(*) down to the database or whether it really tried to retrieve the entire table across the network.
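If the monitoring query shows that the whole table is being fetched, one common way to keep the work in the database is to pass a parenthesised subquery as the JDBC `dbtable` option, so that the database only runs that subquery. The following is a minimal sketch under assumptions, not part of the original answer: the helper names `pushdown_subquery` and `read_via_jdbc`, the alias `pushed`, and the column names in the test are hypothetical, and it assumes the target database accepts `LIMIT` in a subquery.

```python
def pushdown_subquery(table, columns=("*",), where=None, limit=None, alias="pushed"):
    """Build a parenthesised subquery string for the JDBC `dbtable` option.

    Hypothetical helper: the filter and row limit are written into the SQL
    text itself, so the database applies them before any rows are sent.
    """
    sql = "SELECT {} FROM {}".format(", ".join(columns), table)
    if where:
        sql += " WHERE {}".format(where)
    if limit is not None:
        sql += " LIMIT {}".format(int(limit))
    return "({}) AS {}".format(sql, alias)


def read_via_jdbc(spark, url, user, password, dbtable):
    """Load a DataFrame through JDBC, using `dbtable` as the source relation.

    `spark` is an existing SparkSession; `dbtable` may be a plain table name
    or a subquery string built by pushdown_subquery().
    """
    return (
        spark.read.format("jdbc")
        .option("url", url)
        .option("dbtable", dbtable)
        .option("user", user)
        .option("password", password)
        .load()
    )


if __name__ == "__main__":
    # Only builds the SQL string; no Spark session or database is needed here.
    print(pushdown_subquery("my_table", limit=20))
```

With a live session, `read_via_jdbc(spark, url, "user", "pass", pushdown_subquery("my_table", limit=20))` would replace the plain `"my_table"` read in the question. Calling `explain()` on the resulting DataFrame prints the physical plan, which is another place to check what the JDBC scan requests from the database.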
