How to read data from MariaDB using Spark Java

Question

I need to read a table from MariaDB using Spark and Java.

I wrote Java code to read table data from the database. The connection is established successfully, but something goes wrong when the data is read. I am trying to read the table as a DataFrame, but the column names are shown as the column values in the result. The code is given below:

import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.col;

public class mariadb_to_csv {
    public static void main(String[] args) {

        Properties prop = new Properties();
        String resourceName = "config.properties";
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        try (InputStream resourceStream = loader.getResourceAsStream(resourceName)) {
            prop.load(resourceStream);
        } catch (IOException e) {
            e.printStackTrace();
        }

        SparkSession spark = SparkSession.builder()
                .appName("Java Spark SQL basic example")
                .config("spark.some.config.option", "some-value")
                .getOrCreate();

        Dataset<Row> jdbcDF = spark.read().format("jdbc")
                .option("url", "url_address")
                .option("driver", "org.mariadb.jdbc.Driver")
                .option("dbtable", "source_table")
                .option("user", "username")
                .option("password", "password")
                .load();

        jdbcDF.select(col("code"), col("name"), col("isActive"), col("createdByUser"), col("modifiedByUser")).show();
    }
}

In the result, every row shows the column names in place of the column values.

What is wrong here?

Recommended answer

There seems to be a problem with the "mariadb" connector. Changing the host URL from "jdbc:mariadb://${Hostname}:${Port}/${Database}" to "jdbc:mysql://${Hostname}:${Port}/${Database}" solved the problem for me.
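
The URL change described here can be sketched as a small helper. This is only an illustration: the class `MariaDbJdbcOptions`, its methods, and the `localhost:3306/sales` values are hypothetical names that do not appear in the original post.

```java
// Hypothetical helper (not from the original post): produces a JDBC URL
// that uses the "jdbc:mysql://" scheme suggested in the answer.
final class MariaDbJdbcOptions {

    // Rewrites a "jdbc:mariadb:" URL to "jdbc:mysql:"; any other URL is returned unchanged.
    static String toMysqlScheme(String url) {
        String mariadbPrefix = "jdbc:mariadb:";
        if (url.startsWith(mariadbPrefix)) {
            return "jdbc:mysql:" + url.substring(mariadbPrefix.length());
        }
        return url;
    }

    // Assembles the URL from the three placeholders used in the answer.
    static String buildUrl(String hostname, int port, String database) {
        return "jdbc:mysql://" + hostname + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        // Both lines print: jdbc:mysql://localhost:3306/sales
        System.out.println(toMysqlScheme("jdbc:mariadb://localhost:3306/sales"));
        System.out.println(buildUrl("localhost", 3306, "sales"));
    }
}
```

The resulting string would be passed to `.option("url", ...)` in the read from the question, with the other options left as they are. Whether the installed MariaDB driver accepts the `jdbc:mysql` scheme depends on its version, so it is worth checking the driver documentation.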

The MariaDB and Databricks documentation also use "jdbc" connection URLs when explaining how to read data from MariaDB with Spark:

  1. https://mariadb.com/kb/zh-CN/library/mariadb-columnstore-with-spark/#usage
  2. https://docs.databricks.com/spark/latest/data-sources/sql-databases.html
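
One plausible explanation for the symptom, offered here as an assumption rather than something the answer states: Spark picks its SQL dialect from the JDBC URL prefix, and for a URL it does not recognize as MySQL it wraps column names in double quotes. MariaDB in its default SQL mode reads double-quoted text as a string literal, so each row comes back containing the column names. The sketch below only mimics the two quoting styles; `IdentifierQuotingSketch` and `selectWithQuote` are hypothetical and are not Spark's code.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration (not Spark's implementation) of how the
// identifier quote character changes the generated SELECT statement.
final class IdentifierQuotingSketch {

    // Builds "SELECT <quoted columns> FROM <table>" using the given quote string.
    static String selectWithQuote(List<String> columns, String table, String quote) {
        String quoted = columns.stream()
                .map(column -> quote + column + quote)
                .collect(Collectors.joining(", "));
        return "SELECT " + quoted + " FROM " + table;
    }

    public static void main(String[] args) {
        List<String> columns = List.of("code", "name");
        // Double quotes: prints SELECT "code", "name" FROM source_table
        System.out.println(selectWithQuote(columns, "source_table", "\""));
        // Backticks: prints SELECT `code`, `name` FROM source_table
        System.out.println(selectWithQuote(columns, "source_table", "`"));
    }
}
```

Under this assumption, the first statement returns the literal strings "code" and "name" for every row, which matches the output described in the question, while the second returns the actual column values.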
