Spark and Cassandra Java Application Exception: Provider org.apache.hadoop.fs.s3.S3FileSystem not found
Problem description
I want to load a Cassandra table into a DataFrame in Spark. I followed the sample programs below (found in this answer), but I get the exception shown below. I also tried loading the table into an RDD first and then converting it to a DataFrame: loading the RDD succeeds, but converting it to a DataFrame throws the same exception as the first approach. Any suggestions? I am using Spark 2.0.0, Cassandra 3.7, and Java 8.
```java
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkCassandraDatasetApplication {
    public static void main(String[] args) {
        SparkSession spark = SparkSession
                .builder()
                .appName("SparkCassandraDatasetApplication")
                .config("spark.sql.warehouse.dir", "/file:C:/temp")
                .config("spark.cassandra.connection.host", "127.0.0.1")
                .config("spark.cassandra.connection.port", "9042")
                .master("local[2]")
                .getOrCreate();

        // Keyspace and table for the Spark Cassandra Connector data source
        Map<String, String> options = new HashMap<>();
        options.put("keyspace", "mykeyspace");
        options.put("table", "mytable");

        // Read data into a DataFrame
        // This is throwing an exception
        Dataset<Row> dataset = spark.read()
                .format("org.apache.spark.sql.cassandra")
                .options(options)
                .load();

        // Print data
        dataset.show();
        spark.stop();
    }
}
```
When I submit it, I get this exception:
```
Exception in thread "main" java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.fs.s3.S3FileSystem not found
	at java.util.ServiceLoader.fail(ServiceLoader.java:239)
	at java.util.ServiceLoader.access$300(ServiceLoader.java:185)
	at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:372)
	at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
	at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
	at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2623)
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2634)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.makeQualifiedPath(SessionCatalog.scala:115)
	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createDatabase(SessionCatalog.scala:145)
```
Reading from Cassandra with the RDD method succeeds (I tested it with a count() call), but converting the RDD to a DataFrame throws the same exception as the first method.
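```java
import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;

import com.datastax.spark.connector.japi.CassandraJavaUtil;
import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// UserData (not shown) is the class each Cassandra row is mapped to
public class SparkCassandraRDDApplication {
    public static void main(String[] args) {
        SparkSession spark = SparkSession
                .builder()
                .appName("App")
                .config("spark.sql.warehouse.dir", "/file:/opt/spark/temp")
                .config("spark.cassandra.connection.host", "127.0.0.1")
                .config("spark.cassandra.connection.port", "9042")
                .master("local[2]")
                .getOrCreate();
        SparkContext sc = spark.sparkContext();

        // Read: map each row of mykeyspace.mytable to a UserData object
        JavaRDD<UserData> resultsRDD = javaFunctions(sc)
                .cassandraTable("mykeyspace", "mytable", CassandraJavaUtil.mapRowTo(UserData.class));

        // This is again throwing an exception
        Dataset<Row> usersDF = spark.createDataFrame(resultsRDD, UserData.class);

        // Print
        resultsRDD.foreach(data -> {
            System.out.println(data.id);
            System.out.println(data.username);
        });
        sc.stop();
    }
}
```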
Recommended answer
Please check whether "hadoop-common-2.2.0.jar" is available on the classpath. You can test your application by building a jar that includes all of its dependencies. Use the pom.xml below, in which maven-shade-plugin bundles all the dependencies into an uber jar.
```xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.abaghel.examples.spark</groupId>
    <artifactId>spark-cassandra</artifactId>
    <version>1.0.0-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.0.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>2.0.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.11</artifactId>
            <version>2.0.0</version>
        </dependency>
        <dependency>
            <groupId>com.datastax.spark</groupId>
            <artifactId>spark-cassandra-connector_2.11</artifactId>
            <version>2.0.0-M3</version>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.1</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.4.3</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                            <transformers>
                                <transformer
                                        implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>reference.conf</resource>
                                </transformer>
                                <transformer
                                        implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <mainClass>com.abaghel.examples.spark.cassandra.SparkCassandraDatasetApplication</mainClass>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
```
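After `mvn package`, one way to check the uber jar itself is to compare the `META-INF/services/org.apache.hadoop.fs.FileSystem` file it contains with the classes it actually bundles: a provider named there without a matching `.class` entry is the kind of mismatch the `Provider ... not found` error reports. The sketch below is a hypothetical JDK-only helper (the class name `ShadedJarServiceCheck` is made up) that takes the jar path as its first argument. It avoids `InputStream.readAllBytes` to stay Java 8 compatible. Because `spark-submit` also puts Spark's own jars on the classpath, a "missing" entry here is a hint to investigate, not proof of the cause.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: lists Hadoop FileSystem providers named in a jar's
// services file whose .class entries are not inside that same jar.
class ShadedJarServiceCheck {

    static final String SERVICE_FILE = "META-INF/services/org.apache.hadoop.fs.FileSystem";

    // Reads a jar entry fully as UTF-8 text (manual loop keeps it Java 8 compatible).
    static String readEntry(JarFile jar, JarEntry entry) throws IOException {
        try (InputStream in = jar.getInputStream(entry)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    // Returns provider class names listed in SERVICE_FILE but absent from the jar.
    static List<String> missingProviders(JarFile jar) throws IOException {
        List<String> missing = new ArrayList<>();
        JarEntry services = jar.getJarEntry(SERVICE_FILE);
        if (services == null) {
            return missing; // this jar has no FileSystem services file
        }
        for (String line : readEntry(jar, services).split("\\R")) {
            // '#' starts a comment; surrounding whitespace and blank lines are ignored
            int hash = line.indexOf('#');
            String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (!name.isEmpty() && jar.getJarEntry(name.replace('.', '/') + ".class") == null) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the shaded jar produced by mvn package
        try (JarFile jar = new JarFile(args[0])) {
            List<String> missing = missingProviders(jar);
            if (missing.isEmpty()) {
                System.out.println("No missing FileSystem providers found in the jar.");
            } else {
                missing.forEach(name -> System.out.println("MISSING " + name));
            }
        }
    }
}
```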
You can run the jar as follows:
```shell
spark-submit --class com.abaghel.examples.spark.cassandra.SparkCassandraDatasetApplication spark-cassandra-1.0.0-SNAPSHOT.jar
```
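The stack trace shows the failure happens while Spark qualifies the warehouse path (`SessionCatalog.makeQualifiedPath` → `Path.getFileSystem`). At that point Hadoop's `FileSystem.loadFileSystems` iterates over every provider listed in `META-INF/services/org.apache.hadoop.fs.FileSystem` through `java.util.ServiceLoader`, so a single listed-but-unloadable class such as `org.apache.hadoop.fs.s3.S3FileSystem` fails the whole lookup, even for a local path. To act on the "check the classpath" advice, here is a hypothetical JDK-only diagnostic (the class name `HadoopFsProviderCheck` is made up for this sketch). It reads those services files the way `ServiceLoader` does and reports which listed classes cannot be loaded. It assumes you run it with the same classpath as the failing application.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic: mimics how java.util.ServiceLoader discovers Hadoop
// FileSystem providers and reports any listed class that cannot be loaded.
class HadoopFsProviderCheck {

    // The service interface named in the stack trace.
    static final String SERVICE = "org.apache.hadoop.fs.FileSystem";

    // Parses a provider-configuration file: one class name per line,
    // '#' starts a comment, surrounding whitespace and blank lines are ignored.
    static List<String> parseProviderNames(String content) {
        List<String> names = new ArrayList<>();
        for (String line : content.split("\\R")) {
            int hash = line.indexOf('#');
            String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    // True if the loader can find the class (without running its static initializers).
    static boolean isLoadable(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Reads a classpath resource as UTF-8 text.
    static String readAll(URL url) throws IOException {
        try (InputStream in = url.openStream();
             BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        ClassLoader loader = HadoopFsProviderCheck.class.getClassLoader();
        // Every jar on the classpath may contribute its own copy of the services file.
        for (URL file : Collections.list(loader.getResources("META-INF/services/" + SERVICE))) {
            System.out.println("Provider file: " + file);
            for (String name : parseProviderNames(readAll(file))) {
                System.out.println((isLoadable(name, loader) ? "  OK      " : "  MISSING ") + name);
            }
        }
    }
}
```

The printed URL of each provider file shows which jar contributed it, which helps spot a second, possibly mismatched, Hadoop jar alongside the expected hadoop-common.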