How can DataFrameReader read from HTTP?

Views: 84  Published: 2020/9/4 8:30:37  Tags: scala, apache-spark, intellij-idea, apache-spark-sql, hdfs


Problem description

My development environment:

IntelliJ IDEA
Maven
Scala 2.10.6
Windows 7 x64

Dependencies:

 <dependencies>
    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.10 -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>2.2.0</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-mllib_2.10 -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-mllib_2.10</artifactId>
        <version>2.2.0</version>
        <scope>provided</scope>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.10 -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.10</artifactId>
        <version>2.2.0</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.scala-lang/scala-library -->
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>2.10.6</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.scala-lang/scala-reflect -->
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-reflect</artifactId>
        <version>2.10.6</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.7.4</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-hdfs -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.7.4</version>
    </dependency>
</dependencies>

Problem:
I want to read a remote CSV file into a DataFrame.
I tried the following:

val weburl = "http://myurl.com/file.csv"
val tfile = spark.read.option("header","true").option("inferSchema","true").csv(weburl)

It fails with the following error:

Exception in thread "main" java.io.IOException: No FileSystem for scheme: http
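
The message comes from Hadoop's `FileSystem` lookup: the scheme of the path is resolved to a `FileSystem` class, and nothing appears to be registered for `http` in this setup. A minimal sketch for reproducing that lookup in isolation, assuming the `hadoop-common` dependency listed above is on the classpath (the object name `SchemeCheck` and the helper `lookup` are made up for illustration):

```scala
import java.io.IOException
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

object SchemeCheck {
  // Returns Right(class name) if a FileSystem is registered for the scheme,
  // or Left(error message) if the lookup fails with an IOException.
  def lookup(scheme: String, conf: Configuration = new Configuration()): Either[String, String] =
    try Right(FileSystem.getFileSystemClass(scheme, conf).getName)
    catch { case e: IOException => Left(e.getMessage) }

  def main(args: Array[String]): Unit = {
    println(lookup("file")) // a local FileSystem implementation is expected here
    println(lookup("http")) // with the setup above, this is expected to be a Left
  }
}
```

Newer Hadoop releases may register a handler for `http`, so the second result can differ depending on the Hadoop version actually loaded.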

After searching the internet (including Stack Overflow), I tried the following:

val content = scala.io.Source.fromURL(weburl).mkString
val list = content.split("\n")
// ...then process the strings, cast the types, and split each row into columns to build the DataFrame.

This works, but I think there must be a smarter way to load a CSV file from a web source.
Is there any way to make DataFrameReader read a CSV over HTTP?
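
One possible middle ground, sketched here under assumptions rather than offered as the definitive answer: since Spark 2.2.0, `DataFrameReader.csv` also accepts a `Dataset[String]`, so the lines fetched with `scala.io.Source.fromURL` can be handed to the CSV reader instead of being parsed by hand. The object name `HttpCsvViaDataset`, the helper `readCsv`, the UTF-8 encoding, and the `local[*]` master are assumptions made for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import scala.io.Source

object HttpCsvViaDataset {
  // Downloads the CSV text on the driver, then lets DataFrameReader parse it.
  def readCsv(spark: SparkSession, url: String): DataFrame = {
    import spark.implicits._
    val source = Source.fromURL(url, "UTF-8") // assumed encoding
    // Materialize all lines before closing the underlying stream.
    val lines = try source.getLines().toList finally source.close()
    spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(lines.toDS())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("Test").master("local[*]").getOrCreate()
    val df = readCsv(spark, "http://myurl.com/file.csv")
    df.printSchema()
    spark.stop()
  }
}
```

The whole file is held in driver memory, and because the text is split into lines first, quoted fields that contain line breaks would not survive. For small, simple files that trade-off may be acceptable.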

I suspect that setting SparkContext.hadoopConfiguration is the key, so I tried a lot of code from the internet. None of it worked, and I don't understand how to set it or what each line means.

Below is one of my attempts. It did not work (same error message when accessing "http").

val sc = new SparkContext(spark_conf)
val spark = SparkSession.builder.appName("Test").getOrCreate()
val hconf = sc.hadoopConfiguration

// Register the FileSystem implementations for the hdfs:// and file:// schemes
hconf.set("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
hconf.set("fs.file.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getName)

Is this setting the key, or not?
Or can DataFrameReader not read directly from a remote source? If so, how should I do it?
Do I need to import some special library for the http scheme?
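
The `fs.hdfs.impl` and `fs.file.impl` settings only register handlers for `hdfs://` and `file://` paths, so they would not be expected to change how an `http://` path is resolved. One commonly suggested alternative is to let Spark download the file first: `SparkContext.addFile` accepts HTTP, HTTPS, and FTP URIs, and `SparkFiles.get` returns the local path of the downloaded copy. A minimal sketch, assuming local mode and a URL that ends in the file name (the object name `HttpCsvViaAddFile` and the helper `readCsv` are made up for illustration):

```scala
import java.io.File
import org.apache.spark.SparkFiles
import org.apache.spark.sql.{DataFrame, SparkSession}

object HttpCsvViaAddFile {
  def readCsv(spark: SparkSession, url: String): DataFrame = {
    // Ask Spark to fetch the remote file into its working directory.
    spark.sparkContext.addFile(url)
    // Assumption: the last path segment is the file name (no query string).
    val fileName = url.substring(url.lastIndexOf('/') + 1)
    // Build a file: URI so the path is also valid on Windows.
    val localUri = new File(SparkFiles.get(fileName)).toURI.toString
    spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(localUri)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("Test").master("local[*]").getOrCreate()
    val df = readCsv(spark, "http://myurl.com/file.csv")
    df.show(5)
    spark.stop()
  }
}
```

On a multi-node cluster the path returned by `SparkFiles.get` on the driver may not match the path on the executors, so this sketch should be treated as a local-mode starting point only.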

What I want to know:

Is there any way to make DataFrameReader read an HTTP source,
without parsing the data myself (as in "Best way to convert online csv to dataframe scala")?
I need to read CSV, which is a standard format, so I am looking for a more general way to read it, like dataframereader.csv("local file").

I know this is a very basic question. Sorry for my limited understanding.
