How to write a Dataset to an Excel file using the hadoop office library in Apache Spark Java
Problem Description
Currently I am using com.crealytics.spark.excel to read Excel files, but with this library I cannot write a Dataset to an Excel file.
This link says that with the hadoop office library (org.zuinnote.spark.office.excel) we can both read and write Excel files.
Please help me write a Dataset object to an Excel file in Spark Java.
Recommended Answer
You can use org.zuinnote.spark.office.excel for both reading and writing Excel files with a Dataset. Examples are given at https://github.com/ZuInnoTe/spark-hadoopoffice-ds/. However, there is one issue if you read an Excel file into a Dataset and then try to write it to another Excel file. Please see the issue and the Scala workaround at https://github.com/ZuInnoTe/hadoopoffice/issues/12.
I have written a sample program in Java using org.zuinnote.spark.office.excel and the workaround given at that link. Please see if it helps you.
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class SparkExcel {
    public static void main(String[] args) {
        // Spark session
        SparkSession spark = SparkSession
                .builder()
                .appName("SparkExcel")
                .master("local[*]")
                .getOrCreate();

        // Read the source Excel file
        Dataset<Row> df = spark
                .read()
                .format("org.zuinnote.spark.office.excel")
                .option("read.locale.bcp47", "de")
                .load("c:\\temp\\test1.xlsx");

        // Print contents and schema
        df.show();
        df.printSchema();

        // Workaround: flatten each nested spreadsheet row into a plain String[]
        FlatMapFunction<Row, String[]> flatMapFunc = new FlatMapFunction<Row, String[]>() {
            @Override
            public Iterator<String[]> call(Row row) throws Exception {
                ArrayList<String[]> rowList = new ArrayList<String[]>();
                List<Row> spreadSheetRows = row.getList(0);
                for (Row srow : spreadSheetRows) {
                    ArrayList<String> arr = new ArrayList<String>();
                    arr.add(srow.getString(0));
                    arr.add(srow.getString(1));
                    arr.add(srow.getString(2));
                    arr.add(srow.getString(3));
                    arr.add(srow.getString(4));
                    rowList.add(arr.toArray(new String[] {}));
                }
                return rowList.iterator();
            }
        };

        // Apply the flatMap function
        Dataset<String[]> df2 = df.flatMap(flatMapFunc, spark.implicits().newStringArrayEncoder());

        // Write the result to a new Excel file
        df2.write()
                .mode(SaveMode.Overwrite)
                .format("org.zuinnote.spark.office.excel")
                .option("write.locale.bcp47", "de")
                .save("c:\\temp\\test2.xlsx");
    }
}
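The FlatMapFunction in the program above does nothing Spark-specific with the cell values: it just copies each nested spreadsheet row's strings into a String[]. The reshaping can be sketched in plain Java without a Spark runtime; the class and method names here (FlattenDemo, flatten) are hypothetical, and the nested lists stand in for the nested Rows the data source produces.

```java
import java.util.ArrayList;
import java.util.List;

public class FlattenDemo {
    // Each top-level record holds a list of spreadsheet rows;
    // each row's cell values are copied into a String[] - the same
    // reshaping the FlatMapFunction workaround performs per Row.
    public static List<String[]> flatten(List<List<List<String>>> records) {
        List<String[]> out = new ArrayList<>();
        for (List<List<String>> record : records) {
            for (List<String> row : record) {
                out.add(row.toArray(new String[0]));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // One record containing two spreadsheet rows
        List<List<List<String>>> records = List.of(
                List.of(List.of("a", "b"), List.of("c", "d")));
        for (String[] r : flatten(records)) {
            System.out.println(String.join(",", r));
        }
    }
}
```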
I have tested this code with Java 8 and Spark 2.1.0. I am using Maven and added the dependency for org.zuinnote.spark.office.excel from https://mvnrepository.com/artifact/com.github.zuinnote/spark-hadoopoffice-ds_2.11/1.0.3
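Going by the coordinates in that Maven Repository URL, the dependency declaration in pom.xml would look like this:

```xml
<dependency>
    <groupId>com.github.zuinnote</groupId>
    <artifactId>spark-hadoopoffice-ds_2.11</artifactId>
    <version>1.0.3</version>
</dependency>
```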