Spark: importing text file in UTF-8 encoding


Problem Description

I am trying to process a file which contains a lot of special characters such as German umlauts (ä, ü, ö), etc., as follows:

sc.hadoopConfiguration.set("textinputformat.record.delimiter", "\r\n\r\n")
sc.textFile("/file/path/samele_file.txt")

But upon reading the contents, these special characters are not recognized.

I think the default encoding is not UTF-8 or a similar format. I would like to know if there is a way to set the encoding on this textFile method, such as:

sc.textFile("/file/path/samele_file.txt", mode="utf-8")

Recommended Answer

No. If you read a non-UTF-8 file in UTF-8 mode, non-ASCII characters will not be decoded properly. Convert the file to UTF-8 encoding first, then read it. You can refer to Reading file in different formats.
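If converting the file up front is not practical, one common workaround is to read the raw Hadoop Text records and decode the bytes explicitly with the file's actual charset. The sketch below is a minimal example, assuming the file is ISO-8859-1 (Latin-1) encoded; the path comes from the question and the charset is a placeholder you would replace with the real encoding:

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.TextInputFormat
import java.nio.charset.StandardCharsets

// Read (offset, Text) pairs instead of letting textFile decode as UTF-8,
// then build each line from the raw bytes using the file's real charset.
val lines = sc.hadoopFile("/file/path/samele_file.txt",
    classOf[TextInputFormat], classOf[LongWritable], classOf[Text])
  .map { case (_, text) =>
    // Assumption: the source file is ISO-8859-1; change the charset as needed.
    new String(text.getBytes, 0, text.getLength, StandardCharsets.ISO_8859_1)
  }

The textinputformat.record.delimiter setting shown earlier still applies here, because hadoopFile reads through the same Hadoop configuration and TextInputFormat.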

