All-inclusive Charset to avoid "java.nio.charset.MalformedInputException: Input length = 1"?

Question

I'm creating a simple wordcount program in Java that reads through a directory's text-based files.

However, I keep on getting the error:

java.nio.charset.MalformedInputException: Input length = 1

from this line of code:

BufferedReader reader = Files.newBufferedReader(file,Charset.forName("UTF-8"));

I know I probably get this because I used a Charset that didn't include some of the characters in the text files, some of which included characters of other languages. But I want to include those characters.

I later read in the Javadocs that the Charset is optional and only used to read the files more efficiently, so I changed the code to:

BufferedReader reader = Files.newBufferedReader(file);

But some files still throw the MalformedInputException. I don't know why.

I was wondering if there is an all-inclusive Charset that will allow me to read text files with many different types of characters?

Thanks.

Recommended answer

You probably want to have a list of supported encodings. For each file, try each encoding in turn, maybe starting with UTF-8. Every time you catch the MalformedInputException, try the next encoding.
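
A minimal sketch of that fallback loop, not a definitive implementation. The class name CharsetFallbackReader, the method readAllLines, and the CANDIDATES list are hypothetical names chosen for illustration, and the two charsets listed are only an assumption about what the files might contain. The sketch catches CharacterCodingException, the superclass of MalformedInputException, so that an UnmappableCharacterException also moves on to the next candidate.

```java
import java.io.IOException;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class CharsetFallbackReader {

    // Hypothetical candidate list (an assumption, adjust to your data).
    // Order matters: put the strictest encoding first.
    public static final List<Charset> CANDIDATES = Arrays.asList(
            StandardCharsets.UTF_8,
            StandardCharsets.ISO_8859_1);

    // Tries each candidate charset in turn and returns the lines decoded
    // by the first one that reads the whole file without a decoding error.
    public static List<String> readAllLines(Path file, List<Charset> candidates)
            throws IOException {
        CharacterCodingException lastFailure = null;
        for (Charset charset : candidates) {
            try {
                return Files.readAllLines(file, charset);
            } catch (CharacterCodingException e) {
                // Covers MalformedInputException; remember it and try the next charset.
                lastFailure = e;
            }
        }
        if (lastFailure != null) {
            throw lastFailure; // every candidate failed to decode the file
        }
        throw new IllegalArgumentException("no candidate charsets given");
    }
}
```

A call such as CharsetFallbackReader.readAllLines(file, CharsetFallbackReader.CANDIDATES) would replace the Files.newBufferedReader call for each file. Two caveats: Files.newBufferedReader(file) without a Charset argument still decodes as UTF-8, which is why dropping the argument did not help; and ISO-8859-1 maps every byte to a character, so as the last candidate it never throws, but it may produce the wrong characters if the file is really in some other encoding.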
