Get filename as UTF-8? (ä,ü,ö ... is always '?')


Problem description


I have to read the names of some files and put them in a list as strings. It's not so hard; I just have problems with some characters like ä, ö, ü ... they always show up as a '?' in my string.

What's the problem? Well, the encoding. OK, this should be easy... that's what I thought. So I tried to use functions like:

new String(insert.getBytes("UTF-8")) or new String(insert.getBytes("ISO-8859-1"), "UTF-8"), because most of the files are ISO-8859-1.

It's not helping. This is my code:

...
File[] fileList = dir.listFiles();
String insert;
for (File f : fileList) {
    ...
    insert = f.getName().substring(0, f.getName().length() - 4);
    insert = insert.charAt(0) + insert.substring(1, insert.length()).toLowerCase().replaceFirst("([0-9]*(_s?(i)?(_dat)?)*$)", "").replaceFirst("_", " ");
    ...
    System.out.println("test UTF8: " + new String(insert.getBytes("UTF-8"))); //not helping
    System.out.println("test ISO , UTF8: " + new String(insert.getBytes("ISO-8859-1"), "UTF-8")); //not helping
    ...
    names.add(insert);
}

At the end there are a lot of strings with '?' characters in my list. How can I fix the problem? And what's the best way if there are not only ISO-8859-1 files? (Let's say there are a lot of files with unknown encodings.)

Thank you!

Solution

Given the extended comments back and forth under the question, it now looks like this is either a font problem or (perhaps more likely) a filename encoding problem.
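As a side note on the attempts in the question: a Java String is already decoded text, so re-encoding it cannot bring back a character that was lost when the name was read. The following is a minimal sketch of that point. The class name, the method names and the sample names are made up for illustration; they are not Lissy's real code or files.

```java
import java.nio.charset.StandardCharsets;

final class RoundTripDemo {

    // First attempt from the question: encode to UTF-8 and decode the same
    // bytes as UTF-8 again. This always yields a string equal to the input.
    static String utf8RoundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    // Second attempt from the question: encode as ISO-8859-1, then decode
    // those bytes as UTF-8. Pure ASCII survives; other characters do not.
    static String latin1AsUtf8(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical name in which the umlaut was already replaced by '?'.
        String damaged = "filen?me";
        System.out.println(utf8RoundTrip(damaged)); // still contains the '?'
        System.out.println(latin1AsUtf8(damaged));  // still contains the '?'

        // A correctly decoded name is left alone by the first conversion
        // and is damaged by the second one.
        String intact = "filen\u00e4me";
        System.out.println(utf8RoundTrip(intact).equals(intact));
        System.out.println(latin1AsUtf8(intact).equals(intact));
    }
}
```

So if the '?' is already inside the string that getName() returns, the repair has to happen on the file names themselves, not on the strings.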

I asked Lissy to run the following commands so we can figure out what the problem is. If she is sure that the filenames contain "ä", but that character does not appear when she runs ls on them, then these commands will tell us whether this is a font problem or an encoding problem.

touch filenäme
ls filen*me

If this shows "filenäme" in the output of ls then we know the problem is with the creation/copy of the files onto this system. This could happen if the program which created the files didn't realize what the filesystem encoding was or was too stupid to do the right thing. The convmv program will probably be the best way to fix this.

convmv -f ENCODING -t utf8 -r .

The question is what the proper encoding is. Possibilities include UTF-16, cp850, or perhaps iso8859-1. convmv --list will show you the list of encodings currently known to your system. Since the command above only prints what it would do (convmv renames nothing unless you add --notest), it is safe to run it several times with different encodings until you find one which works for all files.
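The same trial-and-error idea can be sketched in Java, assuming you already have the raw bytes of one problematic name from somewhere else (the File API only hands out decoded strings). The class EncodingGuess, the method tryDecode and the sample bytes are hypothetical; IBM850 is Java's name for cp850 and may not be present in every runtime, which the sketch checks for.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.util.Optional;

final class EncodingGuess {

    // Strictly decodes the raw bytes with the named charset. Returns empty
    // if the charset is not available or the bytes are not valid in it.
    static Optional<String> tryDecode(byte[] raw, String charsetName) {
        if (!Charset.isSupported(charsetName)) {
            return Optional.empty();
        }
        try {
            return Optional.of(Charset.forName(charsetName).newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString());
        } catch (CharacterCodingException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Assumed sample: "filenäme" as it would be stored in ISO-8859-1.
        byte[] raw = {'f', 'i', 'l', 'e', 'n', (byte) 0xE4, 'm', 'e'};
        for (String cs : new String[] {"UTF-8", "ISO-8859-1", "IBM850", "UTF-16"}) {
            System.out.println(cs + " -> " + tryDecode(raw, cs).orElse("(not valid)"));
        }
    }
}
```

A decode that succeeds is not proof of the right encoding: several charsets accept almost any byte sequence, so the output still has to be checked by eye, just as with the convmv dry runs.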

If this is a font problem, we'll have to look into that.
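One way to tell the two cases apart from the Java side is to print the Unicode code points of each name instead of the name itself, since code points do not depend on the console font or output encoding. This is only a sketch under that assumption; NameInspector and codePoints are hypothetical names, and the directory is taken from the first command-line argument.

```java
import java.io.File;
import java.util.StringJoiner;

final class NameInspector {

    // Renders every code point of s as U+XXXX, separated by spaces.
    static String codePoints(String s) {
        StringJoiner out = new StringJoiner(" ");
        s.codePoints().forEach(cp -> out.add(String.format("U+%04X", cp)));
        return out.toString();
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        File[] files = dir.listFiles();
        if (files == null) {
            System.err.println("Not a readable directory: " + dir);
            return;
        }
        for (File f : files) {
            System.out.println(codePoints(f.getName()));
        }
    }
}
```

If the output contains U+00E4 where the ä should be, Java has the right name and the '?' most likely comes from the display side (console encoding or font). If it contains U+003F or U+FFFD there, the name was typically already damaged when it was decoded, which points back to the filename encoding.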
