Convert TXT File of Unknown Encoding to String
Problem description
How can I convert Plain Text (.txt) files to a string if the encoding type is unknown?
I'm working on a feature that would allow users to import txt files into my app. This means the file could have been created in any number of apps, utilizing any of a variety of encodings that would be considered valid for a plain text file. My understanding is this could include (ASCII, UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, UTF-32LE, or EBCDIC?!)
Things had been going well using the following:
NSString *txtFileAsString = [NSString stringWithContentsOfFile:path encoding:NSUTF8StringEncoding error:&errorReading];
Then a user supplied a file that resulted in empty content when imported. I inspected the import in the Xcode debugger and saw Cocoa error 261 with NSStringEncoding=4.
What I know:
- The user supplied file was created with an app called knowtes
- The file opens with TextEdit, TextWrangler, etc. on Mac OS X
- The file contains "special characters" such as umlauts (rant: why doesn't the "u" on umlaut have an umlaut?!)
- Finder Info displays:
Kind: text
- Terminal output:
text/plain; charset=utf-16le
I am guessing that the utf-16le encoding of the file is the key, as I'm expecting an NSUTF8 file. I attempted to use ASCII as a lowest common denominator. It didn't crash, but it fudged in some characters that weren't present in the original file.
NSString *txtFileAsString = [NSString stringWithContentsOfFile:path encoding:NSASCIIStringEncoding error:&errorReading];
So I attempted to convert the file to NSData first, hoping it might negate the need to recognize the encoding. It did not work.
NSData *txtFileData = [NSData dataWithContentsOfFile:path];
NSString *txtFileAsString = [[NSString alloc]initWithData:txtFileData encoding:NSUTF8StringEncoding];
This brings me to a few questions:
- Is there not a universal way to convert plain text file contents, regardless of encoding, to a string (i.e. a lowest common denominator)? I believe that used to be the purpose of initWithContentsOfFile, which unfortunately is now deprecated. NSASCIIStringEncoding didn't work.
- Is there anything about converting an NSUTF16 encoded file to a string that I would need to handle differently than if it were NSUTF8?
Assuming the file is in fact UTF-16LE, why does the following suggestion not work either?
NSString *txtFileAsString = nil;
if (path !=nil) {
NSData *txtFileData = [NSData dataWithContentsOfFile:path];
NSString *txtFileAsString = [[NSString alloc]initWithData:txtFileData encoding:NSASCIIStringEncoding];
if (!txtFileAsString) {
txtFileAsString = [[NSString alloc] initWithData:txtFileData encoding:NSUTF8StringEncoding];
}
if (!txtFileAsString) {
txtFileAsString = [[NSString alloc] initWithData:txtFileData encoding:NSUTF16StringEncoding];
}
if (!txtFileAsString) {
txtFileAsString = [[NSString alloc] initWithData:txtFileData encoding:NSUTF16LittleEndianStringEncoding];
}
if (!txtFileAsString) {
txtFileAsString = [[NSString alloc] initWithData:txtFileData encoding:NSUTF16BigEndianStringEncoding];
}
if (!txtFileAsString) {
txtFileAsString = [[NSString alloc] initWithData:txtFileData encoding:NSUTF32StringEncoding];
}
if (!txtFileAsString) {
txtFileAsString = [[NSString alloc] initWithData:txtFileData encoding:NSUTF32LittleEndianStringEncoding];
}
if (!txtFileAsString) {
txtFileAsString = [[NSString alloc] initWithData:txtFileData encoding:NSUTF32BigEndianStringEncoding];
}}
Recommended answer
Sometimes stringWithContentsOfFile:usedEncoding:error: can do the job (especially if the file has a Byte Order Mark):
NSError *error;
NSStringEncoding encoding;
NSString *string = [NSString stringWithContentsOfFile:path usedEncoding:&encoding error:&error];
Note that this variant with usedEncoding should not be confused with the similarly named method that takes just an encoding parameter.
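To illustrate what "has a Byte Order Mark" means in practice, here is a minimal sketch of BOM sniffing on raw bytes. It is plain C, so it should also compile inside an Objective-C source file. The BomKind enum and the detect_bom function are hypothetical names made up for this example, not part of Foundation, and this is not a description of how stringWithContentsOfFile:usedEncoding:error: works internally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type for this sketch; these are not Foundation constants. */
typedef enum {
    BOM_NONE,
    BOM_UTF8,
    BOM_UTF16_LE,
    BOM_UTF16_BE,
    BOM_UTF32_LE,
    BOM_UTF32_BE
} BomKind;

/* Report which Unicode byte order mark, if any, the buffer starts with. */
static BomKind detect_bom(const unsigned char *bytes, size_t length)
{
    assert(bytes != NULL || length == 0);

    /* UTF-32 is checked first because the UTF-32LE mark (FF FE 00 00)
       begins with the UTF-16LE mark (FF FE). A UTF-16LE file whose first
       character is NUL would be misread here; that case is assumed rare. */
    if (length >= 4 && bytes[0] == 0xFF && bytes[1] == 0xFE &&
        bytes[2] == 0x00 && bytes[3] == 0x00) {
        return BOM_UTF32_LE;
    }
    if (length >= 4 && bytes[0] == 0x00 && bytes[1] == 0x00 &&
        bytes[2] == 0xFE && bytes[3] == 0xFF) {
        return BOM_UTF32_BE;
    }
    if (length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF) {
        return BOM_UTF8;
    }
    if (length >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE) {
        return BOM_UTF16_LE;
    }
    if (length >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF) {
        return BOM_UTF16_BE;
    }
    /* No mark found: the encoding still has to be guessed some other way. */
    return BOM_NONE;
}
```

One possible use would be to pass the bytes of an NSData object to such a helper and map the result to a matching constant such as NSUTF16LittleEndianStringEncoding before calling initWithData:encoding:. When the result is BOM_NONE, nothing has been learned about the encoding, which is why the answer says "sometimes".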