iOS utf-8 encoding issue
Question
I am trying to fetch an HTML page using the UTF-8 charset:
NSError *error = nil;
NSString *html = [NSString stringWithContentsOfURL:[NSURL URLWithString:@"http://forums.drom.ru/general/t1151288178.html"] encoding:NSUTF8StringEncoding error:&error];
But NSLog(@"%@", html) prints (null). Why is this happening?
Recommended answer
The problem is that while the file's meta tag purports to be UTF8, it's not (at least not entirely). You can confirm this by:
- Download the HTML as NSData (this succeeds):
NSError *error = nil;
NSURL *url = [NSURL URLWithString:@"http://forums.drom.ru/general/t1151288178.html"];
NSData *data = [NSData dataWithContentsOfURL:url options:0 error:&error];
NSString *docsPath = NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES)[0];
NSString *filename = [docsPath stringByAppendingPathComponent:@"test.html"];
[data writeToFile:filename atomically:YES];
- Run iconv from the Terminal command line; it will report an error (including the line number and character number):
iconv -f UTF-8 test.html > /dev/null
Thanks to Torsten Marek for sharing this.
When I look at that portion of the HTML, there are definitely non-UTF-8 characters there, buried in the setting of the clever_cut_pattern JavaScript variable.
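The same check that iconv performs can be approximated in code. The following is a minimal sketch in plain C (which also compiles inside an Objective-C project); it is not part of the original answer, and the function names utf8_seq_len and first_invalid_utf8 are hypothetical. It returns the byte offset of the first sequence that is not well-formed UTF-8, so it could be run over the bytes of the downloaded NSData.

```c
#include <stddef.h>

/* Length (1-4) of the well-formed UTF-8 sequence starting at p, or 0 if the
 * bytes at p do not form one. n is the number of bytes available.
 * Overlong forms, UTF-16 surrogates, and values above U+10FFFF are rejected. */
static size_t utf8_seq_len(const unsigned char *p, size_t n)
{
    unsigned char lo = 0x80, hi = 0xBF;   /* allowed range for the 2nd byte */
    size_t need;                          /* number of continuation bytes */

    if (n == 0) return 0;
    if (p[0] < 0x80) return 1;            /* plain ASCII */
    if (p[0] >= 0xC2 && p[0] <= 0xDF) {
        need = 1;
    } else if (p[0] >= 0xE0 && p[0] <= 0xEF) {
        need = 2;
        if (p[0] == 0xE0) lo = 0xA0;      /* excludes overlong 3-byte forms */
        if (p[0] == 0xED) hi = 0x9F;      /* excludes surrogates */
    } else if (p[0] >= 0xF0 && p[0] <= 0xF4) {
        need = 3;
        if (p[0] == 0xF0) lo = 0x90;      /* excludes overlong 4-byte forms */
        if (p[0] == 0xF4) hi = 0x8F;      /* excludes values above U+10FFFF */
    } else {
        return 0;                         /* stray continuation or invalid lead byte */
    }
    if (n < need + 1) return 0;           /* truncated sequence */
    if (p[1] < lo || p[1] > hi) return 0;
    for (size_t i = 2; i <= need; i++)
        if ((p[i] & 0xC0) != 0x80) return 0;
    return need + 1;
}

/* Offset of the first byte that starts an invalid sequence, or len if the
 * whole buffer is well-formed UTF-8. */
size_t first_invalid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        size_t k = utf8_seq_len(buf + i, len - i);
        if (k == 0) return i;
        i += k;
    }
    return len;
}
```

Called as first_invalid_utf8(data.bytes, data.length), a result smaller than data.length would indicate the offset at which decoding as UTF-8 is expected to fail.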
If we thought you had just gotten the encoding wrong, the typical advice in these cases would be to use the rendition of stringWithContentsOfURL with the usedEncoding parameter (i.e. rather than guessing what the encoding is, let NSString determine it for you):
NSStringEncoding encoding;
NSString *html = [NSString stringWithContentsOfURL:url usedEncoding:&encoding error:&error];
Unfortunately, in this case, even that fails (presumably because the file purports to be UTF8, but isn't).
The question then becomes "ok, so what do I do now?" It depends upon why you were trying to download that HTML in your app in the first place. If you really need to convert this to UTF8 (i.e. strip out the non-UTF8 characters), you could theoretically use the GNU iconv(3) function, which is part of the libiconv library. That could identify the non-conforming characters, which you could then remove. It's a question of how much work you're willing to go through to handle this non-conforming web page.
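As a rough illustration of the iconv(3) approach, here is a sketch in plain C, not taken from the original answer. It converts UTF-8 to UTF-8 and skips one input byte each time iconv reports EILSEQ. The function name strip_invalid_utf8 is hypothetical, and the sketch assumes the platform's iconv validates its input when source and target encodings are the same; on Apple platforms it would typically also require linking against libiconv.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Copies in[0..in_len) to out, dropping bytes that iconv rejects as invalid
 * UTF-8. out must have room for at least in_len bytes (the output never
 * grows). Returns the number of bytes written, or (size_t)-1 if the
 * converter could not be opened. */
size_t strip_invalid_utf8(const char *in, size_t in_len, char *out)
{
    iconv_t cd = iconv_open("UTF-8", "UTF-8");
    if (cd == (iconv_t)-1) return (size_t)-1;

    char *src = (char *)in;      /* iconv takes char **; the input is only read */
    size_t src_left = in_len;
    char *dst = out;
    size_t dst_left = in_len;

    while (src_left > 0) {
        if (iconv(cd, &src, &src_left, &dst, &dst_left) != (size_t)-1)
            break;               /* everything that remained was converted */
        if (errno == EILSEQ) {
            src++;               /* src points at the offending byte: skip it */
            src_left--;
        } else {
            break;               /* EINVAL: truncated sequence at the end, dropped */
        }
    }
    iconv_close(cd);
    return (size_t)(dst - out);
}
```

The cleaned bytes could then be passed to NSString's initWithBytes:length:encoding: with NSUTF8StringEncoding. Dropping bytes silently loses whatever text they were meant to represent, so this only suits cases where the damaged portion (here, a JavaScript variable) does not matter.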