iOS UTF-8 encoding issue


Question

I am trying to fetch an HTML page using the UTF-8 charset:

NSError *error = nil;
NSString *html = [NSString stringWithContentsOfURL:[NSURL URLWithString:@"http://forums.drom.ru/general/t1151288178.html"] encoding:NSUTF8StringEncoding error:&error];

but NSLog(@"%@", html) prints (null). Why is this happening?

Recommended answer

The problem is that while the file's meta tag purports to be UTF-8, the file is not (at least not entirely). You can confirm this by:


  • Downloading the HTML (as NSData, which succeeds):

NSError *error = nil;
NSURL *url = [NSURL URLWithString:@"http://forums.drom.ru/general/t1151288178.html"];
NSData *data = [NSData dataWithContentsOfURL:url options:0 error:&error];
NSString *docsPath = NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES)[0];
NSString *filename = [docsPath stringByAppendingPathComponent:@"test.html"];
[data writeToFile:filename atomically:YES];


  • Running iconv from the Terminal command line, which will report an error (including line number and character number):

    
    iconv -f UTF-8 test.html > /dev/null
    

    Thanks to Torsten Marek for sharing this.

    When I look at that portion of the HTML, there are definitely non-UTF-8 characters there, buried in the setting of the clever_cut_pattern JavaScript variable.
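    The same check can be done in code rather than from the Terminal. The following is a minimal sketch in plain C (so it can be called from Objective-C) that returns the byte offset of the first malformed UTF-8 sequence in a buffer. The function names utf8_seq_len and utf8_first_invalid are hypothetical helpers written for illustration; they are not part of Foundation or of iconv.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the length (1 to 4) of the well-formed UTF-8 sequence that starts
 * at p, or 0 if the bytes at p are not valid UTF-8. `left` is the number of
 * bytes available starting at p. */
static size_t utf8_seq_len(const unsigned char *p, size_t left)
{
    unsigned char b0;
    size_t need, i;

    if (left == 0)
        return 0;
    b0 = p[0];
    if (b0 < 0x80)
        return 1;                       /* plain ASCII */
    else if (b0 >= 0xC2 && b0 <= 0xDF)
        need = 2;
    else if (b0 >= 0xE0 && b0 <= 0xEF)
        need = 3;
    else if (b0 >= 0xF0 && b0 <= 0xF4)
        need = 4;
    else
        return 0;                       /* 0x80..0xC1 or 0xF5..0xFF cannot start a sequence */

    if (left < need)
        return 0;                       /* sequence is cut off by the end of the buffer */
    for (i = 1; i < need; i++)
        if ((p[i] & 0xC0) != 0x80)
            return 0;                   /* expected a continuation byte (10xxxxxx) */

    /* Reject overlong forms, UTF-16 surrogates, and values above U+10FFFF. */
    if (b0 == 0xE0 && p[1] < 0xA0) return 0;
    if (b0 == 0xED && p[1] > 0x9F) return 0;
    if (b0 == 0xF0 && p[1] < 0x90) return 0;
    if (b0 == 0xF4 && p[1] > 0x8F) return 0;
    return need;
}

/* Returns the offset of the first byte that is not part of a well-formed
 * UTF-8 sequence, or len if the whole buffer is valid UTF-8. */
size_t utf8_first_invalid(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL || len == 0);
    while (i < len) {
        size_t n = utf8_seq_len(buf + i, len - i);
        if (n == 0)
            return i;
        i += n;
    }
    return len;
}
```

    With the NSData downloaded above, this could be called as utf8_first_invalid(data.bytes, data.length); a result smaller than data.length marks where the offending bytes begin.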

    If we thought you had simply specified the wrong encoding, the typical advice would be to use the variant of stringWithContentsOfURL that takes a usedEncoding parameter (i.e. rather than guessing what the encoding is, let NSString determine it for you):

    NSStringEncoding encoding;
    NSString *html = [NSString stringWithContentsOfURL:url usedEncoding:&encoding error:&error];
    

    Unfortunately, in this case, even that fails (presumably because the file purports to be UTF-8, but isn't).

    The question then becomes "OK, so what do I do now?" That depends on why you are trying to download this HTML in your app in the first place. If you really need to convert it to UTF-8 (i.e. strip out the non-UTF-8 characters), you could theoretically use the GNU iconv(3) function, which is part of the libiconv library. That could identify the non-conforming characters, which you could then presumably remove. It is a question of how much work you are willing to do to handle this non-conforming web page.
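    As a rough illustration of that iconv(3) idea, here is a sketch in C that converts from UTF-8 to UTF-8 and skips any byte the converter rejects. It assumes that the iconv implementation in use reports malformed input with EILSEQ (or EINVAL for a sequence cut off at the end of the buffer) when source and target are both "UTF-8"; the function name strip_invalid_utf8 is a hypothetical helper, not part of libiconv. Dropping bytes loses whatever text they encoded, so this is only suitable when the damaged portion does not matter.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Copies `in` to `out`, dropping bytes that iconv rejects as malformed UTF-8.
 * `out` must have room for at least in_len bytes. Returns the number of bytes
 * written, or (size_t)-1 if the converter could not be opened. */
size_t strip_invalid_utf8(const char *in, size_t in_len, char *out)
{
    iconv_t cd;
    char *src = (char *)in;     /* iconv's prototype takes char **, but it only reads the input */
    char *dst = out;
    size_t src_left = in_len;
    size_t dst_left = in_len;   /* UTF-8 to UTF-8 never grows the data */

    assert(in != NULL && out != NULL);
    cd = iconv_open("UTF-8", "UTF-8");   /* (tocode, fromcode) */
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    while (src_left > 0) {
        size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
        if (rc != (size_t)-1)
            break;              /* the rest of the input converted cleanly */
        if (errno == EILSEQ || errno == EINVAL) {
            src++;              /* skip one offending byte and resume */
            src_left--;
        } else {
            break;              /* unexpected error (e.g. E2BIG): stop with what we have */
        }
    }
    iconv_close(cd);
    return (size_t)(dst - out);
}
```

    The cleaned buffer could then be handed to NSString with NSUTF8StringEncoding. On Apple platforms the project typically needs to link against libiconv for this to build.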
