Using an NSXMLParser to parse HTML

Question

I'm working on an app that aggregates some feeds from the internet and reformats the content, so I'm looking for a way to parse some HTML. Given that XML and HTML are very similar in structure, I was thinking, "maybe I should just use an NSXMLParser." I'm already using it to parse my RSS feeds and I've become comfortable with it, but I'm running into a problem.

The parser will not recognize <p> as an element. It has no problem extracting elements like <title> or <img>, but it doesn't like <p>. Has anyone tried doing this, and if so, do you have any suggestions or workarounds for this issue? I think the XMLParser is good for what I'm doing and I would like to use it, but obviously, if I can't get the text in <p> elements, it's completely useless to me.

Any suggestions are welcome, even ones suggesting a different method entirely. I've looked into some third-party libraries for doing this, but from what I've read they all have some bugs, and I would much prefer to use something provided by Apple.

Recommended answer

There's absolutely nothing special about "p" as the name of an element. While it is hard to be sure because you haven't provided an example of the HTML you are parsing, the problem is most likely caused by HTML that is not well-formed XML. In other words, using NSXMLParser would work on XHTML, but not necessarily on plain old HTML.

The "p" element is frequently found in HTML without the matching closing tag, which is not well-formed XML. My guess is that you would have to convert the HTML to XHTML before trying to parse it with an NSXMLParser.
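
To illustrate the point about well-formedness, here is a minimal sketch in Swift, where NSXMLParser is exposed as `XMLParser`. The `ParagraphCollector` class, the `extractParagraphs(from:)` helper, and both sample strings are made up for this example; they are not from the question. It parses the same two paragraphs twice, once with closing `</p>` tags and once without.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // on Linux, XMLParser lives in this module
#endif

// Hypothetical delegate that collects the text of every <p> element.
final class ParagraphCollector: NSObject, XMLParserDelegate {
    private(set) var paragraphs: [String] = []
    private var buffer: String? = nil  // non-nil only while inside a <p>

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String] = [:]) {
        if elementName == "p" { buffer = "" }
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        // May be called several times for one text node, so accumulate.
        buffer?.append(string)
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        if elementName == "p", let text = buffer {
            paragraphs.append(text)
            buffer = nil
        }
    }
}

// Hypothetical helper: returns whether parsing succeeded and the <p> texts found.
func extractParagraphs(from markup: String) -> (ok: Bool, paragraphs: [String]) {
    let parser = XMLParser(data: Data(markup.utf8))
    let collector = ParagraphCollector()
    parser.delegate = collector
    let ok = parser.parse()  // false as soon as the input is not well-formed XML
    return (ok, collector.paragraphs)
}

// Assumed sample inputs: XHTML-style markup versus HTML with unclosed <p> tags.
let xhtml = "<html><body><p>First</p><p>Second</p></body></html>"
let html = "<html><body><p>First<p>Second</body></html>"

print(extractParagraphs(from: xhtml))
print(extractParagraphs(from: html).ok)
```

With the closed tags, the parse should succeed and both paragraphs are collected. With the unclosed tags, `parse()` is expected to return `false`, because `</body>` arrives while a `<p>` is still open; `parser.parserError` can be inspected in that case to see what the parser objected to.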
