What's the best approach for parsing XML/'screen scraping' in iOS? UIWebView or NSXMLParser?


Question

I am creating an iOS app that needs to get some data from a web page. My first thought was to use NSXMLParser's initWithContentsOfURL: and parse the HTML with the NSXMLParser delegate. However, this approach seems like it could quickly become painful (for example, if the HTML changed I would have to rewrite the parsing code, which could be awkward).

Since I'm loading a web page, I took a look at UIWebView too. It looks like UIWebView may be the way to go. stringByEvaluatingJavaScriptFromString: seems like a very handy way to extract the data, and it would allow the JavaScript to be stored in a separate file that would be easy to edit if the HTML changed. However, using UIWebView seems a bit hacky (UIWebView is a UIView subclass, so it may block the main thread, and the docs say that the JavaScript has a limit of 10MB).
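
As a rough sketch of the "JavaScript in a separate file" idea, the script below collects link text and URLs from the page and returns them as a JSON string, since stringByEvaluatingJavaScriptFromString: hands its result back as a string. The file name, the function name extractHeadlines, and the selector "h2.title a" are all made up for illustration; the real selector depends on the page being scraped.

```javascript
// extract.js: a hypothetical script kept in its own file so that only this
// file needs editing when the page's HTML changes.
// Written in plain ES5 because older web views may not support newer syntax.
function extractHeadlines(doc) {
  // ASSUMPTION: "h2.title a" is an example selector, not taken from a real page.
  var links = doc.querySelectorAll("h2.title a");
  var items = [];
  for (var i = 0; i < links.length; i++) {
    items.push({
      text: links[i].textContent.trim(),
      href: links[i].getAttribute("href")
    });
  }
  // Serialize to a string so the native side can decode it as JSON.
  return JSON.stringify(items);
}
```

Once the page has finished loading, the app could load this file's contents, evaluate them in the web view, and then evaluate the expression extractHeadlines(document) to get the JSON string. Taking the document as a parameter also makes the function easy to exercise outside a web view with a stub object.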

Does anyone have any advice on parsing XML/HTML before I get stuck in?

Update:

I wrote a blog post about my solution: HTML parsing/screen scraping in iOS

Recommended answer

Parsing HTML with an XML parser usually does not work anyway, because many sites serve incorrect HTML. A web browser will cope with it, but a strict XML parser like NSXMLParser will fail on it completely.
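
To make the failure mode concrete, here is a toy tag-balance check in JavaScript. It is not a real XML parser and says nothing about how NSXMLParser is implemented; it only mimics one well-formedness rule (every opening tag needs a matching close, in order) that strict XML parsing enforces and that everyday HTML routinely breaks. The function name and the sample markup are invented for this sketch, and the regular expression ignores edge cases such as a ">" inside an attribute value.

```javascript
// Returns a description of the first tag-nesting problem, or null if the
// markup's tags are balanced. Illustration only, not a real parser.
function firstWellFormednessError(markup) {
  const stack = [];
  // Captures: optional leading "/", the tag name, optional trailing "/".
  const tagPattern = /<(\/?)([A-Za-z][A-Za-z0-9]*)[^>]*?(\/?)>/g;
  let m;
  while ((m = tagPattern.exec(markup)) !== null) {
    const [, closing, name, selfClosing] = m;
    if (selfClosing) continue;      // <br/> opens and closes itself
    if (!closing) {                 // opening tag: remember it
      stack.push(name);
      continue;
    }
    const open = stack.pop();       // closing tag: must match the latest open
    if (open === undefined) return `</${name}> has no opening tag`;
    if (open !== name) return `</${name}> closes <${open}>`;
  }
  return stack.length ? `<${stack[stack.length - 1]}> is never closed` : null;
}

// XML-style markup passes; the same content written as typical HTML does not.
console.log(firstWellFormednessError("<p>a<br/>b</p>"));
console.log(firstWellFormednessError("<p>a<br>b</p>"));
```

A browser renders both inputs the same way, which is why a page can look fine and still be unusable with a strict parser.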

For many scripting languages there are great scraping libraries that are more forgiving, such as Python's Beautiful Soup module. Unfortunately, I do not know of any such modules for Objective-C.

Loading the content into a UIWebView might be the simplest way to go here. Note that you do not have to put the UIWebView on screen. You can create a separate UIWindow and add the UIWebView to it, so that all rendering happens off screen. I think there was a WWDC 2009 video about this. As you already mention, it will not be lightweight, though.

Depending on the data you want and the complexity of the pages you need to parse, you might also be able to parse it with regular expressions or even a hand-written parser. I have done this many times, and for simple data it works well.
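
For simple, regular markup the regular-expression route might look something like the sketch below, written in JavaScript for brevity. The function name, the "price" class, and the table layout are assumptions made up for the example; in an iOS app a similar pattern could be applied to the downloaded HTML string with NSRegularExpression.

```javascript
// Collect the text of every <td class="price"> cell in an HTML string.
// ASSUMPTION: cells contain plain text only and use a double-quoted class
// attribute; anything more complex needs a real HTML parser.
function extractPrices(html) {
  const pattern = /<td[^>]*class="price"[^>]*>([^<]*)<\/td>/gi;
  const prices = [];
  let match;
  while ((match = pattern.exec(html)) !== null) {
    prices.push(match[1].trim());
  }
  return prices;
}

const sample =
  '<tr><td class="name">Tea</td><td class="price"> 3.50 </td></tr>' +
  '<tr><td class="name">Coffee</td><td class="price">4.00</td></tr>';
console.log(extractPrices(sample));
```

The pattern is deliberately narrow: it breaks as soon as the cell gains nested tags or the attribute quoting changes, which is the trade-off of this approach compared with loading the page into a web view.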
