Objective-C HTML parsing. Get all text between tags

Question

I am using hpple to try and grab a torrent description from ThePirateBay. Currently, I'm using this code:

// parser is a TFHpple instance and texts an NSMutableString, both created earlier
NSString *path = @"//div[@id='content']/div[@id='main-content']/div/div[@id='detailsouterframe']/div[@id='detailsframe']/div[@id='details']/div[@class='nfo']/pre/node()";
// node() matches every child of <pre>: text nodes as well as elements such as <a>
NSArray *nodes = [parser searchWithXPathQuery:path];
for (TFHppleElement *element in nodes) {
    NSString *postid = [element content];
    if (postid) {
        [texts appendString:postid];
    }
}

This returns just the plain text, and not any of the URLs for the screenshots. Is there any way to get all links and other tags, not just plain text? The Pirate Bay page is formatted like so:

<pre>
    <a href="http://img689.imageshack.us/img689/8292/itskindofafunnystory201.jpg" rel="nofollow">
    http://img689.imageshack.us/img689/8292/itskindofafunnystory201.jpg</a>
More texts about the file
</pre>

Solution

That's an easy job and you did it almost correctly!

What you want is the content (or an attribute) of the a tag, so your XPath has to select the a elements themselves.

Just change your XPath to

@"//div[@id='content']/div[@id='main-content']/div/div[@id='detailsouterframe']/div[@id='detailsframe']/div[@id='details']/div[@class='nfo']/pre/a"

(You were missing the a step at the very end, and you do not need node().)

Output:

http://www.imdb.com/title/tt1904996/
http://leetleech.org/images/65823608764828593230.png
http://leetleech.org/images/44748070481477652927.png
http://leetleech.org/images/42024611449329122742.png
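Each matched a element also carries the URL in its href attribute, not only as link text. A minimal sketch, assuming hpple's TFHpple (hppleWithHTMLData:, searchWithXPathQuery:) and TFHppleElement (content, objectForKey:) as declared in its headers; the helper NFOLinks and its @"href"/@"text" keys are hypothetical names chosen for this example, not part of hpple.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"  // hpple; assumed to be added to the project already

// Hypothetical helper (not part of hpple): runs an XPath query that selects
// <a> elements and returns one dictionary per link, with the href attribute
// under @"href" and the trimmed link text under @"text".
static NSArray<NSDictionary<NSString *, NSString *> *> *NFOLinks(NSData *htmlData, NSString *xpath) {
    TFHpple *parser = [TFHpple hppleWithHTMLData:htmlData];
    NSArray *elements = [parser searchWithXPathQuery:xpath];
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSMutableArray<NSDictionary<NSString *, NSString *> *> *links = [NSMutableArray array];
    for (TFHppleElement *a in elements) {
        NSString *href = [a objectForKey:@"href"];   // value of the href attribute
        NSString *rawText = [a content] ?: @"";      // link text as reported by hpple (may be nil)
        NSString *text = [rawText stringByTrimmingCharactersInSet:whitespace];
        if (href) {
            [links addObject:@{ @"href" : href, @"text" : text }];
        }
    }
    return links;
}
```

To use it on the torrent page, pass the page's HTML data together with the full XPath from the answer (the one ending in /pre/a); each returned dictionary's @"href" then holds one link URL.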

If you only want the screenshot URLs, you can skip the first link (the IMDb page) with something like:

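NSMutableArray *screenshotURLs = [NSMutableArray array];
// Start at 1 to skip the first link (the IMDb page in this example)
for (NSUInteger i = 1; i < nodes.count; i++) {
    TFHppleElement *link = nodes[i];
    NSString *href = [link objectForKey:@"href"]; // value of the href attribute
    if (href) [screenshotURLs addObject:href];
}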
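Starting the loop at index 1 assumes the IMDb link is always the first one. A sketch of an alternative that keeps only links whose path ends in an image extension, using Foundation only; the helper name ImageURLStrings and the extension list are assumptions made for this example.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: keeps only the URL strings whose path extension looks
// like an image. The extension list is an assumption; extend it as needed.
static NSArray<NSString *> *ImageURLStrings(NSArray<NSString *> *urlStrings) {
    NSSet<NSString *> *imageExtensions =
        [NSSet setWithObjects:@"jpg", @"jpeg", @"png", @"gif", nil];
    NSMutableArray<NSString *> *result = [NSMutableArray array];
    for (NSString *s in urlStrings) {
        NSURL *url = [NSURL URLWithString:s];                // nil for malformed strings
        NSString *ext = url.pathExtension.lowercaseString;   // e.g. @"png", or @"" if none
        if (ext && [imageExtensions containsObject:ext]) {
            [result addObject:s];
        }
    }
    return result;
}
```

Pass it the href strings collected from the matched a elements; the IMDb title URL has no path extension, so it is dropped regardless of its position.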
