Objective-C HTML parsing. Get all text between tags
Problem description
I am using hpple to try and grab a torrent description from ThePirateBay. Currently, I'm using this code:
// parser is a TFHpple built from the page's HTML data; texts is an NSMutableString
NSString *path = @"//div[@id='content']/div[@id='main-content']/div/div[@id='detailsouterframe']/div[@id='detailsframe']/div[@id='details']/div[@class='nfo']/pre/node()";
NSArray *nodes = [parser searchWithXPathQuery:path];
for (TFHppleElement *element in nodes) {
    NSString *postid = [element content];   // text content only, no markup
    if (postid) {
        [texts appendString:postid];
    }
}
This returns just the plain text, and not any of the URLs for the screenshots. Is there any way to get all links and other tags, not just plain text?
The description on ThePirateBay is formatted like this:
<pre>
<a href="http://img689.imageshack.us/img689/8292/itskindofafunnystory201.jpg" rel="nofollow">
http://img689.imageshack.us/img689/8292/itskindofafunnystory201.jpg</a>
More texts about the file
</pre>
Solution
That's an easy job, and you almost had it right!
What you want is the content (or an attribute) of the <a> tag, so you need to tell the parser that you want it.
Just change your XPath to
@"//div[@id='content']/div[@id='main-content']/div/div[@id='detailsouterframe']/div[@id='detailsframe']/div[@id='details']/div[@class='nfo']/pre/a"
(You missed the a at the very end, and you do not need node(): pre/a selects only the <a> elements inside the <pre>.)
Output:
http://www.imdb.com/title/tt1904996/
http://leetleech.org/images/65823608764828593230.png
http://leetleech.org/images/44748070481477652927.png
http://leetleech.org/images/42024611449329122742.png
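With the new query, [element content] on each <a> element appears to return its link text, which on this page happens to be the URL itself. A slightly sturdier sketch reads the href attribute instead, using hpple's objectForKey: on TFHppleElement. The helper name LinkURLsInNFO and the shortened XPath //div[@class='nfo']/pre/a are assumptions for illustration; the full path above should select the same nodes on the real page.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"   // hpple, assumed to be added to the project

// Hypothetical helper: returns the href of every <a> directly inside the
// NFO <pre>. The XPath is shortened for this sketch.
static NSArray<NSString *> *LinkURLsInNFO(NSData *htmlData) {
    TFHpple *parser = [TFHpple hppleWithHTMLData:htmlData];
    NSArray *nodes = [parser searchWithXPathQuery:@"//div[@class='nfo']/pre/a"];

    NSMutableArray<NSString *> *urls = [NSMutableArray array];
    for (TFHppleElement *element in nodes) {
        // Read the attribute rather than the link text, which only
        // happens to repeat the URL on this page.
        NSString *href = [element objectForKey:@"href"];
        if (href) {
            [urls addObject:href];
        }
    }
    return urls;
}
```

You would call it with the downloaded page, for example LinkURLsInNFO(pageData), where pageData is a placeholder for the NSData of the torrent's details page.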
If you only want the screenshot URLs (everything after the first, IMDb, link), you can do something like:
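NSMutableArray *screenshotURLs = [NSMutableArray array];
// Skip index 0 (the IMDb link) and store each link's href string, not the element
for (NSUInteger i = 1; i < nodes.count; i++) {
    NSString *href = [nodes[i] objectForKey:@"href"];
    if (href) [screenshotURLs addObject:href];
}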
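Skipping index 0 relies on the IMDb link always coming first. A sketch that instead keeps only URLs whose path ends in an image extension might look like this; the function name ScreenshotURLs and the extension list (png, jpg, jpeg, gif) are assumptions, not anything the page guarantees.

```objc
#import <Foundation/Foundation.h>

// Hypothetical filter: keeps URLs whose path ends in an assumed set of
// image extensions, regardless of where they appear in the list.
static NSArray<NSString *> *ScreenshotURLs(NSArray<NSString *> *urls) {
    NSSet<NSString *> *imageExtensions =
        [NSSet setWithObjects:@"png", @"jpg", @"jpeg", @"gif", nil];
    NSMutableArray<NSString *> *result = [NSMutableArray array];
    for (NSString *urlString in urls) {
        NSURL *url = [NSURL URLWithString:urlString];
        // pathExtension is empty for paths like "/title/tt1904996/";
        // an unparsable string yields a nil URL and is skipped.
        NSString *ext = [[url pathExtension] lowercaseString];
        if (ext && [imageExtensions containsObject:ext]) {
            [result addObject:urlString];
        }
    }
    return result;
}
```

Fed the four URLs from the output above, this should keep the three leetleech.org images and drop the IMDb link, without depending on their order.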