Parsing HTML with NSRegularExpression


Problem description

I'm trying to parse an HTML page using NSRegularExpression. The page is a repetition of this HTML code:

<div class="fact" id="fact66">STRING THAT I WANT</div> <div class="vote">
<a href="index.php?p=detail_fact&fact=106">#106</a> &nbsp; &nbsp; 
<span id="p106">246080 / 8.59  </span> &nbsp; &nbsp;
<span id="f106" class="vote2">
<a href="#" onclick="xajax_voter(106,3); return false;">(+++)</a> 
<a href="#" onclick="xajax_voter(106,2); return false;">(++)</a>  
<a href="#" onclick="xajax_voter(106,1); return false;">(+)</a> 
<a href="#" onclick="xajax_berk(106); return false;">(-)</a></span>
<span id="ve106"></span>
</div>

So, I'd like to get the string between the div tags:

 <div class="fact" id="fact66">STRING THAT I WANT</div>

So I made a regex that looks like this:

<div class="fact" id="fact[0-9].*\">(.*)</div>

Now, in my code, I implement it using this:

    NSString *htmlString = [NSString stringWithContentsOfURL:[NSURL URLWithString:@"http://www.myurl.com"] encoding:NSASCIIStringEncoding error:nil];
    NSRegularExpression* myRegex = [[NSRegularExpression alloc] initWithPattern:@"<div class=\"fact\" id=\"fact[0-9].*\">(.*)</div>\n" options:0 error:nil];
    [myRegex enumerateMatchesInString:htmlString options:0 range:NSMakeRange(0, [htmlString length]) usingBlock:^(NSTextCheckingResult *match, NSMatchingFlags flags, BOOL *stop) {
        NSRange range = [match rangeAtIndex:1];
        NSString *string =[htmlString substringWithRange:range];
        NSLog(string);
    }];

But it returns nothing... I tested my regex in Java and PHP and it works great. What am I doing wrong?

Thanks

Recommended answer

Try using this regex:

 @"<div class=\"fact\" id=\"fact[0-9]*\">([^<]*)</div>"

The regex:

fact[0-9].*

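means: the literal text fact, followed by exactly one digit between 0 and 9, followed by any character repeated any number of times. Because that .* is greedy, it can run well past the end of the id attribute. Writing fact[0-9]* instead matches fact followed by any number of digits and nothing else.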
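As a rough illustration of how the suggested pattern could be dropped into the code from the question, here is a minimal sketch. The helper name `ExtractFacts` is hypothetical and not part of the original post. Note that the suggested pattern also omits the trailing `\n` from the original: in the sample HTML as shown, `</div>` is followed by ` <div class="vote">` rather than a newline, which by itself would likely prevent a match.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post): returns the text of every
// <div class="fact" id="factN"> element found in `html`.
static NSArray<NSString *> *ExtractFacts(NSString *html) {
    NSError *error = nil;
    // fact[0-9]* allows any number of digits; [^<]* stops at the next tag.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"<div class=\"fact\" id=\"fact[0-9]*\">([^<]*)</div>"
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        // Surface pattern errors instead of passing error:nil and failing silently.
        NSLog(@"Invalid pattern: %@", error);
        return @[];
    }

    NSMutableArray<NSString *> *facts = [NSMutableArray array];
    [regex enumerateMatchesInString:html
                            options:0
                              range:NSMakeRange(0, html.length)
                         usingBlock:^(NSTextCheckingResult *match, NSMatchingFlags flags, BOOL *stop) {
        // Capture group 1 holds the text between the opening and closing tags.
        NSRange range = [match rangeAtIndex:1];
        if (range.location != NSNotFound) {
            [facts addObject:[html substringWithRange:range]];
        }
    }];
    return facts;
}
```

When printing the results, `NSLog(@"%@", fact)` is safer than `NSLog(fact)`, because any `%` in the scraped text would otherwise be interpreted as a format specifier.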

I also suggest using:

([^<]*)

instead of

(.*)


to match the text between the two div tags, which sidesteps regex greediness. Alternatively:

(.*?)

(The ? makes the quantifier non-greedy, so the match stops at the first instance of </div>.)
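To make the difference between the three capture groups concrete, the following sketch runs each one against a made-up one-line input containing two divs. The helper `FirstCapture` and the constant `kSample` are assumptions introduced only for this illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns capture group 1 of the first match of
// `pattern` in `input`, or nil if the pattern is invalid or does not match.
static NSString *FirstCapture(NSString *pattern, NSString *input) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:NULL];
    NSTextCheckingResult *match =
        [regex firstMatchInString:input options:0 range:NSMakeRange(0, input.length)];
    if (match == nil || [match rangeAtIndex:1].location == NSNotFound) {
        return nil;
    }
    return [input substringWithRange:[match rangeAtIndex:1]];
}

// Made-up input: two divs on a single line, so .* can run across both.
static NSString *const kSample = @"<div>first</div> <div>second</div>";
```

With this input, `FirstCapture(@"<div>(.*)</div>", kSample)` returns `first</div> <div>second`, because the greedy `.*` extends to the last `</div>` on the line. Both `FirstCapture(@"<div>(.*?)</div>", kSample)` and `FirstCapture(@"<div>([^<]*)</div>", kSample)` return `first`. The two are not interchangeable in general: `[^<]*` refuses to cross any nested tag, while `.*?` will cross nested tags but, by default, not line breaks.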
