NSXMLParser doesn't ignore CDATA

Question

I'm pretty new to iOS development, and I'm trying to parse an RSS file (XML).

Here is the XML (the feed is in a different language):

<item>
<category>General</category>
<title>killed in a tractor accident, was critically injured windsurfer</title>
<description>
<![CDATA[
<div><a href='http://www.ynet.co.il/articles/0,7340,L-4360016,00.html'><img src='http://www.ynet.co.il/PicServer3/2012/11/28/4302844/YOO_8879_a.jpg' alt='photo: Yaron Brener' title='Amona' border='0' width='116' height='116'></a></div>
]]>
Tractor driver in his 50s near Kfar Yuval flipped and trapped underneath. Room was critically injured windsurfer hurled rocks because of strong winds and wind surfer after was moderately injured in Netanya
</description>
<link>
http://www.ynet.co.il/articles/0,7340,L-4360016,00.html
</link>
<pubDate>Fri, 22 Mar 2013 17:10:15 +0200</pubDate>
<guid>
http://www.ynet.co.il/articles/0,7340,L-4360016,00.html
</guid>
<tags>Kill, car accidents, surfing</tags>
</item>

Here is my NSXMLParser code:

- (void)parserDidStartDocument:(NSXMLParser *)parser
{
    self.titles = [[NSMutableArray alloc]init];
    self.descriptions = [[NSMutableArray alloc]init];
    self.links = [[NSMutableArray alloc]init];
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict
{
    if ([elementName isEqualToString:@"item"]) {
        isItem = YES;
    }

    if ([elementName isEqualToString:@"title"]) {
        isTitle=YES;
        self.titlesString = [[NSMutableString alloc]init];
    }

    if ([elementName isEqualToString:@"description"]) {
        isDesription = YES;
        self.descriptionString = [NSMutableString string];
        self.data = [NSMutableData data];
    }



}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string{
    if(isItem && isTitle){
        [self.titlesString appendString:string];
    }
    if (isItem && isDesription) {
        if (self.descriptionString)
            [self.descriptionString appendString:string];
    }
}

- (void)parser:(NSXMLParser *)parser foundCDATA:(NSData *)CDATABlock
{
    if (self.data)
        [self.data appendData:CDATABlock];

}


- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
{
    if ([elementName isEqualToString:@"item"]) {
        isItem = NO;
        [self.titles addObject:self.titlesString];

        [self.descriptions addObject:self.descriptionString];


    }

    if ([elementName isEqualToString:@"title"]) {
        isTitle=NO;

    }
    if ([elementName isEqualToString:@"description"]) {

        NSString *result = [self.descriptionString stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
        NSLog(@"string=%@", result);


        if ([self.data length] > 0)
        {
            NSString *htmlSnippet = [[NSString alloc] initWithData:self.data encoding:NSUTF8StringEncoding];
            NSString *imageSrc = [self firstImgUrlString:htmlSnippet];
            NSLog(@"img src=%@", imageSrc);
            [self.links addObject:imageSrc];
        }



        self.descriptionString = nil;
        self.data = nil;
    }


}

- (NSString *)firstImgUrlString:(NSString *)string
{
    NSError *error = NULL;
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(<img\\s[\\s\\S]*?src\\s*?=\\s*?['\"](.*?)['\"][\\s\\S]*?>)+?"
                                                                           options:NSRegularExpressionCaseInsensitive
                                                                             error:&error];

    NSTextCheckingResult *result = [regex firstMatchInString:string
                                                     options:0
                                                       range:NSMakeRange(0, [string length])];

    if (result)
        return [string substringWithRange:[result rangeAtIndex:2]];

    return nil;
}

@end

As I said, I'm pretty new to iPhone development. I spent several hours looking for a way to solve this but found nothing, so I decided to open a topic and ask a few questions:

One. The parser does not ignore the CDATA; it just parses everything. Why is this happening? As you can see, the description text itself is not inside the CDATA section, yet I get the CDATA content along with it, even when I'm not using foundCDATA:(NSData *)CDATABlock.

Two. I want to get the image link. How do I do that? I searched online and found a lot of guides that say only to use the method foundCDATA:(NSData *)CDATABlock, but how is it used? Is the way I used it in my code correct?

Please, I need an explanation so I can understand. Thank you!

Recommended answer

To answer your two questions:

  1. If you have implemented foundCDATA, the parser will deliver the description's CDATA to that method, and not to foundCharacters. If, on the other hand, you have not implemented foundCDATA, the CDATA will be delivered to foundCharacters. So, if you don't want foundCharacters to receive the CDATA, you have to implement foundCDATA.

  2. If you want to extract the img URL, you have to parse the HTML you received somehow. You can use Hpple, but I might just be inclined to use a regular expression:

- (NSString *)firstImgUrlString:(NSString *)string
{
    NSError *error = NULL;
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(<img\\s[\\s\\S]*?src\\s*?=\\s*?['\"](.*?)['\"][\\s\\S]*?>)+?"
                                                                           options:NSRegularExpressionCaseInsensitive
                                                                             error:&error];

    NSTextCheckingResult *result = [regex firstMatchInString:string
                                                     options:0
                                                       range:NSMakeRange(0, [string length])];

    if (result)
        return [string substringWithRange:[result rangeAtIndex:2]];

    return nil;
}
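
As a quick check, the same pattern can be run against HTML like the snippet in the question's CDATA block. This is only a minimal sketch: FirstImgURLString is a hypothetical free-function version of the method above, and the nil check before adding to the array is an addition, made because -addObject: raises an exception when passed nil and the function returns nil whenever no img tag is found.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical free-function version of firstImgUrlString: (same pattern as above).
static NSString *FirstImgURLString(NSString *html)
{
    NSError *error = nil;
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(<img\\s[\\s\\S]*?src\\s*?=\\s*?['\"](.*?)['\"][\\s\\S]*?>)+?"
                                                                           options:NSRegularExpressionCaseInsensitive
                                                                             error:&error];
    if (!regex) {
        NSLog(@"regex error=%@", error);
        return nil;
    }

    NSTextCheckingResult *result = [regex firstMatchInString:html
                                                     options:0
                                                       range:NSMakeRange(0, [html length])];

    // Capture group 2 is the value of the src attribute.
    return result ? [html substringWithRange:[result rangeAtIndex:2]] : nil;
}

int main(void)
{
    @autoreleasepool {
        NSString *html = @"<div><a href='http://www.ynet.co.il/articles/0,7340,L-4360016,00.html'>"
                         @"<img src='http://www.ynet.co.il/PicServer3/2012/11/28/4302844/YOO_8879_a.jpg' border='0'></a></div>";
        NSMutableArray *links = [NSMutableArray array];

        // Only store the URL when one was actually found.
        NSString *imageSrc = FirstImgURLString(html);
        if (imageSrc)
            [links addObject:imageSrc];

        // A snippet without an img tag yields nil, so nothing is added.
        NSString *missing = FirstImgURLString(@"<p>no image here</p>");
        if (missing)
            [links addObject:missing];

        NSLog(@"links=%@", links);
    }
    return 0;
}
```

The same guard would apply anywhere the result is stored in a collection, such as the links array in the question's didEndElement.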

Also see my other Stack Overflow answer, in which I demonstrate both Hpple and regex solutions.

As an example, here are the NSXMLParserDelegate methods that will parse the description, putting the text (excluding the CDATA) in one field and the image URL from the CDATA in another variable. You'll have to modify them to fit your process, but hopefully this gives you the basic idea:

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict
{
    if ([elementName isEqualToString:@"description"])
    {
        self.string = [NSMutableString string];
        self.data = [NSMutableData data];
    }
}

- (void)parser:(NSXMLParser *)parser parseErrorOccurred:(NSError *)parseError
{
    NSLog(@"%s, parseError=%@", __FUNCTION__, parseError);
}

// In my standard NSXMLParser routine, I leave self.string `nil` when not parsing 
// a particular element, and initialize it if I am parsing. I do it this way
// so only my `didStartElement` and `didEndElement` need to worry about the particulars
// and my `foundCharacters` and `foundCDATA` are simplified. But do it however you
// want.

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
    if (self.string)
        [self.string appendString:string];
}

- (void)parser:(NSXMLParser *)parser foundCDATA:(NSData *)CDATABlock
{
    if (self.data)
        [self.data appendData:CDATABlock];
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
{
    if ([elementName isEqualToString:@"description"])
    {
        // get the text (non-CDATA) portion

        // you might want to get rid of the leading and trailing whitespace

        NSString *result = [self.string stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
        NSLog(@"string=%@", result);

        // get the img out of the CDATA

        if ([self.data length] > 0)
        {
            NSString *htmlSnippet = [[NSString alloc] initWithData:self.data encoding:NSUTF8StringEncoding];
            NSString *imageSrc = [self firstImgUrlString:htmlSnippet];
            NSLog(@"img src=%@", imageSrc);
        }

        // once I've saved the data where I want to save it, I `nil` out my
        // `string` and `data` properties:

        self.string = nil;
        self.data = nil;
    }
}
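
To see the split between foundCharacters and foundCDATA in isolation, the delegate methods above can be wrapped in a small command-line program. This is a sketch under assumptions, not the original project's code: the class name DescriptionParser, the properties descriptionText and descriptionHTML, and the shortened sample XML are all made up for illustration.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical delegate class, named for this example only.
@interface DescriptionParser : NSObject <NSXMLParserDelegate>
@property (nonatomic, copy) NSString *descriptionText;   // trimmed non-CDATA text
@property (nonatomic, copy) NSString *descriptionHTML;   // contents of the CDATA block
@property (nonatomic, strong) NSMutableString *string;   // nil unless inside <description>
@property (nonatomic, strong) NSMutableData *data;       // nil unless inside <description>
@end

@implementation DescriptionParser

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict
{
    if ([elementName isEqualToString:@"description"]) {
        self.string = [NSMutableString string];
        self.data = [NSMutableData data];
    }
}

// Receives ordinary character data; messaging nil is a no-op outside <description>.
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
    [self.string appendString:string];
}

// Because this method is implemented, CDATA arrives here, not in foundCharacters.
- (void)parser:(NSXMLParser *)parser foundCDATA:(NSData *)CDATABlock
{
    [self.data appendData:CDATABlock];
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
{
    if ([elementName isEqualToString:@"description"]) {
        self.descriptionText = [self.string stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
        self.descriptionHTML = [[NSString alloc] initWithData:self.data encoding:NSUTF8StringEncoding];
        self.string = nil;
        self.data = nil;
    }
}

@end

int main(void)
{
    @autoreleasepool {
        // Shortened sample based on the item in the question.
        NSString *xml = @"<item><description>"
                        @"<![CDATA[<div><img src='http://www.ynet.co.il/PicServer3/2012/11/28/4302844/YOO_8879_a.jpg'></div>]]>\n"
                        @"Tractor driver in his 50s near Kfar Yuval flipped and trapped underneath.\n"
                        @"</description></item>";

        NSXMLParser *parser = [[NSXMLParser alloc] initWithData:[xml dataUsingEncoding:NSUTF8StringEncoding]];
        DescriptionParser *delegate = [[DescriptionParser alloc] init];
        parser.delegate = delegate;   // the parser does not retain its delegate

        if ([parser parse]) {         // parse runs synchronously
            NSLog(@"text=%@", delegate.descriptionText);
            NSLog(@"html=%@", delegate.descriptionHTML);
        } else {
            NSLog(@"parse error=%@", parser.parserError);
        }
    }
    return 0;
}
```

With foundCDATA implemented, the logged text should hold only the sentence after the CDATA section, and the HTML should hold only the div. The HTML string can then be passed to firstImgUrlString: to pull out the image URL.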
