Extract only the text from HTML content in Objective-C


Question

All the strip functions I have found extract the HTML elements from the HTML content rather than the text. I am looking for a simple Objective-C function that, given a nested block of markup like:

<table border="0" cellpadding="2" cellspacing="7" style="vertical-align:top;"><tr><td width="80" align="center" valign="top"><font style="font-size:85%;font-family:arial,sans-serif"></font></td><td valign="top" class="j"><font style="font-size:85%;font-family:arial,sans-serif"><br /><div style="padding-top:0.8em;"><img alt="" height="1" width="1" /></div><div class="lh"><a href="http://news.google.com/news/url?sa=t&amp;fd=R&amp;usg=AFQjCNFV5azq03nECHSmTV0CI-KwzBFXWA&amp;url=http://www.fool.com/investing/general/2012/03/11/the-justice-department-has-apples-number.aspx"><b>The Justice Department Has ordered <b>Apple</b> .... 

returns only: The Justice Department Has ordered Apple ....

I know there is a UIWebView JavaScript function that does this, but it seems a little slow because it relies on JavaScript. Is there a function that, given HTML with nested tags, ignores all the tags and returns only the plain text content?

Thanks,
Ross

Recommended answer

Split the string on angle brackets, take every other element (the pieces that fall outside the tags), and join them back together:

// Split on both angle brackets; the pieces alternate between text and tag contents.
NSArray *components = [yourString componentsSeparatedByCharactersInSet:[NSCharacterSet characterSetWithCharactersInString:@"<>"]];

// Even indexes hold the text between tags; odd indexes hold the tag contents.
NSMutableArray *componentsToKeep = [NSMutableArray array];
for (NSUInteger i = 0; i < [components count]; i += 2) {
    [componentsToKeep addObject:[components objectAtIndex:i]];
}

NSString *plainText = [componentsToKeep componentsJoinedByString:@""];
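
Note that the split approach relies on every `<` being paired with a `>`: a stray angle bracket in the text shifts the even/odd alternation, and entities such as `&amp;` are left undecoded. As a rough alternative sketch (not part of the original answer), the same stripping can be done with NSScanner, which skips each tag explicitly instead of counting pieces. The function name `PlainTextFromHTML` is hypothetical, and entity decoding is still not handled here.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original answer): returns the text
// found outside of <...> tags. Entities such as &amp; are left as-is.
static NSString *PlainTextFromHTML(NSString *html) {
    NSScanner *scanner = [NSScanner scannerWithString:html];
    // NSScanner skips whitespace and newlines by default; keep them instead.
    scanner.charactersToBeSkipped = nil;

    NSMutableString *result = [NSMutableString string];
    while (![scanner isAtEnd]) {
        NSString *text = nil;
        // Collect everything up to the next tag opening.
        if ([scanner scanUpToString:@"<" intoString:&text]) {
            [result appendString:text];
        }
        // Skip the tag itself: "<", its contents, and the closing ">".
        if ([scanner scanString:@"<" intoString:NULL]) {
            [scanner scanUpToString:@">" intoString:NULL];
            [scanner scanString:@">" intoString:NULL];
        }
    }
    return result;
}
```

Calling `PlainTextFromHTML(@"<div class=\"lh\"><b>The Justice Department Has ordered <b>Apple</b></b></div>")` yields the string `The Justice Department Has ordered Apple`. A `>` that appears outside a tag is kept as ordinary text rather than breaking the alternation.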
