How to extract text content from HTML like the Read It Later or Instapaper iPhone apps?

Question

I want to extract the main article content from HTML in my iPhone app and display it in a text view or with Core Text.

The Read It Later and Instapaper iPhone apps have this feature, but after researching on the web, I still can't tell how they do it.

At the moment, I get the text content from the HTML with this code, but it also picks up a lot of unwanted content.

textArticle = [webView stringByEvaluatingJavaScriptFromString:@"document.body.innerText"];
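One way to cut down the unwanted content is to score containers by how much paragraph text they hold, rather than taking the whole body. The following is only a minimal sketch of that idea, not the algorithm Instapaper or Read It Later actually uses. The function name `extractMainText` and the 25-character threshold are assumptions made for illustration.

```javascript
// Sketch of a simple text-density heuristic (an assumption, not a known
// production algorithm): the element whose direct <p> children contain the
// most text is treated as the article body.
function extractMainText(doc) {
  var parents = []; // candidate container elements
  var entries = []; // entries[i] holds the score and text for parents[i]
  var paragraphs = doc.querySelectorAll('p');

  for (var i = 0; i < paragraphs.length; i++) {
    var p = paragraphs[i];
    var text = (p.textContent || '').trim();
    // Skip short fragments such as menu items; 25 is an arbitrary threshold.
    if (text.length < 25) continue;

    var idx = parents.indexOf(p.parentNode);
    if (idx === -1) {
      parents.push(p.parentNode);
      entries.push({ length: 0, parts: [] });
      idx = parents.length - 1;
    }
    entries[idx].length += text.length;
    entries[idx].parts.push(text);
  }

  // Pick the container with the largest accumulated paragraph text.
  var best = null;
  for (var j = 0; j < entries.length; j++) {
    if (best === null || entries[j].length > best.length) best = entries[j];
  }
  return best ? best.parts.join('\n\n') : '';
}
```

In the app, the function source followed by `extractMainText(document)` could be passed as the script string in place of `document.body.innerText`. Pages that do not wrap their text in `<p>` elements would need a different heuristic.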

This question asks for what I want, but unfortunately it is not about iPhone apps: "Instapaper-like algorithm".

This is an open-source project for this kind of feature, but I am not sure whether I can use it in an iPhone app: https://github.com/jiminoc/goose/wiki

It seems Smartr used to provide an API for this, but it is no longer available: http://smartrmobi.blogspot.com/2011/02/smartr-api-withdrawn-until-further.html

Maybe the easiest way to do this is to get the article content from an XML element, but that is only my guess.

I would like to know where to start, so I'd really appreciate any suggestions.

Thanks

Recommended answer

After some research, it seems I can use a web API to extract the text content from a page. This means that once I have the URL, I need to make a network request and then render the returned result.
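As a rough sketch of that request step, the article URL is typically passed to the extraction service as a query parameter. The endpoint `https://extractor.example.com/api` and the parameter name `url` below are hypothetical placeholders, not the real interface of any service mentioned here; the actual values have to come from the chosen service's documentation.

```javascript
// Hypothetical endpoint and parameter name, used only for illustration.
var EXTRACTOR_ENDPOINT = 'https://extractor.example.com/api';

// Build the request URL for an extraction service from an article URL.
function buildExtractionRequestUrl(articleUrl) {
  // Percent-encode the article URL so that its own "?" and "&" are not
  // confused with the query string of the API request.
  return EXTRACTOR_ENDPOINT + '?url=' + encodeURIComponent(articleUrl);
}
```

Encoding matters most for multi-page or tracked links, where the article URL carries its own query string.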

This is slower than just running the JavaScript shown above, because it requires a call to a web API, but I guess Read It Later and Instapaper both use this approach.

http://viewtext.org/

This API has a very nice feature: it combines multi-page articles into one. I am using this API because other APIs do not have this feature.

http://fivefilters.org/content-only/

The great thing about this one is that you can buy the script and set it up on your own server.

*UPDATE*

It seems that most apps use the "Readability", "Instapaper", or "Google" mobilizer to extract only the text content from a web page.

Among them, my favorite at the moment is the "Readability" parser, since it does not come with advertisements like the Instapaper parser does. (There is nothing wrong with showing ads to cover server costs, though.)

Pocket also provides an article parser, but only to developers who are building Pocket-integrated apps.
