How to detect the main article tag the way Evernote Clipper does


Problem description



When I tried the Evernote Clipper extension, I noticed a very useful feature: when I clicked "Article", it picked out the correct main content of the page. You can see this by using Evernote Clipper on the page https://developer.chrome.com/extensions/api_index

I looked at the main article that Evernote pulled out on several pages, and the article is in fact extracted from the first `article` tag. However, Evernote Clipper still works well on pages that don't use that kind of tag.
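The behavior described above (take the first `<article>` element when the page has one, otherwise fall back to something else) can be sketched as follows. This is a minimal sketch, not Evernote Clipper's actual code: it assumes a hypothetical `SimpleNode` tree type in place of real DOM elements so it runs outside a browser, and `findFirst` and `pickMainNode` are illustrative names. In a browser, `document.querySelector("article")` performs the same first-match lookup in document order.

```typescript
// Hypothetical minimal tree type standing in for DOM elements,
// so the sketch runs without a browser.
interface SimpleNode {
  tag: string;             // lowercase tag name, e.g. "article", "div"
  text?: string;           // direct text content of this node, if any
  children?: SimpleNode[];
}

// Depth-first search in document order for the first node with the given tag
// (the same order document.querySelector uses).
function findFirst(node: SimpleNode, tag: string): SimpleNode | null {
  if (node.tag === tag) return node;
  for (const child of node.children ?? []) {
    const hit = findFirst(child, tag);
    if (hit) return hit;
  }
  return null;
}

// Prefer the first <article>; otherwise defer to a caller-supplied heuristic.
function pickMainNode(
  root: SimpleNode,
  fallback: (root: SimpleNode) => SimpleNode,
): SimpleNode {
  return findFirst(root, "article") ?? fallback(root);
}

// Usage: the first of two <article> elements wins.
const articlePage: SimpleNode = {
  tag: "body",
  children: [
    { tag: "nav", text: "Home | Docs" },
    { tag: "article", text: "Main story" },
    { tag: "article", text: "Related story" },
  ],
};
const mainNode = pickMainNode(articlePage, (r) => r);
console.log(mainNode.text); // prints: Main story
```

The fallback is left as a parameter on purpose, since the question notes that Clipper also handles pages without `<article>`; any content-scoring heuristic can be plugged in there.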

I wonder how Evernote Clipper does that. Is there any JS library that can detect the tag containing the main content of a page? Could you give me some advice on how to do it?

Thank you in advance!

Solution

To my knowledge, there is no universal JS library that does this. Evernote Clipper uses its own method to extract the "interesting" content from a web page. You can read the Evernote Clipper's code to try to understand the process.

On my Mac, the path to the Chrome extension is:

~/Library/Application Support/Google/Chrome/Default/Extensions/pioclpoplcdbaefihamjohnefbikjilc/6.2_0/

Here's another tool that works in much the same way: https://www.readability.com/

You can also check this thread: "What algorithm does Readability use for extracting text from URLs?"

Or search Google for terms like 'content extraction js lib'. (I found this one: https://github.com/hatena/extract-content-javascript)
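Extractors like these generally rank parts of the page with scoring heuristics. As a rough illustration only (not the actual algorithm of Readability or of extract-content-javascript), here is a simple text-density sketch: each node is scored by the non-link text in its direct `<p>` children, and the highest-scoring node is taken as the main content. `SimpleNode`, `textLength`, `score`, and `bestCandidate` are hypothetical names, and the tree type stands in for real DOM elements so the example runs outside a browser.

```typescript
// Hypothetical minimal tree type standing in for DOM elements.
interface SimpleNode {
  tag: string;             // lowercase tag name, e.g. "div", "p", "a"
  text?: string;           // direct text content of this node, if any
  children?: SimpleNode[];
}

// Total trimmed text length in a subtree; <a> subtrees count as zero when
// skipLinks is true, so link-heavy blocks score low.
function textLength(node: SimpleNode, skipLinks: boolean): number {
  if (skipLinks && node.tag === "a") return 0;
  let n = (node.text ?? "").trim().length;
  for (const c of node.children ?? []) n += textLength(c, skipLinks);
  return n;
}

// Score a node by the non-link text of its direct <p> children.
function score(node: SimpleNode): number {
  return (node.children ?? [])
    .filter((c) => c.tag === "p")
    .reduce((sum, p) => sum + textLength(p, true), 0);
}

// Visit every node and return the highest-scoring one (the root if none scores).
function bestCandidate(root: SimpleNode): SimpleNode {
  let best = root;
  let bestScore = score(root);
  const stack: SimpleNode[] = [...(root.children ?? [])];
  while (stack.length > 0) {
    const node = stack.pop()!;
    const s = score(node);
    if (s > bestScore) {
      best = node;
      bestScore = s;
    }
    stack.push(...(node.children ?? []));
  }
  return best;
}

// Usage: a navigation block made of links vs. a block of real paragraphs.
const densityPage: SimpleNode = {
  tag: "body",
  children: [
    { tag: "div", children: [{ tag: "p", children: [{ tag: "a", text: "Home" }, { tag: "a", text: "Docs" }] }] },
    {
      tag: "div",
      children: [
        { tag: "p", text: "First paragraph of the story." },
        { tag: "p", text: "Second paragraph with more detail." },
      ],
    },
  ],
};
console.log(bestCandidate(densityPage) === densityPage.children![1]); // prints: true
```

Scoring only direct `<p>` children keeps the root from always winning (its subtree contains all the text), and skipping `<a>` text pushes link-heavy navigation blocks down. Real extractors use many more signals, so treat this as an illustration of the idea rather than a drop-in solution.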

Hope this helps
