Is there a way to get all text from the rendered page with JS?


Question

Is there a way (unobtrusive to the user) to get all the text in a page with JavaScript? I could get the HTML, parse it, remove all the tags, etc., but I'm wondering if there's a way to get the text from the already rendered page.

To clarify, I don't want to grab text from a selection; I want the entire page.

Thanks!

Recommended answer

All credit to Greg W's answer, since I based this answer on his code, but I found that for a website without inline style or script tags it was generally simpler to use:

var theText = $('body').text();

as this grabs all the text in all tags without having to manually list every tag that might contain text.

Also, if you're not careful, listing the tags manually tends to create duplicated text in the output: the each function often ends up visiting tags nested inside other matched tags, so it grabs the same text twice. Using a single selector that covers all the tags we want to grab text from circumvents this issue.

The caveat is that if there are inline style or script tags within the body tag, it will grab their contents too.
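
If the body does contain script or style tags, one possible workaround is to walk the DOM once and skip those elements. The sketch below is only an illustration, not code from Greg W's answer: `collectText` and `SKIPPED_TAGS` are hypothetical names, and it relies only on the standard `nodeType`, `nodeName`, `nodeValue`, and `childNodes` properties. Because each text node is visited exactly once, it should also avoid the duplicated text described above.

```javascript
// Tags whose contents are code or CSS rather than page text (an assumed list).
const SKIPPED_TAGS = new Set(['SCRIPT', 'STYLE', 'NOSCRIPT']);

// Recursively gather the values of all text nodes under `node`,
// ignoring any subtree rooted at a skipped tag.
function collectText(node, parts = []) {
  if (node.nodeType === 3) {
    // nodeType 3 is a text node
    parts.push(node.nodeValue);
  } else if (node.nodeType === 1 && !SKIPPED_TAGS.has(node.nodeName.toUpperCase())) {
    // nodeType 1 is an element: descend into its children
    for (const child of node.childNodes) {
      collectText(child, parts);
    }
  }
  return parts;
}

// Only runs in a browser, where `document` exists.
if (typeof document !== 'undefined') {
  const theText = collectText(document.body).join('');
  console.log(theText);
}
```

Note that this only filters by tag name: it does not know about elements hidden with CSS, and it does not insert line breaks between blocks.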

After reading this article about innerText, I now think the absolute best way to get the text is plain ol' vanilla JS:

document.body.innerText

As is, this is not reliable cross-browser, but in controlled environments it returns the best results. Read the article for more details.

This method usually formats the text in a more readable manner and does not include style or script tag contents in the output.
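
For environments where innerText can't be relied on, a fallback along these lines might help. This is a minimal sketch under stated assumptions: `getPageText` is a hypothetical helper name, and falling back to the standard `textContent` property is my choice rather than something the article prescribes.

```javascript
// Prefer innerText when the element exposes it as a string;
// otherwise fall back to textContent.
function getPageText(root) {
  if (typeof root.innerText === 'string') {
    return root.innerText;
  }
  // textContent is not layout-aware and includes script/style contents.
  return root.textContent || '';
}

// Only runs in a browser, where `document` exists.
if (typeof document !== 'undefined') {
  console.log(getPageText(document.body));
}
```

Keep in mind that the two properties are not equivalent, so the fallback path loses the readable formatting and will include script and style contents.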
