Get all the visible text (text, button labels, drop-down lists, etc.) from a page using JavaScript/jQuery into a variable

Question

I need to capture the visible text displayed on a webpage:

  • Button labels
  • Drop-down list values
  • Text
  • Field labels (input box labels, radio button labels), etc.

Basically, I want to capture everything that is displayed in the UI, without HTML tags.

Using tagName("body").getText() captures individual fields.

I need this for the entire page, irrespective of the ID/class assigned.

Recommended answer

You can use a TreeWalker to target only text nodes and extract their textContent property.

You would have to filter out the text nodes that belong to <script> and <style> elements, though, since I guess you don't want those (otherwise a simple document.body.textContent would do).

function getDisplayedText(sourceElement) {
  // we need to filter out textContent of script and style elements
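  var filterNodes = function(node) {
    var parent = node.parentElement;
    var t = parent ? parent.tagName : '';
    // explicitly reject text inside <script> and <style>, accept everything else
    return (t === 'SCRIPT' || t === 'STYLE')
      ? NodeFilter.FILTER_REJECT
      : NodeFilter.FILTER_ACCEPT;
  };
  filterNodes.acceptNode = filterNodes; // IE doesn't like {acceptNode:...} object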
  
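  // input value hack: copy each input's current value into a child text node
  // so the walker below picks it up (note that this mutates the DOM)
  sourceElement.querySelectorAll('input:not([type="file"]):not([type="color"]):not([type="checkbox"])')
  .forEach(function(i){
    i.textContent = ''; // remove text nodes added by previous calls
    i.appendChild(document.createTextNode(i.value));
  });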
  var treeWalker = document.createTreeWalker(
    sourceElement, // walk from the sourceElement
    NodeFilter.SHOW_TEXT, // walk only through text nodes
    filterNodes,
    false
  );

  var str = ''; // will hold all our text nodes
  while (treeWalker.nextNode())
    str += treeWalker.currentNode.textContent;

  return str;
}
console.log(getDisplayedText(document.body));

<!-- Dummy content -->
<div id="Content">
  <div id="Panes">
    <div>
      <h2>What is Lorem Ipsum?</h2>
      <p><strong>Lorem Ipsum</strong> is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type
        specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more
        recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.</p>
    </div>
    <script>
      var shoulNotBeInTheResult = this;
    </script>
    <div>
      <h2>Why do we use it?</h2>
      <p>It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout. The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using 'Content
        here, content here', making it look like readable English. Many desktop publishing packages and web page editors now use Lorem Ipsum as their default model text, and a search for 'lorem ipsum' will uncover many web sites still in their infancy.
        Various versions have evolved over the years, sometimes by accident, sometimes on purpose (injected humour and the like).</p>
    </div><br>
    <style>
      .shouldNotBeThereEither {}
    </style>
    <div>
      <h2>Gimme an <input type="text" value="Example"></h2>
      <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Fusce tincidunt sit amet arcu a mollis. Aliquam nunc nisl, aliquam sed consequat fermentum, pulvinar vitae risus. Nullam aliquam semper sodales. Vivamus dictum nisl risus, sed dignissim mauris
        accumsan et. Integer nec mi ipsum.</p>
      <p> Nullam volutpat tristique sapien, non rutrum dolor porta ut. Nam commodo ultricies magna non auctor. Aenean nec hendrerit libero. Sed scelerisque a dolor sed commodo. Vivamus maximus libero ut elementum viverra. Donec ut massa quam. Nullam vitae
        nisl libero. Nullam turpis odio, convallis lacinia leo quis, viverra lacinia lectus. Mauris id turpis consectetur leo ultrices cursus at sit amet sapien. Vestibulum convallis arcu ipsum, sed vulputate risus condimentum at.</p>
    </div>
  </div>
</div>
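
To make the filtering rule easier to check outside a browser, here is a minimal sketch of the same idea written as a plain recursive walk instead of a TreeWalker. It is not the answer's method, only an illustration of it. The names collectText, el, text and mockBody are hypothetical, and the plain objects built by el and text are stand-ins that mimic only the few DOM properties the function reads (nodeType, tagName, childNodes, nodeValue). It assumes tag names are uppercase, as they are for elements in an HTML document.

```javascript
// Node-type constants, matching the DOM's Node.ELEMENT_NODE and Node.TEXT_NODE.
var ELEMENT_NODE = 1;
var TEXT_NODE = 3;

// Recursively concatenate the text nodes under `node`,
// skipping everything inside <script> and <style> elements.
function collectText(node) {
  if (node.nodeType === TEXT_NODE) {
    return node.nodeValue;
  }
  if (node.nodeType !== ELEMENT_NODE) {
    return ''; // comments and other non-element nodes carry no displayed text
  }
  if (node.tagName === 'SCRIPT' || node.tagName === 'STYLE') {
    return ''; // same rule as the TreeWalker filter
  }
  var out = '';
  for (var i = 0; i < node.childNodes.length; i++) {
    out += collectText(node.childNodes[i]);
  }
  return out;
}

// Hypothetical helpers that build plain-object stand-ins for DOM nodes,
// so the sketch can run without a browser.
function el(tagName, childNodes) {
  return { nodeType: ELEMENT_NODE, tagName: tagName, childNodes: childNodes || [] };
}
function text(value) {
  return { nodeType: TEXT_NODE, nodeValue: value };
}

var mockBody = el('BODY', [
  el('H2', [text('Why do we use it?')]),
  el('SCRIPT', [text('var shouldNotBeInTheResult = this;')]),
  el('STYLE', [text('.shouldNotBeThereEither {}')]),
  el('BUTTON', [text('Submit')])
]);

var result = collectText(mockBody);
console.log(result); // prints "Why do we use it?Submit"
```

Because collectText reads only standard node properties, it should also accept a real element such as document.body in a browser, although it does not include the input value hack, so the values typed into input fields would be missing from its result.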
