Fetch API: Get title, keywords and body text from http response

Views: 126 | Published: 2019/6/12 12:56:25 | Tags: javascript, web-scraping

Question

I want to know the best way to get the title, the keywords, and the content visible to the user from responseText when using the Fetch API (see: Is there a way to not send cookies when making an XMLHttpRequest on the same origin?).

At the moment, I use regular expressions to get the title from the response text, for example:

// A regex literal keeps \s intact; inside a string literal, "\s" collapses to a plain "s".
var re_title = /<title>\s*(.*?)\s*<\/title>/i;
var title = re_title.exec(responseText);
if (title) {
    title = title[1];
}

To get the content of the keywords meta tag, I need several more regular expressions.

For the content visible to the user, I don't want tags such as script or div, and I don't want the text between script tags either. The goal is to keep only the meaningful words in the body of the response.

I think (and various Stack Overflow posts agree) that regular expressions are just not the right approach for this. What could be the alternative?

Solution

As zzzzBov mentioned, you can use your browser's implementation of the DOMParser API to achieve this by parsing the response.text() of a fetch request. Here's an example that sends such a request for itself and parses the title, keywords, and body text:

<!DOCTYPE html>
<html>

<head>
  <title>This is the page title</title>
  <meta charset="UTF-8">
  <meta name="description" content="Free Web Help">
  <meta name="keywords" content="HTML,CSS,XML,JavaScript">
  <script>
    // Fetch this page's own HTML, then parse it into a separate Document.
    fetch("https://dl.dropboxusercontent.com/u/76726218/so.html")
      .then(function(response) {
        return response.text();
      })
      .then(function(responseText) {
        var parsedResponse = (new window.DOMParser()).parseFromString(responseText, "text/html");
        // Write with textContent so the fetched text is never interpreted as markup.
        document.getElementById("title").textContent = "Title: " + parsedResponse.title;
        document.getElementById("keywords").textContent = "Keywords: " + parsedResponse.getElementsByName("keywords")[0].getAttribute("content");
        // body.textContent drops the tags and keeps only the text nodes.
        document.getElementById("visibleText").textContent = "Visible Text: " + parsedResponse.getElementsByTagName("body")[0].textContent;
      });
  </script>
</head>

<body>

  <div>This text is visible to the user.</div>
  <div>So <i>is</i>  <b>this</b>.</div>
  <hr>
  <b>Results:</b>
  <ul id="results">
    <li id="title"></li>
    <li id="keywords"></li>
    <li id="visibleText"></li>
  </ul>

</body>

</html>
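
One caveat with the example above: textContent on the body also returns the source of any inline script or style elements inside the body, which the question wanted excluded, and getElementsByName("keywords")[0] is undefined when the page has no keywords meta tag. The following is a minimal sketch of one way to handle both cases, not part of the original answer. The helper names extractPageInfo and extractPageInfoFromHtml are hypothetical, and the whitespace collapsing is an assumption about what "meaningful words" should look like.

```javascript
// Hypothetical helper (not from the original answer): pulls the title, keywords,
// and visible text out of an already-parsed Document.
function extractPageInfo(doc) {
  // querySelector returns null when the tag is missing, so guard before reading it.
  var keywordsMeta = doc.querySelector('meta[name="keywords"]');
  var keywords = keywordsMeta ? (keywordsMeta.getAttribute("content") || "") : "";

  // Work on a copy of <body> so the parsed document itself is left untouched.
  var body = doc.body ? doc.body.cloneNode(true) : null;
  if (body) {
    // textContent includes the source of inline scripts and styles, so drop them first.
    body.querySelectorAll("script, style, noscript").forEach(function (el) {
      el.remove();
    });
  }
  // Collapse runs of whitespace so only the words remain (an assumed preference).
  var visibleText = body ? body.textContent.replace(/\s+/g, " ").trim() : "";

  return { title: doc.title, keywords: keywords, visibleText: visibleText };
}

// Browser-only wrapper: parse an HTML string with DOMParser, then extract.
function extractPageInfoFromHtml(html) {
  var doc = new DOMParser().parseFromString(html, "text/html");
  return extractPageInfo(doc);
}
```

In the example page, the second then callback could call extractPageInfoFromHtml(responseText) and write the three returned fields into the list items.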

I found Mozilla's documentation on the Fetch API, Using Fetch, and Fetch basic concepts helpful.
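
Two details from that documentation are worth applying to the example above: fetch only rejects on a network failure, so an HTTP error status such as 404 still resolves, and the credentials option controls whether cookies are sent, which is what the question linked in the original post asks about. Below is a minimal sketch under those assumptions. The helper name fetchHtml and its optional fetchImpl parameter are hypothetical and exist only for illustration.

```javascript
// Hypothetical helper (not from the original answer). fetchImpl is injectable
// so the function can be exercised without a network; it defaults to the global fetch.
function fetchHtml(url, fetchImpl) {
  var doFetch = fetchImpl || fetch;
  // credentials: "omit" asks the browser not to send or store cookies for this request.
  return doFetch(url, { credentials: "omit" }).then(function (response) {
    // fetch only rejects on network failure; a 404 or 500 still resolves,
    // so the status has to be checked explicitly.
    if (!response.ok) {
      throw new Error("Request failed with status " + response.status);
    }
    return response.text();
  });
}
```

Adding a catch handler to the promise returned by fetchHtml then covers both network failures and HTTP error statuses in one place.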
