Get HTML source code from a website and then get an element from the HTML file

Views: 51 Published: 2021/9/23 20:18:42 Tags: javascript html


Problem description

I want to get the HTML code of a website and then get a certain element from that HTML.

There are things that can get HTML code, like ajax and jQuery, but I am using Node and want it all to be in plain JavaScript. Also, I have no idea how to get a certain element from the HTML once I have it.

I have done this in Python, but I need it in JavaScript. For simplicity, let's take the website https://example.com. This is the body of the website's HTML code:
<body>
<div>
    #Some Stuff 
</div>
</body>
I want to select the div by its class, so let's take <div> to be <div class="test"> to make it easier.

Finally, I want to get the content of <div class="test">.

Like this:
<div class="test">
    #Some Stuff 
</div>
Thanks in advance.
Solution
For Node.js there are two native modules for making requests: http and https. If you're looking to scrape with a Node.js application, you should probably use https to get the page's html and then parse it with an html parser; I'd recommend cheerio. Here's an example:
// native Node.js module
const https = require('https')
// don't forget to `npm install cheerio` to get the parser!
const cheerio = require('cheerio')

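// custom fetch for Node.js (only GET is implemented, so `method` and
// `payload` are accepted but ignored)
const fetch = (method, url, payload=undefined) => new Promise((resolve, reject) => {
    https.get(
        url,
        res => {
            // collect the raw chunks and decode once at the end, so a multi-byte
            // character split across two chunks is not corrupted
            const dataBuffers = []
            res.on('data', data => dataBuffers.push(data))
            res.on('end', () => resolve(Buffer.concat(dataBuffers).toString('utf8')))
            res.on('error', reject)
        }
    ).on('error', reject)
})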

// fetch already returns a promise, so we can chain on it directly
const scrapeHtml = url =>
  fetch('GET', url)
  .then(html => {
    const cheerioPage = cheerio.load(html)
    // cheerioPage is now a loaded html parser with a similar interface to jQuery
    // FOR EXAMPLE, to find a table with the id productData, you would do this:
    const productTable = cheerioPage('table#productData')

    // a cheerio selection can be searched further with .find(), so there is
    // no need to reload the element into cheerio again:
    const productRows = productTable.find('tr')

    // now we have a reference to every row in the table. The object returned
    // from a cheerio search is array-like; .toArray() turns it into a plain
    // array of elements, and wrapping each element with cheerioPage() gives
    // access to jQuery-like methods such as .text():
    const productsTextData = productRows
      .toArray()
      .map(row => cheerioPage(row).text().trim())

    return productsTextData
  })

scrapeHtml(/*URL TO SCRAPE HERE*/)
.then(data => {
  // expect the data returned to be an array of text from each 
  // row in the table from the html we loaded. Now we can do whatever
  // else you want with the scraped data. 
  console.log('data: ', data)
})
.catch(err => console.log('err: ', err))

Happy scraping!
