How to read and parse HTML in Node.js?

Problem description

I have a simple project and need some help with it. I need to read an HTML file and then convert it to JSON. I want to get the matches as code and text. How can I achieve this?

For example, I have these two HTML tags:

<p>In practice, it is usually a bad idea to modify global variables inside the function scope since it often is the cause of confusion and weird errors that are hard to debug.<br />
If you want to modify a global variable via a function, it is recommended to pass it as an argument and reassign the return-value.<br />
For example:</p>

<pre><code class="{python} language-{python}">a_var = 2

def a_func(some_var):
    return 2**3

a_var = a_func(a_var)
print(a_var)
</code></pre>

My code:

const fs = require('fs')
const showdown  = require('showdown')

var read =  fs.readFileSync('./test.md', 'utf8')

function importer(mdFile) {

    var result = []
    let json = {}

    var converter = new showdown.Converter()
    var text      = mdFile
    var html      = converter.makeHtml(text);

    for (var i = 0; i < html.length; i++) {
        var htmlRead = html[i]
        if(html == html.match(/<p>(.*?)<\/p>/g))
            json.text = html.match(/<p>(.*?)<\/p>/g)

        if(html == html.match(/<pre>(.*?)<\/pre>/g))
            json.code = html.match(/<pre>(.*?)<\/pre>/g)

    }

    return html
}
console.log(importer(read))

How do I get these matches in my code?
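If you want to keep the regex approach, two things stop the code above from working. First, . does not match newlines, so multi-line <pre> blocks are never found. Second, the per-character loop and the html == html.match(...) comparisons are unnecessary, because one global regex already returns every match. Below is a minimal sketch that assumes simple, non-nested HTML like the showdown output shown above. The function name extractBlocks and the { type, language, content } shape are hypothetical. Regexes remain fragile for arbitrary HTML, so a real parser such as Cheerio is the safer choice.

```javascript
// Sketch: regex-based extraction for simple, non-nested HTML such as
// showdown's output. HTML entities (e.g. &lt;) are left undecoded.
function extractBlocks(html) {
  const blocks = []
  // [\s\S]*? matches across newlines, unlike .*?
  // Group 1: paragraph body; group 2: class attribute; group 3: code body
  const pattern = /<p>([\s\S]*?)<\/p>|<pre><code(?:\s+class="([^"]*)")?>([\s\S]*?)<\/code><\/pre>/g
  // matchAll yields the matches in the order they appear in the string
  for (const match of html.matchAll(pattern)) {
    if (match[1] !== undefined) {
      blocks.push({ type: 'text', content: match[1] })
    } else {
      // Pull the language out of a class such as "language-{python}"
      const lang = /language-\{?([\w+-]+)\}?/.exec(match[2] || '')
      blocks.push({ type: 'code', language: lang ? lang[1] : null, content: match[3] })
    }
  }
  return blocks
}

const sample = '<p>For example:</p>\n' +
  '<pre><code class="{python} language-{python}">a_var = 2\nprint(a_var)\n</code></pre>'
console.log(JSON.stringify(extractBlocks(sample), null, 2))
```

Each object in the returned array is one block. Passing the array to JSON.stringify therefore gives one JSON entry per paragraph or code block.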

New code: this writes all the p tags into the same JSON object. How can I write each p tag into a separate JSON block?

$('html').each(function(){
    if ($('p').text != undefined) {
        json.code = $('p').text()
        json.language = "Text"
    }
})

Solution

I would recommend using Cheerio. It implements core jQuery functionality for Node.js.

const cheerio = require('cheerio')

var html = "<p>In practice, it is usually a bad idea to modify global variables inside the function scope since it often is the cause of confusion and weird errors that are hard to debug.<br />If you want to modify a global variable via a function, it is recommended to pass it as an argument and reassign the return-value.<br />For example:</p>"

const $ = cheerio.load(html)
var paragraph = $('p').html(); // Inner HTML of the first <p>. You can manipulate this in any other way you like

// ...You would do the same for any other element you require

You should check out Cheerio and read its documentation. I find it really neat!

Edit: for the new part of your question

You can iterate over every element and insert it into an array of JSON objects like this:

var jsonObject = []; // An array of JSON objects that will hold everything
$('p').each(function() { // Loop over each paragraph
    // Take the content of the paragraph and put it into a JSON object
    jsonObject.push({"paragraph": $(this).html()}); // Add it to the main jsonObject array
});

So the resulting array of JSON objects should look something like this:

[
  {
    "paragraph": "text"
  },
  {
    "paragraph": "text 2"
  },
  {
    "paragraph": "text 3"
  }
]
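Keep in mind that jsonObject is an ordinary JavaScript array. It only becomes JSON text once it is serialized. Below is a minimal sketch using Node's built-in JSON.stringify and fs module, assuming you want to save the result to a file. The output location (output.json in the OS temp directory) is an arbitrary choice for illustration.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Stand-in for the array filled by the $('p').each(...) loop
const jsonObject = [{ paragraph: 'text' }, { paragraph: 'text 2' }, { paragraph: 'text 3' }]

// Serialize with 2-space indentation; this produces the JSON text shown above
const jsonText = JSON.stringify(jsonObject, null, 2)

// Hypothetical output location, chosen only for this example
const outFile = path.join(os.tmpdir(), 'output.json')
fs.writeFileSync(outFile, jsonText, 'utf8')

// Reading the file back and parsing it yields an equal array
const roundTrip = JSON.parse(fs.readFileSync(outFile, 'utf8'))
console.log(roundTrip.length) // 3
```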

I'd also suggest reading up on JSON and how it works.
