How to read and parse HTML in Node.js?
I have a simple project and need some help with it. I need to read an HTML file and convert it to JSON format, extracting the matches as code and text. How can I achieve this?
For example, I have these two HTML elements:
<p>In practice, it is usually a bad idea to modify global variables inside the function scope since it often is the cause of confusion and weird errors that are hard to debug.<br />
If you want to modify a global variable via a function, it is recommended to pass it as an argument and reassign the return-value.<br />
For example:</p>
<pre><code class="{python} language-{python}">a_var = 2
def a_func(some_var):
    return 2**3
a_var = a_func(a_var)
print(a_var)
</code></pre>
My code:
const fs = require('fs')
const showdown = require('showdown')
var read = fs.readFileSync('./test.md', 'utf8')
function importer(mdFile) {
  var result = []
  let json = {}
  var converter = new showdown.Converter()
  var text = mdFile
  var html = converter.makeHtml(text);
  for (var i = 0; i < html.length; i++) {
    htmlRead = html[i]
    if (html == html.match(/<p>(.*?)<\/p>/g))
      json.text = html.match(/<p>(.*?)<\/p>/g)
    if (html == html.match(/<pre>(.*?)<\/pre>/g))
      json.code = html.match(/<pre>(.*?)<\/pre>/g)
  }
  return html
}
console.log(importer(read))
How do I get these matches in code?
New code: this writes all the p tags into the same JSON object. How do I write each p tag into a separate JSON block?
$('html').each(function() {
  if ($('p').text != undefined) {
    json.code = $('p').text()
    json.language = "Text"
  }
})
I would recommend using Cheerio, which implements a subset of core jQuery for use in Node.js.
const cheerio = require('cheerio')
var html = "<p>In practice, it is usually a bad idea to modify global variables inside the function scope since it often is the cause of confusion and weird errors that are hard to debug.<br />If you want to modify a global variable via a function, it is recommended to pass it as an argument and reassign the return-value.<br />For example:</p>"
const $ = cheerio.load(html)
// Inner HTML of the first <p> element. You can manipulate this in any way you like.
var paragraph = $('p').html();
// ...You would do the same for any other element you require, such as <pre>.
You should check out Cheerio and read its documentation. I find it really neat!
Edit: for the new part of your question
You can iterate over every element and insert it into an array of JSON objects like this:
var jsonObject = []; // An array of JSON objects that will hold everything
$('p').each(function() { // Loop over each paragraph
  // Take the content of the paragraph and put it into its own JSON object
  jsonObject.push({"paragraph": $(this).html()}); // Add it to the main jsonObject array
});
So the resulting array of JSON objects should look something like this:
[
{
"paragraph": "text"
},
{
"paragraph": "text 2"
},
{
"paragraph": "text 3"
}
]
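To turn the in-memory array into JSON text like the output above, the built-in JSON.stringify can be used. The following is a minimal sketch: the jsonObject array here is stand-in data with placeholder values, not output from a real page.

```javascript
// Stand-in data shaped like the array built in the loop above (placeholder values).
const jsonObject = [
  { paragraph: 'text' },
  { paragraph: 'text 2' },
];

// JSON.stringify converts the array to JSON text; the third argument sets the indentation width.
const jsonText = JSON.stringify(jsonObject, null, 2);
console.log(jsonText);

// JSON.parse reverses the conversion and yields an equivalent array.
const roundTrip = JSON.parse(jsonText);
```

The resulting string can be written to disk with fs.writeFileSync if a file is needed.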
I believe you should also read up on JSON and how it works.
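If adding a dependency is not an option, the regex approach from the question can be made to work for simple input. The original patterns fail partly because `.` does not match newlines, and partly because the code compares the whole string against the array returned by match. The sketch below is a swapped-in alternative to Cheerio, not a general HTML parser: it assumes well-formed, un-nested `<p>` and `<pre>` tags without attributes, and `extractBlocks` is a hypothetical helper name, not part of any library.

```javascript
// Sample input based on the question (assumed well-formed, with no nested <p> or <pre>).
const html = [
  '<p>In practice, it is usually a bad idea to modify global variables.<br />',
  'For example:</p>',
  '<pre><code class="{python} language-{python}">a_var = 2',
  'def a_func(some_var):',
  '    return 2**3',
  '</code></pre>',
].join('\n');

// Hypothetical helper: collects <p> and <pre> blocks in document order.
function extractBlocks(source) {
  // [\s\S]*? matches lazily across newlines, which a plain `.` does not.
  const pattern = /<p>([\s\S]*?)<\/p>|<pre>([\s\S]*?)<\/pre>/g;
  const blocks = [];
  for (const match of source.matchAll(pattern)) {
    if (match[1] !== undefined) {
      // First alternative matched: a paragraph.
      blocks.push({ type: 'text', content: match[1] });
    } else {
      // Second alternative matched: strip the wrapping <code ...> tag if present.
      const code = match[2].replace(/^<code[^>]*>/, '').replace(/<\/code>$/, '');
      blocks.push({ type: 'code', content: code });
    }
  }
  return blocks;
}

const blocks = extractBlocks(html);
console.log(JSON.stringify(blocks, null, 2));
```

Each element becomes its own object, so text and code stay separate and in their original order. For anything beyond simple generated markup like this, a real parser such as Cheerio remains the safer choice.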