Extracting table values from a URL with Node.js

Question

I am quite new to Node.js and Express, but I am trying to build a website that serves static files. After some research I found that Node.js with Express is well suited to this. So far I have managed to serve some static HTML files located on my server, but now I want to do something else: I have a URL to an HTML page, and that page contains a table with some information.

I want to extract a couple of specific values from it and 1) save them as JSON in a file, and 2) write those values into an HTML page. I've tried to play with jQuery, but so far I've been unsuccessful.

This is what I have so far:

1. A Node app running on port 8081, which I will access from anywhere through an NGINX reverse proxy (I already have NGINX set up and it works).

2. I can fetch the URL and serve its content as HTML when I request the proper URI.

3. The table doesn't have an ID, only the "details" class. Also, I am only interested in getting these rows:

<div class='group'>
<table class='details'>
<tr>
<th>Status:</th>
<td>
With editors
</td>
</tr>

From what I've seen so far, jQuery would work fine if the table had an ID.

This is what I have in app.js:

var express = require('express');
var app = express();
var request = require('request');
const path = require('path');

var content;

app.use('/', function(req, res, next) {
  var status = 'It works';
  console.log('This is very %s', status);
  //console.log(content);
  next();
});

request(
  {
    uri:
      'https://authors.aps.org/Submissions/status?utf8=%E2%9C%93&accode=CH10674&author=Poenaru&commit=Submit'
  },
  function(error, response, body) {
    content = body;
  }
);

app.get('/', function(req, res) {
  console.log('Got a GET request for the homepage');
  res.sendFile(path.join(__dirname, '/', 'index.html'));
});

app.get('/url', function(req, res) {
  console.log('You requested table data!!!');

  // TODO: show only the values of that table instead of the whole HTML page

  res.send(content);
});

var server = app.listen(8081, function() {
  var host = server.address().address;
  var port = server.address().port;
  console.log('Node-App listening at http://%s:%s', host, port);
});

Basically, the HTML content of that URL is saved in the content variable. Now I would like to keep only the table from it and output only that part to the new HTML page.

Any ideas? Thank you in advance :)

Recommended answer

OK, so I came across a package called cheerio, which basically lets you use jQuery-style selectors on the server. With the HTML from that specific URL, I could search the table for the elements I need. Cheerio is quite straightforward, and with this code I got the results I needed:

var cheerio = require('cheerio');
request(
  'https://authors.aps.org/Submissions/status?utf8=%E2%9C%93&accode=CH10674&author=Poenaru&commit=Submit',
  (error, res, html) => {
    if (!error && res.statusCode === 200) {
      const $ = cheerio.load(html);
      const details = $('.details');
      const articleInfo = details.find('th').eq(0);
      const articleStatus = details
        .find('th')
        .next()
        .eq(0);
      //console.log(details.html());
      console.log(articleInfo.html());
      console.log(articleStatus.html());
    }
  }
);

Thank you @O.Jones and @avcS for pointing me to jsdom and html-node-parser. I will definitely play with those in the near future :)
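
For the two goals from the question (saving the values as JSON in a file and writing them into an HTML page), something along these lines might work once the values are in a plain object. It uses only Node's built-in modules. The helper names saveAsJson and renderAsHtml, the sample object, and the output path in the temp directory are all assumptions made for illustration.

```javascript
// Sketch: persist extracted values as JSON and render them as a small HTML table.
const fs = require('fs');
const os = require('os');
const path = require('path');

// saveAsJson and renderAsHtml are hypothetical helper names.
function saveAsJson(values, filePath) {
  // Pretty-print with two spaces so the file stays readable.
  fs.writeFileSync(filePath, JSON.stringify(values, null, 2), 'utf8');
}

// Escape characters that would otherwise be interpreted as markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

function renderAsHtml(values) {
  const rows = Object.entries(values)
    .map(
      ([label, value]) =>
        `<tr><th>${escapeHtml(label)}</th><td>${escapeHtml(value)}</td></tr>`
    )
    .join('\n');
  return `<table class="details">\n${rows}\n</table>`;
}

// Example values, shaped like what the cheerio lookup would produce.
const values = { Status: 'With editors' };

// Assumed output location; any writable path would do.
const outFile = path.join(os.tmpdir(), 'article-status.json');
saveAsJson(values, outFile);

console.log(renderAsHtml(values));
```

In app.js, the '/url' handler could then send renderAsHtml(values) instead of the whole content string, provided the values have been extracted before the route is hit.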

Cheers!
