JavaScript HTML Scraping

Question

I am working with a web page that contains only plain text. How can I 'scrape' the data and store it in an array variable? There are no tags (i.e. no 'div', 'id', etc.).

The HTML looks something like this (i.e. if you were to view the source, it would be completely plain text without markup):

HTML (view-source:www.blablabla.com/path.txt):

Hello World My Name is John

I would like to store each word into an array along the lines of:

var array = ["Hello", "World", "My", "Name", "is", "John"];

Recommended answer

If you're using Node's http module, you can read the data directly:

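var http = require('http');

http.get('http://www.example.com', function(res) {
  var body = '';
  res.setEncoding('utf8');
  // data events fire on the response (res), not on the request
  res.on('data', function(chunk) { body += chunk; });
  res.on('end', function() {
    // do something with the full body here, for example print it out
    console.log('body: ' + body);
  });
}).on('error', function(err) {
  console.error('request failed: ' + err.message);
});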

An easier way to do this would be via the request package:

var request = require('request');

request('http://www.example.com', function(err, resp, body) {
  if (!err && resp.statusCode == 200) {
    // split on runs of whitespace; trim first to avoid empty entries
    var array = body.trim().split(/\s+/);
  }
});
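The request package has since been deprecated, so here is a minimal sketch of the same idea without it. It assumes a runtime with a global fetch (Node 18+ or a browser); splitWords and fetchWords are hypothetical helper names, not part of any library.

```javascript
// Split plain text into words on runs of whitespace.
// trim() avoids empty strings at the start or end of the array.
function splitWords(text) {
  var trimmed = text.trim();
  return trimmed === '' ? [] : trimmed.split(/\s+/);
}

// Hypothetical helper: fetch a plain-text URL and resolve to its words.
// Assumes a global fetch is available (Node 18+ or a browser).
async function fetchWords(url) {
  var res = await fetch(url);
  if (!res.ok) {
    throw new Error('Request failed with status ' + res.status);
  }
  return splitWords(await res.text());
}

// Local check that needs no network access:
console.log(splitWords('Hello World My Name is John\n'));
```

To use it against a real page, call fetchWords with the URL of the text file and handle the returned promise, e.g. fetchWords(url).then(function(words) { ... }).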
