JavaScript HTML Scraping

Views: 91 Published: 2016/6/3 21:56:11 Tags: javascript html arrays node.js

Problem description

I am working with a web page that contains only plain text. How can I go about 'scraping' the data and storing it in an array variable? There are no tags (i.e. no 'div', 'id', etc.).

The HTML looks something like this (i.e. if you were to view the source code, it would be completely plain text without markup):

HTML (view-source:www.blablabla.com/path.txt):

Hello World My Name is John

I would like to store each word in an array, along the lines of:

var array = ["Hello", "World", "My", "Name", "is", "John"];

Recommended answer

If you're using Node's http module, you can read the data directly.

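var http = require('http');

http.get('http://www.example.com', function(res) {
  var body = '';
  res.setEncoding('utf8');
  // 'data' events fire on the response, not on the request returned by http.get
  res.on('data', function(chunk) {
    body += chunk;
  });
  res.on('end', function() {
    // do something with the full body here, for example print it out
    console.log('body: ' + body);
  });
}).on('error', function(err) {
  console.error(err.message);
});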

An easier way to do this would be via the request package:

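var request = require('request');

request('http://www.example.com', function(err, resp, body) {
  // the callback's first parameter is named err, so check err (not error)
  if (!err && resp.statusCode === 200) {
    // split on runs of whitespace; without a capturing group the separators are dropped
    var array = body.trim().split(/\s+/);
    console.log(array);
  }
});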
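
The word-splitting step can also be tried on its own, without any network call. The following is only a minimal sketch, assuming the fetched body is whitespace-separated plain text like the sample in the question; `splitWords` is a hypothetical helper name, not part of any library or of the original answer.

```javascript
// Hypothetical helper (an illustration, not from the original answer):
// split a plain-text body into an array of words.
function splitWords(text) {
  var trimmed = String(text).trim();
  // An empty body would otherwise produce [''] instead of [].
  if (trimmed === '') {
    return [];
  }
  // Split on runs of whitespace. The pattern has no capturing group,
  // so the whitespace separators are not included in the result.
  return trimmed.split(/\s+/);
}

var array = splitWords('Hello World My Name is John\n');
console.log(array);
```

For the sample text this gives the six words from the question. A pattern with a capturing group, such as /(\s+)/, would also return the whitespace between the words as array elements, which is usually not wanted here.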
