JavaScript HTML Scraping
Problem Description
I am working with a web page that contains only plain text. How can I go about "scraping" the data and storing it in an array variable? There are no tags (i.e. no div, id, etc.).
The HTML looks something like this (i.e. if you were to view the source code, it would be completely plain text without markup):
HTML (view-source:www.blablabla.com/path.txt):
Hello World My Name is John
I would like to store each word into an array along the lines of:
var array = ["Hello", "World", "My", "Name", "is", "John"];
Recommended Answer
If you're using Node's http module, you can read the data directly:
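var http = require('http');

http.get('http://www.example.com', function(res) {
  // 'data' events are emitted by the response stream (res), not by the request
  res.setEncoding('utf8');
  res.on('data', function(chunk) {
    // do something with the chunk here, for example print it out
    console.log('body: ' + chunk);
  });
});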
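The body may arrive in several chunks, and a chunk boundary can fall in the middle of a word, so splitting each chunk separately could cut words in two. A minimal sketch, assuming "words" means runs of non-whitespace and that the URL is plain http (the http module does not handle https URLs), buffers the whole body and splits it once the 'end' event fires. The helper names toWords and fetchWords are hypothetical, not part of Node's API:

```javascript
const http = require('http');

// Hypothetical helper: turn the full response text into an array of words,
// treating any run of whitespace as a separator.
function toWords(text) {
  const trimmed = text.trim();
  // ''.split(/\s+/) returns [''], so handle an empty body explicitly
  return trimmed === '' ? [] : trimmed.split(/\s+/);
}

// Hypothetical wrapper: buffer every chunk, then split once the whole body
// has arrived, and hand the result to a Node-style callback(err, words).
function fetchWords(url, callback) {
  http.get(url, (res) => {
    if (res.statusCode !== 200) {
      res.resume(); // discard the body so the socket is released
      callback(new Error('Request failed with status ' + res.statusCode));
      return;
    }
    res.setEncoding('utf8'); // decode multi-byte characters split across chunks
    let body = '';
    res.on('data', (chunk) => { body += chunk; });
    res.on('end', () => callback(null, toWords(body)));
  }).on('error', callback); // network errors (DNS failure, refused connection)
}

// Usage with the placeholder URL from the question:
// fetchWords('http://www.blablabla.com/path.txt', (err, words) => {
//   if (err) throw err;
//   console.log(words);
// });
```

Splitting only after 'end' trades a little memory (the whole body is held as one string) for correctness, which is a reasonable choice for a small plain-text page like the one in the question.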
An easier way to do this is via the request package:
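var request = require('request');

request('http://www.example.com', function(err, resp, body) {
  // the callback's first parameter is named err, so check err (not error)
  if (!err && resp.statusCode === 200) {
    // split on runs of whitespace; without a capture group the separators
    // are not included in the result
    var array = body.trim().split(/\s+/);
  }
});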
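The request package has been deprecated since 2020. As an alternative sketch, assuming Node.js 18 or later (where fetch() is available as a global), the same scrape-and-split can be written with fetch and async/await. The function name scrapeWords is hypothetical:

```javascript
// Sketch assuming Node.js 18+, where fetch() is a built-in global.
async function scrapeWords(url) {
  const resp = await fetch(url);
  if (!resp.ok) {
    throw new Error(`Request failed with status ${resp.status}`);
  }
  // The page is plain text, so read the body as a single string.
  const body = await resp.text();
  const trimmed = body.trim();
  // Split on runs of whitespace; guard against an empty body, since
  // ''.split(/\s+/) returns [''] rather than [].
  return trimmed === '' ? [] : trimmed.split(/\s+/);
}

// Usage with the placeholder URL from the question:
// scrapeWords('http://www.blablabla.com/path.txt')
//   .then((words) => console.log(words))
//   .catch((err) => console.error(err));
```

Because resp.text() resolves only after the full body has been read, there is no need to stitch chunks together by hand as with the http module.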