Grabbing text from webpage and storing as variable


Problem description


The following webpage lists prices for a particular item in a game:

http://services.runescape.com/m=itemdb_rs/Armadyl_chaps/viewitem.ws?obj=19463

I want to grab the "Current guide price:" of that item and store it in a variable so that I can output it to a Google spreadsheet. I only want the number, which is currently "643.8k", but I am not sure how to grab a specific piece of text like that.

Since the number is in "k" form, I can't graph it; it would have to be something like 643,800 to be graphable. I have a formula for the conversion, so my second question is whether it's possible to apply a formula to the number pulled and then store the result as the final output.
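For the second question, one possible approach is to do the conversion in the script itself before writing to the sheet. The following is only a minimal sketch: `parseGuidePrice` and `SUFFIX_MULTIPLIERS` are hypothetical names, and the "m" and "b" suffixes are assumptions, since only "k" appears in the question.

```javascript
// Multipliers for abbreviated price suffixes. Only "k" is confirmed by the
// question; "m" and "b" are assumptions added for illustration.
var SUFFIX_MULTIPLIERS = { k: 1e3, m: 1e6, b: 1e9 };

/**
 * Converts a price string such as "643.8k" into a plain number.
 * Returns NaN when the text does not look like a price.
 * Assumes prices are whole coins, so the result is rounded.
 */
function parseGuidePrice(text) {
  // Normalise: trim, lower-case, and drop thousands separators.
  var cleaned = String(text).trim().toLowerCase().replace(/,/g, '');
  var match = cleaned.match(/^(\d+(?:\.\d+)?)([kmb])?$/);
  if (!match) {
    return NaN;
  }
  var multiplier = match[2] ? SUFFIX_MULTIPLIERS[match[2]] : 1;
  // Rounding guards against floating-point residue after the multiplication.
  return Math.round(parseFloat(match[1]) * multiplier);
}
```

With a helper like this, the row could be written as `appendRow([new Date(), parseGuidePrice(number)])`, so the sheet receives a numeric value that can be graphed.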

-EDIT-

This is what I have so far. It's not working, and I'm not sure why.

function pullRuneScape() {
  var page = UrlFetchApp.fetch("http://services.runescape.com/m=itemdb_rs/Armadyl_chaps/viewitem.ws?obj=19463").getContentText();
  var number = page.match(/Current guide price:<\/th>\n(\d*)/)[1];
  SpreadsheetApp.getActive().getSheetByName('RuneScape').appendRow([new Date(), number]);
}

Solution

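Your regex is wrong: after the closing th tag it allows only a newline followed by digits, so it never reaches the td element that holds the price, and \d* would not match the decimal point or the "k" in "643.8k" anyway. I tested this one successfully: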

var number = page.match(/Current guide price:<\/th>\s*<td>([^<]*)<\/td>/m)[1];

What it does:

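  1. Current guide price:<\/th> matches the text "Current guide price:" followed by the closing th tag
  2. \s*<td> allows whitespace (including line breaks) between the tags, then matches the opening td tag
  3. ([^<]*) is a capture group that matches everything up to the next < character
  4. <\/td> matches the closing td tag
  5. /m is the multiline flag; it only changes how ^ and $ behave, so with no anchors in this pattern it is optional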
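The steps above can be put together roughly as follows. This is a sketch, not a tested script against the live site: `extractGuidePrice`, `GUIDE_PRICE_PATTERN` and `sampleHtml` are hypothetical names, and the sample markup is assumed from the pattern rather than copied from the page. It also guards against a failed match, because `page.match(...)[1]` throws a TypeError when `match` returns null.

```javascript
// Pattern from the answer above.
var GUIDE_PRICE_PATTERN = /Current guide price:<\/th>\s*<td>([^<]*)<\/td>/m;

/**
 * Returns the guide price text (for example "643.8k") found in the HTML,
 * or null when the pattern does not match.
 */
function extractGuidePrice(html) {
  var match = GUIDE_PRICE_PATTERN.exec(html);
  return match ? match[1].trim() : null;
}

// Apps Script entry point. UrlFetchApp and SpreadsheetApp exist only in the
// Apps Script runtime, so this function is defined here but not called.
function pullRuneScape() {
  var url = 'http://services.runescape.com/m=itemdb_rs/Armadyl_chaps/viewitem.ws?obj=19463';
  var page = UrlFetchApp.fetch(url).getContentText();
  var price = extractGuidePrice(page);
  if (price === null) {
    throw new Error('Guide price not found; the page markup may have changed.');
  }
  SpreadsheetApp.getActive().getSheetByName('RuneScape').appendRow([new Date(), price]);
}

// Hypothetical markup used to try the pattern outside Apps Script.
var sampleHtml = '<tr>\n<th>Current guide price:</th>\n<td>643.8k</td>\n</tr>';
var samplePrice = extractGuidePrice(sampleHtml);
```

Keeping the extraction in its own function means the pattern can be checked against saved HTML without fetching the page each time.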
