Get table of prices from webpage into a Google spreadsheet
Problem description
I am trying to get the price table from http://www.immopreise.at/Wien/Wohnung/Miete into a Google Sheet.
I tried the following:
=IMPORTXML("http://www.immopreise.at/Wien/Wohnung/Miete","//table[@id='preisTabelle']")
Attached you can find an example sheet:
https://docs.google.com/spreadsheets/d/1-aXJULo6BELQQ6136Lps_HUzOwkw5SKaPGxIl5gBDfM/edit?usp=sharing
My problem is that I do not get anything back.
Any suggestions as to what I am doing wrong?
I appreciate your reply!
Solution
First Approach: Here is how you can do it using IMPORTXML and REGEXEXTRACT:
=IMPORTXML("http://www.immopreise.at/Wien/Wohnung/Miete", "//table[@id='preisTabelle']")
The formula above returns an empty result because the page's HTML contains only an empty table at that location (it is presumably filled in by JavaScript in the browser, which IMPORTXML does not execute):
<table id="preisTabelle"></table>
The data is actually located inside a <script> tag:
<script>
var ImmoOptions = {"mapOptions":{"region":"Wien","karteAnzeigen":true},"TrendChartConf":{"uri":{"district":"/Trend/GetDistricts","chart":"/Trend/GetChart","chart1":"/Trend/GetTrendChart","compare":"/Preisvergleich","chart2":"/Trend/TrendChart","refresh":"/Preisentwicklung","uebersicht":"/?region=Wien\u0026pathInfo=Wohnung%2FMiete"},"firstDirstrict":{"Wien":"Wien-1-Innere-Stadt","Niederoesterreich":"Sankt-Poelten-Stadt","Burgenland":"Eisenstadt-Stadt","Oberoesterreich":"Linz-Stadt","Steiermark":"Graz-Alle-Bezirke","Kaernten":"Klagenfurt-Stadt","Salzburg":"Salzburg-Stadt","Tirol":"Innsbruck-Stadt","Vorarlberg":"Bregenz"},"firstDirstrictId":{"9":231,"3":153,"1":133,"4":177,"6":201,"2":142,"5":195,"7":218,"8":228}},"preisInfos":{"tabelle":{"spalten":[{"name":"≤50m²","spaltenArt":"Waehrung","nachkommaStellen":true,"farbmarkierung":null},{"name":"51-80m²","spaltenArt":"Waehrung","nachkommaStellen":true,"farbmarkierung":null},{"name":"81-129m²","spaltenArt":"Waehrung","nachkommaStellen":true,"farbmarkierung":null},{"name":"\u003e130m²","spaltenArt":"Waehrung","nachkommaStellen":true,"farbmarkierung":null},{"name":"\u003cspan class=\u0027Detailed\u0027\u003e\u0026#216;/m²\u003c/span\u003e\u003cspan class=\u0027Compact\u0027\u003e\u0026#216;/m²\u003c/span\u003e","spaltenArt":"Waehrung","nachkommaStellen":true,"farbmarkierung":true},{"name":"\u003cspan class=\u0027Detailed\u0027\u003eTrend\u003c/span\u003e\u003cspan class=\u0027Compact\u0027\u003eTd.\u003c/span\u003e","spaltenArt":"Tendenz","nachkommaStellen":false,"farbmarkierung":null}],"zeilen":[{"name":" 1., Innere Stadt","zellen":[21.68,19.02,18.43,19.56,19.27,0],"id":231},{"name":" 2., Leopoldstadt","zellen":[18.27,15.06,14.28,14.20,14.85,1],"id":232},{"name":" 3., Landstraße","zellen":[18.88,17.04,15.42,14.68,16.03,1],"id":233},{"name":" 4., Wieden","zellen":[19.37,16.58,16.89,16.35,16.83,1],"id":234},{"name":" 5., Margareten","zellen":[15.46,14.11,14.20,14.77,14.39,0],"id":235},{"name":" 6., Mariahilf","zellen":[18.23,14.68,15.72,15.32,15.53,1],"id":236},{"name":" 7., Neubau","zellen":[16.09,14.89,14.58,14.94,14.95,0],"id":237},{"name":" 8., Josefstadt","zellen":[16.77,16.78,14.02,14.93,15.08,0],"id":238},{"name":" 9., Alsergrund","zellen":[15.72,14.48,14.53,14.92,14.69,0],"id":239},{"name":"10., Favoriten","zellen":[14.14,12.35,11.81,0,12.52,0],"id":240},{"name":"11., Simmering","zellen":[13.69,12.34,11.50,13.46,12.38,-1],"id":241},{"name":"12., Meidling","zellen":[15.66,14.97,13.28,11.79,14.54,1],"id":242},{"name":"13., Hietzing","zellen":[16.71,15.93,14.63,14.05,14.99,0],"id":243},{"name":"14., Penzing","zellen":[14.43,13.14,12.72,12.37,13.11,0],"id":244},{"name":"15., Rudolfsheim-Fünfhaus","zellen":[13.58,12.90,12.93,11.76,13.07,0],"id":245},{"name":"16., Ottakring","zellen":[13.99,12.64,12.64,12.45,12.90,0],"id":246},{"name":"17., Hernals","zellen":[14.71,13.06,13.15,13.61,13.50,1],"id":247},{"name":"18., Währing","zellen":[14.47,13.96,13.82,15.67,14.37,0],"id":248},{"name":"19., Döbling","zellen":[16.21,14.37,15.06,16.44,15.29,0],"id":249},{"name":"20., Brigittenau","zellen":[15.68,13.57,12.56,13.30,13.43,0],"id":250},{"name":"21., Floridsdorf","zellen":[15.58,13.58,12.38,14.97,13.68,1],"id":251},{"name":"22., Donaustadt","zellen":[18.19,15.57,16.85,15.89,16.18,0],"id":252},{"name":"23., Liesing","zellen":[14.79,14.09,13.49,15.60,13.92,1],"id":253}],"tabellenTitel":"Wohnungen Miete","titelErsteSpalte":"Bezirk","GesamtAnzahlObjekte":12739},"preisspannen":[{"bis":12},{"bis":14},{"bis":15},{"bis":null}]},"basecharts":null,"CurrentView":{"trendVar":{"CatNum":0,"ImmoArtNum":5,"AltbauNum":2,"AngebotTypeNum":1},"hid":0}};
jQuery(function () { InitMap(ImmoOptions.mapOptions); });
</script>
The data of most interest is the zeilen array inside the ImmoOptions variable (ImmoOptions.preisInfos.tabelle.zeilen):
[
  { "name": " 1., Innere Stadt", "zellen": [21.68, 19.02, 18.43, 19.56, 19.27, 0], "id": 231 },
  { "name": " 2., Leopoldstadt", "zellen": [18.27, 15.06, 14.28, 14.2, 14.85, 1], "id": 232 },
  /* Edited for brevity */
]
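Each entry in zeilen becomes one spreadsheet row: the district name followed by its six zellen values. The matching column titles sit in the spalten array of the same object (≤50m², 51-80m², 81-129m², >130m², the average price per m², and a trend indicator), although two of them are wrapped in HTML markup with the entity &#216; (Ø). A minimal sketch of turning both into plain rows, assuming the Detailed/Compact span markup shown above stays the same; the helper names columnLabel, headerRow and zeileToRow are only illustrative:

```javascript
// Turn a spalten "name" into plain header text.
// Assumes the <span class='Detailed'>...</span> markup seen on the page.
function columnLabel(name) {
  var detailed = /<span class='Detailed'>(.*?)<\/span>/.exec(name);
  var text = detailed ? detailed[1] : name;
  text = text.replace(/<[^>]*>/g, ""); // drop any remaining tags
  // Decode numeric entities such as &#216; (Ø).
  return text.replace(/&#(\d+);/g, function (_, code) {
    return String.fromCharCode(Number(code));
  });
}

// Header row: the first-column title (titelErsteSpalte) plus one title per spalten entry.
function headerRow(spalten) {
  return ["Bezirk"].concat(spalten.map(function (s) { return columnLabel(s.name); }));
}

// One data row per zeilen entry; names are padded on the page, so trim them.
function zeileToRow(zeile) {
  return [zeile.name.trim()].concat(zeile.zellen);
}

// Sample values as they look after JSON.parse (shortened from the page data).
var spalten = [
  { name: "≤50m²" }, { name: "51-80m²" }, { name: "81-129m²" }, { name: ">130m²" },
  { name: "<span class='Detailed'>&#216;/m²</span><span class='Compact'>&#216;/m²</span>" },
  { name: "<span class='Detailed'>Trend</span><span class='Compact'>Td.</span>" }
];
var zeilen = [
  { name: " 1., Innere Stadt", zellen: [21.68, 19.02, 18.43, 19.56, 19.27, 0], id: 231 },
  { name: " 2., Leopoldstadt", zellen: [18.27, 15.06, 14.28, 14.2, 14.85, 1], id: 232 }
];

var table = [headerRow(spalten)].concat(zeilen.map(zeileToRow));
table.forEach(function (row) { console.log(row.join(" | ")); });
// Bezirk | ≤50m² | 51-80m² | 81-129m² | >130m² | Ø/m² | Trend
// 1., Innere Stadt | 21.68 | 19.02 | 18.43 | 19.56 | 19.27 | 0
// 2., Leopoldstadt | 18.27 | 15.06 | 14.28 | 14.2 | 14.85 | 1
```

In the script further below, the hard-coded 'Col-1' ... 'Col-6' header could then be replaced with sheet.appendRow(headerRow(jo["preisInfos"]["tabelle"]["spalten"])).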
The following formula pulls the script into a spreadsheet cell (say we paste it into cell A100):
=IMPORTXML("http://www.immopreise.at/Wien/Wohnung/Miete","//script[2]")
Then the following formula extracts the JSON string (the value of the ImmoOptions variable) into another cell (say we paste it into cell A1):
=REGEXEXTRACT(A100,"(?s)=(.*)")
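The (?s) flag in REGEXEXTRACT (which uses RE2 syntax) lets . match line breaks, so this pattern captures everything after the first = up to the end of the cell. If the cell holds the whole script, that includes the closing semicolon and the jQuery call after the object, which JSON.parse would reject. The sketch below mirrors the formula with a JavaScript regular expression (the s flag plays the role of (?s)) on a shortened, made-up stand-in for the script text, and shows one possible tighter pattern that stops at the last } directly followed by a semicolon. The same tighter pattern, (?s)=\s*(\{.*\})\s*;, should also be accepted by REGEXEXTRACT, but that is an assumption not checked against the live page.

```javascript
// Shortened stand-in for the <script> text that IMPORTXML places in A100
// (an assumption; the real text is much longer).
var scriptText =
  'var ImmoOptions = {"preisInfos":{"tabelle":{"zeilen":[]}}};\n' +
  "jQuery(function () {\n" +
  "    InitMap(ImmoOptions.mapOptions);\n" +
  "});";

// Same pattern as the formula: everything after the first "=", newlines included.
var loose = /=(.*)/s.exec(scriptText)[1];

// Hypothetical tighter pattern: from the first "{" up to the last "}"
// that is followed (ignoring whitespace) by a semicolon.
var tight = /=\s*(\{.*\})\s*;/s.exec(scriptText)[1];

console.log(loose.includes("jQuery")); // true: trailing code is captured too
console.log(JSON.parse(tight).preisInfos.tabelle.zeilen.length); // 0
```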
At this point we need JavaScript to parse the JSON. This can be done by adding Apps Script code to the sheet (Tools->Script Editor) and writing the code in JavaScript.
In the JavaScript there are three steps (the details are not shown here):
1. Use IMPORTXML to get the contents of the script on the page.
2. Use REGEXEXTRACT to get the value of ImmoOptions as a JSON string.
3. Parse the JSON string to get the data.
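For step 3, one option (not part of the original answer) is an Apps Script custom function: custom functions can return a two-dimensional array, which Sheets spills into the neighbouring cells. A minimal sketch, assuming A1 holds only the JSON object (for example, extracted with a pattern that drops the trailing semicolon and code); the name IMMO_TABLE is made up for this example and would be used as =IMMO_TABLE(A1):

```javascript
/**
 * Hypothetical custom function: =IMMO_TABLE(A1)
 * Assumes jsonText is exactly the ImmoOptions object as JSON.
 * Returns one row per district: trimmed name followed by the six zellen values.
 * @customfunction
 */
function IMMO_TABLE(jsonText) {
  var data = JSON.parse(jsonText);
  var zeilen = data.preisInfos.tabelle.zeilen;
  return zeilen.map(function (zeile) {
    return [zeile.name.trim()].concat(zeile.zellen);
  });
}

// Local check with a shortened copy of the page data.
var sample = '{"preisInfos":{"tabelle":{"zeilen":[' +
  '{"name":" 1., Innere Stadt","zellen":[21.68,19.02,18.43,19.56,19.27,0],"id":231}' +
  ']}}}';
console.log(IMMO_TABLE(sample)[0][0]); // 1., Innere Stadt
```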
Second Approach: Here is how you can do it using Google Apps Script:
Log into Google and open this spreadsheet in a browser.
Choose File->Make a Copy (perhaps with a name like S1). This makes a copy of the file in your Google Drive and opens it in a new tab.
Go to that new window/tab and choose Tools->Script Editor. This opens an editor containing the script.
From the toolbar, select the function doGet and run the script; it will fill in the spreadsheet.
Here is the script attached to the sheet (for reference, in case the link goes missing):
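function doGet() {
  // Random query parameter, presumably to avoid getting a cached copy of the page.
  var r1 = Math.random() * 100000000000;
  var html = UrlFetchApp.fetch("http://www.immopreise.at/Wien/Wohnung/Miete?somevariable=" + r1).getContentText();
  // Capture the JSON assigned to ImmoOptions (it must be on a single line).
  var re = /var ImmoOptions = (.*);/i;
  var match = re.exec(html);
  if (!match) throw new Error("ImmoOptions not found; the page may have changed.");
  var jo = JSON.parse(match[1]);
  // One entry per district: {name, zellen, id}.
  var arr = jo["preisInfos"]["tabelle"]["zeilen"];
  var sheet = SpreadsheetApp.getActiveSheet();
  sheet.clear();
  sheet.appendRow([r1]);
  sheet.appendRow(['Bezirk', 'Col-1', 'Col-2', 'Col-3', 'Col-4', 'Col-5', 'Col-6']);
  for (var i = 0; i < arr.length; i++) {
    var item = arr[i];
    // District name followed by its six values.
    var row = [item.name];
    row = row.concat(item.zellen);
    sheet.appendRow(row);
  }
}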
How it works:
- Fetches the entire HTML content of the page.
- Uses a regular expression to extract the JSON data from inside the <script>...</script> block.
- Parses the extracted JSON data.
- Gets the relevant array and writes it into the spreadsheet.
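The script calls appendRow once per district, and each call is a separate request to the Spreadsheet service. Apps Script's best-practice advice is to batch such operations, so a variant could build the whole table in memory and write it with one Range.setValues call. A sketch under that assumption; buildTable and writeTable are hypothetical names, the random-number row is left out, and only buildTable can run outside Apps Script:

```javascript
// Build the full 2D array: header row plus one row per district.
// Range.setValues needs every row to have the same length.
function buildTable(zeilen) {
  var header = ["Bezirk", "Col-1", "Col-2", "Col-3", "Col-4", "Col-5", "Col-6"];
  var body = zeilen.map(function (z) {
    return [z.name.trim()].concat(z.zellen);
  });
  return [header].concat(body);
}

// Apps Script only: write the table in a single call instead of appendRow per row.
function writeTable(zeilen) {
  var table = buildTable(zeilen);
  var sheet = SpreadsheetApp.getActiveSheet();
  sheet.clear();
  sheet.getRange(1, 1, table.length, table[0].length).setValues(table);
}

var demo = buildTable([
  { name: " 1., Innere Stadt", zellen: [21.68, 19.02, 18.43, 19.56, 19.27, 0] },
  { name: "23., Liesing", zellen: [14.79, 14.09, 13.49, 15.60, 13.92, 1] }
]);
console.log(demo.length + " rows x " + demo[0].length + " columns"); // 3 rows x 7 columns
```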
Disadvantages:
- It is a brittle patchwork script that will break if the <script> block changes (or in any other way that breaks the regex).
- It doesn't give you nice controls over the table's UI (they could be built, but with more work).
- It works only if the entire JSON object is on a single line (this could be fixed by removing line breaks first, or by using a more robust regex).
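For the single-line limitation, an alternative to a more elaborate regex is to find the start of the object and scan forward, counting braces while skipping string literals, until the matching closing brace. This swaps the regex for a small scanner; the sketch below assumes the value is plain JSON, as it is on this page, and extractObjectLiteral is a hypothetical helper name:

```javascript
// Return the JSON text of the object assigned to varName, even if it spans
// several lines. Braces inside string literals are ignored.
function extractObjectLiteral(source, varName) {
  var start = source.indexOf("var " + varName);
  if (start < 0) return null;
  start = source.indexOf("{", start);
  if (start < 0) return null;

  var depth = 0;
  var inString = false;
  for (var i = start; i < source.length; i++) {
    var ch = source[i];
    if (inString) {
      if (ch === "\\") i++; // skip the escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === "{") {
      depth++;
    } else if (ch === "}") {
      depth--;
      if (depth === 0) return source.slice(start, i + 1);
    }
  }
  return null; // unbalanced input
}

// Made-up multi-line sample, including a brace inside a string.
var html =
  "<script>\n" +
  "var ImmoOptions = {\n" +
  '  "note": "a } inside a string",\n' +
  '  "preisInfos": {"tabelle": {"zeilen": []}}\n' +
  "};\n" +
  "jQuery(function () { InitMap(ImmoOptions.mapOptions); });\n" +
  "</script>";

var jo = JSON.parse(extractObjectLiteral(html, "ImmoOptions"));
console.log(Array.isArray(jo.preisInfos.tabelle.zeilen)); // true
```

In doGet, the regex lines could then be replaced by var jo = JSON.parse(extractObjectLiteral(html, "ImmoOptions"));.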