Get a table of prices from a webpage into a Google spreadsheet


Question

I am trying to get the price table from http://www.immopreise.at/Wien/Wohnung/Miete into a Google Sheet.

I tried the following:

    =IMPORTXML("http://www.immopreise.at/Wien/Wohnung/Miete","//table[@id='preisTabelle']")

Attached you can find an example sheet:

https://docs.google.com/spreadsheets/d/1-aXJULo6BELQQ6136Lps_HUzOwkw5SKaPGxIl5gBDfM/edit?usp=sharing

My problem is that the formula returns nothing.

Any suggestions as to what I am doing wrong?

I appreciate your reply!

Solution

First Approach: using IMPORTXML and REGEXEXTRACT. Start with the formula from the question:

    =IMPORTXML("http://www.immopreise.at/Wien/Wohnung/Miete",
               "//table[@id='preisTabelle']")
    

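This formula returns an empty result because the HTML served by the page contains only an empty table at that location (the rows are presumably filled in by JavaScript in the browser, and IMPORTXML does not execute JavaScript):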

    <table id="preisTabelle"></table>
    

The data is actually located inside a <script> tag:

    <script>
                var ImmoOptions = {"mapOptions":{"region":"Wien","karteAnzeigen":true},"TrendChartConf":{"uri":{"district":"/Trend/GetDistricts","chart":"/Trend/GetChart","chart1":"/Trend/GetTrendChart","compare":"/Preisvergleich","chart2":"/Trend/TrendChart","refresh":"/Preisentwicklung","uebersicht":"/?region=Wien\u0026pathInfo=Wohnung%2FMiete"},"firstDirstrict":{"Wien":"Wien-1-Innere-Stadt","Niederoesterreich":"Sankt-Poelten-Stadt","Burgenland":"Eisenstadt-Stadt","Oberoesterreich":"Linz-Stadt","Steiermark":"Graz-Alle-Bezirke","Kaernten":"Klagenfurt-Stadt","Salzburg":"Salzburg-Stadt","Tirol":"Innsbruck-Stadt","Vorarlberg":"Bregenz"},"firstDirstrictId":{"9":231,"3":153,"1":133,"4":177,"6":201,"2":142,"5":195,"7":218,"8":228}},"preisInfos":{"tabelle":{"spalten":[{"name":"≤50m²","spaltenArt":"Waehrung","nachkommaStellen":true,"farbmarkierung":null},{"name":"51-80m²","spaltenArt":"Waehrung","nachkommaStellen":true,"farbmarkierung":null},{"name":"81-129m²","spaltenArt":"Waehrung","nachkommaStellen":true,"farbmarkierung":null},{"name":"\u003e130m²","spaltenArt":"Waehrung","nachkommaStellen":true,"farbmarkierung":null},{"name":"\u003cspan class=\u0027Detailed\u0027\u003e\u0026#216;/m²\u003c/span\u003e\u003cspan class=\u0027Compact\u0027\u003e\u0026#216;/m²\u003c/span\u003e","spaltenArt":"Waehrung","nachkommaStellen":true,"farbmarkierung":true},{"name":"\u003cspan class=\u0027Detailed\u0027\u003eTrend\u003c/span\u003e\u003cspan class=\u0027Compact\u0027\u003eTd.\u003c/span\u003e","spaltenArt":"Tendenz","nachkommaStellen":false,"farbmarkierung":null}],"zeilen":[{"name":" 1.,  Innere Stadt","zellen":[21.68,19.02,18.43,19.56,19.27,0],"id":231},{"name":" 2.,  Leopoldstadt","zellen":[18.27,15.06,14.28,14.20,14.85,1],"id":232},{"name":" 3.,  Landstraße","zellen":[18.88,17.04,15.42,14.68,16.03,1],"id":233},{"name":" 4.,  Wieden","zellen":[19.37,16.58,16.89,16.35,16.83,1],"id":234},{"name":" 5.,  Margareten","zellen":[15.46,14.11,14.20,14.77,14.39,0],"id":235},{"name":" 6.,  Mariahilf","zellen":[18.23,14.68,15.72,15.32,15.53,1],"id":236},{"name":" 7.,  Neubau","zellen":[16.09,14.89,14.58,14.94,14.95,0],"id":237},{"name":" 8.,  Josefstadt","zellen":[16.77,16.78,14.02,14.93,15.08,0],"id":238},{"name":" 9.,  Alsergrund","zellen":[15.72,14.48,14.53,14.92,14.69,0],"id":239},{"name":"10.,  Favoriten","zellen":[14.14,12.35,11.81,0,12.52,0],"id":240},{"name":"11.,  Simmering","zellen":[13.69,12.34,11.50,13.46,12.38,-1],"id":241},{"name":"12.,  Meidling","zellen":[15.66,14.97,13.28,11.79,14.54,1],"id":242},{"name":"13.,  Hietzing","zellen":[16.71,15.93,14.63,14.05,14.99,0],"id":243},{"name":"14.,  Penzing","zellen":[14.43,13.14,12.72,12.37,13.11,0],"id":244},{"name":"15.,  Rudolfsheim-Fünfhaus","zellen":[13.58,12.90,12.93,11.76,13.07,0],"id":245},{"name":"16.,  Ottakring","zellen":[13.99,12.64,12.64,12.45,12.90,0],"id":246},{"name":"17.,  Hernals","zellen":[14.71,13.06,13.15,13.61,13.50,1],"id":247},{"name":"18.,  Währing","zellen":[14.47,13.96,13.82,15.67,14.37,0],"id":248},{"name":"19.,  Döbling","zellen":[16.21,14.37,15.06,16.44,15.29,0],"id":249},{"name":"20.,  Brigittenau","zellen":[15.68,13.57,12.56,13.30,13.43,0],"id":250},{"name":"21.,  Floridsdorf","zellen":[15.58,13.58,12.38,14.97,13.68,1],"id":251},{"name":"22.,  Donaustadt","zellen":[18.19,15.57,16.85,15.89,16.18,0],"id":252},{"name":"23.,  Liesing","zellen":[14.79,14.09,13.49,15.60,13.92,1],"id":253}],"tabellenTitel":"Wohnungen Miete","titelErsteSpalte":"Bezirk","GesamtAnzahlObjekte":12739},"preisspannen":[{"bis":12},{"bis":14},{"bis":15},{"bis":null}]},"basecharts":null,"CurrentView":{"trendVar":{"CatNum":0,"ImmoArtNum":5,"AltbauNum":2,"AngebotTypeNum":1},"hid":0}} ;
                jQuery(function () {
                    InitMap(ImmoOptions.mapOptions); 
                });
    </script>
    

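The data of most interest is the zeilen (rows) array inside the ImmoOptions variable, at ImmoOptions.preisInfos.tabelle.zeilen; each entry holds a district name and its zellen (cell values):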

    [
      {
        "name": " 1.,  Innere Stadt",
        "zellen": [21.68, 19.02, 18.43, 19.56, 19.27, 0],
        "id": 231
      },
      {
        "name": " 2.,  Leopoldstadt",
        "zellen": [18.27, 15.06, 14.28, 14.2, 14.85, 1],
        "id": 232
      },
      /* Edited for brevity */
    ]
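Each name value packs the district number and the district name into one string with extra spaces (for example " 1.,  Innere Stadt"). If you want them in separate columns, a small helper along these lines could split them. This is a sketch: splitDistrictName is a hypothetical name, and the pattern assumes every name follows the "<number>.,  <name>" shape seen above; anything else is passed through with an empty number.

```javascript
// Hypothetical helper: split a zeilen "name" such as " 1.,  Innere Stadt"
// into [districtNumber, districtName].
function splitDistrictName(raw) {
  // Assumed shape: optional spaces, digits, ".,", spaces, then the name
  const m = /^\s*(\d+)\.,\s*(.+?)\s*$/.exec(raw);
  if (!m) {
    // Unknown format: keep the trimmed text and leave the number empty
    return [null, raw.trim()];
  }
  return [Number(m[1]), m[2]];
}

console.log(splitDistrictName(" 1.,  Innere Stadt"));          // [ 1, 'Innere Stadt' ]
console.log(splitDistrictName("15.,  Rudolfsheim-Fünfhaus"));  // [ 15, 'Rudolfsheim-Fünfhaus' ]
```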
    

The following formula pulls the text of the script into a spreadsheet cell (say we paste it into cell A100):

    =IMPORTXML("http://www.immopreise.at/Wien/Wohnung/Miete","//script[2]")
    

Then the following formula extracts the JSON string (the value of the ImmoOptions variable) into another cell (say we paste it into cell A1):

    =REGEXEXTRACT(A100,"(?s)var ImmoOptions\s*=\s*(\{.*\})\s*;")
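In JavaScript (for example inside Apps Script), the same extraction can be sketched as below. The function name extractImmoOptions is hypothetical, and the input is assumed to be the text of that single script element (the content of A100), not the whole HTML page. The pattern is anchored on "var ImmoOptions" and requires the closing "}" to be followed by ";", so the jQuery(...) call that follows the object literal is not captured; [\s\S] plays the role of (?s).

```javascript
// Sketch: extract and parse the object literal assigned to ImmoOptions.
// extractImmoOptions is a hypothetical helper name.
function extractImmoOptions(scriptText) {
  // [\s\S] matches any character including newlines (like (?s) in RE2).
  // The greedy \{[\s\S]*\} backtracks to the last "}" that is followed by
  // optional whitespace and ";". In this script that is the end of the
  // object literal, so the trailing jQuery(...) call is left out.
  const m = /var ImmoOptions\s*=\s*(\{[\s\S]*\})\s*;/.exec(scriptText);
  if (!m) {
    throw new Error("ImmoOptions assignment not found");
  }
  return JSON.parse(m[1]);
}
```

With the script text from the page, extractImmoOptions(text).preisInfos.tabelle.zeilen would be the array shown above.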
    

At this point we need JavaScript to parse the JSON. This can be done with Google Apps Script (Tools -> Script Editor), writing the parsing code in JavaScript.

The overall pipeline has three steps (the details are not shown here):

1. Use IMPORTXML to get the contents of the script on the page.
2. Use REGEXEXTRACT to get the value of ImmoOptions as a JSON string.
3. Parse the JSON string (in Apps Script) to get the data.
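Step 3 can be sketched as an Apps Script custom function that takes the extracted JSON text and returns a 2D array; Sheets writes a returned 2D array into the cells starting at the formula's cell. The name IMMO_TABLE is hypothetical, and the sketch assumes the JSON text sits in A1 as described above, so you would enter =IMMO_TABLE(A1) in an empty cell with room to the right and below.

```javascript
/**
 * Hypothetical custom function: =IMMO_TABLE(A1)
 * Parses the ImmoOptions JSON text and returns the price table as a 2D array.
 *
 * @param {string} jsonText The value of the ImmoOptions variable.
 * @return {Array<Array<(string|number)>>} Header row, then one row per district.
 * @customfunction
 */
function IMMO_TABLE(jsonText) {
  const tabelle = JSON.parse(jsonText).preisInfos.tabelle;
  // First header cell comes from the data ("Bezirk"); value columns get
  // generic labels because some spalten names contain HTML markup
  const header = [tabelle.titelErsteSpalte];
  for (let i = 0; i < tabelle.spalten.length; i++) {
    header.push("Col-" + (i + 1));
  }
  // One row per district: trimmed name followed by its zellen values
  const rows = tabelle.zeilen.map(function (z) {
    return [z.name.trim()].concat(z.zellen);
  });
  return [header].concat(rows);
}
```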
    


Second Approach: using Google Apps Script. Here are the steps:

1. Log into Google and open this spreadsheet in a browser.

2. Choose File -> Make a copy (perhaps with a name like S1). This makes a copy of the file in your Google Drive and opens it in a new tab.

3. Switch to the new tab and choose Tools -> Script Editor (in current versions of Google Sheets the menu is Extensions -> Apps Script). This opens an editor containing the script. Select the function doGet in the toolbar and run it (you will be asked to authorize the script the first time); it fills the sheet with the table.

Here is the script attached to the sheet (for reference, in case the link goes missing):

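    function doGet() {
      // Random query parameter, presumably to avoid getting a cached copy of the page
      var r1 = Math.random() * 100000000000;
      var html = UrlFetchApp.fetch("http://www.immopreise.at/Wien/Wohnung/Miete?somevariable=" + r1).getContentText();

      // The table data is the JSON object assigned to ImmoOptions in a <script> tag
      var re = /var ImmoOptions = (.*);/i;
      var match = re.exec(html);
      if (!match) {
        throw new Error("ImmoOptions not found; the page layout may have changed.");
      }
      var jo = JSON.parse(match[1]);
      var arr = jo["preisInfos"]["tabelle"]["zeilen"];

      var sheet = SpreadsheetApp.getActiveSheet();
      sheet.clear();
      sheet.appendRow([r1]);
      sheet.appendRow(['Bezirk', 'Col-1', 'Col-2', 'Col-3', 'Col-4', 'Col-5', 'Col-6']);

      // One row per district: its name followed by the six values in zellen
      for (var i = 0; i < arr.length; i++) {
        var item = arr[i];
        var row = [item.name].concat(item.zellen);
        sheet.appendRow(row);
      }
    }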
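The script labels the value columns Col-1 to Col-6. The real column names are in preisInfos.tabelle.spalten, but the last two contain HTML spans and a numeric character reference (&#216; is Ø). A sketch that turns them into plain headers, keeping the "Detailed" variant; headerFromSpalten is a hypothetical helper, and its regexes assume the markup stays exactly as it appears in the page's script.

```javascript
// Hypothetical helper: readable header names from preisInfos.tabelle.spalten.
function headerFromSpalten(spalten) {
  return spalten.map(function (col) {
    let name = col.name;
    // If the name holds <span class='Detailed'>...</span>, keep only that text
    const detailed = /<span class='Detailed'>([\s\S]*?)<\/span>/.exec(name);
    if (detailed) {
      name = detailed[1];
    }
    // Strip any remaining tags
    name = name.replace(/<[^>]*>/g, "");
    // Decode decimal character references such as &#216; (Ø)
    name = name.replace(/&#(\d+);/g, function (_, code) {
      return String.fromCharCode(Number(code));
    });
    return name.trim();
  });
}
```

Inside doGet, ['Bezirk'].concat(headerFromSpalten(jo["preisInfos"]["tabelle"]["spalten"])) could replace the hard-coded header row; for the data shown above it would give Bezirk, ≤50m², 51-80m², 81-129m², >130m², Ø/m², Trend.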
    

How it works:

1. Fetches the entire HTML content of the page.
2. Uses a regular expression to extract the JSON data from inside <script>..</script>.
3. Parses the extracted JSON data.
4. Gets the relevant array and writes it into the spreadsheet, one row per district.
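Step 4 calls appendRow() once per row, so there is one spreadsheet call per district. A common alternative is to build the whole 2D array first and write it with a single Range.setValues() call. The sketch below assumes that approach; writeRows is a hypothetical helper, and it pads short rows because setValues() expects every row to have the same length as the target range.

```javascript
// Hypothetical helper: clear the sheet and write all rows in one call.
function writeRows(sheet, rows) {
  sheet.clear();
  // Widest row determines the number of columns in the target range
  const width = rows.length ? Math.max(...rows.map(function (r) { return r.length; })) : 0;
  if (width === 0) {
    return [];
  }
  // Pad shorter rows (such as the single-cell r1 row) with empty strings
  const padded = rows.map(function (r) {
    const copy = r.slice();
    while (copy.length < width) {
      copy.push("");
    }
    return copy;
  });
  // getRange(row, column, numRows, numColumns), starting at A1
  sheet.getRange(1, 1, padded.length, width).setValues(padded);
  return padded;
}
```

In doGet, the r1 row, the header row and the district rows would be collected into one array and passed as writeRows(SpreadsheetApp.getActiveSheet(), rows).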

Disadvantages:

1. It is a brittle, patched-together script that will break if the contents of the <script> tag change (or in any other way that breaks the regex).
2. It doesn't give you nice UI controls for the table (they could be built, but that takes more work).
3. It works only if the entire JSON object is on a single line, because "." in the regex does not match newlines (this could be fixed by removing newlines first, or by using a better regex).
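For the third disadvantage, one option other than a regex is to locate the "var ImmoOptions =" marker and scan forward to the matching closing brace, skipping braces that appear inside string literals; newlines inside the object then do not matter. This is a swapped-in technique, not part of the original answer: extractObjectAfter is a hypothetical name, and the scanner assumes the object is valid JSON (double-quoted strings with backslash escapes).

```javascript
// Hypothetical helper: return the {...} text that follows `marker`,
// found by brace matching instead of a regex. Returns null if not found.
function extractObjectAfter(text, marker) {
  const at = text.indexOf(marker);
  if (at < 0) return null;
  const start = text.indexOf("{", at + marker.length);
  if (start < 0) return null;

  let depth = 0;
  let inString = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (ch === "\\") {
        i++; // skip the escaped character, e.g. \" inside a string
      } else if (ch === '"') {
        inString = false;
      }
    } else if (ch === '"') {
      inString = true;
    } else if (ch === "{") {
      depth++;
    } else if (ch === "}") {
      depth--;
      if (depth === 0) {
        return text.slice(start, i + 1); // matching brace of the outer object
      }
    }
  }
  return null; // braces never balanced
}
```

In doGet, JSON.parse(extractObjectAfter(html, "var ImmoOptions =")) could take the place of JSON.parse(re.exec(html)[1]).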
