Another IMPORTXML returning empty content

Question

When I enter

=IMPORTXML("http://www.ilgiornale.it/autore/franco-battaglia.html","//h2")

in my Google Sheet, I get: #N/A Imported content is empty.

But when I enter:

=IMPORTXML("http://www.ilgiornale.it/autore/franco-battaglia.html","*")

I get some content, so I presume that access to the page is not blocked.

And the page undoubtedly contains several h2 tags.

So what is the problem?

Recommended answer

  • You want to know the reason for the following situation.
    • =IMPORTXML("http://www.ilgiornale.it/autore/franco-battaglia.html","//h2") returns #N/A Imported content is empty.
    • =IMPORTXML("http://www.ilgiornale.it/autore/franco-battaglia.html","*") returns the content.

If my understanding is correct, how about this answer?

When I looked at the HTML of http://www.ilgiornale.it/autore/franco-battaglia.html, I noticed a problem in it. It is the following line.

    window.jQuery || document.write("<script src='/sites/all/modules/jquery_update/replace/jquery/jquery.min.js'>\x3C/script>")

Here the closing script tag is written as \x3C/script> instead of </script>. It seems that when IMPORTXML parses this line, it treats the script tag as never closed. I could confirm that when \x3C is converted to <, =IMPORTXML("http://www.ilgiornale.it/autore/franco-battaglia.html","//h2") correctly returns the values of the h2 tags.

This appears to be why =IMPORTXML("http://www.ilgiornale.it/autore/franco-battaglia.html","//h2") returns #N/A Imported content is empty.

As for why =IMPORTXML("http://www.ilgiornale.it/autore/franco-battaglia.html","*") returns content: when I used this formula, I could not find the values of the script tag in the result. That made me suspect a problem with the script tag, which is how I found the line above. I could confirm that when \x3C is converted to <, =IMPORTXML("http://www.ilgiornale.it/autore/franco-battaglia.html","*") returns values that include those of the script tag.
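The conversion described above can be tried on a short sample string before touching the real page. The following is only a minimal sketch: the helper name unescapeScriptClose and the shortened sample line are assumptions made for illustration and are not part of the original answer.

```javascript
// Hypothetical helper (name chosen for this example): rewrites the literal
// four-character sequence \x3C to "<" everywhere in the given HTML text.
function unescapeScriptClose(html) {
  // /\\x3C/g matches a real backslash followed by "x3C".
  return html.replace(/\\x3C/g, "<");
}

// Shortened sample modelled on the line quoted above. In this JavaScript
// string literal, "\\x3C" stands for the four characters \x3C as they
// appear in the page source.
var sample = "<script src='/jquery.min.js'>\\x3C/script>";
var fixed = unescapeScriptClose(sample);
console.log(fixed);
```

After the replacement the sample ends in a regular </script> closing tag, which is the form that let IMPORTXML return the h2 values in the test described above.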

To avoid this issue, \x3C has to be changed to <. So how about the following workarounds? Both use Google Apps Script. Please think of them as just two of several possible workarounds.

In the first pattern, the HTML data is downloaded from the URL and the problematic point is corrected. The modified HTML is then saved as a file, the file is shared, and its URL is retrieved. The values are retrieved using this URL.

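    function myFunction() {
      var url = "http://www.ilgiornale.it/autore/franco-battaglia.html";
      // Download the HTML and replace every literal \x3C with "<".
      var data = UrlFetchApp.fetch(url).getContentText().replace(/\\x3C/g, "<");
      // Save the corrected HTML as a file in Google Drive.
      var file = DriveApp.createFile("htmlData.html", data, MimeType.HTML);
      // Let anyone with the link view the file so that IMPORTXML can fetch it.
      file.setSharing(DriveApp.Access.ANYONE_WITH_LINK, DriveApp.Permission.VIEW);
      // Direct-download URL of the shared file.
      var endpoint = "https://drive.google.com/uc?id=" + file.getId() + "&export=download";
      Logger.log(endpoint);
    }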

  • When you use this script, first run myFunction() and copy the endpoint from the log. As a test, put the endpoint in cell A1 and put =IMPORTXML(A1,"//h2") in cell A2. The h2 values are then retrieved.

In the second pattern, the values of the h2 tags are retrieved directly by parsing the HTML data and are put into the active Spreadsheet.

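    function myFunction() {
      var url = "http://www.ilgiornale.it/autore/franco-battaglia.html";
      // Collect every <h2>...</h2> block from the raw HTML; match() returns null if there are none.
      var data = UrlFetchApp.fetch(url).getContentText().match(/<h2[\s\S]+?<\/h2>/g);
      if (!data) return;
      // Wrap the blocks in a single root element so that XmlService can parse them.
      var xml = XmlService.parse("<temp>" + data.join("") + "</temp>");
      // One row per h2 element, as the 2D array that setValues() expects.
      var h2Values = xml.getRootElement().getChildren("h2").map(function(e) {return [e.getValue()]});
      var sheet = SpreadsheetApp.getActiveSheet();
      // Append the values below the last row that has content, in column A.
      sheet.getRange(sheet.getLastRow() + 1, 1, h2Values.length, 1).setValues(h2Values);
      Logger.log(h2Values);
    }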

  • When you run the script, the values of the h2 tags are put directly into the active Spreadsheet.
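The h2 extraction step can also be sketched in plain JavaScript, without XmlService, so that it can be tried outside Apps Script. This is a swapped-in variant and not the method used in the script above: the helper name extractH2Rows and the sample HTML are assumptions, nested tags are removed with a simple regular expression, and HTML entities are left undecoded.

```javascript
// Hypothetical helper: returns the text of every h2 element as a 2D array
// ([[text], [text], ...]), the shape that Range.setValues() expects.
function extractH2Rows(html) {
  // match() returns null when nothing matches, so fall back to an empty list.
  var blocks = html.match(/<h2[\s>][\s\S]*?<\/h2>/gi) || [];
  return blocks.map(function (block) {
    var text = block
      .replace(/<[^>]+>/g, "") // drop the h2 tags and any nested tags
      .replace(/\s+/g, " ")    // collapse runs of whitespace
      .trim();
    return [text];
  });
}

// Made-up sample HTML, used only to exercise the helper.
var sampleHtml =
  "<h2 class='t'><a href='#'>First title</a></h2><p>x</p><h2>\n Second  title </h2>";
console.log(JSON.stringify(extractH2Rows(sampleHtml)));
// prints [["First title"],["Second title"]]
```

In Apps Script, the returned rows could be written with the same getRange(...).setValues(...) call as in the script above, after checking that the array is not empty. A tag-stripping regular expression like this one is fragile for attributes that contain ">", so it is best treated as a quick check and not as a general HTML parser.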
References:

  • Class UrlFetchApp
  • Class XmlService

If I misunderstood your question and this was not the direction you wanted, I apologize.
