Another IMPORTXML returning empty content
Problem description
When I enter
=IMPORTXML("http://www.ilgiornale.it/autore/franco-battaglia.html","//h2")
in my Google Sheet, I get: #N/A Imported content is empty.
However, when I enter
=IMPORTXML("http://www.ilgiornale.it/autore/franco-battaglia.html","*")
I get some content, so I can presume that access to the page is not blocked.
And the page undoubtedly contains several h2 tags.
So what is the problem?
Recommended answer
- You want to know the reason for the following situation:
  - =IMPORTXML("http://www.ilgiornale.it/autore/franco-battaglia.html","//h2") returns #N/A Imported content is empty.
  - =IMPORTXML("http://www.ilgiornale.it/autore/franco-battaglia.html","*") returns content.
If my understanding is correct, how about this answer?
When I looked at the HTML data of http://www.ilgiornale.it/autore/franco-battaglia.html, I noticed a problem in it. It is the following line:
window.jQuery || document.write("<script src='/sites/all/modules/jquery_update/replace/jquery/jquery.min.js'>\x3C/script>")
In this case, the inner script tag is not closed with a literal </script>; its closing tag is written as \x3C/script>. It seems that when IMPORTXML retrieves this line, it treats the script tag as unclosed. I could confirm that when \x3C is converted to <, =IMPORTXML("http://www.ilgiornale.it/autore/franco-battaglia.html","//h2") correctly returns the values of the h2 tags. This appears to be why =IMPORTXML("http://www.ilgiornale.it/autore/franco-battaglia.html","//h2") returns #N/A Imported content is empty.

As for why =IMPORTXML("http://www.ilgiornale.it/autore/franco-battaglia.html","*") returns content: when I used this formula, I couldn't find the values of the script tag in the result. From this, I suspected that the script tag might have an issue, which is how I found the problem point above. I could confirm that when \x3C is converted to <, =IMPORTXML("http://www.ilgiornale.it/autore/franco-battaglia.html","*") returns values that include those of the script tag.

To avoid this issue, \x3C has to be changed to <. So how about the following workarounds? They use Google Apps Script. Please treat them as just two of several possible workarounds.
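The fix itself is a plain string substitution. The sketch below is written in plain JavaScript so it runs outside Apps Script. It shows the same replace(/\\x3C/g, "<") call that the first workaround uses, along with a rough count of "<script" openings and "</script>" closings on a sample line. The outer <script> wrapper in the sample and the function names unescapeLessThan and countScriptTags are assumptions for illustration. The count only makes the imbalance visible; it is not how IMPORTXML actually parses pages.

```javascript
// Replace every literal "\x3C" (backslash, x, 3, C) with "<".
// In a regex literal, /\\x3C/ matches that four-character text,
// not the "<" character itself.
function unescapeLessThan(html) {
  return html.replace(/\\x3C/g, "<");
}

// Rough balance check: counts "<script" openings and "</script>" closings.
// Illustration only; this is NOT how IMPORTXML parses HTML.
function countScriptTags(html) {
  const open = (html.match(/<script\b/gi) || []).length;
  const close = (html.match(/<\/script>/gi) || []).length;
  return { open, close };
}

// Sample modelled on the problem line (the outer <script> wrapper is assumed).
const sample =
  "<script>window.jQuery || document.write(\"<script src='/sites/all/modules/jquery_update/replace/jquery/jquery.min.js'>\\x3C/script>\")</script>";

console.log(countScriptTags(sample));                   // { open: 2, close: 1 }
console.log(countScriptTags(unescapeLessThan(sample))); // { open: 2, close: 2 }
```

Before the substitution there is one more opening than closing; afterwards they balance. This is consistent with the explanation above, though it does not prove how IMPORTXML behaves internally.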
Pattern 1: In this pattern, the HTML data is first downloaded from the URL and the problem point is fixed. The modified HTML data is then saved as a file, the file is shared, and the file's URL is retrieved. The values are then retrieved using this URL.
function myFunction() {
  var url = "http://www.ilgiornale.it/autore/franco-battaglia.html";
  // Fetch the raw HTML and turn every literal "\x3C" into "<".
  var data = UrlFetchApp.fetch(url).getContentText().replace(/\\x3C/g, "<");
  // Save the fixed HTML to Drive and share it so IMPORTXML can read it.
  var file = DriveApp.createFile("htmlData.html", data, MimeType.HTML);
  file.setSharing(DriveApp.Access.ANYONE_WITH_LINK, DriveApp.Permission.VIEW);
  // Direct-download URL of the shared file.
  var endpoint = "https://drive.google.com/uc?id=" + file.getId() + "&export=download";
  Logger.log(endpoint);
}
- When you use this script, first run myFunction() and retrieve the endpoint from the log. Then, as a test case, put the endpoint in cell "A1" and put =IMPORTXML(A1,"//h2") in cell "A2". The values can then be retrieved.
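You may prefer to build the endpoint and the test formula as text instead of assembling them by hand. The plain-JavaScript sketch below does that. The helper names driveDownloadEndpoint and importXmlFormula and the placeholder "FILE_ID" are hypothetical. The URL format is the one the first workaround's script logs, and doubling a double quote inside a formula string is the standard Sheets way to escape it.

```javascript
// Hypothetical helper: builds the same download URL the script above logs.
function driveDownloadEndpoint(fileId) {
  return "https://drive.google.com/uc?id=" + encodeURIComponent(fileId) + "&export=download";
}

// Hypothetical helper: builds an IMPORTXML formula string.
// Inside a Sheets formula, a double quote in a string literal is written as "".
function importXmlFormula(url, xpath) {
  const quote = (s) => '"' + s.replace(/"/g, '""') + '"';
  return "=IMPORTXML(" + quote(url) + "," + quote(xpath) + ")";
}

const endpoint = driveDownloadEndpoint("FILE_ID"); // placeholder, not a real file ID
console.log(endpoint);
console.log(importXmlFormula(endpoint, "//h2"));
```

Pasting the second logged string into a cell is equivalent to the A1/A2 setup described above, just without the intermediate cell.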
Pattern 2: In this pattern, the values of the h2 tags are retrieved directly by parsing the HTML data, and are put into the active spreadsheet.

function myFunction() {
  var url = "http://www.ilgiornale.it/autore/franco-battaglia.html";
  // Collect every <h2>...</h2> fragment from the raw HTML.
  var data = UrlFetchApp.fetch(url).getContentText().match(/<h2[\s\S]+?<\/h2>/g);
  if (!data) return; // match() returns null when no h2 tags are found
  // Wrap the fragments in one root element so XmlService can parse them
  // (this requires the fragments to be well-formed XML).
  var xml = XmlService.parse("<temp>" + data.join("") + "</temp>");
  var h2Values = xml.getRootElement().getChildren("h2").map(function(e) {return [e.getValue()]});
  // Append the values below the last used row in column A.
  var sheet = SpreadsheetApp.getActiveSheet();
  sheet.getRange(sheet.getLastRow() + 1, 1, h2Values.length, 1).setValues(h2Values);
  Logger.log(h2Values);
}
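For comparison, the same extraction can be sketched in plain JavaScript without XmlService. This version also avoids failing when an h2 fragment is not well-formed XML. The function name extractH2Texts and the sample HTML are hypothetical. The sketch strips tags with a regex and does not decode HTML entities such as &amp;, so treat it as a rough illustration rather than a robust HTML parser.

```javascript
// Pull the text of every <h2>...</h2> out of an HTML string using the same
// regex as the script above, then strip inner tags instead of using XmlService.
// Regex-based HTML handling is fragile; this is only a sketch for simple markup.
function extractH2Texts(html) {
  const blocks = html.match(/<h2[\s\S]+?<\/h2>/g) || []; // match() returns null when nothing is found
  return blocks.map((block) =>
    block
      .replace(/<[^>]*>/g, "") // drop <h2 ...>, </h2> and any nested tags
      .replace(/\s+/g, " ")    // collapse whitespace and newlines
      .trim()
  );
}

const page = '<div><h2 class="title"><a href="/x">First title</a></h2><p>text</p><h2>Second\n  title</h2></div>';
console.log(extractH2Texts(page));                 // [ 'First title', 'Second title' ]
console.log(extractH2Texts("<p>no headings</p>")); // []
```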
- When you run myFunction() above, the values of the h2 tags are put directly into the active spreadsheet.

References:
- Class UrlFetchApp
- Class XmlService
If I misunderstood your question and this was not the direction you want, I apologize.