IMPORTXML XPath_Query for Google Sheets
Question
I'm using Google Sheets' IMPORTXML function to retrieve data for each calendar date one year earlier, or for the closest year-ago date where data are available.
Here is a sample of the data (the full data source is here):
<entry>
<id>http://data.treasury.gov/Feed.svc/DailyTreasuryYieldCurveRateData(6794)</id>
<title type="text"></title>
<updated>2018-02-06T22:05:38Z</updated>
<author>
<name />
</author>
<link rel="edit" title="DailyTreasuryYieldCurveRateDatum" href="DailyTreasuryYieldCurveRateData(6794)" />
<category term="TreasuryDataWarehouseModel.DailyTreasuryYieldCurveRateDatum" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" />
<content type="application/xml">
<m:properties>
<d:Id m:type="Edm.Int32">6794</d:Id>
<d:NEW_DATE m:type="Edm.DateTime">2017-02-24T00:00:00</d:NEW_DATE>
<d:BC_1MONTH m:type="Edm.Double">0.4</d:BC_1MONTH>
<d:BC_3MONTH m:type="Edm.Double">0.52</d:BC_3MONTH>
<d:BC_6MONTH m:type="Edm.Double">0.65</d:BC_6MONTH>
<d:BC_1YEAR m:type="Edm.Double">0.8</d:BC_1YEAR>
<d:BC_2YEAR m:type="Edm.Double">1.12</d:BC_2YEAR>
<d:BC_3YEAR m:type="Edm.Double">1.38</d:BC_3YEAR>
<d:BC_5YEAR m:type="Edm.Double">1.8</d:BC_5YEAR>
<d:BC_7YEAR m:type="Edm.Double">2.12</d:BC_7YEAR>
<d:BC_10YEAR m:type="Edm.Double">2.31</d:BC_10YEAR>
<d:BC_20YEAR m:type="Edm.Double">2.69</d:BC_20YEAR>
<d:BC_30YEAR m:type="Edm.Double">2.95</d:BC_30YEAR>
<d:BC_30YEARDISPLAY m:type="Edm.Double">2.95</d:BC_30YEARDISPLAY>
</m:properties>
</content>
</entry>
<entry>
<id>http://data.treasury.gov/Feed.svc/DailyTreasuryYieldCurveRateData(6795)</id>
<title type="text"></title>
<updated>2018-02-06T22:05:38Z</updated>
<author>
<name />
</author>
<link rel="edit" title="DailyTreasuryYieldCurveRateDatum" href="DailyTreasuryYieldCurveRateData(6795)" />
<category term="TreasuryDataWarehouseModel.DailyTreasuryYieldCurveRateDatum" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" />
<content type="application/xml">
<m:properties>
<d:Id m:type="Edm.Int32">6795</d:Id>
<d:NEW_DATE m:type="Edm.DateTime">2017-02-27T00:00:00</d:NEW_DATE>
<d:BC_1MONTH m:type="Edm.Double">0.44</d:BC_1MONTH>
<d:BC_3MONTH m:type="Edm.Double">0.5</d:BC_3MONTH>
<d:BC_6MONTH m:type="Edm.Double">0.68</d:BC_6MONTH>
<d:BC_1YEAR m:type="Edm.Double">0.81</d:BC_1YEAR>
<d:BC_2YEAR m:type="Edm.Double">1.2</d:BC_2YEAR>
<d:BC_3YEAR m:type="Edm.Double">1.46</d:BC_3YEAR>
<d:BC_5YEAR m:type="Edm.Double">1.87</d:BC_5YEAR>
<d:BC_7YEAR m:type="Edm.Double">2.18</d:BC_7YEAR>
<d:BC_10YEAR m:type="Edm.Double">2.36</d:BC_10YEAR>
<d:BC_20YEAR m:type="Edm.Double">2.72</d:BC_20YEAR>
<d:BC_30YEAR m:type="Edm.Double">2.98</d:BC_30YEAR>
<d:BC_30YEARDISPLAY m:type="Edm.Double">2.98</d:BC_30YEARDISPLAY>
</m:properties>
</content>
</entry>
...
This is the XPath query I'm currently using to retrieve data for 2017 Feb 27:
//*[local-name() = 'NEW_DATE'][text() = '2017-02-27T00:00:00']/..
This is the result displayed:
6795 2017-02-27T00:00:00 0.44 0.5 0.68 0.81 1.2 1.46 1.87 2.18 2.36 2.72 2.98 2.98
Is there a way to:
- retrieve all the displayed data except the "d:Id" element ("6795" above), and
- when searching for a date that has no data (e.g. 2017 Feb 25, which results in a "#N/A" error because the query returns nothing for a missing date), default to the next available forward date (e.g. 2017 Feb 27)?
I'm avoiding the IF function in order to make fewer IMPORTXML calls.
Answer
Because Google Sheets (GS) appears to support only XPath 1.0 (as of this writing, neither its documentation nor its product forum pages state which version is supported), XPath 2.0 constructs such as if-then-else expressions cannot be used. Instead, the source XML data can be filtered with GS native functions.
Question 1
Following @Tanaike's proposed solution, all child nodes of an element except one can be retrieved with the XPath not-equal operator (!=) in a predicate on local-name(). Below, it is applied to the parent element of "NEW_DATE" to leave out the child element "Id". TRANSPOSE spreads the returned values across columns in a single row. (A1 is the cell containing the source XML URL from the question.)
=TRANSPOSE(IMPORTXML(A1, "//*[local-name() = 'NEW_DATE'][text() = '2017-02-27T00:00:00']/../*[local-name()!='Id']"))
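To make the two XPath steps concrete, here is a small Python sketch (not part of the original answer) that runs the same filter on a trimmed copy of the sample using the standard library's xml.etree.ElementTree. ElementTree's XPath subset has no local-name(), so the hypothetical helper local_name() strips the namespace by hand. The namespace URIs in SAMPLE are an assumption, since the excerpt above omits its declarations.

```python
import xml.etree.ElementTree as ET

# Two entries from the sample, trimmed to a few fields. The namespace URIs
# are assumed (standard OData ones); the excerpt in the question omits them.
SAMPLE = """<feed xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"
      xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
  <m:properties>
    <d:Id m:type="Edm.Int32">6794</d:Id>
    <d:NEW_DATE m:type="Edm.DateTime">2017-02-24T00:00:00</d:NEW_DATE>
    <d:BC_1MONTH m:type="Edm.Double">0.4</d:BC_1MONTH>
    <d:BC_3MONTH m:type="Edm.Double">0.52</d:BC_3MONTH>
  </m:properties>
  <m:properties>
    <d:Id m:type="Edm.Int32">6795</d:Id>
    <d:NEW_DATE m:type="Edm.DateTime">2017-02-27T00:00:00</d:NEW_DATE>
    <d:BC_1MONTH m:type="Edm.Double">0.44</d:BC_1MONTH>
    <d:BC_3MONTH m:type="Edm.Double">0.5</d:BC_3MONTH>
  </m:properties>
</feed>"""


def local_name(tag):
    """Hypothetical helper: drop the '{namespace}' prefix, like XPath local-name()."""
    return tag.rsplit("}", 1)[-1]


def row_without_id(xml_text, date):
    """Mimic //*[local-name()='NEW_DATE'][text()=date]/../*[local-name()!='Id'].

    Returns the first matching row, or [] where IMPORTXML would show #N/A.
    """
    root = ET.fromstring(xml_text)
    for parent in root.iter():
        children = list(parent)
        # Step 1: does this element have a NEW_DATE child with exactly this text?
        if any(local_name(c.tag) == "NEW_DATE" and c.text == date for c in children):
            # Step 2: keep every sibling except Id.
            return [c.text for c in children if local_name(c.tag) != "Id"]
    return []
```

Called with "2017-02-27T00:00:00" it returns the date and the yields without "6795"; called with "2017-02-25T00:00:00" it returns an empty list, which is the #N/A case the second question addresses.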
Question 2
Searching for a specific date one year ago, and falling back to the closest forward date when data for that date are missing, requires nested GS functions. First, an IMPORTXML query on "NEW_DATE" retrieves all available dates; SORT then orders them newest first, and MATCH (match type -1) finds the closest available date on or after the target. INDEX uses that position in the reverse-sorted list to return the date itself. CONCATENATE, TEXT and TODAY are used only to build the year-ago date in the format of the XML data (note that TODAY()-365 is one day off whenever the year-long span includes February 29; EDATE(TODAY(),-12) avoids this). The formula is below.
=TRANSPOSE(IMPORTXML(A1,CONCATENATE("//*[local-name() = 'NEW_DATE'][text() = '",INDEX(SORT(IMPORTXML(A1, "//*[local-name() = 'NEW_DATE']"),1,FALSE),MATCH(CONCATENATE(TEXT(TODAY()-365,"YYYY-MM-DD"),"T00:00:00"),SORT(IMPORTXML(A1, "//*[local-name() = 'NEW_DATE']"),1,FALSE),-1)),"']/../*[local-name()!='Id']")))
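The fallback part of this formula can be checked outside Sheets with the Python sketch below (hypothetical helper names, not the answer's own code). It reproduces what SORT(...,1,FALSE), MATCH(...,-1) and INDEX do together: on a descending list, MATCH with type -1 returns the smallest value greater than or equal to the key, i.e. the closest forward date. It assumes the NEW_DATE values arrive as plain fixed-width ISO text, so string comparison orders them chronologically.

```python
from datetime import date, timedelta


def year_ago_key(today):
    """Build the key like CONCATENATE(TEXT(TODAY()-365,"YYYY-MM-DD"),"T00:00:00")."""
    return (today - timedelta(days=365)).strftime("%Y-%m-%d") + "T00:00:00"


def closest_forward_date(dates, key):
    """Mirror INDEX(SORT(d,1,FALSE), MATCH(key, SORT(d,1,FALSE), -1)).

    Returns the earliest date >= key, or None where MATCH would give #N/A.
    """
    desc = sorted(dates, reverse=True)  # SORT(range, 1, FALSE): newest first
    pos = None
    for i, d in enumerate(desc):
        if d >= key:
            pos = i  # still >= key, so this is the best forward candidate so far
        else:
            break  # descending order: everything after this is earlier than key
    return None if pos is None else desc[pos]  # INDEX at MATCH's position
```

With the two sample dates, a key of "2017-02-25T00:00:00" resolves to "2017-02-27T00:00:00", an exact match returns itself, and a key later than every available date returns None, the equivalent of MATCH's #N/A.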