Excel-使用FILTERXML从字符串中提取子字符串 [英] Excel - Extract substring(s) from string using FILTERXML

查看：211 发布时间：2020/7/17 19:33:31 arrays excel xml xpath excel-formula

本文介绍了Excel-使用FILTERXML从字符串中提取子字符串的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

_背景

最近，我一直在尝试更加熟悉将带分隔符的字符串更改为XML以与Excel的

Lately I've been trying to get more familiar with the concept of changing a delimited string into an XML to parse with Excel's FILTERXML and retrieve those substrings that are of interest. Please note that this function came available from Excel 2013 and is not available on Excel for Mac nor Excel Online.

对于定界字符串，我的意思是普通句子中使用空格作为定界符或任何其他可用于定义字符串中子字符串的字符组合的内容.例如，让我们想象以下内容:

With a delimited string, I meant anything from a normal sentence using spaces as delimiters or any other combination of characters that could be used to define substrings within a string. For example let's imagine the following:

ABC|123|DEF|456|XY-1A|ZY-2F|XY-3F|XY-4f|xyz|123

_问题

因此，许多人都知道如何获取 nth 元素(例如:=TRIM(MID(SUBSTITUTE(A1,"|",REPT(" ",LEN(A1))),3*LEN(A1)+1,LEN(A1)))来检索456).或其他与LEN()，MID()，FIND()以及所有这些构造的组合，我们如何使用FILTERXML使用更具体的条件来提取所关注的子字符串并清理整个字符串?例如，如何检索:

So, where a lot of people know how to get the nth element (e.g.: =TRIM(MID(SUBSTITUTE(A1,"|",REPT(" ",LEN(A1))),3*LEN(A1)+1,LEN(A1))) to retrieve 456). Or other combinationes with LEN(), MID(), FIND() and all those constructs, how do we use FILTERXML to use more specific criteria to extract substrings of concern and clean up the full string? For example, how to retrieve:

按位置元素
数字或非数字元素
本身包含子字符串的元素
以子字符串开头或结尾的元素
大写或小写的元素
持有数字的元素
唯一值
...

Excel's FILTERXML uses XPATH 1.0 which unfortunately means it is not as diverse as we would maybe want it to be. Also, Excel seems to not allow returning reworked node values and exclusively allows you to select nodes in order of appearance. However there is a fair share of functions we can still utilize. More information about that can be found here.

该函数有两个参数:=FILTERXML(<A string in valid XML format>,<A string in valid XPATH format>)

比方说，单元格A1包含字符串:ABC|123|DEF|456|XY-1A|ZY-2F|XY-3F|XY-4f|xyz|123.为了创建有效的XML字符串，我们使用SUBSTITUTE将定界符更改为有效的结束标记和开始标记构造.因此，对于给定的示例，要获得有效的XML构造，我们可以这样做:

Let's say cell A1 holds the string: ABC|123|DEF|456|XY-1A|ZY-2F|XY-3F|XY-4f|xyz|123. To create a valid XML string we use SUBSTITUTE to change the delimiter to valid end- and start-tag constructs. So to get a valid XML construct for the given example we could do:

"<t><s>"&SUBSTITUTE(A1,"|","</s><s>")&"</s></t>"

出于可读性原因，我将使用 <XML> 一词作为上述占位符的上述结构.在下面的有效构造中，您可以找到其他有用的XPATH函数来过滤节点:

For readability reasons I'll refer to the above construct with the word <XML> as a placeholder. Below you'll find different usefull XPATH functions in a valid construct to filter nodes:

_{1)所有元素:}

=FILTERXML(<XML>,"//s")

_{返回:ABC，123，DEF，456，XY-1A，ZY-2F，XY-3F，XY-4f，xyz和123(所有节点)}

_{Returns: ABC, 123, DEF, 456, XY-1A, ZY-2F, XY-3F, XY-4f, xyz and 123 (all nodes)}

_{2)位置元素:}

=FILTERXML(<XML>,"//s[position()=4]")

_{返回:456(索引4上的节点)}

_{Returns: 456 (node on index 4)}

=FILTERXML(<XML>,"//s[position()<4]")

_{返回值:ABC，123和DEF(索引为< 4的节点)}

_{Returns: ABC, 123 and DEF (nodes on index < 4)}

=FILTERXML(<XML>,"//s[position()=2 or position()>5]")

_{返回:123，ZY-2F，XY-3F，XY-4f，xyz和123(索引为2或> 5的节点)}

_{Returns: 123, ZY-2F, XY-3F, XY-4f, xyz and 123 (nodes on index 2 or > 5)}

=FILTERXML(<XML>,"//s[last()]")

_{返回:123(最后一个索引上的节点)}

_{Returns: 123 (node on last index)}

=FILTERXML(<XML>,"//s[position() mod 2 = 1]")

_{返回:ABC，DEF，XY-1A，XY-3F和xyz(奇数个节点)}

_{Returns: ABC, DEF, XY-1A, XY-3F and xyz (odd nodes)}

=FILTERXML(<XML>,"//s[position() mod 2 = 0]")

_{返回:123，456，ZF-2F，XY-4f和123(偶数节点)}

_{Returns: 123, 456, ZF-2F, XY-4f and 123 (even nodes)}

_{3)(非)数字元素:}

=FILTERXML(<XML>,"//s[number()=.]")

或者:

=FILTERXML(<XML>,"//s[.*0=0]")

_{返回:123，456和123(数字节点)}

_{Returns: 123, 456, and 123 (numeric nodes)}

=FILTERXML(<XML>,"//s[not(number()=.)]")

或者:

=FILTERXML(<XML>,"//s[.*0!=0)]")

_{返回:ABC，DEF，XY-1A，ZY-2F，XY-3F，XY-4f和xyz(非数字节点)}

_{Returns: ABC, DEF, XY-1A, ZY-2F, XY-3F, XY-4f and xyz (non-numeric nodes)}

_{4)包含(不包含)的元素:}

=FILTERXML(<XML>,"//s[contains(., 'Y')]")

_{返回值:XY-1A，ZY-2F，XY-3F和XY-4f(包含'Y'，注意XPATH区分大小写，不包括xyz)}

_{Returns: XY-1A, ZY-2F, XY-3F and XY-4f (containing 'Y', notice XPATH is case sensitive, exclusing xyz)}

=FILTERXML(<XML>,"//s[not(contains(., 'Y'))]")

_{返回值:ABC，123，DEF，456，xyz和123(不包含'Y'，请注意XPATH区分大小写，包括xyz )}

_{Returns: ABC, 123, DEF, 456, xyz and 123 (not containing 'Y', notice XPATH is case sensitive, including xyz)}

_{5)元素(不是)开头或/和结尾:}

=FILTERXML(<XML>,"//s[starts-with(., 'XY')]")

_{返回:XY-1A，XY-3F和XY-4f(以'XY'开头)}

_{Returns: XY-1A, XY-3F and XY-4f (starting with 'XY')}

=FILTERXML(<XML>,"//s[not(starts-with(., 'XY'))]")

_{返回值:ABC，123，DEF，456，ZY-2F，xyz和123(不要以'XY'开头)}

_{Returns: ABC, 123, DEF, 456, ZY-2F, xyz and 123 (don't start with 'XY')}

=FILTERXML(<XML>,"//s[substring(., string-length(.) - string-length('F') +1) = 'F']")

_{返回值:DEF，ZY-2F和XY-3F(以'F'结尾，注意XPATH 1.0不支持ends-with)}

_{Returns: DEF, ZY-2F and XY-3F (end with 'F', notice XPATH 1.0 does not support ends-with)}

=FILTERXML(<XML>,"//s[not(substring(., string-length(.) - string-length('F') +1) = 'F')]")

_{返回:ABC，123，456，XY-1A，XY-4f，xyz和123(不要以'F'结尾)}

_{Returns: ABC, 123, 456, XY-1A, XY-4f, xyz and 123 (don't end with 'F')}

=FILTERXML(<XML>,"//s[starts-with(., 'X') and substring(., string-length(.) - string-length('A') +1) = 'A']")

_{返回:XY-1A(以"X"开头，以"A"结尾)}

_{Returns: XY-1A (start with 'X' and end with 'A')}

_{6)大写或小写的元素:}

=FILTERXML(<XML>,"//s[translate(.,'abcdefghijklmnopqrstuvwxyz','ABCDEFGHIJKLMNOPQRSTUVWXYZ')=.]")

_{返回:ABC，123，DEF，456，XY-1A，ZY-2F，XY-3F和123(大写节点)}

_{Returns: ABC, 123, DEF, 456, XY-1A, ZY-2F, XY-3F and 123 (uppercase nodes)}

=FILTERXML(<XML>,"//s[translate(.,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz')=.]")

_{返回值:123，456，xyz和123(小写节点)}

_{Returns: 123, 456, xyz and 123 (lowercase nodes)}

_{注意::很遗憾，XPATH 1.0不支持upper-case()或lower-case()，因此以上是一种解决方法.如有需要，请添加特殊字符.}

_{NOTE: Unfortunately XPATH 1.0 does not support upper-case() nor lower-case() so the above is a workaround. Add special characters if need be.}

_{7)不包含任何数字的元素:}

=FILTERXML(<XML>,"//s[translate(.,'1234567890','')!=.]")

_{返回值:123，456，XY-1A，ZY-2F，XY-3F，XY-4f和123(包含任何数字)}

_{Returns: 123, 456, XY-1A, ZY-2F, XY-3F, XY-4f and 123 (contain any digit)}

=FILTERXML(<XML>,"//s[translate(.,'1234567890','')=.]")

_{返回:ABC，DEF和xyz(不包含任何数字)}

_{Returns: ABC, DEF and xyz (don't contain any digit)}

=FILTERXML(<XML>,"//s[translate(.,'1234567890','')!=. and .*0!=0]")

_{返回:XY-1A，ZY-2F，XY-3F和XY-4f(保留数字而不是数字)}

_{Returns: XY-1A, ZY-2F, XY-3F and XY-4f (holding digits but not a a number on it's own)}

_{8)唯一元素或重复项:}

=FILTERXML(<XML>,"//s[preceding::*=.]")

_{返回:123(重复的节点)}

_{Returns: 123 (duplicate nodes)}

=FILTERXML(<XML>,"//s[not(preceding::*=.)]")

_{返回:ABC，123，DEF，456，XY-1A，ZY-2F，XY-3F，XY-4f和xyz(唯一节点)}

_{Returns: ABC, 123, DEF, 456, XY-1A, ZY-2F, XY-3F, XY-4f and xyz (unique nodes)}

=FILTERXML(<XML>,"//s[not(following::*=. or preceding::*=.)]")

_{返回:ABC，DEF，456，XY-1A，ZY-2F，XY-3F和XY-4f(没有相似兄弟的节点)}

_{Returns: ABC, DEF, 456, XY-1A, ZY-2F, XY-3F and XY-4f (nodes that have no similar sibling)}

_{9)一定长度的元素:}

=FILTERXML(<XML>,"//s[string-length()=5]")

_{返回值:XY-1A，ZY-2F，XY-3F和XY-4f(5个字符长)}

_{Returns: XY-1A, ZY-2F, XY-3F and XY-4f (5 characters long)}

=FILTERXML(<XML>,"//s[string-length()<4]")

_{返回值:ABC，123，DEF，456，xyz和123(少于4个字符)}

_{Returns: ABC, 123, DEF, 456, xyz and 123 (shorter than 4 characters)}

_{10)基于上/下元素:}

=FILTERXML(<XML>,"//s[preceding::*[1]='456']")

_{返回:XY-1A(先前的节点等于'456')}

_{Returns: XY-1A (previous node equals '456')}

=FILTERXML(<XML>,"//s[starts-with(preceding::*[1],'XY')]")

_{返回:ZY-2F，XY-4f和xyz(先前的节点以'XY'开头)}

_{Returns: ZY-2F, XY-4f, and xyz (previous node starts with 'XY')}

=FILTERXML(<XML>,"//s[following::*[1]='123']")

_{返回:ABC和xyz(以下节点等于'123')}

_{Returns: ABC, and xyz (following node equals '123')}

=FILTERXML(<XML>,"//s[contains(following::*[1],'1')]")

_{返回:ABC，456和xyz(以下节点包含"1")}

_{Returns: ABC, 456, and xyz (following node contains '1')}

=FILTERXML(<XML>,"//s[preceding::*='ABC' and following::*='XY-3F']")

_{返回值:123，DEF，456，XY-1A和ZY-2F(介于"ABC"和"XY-3f"之间的所有内容)}

_{Returns: 123, DEF, 456, XY-1A and ZY-2F (everything between 'ABC' and 'XY-3f')}

_{11)基于子字符串的元素:}

=FILTERXML(<XML>,"//s[substring-after(., '-') = '3F']")

_{返回:XY-3F(连字符后以'3F'结尾的节点)}

_{Returns: XY-3F (nodes ending with '3F' after hyphen)}

=FILTERXML(<XML>,"//s[contains(substring-after(., '-') , 'F')]")

_{返回:ZY-2F和XY-3F(在连字符后包含'F'的节点)}

_{Returns: ZY-2F and XY-3F (nodes containing 'F' after hyphen)}

=FILTERXML(<XML>,"//s[substring-before(., '-') = 'ZY']")

_{返回值:ZY-2F(节点以连字符前的'ZY'开头)}

_{Returns: ZY-2F (nodes starting with 'ZY' before hyphen)}

=FILTERXML(<XML>,"//s[contains(substring-before(., '-'), 'Y')]")

_{返回值:XY-1A，ZY-2F，XY-3F和XY-4f(在连字符前包含'Y'的节点)}

_{Returns: XY-1A, ZY-2F, XY-3F and XY-4f (nodes containing 'Y' before hyphen)}

_{12)基于串联的元素:}

=FILTERXML(<XML>,"//s[concat(., '|', following::*[1])='ZY-2F|XY-3F']")

_{返回值:ZY-2F(与'|'串联且后继兄弟节点等于'ZY-2F | XY-3F'的节点)}

_{Returns: ZY-2F (nodes when concatenated with '|' and following sibling equals 'ZY-2F|XY-3F')}

=FILTERXML(<XML>,"//s[contains(concat(., preceding::*[2]), 'FA')]")

_{返回值:DEF(节点与左侧同级两个索引并置时，节点包含"FA")}

_{Returns: DEF (nodes when concatenated with sibling two indices to the left contains 'FA')}

_{13)空与非空:}

=FILTERXML(<XML>,"//s[count(node())>0]")

或者:

=FILTERXML(<XML>,"//s[node()]")

_{返回:ABC，123，DEF，456，XY-1A，ZY-2F，XY-3F，XY-4f，xyz和123(所有节点不是空的)}

_{Returns: ABC, 123, DEF, 456, XY-1A, ZY-2F, XY-3F, XY-4f, xyz and 123 (all nodes that are not empty)}

=FILTERXML(<XML>,"//s[count(node())=0]")

或者:

=FILTERXML(<XML>,"//s[not(node())]")

_{返回:无(所有节点为空)}

现在，显然上述内容是对XPATH 1.0功能可能性的演示，您可以获取上述内容的更多组合以及更多内容！我试图介绍最常用的字符串函数.如果您有任何遗漏，请随时发表评论.

Now obviously the above is a demonstration of possibilities with XPATH 1.0 functions and you can get a whole range of combinations of the above and more! I tried to cover most commonly used string functions. If you are missing any please feel free to comment.

这个问题本身很广泛，我希望就如何使用FILTERXML进行手边的查询给出一些一般性的指导.该公式返回要以任何其他方式使用的节点数组.很多时候，我会在TEXTJOIN()或INDEX()中使用它.但是我想其他选择可能是可以溢出结果的新DA功能.

Whereas the question is quite broad on itself, I was hoping to give some general direction on how to use FILTERXML for the queries at hand. The formula returns an array of nodes to be used in any other way. A lot of the times I would use it in TEXTJOIN() or INDEX(). But I guess other options would be new DA-functions to spill results.

这篇关于Excel-使用FILTERXML从字符串中提取子字符串的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

Excel-使用FILTERXML从字符串中提取子字符串 [英] Excel - Extract substring(s) from string using FILTERXML

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

Excel-使用FILTERXML从字符串中提取子字符串 [英] Excel - Extract substring(s) from string using FILTERXML

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭