How to use tumbling window to group XML elements by content?

Views: 93 Posted: 2020/7/24 18:49:24 Tags: xpath xml-parsing xquery flwor shred

Problem description

How do I group based on whether there's a match to [0-9] for digits with a tumbling window?

Desired output:

...
<record>
    <name>joe</name>
    <data>phone1</data>
    <data>phone2</data>
</record>
...

Current output, not grouped:

<xml>
  <record>
    <person key="$s" data="name">phone1</person>
  </record>
  <record>
    <person key="$s" data="name">phone2</person>
  </record>
  <record>
    <person key="$s" data="name">phone3sue</person>
  </record>
  <record>
    <person key="$s" data="name">cell4</person>
  </record>
  <record>
    <person key="$s" data="name">home5alice</person>
  </record>
  <record>
    <person key="$s" data="name">atrib6</person>
  </record>
  <record>
    <person key="$s" data="name">x7</person>
  </record>
  <record>
    <person key="$s" data="name">y9</person>
  </record>
  <record>
    <person key="$s" data="name">z10</person>
  </record>
</xml>

Input:

<text>
  <line>people</line>
  <line>joe</line>
  <line>phone1</line>
  <line>phone2</line>
  <line>phone3</line>
  <line>sue</line>
  <line>cell4</line>
  <line>home5</line>
  <line>alice</line>
  <line>atrib6</line>
  <line>x7</line>
  <line>y9</line>
  <line>z10</line>
</text>

The notion is that each "person" will have a name (no digits) and perhaps additional data. So looking to read in each line and then group based on where the names are found.

Code:

xquery version "3.0";

<xml>
{
for tumbling window $line in db:open("foo.txt")//text()
start $s when matches($s, '[0-9]')
return   
<record>

       <person key='$s' data="name">{$line}</person>

 </record>
}
 </xml>

Looking at the output, "phone3sue" is obviously doing some matching and grouping, although not exactly as desired because "phone3" should be in its own element, nested within "joe" rather than "sue". But, still, there's some matching happening there.

From the Saxon mailing list:

On Wed, Feb 19, 2020 at 10:31:37AM -0800, thufir scripsit:

I'll re-read the section on windowing; my impression was that it was more for display or report purposes.

Windowing is how you take chunks out of a stream of data.

What you've got is effectively a stream of line elements; you can identify the "name" lines, but you don't know how far apart they are or how much data is between any particular pair of names.

Windows lets you say "I want the chunk of this stream that starts with a name line and continues up to (but not including) the next name line".
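As a rough sketch of that idea (not taken from the mailing list post), the window can start on every line that contains no digit and, with no end clause, run up to but not including the next such line. The inline `$text` variable stands in for the database lookup in the question, and skipping the first line assumes "people" is only a header; both are assumptions made for illustration.

```xquery
xquery version "3.0";

(: Inline copy of the input from the question; an assumption standing in
   for the db:open(...) call so the query is self-contained. :)
let $text :=
  <text>
    <line>people</line>
    <line>joe</line>
    <line>phone1</line>
    <line>phone2</line>
    <line>phone3</line>
    <line>sue</line>
    <line>cell4</line>
    <line>home5</line>
    <line>alice</line>
    <line>atrib6</line>
    <line>x7</line>
    <line>y9</line>
    <line>z10</line>
  </text>
return
  <xml>
  {
    (: Skip the first line, assumed to be a header rather than a name. :)
    for tumbling window $w in $text/line[position() gt 1]
      (: A name line is one with no digits; it opens a new window.
         With no end clause, the window closes just before the next start. :)
      start $s when not(matches($s, '[0-9]'))
    return
      <record>
        <name>{ string($s) }</name>
        {
          (: Everything after the name line in this window is data. :)
          for $d in tail($w)
          return <data>{ string($d) }</data>
        }
      </record>
  }
  </xml>
```

Note that the start condition is the negation of the one in the question: the question's query opens a window on every line that does contain a digit, which is why almost every line ends up in its own record.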

Would you elaborate on what you mean by two steps, a bit more concretely?

You're trying to take some input XML and turn it into different output XML.

If this is pure transformation -- change all of the elements named FOO to elements named BAZ -- XQuery's not the best tool choice. Use XSLT if you can. They're computationally the same but the languages have different biases and XSLT does transforms more naturally.

If the output XML is a representation of an abstraction of your input -- morally some sort of report -- it helps a lot to have the abstraction, and then present it.

So in your case, what you have is a stream containing an implicit association between names and data. (It's a stream of lines; the only way you know these data lines go with that name line is position. So implicit.) If you turn that into an explicit mapping between names and data -- such as by creating a map variable where the keys are the contents of the name line (with spaces handled somehow) and the entries for each key are the data lines associated with that name -- you have done the abstraction part.

You can then take that map and produce the XML output you want from it, which is much simpler than trying to combine the "create new XML" and "do the abstraction steps". The last thing I posted has an example of turning a map into elements, but as a pattern it's just

map:keys($map) ! {.}{$map(.)}

(it gets more complicated if you've got nodes or a sequence in the entry, but not much more.)
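A minimal sketch of the two-step approach described above, assuming XQuery 3.1 maps are available. This is an illustration only, not the example from the earlier post: the `$lines` sequence is a hand-copied stand-in for the input with the "people" header already dropped, and the `record`, `name` and `data` element names are borrowed from the desired output in the question.

```xquery
xquery version "3.1";

(: Assumed input: the line contents from the question as plain strings,
   without the leading "people" header line. :)
let $lines := ("joe", "phone1", "phone2", "phone3",
               "sue", "cell4", "home5",
               "alice", "atrib6", "x7", "y9", "z10")

(: Step 1, the abstraction: build a map from each name to its data lines.
   A line with no digits is treated as a name and opens a new window. :)
let $map := map:merge(
  for tumbling window $w in $lines
    start $s when not(matches($s, '[0-9]'))
  return map:entry($s, tail($w))
)

(: Step 2, the presentation: turn the map into XML. :)
return
  <xml>
  {
    for $name in map:keys($map)
    return
      <record>
        <name>{ $name }</name>
        { $map($name) ! <data>{ . }</data> }
      </record>
  }
  </xml>
```

The order in which `map:keys` returns keys is implementation-dependent, so the records may not come out in input order; add an `order by` or keep a separate sequence of names if order matters. Using the name as the key also assumes names are unique.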

That make something a little closer to sense?

-Graydon

_______________________________________________ saxon-help mailing list archived at http://saxon.markmail.org/ saxon-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/saxon-help
