How to use tumbling window to group XML elements by content?

Problem description

How do I use a tumbling window to group lines based on whether each line contains a digit, i.e. matches [0-9]?

Desired output:

...
<record>
    <name>joe</name>
    <data>phone1</data>
    <data>phone2</data>
</record>
...

Current output, not grouped:

<xml>
  <record>
    <person key="$s" data="name">phone1</person>
  </record>
  <record>
    <person key="$s" data="name">phone2</person>
  </record>
  <record>
    <person key="$s" data="name">phone3sue</person>
  </record>
  <record>
    <person key="$s" data="name">cell4</person>
  </record>
  <record>
    <person key="$s" data="name">home5alice</person>
  </record>
  <record>
    <person key="$s" data="name">atrib6</person>
  </record>
  <record>
    <person key="$s" data="name">x7</person>
  </record>
  <record>
    <person key="$s" data="name">y9</person>
  </record>
  <record>
    <person key="$s" data="name">z10</person>
  </record>
</xml>

Input:

<text>
  <line>people</line>
  <line>joe</line>
  <line>phone1</line>
  <line>phone2</line>
  <line>phone3</line>
  <line>sue</line>
  <line>cell4</line>
  <line>home5</line>
  <line>alice</line>
  <line>atrib6</line>
  <line>x7</line>
  <line>y9</line>
  <line>z10</line>
</text>

The notion is that each "person" will have a name (no digits) and perhaps additional data. So looking to read in each line and then group based on where the names are found.

Code:

xquery version "3.0";

<xml>
{
for tumbling window $line in db:open("foo.txt")//text()
start $s when matches($s, '[0-9]')
return   
<record>

       <person key='$s' data="name">{$line}</person>

 </record>
}
 </xml>

Looking at the output, "phone3sue" is obviously doing some matching and grouping, although not exactly as desired because "phone3" should be in its own element, nested within "joe" rather than "sue". But, still, there's some matching happening there.

From the Saxon mailing list:

On Wed, Feb 19, 2020 at 10:31:37AM -0800, thufir scripsit:

I'll re-read the section on windowing; my impression was that it was more for display or report purposes.

Windowing is how you take chunks out of a stream of data.

What you've got is effectively a stream of line elements; you can identify the "name" lines, but you don't know how far apart they are, that is, how much data lies between any particular pair of names.

Windows let you say "I want the chunk of this stream that starts with a name line and continues up to (but not including) the next name line".
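
The chunking described above can be sketched as follows. This is a minimal illustration, not the poster's code: it assumes that a "name line" is any line without an ASCII digit, inlines a shortened copy of the sample input instead of reading it from a database, and borrows the record/name/data element names from the desired output in the question.

```xquery
xquery version "3.0";

(: Shortened inline copy of the sample input, so the sketch is self-contained. :)
let $text :=
  <text>
    <line>people</line>
    <line>joe</line>
    <line>phone1</line>
    <line>phone2</line>
    <line>phone3</line>
    <line>sue</line>
    <line>cell4</line>
    <line>home5</line>
  </text>
return
  <xml>
  {
    (: Assumption: a window starts at every line that contains no digit.
       With no end condition, it runs up to, but not including,
       the next line that satisfies the start condition. :)
    for tumbling window $chunk in $text/line
      start $name when not(matches($name, '[0-9]'))
    return
      <record>
        <name>{ string($name) }</name>
        { tail($chunk) ! <data>{ string(.) }</data> }
      </record>
  }
  </xml>
```

Under this assumption the header line people also counts as a name line, so it forms a window of its own and yields a record without any data elements. Requiring the following line to contain a digit, via a `next` variable in the start condition, is one way to skip it.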

Would you elaborate on what you mean by two steps, a bit more concretely?

You're trying to take some input XML and turn it into different output XML.

If this is pure transformation -- change all of the elements named FOO to elements named BAZ -- XQuery's not the best tool choice. Use XSLT if you can. They're computationally the same but the languages have different biases and XSLT does transforms more naturally.

If the output XML is a representation of an abstraction of your input -- morally some sort of report -- it helps a lot to have the abstraction, and then present it.

So in your case, what you have is a stream containing an implicit association between names and data. (It's a stream of lines; the only way you know these data lines go with that name line is position. So implicit.) If you turn that into an explicit mapping between names and data -- such as by creating a map variable where the keys are the contents of the name line (with spaces handled somehow) and the entries for each key are the data lines associated with that name -- you have done the abstraction part.

You can then take that map and produce the XML output you want from it, which is much simpler than trying to combine the "create new XML" and "do the abstraction steps". The last thing I posted has an example of turning a map into elements, but as a pattern it's just

map:keys($map) ! {.}{$map(.)}
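
The element constructors in the pattern above appear to have been lost from this copy of the message, so the original element names are unknown. The two steps might look roughly like the sketch below, which assumes an XQuery 3.1 processor (for maps), uses a hypothetical variable $people, inlines a shortened copy of the sample input, and picks the record/name/data element names from the desired output in the question.

```xquery
xquery version "3.1";

(: The map prefix is predeclared in XQuery 3.1. :)

(: Shortened inline copy of the sample input, so the sketch is self-contained. :)
let $text :=
  <text>
    <line>people</line>
    <line>joe</line>
    <line>phone1</line>
    <line>phone2</line>
    <line>sue</line>
    <line>cell4</line>
  </text>

(: Step 1, the abstraction: a map from each name to its data lines.
   Assumption: a name line has no digit and is followed by a line with one.
   map:merge keeps the first entry by default if a name occurs twice. :)
let $people := map:merge(
  for tumbling window $chunk in $text/line
    start $name next $data
      when not(matches($name, '[0-9]')) and matches($data, '[0-9]')
  return map:entry(string($name), tail($chunk) ! string(.))
)

(: Step 2, the presentation: turn the map into elements.
   Map key order is implementation-dependent, so sort explicitly. :)
return
  <xml>
  {
    for $key in map:keys($people)
    order by $key
    return
      <record>
        <name>{ $key }</name>
        { $people($key) ! <data>{ . }</data> }
      </record>
  }
  </xml>
```

Keeping the map as an intermediate value means the grouping logic and the output markup can be changed independently, which is the separation the message argues for.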

(it gets more complicated if you've got nodes or a sequence in the entry, but not much more.)

That make something a little closer to sense?

-Graydon

_______________________________________________ saxon-help mailing list archived at http://saxon.markmail.org/ saxon-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/saxon-help

Recommended answer

The following tries to use a tumbling window which starts with any line not containing any ASCII digit (the name of the person) followed by any line containing at least one ASCII digit (i.e. the data lines):

declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";

declare option output:method 'xml';
declare option output:indent 'yes';

<xml>
{
    for tumbling window $person in text/line
    start $name next $data when matches($name, '^[^0-9]+$') and matches($data, '[0-9]')
    return
        <person>
        {
            <name>{ data($name) }</name>,
            tail($person) ! <data>{data()}</data>

        }
        </person>
}    
</xml>

https://xqueryfiddle.liberty-development.net/gWmuPs1

This produces the following output:

<?xml version="1.0" encoding="UTF-8"?>
<xml>
   <person>
      <name>joe</name>
      <data>phone1</data>
      <data>phone2</data>
      <data>phone3</data>
   </person>
   <person>
      <name>sue</name>
      <data>cell4</data>
      <data>home5</data>
   </person>
   <person>
      <name>alice</name>
      <data>atrib6</data>
      <data>x7</data>
      <data>y9</data>
      <data>z10</data>
   </person>
</xml>
