Where should I be limiting my results?

Question

I have created an XML file containing a list of several thousand search terms that I need to run against a document. As a test, I then wrote this query using a sample set of those search terms, to run against a test document that contains some samples from the actual document:

let $keywords := ("best clients", "Very", "20")
for $keyword in $keywords
let $matches := doc('test')/set/entry[matches(comment, $keyword, 'i')]
return (<re>
{subsequence($matches/comment, 1, 1),
subsequence($matches/buyer, 1, 1)}</re>,
<re>
{subsequence($matches/comment, 2, 1),
subsequence($matches/buyer, 2, 1)}
</re>
)

I'm trying to get back a continuous sequence of pairs, <re><comment /><buyer /></re><re><comment /><buyer /></re>..., but I am getting them back in a rough order.

This is a chunk of the document being parsed (I've removed the buyer names and some nesting to make it easier to read):

<set>
<entry>
<comment>The client is only 20 years old.  Do not be surprised by his youth.</comment>
<buyer></buyer>
<id>1282</id>
<industry>International Trade; Fish and Game</industry>
</entry>
<entry>
<comment>!On leave in October.</comment>
<buyer></buyer>
<id>709</id>
<industry>Real Estate</industry>
</entry>
<entry>
<comment>Is often !out between 1 and 3 p.m.</comment>
<buyer></buyer>
<id>127</id>
<industry>Virus Software Marketting</industry>
</entry>
<entry>
<comment>Very personable.  One of our best clients.</comment>
<buyer></buyer>
<id>14851</id>
<industry>Administrative support.</industry>
</entry>
<entry>
<comment>!Very difficult to reach, but one of our top buyers.</comment>
<buyer></buyer>
<id>1458</id>
<industry>Construction</industry>
</entry>
<entry>
<comment></comment>
<buyer></buyer>
<id>276470</id>
<industry>Bulk Furniture Sales</industry>
</entry>
<entry>
<comment>A bit of an eccentric.  One of our best clients.</comment>
<buyer></buyer>
<id>1506</id>
<industry>Sports Analysis</industry>
</entry>
<entry>
<comment>Very gullible, so please !be sure she needs what you sell her.  She's one of our best clients.</comment>
<buyer></buyer>
<id>1523</id>
<industry>International Trade</industry>
</entry>
<entry>
<comment>He wants to buy everything, but !he has a tight budget.</comment>
<buyer></buyer>
<id>1524</id>
<industry>Public Relations</industry>
</entry>
</set>

Some of the keywords I'm using: "Best client*", "Trade", "20", ...

The output is a long list of entries with comment and buyer children as siblings under the entry element. I'd like to limit the number of entries returned to 2 per keyword. I'm also trying to get comments that begin with an exclamation point (!) to be the priority.

Current output (close):

<re><comment>Very personable.  One of our best clients.</comment>
  <buyer/>
</re><re><comment>A bit of an eccentric.  One of our best clients.</comment>
  <buyer/>
</re><re><comment>Very personable.  One of our best clients.</comment>
  <buyer/>
</re><re><comment>!Very difficult to reach, but one of our top buyers.</comment>
  <buyer/>
</re><re><comment>The client is only 20 years old.  Do not be surprised by his youth.</comment>
  <buyer/>
</re><re/>

Current output format:

<entry>
<comment>keyworda</comment>
<buyer></buyer>
</entry>
<entry>
<comment>keyworda</comment>
<buyer></buyer>
</entry>
<entry>
<comment>keywordb</comment>
<buyer></buyer>
</entry>
<entry>
<comment>!keywordb</comment> //Not prioritized.
<buyer></buyer>
</entry>
<entry>
<comment>keywordc</comment>
<buyer></buyer>
</entry>

Desired output:

<entry>
<comment>!keyworda</comment>
<buyer></buyer>
</entry>
<entry>
<comment>keyworda</comment>
<buyer></buyer>
</entry>
<entry>
<comment>!keywordb</comment>
<buyer></buyer>
</entry>
<entry>
<comment>!keywordb</comment>
<buyer></buyer>
</entry>

Basically: prioritize entries that contain an exclamation point, and limit the results to 2 per keyword.
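A minimal sketch of where the per-keyword limit can go in the original query, assuming an XQuery 1.0 or later processor. The sample document is inlined as a hypothetical variable $sample (a subset of the entries above) so the query runs without doc('test'), and local:first-matches is a hypothetical helper name. The idea is to apply subsequence(..., 1, 2) to each keyword's matching entries, then iterate over those entries rather than over separate comment and buyer sequences, so every <re> pairs a comment with the buyer from the same entry and a keyword with only one match does not produce an empty <re/>. This sketch does not yet handle the "!" priority.

```xquery
(: Hypothetical stand-in for doc('test'): a subset of the sample entries above. :)
declare variable $sample :=
  <set>
    <entry><comment>The client is only 20 years old.  Do not be surprised by his youth.</comment><buyer/><id>1282</id></entry>
    <entry><comment>Very personable.  One of our best clients.</comment><buyer/><id>14851</id></entry>
    <entry><comment>!Very difficult to reach, but one of our top buyers.</comment><buyer/><id>1458</id></entry>
    <entry><comment>A bit of an eccentric.  One of our best clients.</comment><buyer/><id>1506</id></entry>
    <entry><comment>Very gullible, so please !be sure she needs what you sell her.  She's one of our best clients.</comment><buyer/><id>1523</id></entry>
  </set>;

(: Hypothetical helper: at most $limit <re> pairs per keyword, in keyword order. :)
declare function local:first-matches($keywords as xs:string*, $limit as xs:integer) as element(re)* {
  for $keyword in $keywords
  (: The limit goes here, inside the per-keyword loop, applied to whole entries. :)
  let $matches := subsequence($sample/entry[matches(comment, $keyword, 'i')], 1, $limit)
  (: One <re> per entry keeps each comment together with its own buyer;
     a keyword with fewer matches simply yields fewer <re> elements. :)
  for $entry in $matches
  return <re>{ $entry/comment, $entry/buyer }</re>
};

local:first-matches(("best clients", "Very", "20"), 2)
```

With this sample data the query returns five <re> elements: two for "best clients", two for "Very" and one for "20", with no empty <re/>.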

Recommended answer

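(: Assumes the context item is the document being searched, e.g. doc('test'). :)
let $results :=
(
  let $pKeywords := ('best clients', 'Very', '20')
  return
    for $kw in $pKeywords
    return
    (: Entries with "!" directly before the keyword come first; the trailing
       predicate then keeps at most two entries per keyword. :)
    (
      /*/entry[contains(comment, concat('!', $kw))],
      /*/entry[contains(comment, $kw)]
    )
  [not(position() gt 2)]
)
for $i in (1 to count($results))
return
(
  (: The "/" operator returns nodes in document order, without duplicates. :)
  subsequence($results/comment, $i, 1),
  subsequence($results/buyer, $i, 1)
)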

This returns the correct result:

<comment>The client is only 20 years old.  Do not be surprised by his youth.</comment>
<buyer/>
<comment>Very personable.  One of our best clients.</comment>
<buyer/>
<comment>!Very difficult to reach, but one of our top buyers.</comment>
<buyer/>
<comment>A bit of an eccentric.  One of our best clients.</comment>
<buyer/>
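Note that this output is in document order and lists "Very personable" only once, rather than grouping two entries per keyword: $results/comment uses the "/" operator, which returns nodes in document order with duplicates removed. If the per-keyword grouping and the <re> wrappers from the question are wanted, one possible sketch iterates over the entries directly. Assumptions: $sample is a hypothetical stand-in for doc('test') (a subset of the question's entries), local:top-entries is a hypothetical helper name, and matching uses contains() (case-sensitive) as in the answer. The except operator removes the "!" entries from the plain matches so the same entry is not returned twice for one keyword.

```xquery
(: Hypothetical stand-in for doc('test'): a subset of the sample entries above. :)
declare variable $sample :=
  <set>
    <entry><comment>The client is only 20 years old.  Do not be surprised by his youth.</comment><buyer/><id>1282</id></entry>
    <entry><comment>Very personable.  One of our best clients.</comment><buyer/><id>14851</id></entry>
    <entry><comment>!Very difficult to reach, but one of our top buyers.</comment><buyer/><id>1458</id></entry>
    <entry><comment>A bit of an eccentric.  One of our best clients.</comment><buyer/><id>1506</id></entry>
    <entry><comment>Very gullible, so please !be sure she needs what you sell her.  She's one of our best clients.</comment><buyer/><id>1523</id></entry>
  </set>;

(: Hypothetical helper: up to $limit entries per keyword, in keyword order,
   with entries whose comment contains "!" + keyword placed first. :)
declare function local:top-entries(
  $set as element(set),
  $keywords as xs:string*,
  $limit as xs:integer
) as element(re)* {
  for $kw in $keywords
  let $flagged := $set/entry[contains(comment, concat('!', $kw))]
  (: Drop the flagged entries here so no entry appears twice for one keyword. :)
  let $plain := $set/entry[contains(comment, $kw)] except $flagged
  (: The comma operator keeps this order; it does not re-sort into document order. :)
  for $entry in subsequence(($flagged, $plain), 1, $limit)
  return <re>{ $entry/comment, $entry/buyer }</re>
};

local:top-entries($sample, ("best clients", "Very", "20"), 2)
```

With this sample data it returns five <re> elements: "Very personable..." and "A bit of an eccentric..." for "best clients", then "!Very difficult..." followed by "Very personable..." for "Very", then the "20 years old" entry for "20".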
