MS-Excel compatible CSV file representing all documents in a MarkLogic directory

Problem description

How do I best make an MS-Excel compatible CSV file representing all documents in a MarkLogic directory, using the XCC Java client and Tomcat? Tomcat and MarkLogic are both remotely located. The directory contains around 15,000 documents.

Recommended answer

The first part, getting all the documents in a directory, is already covered in my answer to "avoiding XDMP-EXPNTREECACHEFULL and loading document":

cts:search(
  collection(),
  cts:directory-query('path/to/documents/', 'infinity'))

As noted in my answer there, if you need further restrictions you can use cts:and-query to combine that cts:directory-query with other cts:query terms.

Next you need to turn each XML document into CSV. That's fairly simple, but you have to know how your XML is structured or have some way to infer it. For this example I will assume that there are always simple child elements a, b, c, d under some root element. So the query needs to produce a CSV header for those elements, followed by one line of CSV per document.

We probably also want to pass in the directory URI from the caller. If you were using REST this would use xdmp:get-request-field, but for XCC it is an external variable.

declare variable $DIRECTORY-URI as xs:string external ;

declare function local:csv($root as element()) as xs:string
{
  string-join(($root/a, $root/b, $root/c, $root/d), ',')
};

'A,B,C,D',
cts:search(
  collection(),
  cts:directory-query($DIRECTORY-URI, 'infinity'))/local:csv(*)

Again, making local:csv work for your application requires some knowledge of the XML or some way to infer its structure. You might need to put some values in double quotes, too, for example when a value itself contains a comma. But this basic structure is one of the most efficient ways to attack the problem. I've avoided any XQuery FLWOR expressions, so that the results can stream.
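Since the caller is a Java client, the double-quoting mentioned above could also be done on the Java side once the raw field values are available. The following is a minimal sketch, assuming RFC 4180 style quoting is what Excel expects for the data in question; CsvQuote, quote, and row are hypothetical names and are not part of XCC or MarkLogic.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper (not part of XCC or MarkLogic) for RFC 4180 style quoting. */
final class CsvQuote {
    private CsvQuote() {}

    /** Quotes a field only when it contains a comma, a double quote, CR, or LF. */
    static String quote(String field) {
        if (field == null) {
            return ""; // assumption: a missing value becomes an empty field
        }
        boolean needsQuotes = field.indexOf(',') >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field;
        }
        // Embedded double quotes are escaped by doubling them.
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Joins already-extracted field values into one CSV line (no line terminator). */
    static String row(List<String> fields) {
        return fields.stream().map(CsvQuote::quote).collect(Collectors.joining(","));
    }
}
```

Quoting only when needed keeps simple values such as numbers unquoted, while fields containing commas or quotes still land in a single column.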

Another approach would be to use range indexes and cts:value-tuples (http://docs.marklogic.com/cts:value-tuples) with a cts:query to restrict the results, then convert each resulting json:array tuple to CSV. This would be even more efficient because no fragments would be fetched. But it won't work well with some XML structures, and you may not have the luxury of creating a range index for every CSV field.

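declare variable $DIRECTORY-URI as xs:string external ;

(: Turn one tuple (a json:array of index values) into one CSV line. :)
declare function local:csv($ja as json:array) as xs:string
{
  string-join(
    for $v in json:array-values($ja) return string($v),
    ',')
};

'A,B,C,D',
(: cts:value-tuples returns one json:array per tuple, so call local:csv once per tuple. :)
for $tuple in cts:value-tuples(
  (cts:element-reference(xs:QName('a')),
   cts:element-reference(xs:QName('b')),
   cts:element-reference(xs:QName('c')),
   cts:element-reference(xs:QName('d'))),
  (),
  cts:directory-query($DIRECTORY-URI, 'infinity'))
return local:csv($tuple)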
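
Whichever query is used, the Java side ends up with a sequence of CSV lines (the header first) that has to be written out for Excel. The sketch below shows one way this might be done, assuming the lines have already been read from the query result as strings; ExcelCsvWriter is a hypothetical name, and the UTF-8 byte order mark and CRLF line endings are assumptions about what Excel handles best, not something the answer above prescribes. In a Tomcat servlet, the OutputStream would typically be the response's output stream.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: writes CSV lines in a form Excel usually opens cleanly. */
final class ExcelCsvWriter {
    // Assumption: a UTF-8 byte order mark helps Excel detect the encoding.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    private ExcelCsvWriter() {}

    /** Writes the BOM, then each line followed by CRLF, one line at a time. */
    static void write(Iterable<String> lines, OutputStream out) throws IOException {
        out.write(UTF8_BOM);
        for (String line : lines) {
            out.write(line.getBytes(StandardCharsets.UTF_8));
            out.write('\r');
            out.write('\n');
        }
        out.flush();
    }
}
```

Because the lines are consumed one at a time from an Iterable, nothing here requires holding all 15,000 rows in memory at once.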
