MS-Excel compatible CSV file representing all documents in a MarkLogic directory
Question
How do I best make an MS-Excel compatible CSV file representing all documents in a MarkLogic directory, using the XCC Java client, with Tomcat and MarkLogic both located remotely? The directory contains around 15,000 documents.
Recommended answer
The first part, getting all the documents in a directory, is already covered by my answer to "avoiding XDMP-EXPNTREECACHEFULL and loading document":
cts:search(
  collection(),
  cts:directory-query('path/to/documents/', 'infinity'))
As noted in my answer there, if you need further restrictions you could use cts:and-query to combine that cts:directory-query with other cts:query terms.
Next you need to turn each XML document into CSV. That's fairly simple, but you have to know how your XML is structured or have some way to infer it. For this example I will say that I always have simple child elements a, b, c, d under some root element. So the query needs to produce a CSV header for those elements, followed by lines of CSV.
We probably also want to pass in the directory URI from the caller. If you were using REST this would use xdmp:get-request-field, but for XCC it is an external variable.
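(: Directory URI supplied by the XCC caller as an external variable. :)
declare variable $DIRECTORY-URI as xs:string external;

(: Builds one CSV line from the a, b, c, d children of a root element. :)
declare function local:csv($root as element()) as xs:string
{
  string-join(($root/a, $root/b, $root/c, $root/d), ',')
};

(: Header first, then one line per document in the directory. :)
'A,B,C,D',
cts:search(
  collection(),
  cts:directory-query($DIRECTORY-URI, 'infinity'))/local:csv(*)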
Again, making local:csv work for your application requires some knowledge of the XML or some way to infer its structure. You might need to put some values in double quotes, too. But this basic structure is one of the most efficient ways to attack the problem. I've avoided any XQuery FLWOR expressions so that the results can stream.
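On the double-quoting point, here is a minimal sketch of the usual CSV quoting rule, written in Java on the assumption that the quoting happens on the client side and that the field values are available individually (which the query above does not do as written; the same rule could instead be applied inside local:csv). The class and method names are hypothetical.

```java
// Hypothetical helper; the class and method names are illustrative only.
final class CsvField {
    private CsvField() {}

    /** Quotes a field only if it contains a comma, a double quote, or a line break. */
    static String quote(String value) {
        if (value == null) {
            return "";
        }
        boolean needsQuoting = value.indexOf(',') >= 0
                || value.indexOf('"') >= 0
                || value.indexOf('\n') >= 0
                || value.indexOf('\r') >= 0;
        if (!needsQuoting) {
            return value;
        }
        // Embedded double quotes are escaped by doubling them.
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    /** Joins individual field values into one CSV line, quoting where needed. */
    static String line(String... fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(quote(fields[i]));
        }
        return sb.toString();
    }
}
```

Quoting only when necessary keeps simple values readable, while values containing commas or quotes still land in a single spreadsheet cell.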
Another approach would be to use range indexes and cts:value-tuples (http://docs.marklogic.com/cts:value-tuples) with a cts:query to restrict the results, then convert the JSON to CSV. This would be even more efficient because no fragments would be fetched. But this won't work well with some XML structures, and you may not have the luxury of creating a range index for every CSV field.
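(: Directory URI supplied by the XCC caller as an external variable. :)
declare variable $DIRECTORY-URI as xs:string external;

(: Builds one CSV line from a single tuple of range-index values. :)
declare function local:csv($ja as json:array) as xs:string
{
  string-join(for $v in json:array-values($ja) return string($v), ',')
};

'A,B,C,D',
(: cts:value-tuples returns one json:array per tuple, so convert each in turn. :)
for $tuple in cts:value-tuples(
  (cts:element-reference(xs:QName('a')),
   cts:element-reference(xs:QName('b')),
   cts:element-reference(xs:QName('c')),
   cts:element-reference(xs:QName('d'))),
  (),
  cts:directory-query($DIRECTORY-URI, 'infinity'))
return local:csv($tuple)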
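Either query returns the header followed by one string per CSV line. As a rough sketch of how the Java side might write those lines out for Excel, the helper below takes the lines as a plain Iterable (standing in for the items of the XCC result sequence) and writes them with CRLF line endings. The class name, the method name, and the optional byte order mark are assumptions of this sketch, not part of the original answer; the BOM only helps if the Writer encodes as UTF-8.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical helper; the names are illustrative only.
final class ExcelCsvWriter {
    private ExcelCsvWriter() {}

    /**
     * Writes each line followed by CRLF. When withBom is true, a byte order
     * mark character is written first (assumption: the Writer uses UTF-8).
     */
    static void write(Iterable<String> lines, Writer out, boolean withBom) {
        try {
            if (withBom) {
                out.write('\uFEFF');
            }
            for (String line : lines) {
                out.write(line);
                out.write("\r\n");
            }
            out.flush();
        } catch (IOException e) {
            // Rethrown unchecked so callers do not need a throws clause.
            throw new UncheckedIOException(e);
        }
    }
}
```

Writing line by line, rather than building the whole file in memory first, lets the lines be written as they are read from the result.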