Jena TDB: how to see how many triples are stored during TDB creation

Question

Hi, is it possible to see the number of triples being stored while a TDB dataset is created through the Java API? I run the TDB factory on a RAR archive of Turtle data, but while the files are being created in my directory I can't see how many triples have been stored. How can I solve this problem?

Recommended answer

You can access the bulk loader from Java code, and have it report the triples as they are introduced, as follows:

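final Dataset tdbDataset = TDBFactory.createDataset( /*location*/ );
try( final InputStream in = /*get input stream for your large file*/) {
    // The last argument is the loader's showProgress flag; with it set to true,
    // the loader logs its progress (including triple counts) while it runs.
    TDBLoader.load( ((DatasetGraphTransaction)tdbDataset.asDatasetGraph()).getBaseDatasetGraph() , in, true);
}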

If you have multiple files in your archive (for simplicity, I'll use a zip rather than a rar), then, as per an answer to this question, you can get better performance by concatenating the files into a single file before passing them to the bulk loader. The improvement comes from delaying index creation until all triples have been introduced. I'm sure other formats are supported, but I have only tested N-TRIPLES.
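As a minimal sketch of the concatenation step on its own, the JDK's `SequenceInputStream` can present several input streams as one without a second thread. The class name `ConcatStreams` and its helper methods are hypothetical, and the in-memory streams merely stand in for archive entries. The sketch assumes a line-oriented format such as N-TRIPLES in which every file ends with a newline.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class, for illustration only.
public class ConcatStreams {

    /** Presents several streams as one; each part is read to its end, in order. */
    public static InputStream concat(List<? extends InputStream> parts) {
        return new SequenceInputStream(Collections.enumeration(parts));
    }

    /** Drains a stream into a UTF-8 string (used only by the demo below). */
    public static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Two in-memory "files", each ending with a newline.
        InputStream first = new ByteArrayInputStream(
                "<http://example.org/a> <http://example.org/p> \"1\" .\n"
                        .getBytes(StandardCharsets.UTF_8));
        InputStream second = new ByteArrayInputStream(
                "<http://example.org/b> <http://example.org/p> \"2\" .\n"
                        .getBytes(StandardCharsets.UTF_8));

        try (InputStream all = concat(Arrays.asList(first, second))) {
            System.out.print(readAll(all));
        }
    }
}
```

The combined stream could then be handed to the bulk loader in place of a single file's stream. Note that this sketch takes streams that are already open; for an archive with many entries, an `Enumeration` that opens each entry only when it is requested would avoid holding them all open at once.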

The following example uses IOUtils from commons-io to copy the streams:

final Dataset tdbDataset = TDBFactory.createDataset( /*location*/ );
final PipedOutputStream concatOut = new PipedOutputStream();
final PipedInputStream concatIn = new PipedInputStream(concatOut);

final ExecutorService workers = Executors.newFixedThreadPool(2);
final Future<Long> submitter = workers.submit(new Callable<Long>(){
    @Override
    public Long call() throws Exception {
        long filesLoaded = 0;
        try( final ZipFile zipFile = new ZipFile( /* Archive Location */ ) ) {
            final Enumeration< ? extends ZipEntry> zipEntries = zipFile.entries();
            while( zipEntries.hasMoreElements() ) {
                final ZipEntry entry = zipEntries.nextElement();
                try( final InputStream singleIn = zipFile.getInputStream(entry) ) {
                    // If your file is in a supported format already
                    IOUtils.copy(singleIn, concatOut);
                    /* Otherwise, convert it first ("lang" stands for the source syntax):
                    final Model m = ModelFactory.createDefaultModel();
                    m.read(singleIn, null, "lang");
                    m.write(concatOut, "N-TRIPLES");*/
                }
                filesLoaded++;
            }
        }
        concatOut.close();
        return filesLoaded;
    }});

final Future<Void> committer = workers.submit(new Callable<Void>(){
    @Override
    public Void call() throws Exception {
        // Reads the concatenated stream on a second thread; 'true' enables progress output
        TDBLoader.load( ((DatasetGraphTransaction)tdbDataset.asDatasetGraph()).getBaseDatasetGraph() , concatIn, true);
        return null;
    }});

workers.shutdown();
System.out.println("submitted "+submitter.get()+" input files for processing");
committer.get();
System.out.println("completed processing");
workers.awaitTermination(1, TimeUnit.SECONDS); // NOTE this wait is redundant
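
If you want a number in your own code rather than in the loader's progress output, one option is to count statement lines as the stream passes through. The following is only a sketch under stated assumptions: the class `StatementLineCounter` is hypothetical (it is not part of Jena), it relies on N-TRIPLES putting one triple per line, and it skips blank lines and `#` comment lines. It counts lines read from the input, which can exceed the number of distinct triples stored if the data contains duplicates.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical wrapper, for illustration only: counts N-TRIPLES statement lines
// in whatever passes through read(). Bytes consumed via skip() are not counted.
public class StatementLineCounter extends FilterInputStream {
    private static final int START = 0;    // at line start, only whitespace seen so far
    private static final int COMMENT = 1;  // line began with '#'
    private static final int CONTENT = 2;  // line holds a statement

    private long completedLines = 0;
    private int state = START;

    public StatementLineCounter(InputStream in) {
        super(in);
    }

    private void observe(int b) {
        if (b == '\n') {
            if (state == CONTENT) {
                completedLines++;
            }
            state = START;
        } else if (state == START && b != ' ' && b != '\t' && b != '\r') {
            state = (b == '#') ? COMMENT : CONTENT;
        }
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            observe(b);
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        for (int i = 0; i < n; i++) {
            observe(buf[off + i]);
        }
        return n;
    }

    /** Statement lines seen so far, including a final line without a trailing newline. */
    public long getStatementLines() {
        return completedLines + (state == CONTENT ? 1 : 0);
    }

    public static void main(String[] args) throws IOException {
        String data = "# a comment\n"
                + "<http://example.org/s> <http://example.org/p> \"a\" .\n"
                + "\n"
                + "<http://example.org/s> <http://example.org/p> \"b\" .\n";
        try (StatementLineCounter in = new StatementLineCounter(
                new ByteArrayInputStream(data.getBytes(StandardCharsets.UTF_8)))) {
            byte[] buf = new byte[16];
            while (in.read(buf) != -1) {
                // drain the stream; a loader would consume it here instead
            }
            System.out.println(in.getStatementLines() + " statement lines read");
        }
    }
}
```

In the example above, the wrapper would go around `concatIn` before it is passed to the loader, and `getStatementLines()` could be read after the load completes, or polled from another thread for a rough running total.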
