C# parsing of Freebase RDF dump yields only 11.5 million N-Triples instead of 1.9 billion

Views: 166 | Posted: 2016/10/8 15:44:06 | Tags: c# rdf freebase


Question

I'm building a C# program to read the RDF data in the Google Freebase data dump. To start, I wrote a simple loop that reads the file and counts the triples. However, instead of the 1.9 billion triples stated on the documentation page (referred to above), my program counts only about 11.5 million and then exits. The relevant portion of the source code is given below (it takes about 30 seconds to run).

What am I missing here?

// Simple read-through of the .gz file, counting one triple per line.
// Requires: using System; using System.IO; using System.IO.Compression;
try
{
    using (FileStream fileToDecompress = File.Open(@"C:\Users\Krishna\Downloads\freebase-rdf-2014-02-16-00-00.gz", FileMode.Open))
    using (GZipStream decompressionStream = new GZipStream(fileToDecompress, CompressionMode.Decompress))
    using (StreamReader sr = new StreamReader(decompressionStream, detectEncodingFromByteOrderMarks: true))
    {
        // long leaves headroom above the expected 1.9 billion lines.
        long tupleCount = 0;

        // ReadLine returns null once the decompressed stream is exhausted.
        while (sr.ReadLine() != null)
        {
            tupleCount++;

            // Progress report every million lines.
            if (tupleCount % 1000000 == 0)
            {
                Console.WriteLine(DateTime.Now.ToShortTimeString() + ": " + tupleCount);
            }
        }

        Console.WriteLine("Tuples: " + tupleCount);
    }
}
catch (Exception ex)
{
    Console.WriteLine(ex.Message);
}
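
One way to narrow this down, offered as an assumption rather than a confirmed diagnosis: the gzip format (RFC 1952) allows a file to consist of several compressed members concatenated back to back, and a decompressor that stops after the first member would report far fewer lines than the file holds. The question does not say whether the Freebase dump is built that way, and whether `GZipStream` reads past the first member depends on the runtime version. The sketch below uses a hypothetical helper class, `GzipProbe`, to build two small members in memory, join them, and count how many lines come back out.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper for illustration; not part of the original question.
public static class GzipProbe
{
    // Compresses the text into one complete gzip member.
    public static byte[] MakeMember(string text)
    {
        using (var buffer = new MemoryStream())
        {
            using (var gz = new GZipStream(buffer, CompressionMode.Compress, leaveOpen: true))
            {
                byte[] raw = Encoding.UTF8.GetBytes(text);
                gz.Write(raw, 0, raw.Length);
            }
            return buffer.ToArray();
        }
    }

    // Counts the lines readable through GZipStream and reports how far the
    // underlying compressed stream was read. The position is approximate:
    // GZipStream reads ahead in buffer-sized chunks, so it is only
    // informative for inputs much larger than that buffer.
    public static long CountLines(Stream compressed, out long compressedBytesRead)
    {
        long count = 0;
        using (var gz = new GZipStream(compressed, CompressionMode.Decompress, leaveOpen: true))
        using (var reader = new StreamReader(gz, Encoding.UTF8))
        {
            while (reader.ReadLine() != null)
            {
                count++;
            }
        }
        compressedBytesRead = compressed.Position;
        return count;
    }

    public static void Main()
    {
        // Two members with two lines each (made-up sample triples).
        byte[] first = MakeMember("<s1>\t<p>\t<o>\t.\n<s2>\t<p>\t<o>\t.\n");
        byte[] second = MakeMember("<s3>\t<p>\t<o>\t.\n<s4>\t<p>\t<o>\t.\n");

        // Concatenate the raw bytes of both members into one stream.
        byte[] joined = new byte[first.Length + second.Length];
        Buffer.BlockCopy(first, 0, joined, 0, first.Length);
        Buffer.BlockCopy(second, 0, joined, first.Length, second.Length);

        long consumed;
        long lines = CountLines(new MemoryStream(joined), out consumed);
        Console.WriteLine("Lines read: " + lines + " of 4 written");
    }
}
```

To apply the same check to the dump itself, open it with `File.OpenRead`, pass the stream to `CountLines`, and compare `compressedBytesRead` with the stream's `Length`. A value far below the file length would suggest that decompression ended early rather than that the file is short.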

(I tried using GZippedNTriplesParser in dotNetRdf to read the data, building on this recommendation, but it seems to choke with an RdfParseException right at the beginning (tab delimiters? UTF-8?). So, for the moment, I'm trying to roll my own.)
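
For the roll-your-own route, here is a minimal sketch of splitting one line into subject, predicate, and object. It assumes, as the parenthetical above suspects but does not confirm, that the terms are separated by tabs and that each statement ends with a `.`. `TripleLine.TrySplit` is a hypothetical helper, not a dotNetRdf API, and it does not validate IRIs or unescape literals.

```csharp
using System;

// Hypothetical helper for illustration; not a dotNetRdf API.
public static class TripleLine
{
    // Splits "<s>\t<p>\t<o>\t." into its three terms.
    // Returns false for blank lines, comments, and lines that do not fit.
    public static bool TrySplit(string line, out string subject, out string predicate, out string obj)
    {
        subject = null;
        predicate = null;
        obj = null;

        if (string.IsNullOrWhiteSpace(line))
        {
            return false;
        }

        string trimmed = line.Trim();
        if (trimmed.StartsWith("#", StringComparison.Ordinal) ||
            !trimmed.EndsWith(".", StringComparison.Ordinal))
        {
            return false;
        }

        // Drop the terminating '.' and any whitespace before it.
        trimmed = trimmed.Substring(0, trimmed.Length - 1).TrimEnd();

        // A count of 3 keeps any further tabs inside the object field.
        string[] parts = trimmed.Split(new[] { '\t' }, 3);
        if (parts.Length != 3)
        {
            return false;
        }

        subject = parts[0];
        predicate = parts[1];
        obj = parts[2];
        return true;
    }

    public static void Main()
    {
        // Made-up sample line in the assumed tab-delimited layout.
        string sample = "<http://example.org/s>\t<http://example.org/p>\t\"value\"\t.";

        string s, p, o;
        if (TrySplit(sample, out s, out p, out o))
        {
            Console.WriteLine(s + " | " + p + " | " + o);
        }
    }
}
```

Calling `TrySplit` inside the counting loop and tallying only the lines that return true would separate real statements from blank or malformed lines, which makes the final count easier to compare against the documented figure.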
