Parsing large json file in .NET

Posted: 2016/8/28 15:28:47. Tags: c# json.net deserialization json-deserialization

Problem description

So far I have used the JsonConvert.DeserializeObject(json) method of Json.Net, which worked quite well, and to be honest I didn't need anything more than that.

I am working on a background (console) app which constantly downloads JSON content from different URLs, then deserializes the result into a list of .NET objects.

    using (WebClient client = new WebClient())
    {
        string json = client.DownloadString(stringUrl);

        var result = JsonConvert.DeserializeObject<List<Contact>>(json);
    }

The simple code snippet above probably doesn't look perfect, but it does the job. When the file is large (15,000 contacts, a 48 MB file), JsonConvert.DeserializeObject no longer works: that line throws a JsonReaderException.

The downloaded JSON is an array, and this is what a sample looks like. Contact is a container class for the deserialized JSON objects.

[
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  }
]
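The question does not show the Contact class itself, so here is a minimal sketch of what such a container might look like for the sample above. The property names FirstName and LastName are taken from the answer's code; the [JsonProperty] attributes, the ContactDemo class, and the sample values are assumptions added for illustration only.

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json;

// Hypothetical container class: the real Contact type is not shown in the question.
public class Contact
{
    [JsonProperty("firstname")]
    public string FirstName { get; set; }

    [JsonProperty("lastname")]
    public string LastName { get; set; }
}

public static class ContactDemo
{
    // Deserializes a single, well-formed JSON array into a list of contacts.
    public static List<Contact> Parse(string json)
    {
        return JsonConvert.DeserializeObject<List<Contact>>(json);
    }

    public static void Main()
    {
        // Illustrative data only, shaped like the sample in the question.
        string json = "[{\"firstname\":\"Ada\",\"lastname\":\"Lovelace\"}]";
        foreach (Contact c in Parse(json))
        {
            Console.WriteLine(c.FirstName + " " + c.LastName);
        }
    }
}
```

This works as long as the whole payload is one valid JSON array and fits comfortably in memory as a string.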

My initial guess was that it runs out of memory. Just out of curiosity, I tried to parse it as a JArray, which caused the same exception.

I have started to dive into Json.Net documentation and read similar threads. As I haven't managed to produce a working solution yet, I decided to post a question here.

I'd appreciate any advice/code snippet which could help me in researching the issue, learning more about it and eventually getting to a solution.

Thanks :)

UPDATE: While deserializing line by line, I got the same error: " [. Path '', line 600003, position 1." So I downloaded two of the files and checked them in Notepad++. What I noticed is that if the array has more than 12,000 elements, the array is closed after the 12,000th element and another array starts. In other words, the JSON looks exactly like this:

[
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  }
]
[
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  }
]
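The failure can be reproduced on a tiny input, which suggests that the structure of the file matters here rather than its size. The sketch below is an assumption-laden illustration, not the asker's code: the class name ConcatenatedArraysRepro and the test strings are made up, and it catches JsonException (the base class of JsonReaderException) so that it does not depend on exactly which Json.Net exception type is raised.

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json;

public static class ConcatenatedArraysRepro
{
    // Returns true if Json.Net rejects the text when it is parsed as one document.
    public static bool FailsToParse(string json)
    {
        try
        {
            JsonConvert.DeserializeObject<List<Dictionary<string, string>>>(json);
            return false;
        }
        catch (JsonException ex)
        {
            // Print whatever message the library reports for this input.
            Console.WriteLine(ex.Message);
            return true;
        }
    }

    public static void Main()
    {
        string oneArray = "[{\"firstname\":\"a\",\"lastname\":\"b\"}]";
        // Two arrays back to back, as seen in the downloaded file.
        string twoArrays = oneArray + "\n" + oneArray;

        Console.WriteLine("single array fails: " + FailsToParse(oneArray));
        Console.WriteLine("two arrays fail: " + FailsToParse(twoArrays));
    }
}
```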

Solution

As you've correctly diagnosed in your update, the issue is that the JSON has a closing ] followed immediately by an opening [ to start the next set. This format makes the JSON invalid when taken as a whole, and that is why Json.Net throws an error. Fortunately this problem seems to come up often enough that Json.Net actually has a special setting to deal with it. If you use a JsonTextReader directly to read the JSON, you can set the SupportMultipleContent flag to true, and then use a loop to deserialize each item individually. This should allow you to process the non-standard JSON successfully and in a memory efficient manner, regardless of how many array sections there are or how many items in each array.

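    // Requires: using System; using System.IO; using System.Net; using Newtonsoft.Json;
    using (WebClient client = new WebClient())
    using (Stream stream = client.OpenRead(stringUrl))
    using (StreamReader streamReader = new StreamReader(stream))
    using (JsonTextReader reader = new JsonTextReader(streamReader))
    {
        // Allow several top-level JSON values (here, back-to-back arrays) in one stream.
        reader.SupportMultipleContent = true;

        var serializer = new JsonSerializer();
        while (reader.Read())
        {
            // Each StartObject token marks the beginning of one contact.
            if (reader.TokenType == JsonToken.StartObject)
            {
                // Deserialize just this object; the full payload is never held in memory.
                Contact c = serializer.Deserialize<Contact>(reader);
                Console.WriteLine(c.FirstName + " " + c.LastName);
            }
        }
    }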

Full demo here: https://dotnetfiddle.net/2TQa8p
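For trying the same approach without a network call, here is a minimal self-contained sketch that reads from an in-memory string instead of a URL. The Contact class, the MultiContentDemo class, the ReadContacts helper, and the sample data are all assumptions made for illustration; only JsonTextReader, SupportMultipleContent, and JsonSerializer come from the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

// Hypothetical container class matching the fields in the question's sample.
public class Contact
{
    [JsonProperty("firstname")]
    public string FirstName { get; set; }

    [JsonProperty("lastname")]
    public string LastName { get; set; }
}

public static class MultiContentDemo
{
    // Lazily yields one Contact per JSON object, across any number of
    // back-to-back top-level arrays in the input.
    public static IEnumerable<Contact> ReadContacts(TextReader input)
    {
        using (var reader = new JsonTextReader(input))
        {
            reader.SupportMultipleContent = true;
            var serializer = new JsonSerializer();

            while (reader.Read())
            {
                if (reader.TokenType == JsonToken.StartObject)
                {
                    yield return serializer.Deserialize<Contact>(reader);
                }
            }
        }
    }

    public static void Main()
    {
        // Two concatenated arrays, mimicking the non-standard download.
        string json =
            "[{\"firstname\":\"A\",\"lastname\":\"One\"},{\"firstname\":\"B\",\"lastname\":\"Two\"}]\n" +
            "[{\"firstname\":\"C\",\"lastname\":\"Three\"}]";

        int count = 0;
        foreach (Contact c in ReadContacts(new StringReader(json)))
        {
            Console.WriteLine(c.FirstName + " " + c.LastName);
            count++;
        }
        Console.WriteLine(count + " contacts read");
    }
}
```

Returning an IEnumerable keeps the streaming behaviour: callers that need a List<Contact>, as in the question, can collect the items themselves, at the cost of holding all of them in memory.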
