Parsing large json file in .NET

Problem description

So far I have used Json.Net's JsonConvert.DeserializeObject(json) method, which has worked quite well, and to be honest, I didn't need anything more than this.

I am working on a background (console) app which constantly downloads JSON content from different URLs, then deserializes the result into a list of .NET objects.

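    using (WebClient client = new WebClient())
    {
        // Download the whole response into memory as a single string.
        string json = client.DownloadString(stringUrl);

        // Deserialize the entire payload into a list in one call.
        var result = JsonConvert.DeserializeObject<List<Contact>>(json);
    }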

The simple code snippet above probably doesn't seem perfect, but it does the job. When the file is large (15,000 contacts, a 48 MB file), JsonConvert.DeserializeObject no longer works: the line throws a JsonReaderException.

The downloaded JSON is an array, and this is what a sample looks like. Contact is a container class for the deserialized JSON objects.

[
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  }
]

My initial guess was that it runs out of memory. Just out of curiosity, I tried to parse it as a JArray, which caused the same exception.

I have started to dive into Json.Net documentation and read similar threads. As I haven't managed to produce a working solution yet, I decided to post a question here.

I'd appreciate any advice/code snippet which could help me in researching the issue, learning more about it and eventually getting to a solution.

Thanks :)

UPDATE: While deserializing line by line, I got the same error: " [. Path '', line 600003, position 1." So I downloaded two of the files and checked them in Notepad++. What I noticed is that if the array has more than 12,000 elements, the array is closed with "]" after the 12,000th element and another array starts. In other words, the JSON looks exactly like this:

[
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  }
]
[
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  }
]

Solution

As you've correctly diagnosed in your update, the issue is that the JSON has a closing ] followed immediately by an opening [ to start the next set. This format makes the JSON invalid when taken as a whole, and that is why Json.Net throws an error. Fortunately, this problem seems to come up often enough that Json.Net actually has a special setting to deal with it. If you use a JsonTextReader directly to read the JSON, you can set the SupportMultipleContent flag to true, and then use a loop to deserialize each item individually. This should allow you to process the non-standard JSON successfully and in a memory-efficient manner, regardless of how many array sections there are or how many items are in each array.

    using (WebClient client = new WebClient())
    using (Stream stream = client.OpenRead(stringUrl))
    using (StreamReader streamReader = new StreamReader(stream))
    using (JsonTextReader reader = new JsonTextReader(streamReader))
    {
        // Allow more than one top-level JSON value in the same stream.
        reader.SupportMultipleContent = true;

        var serializer = new JsonSerializer();
        while (reader.Read())
        {
            // Skip array tokens; deserialize one object at a time.
            if (reader.TokenType == JsonToken.StartObject)
            {
                Contact c = serializer.Deserialize<Contact>(reader);
                Console.WriteLine(c.FirstName + " " + c.LastName);
            }
        }
    }

Full demo here: https://dotnetfiddle.net/2TQa8p
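
If you want to try the same technique without a network call, the following is a minimal, self-contained sketch that reads from an in-memory string instead of a URL. The Contact class shown here is an assumption (the question does not include its definition), and the ContactReader.ReadContacts helper and the sample names are hypothetical, introduced only for illustration. It assumes the Newtonsoft.Json package is referenced.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

// Hypothetical container class; the question does not show its definition.
public class Contact
{
    [JsonProperty("firstname")]
    public string FirstName { get; set; }

    [JsonProperty("lastname")]
    public string LastName { get; set; }
}

public static class ContactReader
{
    // Hypothetical helper: lazily yields one Contact per JSON object,
    // across any number of back-to-back top-level arrays.
    public static IEnumerable<Contact> ReadContacts(TextReader source)
    {
        using (var reader = new JsonTextReader(source))
        {
            // Allow more than one top-level JSON value in the same stream.
            reader.SupportMultipleContent = true;

            var serializer = new JsonSerializer();
            while (reader.Read())
            {
                // Array start/end tokens are skipped; only objects are materialized.
                if (reader.TokenType == JsonToken.StartObject)
                {
                    // Deserialize consumes the whole object, so the next Read()
                    // moves on to whatever follows it.
                    yield return serializer.Deserialize<Contact>(reader);
                }
            }
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Two arrays back to back, mimicking the feed described in the update.
        const string json =
            "[{\"firstname\":\"Ann\",\"lastname\":\"Lee\"}]\n" +
            "[{\"firstname\":\"Bob\",\"lastname\":\"Ray\"}]";

        foreach (Contact c in ContactReader.ReadContacts(new StringReader(json)))
        {
            Console.WriteLine(c.FirstName + " " + c.LastName);
        }
    }
}
```

Because the helper returns a lazy sequence, only one Contact needs to be held in memory at a time. If you do need the full list, you can wrap the call in new List<Contact>(...), at the cost of holding every item at once.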
