Parsing large json file in .NET
So far I have been using Json.Net's "JsonConvert.DeserializeObject(json)" method, which has worked quite well, and to be honest, I didn't need anything more than this.
I am working on a background (console) app which constantly downloads JSON content from different URLs, then deserializes the result into a list of .NET objects.
using (WebClient client = new WebClient())
{
    string json = client.DownloadString(stringUrl);
    var result = JsonConvert.DeserializeObject<List<Contact>>(json);
}
The simple code snippet above probably doesn't look perfect, but it does the job. When the file is large (15000 contacts, a 48 MB file), JsonConvert.DeserializeObject isn't the solution: the line throws a JsonReaderException.
The downloaded JSON is an array, and this is what a sample looks like. Contact is a container class for the deserialized JSON object.
[
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  }
]
My initial guess was that it runs out of memory. Just out of curiosity, I tried to parse it as a JArray, which caused the same exception too.
I have started to dive into Json.Net documentation and read similar threads. As I haven't managed to produce a working solution yet, I decided to post a question here.
I'd appreciate any advice/code snippet which could help me in researching the issue, learning more about it and eventually getting to a solution.
Thanks :)
UPDATE: While deserializing line by line, I got the same error: " [. Path '', line 600003, position 1." So I downloaded two of the files and checked them in Notepad++. What I noticed is that if the array has more than 12000 elements, the array is closed with "]" after the 12000th element and another array starts with "[". In other words, the JSON looks exactly like this:
[
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  }
]
[
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  }
]
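For anyone who wants to reproduce the failure without the 48 MB download, here is a minimal sketch. It assumes the Newtonsoft.Json package is referenced; the Contact class, the ConcatenatedArrayRepro name and the tiny sample string are made up for illustration, not taken from the real feed. It catches JsonException, the base class of JsonReaderException, and prints whichever concrete type is thrown.

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json;

// Hypothetical container class; Json.NET matches property names case-insensitively.
public class Contact
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class ConcatenatedArrayRepro
{
    // Two complete arrays back to back, mimicking the shape of the downloaded file.
    public const string Json =
        "[{\"firstname\":\"a\",\"lastname\":\"b\"}]\n" +
        "[{\"firstname\":\"c\",\"lastname\":\"d\"}]";

    // Returns true when Json.NET rejects the input as a single document.
    public static bool FailsToParse(string json)
    {
        try
        {
            JsonConvert.DeserializeObject<List<Contact>>(json);
            return false;
        }
        catch (JsonException ex) // base class of JsonReaderException
        {
            Console.WriteLine(ex.GetType().Name + ": " + ex.Message);
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(FailsToParse(Json));
    }
}
```

If this behaves like the real file, the problem is the second top-level array rather than the file size or memory.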
As you've correctly diagnosed in your update, the issue is that the JSON has a closing ] followed immediately by an opening [ to start the next set. This format makes the JSON invalid when taken as a whole, and that is why Json.Net throws an error. Fortunately, this problem seems to come up often enough that Json.Net actually has a special setting to deal with it. If you use a JsonTextReader directly to read the JSON, you can set the SupportMultipleContent flag to true, and then use a loop to deserialize each item individually. This should allow you to process the non-standard JSON successfully and in a memory-efficient manner, regardless of how many array sections there are or how many items are in each array.
using System;
using System.IO;
using System.Net;
using Newtonsoft.Json;

using (WebClient client = new WebClient())
using (Stream stream = client.OpenRead(stringUrl))
using (StreamReader streamReader = new StreamReader(stream))
using (JsonTextReader reader = new JsonTextReader(streamReader))
{
    // Allow more than one top-level JSON value in the same stream.
    reader.SupportMultipleContent = true;

    var serializer = new JsonSerializer();
    while (reader.Read())
    {
        // Skip the array tokens and deserialize one object at a time.
        if (reader.TokenType == JsonToken.StartObject)
        {
            Contact c = serializer.Deserialize<Contact>(reader);
            Console.WriteLine(c.FirstName + " " + c.LastName);
        }
    }
}
Full demo here: https://dotnetfiddle.net/2TQa8p
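If you still want a List<Contact> at the end, as in the original snippet, the same loop can be wrapped in an iterator. The following is only a sketch under a few assumptions: the MultiContentReader class, the ReadItems method and the SampleJson string are names invented for this example, and it reads from a StringReader so it runs without a network call. In the real app you would pass the StreamReader built from client.OpenRead(stringUrl) instead.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Newtonsoft.Json;

// Hypothetical container class for one contact.
public class Contact
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class MultiContentReader
{
    // Illustrative input: two arrays back to back, three objects in total.
    public const string SampleJson =
        "[{\"firstname\":\"a\",\"lastname\":\"b\"},{\"firstname\":\"c\",\"lastname\":\"d\"}]" +
        "[{\"firstname\":\"e\",\"lastname\":\"f\"}]";

    // Lazily yields every object in the input, however many
    // top-level arrays the objects are spread across.
    public static IEnumerable<T> ReadItems<T>(TextReader textReader)
    {
        var serializer = new JsonSerializer();
        using (var reader = new JsonTextReader(textReader))
        {
            reader.SupportMultipleContent = true;
            while (reader.Read())
            {
                if (reader.TokenType == JsonToken.StartObject)
                {
                    yield return serializer.Deserialize<T>(reader);
                }
            }
        }
    }

    public static void Main()
    {
        List<Contact> contacts =
            ReadItems<Contact>(new StringReader(SampleJson)).ToList();

        foreach (Contact c in contacts)
        {
            Console.WriteLine(c.FirstName + " " + c.LastName);
        }
    }
}
```

Calling ToList() gives up the memory benefit of streaming, so with very large feeds it may be better to process each item inside a foreach and not keep the whole list.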