How to parse a huge JSON file without loading it in memory
I have a large JSON file (2.5MB) containing about 80000 lines.
It looks like this:
{
    "a": 123,
    "b": 0.26,
    "c": [HUGE irrelevant object],
    "d": 32
}
I only want to store the numeric values for keys a, b and d and ignore the rest of the JSON (i.e. ignore whatever is in the c value).
I cannot modify the original JSON because it is created by a third-party service; I download it from that service's server.
How do I do this without loading the entire file in memory?
I tried using the Gson library and created a bean like this:
public class MyJsonBean {
    @SerializedName("a")
    @Expose
    public Integer a;

    @SerializedName("b")
    @Expose
    public Double b;

    @SerializedName("d")
    @Expose
    public Integer d;
}
But even then, in order to deserialize it using Gson, I need to download the whole file, read it into memory first, and then pass it as a string to Gson?
File myFile = new File(<FILENAME>);
URL url = new URL(<URL>);
URLConnection conn = url.openConnection();

// Download the payload to disk; try-with-resources flushes and closes both streams.
try (InputStream in = conn.getInputStream();
     OutputStream out = new BufferedOutputStream(new FileOutputStream(myFile))) {
    byte[] buffer = new byte[1024];
    int numRead;
    while ((numRead = in.read(buffer)) != -1) {
        out.write(buffer, 0, numRead);
    }
}

// Read the whole file back into memory as one string (the step I would like to avoid).
// Needs java.nio.file.Files and java.nio.charset.StandardCharsets.
String str = new String(Files.readAllBytes(myFile.toPath()), StandardCharsets.UTF_8);

Gson gson = new Gson();
MyJsonBean response = gson.fromJson(str, MyJsonBean.class);
System.out.println("a: " + response.a + ", b: " + response.b + ", d: " + response.d);
Is there any way to avoid loading the whole file and just get the relevant values that I need?
You should definitely check different approaches and libraries. If performance really matters to you, try the Gson, Jackson and JsonPath libraries and choose the fastest one. You will definitely need to save the whole JSON file to local disk first, probably in a temp folder, and parse it after that.
A simple JsonPath solution could look like this:
import com.jayway.jsonpath.DocumentContext;
import com.jayway.jsonpath.JsonPath;

import java.io.File;

public class JsonPathApp {
    public static void main(String[] args) throws Exception {
        File jsonFile = new File("./resource/test.json").getAbsoluteFile();

        DocumentContext documentContext = JsonPath.parse(jsonFile);
        System.out.println("" + documentContext.read("$.a"));
        System.out.println("" + documentContext.read("$.b"));
        System.out.println("" + documentContext.read("$.d"));
    }
}
Note that I do not create any POJO; I just read the given values using JSONPath expressions, which work much like XPath. You can do the same with Jackson:
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.File;

public class JsonPathApp {
    public static void main(String[] args) throws Exception {
        File jsonFile = new File("./resource/test.json").getAbsoluteFile();

        ObjectMapper mapper = new ObjectMapper();
        JsonNode root = mapper.readTree(jsonFile);
        System.out.println(root.get("a"));
        System.out.println(root.get("b"));
        System.out.println(root.get("d"));
    }
}
We do not need JSONPath here because the values we need sit directly in the root node. As you can see, the API looks almost the same. We can also create a POJO structure:
import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.File;
import java.math.BigDecimal;

public class JsonPathApp {
    public static void main(String[] args) throws Exception {
        File jsonFile = new File("./resource/test.json").getAbsoluteFile();

        ObjectMapper mapper = new ObjectMapper();
        Pojo pojo = mapper.readValue(jsonFile, Pojo.class);
        System.out.println(pojo);
    }
}

@JsonIgnoreProperties(ignoreUnknown = true)
class Pojo {
    private Integer a;
    private BigDecimal b;
    private Integer d;

    // getters, setters
}
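As a complement to the examples above: the JsonPath.parse and readTree variants build an in-memory representation of the whole document, including c. Since the question already uses Gson, one way to avoid that could be Gson's streaming JsonReader, which reads token by token and can skip a value without materializing it. The following is only a minimal sketch under some assumptions: Gson is on the classpath, the three keys are top-level members as in the sample, and the names StreamingGsonSketch, Values and readValues are made up for illustration.

```java
import com.google.gson.stream.JsonReader;

import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class StreamingGsonSketch {

    /** Hypothetical holder for the three values of interest. */
    static final class Values {
        Integer a;
        Double b;
        Integer d;
    }

    /** Walks the top-level object token by token and keeps only a, b and d. */
    static Values readValues(Reader source) throws IOException {
        Values values = new Values();
        try (JsonReader reader = new JsonReader(source)) {
            reader.beginObject();
            while (reader.hasNext()) {
                switch (reader.nextName()) {
                    case "a":
                        values.a = reader.nextInt();
                        break;
                    case "b":
                        values.b = reader.nextDouble();
                        break;
                    case "d":
                        values.d = reader.nextInt();
                        break;
                    default:
                        // Consumes the value (e.g. the huge "c" array) without building objects for it.
                        reader.skipValue();
                }
            }
            reader.endObject();
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        // Small stand-in for the real payload; "c" is a nested value that gets skipped.
        String json = "{\"a\": 123, \"b\": 0.26, \"c\": [{\"x\": [1, 2, 3]}], \"d\": 32}";
        Values v = readValues(new StringReader(json));
        System.out.println("a: " + v.a + ", b: " + v.b + ", d: " + v.d);
    }
}
```

For a real file, the StringReader would be replaced by a buffered file reader (or a reader over the network stream), so only a small buffer is held in memory at any time.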
Both JsonPath and Jackson can also read the JSON payload directly from a URL. Even so, I suggest downloading it in a separate step using the best approach you can find. For more info, read this article: Download a File From an URL in Java.
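The separate download step could be sketched with the standard library alone. This is just one possible approach, not necessarily the one from the linked article; the class name DownloadToTempFile, the method download and the temp-file prefix are assumptions made for the example. Files.copy streams the response to disk in small chunks, so the payload is never held in memory as a whole.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class DownloadToTempFile {

    /** Streams the content behind the URL into a new temp file and returns its path. */
    static Path download(URL url) throws IOException {
        Path target = Files.createTempFile("payload-", ".json");
        try (InputStream in = url.openStream()) {
            // Copies chunk by chunk; REPLACE_EXISTING is needed because the temp file already exists.
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Demo with a local file URL so the sketch runs without network access.
        Path source = Files.createTempFile("source-", ".json");
        Files.write(source, "{\"a\": 123}".getBytes(StandardCharsets.UTF_8));

        Path copy = download(source.toUri().toURL());
        System.out.println("Saved " + Files.size(copy) + " bytes to " + copy);
    }
}
```

The returned path can then be handed to whichever parser is chosen, for example via toFile() in the snippets above.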