How to parse JSON log file with Streaming API in Java, then output tabulated log file


Problem description

I have a problem at hand where I am trying to parse large log files stored in JSON format, and then tabulate the data and output it as another JSON file. Following is the format of the log files that I am parsing:

{
"timestamp": "2012-10-01T01:00:00.000",
"id": "someone@somewhere.net",
"action": "Some_Action",
"responsecode": "1000"
}

The action here is the action that some user performs, and the response code is the result of that action.

The timestamp and id are actually irrelevant for my tabulation; I am only interested in the action and responsecode fields. There may be tens of thousands of these entries in any given log file, and what I want to do is keep track of every type of action, the response codes it produced, and the number of occurrences of each.
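A minimal sketch of that tabulation, assuming the action and response code have already been extracted from each record (the class `ActionTally` and its methods are hypothetical names for illustration). Counting into a nested map keeps memory proportional to the number of distinct action/code pairs rather than to the number of log entries.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: tallies response codes per action.
final class ActionTally {
    // action name -> (response code -> number of occurrences)
    private final Map<String, Map<String, Integer>> counts = new TreeMap<>();

    // Record one log entry; timestamp and id are simply not passed in.
    void record(String action, String responseCode) {
        counts.computeIfAbsent(action, a -> new TreeMap<>())
              .merge(responseCode, 1, Integer::sum);
    }

    // Occurrences of one code for one action (0 if never seen).
    int count(String action, String responseCode) {
        Map<String, Integer> byCode = counts.get(action);
        if (byCode == null) return 0;
        return byCode.getOrDefault(responseCode, 0);
    }

    // Total number of responses recorded for one action.
    int total(String action) {
        Map<String, Integer> byCode = counts.get(action);
        if (byCode == null) return 0;
        int sum = 0;
        for (int c : byCode.values()) sum += c;
        return sum;
    }

    Map<String, Map<String, Integer>> asMap() {
        return counts;
    }
}
```

Recording `("Some_Action", "1000")` twice and `("Some_Action", "1001")` once makes `asMap()` print as `{Some_Action={1000=2, 1001=1}}`. `TreeMap` is used only so the output is sorted; a `HashMap` works equally well for counting.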

Below would be a sample of the output I am looking to generate.

{"actionName": "Some_User_Action",
"responses": [{"code": "1000", "count": "36"},
              {"code": "1001", "count": "6"},
              {"code": "1002", "count": "3"},
              {"code": "1003", "count": "36"},
              {"code": "1004", "count": "2"}],
"totalActionCount": "83"}

So basically, for each action, I want to keep track of all the different responses it generates and the number of times each one occurred. Finally, I want to keep track of the total number of responses for that action.

Currently, I have created a Java class for the output object, in which I plan to store the output data. I am also a little confused about the format in which I should store the array of responses and their respective counts. The number of distinct response codes also varies from action to action.

Based on my research, it seems that I will need to use a streaming JSON parsing API. The main reason is the memory overhead a non-streaming API would need, which is likely not feasible given the size of these log files. I am currently considering Jackson or Gson, but I have not been able to find any concrete examples or tutorials to get me started. Does anyone know of a good example I could study, or have any hints on how to go about solving this problem? Thank you!

EDIT: My class definition.

public class Action {



public static class Response {

    private int _resultCode;
    private int _count = 0;

    public Response() {}

    public int getResultCode() { return _resultCode; }
    public int getCount() { return _count; }

    public void setResultCode(int rc) { _resultCode = rc; }
    public void setCount(int c) { _count = c; }

}

private List<Response> responses = new ArrayList<Response>();
private String _name;

// I've left out the getters/setters and helper functions that I will add in after.

}

If I am using Jackson and eventually want to serialize this object back into JSON easily, are there any suggestions on how I should define this class? At the moment I am creating another ArrayList of this Action type in my main() method using: List<Action> actions = new ArrayList<Action>(); Would using HashMaps or some other alternative be a better option? And would it still let me serialize the result to JSON easily afterwards with Jackson?
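One possible way to reshape the Action class, offered as a sketch rather than the definitive design: store the counts internally in a `Map<String, Integer>` (code to count), which handles any number of distinct codes per action, and expose getters that match the desired output fields exactly. Jackson's default bean serialization turns public getters such as `getActionName()` into JSON properties (`actionName`), so an object shaped like this should serialize close to the requested layout; no Jackson call is made here, and `ActionSummary`/`increment` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical reshaped version of the question's Action class.
final class ActionSummary {
    // One element of the "responses" array: {"code": ..., "count": ...}
    static final class Response {
        private final String code;
        private final int count;

        Response(String code, int count) { this.code = code; this.count = count; }

        public String getCode() { return code; }
        public int getCount() { return count; }
    }

    private final String actionName;
    // response code -> count; grows as new codes appear
    private final Map<String, Integer> countsByCode = new TreeMap<>();

    ActionSummary(String actionName) { this.actionName = actionName; }

    // Call once per log entry carrying this action.
    void increment(String code) { countsByCode.merge(code, 1, Integer::sum); }

    public String getActionName() { return actionName; }

    // Built on demand, so the stored form stays a compact map.
    public List<Response> getResponses() {
        List<Response> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : countsByCode.entrySet()) {
            out.add(new Response(e.getKey(), e.getValue()));
        }
        return out;
    }

    // Sum of all counts for this action.
    public int getTotalActionCount() {
        int total = 0;
        for (int c : countsByCode.values()) total += c;
        return total;
    }
}
```

Under this design, the outer collection in `main()` would be a `Map<String, ActionSummary>` keyed by action name, so each log entry costs one lookup instead of a scan of a list; its `values()` can then be handed to the serializer. Feeding in the question's sample counts (36, 6, 3, 36, 2) gives `getTotalActionCount() == 83`. Note that Jackson would emit the counts as JSON numbers, not as the quoted strings shown in the sample output.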

Solution

You can have a look at the Genson library, http://code.google.com/p/genson/; on its wiki page you will find some examples of how to use it. It has provided a streaming model since its first release and seems to be the fastest after Jackson (see the benchmarks).

If you want something really efficient with a small memory footprint, use the streaming API directly: instantiate a JsonReader, use it to read the logged structure, and increment your counters as you go.

Otherwise, you could use a Genson instance to parse your file directly into Java objects, but in your case I don't think that is the right solution, as it would require you to keep all the objects in memory.

Here is a quick example that uses the streaming API directly. It does not print exactly the structure you are expecting, since counting into that structure efficiently would take more code:

public static void main(String[] args) throws IOException, TransformationException {
    Map<String, Map<String, Integer>> actions = new HashMap<String, Map<String, Integer>>();
    Genson genson = new Genson();

    ObjectReader reader = genson.createReader(new FileReader("path/to/the/file"));
    while(reader.hasNext()) {
        reader.next();
        reader.beginObject();
        String action = readUntil("action", reader);
        // read on to the "responsecode" field (assumes it follows "action", as in the sample log)
        String responseCode = readUntil("responsecode", reader);
        Map<String, Integer> countMap = actions.get(action);
        if (countMap == null) {
            countMap = new HashMap<String, Integer>();
            actions.put(action, countMap);
        }

        Integer count = countMap.get(responseCode);
        if (count == null) {
            count = 0;
        }
        count++;
        countMap.put(responseCode, count);

        reader.endObject();
    }

    // for example, if you had 2 different response codes for the same action it will print
    // {"Some_Action":{"1001":1,"1000":1}}
    String json = genson.serialize(actions);
    System.out.println(json);
}

static String readUntil(String name, ObjectReader reader) throws IOException {
    while(reader.hasNext()) {
        reader.next();
        if (name.equals(reader.name())) {
            return reader.valueAsString();
        }
    }
    throw new IllegalStateException("field \"" + name + "\" not found");
}
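The missing step mentioned above, turning the `Map<String, Map<String, Integer>>` of counts into the layout the question asks for, could look roughly like this. It is a dependency-free sketch: `SummaryWriter.toSummaryJson` is a hypothetical helper that builds the JSON text by hand and assumes action names and codes contain no characters needing JSON escaping. In practice you would more likely build summary objects and let Genson or Jackson serialize them. Counts are written as JSON numbers rather than the quoted strings in the question's sample.

```java
import java.util.Map;
import java.util.StringJoiner;

final class SummaryWriter {
    // Hypothetical helper: reshapes {"action": {"code": count}} into
    // [{"actionName": ..., "responses": [{"code": ..., "count": ...}], "totalActionCount": ...}].
    // Assumes names and codes need no JSON escaping.
    static String toSummaryJson(Map<String, Map<String, Integer>> actions) {
        StringJoiner all = new StringJoiner(",", "[", "]");
        for (Map.Entry<String, Map<String, Integer>> action : actions.entrySet()) {
            StringJoiner responses = new StringJoiner(",", "[", "]");
            int total = 0;
            for (Map.Entry<String, Integer> code : action.getValue().entrySet()) {
                responses.add("{\"code\":\"" + code.getKey() + "\",\"count\":" + code.getValue() + "}");
                total += code.getValue(); // running total for totalActionCount
            }
            all.add("{\"actionName\":\"" + action.getKey() + "\",\"responses\":" + responses
                    + ",\"totalActionCount\":" + total + "}");
        }
        return all.toString();
    }
}
```

Calling `SummaryWriter.toSummaryJson(actions)` in place of `genson.serialize(actions)` would print one object per action; with a `HashMap`, the order of actions and codes is unspecified, so use a `TreeMap` if sorted output matters.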
