Pull out Java error stacks from log files

Views: 126 | Published: 2018/5/28 19:42:48 | Tags: java python xml unix grep

Problem description

I have a Java application that, when erroring out, writes an error stack similar to the below for each error.

    <Errors>
        <Error ErrorCode="Code" ErrorDescription="Description" ErrorInfo="" ErrorId="ID">
            <Attribute Name="ErrorCode" Value="Code"/>
            <Attribute Name="ErrorDescription" Value="Description"/>
            <Attribute Name="Key" Value="Key"/>
            <Attribute Name="Number" Value="Number"/>
            <Attribute Name="ErrorId" Value="ID"/>
            <Attribute Name="UserId" Value="User"/>
            <Attribute Name="ProgId" Value="Prog"/>
            <Stack>typical Java stack</Stack>
        </Error>
        <Error>
            Similar info to the above
        </Error>
    </Errors>

I wrote a Java log parser to go through the log files and gather information about such errors and while it does work, it is slow and inefficient, especially for log files in the hundreds of megabytes. I just basically use string manipulation to detect where the start/end tags are and tally them up.

Is there a way (either via Unix grep, Python, or Java) to efficiently extract the errors and get a count of the number of times each one happens? The entire log file is not XML so I cannot use an XML parser or Xpath. Another problem I am facing is that sometimes the end of an error might roll into another file so the current file might not have the entire stack as above.

EDIT 1:

Here is what I currently have (relevant portions only to save space).

    //Parse files
    for (File f : allFiles) {
        System.out.println("Parsing: " + f.getAbsolutePath());

        BufferedReader br = new BufferedReader(new FileReader(f));
        String line = "";
        String fullErrorStack = "";
        while ((line = br.readLine()) != null) {
            if (line.contains("<Errors>")) {
                fullErrorStack = line;
                while (!line.contains("</Errors>")) {
                    line = br.readLine();
                    try {
                        fullErrorStack = fullErrorStack + line.trim() + " ";
                    } catch (NullPointerException e) {
                        //End of file but end of error stack is in another file.
                        fullErrorStack = fullErrorStack + "</Stack></Error></Errors> ";
                        break;
                    }
                }
                String errorCode = fullErrorStack.substring(
                        fullErrorStack.indexOf("ErrorCode=\"") + "ErrorCode=\"".length(),
                        fullErrorStack.indexOf("\" ", fullErrorStack.indexOf("ErrorCode=\"")));
                String errorDescription = fullErrorStack.substring(
                        fullErrorStack.indexOf("ErrorDescription=\"") + "ErrorDescription=\"".length(),
                        fullErrorStack.indexOf("\" ", fullErrorStack.indexOf("ErrorDescription=\"")));
                String errorStack = fullErrorStack.substring(
                        fullErrorStack.indexOf("<Stack>") + "<Stack>".length(),
                        fullErrorStack.indexOf("</Stack>", fullErrorStack.indexOf("<Stack>")));
                apiErrors.add(f.getAbsolutePath() + splitter + errorCode + ": " + errorDescription + splitter + errorStack.trim());
                fullErrorStack = "";
            }
        }
    }

    Set<String> uniqueApiErrors = new HashSet<String>(apiErrors);
    for (String uniqueApiError : uniqueApiErrors) {
        apiErrorsUnique.add(uniqueApiError + splitter + Collections.frequency(apiErrors, uniqueApiError));
    }
    Collections.sort(apiErrorsUnique);

EDIT 2:

Sorry for forgetting to mention the desired output. Something like the below would be ideal.

Count, ErrorCode, ErrorDescription, List of files it occurs in (if possible)

Solution

Given your updated question:

    $ cat tst.awk
    BEGIN{ OFS="," }
    match($0,/\s*<Error ErrorCode="([^"]+)" ErrorDescription="([^"]+)".*/,a) {
        code = a[1]
        desc[code] = a[2]
        count[code]++
        files[code][FILENAME]
    }
    END {
        print "Count", "ErrorCode", "ErrorDescription", "List of files it occurs in"
        for (code in desc) {
            fnames = ""
            for (fname in files[code]) {
                fnames = (fnames ? fnames " " : "") fname
            }
            print count[code], code, desc[code], fnames
        }
    }

    $ awk -f tst.awk file
    Count,ErrorCode,ErrorDescription,List of files it occurs in
    1,Code,Description,file

It still requires gawk 4.* for the 3rd arg to match() and 2D arrays but again that's easily worked around in any awk.

Per request in the comments here's a non-gawk version:

    $ cat tst.awk
    BEGIN{ OFS="," }
    /[[:space:]]*<Error / {
        split("",n2v)
        while ( match($0,/[^[:space:]]+="[^"]+/) ) {
            name = value = substr($0,RSTART,RLENGTH)
            sub(/=.*/,"",name)
            sub(/^[^=]+="/,"",value)
            $0 = substr($0,RSTART+RLENGTH)
            n2v[name] = value
        }
        code = n2v["ErrorCode"]
        desc[code] = n2v["ErrorDescription"]
        count[code]++
        if (!seen[code,FILENAME]++) {
            fnames[code] = (code in fnames ? fnames[code] " " : "") FILENAME
        }
    }
    END {
        print "Count", "ErrorCode", "ErrorDescription", "List of files it occurs in"
        for (code in desc) {
            print count[code], code, desc[code], fnames[code]
        }
    }

    $ awk -f tst.awk file
    Count,ErrorCode,ErrorDescription,List of files it occurs in
    1,Code,Description,file

There are various ways the above could be done, some briefer, but when input contains name=value pairs I like to create a name2value array (n2v[] is the name I usually give it) so I can access the values by their names. That makes the code easy to understand, and easy to modify in future to add fields, etc.
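
Since the question also allows Java, here is a rough sketch of the same name2value idea using `java.util.regex`. The class name `NameValuePairs` and the method `parse` are made up for illustration, and the pattern assumes attributes always look like `name="value"` on a single line. Unlike the awk loop above, this sketch keeps attributes with empty values such as `ErrorInfo=""`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class NameValuePairs {
    // name="value": the name has no whitespace, '=', quotes or angle brackets;
    // the value is anything up to the next double quote (possibly empty).
    private static final Pattern PAIR =
            Pattern.compile("([^\\s=\"<>]+)=\"([^\"]*)\"");

    // Returns the attributes of one line, keyed by name, in order of appearance.
    static Map<String, String> parse(String line) {
        Map<String, String> n2v = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(line);
        while (m.find()) {
            n2v.put(m.group(1), m.group(2));
        }
        return n2v;
    }

    public static void main(String[] args) {
        String line = "<Error ErrorCode=\"Code\" ErrorDescription=\"Description\""
                + " ErrorInfo=\"\" ErrorId=\"ID\">";
        Map<String, String> n2v = parse(line);
        System.out.println(n2v.get("ErrorCode") + "," + n2v.get("ErrorDescription"));
    }
}
```

As with n2v[] in awk, adding a field later only means reading one more key from the map.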

Here's my previous answer, as there are some things in it you'll find useful in other situations:

You don't say what you want the output to look like and your posted sample input isn't really adequate to test against and show useful output, but this GNU awk script shows the way to get a count of whatever attribute name/value pairs you like:

    $ cat tst.awk
    match($0,/\s*<Attribute Name="([^"]+)" Value="([^"]+)".*/,a) { count[a[1]][a[2]]++ }
    END {
        print "\nIf you just want to see the count of all error codes:"
        name = "ErrorCode"
        for (value in count[name]) {
            print name, value, count[name][value]
        }

        print "\nOr if there are a few specific attributes you care about:"
        split("ErrorId ErrorCode",names,/ /)
        for (i=1; i in names; i++) {
            name = names[i]
            for (value in count[name]) {
                print name, value, count[name][value]
            }
        }

        print "\nOr if you want to see the count of all values for all attributes:"
        for (name in count) {
            for (value in count[name]) {
                print name, value, count[name][value]
            }
        }
    }

    $ gawk -f tst.awk file

    If you just want to see the count of all error codes:
    ErrorCode Code 1

    Or if there are a few specific attributes you care about:
    ErrorId ID 1
    ErrorCode Code 1

    Or if you want to see the count of all values for all attributes:
    ErrorId ID 1
    ErrorDescription Description 1
    ErrorCode Code 1
    Number Number 1
    ProgId Prog 1
    UserId User 1
    Key Key 1

If you have data spread across multiple files, the above couldn't care less, just list them all on the command line:
gawk -f tst.awk file1 file2 file3 ...
It uses GNU awk 4.* for true multi-dimensional arrays, but there are trivial workarounds for any other awk if needed.
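
For readers who would rather stay in Java, the `count[name][value]` table can be approximated with a map of maps. This is only a sketch: the class `AttributeCounts` and its `count` method are invented names, and it assumes each `<Attribute .../>` element sits on one line in exactly the `Name="..." Value="..."` order shown in the question. It counts every attribute on a line, whereas the gawk script counts at most one per line.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class AttributeCounts {
    private static final Pattern ATTRIBUTE =
            Pattern.compile("<Attribute Name=\"([^\"]+)\" Value=\"([^\"]+)\"");

    // result.get(name).get(value) plays the role of count[name][value] in gawk.
    static Map<String, Map<String, Integer>> count(Iterable<String> lines) {
        Map<String, Map<String, Integer>> count = new TreeMap<>();
        for (String line : lines) {
            Matcher m = ATTRIBUTE.matcher(line);
            while (m.find()) {
                count.computeIfAbsent(m.group(1), k -> new TreeMap<>())
                     .merge(m.group(2), 1, Integer::sum);
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "<Attribute Name=\"ErrorCode\" Value=\"Code\"/>",
                "<Attribute Name=\"ErrorId\" Value=\"ID\"/>",
                "<Attribute Name=\"ErrorCode\" Value=\"Code\"/>");
        // Print "name value count" for every pair, like the last loop of the awk script.
        count(lines).forEach((name, values) ->
                values.forEach((value, n) ->
                        System.out.println(name + " " + value + " " + n)));
    }
}
```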

One way to run an awk command on files found recursively under a directory:
awk -f tst.awk $(find dir -type f -print)
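
If a pure Java tool is preferred, the first awk script and the recursive `find` can be approximated as below. Treat it as a sketch under assumptions rather than a drop-in replacement: `ErrorTally` and its methods are hypothetical names, the opening `<Error ...>` tag is assumed to be on one line with `ErrorCode` immediately followed by `ErrorDescription`, and files are read as ISO-8859-1 so that stray bytes never abort a run (non-ASCII descriptions may then print incorrectly). Because only the opening tag is inspected, an error whose stack continues in the next file is still counted once, in the file where it starts.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of the original answer.
final class ErrorTally {
    private static final Pattern ERROR = Pattern.compile(
            "<Error ErrorCode=\"([^\"]+)\" ErrorDescription=\"([^\"]+)\"");

    private final Map<String, Integer> count = new TreeMap<>();
    private final Map<String, String> desc = new TreeMap<>();
    private final Map<String, Set<String>> files = new TreeMap<>();

    // Records one log line; lines without an opening <Error ...> tag are ignored.
    void addLine(String fileName, String line) {
        Matcher m = ERROR.matcher(line);
        if (m.find()) {
            String code = m.group(1);
            desc.put(code, m.group(2));
            count.merge(code, 1, Integer::sum);
            files.computeIfAbsent(code, k -> new TreeSet<>()).add(fileName);
        }
    }

    // Streams one file line by line, so memory use stays flat for large logs.
    void addFile(Path file) throws IOException {
        try (BufferedReader reader =
                     Files.newBufferedReader(file, StandardCharsets.ISO_8859_1)) {
            String line;
            while ((line = reader.readLine()) != null) {
                addLine(file.toString(), line);
            }
        }
    }

    // Visits every regular file under dir, like: find dir -type f
    void addTree(Path dir) throws IOException {
        List<Path> regularFiles;
        try (Stream<Path> paths = Files.walk(dir)) {
            regularFiles = paths.filter(Files::isRegularFile)
                                .sorted()
                                .collect(Collectors.toList());
        }
        for (Path file : regularFiles) {
            addFile(file);
        }
    }

    // Count,ErrorCode,ErrorDescription,space-separated file list; one row per code.
    String report() {
        StringBuilder out = new StringBuilder(
                "Count,ErrorCode,ErrorDescription,List of files it occurs in\n");
        for (String code : desc.keySet()) {
            out.append(count.get(code)).append(',')
               .append(code).append(',')
               .append(desc.get(code)).append(',')
               .append(String.join(" ", files.get(code))).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        ErrorTally tally = new ErrorTally();
        tally.addTree(Paths.get(args.length > 0 ? args[0] : "."));
        System.out.print(tally.report());
    }
}
```

Running `java ErrorTally dir` after compiling would scan `dir` recursively. The maps are sorted, so the rows come out in a stable order, which awk's `for (code in desc)` does not guarantee.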
