Counting unique occurrences of a string in a document
Question
I am reading a log file into Java. For each line in the log file, I check whether the line contains an IP address. If it does, I want to add 1 to the count of how many times that IP address has appeared in the log file. How can I accomplish this in Java?
The code below successfully extracts the ip address from each line that contains an ip address, but the process for counting occurrences of ip addresses does not work.
void read(String fileName) throws IOException {
    BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(fileName)));
    int counter = 0;
    ArrayList<IPHolder> ips = new ArrayList<IPHolder>();
    try {
        String line;
        while ((line = br.readLine()) != null) {
            if(!getIP(line).equals("0.0.0.0")){
                if(ips.size()==0){
                    IPHolder newIP = new IPHolder();
                    newIP.setIp(getIP(line));
                    newIP.setCount(0);
                    ips.add(newIP);
                }
                for(int j=0;j<ips.size();j++){
                    if(ips.get(j).getIp().equals(getIP(line))){
                        ips.get(j).setCount(ips.get(j).getCount()+1);
                    }else{
                        IPHolder newIP = new IPHolder();
                        newIP.setIp(getIP(line));
                        newIP.setCount(0);
                        ips.add(newIP);
                    }
                }
                if(counter % 1000 == 0){System.out.println(counter+", "+ips.size());}
                counter+=1;
            }
        }
    } finally {br.close();}
    for(int k=0;k<ips.size();k++){
        System.out.println("ip, count: "+ips.get(k).getIp()+" , "+ips.get(k).getCount());
    }
}
public String getIP(String ipString){//extracts an ip from a string if the string contains an ip
    String IPADDRESS_PATTERN =
        "(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)";
    Pattern pattern = Pattern.compile(IPADDRESS_PATTERN);
    Matcher matcher = pattern.matcher(ipString);
    if (matcher.find()) {
        return matcher.group();
    }
    else{
        return "0.0.0.0";
    }
}
The holder class is:
public class IPHolder {
    private String ip;
    private int count;
    public String getIp(){return ip;}
    public void setIp(String i){ip=i;}
    public int getCount(){return count;}
    public void setCount(int ct){count=ct;}
}
Recommended answer
The keyword to search for in this case is HashMap. A HashMap stores key-value pairs (here, each IP address mapped to its count):
"192.168.1.12" - 12
"192.168.1.13" - 17
"192.168.1.14" - 9
and so on. It is much easier to use and access than iterating over your list of container objects every time to find out whether a container for that IP already exists.
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map.Entry;
import java.util.Set;
import java.util.TreeMap;

BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(/*Your file */)));
HashMap<String, Integer> occurrences = new HashMap<String, Integer>();
String line = null;
while ((line = br.readLine()) != null) {
    // Search each line for IP address patterns
    String[] addressesFoundInLine = ...;
    for (String ip : addressesFoundInLine) {
        // Has this address already appeared earlier in the file? If yes, increase its counter by 1
        if (occurrences.containsKey(ip))
            occurrences.put(ip, occurrences.get(ip) + 1);
        // If not, create a new entry for this address
        else
            occurrences.put(ip, 1);
    }
}
br.close();
// A TreeMap keeps its keys sorted if they implement 'Comparable', which is the case for Strings and Integers
TreeMap<Integer, ArrayList<String>> turnedAround = new TreeMap<Integer, ArrayList<String>>();
Set<Entry<String, Integer>> es = occurrences.entrySet();
// Swap the keys and values of the HashMap into a new TreeMap (if two IPs have the same count, add both to one list)
for (Entry<String, Integer> en : es) {
    if (turnedAround.containsKey(en.getValue()))
        turnedAround.get(en.getValue()).add(en.getKey());
    else {
        ArrayList<String> ips = new ArrayList<String>();
        ips.add(en.getKey());
        turnedAround.put(en.getValue(), ips);
    }
}
// Print the values (IPs with the same count are printed in no particular order; ordering them would require another sorting step)
for (Entry<Integer, ArrayList<String>> entry : turnedAround.entrySet()) {
    for (String s : entry.getValue())
        System.out.println(s + " - " + entry.getKey());
}
In my case, the output looks like this:
192.168.1.19 - 4
192.168.1.18 - 7
192.168.1.27 - 19
192.168.1.13 - 19
192.168.1.12 - 28
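For reference, here is a minimal self-contained sketch of the counting step. It reuses the regular expression from the question; the class name IpCounter and the helpers findIps and countIps are hypothetical names introduced only for this illustration. It also uses Map.merge (Java 8+) in place of the containsKey/put pair, which should behave the same way. As far as I can tell, the loop in the question goes wrong because its else branch adds a new holder every time it passes an entry with a different address, so a map keyed by address avoids the problem entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IpCounter {
    // Same IPv4 pattern as in the question, compiled once instead of once per line.
    private static final Pattern IP_PATTERN = Pattern.compile(
        "(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)");

    // Hypothetical helper: returns every IP address found in a single line.
    static List<String> findIps(String line) {
        List<String> found = new ArrayList<>();
        Matcher matcher = IP_PATTERN.matcher(line);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    // Hypothetical helper: maps each IP address to the number of times it was seen.
    static Map<String, Integer> countIps(Iterable<String> lines) {
        Map<String, Integer> occurrences = new HashMap<>();
        for (String line : lines) {
            for (String ip : findIps(line)) {
                // Inserts 1 for a new address, otherwise adds 1 to the stored count.
                occurrences.merge(ip, 1, Integer::sum);
            }
        }
        return occurrences;
    }

    public static void main(String[] args) {
        // Made-up sample lines; in practice these would come from the BufferedReader.
        List<String> lines = Arrays.asList(
            "192.168.1.12 - GET /index",
            "a line without any address",
            "192.168.1.13 - POST /login",
            "192.168.1.12 - GET /about");
        // HashMap iteration order is unspecified, so the print order may vary.
        for (Map.Entry<String, Integer> e : countIps(lines).entrySet()) {
            System.out.println(e.getKey() + " - " + e.getValue());
        }
    }
}
```

To use it on a real file, the sample list could be replaced with the lines read from the log, for example by collecting br.readLine() results in the while loop and calling merge directly inside it.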
I answered this question about half an hour ago and I guess that is exactly what you are searching for, so if you need some example code, take a look at it.
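The extra sorting step for addresses that share a count can be sketched as follows. This is only one possible approach, not the code from the answer: it sorts the map entries directly with a comparator instead of inverting the map into a TreeMap. The class name SortByCount and the method sortedByCount are hypothetical, and the sample data is taken from the output shown above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SortByCount {
    // Hypothetical helper: entries ordered by ascending count, ties broken by address.
    static List<Map.Entry<String, Integer>> sortedByCount(Map<String, Integer> occurrences) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(occurrences.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return entries;
    }

    public static void main(String[] args) {
        // Counts copied from the sample output in the answer.
        Map<String, Integer> occurrences = new HashMap<>();
        occurrences.put("192.168.1.19", 4);
        occurrences.put("192.168.1.18", 7);
        occurrences.put("192.168.1.27", 19);
        occurrences.put("192.168.1.13", 19);
        occurrences.put("192.168.1.12", 28);

        for (Map.Entry<String, Integer> e : sortedByCount(occurrences)) {
            System.out.println(e.getKey() + " - " + e.getValue());
        }
        // Prints:
        // 192.168.1.19 - 4
        // 192.168.1.18 - 7
        // 192.168.1.13 - 19
        // 192.168.1.27 - 19
        // 192.168.1.12 - 28
    }
}
```

Note that the tie-break compares the addresses as plain strings, so it is lexicographic rather than numeric (for example, "192.168.1.100" would sort before "192.168.1.13").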