Count occurrences of a string pattern in a file


Problem description

Team,

I am trying to count two patterns in a file and list them as:

pattern1: 2
pattern2: 3

#!/usr/bin/python
import os
import re

d = dict()
with open('/home/user/waste/nodes-prod.log', 'r') as file:
    for line in file:
        line = line.strip()
        for word in line.split():
            node1 = re.match(r"team1.*", word)
            type(node1)
            node2 = re.match(r"team2.*", word)
            type(node2)
            if node1 in d:
                d[node1] = d[node1] + 1
            else:
                d[node2] = d[node2] + 1
for key in list(d.keys()):
    print(key, ":", d[key]) 

My /home/user/waste/nodes-prod.log is shown below:

cat /home/user/waste/nodes-prod.log
team1-develop
team1-work
team2-research1
team2-research2
team2-research3

Output:

Traceback (most recent call last):
  File "read-and-count-words-pattern-fromfile-using-dict-in-python.py", line 17, in <module>
    d[node2] = d[node2] + 1
KeyError: <_sre.SRE_Match object; span=(0, 10), match='team2-research1'>

Expected:

node1: 2
node2: 3

Recommended answer
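
Before the full script, it may help to see why the original attempt fails. There, node1 and node2 are the return values of re.match, that is, a Match object or None, not strings. The dictionary starts empty, so the test node1 in d is false, and the else branch then reads d[node2] before anything was stored under that key, which raises KeyError. The snippet below is a minimal sketch of that behaviour; the names hit, miss, counts and key_error_raised are illustrative and do not come from the question.

```python
import re

# re.match returns a Match object on success and None on failure.
hit = re.match(r"team1.*", "team1-develop")
miss = re.match(r"team1.*", "team2-research1")

counts = {}

# Reading a key that was never stored raises KeyError,
# which is what d[node2] = d[node2] + 1 did in the question.
try:
    counts["team2"] = counts["team2"] + 1
except KeyError:
    key_error_raised = True
else:
    key_error_raised = False

# dict.get with a default avoids the error, and a plain string
# makes a stable key that repeats across words.
counts["team2"] = counts.get("team2", 0) + 1
```

The matched text itself is available through hit.group(), which is the value the script below uses as the dictionary key.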

#!/usr/bin/env python3
import re

# counts is the dictionary of counters,
# pattern is the regular expression,
# word is the word to match.
def increment(counts: dict, pattern: str, word: str):
    match = re.match(pattern, word)
    if match:
        # re.match returns a Match object, not a string.
        # .group(n) returns the n-th capture group; .group()
        # returns group 0, i.e. the whole match:
        node = match.group()
        # Initialise the counter, if necessary:
        if node not in counts:
            counts[node] = 0
        # Increment the counter:
        counts[node] += 1

# filename is a string that contains the path of the file to parse,
# patterns is a list of regular expressions to check against,
# the function returns a dictionary of counters.
def scores(filename: str, patterns: list) -> dict:
    # Initialise the dictionary that keeps counters:
    d = dict()
    with open(filename, 'r') as file:
        for line in file:
            line = line.strip()
            for word in line.split():
                # Check against all patterns:
                for pattern in patterns:
                    increment(d, pattern, word)
    return d

# Patterns to search for.
# The re module caches recently compiled patterns,
# so pre-compiling them is not strictly necessary here:
patterns = [r"team1.*", r"team2.*"]

# file to parse:
filename = '/home/user/waste/nodes-prod.log'

# This is how a dictionary is iterated, when both key and value are needed:
for key, value in scores(filename, patterns).items():
    print(key, ":", value)

  • increment is a function that receives a dictionary of counters, a pattern, and the word to check against the pattern; internally it builds a Match object named match. The parameters carry type annotations, which are optional in Python.
  • scores is a function that receives filename as a string and a list of patterns, and returns a dictionary of match counts.
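
Note that the script above uses the matched text as the key, so on the sample file it reports each distinct word (team1-develop, team1-work, and so on) with a count of 1. If the goal is one total per pattern, as in the expected output, the counters can be keyed by a label instead. The following is only a sketch under that assumption: the labels node1 and node2 are taken from the expected output, while PATTERNS, count_by_label and sample are hypothetical names introduced here, and collections.Counter is swapped in for the plain dictionary.

```python
import re
from collections import Counter

# Hypothetical mapping from output label to pre-compiled pattern.
PATTERNS = {
    "node1": re.compile(r"team1.*"),
    "node2": re.compile(r"team2.*"),
}

def count_by_label(lines, patterns=PATTERNS):
    """Count how many words match each labelled pattern."""
    totals = Counter()
    for line in lines:
        for word in line.split():
            for label, regex in patterns.items():
                # A truthy result means the word matches from its start.
                if regex.match(word):
                    totals[label] += 1
    return totals

# Sample data copied from the log file shown in the question.
sample = [
    "team1-develop",
    "team1-work",
    "team2-research1",
    "team2-research2",
    "team2-research3",
]

for label, total in count_by_label(sample).items():
    print(f"{label}: {total}")
```

On the sample data this prints node1: 2 followed by node2: 3. Because count_by_label accepts any iterable of lines, an open file object can be passed in place of the list to read the real log.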