How do I count all occurrences of a phrase in a text file using regular expressions?

Problem description

I am reading multiple files from a directory and trying to find how many times a specific phrase (in this instance "at least") occurs in each file. I don't just want to know whether it occurs, but how many times it occurs in each text file. My code is as follows:

import glob
import os

path = 'D:/Test'

k = 0

for filename in glob.glob(os.path.join(path, '*.txt')):
    if filename.endswith('.txt'):
        f = open(filename)
        data = f.read()
        data.split()
        data.lower()
        S = re.findall(r' at least ', data, re.MULTILINE)
        count = []
        if S == True:
         for S in data:
          count.append(data.count(S))
          k= k + 1
          print("'{}' match".format(filename), count)
        else:
         print("'{}' no match".format(filename))
print("Total number of matches", k)

At the moment I get no matches at all. I can tell whether or not the phrase occurs, but I am not sure why I can't get a count of all occurrences in each text file.

Any help would be greatly appreciated.

Regards

Recommended answer

You can get rid of the regex entirely: the count method of string objects is enough here. Much of the other code can be simplified as well.

You're also not actually changing data to lower case. Strings are immutable, so data.lower() returns a new lower-case string, and your code discards it (the same applies to data.split()). Note how I use data = data.lower() to actually change the variable.
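To make the point above concrete, here is a small sketch using a made-up sample string (not the asker's files). It also illustrates why the `if S == True:` test in the question never fires: `re.findall` returns a list, and a list never compares equal to `True`.

```python
import re

# Hypothetical sample text, used only for illustration.
data = "At least once, and at least twice."

data.lower()            # the returned string is discarded; data is unchanged
print(data)             # still starts with a capital "At"

lowered = data.lower()  # keep the returned string instead
matches = re.findall(r"at least", lowered)

# re.findall returns a list of matches, so comparing it to True is always False.
print(matches == True)

# The number of occurrences is simply the length of the list.
print(len(matches))
```

Testing `if matches:` (truthiness) or using `len(matches)` directly would behave as the asker intended.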

Try this code:

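import glob
import os

# Raw string so the backslashes in the Windows path are not read as escapes.
path = r'c:\script\lab\Tests'

# Running total of occurrences across all files.
k = 0

substring = ' at least '
# The glob pattern already restricts the results to .txt files.
for filename in glob.glob(os.path.join(path, '*.txt')):
    # The with-block closes the file once it has been read.
    with open(filename) as f:
        data = f.read()
    # str.lower() returns a new string, so assign the result back.
    data = data.lower()
    # Number of non-overlapping occurrences in this file.
    S = data.count(substring)
    if S:
        k = k + S
        print("'{}' match".format(filename), S)
    else:
        print("'{}' no match".format(filename))
print("Total number of matches", k)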

If anything is unclear, feel free to ask!
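If a regular expression is still preferred, as the question title suggests, the following is one possible sketch rather than the answer's method. The helper name `count_phrase` and the sample text are made up for illustration. Unlike counting the literal ' at least ' with surrounding spaces, this pattern also matches the phrase at the start of the text, next to punctuation, or split across a line break.

```python
import re

def count_phrase(text, phrase):
    """Count whole-word, case-insensitive occurrences of phrase in text."""
    # Escape each word, then allow any run of whitespace between words.
    words = [re.escape(word) for word in phrase.split()]
    pattern = re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)
    return len(pattern.findall(text))

# Hypothetical sample text, used only for illustration.
sample = "At least one.\nThere were at least two, at\nleast in theory."
print(count_phrase(sample, "at least"))
```

The word boundaries (`\b`) keep the pattern from matching inside longer words, and `re.IGNORECASE` removes the need to lower-case the text first. To apply it per file, the call would replace `data.count(substring)` in the loop above.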
