Count the occurrences of a string, case-insensitive search


Problem description

I am trying to read a file and count the occurrences of a string regardless of case (upper/lower), but my code is not giving the desired result.

Why is that? Also, how can I make my search case-insensitive?
The code is:

import os,re


fileName_path = input ("Please input the file name with location: ")
directory = os.path.dirname(fileName_path)
os.chdir(directory)

fileName = os.path.basename(fileName_path)
openFile = open(fileName ,"r")

cnt = 0

with openFile as readFile:
    for searchpattern in readFile:
        if 'tempCharSearch' in searchpattern:
            cnt += 1

openFile.close()
print (cnt)



The text file contains 14 occurrences of tempCharSearch, but the result shows only 3. Why is that?
The text file is attached here:

Lorem Ipsum is simply dummy text of tempCharSearch:='100-111-875' printing and typesetting industry. Lorem Ipsum has been tempCharSearch:='100-111-875' industry's standard dummy text ever since tempCharSearch:='100-111-875' 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.

It has survived not only five centuries, but also tempCharSearch:='100-111-875' leap into electronic typesetting, remaining essentially unchanged. It was popularised in tempCharSearch:='100-111-875' 1960s with tempCharSearch:='100-111-875' release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.

tempCharSearch:='100-111-875're are many variations of passages of Lorem Ipsum available, but tempCharSearch:='100-111-875' majority have suffered alteration in some form, by injected humour, or randomised words which don't look even slightly believable. If you are going to use a passage of Lorem Ipsum, you need to be sure tempCharSearch:='100-111-875're isn't anything embarrassing hidden in tempCharSearch:='100-111-875' middle of text. All tempCharSearch:='100-111-875' Lorem Ipsum generators on tempCharSearch:='100-111-875' Internet tend to repeat predefined chunks as necessary, making this tempCharSearch:='100-111-875' first true generator on tempCharSearch:='100-111-875' Internet. It uses a dictionary of over 200 Latin words, combined with a handful of model sentence structures, to generate Lorem Ipsum which looks reasonable. tempCharSearch:='100-111-875' generated Lorem Ipsum is tempCharSearch:='100-111-875'refore always free from repetition, injected humour, or non-characteristic words etc.

Solution

Your code does not count the number of occurrences of 'tempCharSearch' in the file; it counts the number of lines in which the pattern occurs. As your input file appears to have just three lines, each containing multiple occurrences, your result is 3.

Use Python's built-in string count method to count all occurrences in a line:

cnt += searchpattern.count('tempCharSearch')
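
To make the difference concrete, here is a minimal sketch comparing the two ways of counting. The sample lines and both helper function names are made up for illustration; they are not from the original question.

```python
# Hypothetical three-line sample standing in for the input file.
sample_lines = [
    "tempCharSearch one, tempCharSearch two\n",
    "no match on this line\n",
    "tempCharSearch three\n",
]


def count_matching_lines(lines, needle):
    """Count lines containing needle at least once (what the question's loop does)."""
    return sum(1 for line in lines if needle in line)


def count_occurrences(lines, needle):
    """Count every non-overlapping occurrence of needle across all lines."""
    return sum(line.count(needle) for line in lines)


print(count_matching_lines(sample_lines, "tempCharSearch"))  # 2
print(count_occurrences(sample_lines, "tempCharSearch"))     # 3
```

Note that str.count counts non-overlapping occurrences, which is what you want for a fixed word like this one.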



If you want the comparison to be case-insensitive, convert both the line string and your search pattern to lower case before running the count, for example:

for line in readFile:
    cnt += line.lower().count('tempcharsearch')
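
Putting the pieces together, the whole task could look roughly like the sketch below. The function names count_in_file and count_with_regex and the ignore_case parameter are my own, chosen for illustration. The regex variant is an alternative I am adding, not part of the answer above; it relies on the standard re module that the question already imports.

```python
import re


def count_in_file(path, needle, ignore_case=True):
    """Count non-overlapping occurrences of needle in the text file at path."""
    target = needle.lower() if ignore_case else needle
    total = 0
    # The with block closes the file, so no explicit close() is needed.
    with open(path, "r") as read_file:
        for line in read_file:
            haystack = line.lower() if ignore_case else line
            total += haystack.count(target)
    return total


def count_with_regex(text, needle):
    """Alternative: case-insensitive count of needle in text via re.IGNORECASE."""
    # re.escape makes needle match literally, even if it contains regex characters.
    pattern = re.compile(re.escape(needle), re.IGNORECASE)
    return len(pattern.findall(text))
```

You would call it as count_in_file(fileName_path, 'tempCharSearch'). Because open accepts a full path, the os.chdir step from the question should not be necessary. Both functions work on one line or one string at a time, so a match split across a line break is not counted by count_in_file.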

