Find the count of each line of one file in another file

Problem description

How can I get the number of times a particular line of one file is present in another file?

I have two files, rule.txt and full.txt. I want to count how many times each line of rule.txt occurs in full.txt. Please help me. The file rule.txt contains:

    NP--->N_NNP N_NN_S_NU N_NNP N_NNP N_NN_O_NU
    VGF--->V_VM_VF

The other file, full.txt, contains 1000 such rules. I want to calculate the count of each rule in rule.txt and get the output as each line with its count. That count is needed to calculate the probability of each rule. rule.txt contains the CFG rules of each sentence.

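    import codecs

    count = 0
    fc = codecs.open('full.txt', encoding='utf-8')
    with open('rule.txt', 'r') as fh:
        for line in fh.readlines():
            # fc.readlines() consumes full.txt on the first pass; later calls return [],
            # and 'in' only tests membership, so each rule adds at most 1
            if line in fc.readlines():
                print(line)
                count = count + 1
    # one overall total is printed, not a separate count for each rule
    print(count)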

I have this code, but it is not working. I need to calculate the probability of each rule in rule.txt by checking it against full.txt, and for that I need the count of each rule individually. Can you please help me count the number of times each rule is present in full.txt?
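One way to keep the structure of the attempt above while avoiding its problems (full.txt is exhausted after the first `readlines()` call, and only one overall total is produced) is to read the second file once and count each rule separately. This is a minimal sketch assuming Python 3; `count_rules` is a hypothetical helper name, and the demo uses inline lists in place of the real files.

```python
def count_rules(rule_lines, full_lines):
    """Return (rule, count) pairs for each non-blank line of rule_lines, in order."""
    # Read the second file only once. Calling readlines() again on a file
    # object that is already at end of file returns an empty list.
    full = [line.strip() for line in full_lines]
    results = []
    for line in rule_lines:
        rule = line.strip()
        if rule:  # skip blank lines
            # list.count returns how many elements are exactly equal to this rule
            results.append((rule, full.count(rule)))
    return results


# Inline sample data standing in for rule.txt and full.txt.
rules = ["NP--->N_NNP N_NN_S_NU N_NNP N_NNP N_NN_O_NU\n", "VGF--->V_VM_VF\n"]
full = [
    "VGF--->V_VM_VF\n",
    "NP--->N_NNP N_NN_S_NU N_NNP N_NNP N_NN_O_NU\n",
    "VGF--->V_VM_VF\n",
]
for rule, n in count_rules(rules, full):
    print('%s : %d' % (rule, n))
```

With the real files, the file objects returned by `open(path, encoding='utf-8')` can be passed in directly. `list.count` rescans the whole list for every rule, which is fine for about 1000 lines; for much larger inputs a dictionary of counts avoids the repeated scans.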


Recommended answer

I am assuming your files are not very large and that you have enough memory to hold them.
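If full.txt is too large for that assumption, a variant that keeps only the rules from the first file in memory and reads the second file one line at a time may help. This is a sketch assuming Python 3; `count_known_rules` is a hypothetical name, and `io.StringIO` stands in for the two open files in the demo.

```python
import io


def count_known_rules(rule_lines, full_lines):
    """Count how often each rule from rule_lines occurs in full_lines."""
    # Only the rules are held in memory; dict.fromkeys keeps them in
    # first-seen order, each starting at 0.
    counts = dict.fromkeys((r.strip() for r in rule_lines if r.strip()), 0)
    # full_lines is consumed one line at a time, so it can be an open file
    # of any size; lines that are not in the rule list are ignored.
    for line in full_lines:
        rule = line.strip()
        if rule in counts:
            counts[rule] += 1
    return counts


rules = io.StringIO("VGF--->V_VM_VF\nKGF--->V_VM_VF P_NSF SSF\n")
full = io.StringIO("VGF--->V_VM_VF\nVGF--->V_VM_VF KLF NFG_JP\nVGF--->V_VM_VF\n")
for rule, n in count_known_rules(rules, full).items():
    print('%s : %d' % (rule, n))
```

Memory use here grows with the number of rules being looked up, not with the number of distinct lines in full.txt, and the rules are reported in the order they appear in the first file.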

Here is file1:

NP--->N_NNP N_NN_S_NU N_NNP N_NNP N_NN_O_NU
VGF--->V_VM_VF
KGF--->V_VM_VF P_NSF SSF
VGF--->V_VM_VF KLF NFG_JP

Here is file2:

NP--->N_NNP N_NN_S_NU N_NNP N_NNP N_NN_O_NU
NP--->N_NNP N_NN_S_NU N_NNP N_NNP N_NN_O_NU
NP--->N_NNP N_NN_S_NU N_NNP N_NNP N_NN_O_NU
VGF--->V_VM_VF
VGF--->V_VM_VF
VGF--->V_VM_VF
KGF--->V_VM_VF P_NSF SSF
KGF--->V_VM_VF P_NSF SSF
VGF--->V_VM_VF
VGF--->V_VM_VF
KGF--->V_VM_VF P_NSF SSF
KGF--->V_VM_VF P_NSF SSF
VGF--->V_VM_VF KLF NFG_JP
NP--->N_NNP N_NN_S_NU N_NNP N_NNP N_NN_O_NU
NP--->N_NNP N_NN_S_NU N_NNP N_NNP N_NN_O_NU
NP--->N_NNP N_NN_S_NU N_NNP N_NNP N_NN_O_NU
VGF--->V_VM_VF
VGF--->V_VM_VF KLF NFG_JP
VGF--->V_VM_VF KLF NFG_JP
VGF--->V_VM_VF
VGF--->V_VM_VF KLF NFG_JP
VGF--->V_VM_VF KLF NFG_JP
VGF--->V_VM_VF KLF NFG_JP
NP--->N_NNP N_NN_S_NU N_NNP N_NNP N_NN_O_NU
NP--->N_NNP N_NN_S_NU N_NNP N_NNP N_NN_O_NU

Here is the code:

  #!/usr/bin/python

  # file1 (the rules): a set of stripped lines, so trailing newlines don't matter
  with open('txt1', 'r') as f1:
      lines1 = set(x.strip() for x in f1)

  # file2: count how often every (stripped) line occurs
  line_dict = dict()
  with open('txt2', 'r') as f2:
      for line in f2:
          line = line.strip()
          line_dict[line] = line_dict.get(line, 0) + 1

  # print each rule with its count, 0 if it never occurs in file2;
  # sets are unordered, so the rules may come out in any order
  for line in lines1:
      print('%s : %d' % (line, line_dict.get(line, 0)))

Output:

VGF--->V_VM_VF : 7
VGF--->V_VM_VF KLF NFG_JP : 6
KGF--->V_VM_VF P_NSF SSF : 4
NP--->N_NNP N_NN_S_NU N_NNP N_NNP N_NN_O_NU : 8
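The question's end goal is the probability of each rule. Assuming the usual maximum-likelihood estimate for a probabilistic CFG, where a rule's probability is its count divided by the total count of all rules with the same left-hand side, the counts can be turned into probabilities roughly as sketched below. The `--->` separator is taken from the sample data, and `rule_probabilities` is a hypothetical helper. The denominators should come from the counts of every rule in full.txt (for example the answer's `line_dict`), not only the rules listed in rule.txt.

```python
from collections import Counter


def rule_probabilities(counts, sep='--->'):
    """Estimate P(rule) = count(rule) / total count of rules with the same LHS.

    `counts` maps each rule string ('LHS--->RHS') to its frequency.
    """
    # Sum the counts of all rules that share a left-hand side.
    lhs_totals = Counter()
    for rule, n in counts.items():
        lhs_totals[rule.split(sep, 1)[0]] += n
    # Divide each rule's count by its LHS total (0.0 if that total is 0).
    probs = {}
    for rule, n in counts.items():
        total = lhs_totals[rule.split(sep, 1)[0]]
        probs[rule] = n / total if total else 0.0
    return probs


# The counts from the sample output; in this example file2 contains only these rules.
counts = {
    "VGF--->V_VM_VF": 7,
    "VGF--->V_VM_VF KLF NFG_JP": 6,
    "KGF--->V_VM_VF P_NSF SSF": 4,
    "NP--->N_NNP N_NN_S_NU N_NNP N_NNP N_NN_O_NU": 8,
}
for rule, p in rule_probabilities(counts).items():
    print('%s : %.3f' % (rule, p))
```

With these counts the two VGF rules get 7/13 ≈ 0.538 and 6/13 ≈ 0.462, while the KGF and NP rules each get 1.000 because they are the only rules with those left-hand sides.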
