Sort text file by first column and count repeats python


Problem description


I have a text file that needs to be sorted by the first column, with all repeated lines merged into one and their count placed to the left of the data. The sorted/counted data should then be written into an already created csv file.

Example text file:

, 00.000.00.000, word, 00
, 00.000.00.001, word, 00
, 00.000.00.002, word, 00
, 00.000.00.000, word, 00
, 00.000.00.002, word, 00
, 00.000.00.000, word, 00

Example result:

, 3, 00.000.00.000, word, 00
, 1, 00.000.00.001, word, 00
, 2, 00.000.00.002, word, 00

My code:

for ip in open("list.txt"):
    with open(ip.strip()+".txt", "a") as ip_file:
        for line in open("data.txt"):
            new_line = line.split(" ")
            if "blocked" in new_line:
                if "src="+ip.strip() in new_line:
                    ip_file.write(", " + new_line[11])
                    ip_file.write(", " + new_line[12])
                    ip_file.write(", " + new_line[13])

for ip_file in os.listdir(sub_dir):
        with open(os.path.join(sub_dir, ip_file), "a") as f:
            data = f.readlines()
            data.sort(key = lambda l: float(l.split()[0]), reverse = True)

Whenever I test the code, I get the error TypeError: 'str' object is not callable, or something similar. I can't use .split(), .read(), .strip(), etc. without getting the error.

Question: How can I sort the files' contents and count repeating lines (without defining a function)?

I'm basically trying to:

sort -k1 | uniq -c | sed 's/^/,/' >> test.csv

Solution

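D = {}
with open('data.txt') as f:
    for line in f:  # use a dictionary to count and filter duplicate lines
        k = line.rstrip()  # drop the newline so identical lines compare equal
        if k in D:
            D[k] += 1  # increase the count by one if already seen
        else:
            D[k] = 1  # initialize the count to one if seen for the first time

with open('test.csv', 'a') as out:  # open the output file once, in append mode
    for sk in sorted(D):  # sort keys
        # write a comma, the count, then the line (which starts with its own comma)
        print(', ', D[sk], sk, sep='', file=out)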

#Output
, 3, 00.000.00.000, word, 00
, 1, 00.000.00.001, word, 00
, 2, 00.000.00.002, word, 00    
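
The dictionary loop above could also be written with collections.Counter from the standard library. The following is only a minimal sketch, not the answerer's code: it uses the sample rows from the question in place of data.txt so that it runs on its own, the names lines, counts and result are made up for illustration, and it avoids def to match the "without defining a function" requirement.

```python
from collections import Counter

# Sample rows copied from the question; in practice these would come from
# iterating over open("data.txt") instead of a hard-coded list.
lines = [
    ", 00.000.00.000, word, 00\n",
    ", 00.000.00.001, word, 00\n",
    ", 00.000.00.002, word, 00\n",
    ", 00.000.00.000, word, 00\n",
    ", 00.000.00.002, word, 00\n",
    ", 00.000.00.000, word, 00\n",
]

# Strip the trailing newline so identical rows compare equal, and skip
# blank lines so they are not counted as a row of their own.
counts = Counter(line.rstrip() for line in lines if line.strip())

# Each row already starts with ", ", so prefixing ", <count>" places the
# count to the left of the data. sorted() compares the rows as strings,
# which for this layout orders them by the first column.
result = [f", {counts[row]}{row}" for row in sorted(counts)]

for row in result:
    print(row)
```

To append the rows to test.csv instead of printing them, one option is to open the file once with open("test.csv", "a") inside a with block and write "\n".join(result) followed by a newline. Note that the sort is a plain string comparison; if the first column held values of varying width, a numeric or per-field sort key would probably be needed.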
