How to get a list of tuples from several text files?
Problem description
I want to access .txt files in 46 subdirectories and extract the number of 0s and 1s in the text of each file. So far I've written this code:
from pathlib import Path

def count_0s(paths):
    for p in paths:
        list_zeros = []
        list_ones = []
        for line in p.read_text().splitlines():
            zeros = 0
            zeros += line.count('0')
            ones = 0
            ones += line.count('1')
            list_zeros.append(zeros)
            list_ones.append(ones)
    return list_zeros, list_ones

path = "/content/drive/MyDrive/data/classes/"
paths = Path(path).glob("*/marked*.txt")
n_zeros = count_0s(paths)
n_zeros
I want the function to return 2 lists (one with the number of 0s and the other with the number of 1s) to use in a Pandas DataFrame. Sorry if this question is a duplicate.
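The title asks for a list of tuples while the body asks for two lists; Python's built-in zip converts between the two shapes. A minimal sketch, with made-up sample counts standing in for real per-file results:

```python
# Hypothetical per-file counts, one entry per file (sample values only).
zeros = [3, 5, 2]
ones = [4, 1, 6]

# Two parallel lists -> one list of (zeros, ones) tuples.
pairs = list(zip(zeros, ones))
print(pairs)  # [(3, 4), (5, 1), (2, 6)]

# And back again: zip(*pairs) transposes the tuples into two sequences.
zeros_again, ones_again = (list(col) for col in zip(*pairs))
print(zeros_again, ones_again)  # [3, 5, 2] [4, 1, 6]
```

Either shape can be passed to pandas: pd.DataFrame({"zeros": zeros, "ones": ones}) for the two lists, or pd.DataFrame(pairs, columns=["zeros", "ones"]) for the list of tuples.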
Recommended answer
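There are a few mistakes in your function:
- You added some unnecessary square brackets (splitlines() already returns a list).
- You don't need to iterate over the characters, but over the lines.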
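A quick illustration of the first point, with an in-memory string standing in for p.read_text() (the sample text is made up):

```python
text = "0110\n1001\n"

# splitlines() already returns a list of lines, without the "\n" endings.
lines = text.splitlines()

# Extra brackets wrap that list in another list, so a for loop over
# `wrapped` runs once and receives the whole list instead of each line.
wrapped = [text.splitlines()]

print(lines)         # ['0110', '1001']
print(len(wrapped))  # 1
```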
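Here is a corrected function. It creates the two result lists once, before looping over the files, resets the counters for each file, and appends one total per file: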
def count_0s(paths):
    zeros_list = []  # one '0' total per file
    ones_list = []   # one '1' total per file
    for p in paths:
        zeros = 0
        ones = 0
        for line in p.read_text().splitlines():
            for c in line:
                if c == '0':
                    zeros += 1
                elif c == '1':  # count only '1', not every other character
                    ones += 1
        zeros_list.append(zeros)
        ones_list.append(ones)
    return zeros_list, ones_list
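One practical detail when calling the corrected function: Path(path).glob("*/marked*.txt") returns a one-shot iterator whose order is not guaranteed, so it can help to turn it into a sorted list first, especially if the counts will later be lined up with file names in a DataFrame. A minimal sketch, assuming a throwaway directory built with tempfile in place of the Google Drive folder (the subdirectory and file names are hypothetical):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout mimicking "<classes>/<subdirectory>/marked*.txt".
    for sub in ("class_b", "class_a"):
        Path(root, sub).mkdir()
        Path(root, sub, "marked_1.txt").write_text("0101\n")
    Path(root, "class_a", "notes.txt").write_text("ignored\n")  # no match

    gen = Path(root).glob("*/marked*.txt")
    first_pass = len(list(gen))
    second_pass = len(list(gen))  # the iterator is already exhausted

    # A sorted list can be reused and gives a stable order that
    # lines up with the per-file counts.
    paths = sorted(Path(root).glob("*/marked*.txt"))
    names = [p.parent.name for p in paths]

print(first_pass, second_pass, names)  # 2 0 ['class_a', 'class_b']
```

With paths = sorted(Path(path).glob("*/marked*.txt")), count_0s(paths) can be called more than once, and its two lists stay aligned with paths.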
Be aware that this is probably a very inefficient way of counting 0s and 1s. For example, just using line.count('0') instead of a for loop can increase the speed by a factor of 10.
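A sketch of the str.count approach applied to a whole file at a time: newline characters are neither '0' nor '1', so counting over the full text gives the same totals as counting line by line. The function names count_0s_1s and loop_count and the sample file are hypothetical; the block checks the result against the character loop rather than measuring speed:

```python
import tempfile
from pathlib import Path


def count_0s_1s(paths):
    """One '0' total and one '1' total per file, using str.count."""
    zeros_list, ones_list = [], []
    for p in paths:
        text = Path(p).read_text()
        # Newlines are neither '0' nor '1', so counting the whole text
        # gives the same totals as counting each line from splitlines().
        zeros_list.append(text.count("0"))
        ones_list.append(text.count("1"))
    return zeros_list, ones_list


def loop_count(text):
    """Character-by-character loop (checking for '1'), for comparison."""
    zeros = ones = 0
    for c in text:
        if c == "0":
            zeros += 1
        elif c == "1":
            ones += 1
    return zeros, ones


with tempfile.TemporaryDirectory() as root:
    f = Path(root, "marked_demo.txt")  # hypothetical file name
    f.write_text("0110\n1001\n000\n")
    zeros_list, ones_list = count_0s_1s([f])
    same_as_loop = (zeros_list[0], ones_list[0]) == loop_count(f.read_text())

print(zeros_list, ones_list, same_as_loop)  # [7] [4] True
```

To check the speed claim on your own files, timeit.timeit can time both approaches; the actual ratio will depend on file size and Python version.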