Read CSV files in a for loop using pandas


Question

inp_file=os.getcwd() 
files_comp = pd.read_csv(inp_file,"B00234*.csv", na_values = missing_values, nrows=10)

for f in files_comp:

    df_calculated = pd.read_csv(f, na_values = missing_values, nrows=10)
    col_length=len(df.columns)-1

Hi folks, how can I read 4 CSV files in a for loop? I am getting an error while reading the CSVs in the format above. Kindly help me.

Recommended answer

You basically need to do the following:

  1. Get a list of all target files: call os.listdir(path), then keep only the filenames that start with your pattern and end with .csv. For more sophisticated matching you could use a regular expression (via the re library) or glob.glob.

filenames = os.listdir(path)
filenames = [f for f in filenames if f.startswith("B00234") and f.lower().endswith(".csv")]

  2. Read the files in a for loop:

dfs = []
for filename in filenames:
    # os.listdir returns bare names, so join each one with the folder path
    df = pd.read_csv(os.path.join(path, filename))
    dfs.append(df)
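
The answer mentions glob.glob as an alternative to filtering os.listdir by hand. A minimal sketch of that variant follows. The temporary folder and the sample files are assumptions made only so the snippet runs on its own; in practice `path` would point at the folder holding your CSVs.

```python
import glob
import os
import tempfile

import pandas as pd

# Hypothetical setup: a throwaway folder with a few sample files,
# created only so that this sketch is runnable on its own.
path = tempfile.mkdtemp()
sample = pd.DataFrame({"a": [1, 4], "b": [2, 5]})
for name in ["B00234_0.csv", "B00234_1.csv", "B00678_0.csv"]:
    sample.to_csv(os.path.join(path, name), index=False)

# glob.glob expands the wildcard and returns paths that already
# include the folder, so they can be passed straight to read_csv.
pattern = os.path.join(path, "B00234*.csv")
matched = sorted(glob.glob(pattern))

dfs = [pd.read_csv(f) for f in matched]
print([os.path.basename(f) for f in matched])
```

Note that, unlike the f.lower().endswith(".csv") check, this pattern matches the extension case-sensitively on case-sensitive file systems, so a file named B00234_2.CSV would be skipped there.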

Complete Example

We will first make some dummy data and save it to some .csv and .txt files. Some of these files will begin with "B00234" and the others will not. We will then selectively read only the matching .csv files into a list of dataframes, dfs.

import os
import shutil

import numpy as np
import pandas as pd
from IPython.display import display

# Define Temporary Output Folder
path = './temp_output'

# Clean Temporary Output Folder
reset = True
if os.path.exists(path) and reset:
    shutil.rmtree(path, ignore_errors=True)

# Create Content
df0 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                   columns=['a', 'b', 'c'])

display(df0)

# Make Path
if not os.path.exists(path):
    os.makedirs(path)
else:
    print('Path Exists: {}'.format(path))

# Make Filenames
filenames = list()
for i in range(10):
    if i<5:
        # Create Files starting with "B00234"
        filenames.append("B00234_{}.csv".format(i))
        filenames.append("B00234_{}.txt".format(i))
    else:
        # Create Files starting with "B00678"
        filenames.append("B00678_{}.csv".format(i))
        filenames.append("B00678_{}.txt".format(i))

# Create files
# Make files with extensions: .csv and .txt
#            and file names starting 
#            with and without: "B00234"
for filename in filenames:
    fpath = path + '/' + filename
    if filename.lower().endswith(".csv"):
        df0.to_csv(fpath, index=False)
    else:
        with open(fpath, 'w') as f:
            f.write(df0.to_string())

# Get list of target files
files = os.listdir(path)
files = [f for f in files if (f.startswith("B00234") and f.lower().endswith(".csv"))]
print('\nList of target files: \n\t{}\n'.format(files))

# Read each csv file into a dataframe
dfs = list() # a list of dataframes
for csvfile in files:
    fpath = path + '/' + csvfile
    print("Reading file: {}".format(csvfile))
    df = pd.read_csv(fpath)
    dfs.append(df)

The list dfs should have five elements, each a dataframe read from one of the files.

Output:

    a   b   c
0   1   2   3
1   4   5   6
2   7   8   9

List of target files: 
    ['B00234_3.csv', 'B00234_4.csv', 'B00234_0.csv', 'B00234_2.csv', 'B00234_1.csv']

Reading file: B00234_3.csv
Reading file: B00234_4.csv
Reading file: B00234_0.csv
Reading file: B00234_2.csv
Reading file: B00234_1.csv
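
The question's snippet also passed na_values and nrows to read_csv and counted columns. One way to fold those options into the same loop is sketched below. This is only an illustration: read_matching_csvs is a hypothetical helper, the missing_values list is an assumed placeholder (the question does not show it), and the sample file is created just so the snippet runs on its own.

```python
import os
import tempfile

import pandas as pd


def read_matching_csvs(folder, prefix, **read_csv_kwargs):
    """Read every CSV in `folder` whose name starts with `prefix`.

    Hypothetical helper; extra keyword arguments go to pd.read_csv.
    Returns a dict mapping file name to DataFrame.
    """
    names = sorted(
        f for f in os.listdir(folder)
        if f.startswith(prefix) and f.lower().endswith(".csv")
    )
    return {
        name: pd.read_csv(os.path.join(folder, name), **read_csv_kwargs)
        for name in names
    }


# Assumed placeholder: the question does not show its missing_values list.
missing_values = ["n/a", "--"]

# Throwaway sample data so the sketch runs on its own.
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "B00234_0.csv"), "w") as f:
    f.write("a,b,c\n1,--,3\n4,5,6\n7,8,9\n")

frames = read_matching_csvs(folder, "B00234", na_values=missing_values, nrows=2)
for name, df in frames.items():
    col_length = len(df.columns) - 1  # same computation as in the question
    print(name, df.shape, col_length)
```

Returning a dict keyed by file name, rather than a plain list, keeps each dataframe tied to the file it came from, which helps when os.listdir returns the files in an arbitrary order as in the output above.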
