Automate comparing the values of two CSV files and, if a value matches, read the second CSV into a DataFrame

Question

I have imported an Excel file into a DataFrame. It looks like this:

Then I grouped this data by 'tx_id' and wrote each group to a separate CSV named after its tx_id, which gives me files such as 3e6737ae-c3af-4d19-a645-d17fc73dbb7c.csv. This is the code:

# Write one CSV per tx_id, named after the part of the tx_id before any '/'
for i, g in dframe.groupby('tx_id'):
    g.to_csv('{}.csv'.format(i.split('/')[0]), index=False)

Then I created a separate DataFrame (dframe1) containing only the tx_id column and dropped the duplicates with this code:

dframe1 = dframe1.drop_duplicates()

Now my DataFrame looks like this:

I have saved this DataFrame as a CSV. Now I want to compare the names of the per-transaction CSV files (each named after a tx_id value) with the values in this newly created CSV and, where a name matches, read that CSV file into a DataFrame. I used to import these CSV files manually, but my dataset is large, so reading each file by hand and then processing it is not feasible. Right now I import the CSV files into a DataFrame one at a time, using this code:

df = pd.read_csv('ae229a81-bb33-4cf1-ba2f-360fffb0d94b.csv')

This gives me a result like this:

Then I count the request_id values per rule_id with value_counts and unstack the result, using this code:

df1 = df.groupby('rule_id')['request_id'].value_counts().unstack().fillna(0)

The end result looks like this:
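As a rough illustration of what that one-liner produces, here is a minimal sketch on made-up rows. Only the column names rule_id and request_id come from the question; the values are invented for the example.

```python
import pandas as pd

# Made-up rows standing in for one per-tx_id CSV (values are hypothetical).
df = pd.DataFrame({
    "rule_id": ["r1", "r1", "r2", "r2"],
    "request_id": ["a", "a", "a", "b"],
})

# Count each request_id within each rule_id, pivot request_id into columns,
# and replace the missing (rule_id, request_id) combinations with 0.
df1 = df.groupby("rule_id")["request_id"].value_counts().unstack().fillna(0)
print(df1)
```

Each row is a rule_id, each column a request_id, and each cell the number of times that pair occurs.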

I want to automate this process, but I don't know how. Can you help me?

Recommended answer

You can iterate over your tx_id values and append each DataFrame to a list:

import pandas as pd

dfs = []
for tx in dframe1['tx_id']:
    dfs.append(pd.read_csv('%s.csv' % tx))

This only works if the script is run from the directory that contains the CSV files. Otherwise, build the full path:

import os
import pandas as pd

dfs = []

for tx in dframe1['tx_id']:
    dfs.append(pd.read_csv(os.path.join('/path/to/csv/', '%s.csv' % tx)))
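The loops above assume that every tx_id has a CSV on disk. If some files may be missing, one option is to check that the file exists before reading it, which is the "compare the names" step from the question. The sketch below is only an illustration: the helper name load_matching_csvs, the temporary directory, and the ids tx-1 and tx-2 are hypothetical and not part of the original answer.

```python
from pathlib import Path
import tempfile

import pandas as pd


def load_matching_csvs(tx_ids, csv_dir):
    """Read '<tx_id>.csv' from csv_dir for every tx_id that has such a file.

    Returns a dict of DataFrames keyed by tx_id and a list of tx_ids
    for which no file was found. (Hypothetical helper for illustration.)
    """
    csv_dir = Path(csv_dir)
    loaded, missing = {}, []
    for tx in tx_ids:
        path = csv_dir / f"{tx}.csv"
        if path.is_file():
            loaded[tx] = pd.read_csv(path)
        else:
            missing.append(tx)
    return loaded, missing


# Small demonstration in a temporary directory with made-up ids.
with tempfile.TemporaryDirectory() as tmp:
    sample = pd.DataFrame({"rule_id": ["r1"], "request_id": ["a"]})
    sample.to_csv(Path(tmp) / "tx-1.csv", index=False)

    dframe1 = pd.DataFrame({"tx_id": ["tx-1", "tx-2"]})
    loaded, missing = load_matching_csvs(dframe1["tx_id"], tmp)

print(sorted(loaded), missing)
```

Returning the missing ids separately makes it easy to spot a tx_id whose file was never written or was named differently.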

Edit

If you want to apply some processing to each DataFrame instead of appending it directly:

dfs = []  # start from an empty list so it holds only the processed results
for tx in dframe1['tx_id']:
    df = pd.read_csv(os.path.join('/path/to/csv/', '%s.csv' % tx))
    dfs.append(df.groupby('rule_id')['request_id'].value_counts().unstack().fillna(0))

Now dfs holds all the value_counts() results, in the same order as the tx_id values. You can refer to them by index.

If you want to look them up by file name (the tx_id), use a dict:

df_dict = dict()
for tx in dframe1['tx_id']:
    df = pd.read_csv(os.path.join('/path/to/csv/', '%s.csv' % tx))
    df_dict[tx] = df.groupby('rule_id')['request_id'].value_counts().unstack().fillna(0)
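If the per-tx_id CSV files are needed only as an intermediate step, a possible alternative is to build the same dict directly from the original DataFrame and skip the files entirely. This is a sketch under that assumption, not the answer's method; the data below is made up, and the keys are the full tx_id values rather than the i.split('/')[0] prefix used for the file names in the question.

```python
import pandas as pd

# Made-up stand-in for the original DataFrame imported from Excel.
dframe = pd.DataFrame({
    "tx_id": ["t1", "t1", "t1", "t2"],
    "rule_id": ["r1", "r1", "r2", "r1"],
    "request_id": ["a", "b", "a", "a"],
})

# One pivoted value_counts table per tx_id, keyed by the tx_id itself.
df_dict = {
    tx: g.groupby("rule_id")["request_id"].value_counts().unstack().fillna(0)
    for tx, g in dframe.groupby("tx_id")
}

print(df_dict["t1"])
```

Looking up df_dict["t1"] then returns the table for that transaction without any file being read.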
