Automate comparing the values of two CSV files and, if a value matches, read the second CSV into a DataFrame
Question
I have imported an Excel file into a DataFrame. It looks like this:
Then I grouped the data by 'tx_id' and wrote each group to its own CSV file named after the tx_id (for example, 3e6737ae-c3af-4d19-a645-d17fc73dbb7c.csv). This is the code:
# Write one CSV per tx_id; the file name is the part of the id before any '/'.
for i, g in dframe.groupby('tx_id'):
    g.to_csv('{}.csv'.format(i.split('/')[0]), index=False)
Then I created a separate DataFrame, dframe1, containing only the tx_id column and dropped the duplicates using this code:
dframe1 = dframe1.drop_duplicates()
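The line that first builds dframe1 is not shown. One plausible way to do it, assuming dframe has a tx_id column, is sketched below; the sample values are made up purely for illustration.

```python
import pandas as pd

# Toy stand-in for the imported Excel data (values are invented).
dframe = pd.DataFrame({
    "tx_id": ["a1", "a1", "b2", "b2", "b2"],
    "rule_id": [1, 2, 1, 1, 3],
    "request_id": ["r1", "r1", "r2", "r3", "r2"],
})

# Keep only the tx_id column, then drop repeated ids.
dframe1 = dframe[["tx_id"]].drop_duplicates().reset_index(drop=True)
print(dframe1["tx_id"].tolist())
```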
Now my DataFrame looks like this:
I have saved this DataFrame as a CSV. Now I want to compare the names of the per-transaction CSV files (each name is a tx_id value) with the values in this newly created CSV and, where a name matches, read that CSV file into a DataFrame. I used to import these CSV files manually, but my dataset is large, so reading each file by hand and then processing it is not feasible. Right now I import the CSV files one at a time into a DataFrame, using this code:
df = pd.read_csv('ae229a81-bb33-4cf1-ba2f-360fffb0d94b.csv')
This gives me a result like this:
Then I would apply value_counts and unstack the result using this code:
df1 = df.groupby('rule_id')['request_id'].value_counts().unstack().fillna(0)
The end result looked like this:
I want to automate this process, but I don't know how. Can you help me?
Recommended answer
You can iterate over your tx_id values and append each DataFrame to a list:
import pandas as pd
dfs = []
for tx in dframe1['tx_id']:
    dfs.append(pd.read_csv('%s.csv' % tx))
This only works if the script is run from the directory that contains the CSV files. Otherwise, build the full path:
import os
import pandas as pd
dfs = []
for tx in dframe1['tx_id']:
    dfs.append(pd.read_csv(os.path.join('/path/to/csv/', '%s.csv' % tx)))
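The loops above assume that every tx_id has a file on disk. If some files might be missing, the "read only when the name matches" check can be made explicit. The following is a minimal sketch using only the standard library; the helper name matching_csv_paths and its parameters are hypothetical and do not come from the original answer.

```python
from pathlib import Path


def matching_csv_paths(tx_ids, csv_dir):
    """Map each tx_id to its CSV path, keeping only ids whose file exists."""
    csv_dir = Path(csv_dir)
    found = {}
    for tx in tx_ids:
        path = csv_dir / f"{tx}.csv"
        # Skip ids that have no matching file instead of raising an error.
        if path.is_file():
            found[tx] = path
    return found
```

Each path in the returned dict can then be passed to pd.read_csv, for example by looping over matching_csv_paths(dframe1['tx_id'], '/path/to/csv/').items().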
Edit
If you want to apply some processing to each DataFrame instead of appending it directly:
dfs = []  # start from an empty list so it holds only the processed results
for tx in dframe1['tx_id']:
    df = pd.read_csv(os.path.join('/path/to/csv/', '%s.csv' % tx))
    dfs.append(df.groupby('rule_id')['request_id'].value_counts().unstack().fillna(0))
Now dfs holds all the value_counts() results, in the same order as the tx_id values. You can refer to them by index.
If you want to look them up by file name (tx_id) instead, use a dict:
df_dict = dict()
for tx in dframe1['tx_id']:
    df = pd.read_csv(os.path.join('/path/to/csv/', '%s.csv' % tx))
    df_dict[tx] = df.groupby('rule_id')['request_id'].value_counts().unstack().fillna(0)
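To see the pieces working together, here is a small end-to-end sketch under stated assumptions: the sample data is invented, and a temporary directory stands in for the real CSV folder. It writes one CSV per tx_id, reads each file back, and stores the unstacked counts in a dict keyed by tx_id.

```python
import os
import tempfile

import pandas as pd

# Made-up sample data standing in for the imported Excel sheet.
dframe = pd.DataFrame({
    "tx_id": ["a1", "a1", "b2", "b2", "b2"],
    "rule_id": [1, 2, 1, 1, 3],
    "request_id": ["r1", "r1", "r2", "r3", "r2"],
})

csv_dir = tempfile.mkdtemp()  # hypothetical folder for the per-tx_id files

# Step 1: write one CSV per tx_id.
for tx, group in dframe.groupby("tx_id"):
    group.to_csv(os.path.join(csv_dir, "%s.csv" % tx), index=False)

# Step 2: read each file back and store its count table under the tx_id.
df_dict = {}
for tx in dframe["tx_id"].drop_duplicates():
    df = pd.read_csv(os.path.join(csv_dir, "%s.csv" % tx))
    df_dict[tx] = (
        df.groupby("rule_id")["request_id"].value_counts().unstack().fillna(0)
    )

# Look up the table for one transaction by its id.
print(df_dict["b2"])
```

A dict keeps each result tied to its tx_id, so there is no need to remember the position of a file in a list.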