How to use pandas to read a line from a CSV, perform a VLOOKUP and save the results to another file?
Question
From this question, I found out how to use pandas to perform VLOOKUPs. So, as suggested by jezrael, I did this:
import pandas as pd

df1 = pd.read_csv('df1.csv', names=['a','b'])
print (df1)
# a b
# 0 Time 07:03:52
# 1 EmployeeID 98766
# 2 EmployeeName Joao
# 3 Time 08:03:52
# 4 EmployeeID 98765
# 5 EmployeeName Mary

# column names taken from the header of df2.csv
df2 = pd.read_csv('df2.csv')
c = df2.columns.str.strip().tolist()
print (c)
# ['EmployeeID', 'EmployeeName', 'Time']
# or define them in a list
# c = ['Time', 'EmployeeID', 'EmployeeName']

g = df1.groupby('a').cumcount()
df1 = df1.set_index([g,'a'])['b'].unstack().reindex(columns=c)
print (df1)
I got:
a EmployeeID EmployeeName Time
0 98766 Joao 07:03:52
1 98765 Mary 08:03:52
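For reference, here is a self-contained version of the groupby/cumcount/unstack approach above. The two files are replaced by inline strings whose contents are an assumption reconstructed from the printed output, so the sketch runs without anything on disk:

```python
import io

import pandas as pd

# Hypothetical inline copies of df1.csv (key,value pairs, no header) and
# df2.csv (header row only), reconstructed from the printed output above.
DF1_CSV = """Time,07:03:52
EmployeeID,98766
EmployeeName,Joao
Time,08:03:52
EmployeeID,98765
EmployeeName,Mary
"""
DF2_CSV = "EmployeeID, EmployeeName, Time\n"

df1 = pd.read_csv(io.StringIO(DF1_CSV), names=["a", "b"], dtype=str)
cols = pd.read_csv(io.StringIO(DF2_CSV)).columns.str.strip().tolist()

# cumcount numbers each repeated key (0 for the first "Time", 1 for the second,
# and so on), which gives every record its own row label before unstacking.
record = df1.groupby("a").cumcount()
wide = df1.set_index([record, "a"])["b"].unstack().reindex(columns=cols)
print(wide)
```

The cumcount trick only works when every record lists each key at most once; a key repeated inside one record would shift the numbering for the rest of the file.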
Now I would like to understand how to take a .csv file in which each line holds all the information of one df1 record, instead of 2 columns, transform it into a temporary 2-column file, and then read that file and perform the VLOOKUP into df3, as in the example below:
Now, imagine that df1.csv and df2.csv have other values:
df1.csv
Symbol, A;Goal,1.07;Range,0.72 - 1.07;Return over time,15.91%;;
Symbol, B;Goal,1.06;Range,0.5 - 1.32;Return over time,9.91%;Maturity,5;Total,13.555
df2.csv
Return_over_time,Maturity,Symbol,Goal,Range,Total
I would like to get the same kind of result as above, but reading the information from each line of df1, so that I end up with this result:
df3.csv
Symbol  Return_over_time  Goal  Range        Maturity  Total
A       15.91%            1.07  0.72 - 1.07
B       9.91%             1.06  0.5 - 1.32   5         13.555
Thanks!
Recommended answer
First create the DataFrame with sep=';', so that each key,value pair lands in its own cell; header=None means the CSV has no header row:
df1 = pd.read_csv('df1.csv', sep=';',header=None)
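To see what sep=';' and header=None produce before any reshaping, here is a minimal sketch that reads the sample df1.csv from an inline string (assumed to match the file in the question exactly). Each record becomes one row of six cells, and the trailing ;; on the first line become missing values:

```python
import io

import pandas as pd

# Inline copy of the sample df1.csv from the question (assumed contents);
# each record is one line of "key,value" pairs joined by ';'.
DF1_CSV = (
    "Symbol, A;Goal,1.07;Range,0.72 - 1.07;Return over time,15.91%;;\n"
    "Symbol, B;Goal,1.06;Range,0.5 - 1.32;Return over time,9.91%;Maturity,5;Total,13.555\n"
)

# sep=';' makes every "key,value" pair a separate cell; header=None stops
# pandas from using the first record as column names.
raw = pd.read_csv(io.StringIO(DF1_CSV), sep=";", header=None)
print(raw.shape)             # two records, six cells each
print(raw.iloc[0].tolist())  # the trailing ';;' become NaN in the first record
```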
Reshape to a Series with stack, then split each value with the regex '\s*,\s*', which matches a comma surrounded by zero or more whitespace characters on either side:
df1 = df1.stack().str.split(r'\s*,\s*', expand=True)
print (df1)
0 1
0 0 Symbol A
1 Goal 1.07
2 Range 0.72 - 1.07
3 Return over time 15.91%
1 0 Symbol B
1 Goal 1.06
2 Range 0.5 - 1.32
3 Return over time 9.91%
4 Maturity 5
5 Total 13.555
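The regex can be checked on single strings before it is applied to the whole Series. This sketch also passes n=1, an optional tweak that is not part of the original answer, so only the first comma is treated as the key/value separator; the third sample cell is a hypothetical value that itself contains a comma. The explicit regex=True keyword assumes pandas 1.4 or later:

```python
import re

import pandas as pd

# The pattern matches a comma plus any whitespace around it, so "Symbol, A"
# splits into "Symbol" and "A" without a leading space on the value.
PAIR_SEP = r"\s*,\s*"

print(re.split(PAIR_SEP, "Symbol, A"))

# n=1 splits only at the first comma, so a value that contains a comma
# ("Note,a, b" is a made-up example) stays in one piece.
cells = pd.Series(["Symbol, A", "Range,0.72 - 1.07", "Note,a, b"])
pairs = cells.str.split(PAIR_SEP, n=1, expand=True, regex=True)
print(pairs)
```

Without n=1, a value containing a comma would spill into a third column and be lost by the later [1] selection.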
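Drop the second index level with reset_index, append a new level built from the key column (column 0) with set_index, and finally reshape with unstack: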
df1 = df1.reset_index(level=1, drop=True).set_index(0, append=True)[1].unstack()
print (df1)
0 Goal Maturity Range Return over time Symbol Total
0 1.07 None 0.72 - 1.07 15.91% A None
1 1.06 5 0.5 - 1.32 9.91% B 13.555
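The answer stops at the wide frame. One possible way to finish the VLOOKUP and produce df3.csv is sketched below, assuming df2.csv holds only the header row shown in the question and that its underscore names correspond to the space-separated keys in df1.csv. It also adds dropna() after stack(), because newer pandas versions may keep the empty cells from the trailing ;; instead of dropping them:

```python
import io

import pandas as pd

# Inline stand-ins for the question's files (assumed contents).
DF1_CSV = (
    "Symbol, A;Goal,1.07;Range,0.72 - 1.07;Return over time,15.91%;;\n"
    "Symbol, B;Goal,1.06;Range,0.5 - 1.32;Return over time,9.91%;Maturity,5;Total,13.555\n"
)
DF2_CSV = "Return_over_time,Maturity,Symbol,Goal,Range,Total\n"

raw = pd.read_csv(io.StringIO(DF1_CSV), sep=";", header=None)

# One "key,value" cell per row; dropna() discards the empty trailing cells.
pairs = raw.stack().dropna().str.split(r"\s*,\s*", n=1, expand=True, regex=True)

# Same reshape as the answer: drop the cell position, index by key, unstack.
wide = pairs.reset_index(level=1, drop=True).set_index(0, append=True)[1].unstack()

# Assumption: df2.csv spells the keys with underscores instead of spaces.
wide.columns = wide.columns.str.replace(" ", "_")

# The "VLOOKUP": use df2's header as the target columns; absent keys stay empty.
target = pd.read_csv(io.StringIO(DF2_CSV)).columns.str.strip().tolist()
df3 = wide.reindex(columns=target)

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = df3.to_csv(index=False)
print(csv_text)
```

Passing a path such as 'df3.csv' to to_csv writes the file instead, so no temporary 2-column file is needed. The column order here follows df2.csv; pass a different list to reindex to get the order shown for df3.csv in the question.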