Operations on Columns multiple files Pandas

Views: 223 | Posted: 2017/2/24 21:54:21 | Tags: python file csv pandas time-series


Problem description

I am trying to perform some arithmetic operations in Python Pandas and merge the result into one of the files.
Path_1: File_1.csv, File_2.csv, ....
This path has several files, which are supposed to be increasing in time intervals, with the following columns:
    File_1.csv    |  File_2.csv
    Nos,12:00:00  |  Nos,12:30:00

    123,1451         485,5464
    656,4544         456,4865
    853,5484         658,4584

Path_2: Master_1.csv

Nos,00:00:00
123,2000
485,1500
656,1000
853,2500
456,4500
658,5000
I am trying to read n .csv files from Path_1 and compare the col[1] header timeseries with the col[last] timeseries of Master_1.csv.

If Master_1.csv does not have that time, it should create a new column with the timeseries from the path_1 .csv files and update the values with respect to col['Nos'] while subtracting them from col[1] of Master_1.csv.

If the col with the time from the path_1 file is already present, then look up col['Nos'] and replace the NAN with the subtracted values with respect to that col['Nos'].

i.e. 

Expected Output in Master_1.csv 
Nos,00:00:00,12:00:00,12:30:00,
    123,2000,549,NAN,
    485,1500,NAN,3964,
    656,1000,3544,NAN
    853,2500,2984,NAN
    456,4500,NAN,365
    658,5000,NAN,-416
I can understand the arithmetic calculations, but I am not able to loop with respect to Nos and the timeseries. I have tried to put some code together and am trying to work around the looping. I need help in that context. Thanks.
import pandas as pd 
import numpy as np

path_1 = '/'
path_2 = '/'

df_1 = pd.read_csv(os.path_1('/.*csv'), Index=None, columns=['Nos', 'timeseries'] #times series is different in every file eg: 12:00, 12:30, 17:30 etc
df_2 = pd.read_csv('master_1.csv', Index=None, columns=['Nos', '00:00:00']) #00:00:00 time series

for Nos in df_1 and df_2:
    df_1['Nos'] = df_2['Nos']
    new_tseries = df_2['00:00:00'] - df_1['timeseries']

merged.concat('master_1.csv', Index=None, columns=['Nos', '00:00:00', 'new_tseries'], axis=0) # new_timeseries is the dynamic time series that every .csv file will have from path_1

Solution
You can do it in three steps:

1. Read your CSVs into a list of dataframes.
2. Merge the dataframes together (equivalent to a SQL left join or an Excel VLOOKUP).
3. Calculate your derived columns using a vectorized subtraction.

Here's some code you could try:
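# read dataframes into a list (path_1 is the directory from the question)
import glob
import os
import pandas as pd

L = []
# sorted() keeps the files, and so the time columns, in name order
for fname in sorted(glob.glob(os.path.join(path_1, '*.csv'))):
    L.append(pd.read_csv(fname))

# read master dataframe, and merge in other dataframes
df_2 = pd.read_csv('master_1.csv')
for df in L:
    df_2 = pd.merge(df_2, df, on='Nos', how='left')

# for each merged time column, calculate the difference with the master column
# (file value minus master value; 'Nos' and the master column are left untouched)
time_cols = df_2.columns.drop(['Nos', '00:00:00'])
df_2[time_cols] = df_2[time_cols].sub(df_2['00:00:00'], axis=0)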
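To make the approach easier to check, here is a minimal, self-contained sketch that runs on the sample data from the question, held in memory instead of on disk. The helper name `add_differences` and the `base_col` parameter are hypothetical, introduced only for illustration. It assumes each file has `Nos` as its first column, a single time column second, and no repeated `Nos` values. It also tries to cover the second case in the question: if the time column already exists in the master, only its missing entries are filled.

```python
import io

import pandas as pd

# Sample data copied from the question; in practice these would be files on disk.
master_csv = "Nos,00:00:00\n123,2000\n485,1500\n656,1000\n853,2500\n456,4500\n658,5000\n"
file_1_csv = "Nos,12:00:00\n123,1451\n656,4544\n853,5484\n"
file_2_csv = "Nos,12:30:00\n485,5464\n456,4865\n658,4584\n"


def add_differences(master, frame, base_col="00:00:00"):
    """Hypothetical helper: add (or fill) frame's time column as value - base_col."""
    time_col = frame.columns[1]  # header of the second column, e.g. '12:00:00'
    # Align the file's values to the master rows by Nos; rows not in the file become NaN.
    values = master["Nos"].map(frame.set_index("Nos")[time_col])
    diff = values - master[base_col]
    if time_col in master.columns:
        # Column already present: keep existing values, fill only the missing ones.
        master[time_col] = master[time_col].combine_first(diff)
    else:
        master[time_col] = diff
    return master


master = pd.read_csv(io.StringIO(master_csv))
for text in (file_1_csv, file_2_csv):
    master = add_differences(master, pd.read_csv(io.StringIO(text)))

print(master)
```

This computes file value minus master value, the same direction as the code above. That matches the expected output in the question for every row except `Nos` 123, where the question lists 549 and this sketch gives -549; flip the subtraction if the other direction is what you need. To persist the result, something like `master.to_csv('master_1.csv', index=False)` should work.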


                        