How to select a particular dataframe from a list of dataframes in Python equivalent to R?

Problem description

I have a list of dataframes in R, from which I'm trying to select a particular dataframe as follows:
x = listOfdf$df1$df2$df3
Now I'm trying hard to find an equivalent way to do so in Python: what is the syntax for selecting a particular DataFrame from a list of DataFrames in pandas?

Recommended answer

I see you've already answered your own question, and that's cool. However, as jezrael hints in his comment, you should really consider using a dictionary. That might sound a bit scary coming from R (been there myself, now I prefer Python in most ways), but it will be worth your effort.

First of all, a dictionary maps a key (like a name) to a value, such as a dataframe. You use curly brackets { } to build the dictionary, and square brackets [ ] to index it.
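As a minimal sketch of that syntax, the snippet below builds a small dictionary of dataframes and selects from it. The frames and the keys `sales`, `costs` and `year_2017` are made up for illustration and are not part of the question. Read literally, the R expression chains several `$` lookups, and chained square brackets do the same on nested dictionaries.

```python
import pandas as pd

# Hypothetical frames, only for illustration
sales = pd.DataFrame({'units': [3, 5]})
costs = pd.DataFrame({'amount': [1.5, 2.0]})

# Curly brackets build the dictionary: key -> dataframe
frames = {'sales': sales, 'costs': costs}

# Square brackets select by key, much like listOfdf$sales in R
selected = frames['sales']
print(selected)

# Nested dictionaries allow chained lookups, like listOfdf$a$b in R
nested = {'year_2017': {'sales': sales, 'costs': costs}}
print(nested['year_2017']['costs'])
```

Selecting by key returns the stored dataframe itself, not a copy.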

Let's say that you have two dataframes like this:

import numpy as np
import pandas as pd

np.random.seed(123)
# Reproducible input - Dataframe 1
rows = 10
df_1 = pd.DataFrame(np.random.randint(90,110,size=(rows, 2)), columns=list('AB'))
# Ten consecutive daily timestamps starting 2017-01-01
datelist = pd.date_range('2017-01-01', periods=rows).tolist()
df_1['dates'] = datelist
df_1 = df_1.set_index(['dates'])
df_1.index = pd.to_datetime(df_1.index)

##%%

# Reproducible input - Dataframe 2
rows = 10
df_2 = pd.DataFrame(np.random.randint(10,20,size=(rows, 2)), columns=list('CD'))
# Same ten daily timestamps as for df_1
datelist = pd.date_range('2017-01-01', periods=rows).tolist()
df_2['dates'] = datelist
df_2 = df_2.set_index(['dates'])
df_2.index = pd.to_datetime(df_2.index)

With a limited number of dataframes you can easily organize them in a dictionary this way:

myFrames = {'df_1': df_1,
            'df_2': df_2} 

Now you have a reference to your dataframes, as well as your own defined names or keys. You'll find a more elaborate explanation here.

Here's how you use it:

print(myFrames['df_1'])
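If the dataframes really do live in a plain Python list rather than a dictionary, selection is by position instead of by name. The sketch below assumes two made-up frames (`first`, `second`) and also shows how a dictionary's values can be turned into a list when positional access is wanted.

```python
import pandas as pd

# Hypothetical frames, only for illustration
first = pd.DataFrame({'A': [1, 2]})
second = pd.DataFrame({'B': [3, 4]})

# A plain list is indexed by zero-based position
frame_list = [first, second]
picked = frame_list[1]    # the second dataframe
last = frame_list[-1]     # negative indices count from the end

# A dictionary keeps insertion order (Python 3.7+), so its values
# can be listed and then selected by position as well
frames_by_name = {'first': first, 'second': second}
as_list = list(frames_by_name.values())
print(as_list[0])
```

Names tend to be easier to keep track of than positions, which is the main argument for the dictionary.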

You can also use that reference to make changes to one of your dataframes, and add that to your dictionary:

df_3 = myFrames['df_1']
df_3 = df_3*10
myFrames.update({'df_3': df_3})
print(myFrames)
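One detail worth knowing here: indexing the dictionary hands back the stored dataframe itself, while an expression such as multiplying by 10 produces a new dataframe. The following sketch, using a made-up one-column frame and a hypothetical `frames` dictionary, illustrates the difference.

```python
import pandas as pd

# Hypothetical dictionary with a single small frame
frames = {'df_1': pd.DataFrame({'A': [1, 2, 3]})}

# Indexing returns the very same object that the dictionary holds
ref = frames['df_1']
print(ref is frames['df_1'])

# Arithmetic creates a new dataframe; the stored one is untouched
scaled = ref * 10
frames['df_3'] = scaled   # same effect as frames.update({'df_3': scaled})

# Adding a column through the reference changes the stored frame too
ref['B'] = [7, 8, 9]
print(frames['df_1'])
print(frames['df_3'])
```

Assigning with `frames['df_3'] = scaled` and calling `update` with a one-item dictionary are interchangeable for a single key.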

Now let's say that you have a whole bunch of dataframes that you'd like to organize the same way. You can make a list of the names of all available dataframes as described below. However, you should be aware that using eval() is often discouraged, not least because it executes whatever string it is handed.

Anyway, here we go: First you get a list of strings of all dataframe names like this:

alldfs = [var for var in dir() if isinstance(eval(var), pd.core.frame.DataFrame)]

It's more than likely that you won't be interested in ALL of them if you've got a lot going on at the same time. So let's say that the names of all your dataframes of particular interest start with 'df_'. You can isolate them like this:

dfNames = []
for elem in alldfs:
    if elem.startswith('df_'):
        dfNames.append(elem)

Now you can use that list in combination with eval() to make a dictionary:

myFrames2 = {}
for dfName in dfNames:
    myFrames2[dfName] = eval(dfName)
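Since eval() is often discouraged, here is one possible way to build the same kind of dictionary without it. This is a sketch, not the approach used in this answer: `frames_from_namespace` is a hypothetical helper, and it reads the objects directly from the mapping returned by the built-in `globals()` instead of evaluating name strings.

```python
import pandas as pd


def frames_from_namespace(namespace, prefix='df_'):
    """Return {name: dataframe} for matching names in a namespace mapping."""
    # Snapshot the items first so the mapping can't change size mid-loop
    return {
        name: obj
        for name, obj in list(namespace.items())
        if name.startswith(prefix) and isinstance(obj, pd.DataFrame)
    }


# Hypothetical example data
df_a = pd.DataFrame({'x': [1, 2]})
df_b = pd.DataFrame({'y': [3, 4]})
df_label = 'not a dataframe'   # matches the prefix but is skipped

found = frames_from_namespace(globals())
print(sorted(found))
```

Passing the namespace in as an argument keeps the helper usable with any dictionary of objects, not only module-level variables.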

Now you can loop through that dictionary and do something with each of them. You could, as an example, take the last column of each dataframe and make a new dataframe with those values:

j = 1
for key in myFrames2.keys():

    # Build the new column name for your brand new df
    colName = 'column_' + str(j)

    if j == 1:
        # First, make a new df by referencing the dictionary
        df_new = myFrames2[key]

        # Subset the last column and make sure it doesn't
        # turn into a pandas series instead of a dataframe in the process
        df_new = df_new.iloc[:,-1].to_frame()

        # Set new column names
        df_new.columns = [colName]
    else:
        # df_new already exists, so you can add the last column
        # of each remaining dataframe under its new name
        df_new[colName] = myFrames2[key].iloc[:,-1]
    j = j + 1

print(df_new)
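The same result can probably be reached more compactly with `pd.concat`. The sketch below is an alternative to the loop above, not a replacement prescribed by this answer. It uses a small made-up dictionary `frames` whose dataframes share a date index, renames the last column of each one, and joins them side by side.

```python
import pandas as pd

# Hypothetical input: two frames sharing the same date index
idx = pd.date_range('2017-01-01', periods=3, name='dates')
frames = {
    'df_1': pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=idx),
    'df_2': pd.DataFrame({'C': [7, 8, 9], 'D': [10, 11, 12]}, index=idx),
}

# Take the last column of each frame as a series named column_1, column_2, ...
last_cols = [
    df.iloc[:, -1].rename('column_' + str(j))
    for j, df in enumerate(frames.values(), start=1)
]

# axis=1 places the series next to each other, aligned on the index
df_new = pd.concat(last_cols, axis=1)
print(df_new)
```

Because `pd.concat` aligns on the index, frames with different dates would produce missing values where a date is absent from one of them.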

I hope you'll find it useful!

And by the way... For your next question, please provide some reproducible code as well as a few words about what solutions you have tried yourself. You can read more about how to ask an excellent question here.

And here is the whole thing for an easy copy&paste:

#%%

# Imports
import pandas as pd
import numpy as np

np.random.seed(123)

# Reproducible input - Dataframe 1
rows = 10
df_1 = pd.DataFrame(np.random.randint(90,110,size=(rows, 2)), columns=list('AB'))
datelist = pd.date_range('2017-01-01', periods=rows).tolist()
df_1['dates'] = datelist 
df_1 = df_1.set_index(['dates'])
df_1.index = pd.to_datetime(df_1.index)

##%%

# Reproducible input - Dataframe 2
rows = 10
df_2 = pd.DataFrame(np.random.randint(10,20,size=(rows, 2)), columns=list('CD'))
datelist = pd.date_range('2017-01-01', periods=rows).tolist()
df_2['dates'] = datelist 
df_2 = df_2.set_index(['dates'])
df_2.index = pd.to_datetime(df_2.index)

print(df_1)
print(df_2)
##%%


# If you don't have that many dataframes, you can organize them in a dictionary like this:
myFrames = {'df_1': df_1,
            'df_2': df_2}  


# Now you can reference df_1 in that collection by using:
print(myFrames['df_1'])

# You can also use that reference to make changes to one of your dataframes,
# and add that to your dictionary
df_3 = myFrames['df_1']
df_3 = df_3*10
myFrames.update({'df_3': df_3})

# And now you have a happy little family of dataframes:
print(myFrames)
##%%

# Now let's say that you have a whole bunch of dataframes that you'd like to organize the same way.
# You can make a list of the names of all available dataframes like this:
alldfs = [var for var in dir() if isinstance(eval(var), pd.core.frame.DataFrame)]

##%%
# It's likely that you won't be interested in all of them if you've got a lot going on.
# Let's say that all your dataframes of interest start with 'df_'
# You get them like this:
dfNames = []
for elem in alldfs:
    if elem.startswith('df_'):
        dfNames.append(elem)

##%%
# Now you can use that list in combination with eval() to make a dictionary:
myFrames2 = {}
for dfName in dfNames:
    myFrames2[dfName] = eval(dfName)

##%%
# And now you can reference each dataframe by name in that new dictionary:
myFrames2['df_1']

##%%
#Loop through that dictionary and do something with each of them.

j = 1
for key in myFrames2.keys():

    # Build the new column name for your brand new df
    colName = 'column_' + str(j)

    if j == 1:
        # First, make a new df by referencing the dictionary
        df_new = myFrames2[key]

        # Subset the last column and make sure it doesn't
        # turn into a pandas series instead of a dataframe in the process
        df_new = df_new.iloc[:,-1].to_frame()

        # Set new column names
        df_new.columns = [colName]
    else:
        # df_new already exists, so you can add the last column
        # of each remaining dataframe under its new name
        df_new[colName] = myFrames2[key].iloc[:,-1]
    j = j + 1

print(df_new)
