Converting lists in a pandas DataFrame into columns


Question

city        state   neighborhoods       categories
Dravosburg  PA      [asas,dfd]          ['Nightlife']
Dravosburg  PA      [adad]              ['Auto_Repair','Automotive']

Given the dataframe above, I want to convert each element of the lists into its own column, for example:

city        state  asas  dfd  adad  Nightlife  Auto_Repair  Automotive
Dravosburg  PA     1     1    0     1          1            0

I am using the following code to do this:

def list2columns(df):
    """
    to convert list in the columns
    of a dataframe
    """
    columns = ['categories', 'neighborhoods']
    for col in columns:
        for i in range(len(df)):
            for element in eval(df.loc[i, "categories"]):
                if len(element) != 0:
                    if element not in df.columns:
                        df.loc[:, element] = 0
                    else:
                        df.loc[i, element] = 1

  1. How can I do this more efficiently?
  2. Why do I still get the warning below even though I am already using df.loc?

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

Recommended answer

Since you're using eval(), I assume each column holds a string representation of a list rather than the list itself. Unlike your example above, I'm also assuming that the items in the lists in your neighborhoods column are quoted (df.loc[0, 'neighborhoods'] == "['asas','dfd']"), because otherwise your eval() would fail.
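
To illustrate the assumption above, the short sketch below parses both forms of the string. It uses ast.literal_eval from the standard library rather than eval(); that is a substitution made here for illustration, not what the original code uses, chosen because it accepts only Python literals. The sample strings are taken from the question.

```python
import ast

quoted = "['asas','dfd']"   # items quoted: a valid list literal
unquoted = "[asas,dfd]"     # items unquoted, as displayed in the question

# the quoted form parses into a real Python list
parsed = ast.literal_eval(quoted)

try:
    ast.literal_eval(unquoted)
    unquoted_failed = False
except (ValueError, SyntaxError):
    # bare names such as asas are not literals, so parsing is rejected
    unquoted_failed = True
```

With plain eval() the unquoted form would fail as well, because asas and dfd would be looked up as variable names.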

If this is all correct, you could try something like this:

def list2columns(df):
    """
    Convert the lists in the columns of a dataframe into 0/1 columns.
    Modifies df in place.
    """
    columns = ['categories', 'neighborhoods']
    new_cols = set()      # set of all new columns added
    for col in columns:
        # df.iloc is purely positional, so look up the column position first
        col_pos = df.columns.get_loc(col)
        for i in range(len(df)):
            # get the list of columns to set
            set_cols = eval(df.iloc[i, col_pos])
            # set the values of these columns to 1 in the current row
            # (if this causes new columns to be added, other rows will get NaN);
            # creating a column requires label-based df.loc, not df.iloc
            for name in set_cols:
                df.loc[df.index[i], name] = 1
            # remember which new columns have been added
            new_cols.update(set_cols)
    # convert any un-set values in the new columns to 0; assign the result
    # back, because fillna(inplace=True) on df[...] only changes a copy
    new_cols = list(new_cols)
    df[new_cols] = df[new_cols].fillna(0).astype(int)
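
On the efficiency question, a variant without explicit row loops may be worth trying. The following is only a sketch under the same assumption as above (each cell is a string representation of a list with quoted items). The function name lists_to_indicator_columns and the sample frame are made up for illustration, and ast.literal_eval stands in for eval().

```python
import ast
import pandas as pd

def lists_to_indicator_columns(df, list_cols):
    """Return a new frame with one 0/1 column per distinct list element."""
    parts = [df.drop(columns=list_cols)]
    for col in list_cols:
        parsed = df[col].apply(ast.literal_eval)   # strings -> real lists
        exploded = parsed.explode()                # one row per list element
        # one-hot encode, then collapse back to one row per original index
        dummies = pd.get_dummies(exploded).groupby(level=0).max().astype(int)
        parts.append(dummies)
    return pd.concat(parts, axis=1)

# hypothetical sample data mirroring the question
df = pd.DataFrame({
    "city": ["Dravosburg", "Dravosburg"],
    "state": ["PA", "PA"],
    "neighborhoods": ["['asas','dfd']", "['adad']"],
    "categories": ["['Nightlife']", "['Auto_Repair','Automotive']"],
})

result = lists_to_indicator_columns(df, ["neighborhoods", "categories"])
```

Unlike the function above, this returns a new frame instead of modifying df in place. If the same name can occur in both list columns, the result would contain duplicate column names, so some prefixing would be needed in that case.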

I can only speculate on an answer to your second question, about the SettingWithCopy warning.

It's possible (but unlikely) that using df.iloc instead of df.loc will help, since df.iloc is intended for selecting by row number. In your case, df.loc[i, col] only works because you haven't set an index, so pandas uses the default index, which matches the row number. Note that df.iloc accepts only integer positions, so a column name has to be converted first, for example with df.columns.get_loc(col).

Another possibility is that the df passed in to your function is already a slice of a larger dataframe, and that is what causes the SettingWithCopy warning.
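
If the frame passed in is indeed a slice of a larger one, taking an explicit copy before modifying it is a common way to make the intent unambiguous. The sketch below uses made-up data and variable names (full, subset); whether the warning appears at all without the copy depends on the pandas version and settings.

```python
import pandas as pd

# hypothetical larger frame
full = pd.DataFrame({
    "city": ["Dravosburg", "Dravosburg", "Elsewhere"],
    "state": ["PA", "PA", "OH"],
})

# boolean selection returns a new object derived from `full`;
# .copy() makes it an independent frame before any assignment
subset = full[full["state"] == "PA"].copy()

# add a new column and set one row, without touching `full`
subset.loc[:, "Nightlife"] = 0
subset.loc[subset.index[0], "Nightlife"] = 1
```

After this, subset can be passed to a function such as list2columns, and the original full frame stays unchanged.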

I've also found that using df.loc with mixed indexing modes (a boolean selector for rows and column names for columns) produces the SettingWithCopy warning; it's possible that your slice selectors are causing similar problems.

Hopefully the simpler and more direct indexing in the code above will solve these problems. If you are still seeing the warning, please report back and provide code to generate df.
