Need to create a Pandas dataframe by reading a csv file with random columns

Posted: 2020/10/12 22:04:17 · Tags: python csv pandas

Question

I have the following csv file with records:

A 1, B 2, C 10, D 15
A 5, D 10, G 2
D 6, E 7
H 7, G 8

My column headers/names are: A, B, C, D, E, F, G

So my initial dataframe after using "read_csv" becomes:

A     B     C      D       E      F      G   
A 1   B 2   C 10   D 15   NaN    NaN    NaN
A 5   D 10  G 2    NaN    NaN    NaN    NaN
D 6   E 7   NaN    NaN    NaN    NaN    NaN
H 7   G 8   NaN    NaN    NaN    NaN    NaN
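For reference, a minimal sketch of how that initial frame can be produced, assuming the file contents are available as a string (the variable name data is illustrative). Because names= lists seven headers, read_csv pads shorter rows with NaN, and skipinitialspace=True strips the blank that follows each comma.

```python
import io

import pandas as pd

# Illustrative copy of the question's file; each field is "<column letter> <value>".
data = """A 1, B 2, C 10, D 15
A 5, D 10, G 2
D 6, E 7
H 7, G 8"""

# names= supplies the seven headers; rows with fewer fields are padded with NaN.
# skipinitialspace=True drops the space after each comma.
df = pd.read_csv(io.StringIO(data), names=list('ABCDEFG'), skipinitialspace=True)
print(df)
```

Every cell still holds the whole "letter value" string, which is why H 7 ends up in column A.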

Each value can be separated into [column name] [column value], so A 1 means col=A and value=1, D 15 means col=D and value=15, and so on.

What I want is to assign each numeric value to the appropriate column based on its column name, and have a dataframe that looks like this:

A     B     C      D       E      F      G   
A 1   B 2   C 10   D 15   NaN    NaN    NaN
A 5   NaN   NaN    D 10   NaN    NaN    G 2
NaN   NaN   NaN    D 6    E 7    NaN    NaN
NaN   NaN   NaN    NaN    NaN    NaN    G 8
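A minimal sketch for this first layout, assuming the initial frame is rebuilt from the sample rows (the helper realign is hypothetical). Each non-missing cell is keyed by its leading letter, and reindex then fixes the column order and drops H, which is not among the seven headers.

```python
import pandas as pd

# Initial frame rebuilt from the sample rows; short rows are padded with missing values.
df = pd.DataFrame([['A 1', 'B 2', 'C 10', 'D 15'],
                   ['A 5', 'D 10', 'G 2'],
                   ['D 6', 'E 7'],
                   ['H 7', 'G 8']])

def realign(row):
    # Hypothetical helper: key each non-missing "X n" string by its letter X.
    return pd.Series({cell.split()[0]: cell for cell in row.dropna()})

# Rows return Series with different labels; apply aligns them into columns.
out = df.apply(realign, axis=1).reindex(columns=list('ABCDEFG'))
print(out)
```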

Or even better, only the values:

A     B     C      D       E      F      G   
1     2     10     15      NaN    NaN    NaN
5     NaN   NaN    10      NaN    NaN    2
NaN   NaN   NaN    6       7      NaN    NaN
NaN   NaN   NaN    NaN     NaN    NaN    8
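The values-only version can also be built without reshaping a frame at all, by parsing each line into a dict before pandas sees it. This is a different technique from the pandas reshaping in the answer, offered as a sketch; lines stands in for the file contents and is an assumption.

```python
import pandas as pd

# Assumed file contents, one string per CSV line.
lines = ["A 1, B 2, C 10, D 15", "A 5, D 10, G 2", "D 6, E 7", "H 7, G 8"]

records = []
for line in lines:
    # " D 15" -> ["D", "15"]; build one {letter: value} dict per line
    pairs = (field.split() for field in line.split(','))
    records.append({name: float(value) for name, value in pairs})

# Keys missing from a dict become NaN; reindex sets the column order and drops H.
out = pd.DataFrame(records).reindex(columns=list('ABCDEFG'))
print(out)
```

pd.DataFrame fills absent keys with NaN, so no padding logic is needed; the trade-off is that the parsing runs in plain Python rather than inside read_csv.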

Answer

Solution with apply:

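Apply a function to each row (axis=1) that splits every cell on whitespace with str.split(expand=True), removes the NaN rows with dropna, moves the letters into the index with set_index, and converts the resulting one-column DataFrame to a Series with DataFrame.squeeze. Finally, reindex by the new column names: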

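import pandas as pd

# df: the initial frame from the question ("letter value" strings padded with NaN)
print(df.apply(lambda x: x.str.split(expand=True)
                              .dropna()
                              .set_index(0)
                              .squeeze(axis=1), axis=1)  # axis=1: Series even for one pair
        .reindex(columns=list('ABCDEFGH')))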

     A    B    C    D    E   F    G    H
0    1    2   10   15  NaN NaN  NaN  NaN
1    5  NaN  NaN   10  NaN NaN    2  NaN
2  NaN  NaN  NaN    6    7 NaN  NaN  NaN
3  NaN  NaN  NaN  NaN  NaN NaN    8    7
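A note on DataFrame.squeeze: called without an axis it squeezes every length-1 axis, so if a row held only one "letter value" pair, the 1x1 frame left after set_index would collapse to a scalar instead of a Series. The sketch below uses a hypothetical single-pair row to show the difference; squeeze(axis=1) squeezes only the columns.

```python
import pandas as pd

# Hypothetical row whose only pair is "F 3": after split/set_index it is a 1x1 frame.
one = pd.Series(['F 3']).str.split(expand=True).set_index(0)

print(repr(one.squeeze()))   # no axis: both axes squeezed, leaving the scalar '3'
print(one.squeeze(axis=1))   # axis=1: only columns squeezed, a Series indexed by 'F'
```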

Solution with stack:

Use stack to create a Series, split each value on whitespace into new columns, append the column holding the new names (A, B, ...) to the index with set_index, convert the one-column DataFrame to a Series with DataFrame.squeeze, drop the index level holding the old column names with reset_index, unstack, reindex by the new column names (this adds the missing columns, filled with NaN), convert the values to float with astype, and finally remove the columns name with rename_axis (new in pandas 0.18.0):

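import pandas as pd

# df: the initial frame from the question
print(df.stack()
        .dropna()  # guard: newer pandas (3.0 stack behavior) may keep NaN here
        .str.split(expand=True)
        .set_index(0, append=True)
        .squeeze(axis=1)
        .reset_index(level=1, drop=True)
        .unstack()
        .reindex(columns=list('ABCDEFGH'))
        .astype(float)
        .rename_axis(None, axis=1))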

     A    B     C     D    E   F    G    H
0  1.0  2.0  10.0  15.0  NaN NaN  NaN  NaN
1  5.0  NaN   NaN  10.0  NaN NaN  2.0  NaN
2  NaN  NaN   NaN   6.0  7.0 NaN  NaN  NaN
3  NaN  NaN   NaN   NaN  NaN NaN  8.0  7.0
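Another way to express the same reshape, offered as an alternative sketch rather than the answer's method: keep the stacked long format, pull the letter and value out with str.extract, and pivot. The regex and the names long, col, val and row are assumptions of this sketch.

```python
import pandas as pd

# Initial frame rebuilt from the sample rows; short rows are padded with missing values.
df = pd.DataFrame([['A 1', 'B 2', 'C 10', 'D 15'],
                   ['A 5', 'D 10', 'G 2'],
                   ['D 6', 'E 7'],
                   ['H 7', 'G 8']])

# Long format: one row per non-missing cell, split into letter and value columns.
long = df.stack().dropna().str.extract(r'(?P<col>\S+)\s+(?P<val>\S+)')
long['row'] = long.index.get_level_values(0)  # original row number

out = (long.pivot(index='row', columns='col', values='val')
           .reindex(columns=list('ABCDEFGH'))
           .astype(float)
           .rename_axis(index=None, columns=None))
print(out)
```

Like unstack, pivot raises an error if a row contains the same letter twice, so either version assumes each letter appears at most once per line.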
