How to split a dataframe column into multiple columns

Views: 132  Published: 2017/2/25 19:49:58  Tags: python csv pandas dataframe


Question


After much prodding I have started migrating my R scripts to Python. Most of my work in R involved data frames, so I am using the DataFrame object from the pandas package. In my script I need to read in a csv file and import the data into a DataFrame object. Next I need to convert the hex values in the column labelled DATA into bitwise data, and then create 16 new columns, one for each bit.



My example input data in file test.txt looks as follows:

  PREFIX,TEST,ZONE,ROW,COL,DATA
  6_6,READ,0,  0,  0,BFED
  6_6,READ,0,  1,  0,BB7D
  6_6,READ,0,  2,  0,FFF7
  6_6,READ,0,  3,  0,E7FF
  6_6,READ,0,  4,  0,FBF8
  6_6,READ,0,  5,  0,DE75
  6_6,READ,0,  6,  0,DFFE

My Python script test.py is as follows:

import pandas as pd

fname = 'test.txt'
df = pd.read_csv(fname, comment="#")

# keep only the READ rows; .copy() so that new columns are added to an
# independent frame rather than to a slice of df
dfs = df[df.TEST == 'READ'].copy()

# function to convert the hexstring into a 16-character binary string
def hex2bin(hstr):
    # '016b' keeps leading zeros, so every value yields exactly 16 bits
    return format(int(hstr, 16), '016b')

# convert the hexstring in column DATA to the binary string column BINDATA
dfs['BINDATA'] = dfs['DATA'].apply(hex2bin)

# get rid of the column DATA
del dfs['DATA']

When I run this script and inspect the object dfs, I get the following:

    PREFIX  TEST  ZONE  ROW  COL           BINDATA
  0    6_6  READ     0    0    0  1011111111101101
  1    6_6  READ     0    1    0  1011101101111101
  2    6_6  READ     0    2    0  1111111111110111
  3    6_6  READ     0    3    0  1110011111111111
  4    6_6  READ     0    4    0  1111101111111000
  5    6_6  READ     0    5    0  1101111001110101
  6    6_6  READ     0    6    0  1101111111111110

    

      
    

  

So now I am not sure how to split the column named BINDATA into 16 new columns (they could be named B0, B1, B2, ..., B15). Any help will be appreciated.

Thanks & Regards,

Derric.
Solution

I don't know if it can be done more simply (without the for loop), but this does the trick:

for i in range(16):
    dfs['B'+str(i)] = dfs['BINDATA'].str[i]

The str attribute of the Series gives access to some vectorized string methods which act upon each element (see docs: http://pandas.pydata.org/pandas-docs/stable/basics.html#vectorized-string-methods). In this case we just index the string to access the individual characters.

This gives me:
In [20]: dfs
Out[20]:
            BINDATA B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 B10 B11 B12 B13 B14 B15
0  1011111111101101  1  0  1  1  1  1  1  1  1  1   1   0   1   1   0   1
1  1011101101111101  1  0  1  1  1  0  1  1  0  1   1   1   1   1   0   1
2  1111111111110111  1  1  1  1  1  1  1  1  1  1   1   1   0   1   1   1
3  1110011111111111  1  1  1  0  0  1  1  1  1  1   1   1   1   1   1   1
4  1111101111111000  1  1  1  1  1  0  1  1  1  1   1   1   1   0   0   0
5  1101111001110101  1  1  0  1  1  1  1  0  0  1   1   1   0   1   0   1
6  1101111111111110  1  1  0  1  1  1  1  1  1  1   1   1   1   1   1   0
If you want them as ints instead of strings, you can add .astype(int) in the for loop.
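
As a minimal, self-contained sketch of the loop with the integer conversion applied: the inline CSV text is a shortened copy of the question's sample data, and the zero-padding to 16 characters is an assumption about the intended bit width (it only matters for values below 0x8000, where an unpadded string would be shorter than 16 characters).

```python
import io

import pandas as pd

# Three sample rows copied from the question's test.txt.
CSV_TEXT = """PREFIX,TEST,ZONE,ROW,COL,DATA
6_6,READ,0,0,0,BFED
6_6,READ,0,1,0,BB7D
6_6,READ,0,2,0,FFF7
"""

# Read DATA as text so that an all-digit hex value is not parsed as a number.
df = pd.read_csv(io.StringIO(CSV_TEXT), dtype={"DATA": str})

# Zero-pad to 16 characters so .str[i] is defined for every i in range(16)
# (assumption: every DATA value is meant to be a 16-bit word).
bits = df["DATA"].apply(lambda h: format(int(h, 16), "016b"))

for i in range(16):
    # .str[i] picks the i-th character of each string;
    # .astype(int) turns the characters "0"/"1" into the integers 0/1.
    df["B" + str(i)] = bits.str[i].astype(int)

print(df[["DATA", "B0", "B1", "B14", "B15"]])
```

With this naming, B0 holds the most significant bit, because index 0 is the leftmost character of the binary string.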



EDIT: Another way to do it (a one-liner, but you have to change the column names in a second step):
In [34]: splitted = dfs['BINDATA'].apply(lambda x: pd.Series(list(x)))

In [35]: splitted.columns = ['B'+str(x) for x in splitted.columns]

In [36]: dfs.join(splitted)
Out[36]:
            BINDATA B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 B10 B11 B12 B13 B14 B15
0  1011111111101101  1  0  1  1  1  1  1  1  1  1   1   0   1   1   0   1
1  1011101101111101  1  0  1  1  1  0  1  1  0  1   1   1   1   1   0   1
2  1111111111110111  1  1  1  1  1  1  1  1  1  1   1   1   0   1   1   1
3  1110011111111111  1  1  1  0  0  1  1  1  1  1   1   1   1   1   1   1
4  1111101111111000  1  1  1  1  1  0  1  1  1  1   1   1   1   0   0   0
5  1101111001110101  1  1  0  1  1  1  1  0  0  1   1   1   0   1   0   1
6  1101111111111110  1  1  0  1  1  1  1  1  1  1   1   1   1   1   1   0
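
The second variant can be sketched as a standalone script. The two-row frame is a cut-down copy of the data above, and the name `result` is introduced here only for illustration. Note that join returns a new DataFrame and leaves dfs unchanged, so the result has to be assigned if you want to keep it.

```python
import pandas as pd

# Two of the binary strings from the output above.
dfs = pd.DataFrame({"BINDATA": ["1011111111101101", "1011101101111101"]})

# One row per string, one column per character; the columns are labelled 0..15.
splitted = dfs["BINDATA"].apply(lambda x: pd.Series(list(x)))

# Second step: rename the integer column labels to B0..B15.
splitted.columns = ["B" + str(c) for c in splitted.columns]

# Convert "0"/"1" to integers and attach the new columns by index.
# `result` is a hypothetical name; dfs itself is not modified by join.
result = dfs.join(splitted.astype(int))

print(result.columns.tolist())
```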


                        