How can I make a dataset of matrix elements in a DataFrame?
Problem description
I have a dataset of 3 parameters, 'A', 'B', and 'C', in a .TXT file. After printing them as 24x20 matrices, I need to collect the 1st elements of 'A', 'B', and 'C' into one long array in a pandas DataFrame, followed by the 2nd elements of each, then the 3rd, and so on up to the 480th elements.
My data in the text file looks like this:
id_set: 000
A: -2.46882615679
B: -2.26408246559
C: -325.004619528
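For readers who want to load this layout directly, here is a minimal parsing sketch. It assumes every record is four "key: value" lines in the order shown above; the helper name parse_records is hypothetical and not part of the original script.

```python
import io

import pandas as pd


def parse_records(lines):
    """Parse repeating 'key: value' lines into one row per id_set record.

    Assumption: each record starts with an 'id_set' line followed by
    numeric 'A', 'B', 'C' lines, as in the sample above.
    """
    rows = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "id_set":
            # A new record begins
            current = {"id_set": int(value)}
            rows.append(current)
        elif current is not None:
            current[key] = float(value)
    return pd.DataFrame(rows).set_index("id_set")


sample = """id_set: 000
A: -2.46882615679
B: -2.26408246559
C: -325.004619528
"""
df = parse_records(io.StringIO(sample))
print(df)
```

With a real file, an open file handle can be passed in place of the io.StringIO object.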
I have already built a pandas DataFrame with 3 columns, 'A', 'B', and 'C', plus an index, and defined functions to print the 24x20 matrices in the right way. A simple example with 2x2 matrices:
1st cycle: A = [1, 2,    B = [4, 5,    C = [ 8,  9,
                3, 4]         6, 7]         10, 11]
2nd cycle: A = [0, 8,    B = [1, 9,    C = [10, 1,
                2, 5]         4, 8]          2, 7]
I want to reshape them into this form:
A(1,1), B(1,1), C(1,1), A(1,2), B(1,2), C(1,2), ...
Result = [1, 4, 8, 2, 5, 9, 3, 6, 10, 4, 7, 11]   # 1st cycle
         [0, 1, 10, 8, 9, 1, 2, 4, 2, 5, 8, 7]    # 2nd cycle
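The interleaving described above can be sketched with NumPy for the first 2x2 cycle. This is only an illustration of the target ordering, assuming the three matrices are already available as arrays of the same shape:

```python
import numpy as np

# 2x2 example matrices from the first cycle
A = np.array([[1, 2], [3, 4]])
B = np.array([[4, 5], [6, 7]])
C = np.array([[8, 9], [10, 11]])

# Stack along a new last axis so the A, B, C values of the same cell
# sit next to each other, then flatten in row-major order.
row = np.stack([A, B, C], axis=-1).ravel()
print(row)  # [ 1  4  8  2  5  9  3  6 10  4  7 11]
```

The same call works unchanged for 24x20 matrices and yields a vector of 480 * 3 values.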
My script is as follows:
import os

import numpy as np
import pandas as pd


def normalize(value, min_value, max_value, min_norm, max_norm):
    # Linearly rescale value from [min_value, max_value] to [min_norm, max_norm]
    new_value = ((max_norm - min_norm) * ((value - min_value) / (max_value - min_value))) + min_norm
    return new_value


# Raw string so the backslash in the Windows path is not treated as an escape
dft = pd.read_csv(r'D:\mc25.TXT', header=None)
id_set = dft[dft.index % 4 == 0].astype('int').values
A = dft[dft.index % 4 == 1].values
B = dft[dft.index % 4 == 2].values
C = dft[dft.index % 4 == 3].values
data = {'A': A[:, 0], 'B': B[:, 0], 'C': C[:, 0]}
df = pd.DataFrame(data, columns=['A', 'B', 'C'], index=id_set[:, 0])

# Each cycle holds 480 rows (one 24x20 matrix per parameter)
cycles = len(df) // 480
print(cycles)

# mkdf() and print_df() are the helper functions mentioned above (not shown here)
# that arrange 480 values as a 24x20 matrix.
for cycle in range(cycles):
    count = '{:04}'.format(cycle)
    j = cycle * 480
    for i in df:
        os.makedirs(i, exist_ok=True)
        min_val = df[i].min()
        max_val = df[i].max()
        ordered_data = mkdf(df.iloc[j:j+480][i])
        csv = print_df(ordered_data)
        # Write the matrix of each parameter to a .csv file named after the cycle
        csv.to_csv(f'{i}/{i}{count}.csv', header=None, index=None)
        # Normalize C to [-40, +150] and A, B to [-1, +1]
        if i == 'C':
            min_nor, max_nor = -40, 150
        else:
            min_nor, max_nor = -1, 1
        new_value = normalize(df[i].iloc[j:j+480], min_val, max_val, min_nor, max_nor)
        norm_df = print_df(mkdf(new_value))
        norm_df.to_csv(f'{i}/norm{i}{count}.csv', header=None, index=None)
Note 2: I have provided a dataset in a text file covering 3 cycles (Text dataset).
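As a small worked illustration of the min-max normalization used in the script, the sketch below rescales each column to its own target range. The sample values and the targets dictionary are hypothetical; only the formula and the ranges [-1, +1] and [-40, +150] come from the script.

```python
import pandas as pd


def normalize(value, min_value, max_value, min_norm, max_norm):
    # Linearly map [min_value, max_value] onto [min_norm, max_norm]
    return (max_norm - min_norm) * ((value - min_value) / (max_value - min_value)) + min_norm


# Hypothetical sample values, chosen so the results are easy to check by hand
df = pd.DataFrame({
    "A": [0.0, 5.0, 10.0],
    "B": [2.0, 4.0, 6.0],
    "C": [-100.0, 0.0, 100.0],
})

# Target range per column: A and B to [-1, +1], C to [-40, +150]
targets = {"A": (-1, 1), "B": (-1, 1), "C": (-40, 150)}

# Each column is normalized with its own minimum and maximum
normalized = pd.DataFrame({
    col: normalize(df[col], df[col].min(), df[col].max(), lo, hi)
    for col, (lo, hi) in targets.items()
})
print(normalized)
```

Note that the formula divides by max_value - min_value, so a constant column would need separate handling.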
Recommended answer
I am not sure I fully understood your question, but here is a solution:
Convert your DataFrame to a 2D NumPy array using to_numpy() (as_matrix() in older pandas versions; it was removed in pandas 1.0), then use ravel() to get a vector of size 480 * 3. Loop over your cycles and use np.vstack to stack the rows on top of each other in the result. Here is the code with your example data:
import numpy as np
import pandas as pd

A = [[1, 2, 3, 4], [10, 20, 30, 40]]
B = [[4, 5, 6, 7], [40, 50, 60, 70]]
C = [[8, 9, 10, 11], [80, 90, 100, 110]]
cycles = 2
for cycle in range(cycles):
    data = {'A': A[cycle], 'B': B[cycle], 'C': C[cycle]}
    df = pd.DataFrame(data)
    # Flatten row by row: A(1), B(1), C(1), A(2), B(2), C(2), ...
    D = df.to_numpy().ravel()
    if cycle == 0:
        Results = np.array(D)
    else:
        # Append this cycle as a new row
        Results = np.vstack((Results, D))
# Output: Results = array([[ 1,  4,  8,  2,  5,  9,  3,  6,  10,  4,  7,  11],
#                          [10, 40, 80, 20, 50, 90, 30, 60, 100, 40, 70, 110]])
np.savetxt("Results.csv", Results, delimiter=",")
Is this what you wanted?
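When all cycles already sit in one DataFrame with columns 'A', 'B', 'C', the same result can probably be reached without a loop. The sketch below assumes the rows are already in the desired element order and that each cycle occupies a fixed number of consecutive rows; per_cycle is a hypothetical name, set to 4 here for the 2x2 example and 480 for the 24x20 case.

```python
import numpy as np
import pandas as pd

per_cycle = 4  # assumption: 4 for the 2x2 example, 480 for 24x20 matrices

# Both example cycles from the question, stored one after the other
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 0, 8, 2, 5],
    "B": [4, 5, 6, 7, 1, 9, 4, 8],
    "C": [8, 9, 10, 11, 10, 1, 2, 7],
})

# Number of complete cycles; any trailing partial cycle is dropped
cycles = len(df) // per_cycle
values = df[["A", "B", "C"]].to_numpy()[: cycles * per_cycle]

# Row-major reshape keeps A, B, C of each element adjacent, one cycle per row
results = values.reshape(cycles, per_cycle * 3)
print(results)
```

Each row of results then matches one line of the Result layout in the question and can be written out with np.savetxt as above.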