How can I make a dataset of matrix elements in a dataframe?
Problem description
I have a dataset of 3 parameters 'A', 'B', 'C' in a .TXT file. After printing them as 24x20 matrices, I need to collect the 1st elements of 'A', 'B', 'C' into one long array in a pandas dataframe, then the 2nd elements of each, then the 3rd, and so on up to the 480th element.
My data in the text file looks like this:
id_set: 000
A: -2.46882615679
B: -2.26408246559
C: -325.004619528
I have already built a pandas dataframe with 3 columns 'A', 'B', 'C' and an index, and defined functions that print the 24x20 matrices in the right layout. Here is a simple example with 2x2 matrices:
1st cycle: A = [1,2,    B = [4,5,    C = [8,9,
                3,4]         6,7]         10,11]
2nd cycle: A = [0,8,    B = [1,9,    C = [10,1,
                2,5]         4,8]         2,7]
These should be reshaped into this form:
A(1,1),B(1,1),C(1,1),A(1,2),B(1,2),C(1,2),.....
Result= [1,4,8,2,5,9,3,6,10,4,7,11] #1st cycle
[0,1,10,8,9,1,2,4,2,5,8,7] #2nd cycle
My script is as follows:
import os

import numpy as np
import pandas as pd

def normalize(value, min_value, max_value, min_norm, max_norm):
    # Linearly rescale value from [min_value, max_value] to [min_norm, max_norm]
    new_value = ((max_norm - min_norm)*((value - min_value)/(max_value - min_value))) + min_norm
    return new_value

# Raw string so the backslash in the Windows path is not read as an escape
dft = pd.read_csv(r'D:\mc25.TXT', header=None)
# Each record spans 4 lines: id_set, A, B, C
id_set = dft[dft.index % 4 == 0].astype('int').values
A = dft[dft.index % 4 == 1].values
B = dft[dft.index % 4 == 2].values
C = dft[dft.index % 4 == 3].values
data = {'A': A[:,0], 'B': B[:,0], 'C': C[:,0]}
df = pd.DataFrame(data, columns=['A','B','C'], index=id_set[:,0])

# Each cycle holds 480 rows, i.e. one 24x20 matrix per parameter
cycles = len(df) // 480
print(cycles)
# mkdf() and print_df() are the helper functions mentioned above (definitions not shown)
for cycle in range(cycles):
    count = '{:04}'.format(cycle)
    j = cycle * 480
    for i in df:
        # Create one output folder per parameter; do nothing if it already exists
        os.makedirs(i, exist_ok=True)
        min_val = df[i].min()
        max_val = df[i].max()
        # Normalization range: [-40, +150] for C, [-1, +1] for A and B
        if i == 'C':
            min_nor, max_nor = -40, 150
        else:
            min_nor, max_nor = -1, 1
        ordered_data = mkdf(df.iloc[j:j+480][i])
        csv = print_df(ordered_data)
        # Write the matrix of each parameter to a .csv file named after the cycle
        csv.to_csv(f'{i}/{i}{count}.csv', header=None, index=None)
        # Normalize only the current column, so each file holds its own parameter
        new_value = normalize(df[i].iloc[j:j+480], min_val, max_val, min_nor, max_nor)
        norm_df = print_df(mkdf(new_value))
        norm_df.to_csv(f'{i}/norm{i}{count}.csv', header=None, index=None)
Note 2: I have provided a dataset in a text file for 3 cycles: Text dataset
Recommended answer
I am not sure I fully understood your question, but here is a solution:
Convert your dataframe to a 2D NumPy array using to_numpy() (as_matrix() in older pandas; it was removed in pandas 1.0), then use ravel() to get a vector of size 480 * 3. Loop over your cycles and use vstack to stack the rows on top of each other in the result. Here is the code with your example data:
import numpy as np
import pandas as pd

A = [[1,2,3,4], [10,20,30,40]]
B = [[4,5,6,7], [40,50,60,70]]
C = [[8,9,10,11], [80,90,100,110]]
cycles = 2
for cycle in range(cycles):
    data = {'A': A[cycle], 'B': B[cycle], 'C': C[cycle]}
    df = pd.DataFrame(data)
    # to_numpy() replaces as_matrix(), which was removed in pandas 1.0
    D = df.to_numpy().ravel()
    if cycle == 0:
        Results = np.array(D)
    else:
        # Stack the current cycle's row under the previous ones
        Results = np.vstack((Results, D))
# Output: Results = array([[ 1, 4, 8, 2, 5, 9, 3, 6, 10, 4, 7, 11], [ 10, 40, 80, 20, 50, 90, 30, 60, 100, 40, 70, 110]])
np.savetxt("Results.csv", Results, delimiter=",")
Is this what you wanted?
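If all cycles already sit in one dataframe with columns 'A', 'B', 'C' (as in the question's script), the loop can probably be replaced by a single reshape. The following is only a minimal sketch under that assumption; `flatten_cycles` and `per_cycle` are hypothetical names introduced here, and the sample values are the two 2x2 cycles from the question, flattened row by row.

```python
import numpy as np
import pandas as pd


def flatten_cycles(df, per_cycle):
    """Return one row per cycle, interleaved as A1, B1, C1, A2, B2, C2, ...

    `flatten_cycles` and `per_cycle` are hypothetical names for this sketch.
    """
    values = df[["A", "B", "C"]].to_numpy()
    n_cycles = len(values) // per_cycle
    # Ignore any trailing rows that do not fill a whole cycle
    values = values[: n_cycles * per_cycle]
    # Row-major reshape keeps A, B, C of the same element next to each other
    return values.reshape(n_cycles, per_cycle * values.shape[1])


# The two 2x2 example cycles from the question, 4 elements per cycle
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 0, 8, 2, 5],
    "B": [4, 5, 6, 7, 1, 9, 4, 8],
    "C": [8, 9, 10, 11, 10, 1, 2, 7],
})
results = flatten_cycles(df, per_cycle=4)
print(results)
```

For the real data, `per_cycle` would be 480, which should give one row of 480 * 3 values per cycle; the result can be saved with np.savetxt as above.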