How can I create an incidence matrix in Julia?
Question
I would like to create an incidence matrix.
I have a file with 3 columns, like:
id x y
A 22 2
B 4 21
C 21 360
D 26 2
E 22 58
F 2 347
And I want a matrix like this (without column and row names):
    2   4  21  22  26  58 347 360
A   1   0   0   1   0   0   0   0
B   0   1   1   0   0   0   0   0
C   0   0   1   0   0   0   0   1
D   1   0   0   0   1   0   0   0
E   0   0   0   1   0   1   0   0
F   1   0   0   0   0   0   1   0
I have started with code like this:
haps = readdlm("File.txt",header=true)
hap1_2 = map(Int64,haps[1][:,2:end])
ID = (haps[1][:,1])
dic1 = Dict()
for (i in 1:21)
dic1[ID[i]] = hap1_2[i,:]
end
X=[zeros(21,22)]; #the original file has 21 rows and 22 columns
X1 = hcat(ID,X)
The problem now is that I don't know how to fill the matrix with 1s in the specific columns, as in the example above.
I'm also not sure whether I'm on the right track.
Any suggestions that could help me?
Thanks!
Recommended answer
NamedArrays is a neat package which allows naming both rows and columns and seems to fit the bill for this problem. Suppose the data is in data.csv; here is one way to go about it (install NamedArrays with Pkg.add("NamedArrays")):
using DelimitedFiles   # provides readdlm; readcsv was removed in Julia 1.0
using NamedArrays
data, header = readdlm("data.csv", ',', header=true);
# get the column names by looking at unique (column, value) pairs
cols = unique(vec([(header[j+1], data[i,j+1]) for i in 1:size(data,1), j in 1:2]))
# row names from ID column
rows = data[:,1]
narr = NamedArray(zeros(Int, length(rows), length(cols)), (rows, cols), ("id", "attr"));
# now stamp in the 1s in the right places
for r in 1:size(data,1), c in 2:size(data,2)
    narr[data[r,1], (header[c], data[r,c])] = 1
end
Now we have (note that I transposed narr for a better printout):
julia> narr'
10x6 NamedArray{Int64,2}:
attr ╲ id │ A B C D E F
──────────┼─────────────────
("x",22) │ 1 0 0 0 1 0
("x",4) │ 0 1 0 0 0 0
("x",21) │ 0 0 1 0 0 0
("x",26) │ 0 0 0 1 0 0
("x",2) │ 0 0 0 0 0 1
("y",2) │ 1 0 0 1 0 0
("y",21) │ 0 1 0 0 0 0
("y",360) │ 0 0 1 0 0 0
("y",58) │ 0 0 0 0 1 0
("y",347) │ 0 0 0 0 0 1
But if DataFrames are necessary, similar tricks should apply.
---------- UPDATE ----------
If the column a value comes from should be ignored, i.e. x=2 and y=2 should both set a 1 in the column for value 2, then the code becomes:
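using DelimitedFiles   # provides readdlm; readcsv was removed in Julia 1.0
using NamedArrays
data, header = readdlm("data.csv", ',', header=true);
rows = data[:,1]
# sorted distinct values across all value columns, as strings, name the columns
cols = map(string, sort(unique(vec(data[:,2:end]))))
narr = NamedArray(zeros(Int, length(rows), length(cols)), (rows, cols), ("id", "attr"));
for r in 1:size(data,1), c in 2:size(data,2)
    narr[data[r,1], string(data[r,c])] = 1
end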
giving:
julia> narr
6x8 NamedArray{Int64,2}:
id ╲ attr │ 2 4 21 22 26 58 347 360
──────────┼───────────────────────────────────────
A │ 1 0 0 1 0 0 0 0
B │ 0 1 1 0 0 0 0 0
C │ 0 0 1 0 0 0 0 1
D │ 1 0 0 0 1 0 0 0
E │ 0 0 0 1 0 1 0 0
F │ 1 0 0 0 0 0 1 0
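Since the question asks for a matrix without row and column names, the same idea can also be sketched in plain Julia without NamedArrays. This is only an illustration under stated assumptions: the example rows are typed in by hand rather than read from a file, and the names ids, vals, colindex and M are made up for this sketch.

```julia
# Hypothetical in-memory copy of the example rows from the question.
ids  = ["A", "B", "C", "D", "E", "F"]
vals = [22 2; 4 21; 21 360; 26 2; 22 58; 2 347]   # columns x and y

# The sorted distinct values become the columns of the incidence matrix.
cols = sort(unique(vec(vals)))

# Map each value to its column index.
colindex = Dict(v => j for (j, v) in enumerate(cols))

# Start from zeros and set a 1 wherever a row contains the value.
M = zeros(Int, length(ids), length(cols))
for r in 1:size(vals, 1), c in 1:size(vals, 2)
    M[r, colindex[vals[r, c]]] = 1
end
```

Here M is an ordinary 6x8 Matrix{Int} whose rows follow ids and whose columns follow cols, so the labels stay available in those two vectors if they are needed later.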