如何在 Julia 中创建关联矩阵 [英] how can I create an incidence matrix in Julia

查看:28
本文介绍了如何在 Julia 中创建关联矩阵的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想创建一个关联矩阵.
我有一个包含 3 列的文件,例如:

I would like to create an incidence matrix.
I have a file with 3 columns, like:

id x  y   
A  22   2   
B   4  21   
C  21 360   
D  26   2   
E  22  58   
F   2 347   

我想要一个类似的矩阵(没有列名和行名):

And I want a matrix like (without col and row names):

  2 4 21 22 26 58 347 360   
A 1 0  0  1  0  0   0   0   
B 0 1  1  0  0  0   0   0   
C 0 0  1  0  0  0   0   1   
D 1 0  0  0  1  0   0   0   
E 0 0  0  1  0  1   0   0   
F 1 0  0  0  0  0   1   0   

我已经开始了这样的代码:

I have started the code like:

haps = readdlm("File.txt",header=true)      
hap1_2 = map(Int64,haps[1][:,2:end])    
ID = (haps[1][:,1])                      
dic1 = Dict()

for (i in 1:21)
    dic1[ID[i]] = hap1_2[i,:]
end

X=[zeros(21,22)];       #the original file has 21 rows and 22 columns 
X1 = hcat(ID,X)

现在的问题是我不知道如何在上面的示例中的特定列中用 1 填充矩阵.
我也不确定我是否走对了.

The problem now is that I don't know how to fill the matrix with 1s in the specific columns as in the example above.
I'm also not sure if I'm on the right way.

有什么建议可以帮到我吗?

Any suggestion that could help me??

谢谢!

推荐答案

NamedArrays 是一个简洁的包,它允许命名行和列,似乎适合这个问题.假设数据在 data.csv 中,这是一种解决方法(使用 Pkg.add("NamedArrays") 安装 NamedArrays):

NamedArrays is a neat package which allows naming both rows and columns and seems to fit the bill for this problem. Suppose the data is in data.csv, here is one method to go about it (install NamedArrays with Pkg.add("NamedArrays")):

data,header = readcsv("data.csv",header=true);
# get the column names by looking at unique values in columns
cols = unique(vec([(header[j+1],data[i,j+1]) for i in 1:size(data,1),j=1:2]))
# row names from ID column
rows = data[:,1]

using NamedArrays
narr = NamedArray(zeros(Int,length(rows),length(cols)),(rows,cols),("id","attr"));
# now stamp in the 1s in the right places
for r=1:size(data,1),c=2:size(data,2) narr[data[r,1],(header[c],data[r,c])] = 1 ; end

现在我们有了(注意我转置了 narr 以获得更好的打印输出):

Now we have (note I transposed narr for better printout):

julia> narr'
10x6 NamedArray{Int64,2}:
attr ╲ id │ A  B  C  D  E  F
──────────┼─────────────────
("x",22)  │ 1  0  0  0  1  0
("x",4)   │ 0  1  0  0  0  0
("x",21)  │ 0  0  1  0  0  0
("x",26)  │ 0  0  0  1  0  0
("x",2)   │ 0  0  0  0  0  1
("y",2)   │ 1  0  0  1  0  0
("y",21)  │ 0  1  0  0  0  0
("y",360) │ 0  0  1  0  0  0
("y",58)  │ 0  0  0  0  1  0
("y",347) │ 0  0  0  0  0  1

但是,如果 DataFrames 是必需的,则应该应用类似的技巧.

But, if DataFrames are necessary, similar tricks should apply.

--------- 更新 ----------

---------- UPDATE ----------

如果值的列应被忽略,即 x=2 和 y=2 都应在列上为值 2 设置 1,则代码变为:

In case the column of a value should be ignored i.e. x=2 and y=2 should both set a 1 on column for value 2, then the code becomes:

using NamedArrays
data,header = readcsv("data.csv",header=true);
rows = data[:,1]
cols = map(string,sort(unique(vec(data[:,2:end]))))
narr = NamedArray(zeros(Int,length(rows),length(cols)),(rows,cols),("id","attr"));
for r=1:size(data,1),c=2:size(data,2) narr[data[r,1],string(data[r,c])] = 1 ; end

给予:

julia> narr
6x8 NamedArray{Int64,2}:
id ╲ attr │   2    4   21   22   26   58  347  360
──────────┼───────────────────────────────────────
A         │   1    0    0    1    0    0    0    0
B         │   0    1    1    0    0    0    0    0
C         │   0    0    1    0    0    0    0    1
D         │   1    0    0    0    1    0    0    0
E         │   0    0    0    1    0    1    0    0
F         │   1    0    0    0    0    0    1    0

这篇关于如何在 Julia 中创建关联矩阵的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆