Lines between certain points in a plot, based on the data? (with R)


Problem description

I have done my research and googling but have yet to find a solution to the following problem. I have quite often found solutions to R-related issues on this forum, so I thought I'd give it a try and hope that somebody can suggest something. I need it for my PhD thesis; anybody whose code or suggestions I use will naturally be acknowledged and credited.

So: I need to draw lines/segments to connect points in a plot (of multidimensional scaling, specifically) in R (SPSS-based solutions are welcome as well). The lines should not connect all points, only those that represent properties/variables shared by at least one data item. The placement of the lines should be derived from the same data that the plot itself is based on. Let me exemplify; below are some fictional data with dummy variables, where '1' means that the item has the property:

       "properties"
        a   b   c
"items" ---------
tree  | 1   1   0
house | 0   1   1
hut   | 0   1   1
book  | 1   0   0

The plot is a multidimensional scaling plot (distances are to be interpreted as dissimilarities). This is the logic:

  • there's a line between A and B, because there is at least one item/variable ("tree") in the data that has both properties;
  • there is a line between B and C, because there is at least one item in the data ("house" and "hut") that has both properties;
  • there is an item ("book") that has only one property (A), so it does not affect the placement of the lines
  • importantly, there is no line between A and C because there are no items in the data that have both properties.
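
The rule in the list above can be checked computationally. The following is a minimal base R sketch, not part of the original question: it rebuilds the example table as a matrix (the name `tab` and the matrix layout are assumptions made for illustration) and uses `crossprod()` to count, for every pair of properties, how many items have both.

```r
# Example data from the question: rows are items, columns are properties.
tab <- matrix(c(1, 1, 0,
                0, 1, 1,
                0, 1, 1,
                1, 0, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("tree", "house", "hut", "book"),
                              c("a", "b", "c")))

# crossprod(tab) is t(tab) %*% tab: entry [i, j] counts the items that
# have both property i and property j (the diagonal counts the items
# that have each single property).
shared <- crossprod(tab)

# Two properties get a line when at least one item has both of them.
linked <- shared > 0
diag(linked) <- FALSE   # a property is not linked to itself

print(shared)
print(linked)
```

The off-diagonal `TRUE` entries of `linked` are the pairs that should be joined by a line; an item such as "book", which has a single property, only contributes to the diagonal of `shared`.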

What I am looking for is a way to add, automatically/computationally, the grey lines that I have for now drawn manually on the plot above. The automatic drawing should be based on the data as described above. With a small data set, drawing the lines manually is no problem, but it becomes a problem when there are tens of such "properties" and hundreds of items/rows of data. Any ideas? Some R code (commented if possible) would be most welcome!

EDIT: It seems I forgot something very important. First thing, the solution proposed by @GaborCsardi below works perfectly with the example data, thanks for that! But I forgot to include that the linking of the points should also be "conservative", with as few connecting lines as possible. For example, if there is an item that has all the "properties", then it should not create lines between every single property point in the plot just because of that, if the points are connected by other items already, even if indirectly. So a plot based on the following data should not be a full triangle, even though item1 has all three properties:

      A B C
item1 1 1 1
item2 1 1 0
item3 0 1 1

Instead, A and B, and B and C, should each be connected by a line, but a line between A and C would be excessive, as they are already indirectly connected (through B). Could this be done with incidence graphs?

Solution

This is very easy if you use graphs, and create the projection of the bipartite graph that you have in your table. E.g.

library(igraph)

## Some example data
mat <- "       properties
        items  a   b   c
        tree   1   1   0
        house  0   1   1
        hut    0   1   1
        book   1   0   0
       "
tab <- read.table(textConnection(mat), skip=1,
                  header=TRUE, row.names=1)

## Create a bipartite graph
graph <- graph.incidence(as.matrix(tab))

## Project the bipartite graph
proj <- bipartite.projection(graph)

## Plot one of the projections, the one you need 
## happens to be the second one
plot(proj$proj2)

## Minimum spanning tree of the projection
plot(minimum.spanning.tree(proj$proj2))
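
A spanning tree is generally not unique, so which links survive depends on how edges are weighted and how ties are broken; it is worth checking which edge weights, if any, the spanning-tree routine uses. The following base R sketch is not the igraph method above but a hand-written Kruskal-style pass on the second example from the question. It assumes (this is an assumption, suggested only by the question's example) that links backed by more shared items should be kept first. The names `cand`, `group` and `links` are made up for illustration.

```r
# Second example from the question: item1 has all three properties.
tab <- matrix(c(1, 1, 1,
                1, 1, 0,
                0, 1, 1),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("item1", "item2", "item3"),
                              c("A", "B", "C")))

shared <- crossprod(tab)   # co-occurrence counts between properties

# Candidate links: property pairs (upper triangle) shared by >= 1 item.
idx <- which(upper.tri(shared) & shared > 0, arr.ind = TRUE)
cand <- data.frame(from  = rownames(shared)[idx[, "row"]],
                   to    = colnames(shared)[idx[, "col"]],
                   count = shared[idx],
                   stringsAsFactors = FALSE)

# Assumption: consider the most strongly supported links first.
cand <- cand[order(-cand$count), ]

# Kruskal-style pass: keep a link only if it joins two groups of
# properties that are not connected yet, directly or indirectly.
group <- setNames(seq_len(ncol(shared)), colnames(shared))
keep  <- logical(nrow(cand))
for (k in seq_len(nrow(cand))) {
  g_from <- group[[cand$from[k]]]
  g_to   <- group[[cand$to[k]]]
  if (g_from != g_to) {
    keep[k] <- TRUE
    group[group == g_to] <- g_from   # merge the two groups
  }
}

links <- cand[keep, ]
print(links)
```

With this ordering the weakest link (A and C, supported by item1 only) is considered last and dropped because A and C are already connected through B. If the properties fall into several unconnected groups, the same pass yields one tree per group.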

For more information see the manual pages, i.e. ?"igraph-package", ?graph.incidence, ?bipartite.projection and ?plot.igraph.
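
To put the lines on an existing multidimensional scaling plot rather than on a separate graph plot, the point coordinates and the list of linked pairs can be combined with `segments()`. This is a base R sketch under stated assumptions: the question does not say which dissimilarity was used, so Jaccard distance between property columns (`dist(..., method = "binary")`) stands in for it, and `draw_links()` is a hypothetical helper written for this example.

```r
# Example data from the question: rows are items, columns are properties.
tab <- matrix(c(1, 1, 0,
                0, 1, 1,
                0, 1, 1,
                1, 0, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("tree", "house", "hut", "book"),
                              c("a", "b", "c")))

# Assumption: Jaccard ("binary") distance between properties serves as
# the dissimilarity; substitute the distance matrix actually used.
d <- dist(t(tab), method = "binary")
coords <- cmdscale(d, k = 2)   # classical MDS, one row per property

# Pairs of properties shared by at least one item, as row/col indices.
shared <- crossprod(tab)
edges <- which(upper.tri(shared) & shared > 0, arr.ind = TRUE)

# Hypothetical helper: draw the MDS points with grey links between them.
draw_links <- function(coords, edges) {
  plot(coords, type = "n", xlab = "Dimension 1", ylab = "Dimension 2")
  segments(coords[edges[, "row"], 1], coords[edges[, "row"], 2],
           coords[edges[, "col"], 1], coords[edges[, "col"], 2],
           col = "grey")
  points(coords, pch = 19)
  text(coords, labels = rownames(coords), pos = 3)
}
```

Calling `draw_links(coords, edges)` draws the points and one grey segment per linked pair. To draw fewer lines, pass only the rows of `edges` that should be kept.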
