在单个图中获得geom_hex中的观测值(发光) [英] Obtain observations in geom_hex in a single plot (Shiny)

查看:90
本文介绍了在单个图中获得geom_hex中的观测值(发光)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试创建一个六边形的交互式图,用户可以在其中单击给定的六边形,并接收在该单击的六边形中分组的原始数据帧的所有观察结果的列表.

下面是一个MWE,看起来非常接近我的目标.我正在使用Shiny,hexbin()和ggplotly.

app.R

library(shiny)
library(plotly)
library(data.table)
library(GGally)
library(reshape2)
library(hexbin)

ui <- fluidPage(
  plotlyOutput("plot"),
  verbatimTextOutput("click")
)

server <- function(input, output, session) {
  #Create data
  set.seed(1)
  bindata <- data.frame(x=rnorm(100), y=rnorm(100))

  h <- hexbin (bindata, xbins = 5, IDs = TRUE, xbnds = range (bindata$x), ybnds = range (bindata$y))

  # As we have the cell IDs, we can merge this data.frame with the proper coordinates
  hexdf <- data.frame (hcell2xy (h),  ID = h@cell, counts = h@count)

  # I have tried different methods of generating the ggplot object
  #p <- ggplot(hexdf, aes(x=x, y=y, fill = counts)) + geom_hex(stat="identity")
  #p <- ggplot(hexdf, aes(x=x, y=y, fill = ID)) + geom_hex(stat="identity")
  #p <- ggplot(hexdf, aes(x=x, y=y, fill = counts, colours = ID)) + geom_hex(stat="identity")
  #p <- ggplot(hexdf, colours = ID, aes(x=x, y=y, colours = ID, fill = counts)) + geom_hex(stat="identity")
  p <- ggplot(hexdf, aes(x=x, y=y, fill = counts, ID=ID)) + geom_hex(stat="identity")

  output$plot <- renderPlotly({
    ggplotly(p)
  })

  d <- reactive(event_data("plotly_click"))

  output$click <- renderPrint({
    if (is.null(d())){
      "Click on a state to view event data"
    }
    else{
      str(d())
      #Next line would deliver all observations from original data frame (bindata) that are in the clicked hexbin... if d() from event_data() was returning ID instead of curveNumber
      #bindata[which(h@cID==d()$curveNumber),]
    }
  })
}

shinyApp(ui, server)

h @ cID对象内部是所有数据点的ID(显示哪个数据点进入哪个hexbin).因此,我觉得如果用户单击时能够获取event_data()返回hexbin ID,那么我应该能够成功地将该hexbin ID映射回h @ cID对象以获得相应的数据点. /p>

不幸的是,我目前编写的方式是,event_data()将返回"curveNumber",它似乎不等于ID.它似乎也没有转换为ID(即使使用h对象中的所有信息-不仅是h @ cID,而且还有更多信息,例如h @ xcm,h @ ycm等)

有人知道有什么方法可以解决此类问题吗?任何想法将不胜感激!

注意:我最近的两个帖子(包括赏金)与该问题非常相似.它们位于此处(使用geom_hex()散点图在ggplotly中进行交互式选择)和(使用plotly和Shiny在geom_hex中获得观测值).不同之处在于,我在每个步骤中都使问题变得更加简单.谢谢.

编辑-可能的答案

我想我可能已经解决了这个问题.就像@oshun注意到的一样,在event_data()返回的curveNumber和hexbin ID之间存在一些隐藏的转换.似乎通过增加hexbins的数量,curveNumbers首先从最小到最大进行了排序.然后,在给定的计数内,通过增加ID,似乎curverNumber进一步从最小到最大进行了排序.但是,该ID按字符(而不是 number )排序.例如,数字18被认为小于数字2,因为18以小于数字2的数字1开头.

当本示例中的完整数据集用下面的count,ID和curveNumber表示时,您可以看到此模式:

count=1 (ID=24) —> curveNumber 0
count=1 (ID=26) —> curveNumber 1
count=1 (ID=34) —> curveNumber 2
count=1 (ID=5) —> curveNumber 3
count=1 (ID=7) —> curveNumber 4
count=2 (ID=11) —> curveNumber 5
count=2 (ID=14) —> curveNumber 6
count=2 (ID=19) —> curveNumber 7
count=2 (ID=23) —> curveNumber 8
count=2 (ID=3) —> curveNumber 9
count=2 (ID=32) —> curveNumber 10
count=2 (ID=4) —> curveNumber 11
count=3 (ID=10) —> curveNumber 12
count=3 (ID=13) —> curveNumber 13
count=3 (ID=33) —> curveNumber 14
count=3 (ID=40) —> curveNumber 15
count=3 (ID=9) —> curveNumber 16
count=4 (ID=17) —> curveNumber 17
count=4 (ID=20) —> curveNumber 18
count=5 (ID=28) —> curveNumber 19 
count=5 (ID=8) —> curveNumber 20
count=6 (ID=21) —> curveNumber 21 
count=8 (ID=27) —> curveNumber 22 
count=9 (ID=22) —> curveNumber 23 
count=11 (ID=16)—> curveNumber 24
count=14 (ID=15)—> curveNumber 25

以下是我针对此问题的尝试性解决方案.我很确定它可以用于此 this 数据集,但是可以肯定的是,我计划在更多数据集上对其进行测试.

app.R

library(shiny)
library(plotly)
library(data.table)
library(GGally)
library(reshape2)
library(hexbin)

ui <- fluidPage(
  plotlyOutput("plot"),
  verbatimTextOutput("click")
)

server <- function(input, output, session) {

  # Curve number to ID
  cnToID <- function(h){
    df <- data.frame(table(h@cID))
    colnames(df) <- c("ID","count")
    cnID <- df[order(df$count,as.character(df$ID)),]
    cnID$curveNumber <- seq(0, nrow(cnID)-1)
    return(cnID)
  }

  # Create data
  set.seed(1)
  bindata <- data.frame(x=rnorm(100), y=rnorm(100))
  h <- hexbin (bindata, xbins = 5, IDs = TRUE, xbnds = range (bindata$x), ybnds = range (bindata$y))
  hexdf <- data.frame (hcell2xy (h),  ID = h@cell, counts = h@count)
  p <- ggplot(hexdf, aes(x=x, y=y, fill = counts, ID=ID)) + geom_hex(stat="identity")
  #p <- ggplot(hexdf, aes(x=x, y=y, fill = counts), ID=ID) + geom_hex(stat="identity")
  cnID <- cnToID(h)

  output$plot <- renderPlotly({
    p2 <- ggplotly(p)
    for (i in 1:nrow(hexdf)){
      p2$x$data[[i]]$text <- gsub("<.*$", "", p2$x$data[[i]]$text)
    }
    p2
  })

  d <- reactive(event_data("plotly_click"))

  output$click <- renderPrint({
    if (is.null(d())){
      "Click on a state to view event data"
    }
    else{
      clickID <- as.numeric(as.character(cnID[which(cnID$curveNumber==d()$curveNumber),]$ID))
      clickID
      bindata[which(h@cID==clickID),]
    }
  })
}

shinyApp(ui, server)

解决方案

简化了您的问题,我可以为您提供部分答案.下面的代码使您可以单击合并的数据(绘制为正方形)并获取原始数据.

x的形式返回有关点击事件的信息, ycurveNumberpointNumber. curveNumber为跟踪建立索引,但这似乎取决于调用图的方式而有所不同. pointNumber似乎根据数据的顺序建立索引(并且也链接到curveNumber).如果仅绘制一组点,则映射到原始数据相对简单.

以下解决方案适用于点,因为它使用pointNumber(xy可能是更好的查找组合,因为它们是绝对值而不是相对顺序).该解决方案不适用于您最初要求的geom_hex六角形,因为单击鼠标仅返回curveNumber.看起来六边形是首先通过计数添加的,然后是通过其他排序变量添加的.如果要使用geom_hex,则解决curveNumber编号背后的原理是关键.

下面是两个屏幕抓图:左=带有geom_hex的原始图.右=用geom_point使用pointNumber正确地索引结果的修改图.

修改后的代码如下.我和OP都从此答案中获取了很多关于hexbins的信息.

library(shiny); library(plotly); library(GGally); library(reshape2); library(hexbin)

ui <- fluidPage(
  plotlyOutput("plot"),
  checkboxInput("squarePoints", label = "Switch to points?"),
  verbatimTextOutput("click"),
  HTML("Check the work:"),
  plotlyOutput("plot1")
)

server <- function(input, output, session) {
  #Create data
  set.seed(1)
  bindata <- data.frame(myIndex = factor(paste0("ID",1:100)), 
                        x=rnorm(100), y=rnorm(100))

  h <- hexbin (bindata[,2:3], xbins = 5, IDs = TRUE, 
               xbnds = range(bindata$x), ybnds = range(bindata$y))

  # As we have the cell IDs, we can merge this data.frame with the proper coordinates
  hexdf <- data.frame (hcell2xy (h),  ID = h@cell, counts = h@count)

  #New code added below ###
  counts <- hexTapply(h, bindata$myIndex, table)  #list of 26
  counts <- t(simplify2array (counts))
  counts <- melt (counts)                 #2600 rows = 26 hexagons * 100 observations
  colnames (counts)  <- c ("ID", "myIndex", "present")

  allhex <- merge (counts, hexdf)         #2600 rows = 26 hexagons * 100 observations
  #rename hex coordinates
  names(allhex)[names(allhex) %in% c("x", "y")] <- c("hex.x", "hex.y")  
  allhex <- merge(allhex, bindata)
  somehex <- allhex[allhex$present > 0,]  #100 rows (original data)

  #Plotly graphs objects in a certain order, so sort the lookup data by the same order 
  #in which it's plotted.
  #No idea how curveNumber plots data. First by counts, then by ...?
  #pointNumber seems more straightforward. 
  sorthex <- hexdf[with(hexdf, order(ID)), ]

  #Create a switch to change between geom_hex() and geom_point()
  switchPoints <- reactive(if(input$squarePoints) {
    geom_point(shape = 22, size = 10)
    } else {  
      geom_hex(stat = "identity")
      })

  hexdf$myIndex <- "na" #Added here for second plotly
  ### New code added above ###

  p <- reactive(ggplot(hexdf, aes(x=x, y=y, fill = counts))  + coord_equal() +
                switchPoints() )

  output$plot <- renderPlotly({
    ggplotly(p())
  })

  d <- reactive(event_data("plotly_click"))
  #pointNumber = index starting from 0
  hexID <- reactive(sorthex[d()$pointNumber + 1, "ID"]) 

  output$click <- renderPrint({
    if (is.null(d())){
      "Click on a state to view event data"
    }
    else{
      list(
      str(d()),
      somehex[somehex$ID == hexID(),]
      )
    }
  })

  #Check your work: plot raw data over hexagons
  p.check <- ggplot(hexdf, aes(x=x, y=y, fill = counts)) + geom_hex(stat="identity") +
    geom_point(data = somehex, aes(x=x, y=y)) + coord_equal()

  output$plot1 <- renderPlotly({
    ggplotly(p.check + aes(label= myIndex) )
  })


}

shinyApp(ui, server)

I am attempting to create an interactive plot of hexbins where a user can click on a given hexbin and receive a list of all observations of the original data frame that were grouped in that clicked hexbin.

Below is a MWE that seems pretty close to my goal. I am using Shiny, hexbin(), and ggplotly.

app.R

library(shiny)
library(plotly)
library(data.table)
library(GGally)
library(reshape2)
library(hexbin)

ui <- fluidPage(
  plotlyOutput("plot"),
  verbatimTextOutput("click")
)

server <- function(input, output, session) {
  #Create data
  set.seed(1)
  bindata <- data.frame(x=rnorm(100), y=rnorm(100))

  h <- hexbin (bindata, xbins = 5, IDs = TRUE, xbnds = range (bindata$x), ybnds = range (bindata$y))

  # As we have the cell IDs, we can merge this data.frame with the proper coordinates
  hexdf <- data.frame (hcell2xy (h),  ID = h@cell, counts = h@count)

  # I have tried different methods of generating the ggplot object
  #p <- ggplot(hexdf, aes(x=x, y=y, fill = counts)) + geom_hex(stat="identity")
  #p <- ggplot(hexdf, aes(x=x, y=y, fill = ID)) + geom_hex(stat="identity")
  #p <- ggplot(hexdf, aes(x=x, y=y, fill = counts, colours = ID)) + geom_hex(stat="identity")
  #p <- ggplot(hexdf, colours = ID, aes(x=x, y=y, colours = ID, fill = counts)) + geom_hex(stat="identity")
  p <- ggplot(hexdf, aes(x=x, y=y, fill = counts, ID=ID)) + geom_hex(stat="identity")

  output$plot <- renderPlotly({
    ggplotly(p)
  })

  d <- reactive(event_data("plotly_click"))

  output$click <- renderPrint({
    if (is.null(d())){
      "Click on a state to view event data"
    }
    else{
      str(d())
      #Next line would deliver all observations from original data frame (bindata) that are in the clicked hexbin... if d() from event_data() was returning ID instead of curveNumber
      #bindata[which(h@cID==d()$curveNumber),]
    }
  })
}

shinyApp(ui, server)

Inside the h@cID object is the ID for all data points (showing which data point goes into which hexbin). Hence, I feel that if I am able to get event_data() to return the hexbin ID when a user clicks, then I should be able to successfully map that hexbin ID back to the h@cID object to obtain the corresponding data points.

Unfortunately, the way I have it written currently, event_data() will return "curveNumber" which does not seem to equal ID. It also does not seem to translate into ID (even when using all the information in the h object - not just h@cID, but also more such as h@xcm, h@ycm, etc.)

Is there any method anyone is aware of to solve this type of problem? Any ideas would be appreciated!

Note: My two most recent posts (including a bounty) are very similar to this question. They are located here (Interactive selection in ggplotly with geom_hex() scatterplot) and (Obtain observations in geom_hex using plotly and Shiny). The difference is that I have been making the problem more simple each step. Thank you.

Edit - Possible Answer

I think I may have obtained the solution for this problem. Like what @oshun noticed, there is some hidden conversion between the curveNumber returned from event_data() and the hexbin ID. It seems that curveNumbers are first sorted from smallest to largest by increasing count of the hexbins. Then, within a given count, it seems that curverNumber is further sorted from smallest to largest by increasing ID. However, the ID is sorted by character (not number). For instance, the number 18 would be considered smaller than the number 2 because 18 starts with the digit 1 which is smaller than the digit 2.

You can see this pattern when the full dataset in this example is represented with count, ID, and curveNumber below:

count=1 (ID=24) —> curveNumber 0
count=1 (ID=26) —> curveNumber 1
count=1 (ID=34) —> curveNumber 2
count=1 (ID=5) —> curveNumber 3
count=1 (ID=7) —> curveNumber 4
count=2 (ID=11) —> curveNumber 5
count=2 (ID=14) —> curveNumber 6
count=2 (ID=19) —> curveNumber 7
count=2 (ID=23) —> curveNumber 8
count=2 (ID=3) —> curveNumber 9
count=2 (ID=32) —> curveNumber 10
count=2 (ID=4) —> curveNumber 11
count=3 (ID=10) —> curveNumber 12
count=3 (ID=13) —> curveNumber 13
count=3 (ID=33) —> curveNumber 14
count=3 (ID=40) —> curveNumber 15
count=3 (ID=9) —> curveNumber 16
count=4 (ID=17) —> curveNumber 17
count=4 (ID=20) —> curveNumber 18
count=5 (ID=28) —> curveNumber 19 
count=5 (ID=8) —> curveNumber 20
count=6 (ID=21) —> curveNumber 21 
count=8 (ID=27) —> curveNumber 22 
count=9 (ID=22) —> curveNumber 23 
count=11 (ID=16)—> curveNumber 24
count=14 (ID=15)—> curveNumber 25

Below is my tentative solution for this problem. I am pretty sure it works for this this dataset, but I plan to test it on more datasets to be sure.

app.R

library(shiny)
library(plotly)
library(data.table)
library(GGally)
library(reshape2)
library(hexbin)

ui <- fluidPage(
  plotlyOutput("plot"),
  verbatimTextOutput("click")
)

server <- function(input, output, session) {

  # Curve number to ID
  cnToID <- function(h){
    df <- data.frame(table(h@cID))
    colnames(df) <- c("ID","count")
    cnID <- df[order(df$count,as.character(df$ID)),]
    cnID$curveNumber <- seq(0, nrow(cnID)-1)
    return(cnID)
  }

  # Create data
  set.seed(1)
  bindata <- data.frame(x=rnorm(100), y=rnorm(100))
  h <- hexbin (bindata, xbins = 5, IDs = TRUE, xbnds = range (bindata$x), ybnds = range (bindata$y))
  hexdf <- data.frame (hcell2xy (h),  ID = h@cell, counts = h@count)
  p <- ggplot(hexdf, aes(x=x, y=y, fill = counts, ID=ID)) + geom_hex(stat="identity")
  #p <- ggplot(hexdf, aes(x=x, y=y, fill = counts), ID=ID) + geom_hex(stat="identity")
  cnID <- cnToID(h)

  output$plot <- renderPlotly({
    p2 <- ggplotly(p)
    for (i in 1:nrow(hexdf)){
      p2$x$data[[i]]$text <- gsub("<.*$", "", p2$x$data[[i]]$text)
    }
    p2
  })

  d <- reactive(event_data("plotly_click"))

  output$click <- renderPrint({
    if (is.null(d())){
      "Click on a state to view event data"
    }
    else{
      clickID <- as.numeric(as.character(cnID[which(cnID$curveNumber==d()$curveNumber),]$ID))
      clickID
      bindata[which(h@cID==clickID),]
    }
  })
}

shinyApp(ui, server)

Edit 2:

解决方案

Simplified your questions enough that I can give you a partial answer. The code below allows you to click on binned data (plotted as squares) and get the original data.

Plotly returns information on click events in the form of x, y, curveNumber and pointNumber. curveNumber indexes the trace, but this seems to vary depending on how plotly is called. pointNumber appears to index according to the order of the data (and it is also linked to curveNumber). If only one group of points is plotted, this is relatively straightforward to map to the original data.

The solution below works with points because it uses pointNumber (x and y is probably a better lookup combo because these are absolute values instead of a relative order). The solution does not work with geom_hex hexagons as you originally requested because only curveNumber is returned with a mouse-click. It looks hexagons are added first by count then by some other sorting variable. Solving the rationale behind the curveNumber numbering is the key if you want to use geom_hex.

Below are two screengrabs: Left = Original plot with geom_hex. Right = Modified plot with geom_point using pointNumber to correctly index results.

Modified code is below. Both the OP and I borrow heavily from this answer about hexbins.

library(shiny); library(plotly); library(GGally); library(reshape2); library(hexbin)

ui <- fluidPage(
  plotlyOutput("plot"),
  checkboxInput("squarePoints", label = "Switch to points?"),
  verbatimTextOutput("click"),
  HTML("Check the work:"),
  plotlyOutput("plot1")
)

server <- function(input, output, session) {
  #Create data
  set.seed(1)
  bindata <- data.frame(myIndex = factor(paste0("ID",1:100)), 
                        x=rnorm(100), y=rnorm(100))

  h <- hexbin (bindata[,2:3], xbins = 5, IDs = TRUE, 
               xbnds = range(bindata$x), ybnds = range(bindata$y))

  # As we have the cell IDs, we can merge this data.frame with the proper coordinates
  hexdf <- data.frame (hcell2xy (h),  ID = h@cell, counts = h@count)

  #New code added below ###
  counts <- hexTapply(h, bindata$myIndex, table)  #list of 26
  counts <- t(simplify2array (counts))
  counts <- melt (counts)                 #2600 rows = 26 hexagons * 100 observations
  colnames (counts)  <- c ("ID", "myIndex", "present")

  allhex <- merge (counts, hexdf)         #2600 rows = 26 hexagons * 100 observations
  #rename hex coordinates
  names(allhex)[names(allhex) %in% c("x", "y")] <- c("hex.x", "hex.y")  
  allhex <- merge(allhex, bindata)
  somehex <- allhex[allhex$present > 0,]  #100 rows (original data)

  #Plotly graphs objects in a certain order, so sort the lookup data by the same order 
  #in which it's plotted.
  #No idea how curveNumber plots data. First by counts, then by ...?
  #pointNumber seems more straightforward. 
  sorthex <- hexdf[with(hexdf, order(ID)), ]

  #Create a switch to change between geom_hex() and geom_point()
  switchPoints <- reactive(if(input$squarePoints) {
    geom_point(shape = 22, size = 10)
    } else {  
      geom_hex(stat = "identity")
      })

  hexdf$myIndex <- "na" #Added here for second plotly
  ### New code added above ###

  p <- reactive(ggplot(hexdf, aes(x=x, y=y, fill = counts))  + coord_equal() +
                switchPoints() )

  output$plot <- renderPlotly({
    ggplotly(p())
  })

  d <- reactive(event_data("plotly_click"))
  #pointNumber = index starting from 0
  hexID <- reactive(sorthex[d()$pointNumber + 1, "ID"]) 

  output$click <- renderPrint({
    if (is.null(d())){
      "Click on a state to view event data"
    }
    else{
      list(
      str(d()),
      somehex[somehex$ID == hexID(),]
      )
    }
  })

  #Check your work: plot raw data over hexagons
  p.check <- ggplot(hexdf, aes(x=x, y=y, fill = counts)) + geom_hex(stat="identity") +
    geom_point(data = somehex, aes(x=x, y=y)) + coord_equal()

  output$plot1 <- renderPlotly({
    ggplotly(p.check + aes(label= myIndex) )
  })


}

shinyApp(ui, server)

这篇关于在单个图中获得geom_hex中的观测值(发光)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆