Efficient rendering of data points from large data plot in Shiny

Problem description

The goal is to implement a Shiny app to efficiently visualize and adjust uploaded data sets. Each set may contain 100,000 to 200,000 rows. After the data adjustments are done, the adjusted data can be downloaded. In steps:

  1. Data upload
  2. Data selection and visualization
  3. Data (point) removal
  4. Download option

Issue

While the app works in principle, data visualization and removal take too much time.
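To put a number on "too much time", one could time the drawing step in isolation. The sketch below is only an illustration: `time_render` is a hypothetical helper (not part of the app), it uses base graphics rather than ggplot2, and it draws on a null PDF device so that no file or window is created.

```r
# Hypothetical helper: time a drawing function on a null PDF device,
# so no file is written and no window is opened.
time_render <- function(draw) {
  grDevices::pdf(NULL)
  on.exit(grDevices::dev.off(), add = TRUE)
  system.time(draw())[["elapsed"]]
}

set.seed(1)
n <- 1e5
x <- runif(n)
y <- runif(n)

# Base-graphics scatter plot of all n points
secs <- time_render(function() plot(x, y, pch = "."))
cat("Drawing", n, "points took", secs, "seconds\n")
```

The same helper could wrap any other drawing call, which makes it easier to compare plotting back ends on the same data.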

Some sample data is generated below; it can be uploaded to the Shiny app. The sample data distribution is not similar to my actual data. The actual data contains clearly identifiable outliers and looks like a spectrum with peaks.

a = sample(1:1e12, 1e5, replace=TRUE)
b = sample(1:1e12, 1e5, replace=TRUE)
dummy1 = data.frame(Frequency = a, Amplitude = a)
dummy2 = data.frame(Frequency = b, Amplitude = b)
dummy3 = data.frame(Frequency = a, Amplitude = b)
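# Sample data (row.names = FALSE so that the first two columns are
# Frequency and Amplitude, which the app reads by position)
write.csv(dummy1, 'dummy1.csv', row.names = FALSE)
write.csv(dummy2, 'dummy2.csv', row.names = FALSE)
write.csv(dummy3, 'dummy3.csv', row.names = FALSE)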
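The app reads the first two columns of each uploaded file by position, so it matters whether `write.csv` emits a row-name column. The following small check, using a three-row stand-in for the dummy data (the values are arbitrary), shows the difference between the default call and `row.names = FALSE`.

```r
# Small stand-in for the dummy data (values are arbitrary)
dummy <- data.frame(Frequency = c(10, 20, 30), Amplitude = c(1, 2, 3))

with_rn    <- tempfile(fileext = ".csv")
without_rn <- tempfile(fileext = ".csv")
write.csv(dummy, with_rn)                        # default: row names are written
write.csv(dummy, without_rn, row.names = FALSE)  # data columns only

cols_with    <- names(read.csv(with_rn))     # "X" "Frequency" "Amplitude"
cols_without <- names(read.csv(without_rn))  # "Frequency" "Amplitude"
print(cols_with)
print(cols_without)
```

With the default call, column 1 is the row index, so a reader that takes columns 1 and 2 would treat the index as Frequency.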

Shiny app

The app takes the uploaded data and plots it. (The sample dummy data can be uploaded to the app.) Sections of data points can be removed, and the new data can be downloaded.

# Packages
library(shiny)
library(ggplot2)
library(data.table)
# UI
ui = fluidPage(
    fluidRow(selectInput("selection", "Set Selection:", choices = '', selected = '', multiple = TRUE)),
    fluidRow(plotOutput(outputId = "plot", brush = "plot_brush_"), 
             downloadButton('download',"Download the data"))
)

# Server
server = function(session, input, output){
    # Pop up for data upload
    query_modal = modalDialog(title = "Upload Spectrum",
                              fileInput("file", 
                              "file",
                              multiple = TRUE,
                              accept = c(".csv")),
                              easyClose = FALSE)
    showModal(query_modal)

    ## Upload
    mt1 = reactive({
       req(input$file)
       cs = list()
       for(nr in 1:length(input$file[ , 1])){
          c = read.csv(input$file[[nr, 'datapath']])
          cs[[nr]] = data.table(Frequency = as.numeric(c[[1]]), 
                                Amplitude = as.numeric(c[[2]]), 
                                Indicator = as.factor(nr))}
        c = do.call(rbind, cs)
        c = reactiveValues(data = c)
        return(c)})

    ## Input selection
    observeEvent(
      mt1(),
      updateSelectInput(
        session, 
        "selection", 
        "Set Selection:", 
        choices = levels(mt1()$data$Indicator), 
        selected = 'Entire'))
    
    ## Plot
    output$plot <- renderPlot({
      mt = mt1()$data
      mt = mt[mt$Indicator %in% input$selection,]
      p = ggplot(mt, aes(Frequency, Amplitude, color = Indicator)) 
      p + geom_point(show.legend = TRUE)})
    
    ## Download
    output$download = downloadHandler(
      filename = function(){paste(gsub('.{1}$', '', input$file$name[1]), 'manipulated', '.csv', sep= '')}, 
      content = function(fname){
        mt = mt1()$data
        mt = mt[, .SD, .SDcols= c('Frequency', 
                                  'Amplitude', 
                                  'Indicator')]
        write.csv(mt, fname, row.names = FALSE)})
    
    ## Adjust
    observe({
      mt = mt1()  # reactiveValues object holding the uploaded data
      d = mt$data
      keep = d[!Indicator %in% input$selection]
      df = brushedPoints(d, brush = input$plot_brush_, allRows = TRUE) 
      df = df[selected_ == FALSE]
      df$selected_ = NULL
      mt$data = rbind(keep , df[Indicator %in% input$selection,  ])})
}

# Run app
shinyApp(ui = ui, server = server)
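The removal step of the app above can be sketched outside Shiny, which makes it easier to test and time on its own. This is a minimal illustration, not the app's actual code path: `remove_brushed` is a hypothetical helper, and `brush` is assumed to be a list with `xmin`, `xmax`, `ymin` and `ymax`, the fields a Shiny plot brush provides.

```r
# Hypothetical helper: drop the points of the selected sets that lie inside
# the brush rectangle; points belonging to other sets are kept untouched.
remove_brushed <- function(df, brush, selection) {
  inside <- df$Frequency >= brush$xmin & df$Frequency <= brush$xmax &
            df$Amplitude >= brush$ymin & df$Amplitude <= brush$ymax
  keep <- !(inside & df$Indicator %in% selection)
  df[keep, ]
}

df <- data.frame(
  Frequency = c(1, 2, 3, 4, 2),
  Amplitude = c(1, 2, 3, 4, 2),
  Indicator = factor(c(1, 1, 1, 1, 2))
)
brush <- list(xmin = 1.5, xmax = 3.5, ymin = 1.5, ymax = 3.5)

# Only set "1" is selected, so the point of set "2" inside the rectangle stays
kept <- remove_brushed(df, brush, selection = "1")
print(kept)
```

A single vectorised comparison like this avoids building the intermediate `selected_` column for every row.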

Recommended answer

You could use the matplotlib Python plotting library inside R and Shiny with the reticulate package:

  1. Set up the packages and libraries:

install.packages('reticulate')

# Install python environment
reticulate::install_miniconda() 
# if Python is already installed, you can specify the path with use_python(path)

# Install matplotlib library
reticulate::py_install('matplotlib')

  2. Test the installation:

library(reticulate)
mpl <- import("matplotlib")
mpl$use("Agg") # Stable non interactive backend
mpl$rcParams['agg.path.chunksize'] = 0 # Disable error check on too many points

plt <- import("matplotlib.pyplot")
np <- import("numpy")

# generate lines cloud
xx = np$random$randn(100000L)
yy = np$random$randn(100000L)

plt$figure()
plt$plot(xx,yy)
plt$savefig('test.png')
plt$close(plt$gcf())

Result: test.png (image not reproduced here)

  3. Use matplotlib in Shiny; drawing takes under 2 seconds for 1e5 segments:

# Packages
library(shiny)
library(ggplot2)
library(data.table)
# UI
ui = fluidPage(
  fluidRow(selectInput("selection", "Set Selection:", choices = '', selected = '', multiple = TRUE)),
  fluidRow(imageOutput(outputId = "image"), 
           downloadButton('download',"Download the data"))
)

# Server
server = function(session, input, output){
  
  # Setup Python objects
  mpl <- reticulate::import("matplotlib")
  plt <- reticulate::import("matplotlib.pyplot")
  mpl$use("Agg") 
  mpl$rcParams['agg.path.chunksize'] = 0
  
  
  # Pop up for data upload
  query_modal = modalDialog(title = "Upload Spectrum",
                            fileInput("file", 
                                      "file",
                                      multiple = TRUE,
                                      accept = c(".csv")),
                            easyClose = FALSE)
  showModal(query_modal)
  
  ## Upload
  mt1 = reactive({
    req(input$file)
    cs = list()
    for(nr in 1:length(input$file[ , 1])){
      c = read.csv(input$file[[nr, 'datapath']])
      cs[[nr]] = data.table(Frequency = as.numeric(c[[1]]), 
                            Amplitude = as.numeric(c[[2]]), 
                            Indicator = as.factor(nr))}
    c = do.call(rbind, cs)
    c = reactiveValues(data = c)
    return(c)})
  
  ## Input selection
  observeEvent(
    mt1(),
    updateSelectInput(
      session, 
      "selection", 
      "Set Selection:", 
      choices = levels(mt1()$data$Indicator), 
      selected = 'Entire'))
  
  ## Render matplotlib image
  output$image <- renderImage({
    # Read myImage's width and height. These are reactive values, so this
    # expression will re-run whenever they change.
    width  <- session$clientData$output_image_width
    height <- session$clientData$output_image_height
    
    # For high-res displays, this will be greater than 1
    pixelratio <- session$clientData$pixelratio
    
    # A temp file to save the output.
    outfile <- tempfile(fileext='.png')
    
    # Generate the image file
    mt = mt1()$data
    mt = mt[mt$Indicator %in% input$selection,]
    xx = mt$Frequency
    yy = mt$Amplitude
    
    plt$figure()
    plt$plot(xx,yy)
    plt$savefig(outfile)
    plt$close(plt$gcf())
    
    # Return a list containing the filename
    list(src = outfile,
         width = width,
         height = height,
         alt = "This is alternate text")
  }, deleteFile = TRUE)
  
  ## Download
  output$download = downloadHandler(
    filename = function(){paste(gsub('.{1}$', '', input$file$name[1]), 'manipulated', '.csv', sep= '')}, 
    content = function(fname){
      mt = mt1()$data
      mt = mt[, .SD, .SDcols= c('Frequency', 
                                'Amplitude', 
                                'Indicator')]
      write.csv(mt, fname, row.names = FALSE)})
  
  ## Adjust
  observe({
    mt = mt1()
    df = brushedPoints(mt$data, brush = input$plot_brush_, allRows = TRUE) 
    mt$data = df[df$selected_ == FALSE,  ]})
}

# Run app
shinyApp(ui = ui, server = server)

You'll need to handle color manually, because matplotlib isn't ggplot2.
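One possible way to handle color manually is to split the data by `Indicator` on the R side and draw one series per set. The sketch below is an assumption about how this could be done, not part of the original answer: `series_by_indicator` is a hypothetical helper, and the palette (`grDevices::rainbow`) is an arbitrary choice.

```r
# Hypothetical helper: split the data by Indicator and attach one hex color
# per set, ready to be passed to matplotlib one series at a time.
series_by_indicator <- function(df) {
  lv   <- levels(droplevels(df$Indicator))
  cols <- setNames(grDevices::rainbow(length(lv)), lv)
  lapply(lv, function(l) {
    rows <- df$Indicator == l
    list(label = l,
         color = cols[[l]],
         x = df$Frequency[rows],
         y = df$Amplitude[rows])
  })
}

df <- data.frame(
  Frequency = c(1, 2, 3, 4),
  Amplitude = c(4, 3, 2, 1),
  Indicator = factor(c(1, 1, 2, 2))
)
series <- series_by_indicator(df)

# With `plt` imported as in the app (assumes reticulate and matplotlib are set up):
# plt$figure()
# for (s in series) plt$plot(s$x, s$y, color = s$color, label = s$label)
# plt$legend()
```

Inside `renderImage`, the loop in the comments would replace the single `plt$plot(xx, yy)` call.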
