Loss of attributes despite attempts to preserve them


Problem description

My problem with make always rebuilding Makefile targets (make always rebuilds Makefile targets) led to an investigation that uncovered another issue, which is the subject of this question. Repeated execution of the following R code results in a loss of objects' attributes during data transformation operations.

For the record, I have already written on this subject (Approaches to preserving object's attributes during extract/replace operations, http://stackoverflow.com/questions/23841387/approaches-to-preserving-objects-attributes-during-extract-replace-operations), but that question and its answer were more general. I was also incorrect in saying that simply saving attributes works: it worked for me at the time of that writing, because I was not yet performing operations that are potentially dangerous for objects' attributes.

The following are excerpts from my R code where I'm experiencing the loss of attributes.

##### GENERIC TRANSFORMATION FUNCTION #####

transformResult <- function (dataSource, indicator, handler) {

  fileDigest <- base64(indicator)
  rdataFile <- paste0(CACHE_DIR, "/", dataSource, "/",
                      fileDigest, RDS_EXT)
  if (file.exists(rdataFile)) {
    data <- readRDS(rdataFile)

    # Preserve user-defined attributes for data frame's columns
    # via defining new class 'avector' (see code below). Also,
    # preserve attributes (comments) for the data frame itself.
    data2 <- data.frame(lapply(data, function(x) 
      { structure(x, class = c("avector", class(x))) } ))
    #mostattributes(data2) <- attributes(data)
    attributes(data2) <- attributes(data)

    result <- do.call(handler, list(indicator, data2))
    saveRDS(result, rdataFile)
    rm(result)
  }
  else {
    stop("RDS file for '", indicator, "' not found! Run 'make' first.")
  }
}


## Preserve object's special attributes:
## use a class with a "as.data.frame" and "[" method

as.data.frame.avector <- as.data.frame.vector

`[.avector` <- function (x, ...) {
  #attr <- attributes(x)
  r <- NextMethod("[")
  mostattributes(r) <- attributes(x)
  #attributes(r) <- attr
  return (r)
}

##### HANDLER FUNCTION DEFINITIONS #####

projectAge <- function (indicator, data) {

  # do not process, if target column already exists
  if ("Project Age" %in% names(data)) {
    message("Project Age: ", appendLF = FALSE)
    message("Not processing - Transformation already performed!\n")
    return (invisible())
  }

  transformColumn <- as.numeric(unlist(data["Registration Time"]))
  regTime <- as.POSIXct(transformColumn, origin="1970-01-01")
  prjAge <- difftime(Sys.Date(), as.Date(regTime), units = "weeks")
  data[["Project Age"]] <- as.numeric(round(prjAge)) / 4 # in months

  # now we can delete the source column
  if ("Registration Time" %in% names(data))
    data <- data[setdiff(names(data), "Registration Time")]

  if (DEBUG2) {print(summary(data)); print("")}

  return (data)
}


projectLicense <- function (indicator, data) {

  # do not process, if target column (type) already exists
  if (is.factor(data[["Project License"]])) {
    message("Project License: ", appendLF = FALSE)
    message("Not processing - Transformation already performed!\n")
    return (invisible())
  }

  data[["Project License"]] <- 
    factor(data[["Project License"]],
           levels = c('gpl', 'lgpl', 'bsd', 'other',
                      'artistic', 'public', '(Other)'),
           labels = c('GPL', 'LGPL', 'BSD', 'Other',
                      'Artistic', 'Public', 'Unknown'))

  if (DEBUG2) {print(summary(data)); print("")}

  return (data)
}


devTeamSize <- function (indicator, data) {

  var <- data[["Development Team Size"]]

  # convert data type from 'character' to 'numeric' 
  if (!is.numeric(var)) {
    data[["Development Team Size"]] <- as.numeric(var)
  }

  if (DEBUG2) {print(summary(data)); print("")}

  return (data)
}


##### MAIN #####

# construct list of indicators & corresponding transform. functions
indicators <- c("prjAge", "prjLicense", "devTeamSize")
transforms <- list(projectAge, projectLicense, devTeamSize)

# sequentially call all previously defined transformation functions
lapply(seq_along(indicators),
       function(i) {
         transformResult("SourceForge",
                         indicators[[i]], transforms[[i]])
         })

After the second run of this code, the names "Project Age" and "Project License", as well as other user-defined attributes of the data frame data2, are lost.
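The avector approach from the excerpt above can be checked in isolation. The following is a minimal sketch, not the original project code: the `units` attribute and the sample values are made up for illustration.

```r
# Assumption: 'units' is a hypothetical user-defined attribute.
as.data.frame.avector <- as.data.frame.vector

`[.avector` <- function(x, ...) {
  r <- NextMethod("[")                 # default subsetting drops custom attributes
  mostattributes(r) <- attributes(x)   # re-apply them (names/dim only if still valid)
  r
}

plain  <- structure(c(10, 20, 30), units = "weeks")
tagged <- structure(c(10, 20, 30), units = "weeks",
                    class = c("avector", "numeric"))

plain_sub  <- plain[1:2]    # 'units' is dropped by the default method
tagged_sub <- tagged[1:2]   # 'units' survives via `[.avector`

df     <- data.frame(age = tagged)
df_sub <- df[1:2, , drop = FALSE]   # row subsetting calls `[` on each column
```

Comparing `attributes(plain_sub)` with `attributes(tagged_sub)` and `attributes(df_sub$age)` shows whether a given operation goes through the class method or bypasses it.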

My question here is multifaceted:

1) what statements in my code could lead to the loss of attributes AND WHY;

2) which line of code is correct (mostattributes <- attributes or attributes <- attributes/attr) in transformResult() and in the avector class definition AND WHY;

3) is the statement as.data.frame.avector <- as.data.frame.vector really needed if I add the class attribute avector to a data frame object and, in general, prefer a generic solution (applicable not only to data frames); WHY OR WHY NOT;

4) saving via attr in the class definition doesn't work; it fails with the following error:

Error in attributes(r) <- attr :
  'names' attribute [5] must be the same length as the vector [3]
Calls: lapply ... summary.data.frame -> lapply -> FUN -> summary.default -> [ -> [.avector

So, I had to go back to using mostattributes(). Is that OK?
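The error in 4) can be reproduced outside the project. This is a minimal sketch with made-up values: `attributes<-` assigns every attribute, including a `names` vector that no longer fits the shortened result, whereas `mostattributes<-` assigns `names`, `dim` and `dimnames` only when they are valid for the target.

```r
# Assumption: 'units' is a hypothetical user-defined attribute.
x <- structure(c(a = 1, b = 2, c = 3, d = 4, e = 5), units = "weeks")
r <- x[1:3]   # default "[" keeps the matching names but drops 'units'

# attributes<- tries to attach all five names to a length-3 vector and fails
err <- tryCatch({ attributes(r) <- attributes(x); NULL },
                error = function(e) conditionMessage(e))

# mostattributes<- does not assign the five ill-fitting names, but restores 'units'
r2 <- x[1:3]
mostattributes(r2) <- attributes(x)
```

Here `err` holds the error message instead of aborting the script, and `attr(r2, "units")` can be inspected to confirm that the user-defined attribute came back.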

==========

I have read the following on the subject:


  1. SO question: How to delete a row from a data.frame without losing the attributes (http://stackoverflow.com/questions/10404224/how-to-delete-a-row-from-a-data-frame-without-losing-the-attributes). I like the solution by Ben Barns, but it differs a bit from the one suggested by Gabor Grothendieck and Marc Schwartz (see below);

  2. SO question: indexing operation removes attributes (while the solution is legible, I prefer one based on a class definition /sub-classing?/);

  3. A generic solution suggested by Heinz Tuechler (https://stat.ethz.ch/pipermail/r-help/2006-July/109148.html): do I need this?;

  4. An explanation by Brian Ripley (http://r.789695.n4.nabble.com/Losing-attributes-in-data-frame-PR-10873-tp919265p919266.html): I found it somewhat confusing;

  5. A solution suggested by Gabor Grothendieck (https://stat.ethz.ch/pipermail/r-help/2006-May/106308.html);

  6. An explanation of Gabor Grothendieck's solution by Marc Schwartz (https://stat.ethz.ch/pipermail/r-help/2006-May/106351.html): a very nice explanation;

  7. Sections 8.1.28 and 8.1.29 of the "R Inferno" book (www.burns-stat.com/pages/Tutor/R_inferno.pdf): I've tried his suggestion of using storage.mode(), but it doesn't really solve the problem, as coercing via storage.mode() doesn't affect the class of an object (not to mention that it doesn't cover attribute-clearing operations other than coercion, such as subsetting and indexing);

  8. http://adv-r.had.co.nz/Data-structures.html#attributes;

  9. http://stat.ethz.ch/R-manual/R-devel/library/base/html/attributes.html;

  10. http://cran.r-project.org/doc/manuals/r-devel/R-lang.html#Copying-of-attributes.
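The point about coercion in the R Inferno item can be illustrated with a minimal sketch (the sample values and the `units` attribute are made up): `as.numeric()` returns a bare vector, while assigning to `storage.mode()` changes the type in place and leaves the attributes alone.

```r
# Assumption: 'units' is a hypothetical user-defined attribute.
x <- structure(c("3", "12", "7"), units = "developers")

y <- as.numeric(x)            # all attributes are stripped
z <- x
storage.mode(z) <- "double"   # type changes, attributes stay
```

This is relevant to handlers such as devTeamSize() in the question, which replace a column with the result of `as.numeric()`: that call is one place where a column's attributes would be dropped.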

P.S. I believe that this question is of a general nature, so I haven't provided a reproducible example at this time. I hope that it's possible to answer it without such an example but, if not, please let me know.

Recommended answer

I'm answering my own question, although for now only partially:

1) Under closer investigation and after some code updates, it appears that attributes are in fact NOT being lost (I'm still trying to figure out which changes produced the expected behavior and will report later).

2) I have figured out the reason for the intermittent output and for losing all cached data after the transformation, as follows. During multiple subsequent runs of the code, the second run of each transformation (handler) function (projectAge(), projectLicense() and devTeamSize()) returns NULL, since the transformation has already been done:

if (<condition>) {
  ...
  message("Not processing - Transformation already performed!\n")
  return (invisible()) # <= returns NULL
}

The returned NULL was then passed to saveRDS(), which caused the loss of the cached data.
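The mechanism can be reproduced with a temporary file. This is a minimal sketch, assuming a simplified stand-in handler and a made-up one-column data frame rather than the project's real cache:

```r
# Simplified stand-in for a handler with an early exit.
handler <- function(data) {
  if ("Project Age" %in% names(data)) return(invisible())  # value is NULL
  data[["Project Age"]] <- 1
  data
}

cacheFile <- tempfile(fileext = ".rds")
saveRDS(data.frame(x = 1:3), cacheFile)

saveRDS(handler(readRDS(cacheFile)), cacheFile)  # run 1: stores transformed data
afterRun1 <- readRDS(cacheFile)

saveRDS(handler(readRDS(cacheFile)), cacheFile)  # run 2: stores NULL
afterRun2 <- readRDS(cacheFile)
```

After the first run the cache holds the data frame with the new column; after the second run it holds only NULL, so the data frame and everything attached to it are gone.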

I fixed this problem by validating the result before saving the transformed object:

# the next line is problematic due to wrong assumption of always having full data returned
result <- do.call(handler, list(indicator, data2))
if (!is.null(result)) saveRDS(result, rdataFile) # <= fixed by validating incoming data
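A slightly stricter variant of the same guard, offered only as a sketch under assumptions (the helper name `saveIfDataFrame` is hypothetical and not part of the original code), checks that the handler returned a data frame rather than merely something non-NULL:

```r
# Hypothetical helper: write the cache only when the result is a data frame.
saveIfDataFrame <- function(result, file) {
  if (!is.data.frame(result)) return(invisible(FALSE))
  saveRDS(result, file)
  invisible(TRUE)
}

cacheFile <- tempfile(fileext = ".rds")
saveRDS(data.frame(x = 1:3), cacheFile)

skipped <- saveIfDataFrame(NULL, cacheFile)                 # handler skipped its work
kept    <- readRDS(cacheFile)                               # cache is untouched
written <- saveIfDataFrame(data.frame(x = 4:6), cacheFile)  # handler returned data
```

Returning a logical flag lets the caller log whether the cache was actually rewritten. Another option would be for the handlers to return the input data unchanged instead of NULL when there is nothing to do.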

That's all for now, thank you for reading! I will keep updating this answer until all the issues are clarified.
