Subset data with dynamic conditions in R

Question

I have a dataset of 2,500 rows, each of which is a bank loan. Each loan has an outstanding amount and a collateral type (real estate, machine tools, etc.).

I need to draw a random selection from this dataset such that, for example, the total outstanding amount is 2.5 million ± 5% and at most 25% of the selected loans have the same asset class.

I found the function optim, but it requires an objective function and looks like it is designed for problems such as optimizing a stock portfolio, which is much more complex. Is there an easier way to achieve this?

I created a sample dataset that illustrates my question better:

dataset <- data.frame(balance=c(25000,50000,35000,40000,65000,10000,5000,2000,2500,5000)
                      ,Collateral=c("Real estate","Aeroplanes","Machine tools","Auto Vehicles","Real estate",
                                    "Machine tools","Office equipment","Machine tools","Real estate","Auto Vehicles"))

For example, I want 5 loans from this dataset whose outstanding balances sum to 200,000 (with a 10% margin), and no more than 40% of them may have the same collateral type (so at most 2 out of 5 in this example).
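To make the example's numbers concrete, the bounds can be computed directly in R. This is a small sketch: the variable names (target, margin, size, max_share and so on) are illustrative and not from the question, and reading "200.000" as 200,000 (a European thousands separator) is an assumption.

```r
# Worked bounds for the example in the question.
# Assumption: "200.000" means 200,000 (European thousands separator).
target    <- 200000  # desired total outstanding balance
margin    <- 0.10    # allowed deviation from the target (+/- 10%)
size      <- 5       # number of loans to draw
max_share <- 0.40    # maximum share of loans with the same collateral type

lower    <- target * (1 - margin)    # 180,000
upper    <- target * (1 + margin)    # 220,000 (up to floating-point rounding)
max_same <- floor(max_share * size)  # at most 2 of the 5 loans may share a type

# The same arithmetic for the real requirement (2.5 million +/- 5%).
# The 25% cap is a share, so the allowed count depends on how many loans are drawn.
real_lower <- 2.5e6 * (1 - 0.05)     # 2,375,000
real_upper <- 2.5e6 * (1 + 0.05)     # 2,625,000

cat("Example: total between", lower, "and", upper,
    "- at most", max_same, "loans per collateral type\n")
```

Because the cap is a share, it is safest to compare the observed share (count / size) against max_share rather than a rounded count, which is what the answer's function does.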

Please let me know if additional information is necessary. Many thanks, Tim

Recommended answer

This function works:

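pick_records <- function(df, size, bal, collat, max.it) {
  # Rejection sampling: draw `size` random rows and accept the draw only if
  #   * the total balance lies within +/- 10% of `bal`, and
  #   * no single collateral type makes up more than a share `collat` of the rows.
  for (j in seq_len(max.it)) {
    s_index <- sample(seq_len(nrow(df)), size)
    output <- df[s_index, ]
    total <- sum(output$balance)
    # Share of each collateral type in the draw. The labels are counted directly
    # (not converted with as.numeric), so this works for character columns too;
    # as.numeric() on character data gives NA and silently skips the check.
    shares <- table(output$Collateral) / size
    if (total > bal * 0.9 && total < bal * 1.1 && all(shares <= collat)) {
      return(output)
    }
  }
  # No acceptable draw within max.it attempts
  message("No solution found")
  invisible(NULL)
}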

> a <- pick_records(dataset,5,200000,0.4,20)
> a
  balance       Collateral
3   35000    Machine tools
7    5000 Office equipment
4   40000    Auto Vehicles
5   65000      Real estate
2   50000       Aeroplanes

Here df is your data frame, size is the number of records you want, bal is the target total balance (the sum must fall within ±10% of it), collat is the maximum share of the sample that may have the same collateral type, and max.it is the maximum number of random draws to try before giving up and reporting "No solution found". You can change these as you like.
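For a dataset as small as the 10-row sample, an alternative to repeated random draws is to enumerate every combination of 5 rows with combn(), keep the combinations that satisfy both constraints, and pick one of them at random. This is a sketch that assumes choose(nrow(df), size) is small (here choose(10, 5) is 252); for the full 2,500-row dataset the number of combinations is far too large, and random draws remain the practical choice. The helpers meets_constraints() and valid_subsets() are hypothetical names introduced only for this illustration.

```r
# Sample data from the question
dataset <- data.frame(
  balance = c(25000, 50000, 35000, 40000, 65000, 10000, 5000, 2000, 2500, 5000),
  Collateral = c("Real estate", "Aeroplanes", "Machine tools", "Auto Vehicles",
                 "Real estate", "Machine tools", "Office equipment",
                 "Machine tools", "Real estate", "Auto Vehicles")
)

# Hypothetical helper: do the rows `idx` meet the balance and collateral limits?
meets_constraints <- function(df, idx, bal, collat, margin = 0.1) {
  total  <- sum(df$balance[idx])
  shares <- table(df$Collateral[idx]) / length(idx)  # share per collateral type
  total > bal * (1 - margin) && total < bal * (1 + margin) && all(shares <= collat)
}

# Hypothetical helper: every valid subset, one subset (row indices) per column
valid_subsets <- function(df, size, bal, collat, margin = 0.1) {
  combos <- combn(nrow(df), size)  # each column is one candidate subset
  keep <- apply(combos, 2, function(idx) meets_constraints(df, idx, bal, collat, margin))
  combos[, keep, drop = FALSE]
}

valid <- valid_subsets(dataset, 5, 200000, 0.4)
ncol(valid)  # number of subsets that satisfy both constraints

set.seed(1)  # only to make the random pick reproducible
if (ncol(valid) > 0) {
  # sample.int() avoids sample()'s special case for a single number
  chosen <- valid[, sample.int(ncol(valid), 1)]
  dataset[chosen, ]
}
```

Unlike the rejection loop, enumeration tells you up front whether any valid subset exists at all (ncol(valid) == 0 means none), instead of stopping after max.it failed draws. Both approaches pick uniformly among the valid subsets, because the loop accepts uniformly random draws only when they are valid.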

Let me know if any part of it is unclear.
