Converting string IDs into numbers in a multilevel analysis using R


Problem description

I have two data sets, one for student-level data and another for class-level data. Student and class IDs are generated as string values like:

Student data set:

student ID -> 141PSDM2L, 1420CHY1L, 1JNLV36HH, 1MNSBXUST, 2K7EVS7X6, 2N2SC26HL, ...

class ID -> XK37HDN, XK37HDN, XK37HDN, 3K3EH77, 3K3EH77, 2K36HN6, ...

Class-level data set:

class ID -> XK37HDN, 3K3EH77, 2K36HN6, 3K3LHSH, 3K3LHSY, DK3EH14, DK3EH1H, DK3EH1K, ...

In the student data set, each class ID is repeated once for every student in that class, whereas the class-level data set has only one code for each class.

How can I convert these IDs into integers, taking both the student and class IDs into account? In other words, I want to have IDs like the following (or something similar):

Student data set:

student ID -> 1, 2, 3, 4, 5, 6, ...

class ID -> 1, 1, 1, 2, 2, 3, ...

Class-level data set:

class ID -> 1, 2, 3, 4, 5, 6, 7, 8, ...

Converting the student-level data is not difficult. The problem arises when I convert the class-level data. Because class IDs are repeated in the student data set, class IDs there take values from 1 to 1533, but applying the same conversion to the class-level data produces values from 1 to 896. So I don't know whether, for example, class ID 45 in the student-level data refers to the same class as class ID 45 in the class-level data set.
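
To make the concern concrete, here is a minimal sketch of how converting each table on its own can give the same class two different numbers. The toy vectors are hypothetical, modelled on the IDs shown in the question, and `factor()` is assumed as the conversion method since the question does not say which one was used.

```r
# Hypothetical toy data modelled on the IDs in the question
student_class <- c("XK37HDN", "XK37HDN", "XK37HDN", "3K3EH77", "3K3EH77", "2K36HN6")
class_table   <- c("XK37HDN", "3K3EH77", "2K36HN6", "3K3LHSH")

# Converting each vector independently: factor() numbers only the
# levels it sees in that vector, in sorted order
student_codes <- as.integer(factor(student_class))
class_codes   <- as.integer(factor(class_table))

# Number assigned to the same class, "XK37HDN", in each table
student_code_XK <- student_codes[student_class == "XK37HDN"][1]
class_code_XK   <- class_codes[class_table == "XK37HDN"]

print(c(student = student_code_XK, class = class_code_XK))
```

In this toy case the class table contains one class that has no students, which is enough to shift the numbering, so the two codes for "XK37HDN" disagree.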

Solution

You can do this by creating factors from each of the id vectors, and changing the levels to numeric values:

classIDs <- as.factor(classIDs)
levels(classIDs) <- 1:length(levels(classIDs))

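This will replace each of the unique classIDs strings with a numeric label. Note that the result is still a factor whose levels are the strings "1", "2", ..., so wrap it in as.integer() if you need plain integers. The numbers follow the sorted order of the original strings, not the order in which they appear.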
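
As a small illustration of the factor approach, the sketch below compares it with `match()` against `unique()`, which is a swapped-in alternative (not part of the original answer) that numbers IDs by first appearance, as in the question's desired output. The sample vector is hypothetical.

```r
# Hypothetical sample of class IDs, as they might appear in the student table
classIDs <- c("XK37HDN", "XK37HDN", "XK37HDN", "3K3EH77", "3K3EH77", "2K36HN6")

# Factor route: levels are sorted, so the codes follow sort order
by_sorted_level <- as.integer(factor(classIDs))

# match() route: codes follow the order of first appearance
by_first_seen <- match(classIDs, unique(classIDs))

print(by_sorted_level)
print(by_first_seen)
```

Either numbering is a valid set of integer IDs; which one is preferable depends only on whether the order of the numbers matters for the analysis.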

Edit: ClassIDs in multiple tables

Based on the comment below, there are also classIDs in the student table. This requires a slightly more complicated solution.

# Some assumptions on variable names:
# classes: The data.frame with all of the class data. Has classIDs as a column.
# students: The data.frame with the student-class pairings. Has both classIDs and
#           studentIDs as columns.

# First we get a vector of all unique class IDs across both tables.
# as.character() guards against the ID columns having been read in as factors.
allClasses <- unique(c(as.character(classes$classIDs),
                       as.character(students$classIDs)))

# Now a named vector mapping each class ID string to a numeric value:
numMap <- seq_along(allClasses)
names(numMap) <- allClasses

# Now we can use numMap to reassign numeric values.
# Indexing by character looks up by name (a factor index would use its
# integer codes instead); unname() drops the leftover string names.
classes$classIDs <- unname(numMap[as.character(classes$classIDs)])
students$classIDs <- unname(numMap[as.character(students$classIDs)])

# clean up
rm(allClasses)

studentIDs can still be replaced with the factor method above.
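
For a self-contained run of the shared-lookup idea, here is a hedged sketch on toy data. The data frames, the new column names `classNum` and `studentNum`, and the use of `union()` and `match()` in place of a named vector are all assumptions made for illustration; the original string IDs are kept so the result can be checked.

```r
# Hypothetical toy tables modelled on the question
classes <- data.frame(
  classIDs = c("XK37HDN", "3K3EH77", "2K36HN6", "3K3LHSH"),
  stringsAsFactors = FALSE
)
students <- data.frame(
  studentIDs = c("141PSDM2L", "1420CHY1L", "1JNLV36HH",
                 "1MNSBXUST", "2K7EVS7X6", "2N2SC26HL"),
  classIDs   = c("XK37HDN", "XK37HDN", "XK37HDN",
                 "3K3EH77", "3K3EH77", "2K36HN6"),
  stringsAsFactors = FALSE
)

# One shared lookup for both tables; the class table comes first so
# that its rows are numbered 1, 2, 3, ...
allClasses <- union(classes$classIDs, students$classIDs)

# Position of each ID in the shared lookup becomes its integer ID
classes$classNum    <- match(classes$classIDs, allClasses)
students$classNum   <- match(students$classIDs, allClasses)
students$studentNum <- match(students$studentIDs, unique(students$studentIDs))

# Sanity check: joining on the numeric ID should pair identical string IDs
check <- merge(students, classes, by = "classNum",
               suffixes = c(".student", ".class"))
print(all(check$classIDs.student == check$classIDs.class))
```

Because both tables are numbered from the same lookup, a given number refers to the same class in each table, which is the property the question asks for.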
