Converting string IDs into numbers in a multilevel analysis using R
I have two data sets, one for student level data and another one for class level data. Student and class level IDs are generated as string values like:
Student data set:
student ID ->141PSDM2L,1420CHY1L,1JNLV36HH,1MNSBXUST,2K7EVS7X6,2N2SC26HL,...
class ID ->XK37HDN,XK37HDN,XK37HDN,3K3EH77,3K3EH77,2K36HN6,...
class level data set:
class ID ->XK37HDN,3K3EH77,2K36HN6,3K3LHSH,3K3LHSY,DK3EH14,DK3EH1H,DK3EH1K,...
In the student data set, each class ID is repeated once for every student in that class, but in the class-level data set there is only one code for each class.
How can I convert those IDs into integers, taking both the student-level and class-level IDs into account? In other words, I want to have IDs as below (or something similar):
Student data set:
student ID ->1,2,3,4,5,6,...
class ID ->1,1,1,2,2,3,...
class level data set:
class ID ->1,2,3,4,5,6,7,8,...
Converting the student-level data is not difficult. The problem arises when I convert the class-level data. Because class IDs repeat in the student data set, the converted class IDs there take values from 1 to 1533, but applying the same conversion method to the class-level data produces values from 1 to 896. So I don't know whether, for example, class ID 45 in the student-level data refers to the same class as class ID 45 in the class-level data set.
You can do this by creating factors from each of the ID vectors and changing the levels to numeric values:
classIDs <- as.factor(classIDs)
# Relabel the levels as 1..n (levels are stored as character strings)
levels(classIDs) <- seq_along(levels(classIDs))
This will replace each of the unique classIDs strings with a numeric label. The result is still a factor; as.integer(classIDs) returns the underlying integer codes.
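As a minimal sketch of the factor approach, the snippet below uses the first six class IDs listed in the question. Note that as.factor() sorts the levels, so the numbering would not follow the order in which IDs first appear; passing levels = unique(...) to factor() is one way to get the 1,1,1,2,2,3 numbering the question asks for. The names classFactor and classNum are made up for this illustration.

```r
# Sample class IDs copied from the question's student data set
classIDs <- c("XK37HDN", "XK37HDN", "XK37HDN", "3K3EH77", "3K3EH77", "2K36HN6")

# Fix the level order to first appearance so the numbering follows the data
# (without `levels =`, factor() would sort the level strings instead)
classFactor <- factor(classIDs, levels = unique(classIDs))

# The integer codes of the factor are the numeric IDs
classNum <- as.integer(classFactor)
print(classNum)  # 1 1 1 2 2 3
```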
Edit: ClassIDs in multiple tables:
Based on the comment below, there are also classIDs in the student table. This requires a slightly more complicated solution.
# Some assumptions on variable names:
# classes: the data.frame with all of the class data. Has classIDs as a column.
# students: the data.frame with the student-class pairings. Has both classIDs
#   and studentIDs as columns.
# First we get a vector of all unique classes
# (as.character() guards against the columns having been read in as factors):
allClasses <- unique(c(as.character(classes$classIDs), as.character(students$classIDs)))
# Now a named vector mapping classes to numeric values:
numMap <- seq_along(allClasses)
names(numMap) <- allClasses
# Now we can use numMap to reassign numeric values.
# The lookup is by name, so index with character strings; unname() drops the labels.
classes$classIDs <- unname(numMap[as.character(classes$classIDs)])
students$classIDs <- unname(numMap[as.character(students$classIDs)])
# clean up
rm(allClasses)
studentIDs can still be replaced with the factor method above.
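For a quick check that both tables end up with consistent numbers, here is a small self-contained sketch. It builds toy versions of the two tables from the sample IDs in the question and uses match() against one shared key as an alternative to the named-vector lookup. The column names classNum and studentNum and the variable key are assumptions made for this example, not part of the original data.

```r
# Toy tables built from the sample IDs in the question
classes <- data.frame(
  classIDs = c("XK37HDN", "3K3EH77", "2K36HN6", "3K3LHSH"),
  stringsAsFactors = FALSE
)
students <- data.frame(
  studentIDs = c("141PSDM2L", "1420CHY1L", "1JNLV36HH",
                 "1MNSBXUST", "2K7EVS7X6", "2N2SC26HL"),
  classIDs   = c("XK37HDN", "XK37HDN", "XK37HDN",
                 "3K3EH77", "3K3EH77", "2K36HN6"),
  stringsAsFactors = FALSE
)

# One shared key: every class string seen in either table, in order of appearance
key <- unique(c(classes$classIDs, students$classIDs))

# match() returns the position of each string in the key, so the same
# string receives the same integer in both tables
classes$classNum  <- match(classes$classIDs, key)
students$classNum <- match(students$classIDs, key)

# Student IDs appear in only one table, so they can be numbered on their own
students$studentNum <- match(students$studentIDs, unique(students$studentIDs))

# Sanity check: each number maps back to the original string in both tables
stopifnot(all(key[classes$classNum] == classes$classIDs))
stopifnot(all(key[students$classNum] == students$classIDs))

print(classes$classNum)   # 1 2 3 4
print(students$classNum)  # 1 1 1 2 2 3
```

Because both columns are numbered against the same key, class 2 in the student table and class 2 in the class table are guaranteed to be the same class, which is the property the question asks about.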