Using gsub function with multiple criteria in R


Problem description

Follow-up to the question "Searching for unique values in dataframe and creating a table with them".

Here is what my data looks like:

    UUID    Source
1   Jane    http//mywebsite.com44bb00?utm_source=ADW&utm_medium=banner&utm_campaign=Monk&gclid1234
2   Mike    http//mywebsite.com44bb00?utm_source=Google&utm_medium=cpc&utm_campaign=DOG&gclid1234
3   John    http//mywebsite.com44bb00?utm_source=Yahoo&utm_medium=banner&utm_campaign=DOG&gclid1234
4   Sarah   http//mywebsite.com44bb00?utm_source=Facebookdw&utm_medium=cpc&utm_campaign=CAT&gclid1234
5   Michael http//mywebsite.com44bb00?utm_source=Twitter&utm_medium=GDNr&utm_campaign=CAT&gclid1234
6   Bob     http//mywebsite.com44bb00?utm_source=ADW&utm_medium=GDN&utm_campaign=DOG&gclid1234
7   Mark    http//mywebsite.com44bb00?utm_source=Twitter&utm_medium=banner&utm_campaign=MONK&gclid1234
8   Anna    http//mywebsite.com44bb00?utm_source=Facebook&utm_medium=banner&utm_campaign=MONK&gclid1234

And here is the output I am trying to achieve:

    NAME    UTM_SOURCE  UTM_MEDIUM  UTM_CAMPAIGN
1   Jane    ADW             banner     Monk
2   Mike    Google          cpc        DOG
3   John    Yahoo           banner     DOG
4   Sarah   Faceboo         cpc        CAT
5   Michael Twitter         GDN        CAT
6   Bob     ADW             GDN        DOG
7   Mark    Twitter         banner     MONK
8   Anna    Facebook        banner     MONK

So in other words, what I want is to obtain a specific piece of information based on a criterion. Example: search the data frame for the value "utm_source=" and, when found, copy whatever is found between the "=" and "&" signs. In the case of user no. 1 (Jane), if you look at the original file, her Source URL contains the value "utm_source=ADW". In the output file, the "ADW" bit is extracted and placed in a new column named "utm_source". The same principle applies to all other users and the other dimensions (utm_medium and utm_campaign).

I understand that the function gsub can help me. Here is what I have tried so far:

> file1 <- read.csv("C:/Users/Dumitru Ostaciu/Desktop/Users.csv")
> file1 <- transform(file1, Source = as.character(Source))
> file2 <- gsub(".*\\?utm_source=", "", file1$Source)


And this is the result I got

  UUID  SOURCE
    1   ADW&utm_medium=banner&utm_campaign=Monk&gclid1234
    2   Google&utm_medium=cpc&utm_campaign=DOG&gclid1234
    3   Yahoo&utm_medium=banner&utm_campaign=DOG&gclid1234
    4   Facebookdw&utm_medium=cpc&utm_campaign=CAT&gclid1234
    5   Twitter&utm_medium=GDNr&utm_campaign=CAT&gclid1234
    6   ADW&utm_medium=GDN&utm_campaign=DOG&gclid1234
    7   Twitter&utm_medium=banner&utm_campaign=MONK&gclid1234
    8   Facebook&utm_medium=banner&utm_campaign=MONK&gclid1234   

I have two questions:

1) In the output that I got, the function copied everything that followed the value "utm_source=". How do I add another condition so that only what is between "=" and "&" is copied?

2) How do I keep the values that were initially in the first column (UUID): Jane, Mike, John, etc.?

Recommended answer

You need to do two things:

  1. Use gsub to strip the website name from your Source
  2. Use strsplit to separate the remaining string at each occurrence of &
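
To make the two steps concrete, here is a minimal sketch applied to a single Source value taken from the question's data. The variable names (url, query, parts, values) are illustrative only and are not part of the original answer.

```r
# One Source value copied from the question's data
url <- "http//mywebsite.com44bb00?utm_source=ADW&utm_medium=banner&utm_campaign=Monk&gclid1234"

# Step 1: drop everything up to and including the "?"
query <- gsub(".*\\?", "", url)

# Step 2: split the remaining query string at each literal "&"
parts <- strsplit(query, "&", fixed = TRUE)[[1]]

# Strip the "name=" prefix from each piece, keeping only the value
values <- gsub(".*=", "", parts)
print(values)
```

The last element has no "=" in it, so it passes through unchanged; that is why a trailing gclid column has to be dropped later.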

Read the data:

x <- read.table(text="
UUID    Source
1   Jane    http//mywebsite.com44bb00?utm_source=ADW&utm_medium=banner&utm_campaign=Monk&gclid1234
2   Mike    http//mywebsite.com44bb00?utm_source=Google&utm_medium=cpc&utm_campaign=DOG&gclid1234
3   John    http//mywebsite.com44bb00?utm_source=Yahoo&utm_medium=banner&utm_campaign=DOG&gclid1234
4   Sarah   http//mywebsite.com44bb00?utm_source=Facebookdw&utm_medium=cpc&utm_campaign=CAT&gclid1234
5   Michael http//mywebsite.com44bb00?utm_source=Twitter&utm_medium=GDNr&utm_campaign=CAT&gclid1234
6   Bob     http//mywebsite.com44bb00?utm_source=ADW&utm_medium=GDN&utm_campaign=DOG&gclid1234
7   Mark    http//mywebsite.com44bb00?utm_source=Twitter&utm_medium=banner&utm_campaign=MONK&gclid1234
8   Anna    http//mywebsite.com44bb00?utm_source=Facebook&utm_medium=banner&utm_campaign=MONK&gclid1234", header=TRUE, stringsAsFactors=FALSE)

Use strsplit to separate the Source string at each &:

z <- matrix(
  unlist(strsplit(gsub(".*\\?", "", x$Source), "\\&")), 
  ncol=4, byrow=TRUE)
cbind(x$UUID, gsub(".*=", "", z))

     [,1]      [,2]         [,3]     [,4]   [,5]       
[1,] "Jane"    "ADW"        "banner" "Monk" "gclid1234"
[2,] "Mike"    "Google"     "cpc"    "DOG"  "gclid1234"
[3,] "John"    "Yahoo"      "banner" "DOG"  "gclid1234"
[4,] "Sarah"   "Facebookdw" "cpc"    "CAT"  "gclid1234"
[5,] "Michael" "Twitter"    "GDNr"   "CAT"  "gclid1234"
[6,] "Bob"     "ADW"        "GDN"    "DOG"  "gclid1234"
[7,] "Mark"    "Twitter"    "banner" "MONK" "gclid1234"
[8,] "Anna"    "Facebook"   "banner" "MONK" "gclid1234"

And then convert to a data frame and add names:

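# Split each query string at "&" and arrange the pieces as one row per URL
z <- matrix(
  unlist(strsplit(gsub(".*\\?", "", x$Source), "\\&")), 
  ncol=4, byrow=TRUE)
# Keep only the text after "=" and prepend the UUID column
z <- cbind(x$UUID, gsub(".*=", "", z))
# Drop the trailing gclid column (column 5) and convert to a data frame
z <- as.data.frame(z[, -5])
names(z) <- c("UUID", "UTM_SOURCE", "UTM_MEDIUM", "UTM_CAMPAIGN")
z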

     UUID UTM_SOURCE UTM_MEDIUM UTM_CAMPAIGN
1    Jane        ADW     banner         Monk
2    Mike     Google        cpc          DOG
3    John      Yahoo     banner          DOG
4   Sarah Facebookdw        cpc          CAT
5 Michael    Twitter       GDNr          CAT
6     Bob        ADW        GDN          DOG
7    Mark    Twitter     banner         MONK
8    Anna   Facebook     banner         MONK
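
The matrix approach above relies on every URL having the same four parameters in the same order. If that cannot be guaranteed, one possible alternative is to pull each parameter out by name with sub and a capture group. The sketch below is an assumption-laden illustration, not the answer author's method: extract_param is a hypothetical helper, and the second URL in src is made up to show a reordered query string with a missing parameter.

```r
# Hypothetical helper (not from the original answer): return the value of
# one query parameter for each URL, or NA where the parameter is absent.
extract_param <- function(url, name) {
  # Match "?name=" or "&name=", then capture everything up to the next "&"
  pattern <- paste0(".*[?&]", name, "=([^&]*).*")
  found <- grepl(pattern, url)
  ifelse(found, sub(pattern, "\\1", url), NA_character_)
}

src <- c(
  # Taken from the question's data
  "http//mywebsite.com44bb00?utm_source=ADW&utm_medium=banner&utm_campaign=Monk&gclid1234",
  # Made-up URL: parameters reordered and utm_campaign missing
  "http//mywebsite.com44bb00?utm_medium=cpc&utm_source=Google&gclid1234"
)

res <- data.frame(
  UTM_SOURCE   = extract_param(src, "utm_source"),
  UTM_MEDIUM   = extract_param(src, "utm_medium"),
  UTM_CAMPAIGN = extract_param(src, "utm_campaign"),
  stringsAsFactors = FALSE
)
print(res)
```

Because each column is looked up independently, the result does not depend on parameter order, and a missing parameter shows up as NA instead of shifting the other values into the wrong column.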
