Issue with reading and writing a CSV file in R


Problem description

I have a table in CSV format with the following data:

            1           3            1          2
1415_at 1   8.512147859 8.196725061 8.174426394 8.62388149
1411_at 2   9.119200527 9.190318548 9.149239039 9.211401637
1412_at 3   10.03383593 9.575728316 10.06998673 9.735217522
1413_at 4   5.925999419 5.692092375 5.689299161 7.807354922

When I read it with:

m <- read.csv("table.csv")

and print the values of m, I notice that they change to:

        X   X.1        X1       X3      X1.1       X2
1 1415_at   1       8.512148 8.196725  8.174426 8.623881

I then keep only the columns that are labelled 1 or 2, using:

smallerdat <- m[ grep("^X$|^X.1$|^X1$|^X2$|1\\.|2\\." , names(m) ) ]

write.csv(smallerdat,"table2.csv")

This writes the file with those annoying headers, plus an extra first column that I do not need:

      X   X.1        X1             X1.1       X2
1 1415_at   1       8.512148   8.174426 8.623881

So when I open the data in Excel, the headers are still X, X.1 and so on. What I need is for the headers to stay as they were in the original file:

                     1      1           2
1415_at 1       8.512147859 8.174426394 8.62388149

Any help?

Please also note the first column that is added automatically. I do not need it, so how can I get rid of that column?

Solution

There are two issues here.

  1. For reading your CSV file, use:

    m <- read.csv("table.csv", check.names = FALSE)
    

    Notice that by doing this, though, you can't use the column names as easily. You have to quote them with backticks instead, and will most likely still run into problems because of duplicated column names:

    m$1
    # Error: unexpected numeric constant in "m$1"
    m$`1`
    # [1]  8.512148  9.119201 10.033836  5.925999
    

  2. For writing your "m" object to a CSV file, use:

    write.csv(m, "table2.csv", row.names = FALSE)
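
The two steps above can be tried end to end with a small sketch. The sample file contents and the variable names `csv_lines`, `infile`, `outfile`, `first_one` and `header_line` are assumptions made for illustration; only `read.csv(..., check.names = FALSE)` and `write.csv(..., row.names = FALSE)` come from the answer itself.

```r
# Hypothetical sample file: the header has two blank labels and a repeated "1".
csv_lines <- c(
  ",,1,3,1,2",
  "1415_at,1,8.512147859,8.196725061,8.174426394,8.62388149",
  "1411_at,2,9.119200527,9.190318548,9.149239039,9.211401637"
)
infile <- tempfile(fileext = ".csv")
writeLines(csv_lines, infile)

# Step 1: read without letting R rewrite the header labels.
m <- read.csv(infile, check.names = FALSE)
print(names(m))

# Non-syntactic names need backticks; with duplicated names,
# `$` returns the first matching column.
first_one <- m$`1`

# Step 2: write without the automatic row-name column.
outfile <- tempfile(fileext = ".csv")
write.csv(m, outfile, row.names = FALSE)
header_line <- readLines(outfile, n = 1)
print(header_line)
```

Printing `header_line` is a quick way to confirm that no `X` prefixes and no extra leading column were written before opening the file in Excel.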
    


After reading your file in using the method in step 1, you can subset as follows. If you wanted the unnamed leading columns and any columns named "3" or "2", you can use:

m[names(m) %in% c("", "3", "2")]
#                    3        2
# 1 1415_at 1 8.196725 8.623881
# 2 1411_at 2 9.190319 9.211402
# 3 1412_at 3 9.575728 9.735218
# 4 1413_at 4 5.692092 7.807355

Update: Fixing the names before using write.csv

If you don't want to start from step 1 for whatever reason, you can still fix your problem. While you've succeeded in taking a subset with your grep statement, that doesn't change the column names (not sure why you would expect that it should). You have to do this by using gsub or one of the other regex solutions.

Here are the names of the columns with the way you have read in your CSV:

names(m)
# [1] "X"    "X.1"  "X1"   "X3"   "X1.1" "X2"  
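
These names come from the `check.names = TRUE` default of `read.csv`, which passes the header through `make.names()` and makes the result unique. A minimal sketch that reproduces the renaming directly, where the vector `original` is an assumption standing in for the header row of the file:

```r
# The header labels as they appear in the file: two blanks, then 1, 3, 1, 2.
original <- c("", "", "1", "3", "1", "2")

# make.names() prefixes non-syntactic names with "X";
# unique = TRUE then appends ".1", ".2", ... to repeated names.
mangled <- make.names(original, unique = TRUE)
print(mangled)
# [1] "X"    "X.1"  "X1"   "X3"   "X1.1" "X2"
```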

You want to:

  • Remove all "X"s
  • Remove all ".some-number"

So, here's a workaround:

# Change the names in your original dataset
names(m) <- gsub("^X|\\.[0-9]$", "", names(m))
# Create a temporary object to match desired names
getme <- names(m) %in% c("", "1", "2")
# Subset your data
smallerdat <- m[getme]
# Reassign names to your subset
names(smallerdat) <- names(m)[getme]
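
The workaround can be checked on a toy data frame before running it on the real file. This is only a sketch under stated assumptions: the one-row data frame and the names `cleaned`, `keep` and `outfile` are made up for illustration, and the pattern uses `[0-9]+` instead of `[0-9]` so that multi-digit suffixes such as `.10` are also stripped. Note that either pattern would also mangle a genuine label such as "1.5".

```r
# Toy data frame with the mangled names that read.csv() produced by default.
m <- data.frame(X = "1415_at", X.1 = 1L, X1 = 8.512148, X3 = 8.196725,
                X1.1 = 8.174426, X2 = 8.623881)

# Strip the leading "X" and any trailing ".<digits>" suffix.
cleaned <- gsub("^X|\\.[0-9]+$", "", names(m))
names(m) <- cleaned

# Keep the unnamed leading columns and those labelled "1" or "2".
keep <- cleaned %in% c("", "1", "2")
smallerdat <- m[keep]

# Re-apply the cleaned names, since subsetting a data frame
# may make duplicated names unique again.
names(smallerdat) <- cleaned[keep]

# Write without row names and inspect the header line.
outfile <- tempfile(fileext = ".csv")
write.csv(smallerdat, outfile, row.names = FALSE)
print(readLines(outfile, n = 1))
```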
