Reading in files with two rows for the header


Problem description

I have been using SurveyGizmo, which can export data as a CSV file, but alas the export has two header rows. The first row specifies the question and the second row contains the possible responses that the respondent could have checked off. This seems highly aberrant in the data-read-and-write world but quite normal in the survey world. How does one read such a file into R?

SurveyGizmo used to have an "old" export format that put everything into one row, but it seems the company no longer supports it. In a simple survey, an intern who was helping me was able to overcome the problem with the code below. However, with a longer survey that has more questions and longer questions (and thus longer headers), this brute-force method is not working.

# Read a CSV file with two header rows and append the second row to the first
df <- read.csv(csvfile, skip = 1, stringsAsFactors = FALSE)  # skip row 1; row 2 serves as a temporary header
hl <- readLines(csvfile, 2)   # read the two header lines as character strings
hl <- strsplit(hl, ',')       # split the headers on commas
colnames(df) <- sub('_$', '', paste(hl[[1]], hl[[2]], sep = ""))  # join second row to first row

In the end I want a data frame with column headings, which I will then merge with another data frame coming from a follow-up survey.

Here is an example of the CSV file with two header rows. The 3rd and final row is the first line of data (not real information).

"","","","","","","","","","Inclusion Criteria I or my child is a patient with recurrent respiratory papillomatosis (RRP)How do you know that you or your child has RRP? Please check whatever is true.","","","Exclusion Criteria Do any of the following apply? Please put a check next to any condition that is present.In the unlikely event that one of the following conditions apply, then unfortunately we cannot enroll you in this study. You could stop or you could carry on telling us about yourself, whichever you prefer. ","","Confused or have questions?If you are confused about any items or if you want us to clarify something then here is the place that you can express yourself freely. Also, you can call us at (412) 567-7870 or at (888) 887-7729.You are encouraged to review the consent form. You do not have to sign it now but you will need to do so once we enroll you. ","Please tell us who you are - referring to you, the person completing the form. Different people feel differently about their privacy and about how they are contacted. We will do our utmost to protect your privacy. Please do not give us your e-mail address if you do not want us to use it. Remember that e-mail should be private but is not always so. The safest way to think about it is as if e-mail was similar to a post card. Please do not give us a telephone number you do not want us to contact you on.","","","","","","","","","","","Who are you? Are you the patient or a parent or someone else?","When was the person with RRP born?Enter the date as MM/DD/YYYY","Approximately when was RRP diagnosed? This can be very approximate. If you do not remember the date then please put down your best guess. We will use it to work out how old the patient was when he or she was diagnosed. Enter the date as MM/DD/YYYY.","Has the patient with RRP ever received Gardasil? 
Gardasil is a vaccine against HPV 6, 11, 16 and 18 that was approved by the Food and Drug Administration (FDA) for use in females to prevent gynecologic diseases. ","Please ignore this question. It is for our internal tracking. Are you?","gender","race","Has there been human contact? By e-mail or by telephone or by anything in which we discussed informed consent","What is the subject number?","Merck Research Laboratory Accession Number?","Second Merck  Accession Number?","FedEx Tracking Number","Date Shipped Out","Date EMSI Notified"
"Response ID","RespondantKey","Edit Link","IP","Date Started","Date Finished","Status","Linked From","Comments","histopathconfirm","surgeonseaid","other","cancer","none","","First Name","Last Name","Street Address","Apt/Suite/Office","City","State","Postal Code","Country","Email Address","Phone Number","Mobile Phone","","","","","","","","","","","","","",""
"6990181","4099941","http://s-gtzd7-14166.sgizmo.com/?edit=6770181&cc=e246ecb7095b983xxxxx7ec0a9","1991.157.178.134","2009-04-30 07:57:24","2009-04-15 14:56:01","Submitted","","Spoke to her Thursday, 20 Apr 2009 20:26. No questions ready to go.09/11/2009 consent mailed..mrs accession number 304074333811wp, 01wp SFJB06123 Fedex tracking 865888887357 sent Tues April 29; called her Thurs, 10 May 2009 20:21 she will sign slip","histopathconfirm","surgeonseaid","","","none","","Jane","Doe","23 Hastings Rd","29th floor","Oranje","ny","27935","USA","mystry@gmail.com","728-850-7252","626-922-2239","Patient","02/21/1965","01/01/1976","No","Key Person","","","Yes","SFJB06123","304033385811wp","303334485801wp","865333807357","4/11/2007","4/11/2007"


Recommended answer

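Why not just have read.csv read in the first header row (which holds the actual headers, as I understand your question) and then skip the next row:

hdr <- names(read.csv(file, nrows = 1))                          # column names from header row 1
df <- read.csv(file, header = FALSE, skip = 2, col.names = hdr)  # data starts on line 3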
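If the second header row carries information worth keeping (for instance, the checkbox options that share one question), one possible variant is to let read.csv parse both header rows itself, so that commas inside quoted question text do not split a header. The following is only a sketch under assumptions: the toy file, its column contents, and the choice to join the two rows with a space and clean them up with make.names are all illustrative and not taken from a real SurveyGizmo export.

```r
# Toy stand-in for a two-header export; the contents are made up.
csv_text <- c(
  '"","Exclusion criteria, if any","","gender"',
  '"Response ID","cancer","none",""',
  '"6990181","","none","female"'
)
csvfile <- tempfile(fileext = ".csv")
writeLines(csv_text, csvfile)

# Read every record as character with no header, so that read.csv
# (not strsplit) deals with the quoting.
raw <- read.csv(csvfile, header = FALSE, colClasses = "character")

question <- unlist(raw[1, ], use.names = FALSE)  # first header row
option   <- unlist(raw[2, ], use.names = FALSE)  # second header row

# Join the two rows; trimws() drops the stray space when one side is blank.
combined <- trimws(paste(question, option))

# Keep only the data records and attach syntactically valid, unique names.
df <- raw[-(1:2), , drop = FALSE]
names(df) <- make.names(combined, unique = TRUE)
rownames(df) <- NULL

# Everything was read as character, so re-guess the column types.
df[] <- lapply(df, type.convert, as.is = TRUE)
```

Because the whole file is read as records rather than as physical lines, this variant does not rely on skip, which counts lines and can miscount when a quoted header contains a line break.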

Alternatively, if that second header line begins with an idiosyncratic character (one not found in your data), you can have that line treated as a comment by passing the character to the comment.char argument. If the line began with "#", for instance, it would be:

read.csv(file, header = TRUE, comment.char = "#")
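A small self-contained illustration of the comment.char approach follows. The data are invented, and the sketch assumes the second header line starts with an unquoted "#". As far as I know, R does not treat a comment character inside a quoted string as a comment, so a line that starts with a quote (as in the sample export in the question) would not be dropped this way, and an unquoted "#" anywhere in the data would cut that line short.

```r
# Toy data: the second line plays the role of the unwanted header row.
csv_text <- c(
  "id,gender,race",
  "#Response ID,,",
  "6990181,female,white"
)

# Everything from "#" to the end of the line is ignored, which leaves the
# second line blank; blank lines are skipped by default.
df <- read.csv(text = csv_text, header = TRUE, comment.char = "#")
```

Note that read.csv disables comments by default (comment.char = ""), so the argument has to be set explicitly.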
