Mapping sample data to actual csv data


Problem description


Thanks to Davy and everyone, I think I made progress. It still will not produce a line graph, but nothing in the code looks logically wrong. I take no credit here: I just cut and pasted what people smarter than me have figured out, but I still don't get a graph. Link to the GitHub csv at the end.

data = read.csv("C:/Users/12083/Desktop/librarydata.csv") # Read the data into R

head(data)                                            # Quality control, looks good
str(data)
data$dates = as.Date(data$dates, format = "%d/%m/%Y") # This formats the date as dates for R
library(tidyverse)                                    # This will import some functions that you need, specifically %>% and ggplot
# Step 0: look that the data makes sense to you
summary(data$dates)
summary(data$city)

# Step 1: filter the right data
start.date = as.Date("2003-01-02")
end.date   = as.Date("2010-05-04")

filtered = data %>% 
  filter(dates >= start.date & 
           dates <= end.date) # This will only take rows between those dates
summary(filtered)
colnames(filtered)

library(dplyr)

filtered_agg <- filtered %>%
  group_by(city, dates, Location) %>%
  summarize(location_sum=n()) 

filtered_agg
summary(filtered_agg)
# Step 2: Plotting
# Now you can create the plot with ggplot:
# Notes: 
# I added geom_point() so that each X value gets a point. 
# I think it's easier to read. You can remove this if you like
# Also added color, because I like it, feel free to delete



# The problem is in here - somewhere
Plot = ggplot(filtered_agg, aes(x=dates, y=Location, group = city)) + geom_line(aes(linetype=city, color = city)) + geom_point(aes(color=city))
Plot
dput

https://github.com/karl1776/chart

colnames(filtered)
 [1] "ï..Class.ID" "city" "dates" "year" "month"
 [6] "day" "cit" "Department.College" "Course.Level" "Course.Title"
[11] "Tour." "TILT." "Date.Taught" "Session.Number" "AM.PM"
[16] "Hour.Count" "Library.Instructor" "Other.Library.Instructor" "Duplicate." "Course.Instructor"
[21] "ACRL" "IPED" "Location" "Building.Room" "Distance.Class."
[26] "Location.of.Site.1" "Site.1.Number.of.Students" "Location.of.Site.2" "Site.2.Number.of.Students" "Location.of.Site.3"
[31] "Site.3.Number.of.Students" "Location.of.Site.4" "Site.4.Number.of.Students" "Location.of.Site.5" "Site.5.Number.of.Students"
[36] "Location.of.Site.6" "Site.6.Number.of.Students" "Location.of.Site.7" "Site.7.Number.of.Students" "Location.of.Site.8"
[41] "Site.8.Number.of.Students" "Location.of.Site.9" "Site.9.Number.of.Students" "Location.of.Site.10" "Site.10.Number.of.Students"

Maybe I just don't see it, but I have a hard time looking at examples with dummy data and translating them into loading actual data from a csv file. The picture shows my output from the dummy data, which is exactly what I want. When I use my actual data, nothing happens. Have I left out a ggplot command to print the plot?

library(readxl)
require(tidyverse)
require(ggplot2)
require(lubridate)
#load data
df <- read_excel("C:/Users/12083/Desktop/librarydata.xlsx")
#plot data
df_example %>%
  ggplot(aes(date,city, color=city))+
  geom_line(aes(linetype=lt))+ #you can use single string for the same linetype for all lines or a vector of strings for each data point
  scale_linetype_identity()+ #this removes the linetype from the legend
  theme_minimal()

df_example

I get this output -- this is exactly right but no plot to accompany it.

         city      dates classes       lt
1       Boise 2020-01-01      52    solid
2       Boise 2020-02-01      36    solid
3       Boise 2020-03-01      69    solid
4       Boise 2020-04-01     100    solid
5       Boise 2020-05-01      72    solid
6   Pocatello 2020-01-01      82   dashed
7   Pocatello 2020-02-01      15   dashed
8   Pocatello 2020-03-01      68   dashed
9   Pocatello 2020-04-01      17   dashed
10  Pocatello 2020-05-01      51   dashed
11  Salt Lake 2020-01-01      71   dotted
12  Salt Lake 2020-02-01      65   dotted
13  Salt Lake 2020-03-01      33   dotted
14  Salt Lake 2020-04-01      44   dotted
15  Salt Lake 2020-05-01      16   dotted
16 Twin Falls 2020-01-01       3  dotdash
17 Twin Falls 2020-02-01      30  dotdash
18 Twin Falls 2020-03-01      19  dotdash
19 Twin Falls 2020-04-01      34  dotdash
20 Twin Falls 2020-05-01      69  dotdash
21  Elsewhere 2020-01-01      62 longdash
22  Elsewhere 2020-02-01      14 longdash
23  Elsewhere 2020-03-01      59 longdash
24  Elsewhere 2020-04-01      35 longdash
25  Elsewhere 2020-05-01      91 longdash

dput

structure(list(`Class ID` = c(4438, 4439, 4428, 4437, 4430, 4431, 
4432, 4433, 4434, 4435, 4436, 4427, 4440, 4417, 4414, 4407, 4413, 
4412, 4418, 4410), city = c("Pocatello", "Pocatello", "Pocatello", 
"Pocatello", "Pocatello", "Pocatello", "Pocatello", "Pocatello", 
"Pocatello", "Pocatello", "Pocatello", "Pocatello", "Pocatello", 
"Meridian", "Pocatello", "Pocatello", "Pocatello", "Pocatello", 
"Pocatello", "Idaho Falls"), date = structure(c(1468972800, 1468972800, 
1468886400, 1468800000, 1468454400, 1468454400, 1468368000, 1468368000, 
1468368000, 1468281600, 1468281600, 1466553600, 1466553600, 1461283200, 
1460592000, 1460419200, 1460419200, 1460073600, 1460073600, 1459987200
), tzone = "UTC", class = c("POSIXct", "POSIXt")), year = c(2016, 
2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 
2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016), month = c(7, 
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 6, 6, 4, 4, 4, 4, 4, 4, 4), day = c(20, 
20, 29, 18, 14, 14, 13, 13, 13, 12, 12, 22, 22, 22, 13, 12, 12, 
8, 8, 7), cit = c("Pocatello", "Pocatello", "Pocatello", "Pocatello", 
"Pocatello", "Pocatello", "Pocatello", "Pocatello", "Pocatello", 
"Pocatello", "Pocatello", "Pocatello", "Pocatello", "Meridian", 
"Pocatello", "Pocatello", "Pocatello", "Pocatello", "Pocatello", 
"Idaho Falls"), `Department/College` = c("College of Arts and Letters", 
"College of Arts and Letters", "College of Arts and Letters", 
"College of Arts and Letters", "College of Arts and Letters", 
"College of Arts and Letters", "Library", "Library", "Library", 
"College of Arts and Letters", "College of Arts and Letters", 
"College of Education", "Library", "Division of Health Sciecnes", 
"College of Arts and Letters", "College of Arts and Letters", 
"College of Arts and Letters", "College of Arts and Letters", 
"College of Arts and Letters", "College of Arts and Letters"), 
    `Course Level` = c("Lower Division", "Lower Division", "Lower Division", 
    "Lower Division", "Lower Division", "Lower Division", "K-12", 
    "K-12", "K-12", "Lower Division", "Lower Division", "Lower Division", 
    "K-12", "Graduate", "Lower Division", "Lower Division", "Lower Division", 
    "Lower Division", "Lower Division", "Lower Division"), `Course Title` = c("ACAD 1111", 
    "ACAD 1111", "POLS 1110", "ENGL 1123", "ACAD 1111", "ACAD 1111", 
    "Kid University", "Kid University", "Kid University", "ACAD 1111", 
    "ACAD 1111", "EDUC 1110", "Kid University", "Nursing_Orientation", 
    "ENGL 1102", "ENGL 1101", "ENGL 1101", "ENGL 1102", "ENGL 1102", 
    "ENGL 1102"), `Tour?` = c(FALSE, FALSE, FALSE, TRUE, FALSE, 
    FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, 
    FALSE, FALSE, TRUE, TRUE, FALSE), `TILT?` = c(FALSE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE
    ), `Date Taught` = structure(c(1468972800, 1468972800, 1468886400, 
    1468800000, 1468454400, 1468454400, 1468368000, 1468368000, 
    1468368000, 1468281600, 1468281600, 1466553600, 1466553600, 
    1461283200, 1460592000, 1460419200, 1460419200, 1460073600, 
    1460073600, 1459987200), tzone = "UTC", class = c("POSIXct", 
    "POSIXt")), `Session Number` = c("Third Session", "Third Session", 
    "Single Session", NA, "Second Session", "Second Session", 
    "Single Session", "Single Session", "Single Session", "First Session", 
    "First Session", "Single Session", "Single Session", "Single Session", 
    "Single Session", "Single Session", "First Session", "Third Session", 
    "Third Session", "Second Session"), `AM/PM` = c("AM", "PM", 
    "PM", "PM", "AM", "PM", "PM", "PM", "PM", "AM", "PM", "PM", 
    "PM", "AM", "PM", "PM", "AM", "AM", "AM", "AM"), `Hour Count` = c(1.5, 
    1.5, 1, 1.5, 1.5, 1.5, 0.5, 0.5, 1, 1.5, 1.5, 1.5, 1, 1, 
    1.5, 1.5, 1.5, 1, 1, 1.5), 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "Cathy Gray", 
    NA, NA, NA, NA, "Monte Asche", "Philip Homan", NA), `Duplicate?` = c(FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, 
    FALSE), ACRL = c(0, 0, 7, 5, 0, 0, 7, 7, 7, 22, 9, 
    8, 13, 35, 19, 6, 8, 0, 0, 0), IPED = c(22, 9, 7, 5, 23, 
    9, 7, 7, 7, 22, 9, 8, 13, 35, 19, 6, 8, 19, 19, 22), `Location of Instructor` = c("Pocatello", 
    "Pocatello", "Pocatello", "Pocatello", "Pocatello", "Pocatello", 
    "Pocatello", "Pocatello", "Pocatello", "Pocatello", "Pocatello", 
    "Pocatello", "Pocatello", "Meridian", "Pocatello", "Pocatello", 
    "Pocatello", "Pocatello", "Pocatello", "Idaho Falls"), `Building/Room` = c("LIBR 212", 
    "LIBR 212", "LIBR 212", "LIBR 212", "LIBR 212", "LIBR 212", 
    "Special Collections", "LIBR 212", "LIBR 212", "LIBR 212", 
    "LIBR 212", "LIBR 212", "LIBR 212", "Meridian", "LIBR 212", 
    "LIBR 212", "LIBR 212", "LIBR 212", "LIBR 212", "CHE 306"
    ), `Distance Class?` = c(FALSE, FALSE, FALSE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE), `Location of Site 1` = c("Boise", 
    "Boise", "Boise", "Boise", "Boise", "Boise", "Boise", "Boise", 
    "Boise", "Boise", "Boise", "Boise", "Boise", "Boise", "Boise", 
    "Boise", "Boise", "Boise", "Boise", "Boise"), `Site 1 Number of Students` = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
    `Location of Site 2` = c("Idaho Falls", "Idaho Falls", "Idaho Falls", 
    "Idaho Falls", "Idaho Falls", "Idaho Falls", "Idaho Falls", 
    "Idaho Falls", "Idaho Falls", "Idaho Falls", "Idaho Falls", 
    "Idaho Falls", "Idaho Falls", "Idaho Falls", "Idaho Falls", 
    "Idaho Falls", "Idaho Falls", "Idaho Falls", "Idaho Falls", 
    "Idaho Falls"), `Site 2 Number of Students` = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `Location of Site 3` = c("Twin Falls", 
    "Twin Falls", "Twin Falls", "Twin Falls", "Twin Falls", "Twin Falls", 
    "Twin Falls", "Twin Falls", "Twin Falls", "Twin Falls", "Twin Falls", 
    "Twin Falls", "Twin Falls", "Twin Falls", "Twin Falls", "Twin Falls", 
    "Twin Falls", "Twin Falls", "Twin Falls", "Twin Falls"), 
    `Site 3 Number of Students` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `Location of Site 4` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `Site 4 Number of Students` = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
    `Location of Site 5` = c(NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_), `Site 5 Number of Students` = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `Location of Site 6` = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA), `Site 6 Number of Students` = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `Location of Site 7` = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA), `Site 7 Number of Students` = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `Location of Site 8` = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA), `Site 8 Number of Students` = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `Location of Site 9` = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA), `Site 9 Number of Students` = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `Location of Site 10` = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA), `Site 10 Number of Students` = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, 
-20L), class = c("tbl_df", "tbl", "data.frame"))

Solution

OP, it seems you're having some trouble generally with how to import data from a *.csv and translate that into your desired plot. Since it seems you're able to create a plot, I'll gloss over that part and walk you through an example of a good way to approach importing data, then ensuring you can translate that to your plot.

Importing the .csv file and preparing the data

I will start with a .csv file that I have created using the output you posted of df_example in your question. I exported that data to a *.csv file, and now we can import it:

df <- read.csv('OP_example.csv')
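For anyone who wants to reproduce this step, here is a minimal sketch of one way such a file could be written and read back. It assumes the export was done with write.csv() using its defaults; the temporary path and the three-row subset of df_example are stand-ins chosen for illustration, not the exact file used here.

```r
# Rebuild a few rows of the df_example output posted in the question.
df_example <- data.frame(
  city    = c("Boise", "Boise", "Pocatello"),
  dates   = as.Date(c("2020-01-01", "2020-02-01", "2020-01-01")),
  classes = c(52, 36, 82),
  lt      = c("solid", "solid", "dashed")
)

# Hypothetical temporary path standing in for 'OP_example.csv'.
csv_path <- file.path(tempdir(), "OP_example.csv")

# write.csv() stores row names by default; they come back as an extra column "X".
write.csv(df_example, csv_path)
df <- read.csv(csv_path)
names(df)
class(df$dates)   # no longer a Date after the round trip

# Passing row.names = FALSE avoids the extra column.
write.csv(df_example, csv_path, row.names = FALSE)
df_clean <- read.csv(csv_path)
names(df_clean)
```

The point of the sketch is that a csv file stores only text, so column classes such as Date have to be restored after reading.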

The first step once you import the data is to ensure it "looks right" and to get an idea of the structure. Even when you created the file yourself, it's very important to ensure df looks the way it should. Here, head(), str(), and summary() are your friends.

> head(df)
  X      city      dates classes     lt
1 1     Boise 2020-01-01      52  solid
2 2     Boise 2020-02-01      36  solid
3 3     Boise 2020-03-01      69  solid
4 4     Boise 2020-04-01     100  solid
5 5     Boise 2020-05-01      72  solid
6 6 Pocatello 2020-01-01      82 dashed

> str(df)
'data.frame':   25 obs. of  5 variables:
 $ X      : int  1 2 3 4 5 6 7 8 9 10 ...
 $ city   : chr  "Boise" "Boise" "Boise" "Boise" ...
 $ dates  : chr  "2020-01-01" "2020-02-01" "2020-03-01" "2020-04-01" ...
 $ classes: int  52 36 69 100 72 82 15 68 17 51 ...
 $ lt     : chr  "solid" "solid" "solid" "solid" ...

You can see that in writing the *.csv file, it created an "X" column that's just the row number. No big deal. We also have everything else looking fine, except that you'll notice that df$dates is read in as a chr, not as a Date or another date-like class. Since I'm going to create a plot using this column, I will need it as a date:

> df$dates <- as.Date(df$dates, format='%Y-%m-%d')

> str(df)
'data.frame':   25 obs. of  5 variables:
 $ X      : int  1 2 3 4 5 6 7 8 9 10 ...
 $ city   : chr  "Boise" "Boise" "Boise" "Boise" ...
 $ dates  : Date, format: "2020-01-01" "2020-02-01" "2020-03-01" "2020-04-01" ...
 $ classes: int  52 36 69 100 72 82 15 68 17 51 ...
 $ lt     : chr  "solid" "solid" "solid" "solid" ...

Notice that I specify the format= for the date. You'll find information on the nomenclature of % associated with format= within the documentation for the strptime() function. When I run str() again on df, you'll see that df$dates is now a Date class instead of chr.
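To illustrate why the format= string matters, here is a small sketch with made-up date strings in two layouts. The variable names are only for illustration. A format that does not match the text does not raise an error; it quietly returns NA, which is one possible reason a later filter or plot ends up with nothing to show.

```r
# Date strings as they might appear in two differently formatted csv columns.
iso_dates <- c("2020-01-01", "2020-02-01")   # year-month-day
dmy_dates <- c("01/01/2020", "01/02/2020")   # day/month/year

# The format string has to describe the text exactly as it appears in the file.
parsed_iso <- as.Date(iso_dates, format = "%Y-%m-%d")
parsed_dmy <- as.Date(dmy_dates, format = "%d/%m/%Y")
parsed_iso
parsed_dmy

# A mismatched format gives NA instead of an error.
mismatch <- as.Date(iso_dates, format = "%d/%m/%Y")
mismatch
anyNA(mismatch)
```

Running summary() on the date column right after the conversion is a quick way to spot a column that turned into NA.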

Plotting

Now for the plot, just make sure that you are reading and plotting the correct dataframe. From your code example... you are plotting using df_example, but reading in df. Not sure if that was a typo.

Your preference appears to be piping the dataframe into ggplot() with the %>% operator, rather than stating the dataframe within ggplot(), so I'll do that here:

df %>%
  ggplot(aes(x=dates, y=classes, color=city)) +
  geom_line() + geom_point() + theme_bw()

Giving you a line plot of classes over dates, with one colored line and set of points per city.
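On the question of whether a command to print the plot is missing: at the console, a ggplot expression typed at the top level is drawn automatically, but inside a function, a loop, or a script run with source(), the plot object needs an explicit print(). The sketch below uses made-up data in the same shape as df; the file name is hypothetical, and it assumes ggplot2 is installed.

```r
library(ggplot2)

# Made-up data in the same shape as df above (values are illustrative only).
df <- data.frame(
  city    = rep(c("Boise", "Pocatello"), each = 3),
  dates   = rep(as.Date(c("2020-01-01", "2020-02-01", "2020-03-01")), times = 2),
  classes = c(52, 36, 69, 82, 15, 68)
)

# Store the plot in an object; nothing is drawn at this point.
p <- ggplot(df, aes(x = dates, y = classes, color = city)) +
  geom_line() +
  geom_point() +
  theme_bw()

# Draw it explicitly. This also works inside functions, loops, and source()'d scripts.
print(p)

# Writing the plot to a file helps confirm the plot itself is fine
# even when no plot window appears.
out_file <- file.path(tempdir(), "classes_by_city.pdf")
ggsave(out_file, plot = p, width = 6, height = 4)
```

If the saved file looks right but nothing shows up on screen, the issue is more likely with how the code is being run than with the data.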

Hope that helps you out. Since we don't have your particular *.csv file and you are not having trouble plotting a particular data frame, the most likely source of difficulty is making sure that, when you read in your file, the columns and classes of your data are in the format you expect. Additionally, please make sure your code is plotting the correct data frame.
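As a rough sketch of that kind of check, the helper below stops with a readable message when a data frame is missing a column or has the wrong class. check_plot_data() is a hypothetical name made up for this example, not a function from any package, and the column names are the ones from the example data.

```r
# Hypothetical helper: fail early if the data is not in the shape the plot expects.
check_plot_data <- function(data, date_col, value_col, group_col) {
  needed <- c(date_col, value_col, group_col)
  missing_cols <- setdiff(needed, names(data))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  if (!inherits(data[[date_col]], c("Date", "POSIXct"))) {
    stop("Column '", date_col, "' is ", class(data[[date_col]])[1],
         ", not a date class")
  }
  if (!is.numeric(data[[value_col]])) {
    stop("Column '", value_col, "' is ", class(data[[value_col]])[1],
         ", not numeric")
  }
  invisible(TRUE)
}

# Example: dates still stored as text, as they are straight after read.csv().
df <- data.frame(
  city    = c("Boise", "Boise"),
  dates   = c("2020-01-01", "2020-02-01"),
  classes = c(52, 36)
)
result <- tryCatch(
  check_plot_data(df, "dates", "classes", "city"),
  error = function(e) conditionMessage(e)
)
result

# After converting the column, the same check passes.
df$dates <- as.Date(df$dates, format = "%Y-%m-%d")
check_plot_data(df, "dates", "classes", "city")
```

Calling a check like this right before ggplot() also makes it obvious which data frame is actually being plotted.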
