Count data divided by year and by region in R
I have a very large (too big to open in Excel) biological dataset that looks something like this:
year <- c(1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985, 1990,
          1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,
          1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985)
species <- c('A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A',
             'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A', 'B', 'B', 'B',
             'C', 'C', 'C', 'A')
region <- c(1, 1, 1, 3, 2, 3, 3, 2, 1, 1, 3, 3, 3, 2, 2, 1, 1, 1, 1, 3, 3,
            3, 2, 2, 1, 1, 1)
df <- data.frame(year, species, region)
df
   year species region
1  1990       A      1
2  1980       A      1
3  1985       B      1
4  1980       B      3
5  1990       B      2
6  1990       C      3
7  1980       C      3
8  1985       C      2
9  1985       A      1
10 1990       A      1
11 1980       A      3
12 1985       B      3
13 1980       B      3
14 1990       B      2
15 1990       C      2
16 1980       C      1
17 1985       C      1
18 1985       A      1
19 1990       A      1
20 1980       A      3
21 1985       B      3
22 1980       B      3
23 1990       B      2
24 1990       C      2
25 1980       C      1
26 1985       C      1
27 1985       A      1
What I want to do is count how many records of each species (A, B, or C) there are in each region (1, 2, or 3) in each of the three years (1980, 1985, and 1990).
I'd like to end up with a dataset along these lines (the numbers are only illustrative):
  region A_1980 B_1980 C_1980 A_1985 B_1985 C_1985 A_1990 B_1990 C_1990
1      1      0      0      0      0      0      0      0      0      0
2      2      1      1      1      1      1      1      1      1      1
3      3      2      2      2      2      2      2      2      2      2
Each row should represent a region, and each column the count of one species in a particular year. I've tried to do this using tidyr's spread function in conjunction with dplyr's group_by function, but I couldn't get anything close to what I want.
Does anyone have any suggestions?
r grouping tidyverse data-management
edited Nov 18 at 0:54 by m0nhawk
asked Nov 18 at 0:34 by cb14
2 Answers
Accepted answer
Something like this?
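library(dplyr)
library(tidyr)  # spread() comes from tidyr, not dplyr

df2 <- df %>%
  mutate(sp_year = paste(species, year, sep = "_")) %>%  # label such as "A_1980"
  group_by(region) %>%
  count(sp_year) %>%   # one row per region / species-year pair, count in n
  spread(sp_year, n)   # one column per species-year
df2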
Which gives this:
# A tibble: 3 x 10
# Groups: region [3]
  region A_1980 A_1985 A_1990 B_1980 B_1985 B_1990 C_1980 C_1985 C_1990
   <dbl>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>
1      1      1      3      3     NA      1     NA      2      2     NA
2      2     NA     NA     NA     NA     NA      3     NA      1      2
3      3      2     NA     NA      3      2     NA      1     NA      1
answered Nov 18 at 0:59 by wl1234
Also possible to use ?tidyr::unite instead of mutate(paste). Would be less verbose at the very least. – Shree, Nov 18 at 1:36
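Following up on the suggestion to avoid mutate(paste): here is a sketch assuming a recent tidyr release in which pivot_wider(), the successor to the superseded spread(), supports the names_glue and names_sort arguments (they are absent from the earliest pivot_wider() versions). names_glue builds the species_year labels directly, so neither paste() nor unite() is needed; listing year first in names_from together with names_sort = TRUE orders the columns year first (A_1980, B_1980, C_1980, A_1985, ...), as in the layout requested in the question; and values_fill turns combinations that never occur into 0 instead of NA. The df below just recreates the question's example data.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(
  year = c(1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985, 1990,
           1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,
           1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985),
  species = c('A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A',
              'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A', 'B', 'B', 'B',
              'C', 'C', 'C', 'A'),
  region = c(1, 1, 1, 3, 2, 3, 3, 2, 1, 1, 3, 3, 3, 2, 2, 1, 1, 1, 1, 3, 3,
             3, 2, 2, 1, 1, 1)
)

df2 <- df %>%
  # Long format: one row per region/species/year combination that occurs, count in n
  count(region, species, year) %>%
  pivot_wider(
    names_from  = c(year, species),    # year listed first so sorting groups columns by year
    names_glue  = "{species}_{year}",  # ...while labelling them species_year, e.g. "A_1980"
    names_sort  = TRUE,                # sort columns by year, then by species
    values_from = n,
    values_fill = list(n = 0L)         # combinations that never occur become 0, not NA
  )
df2
```

Unlike spread(sp_year, n, fill = 0), this keeps the count columns as integers, because the fill value 0L is itself an integer.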
Answer
Similar to wl1234's answer, but more concise. We can use unite to combine columns. We can also pass both variables to count directly instead of grouping with group_by first. Finally, we can set fill = 0 in the spread function to replace NA with 0.
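library(tidyverse)
df2 <- df %>%
  unite(sp_year, species, year, sep = "_") %>%  # replace species and year with e.g. "A_1980"
  count(sp_year, region) %>%                     # count per species-year and region
  spread(sp_year, n, fill = 0)                   # one column per species-year; absent = 0
df2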
# # A tibble: 3 x 10
#   region A_1980 A_1985 A_1990 B_1980 B_1985 B_1990 C_1980 C_1985 C_1990
#    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
# 1      1      1      3      3      0      1      0      2      2      0
# 2      2      0      0      0      0      0      3      0      1      2
# 3      3      2      0      0      3      2      0      1      0      1
edited Nov 18 at 1:40
answered Nov 18 at 1:35 by www
This is awesome, and I love the NA => 0 addition as well! Thank you! – cb14, Nov 18 at 1:53
I didn't know about unite. I will use that instead of paste next time. – wl1234, Nov 18 at 3:45
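If package dependencies are a concern, base R's table() gives the same region-by-species_year counts, reporting combinations that never occur as 0 rather than NA. This is a sketch for comparison, not part of either answer: as.data.frame.matrix() turns the two-way table into an ordinary data frame with one row per region, and the columns come out in alphabetical order (A_1980, A_1985, ...), as with spread(). The df below just recreates the question's example data.

```r
# Example data from the question
df <- data.frame(
  year = c(1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985, 1990,
           1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,
           1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985),
  species = c('A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A',
              'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A', 'B', 'B', 'B',
              'C', 'C', 'C', 'A'),
  region = c(1, 1, 1, 3, 2, 3, 3, 2, 1, 1, 3, 3, 3, 2, 2, 1, 1, 1, 1, 3, 3,
             3, 2, 2, 1, 1, 1)
)

# Cross-tabulate region against a combined species_year label.
# table() counts every combination and reports absent ones as 0.
counts <- table(region = df$region,
                sp_year = paste(df$species, df$year, sep = "_"))

# Convert the contingency table into a plain data frame, one row per region,
# and turn the row names back into a numeric region column.
wide <- as.data.frame.matrix(counts)
wide <- cbind(region = as.numeric(rownames(wide)), wide)
rownames(wide) <- NULL
wide
```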