Count data divided by year and by region in R

Score: 9

I have a very large biological dataset (too big to open in Excel) that looks something like this:

    year <- c(1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985, 1990,
              1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,
              1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985)
    species <- c('A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A',
                 'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A', 'B', 'B', 'B',
                 'C', 'C', 'C', 'A')
    region <- c(1, 1, 1, 3, 2, 3, 3, 2, 1, 1, 3, 3, 3, 2, 2, 1, 1, 1, 1, 3, 3,
                3, 2, 2, 1, 1, 1)
    df <- data.frame(year, species, region)

    df
       year species region
    1  1990       A      1
    2  1980       A      1
    3  1985       B      1
    4  1980       B      3
    5  1990       B      2
    6  1990       C      3
    7  1980       C      3
    8  1985       C      2
    9  1985       A      1
    10 1990       A      1
    11 1980       A      3
    12 1985       B      3
    13 1980       B      3
    14 1990       B      2
    15 1990       C      2
    16 1980       C      1
    17 1985       C      1
    18 1985       A      1
    19 1990       A      1
    20 1980       A      3
    21 1985       B      3
    22 1980       B      3
    23 1990       B      2
    24 1990       C      2
    25 1980       C      1
    26 1985       C      1
    27 1985       A      1

What I want to do is count how many of each species (A, B, or C) occur in each region (1, 2, or 3) in each of my three years (1980, 1985, or 1990).

I'd like to end up with a dataset laid out something like this:

      region A_1980 B_1980 C_1980 A_1985 B_1985 C_1985 A_1990 B_1990 C_1990
    1      1      0      0      0      0      0      0      0      0      0
    2      2      1      1      1      1      1      1      1      1      1
    3      3      2      2      2      2      2      2      2      2      2

Each row represents a region, and each column holds the count of one species in one particular year. I've tried to do this using tidyr's spread function together with dplyr's group_by function, but I couldn't get anything close to what I want.
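spread() only reshapes a table; it does not count anything, which may be why pairing it with group_by() alone did not get close. The counting has to happen first. A minimal sketch of that step on its own, assuming dplyr is installed (long_counts is an illustrative name):

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  year    = c(1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985, 1990,
              1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,
              1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985),
  species = c('A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A',
              'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A', 'B', 'B', 'B',
              'C', 'C', 'C', 'A'),
  region  = c(1, 1, 1, 3, 2, 3, 3, 2, 1, 1, 3, 3, 3, 2, 2, 1, 1, 1, 1, 3, 3,
              3, 2, 2, 1, 1, 1)
)

# Count records per (region, species, year). The result is still "long":
# one row per combination that actually occurs, with its count in column n.
# Combinations with no records (e.g. species A in region 2) are simply absent.
long_counts <- df %>% count(region, species, year)

print(long_counts)
```

Turning this long table into one row per region is then a pure reshaping step, which is what both answers do with spread().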

Does anyone have any suggestions?

asked Nov 18 at 0:34 by cb14 (edited Nov 18 at 0:54 by m0nhawk)


Tags: r, grouping, tidyverse, data-management


2 Answers

Score: 11 (accepted answer)

Something like this?

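    library(dplyr)
    library(tidyr)  # spread() is exported by tidyr, not dplyr

    df2 <- df %>%
      mutate(sp_year = paste(species, year, sep = "_")) %>%  # e.g. "A_1990"
      group_by(region) %>%
      count(sp_year) %>%    # records per sp_year within each region
      spread(sp_year, n)    # one column per sp_year

    df2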

Which gives this:

    # A tibble: 3 x 10
    # Groups:   region [3]
      region A_1980 A_1985 A_1990 B_1980 B_1985 B_1990 C_1980 C_1985 C_1990
       <dbl>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>
    1      1      1      3      3     NA      1     NA      2      2     NA
    2      2     NA     NA     NA     NA     NA      3     NA      1      2
    3      3      2     NA     NA      3      2     NA      1     NA      1
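In this output, NA marks a species/year combination with no records in that region, and the # Groups line shows the result is still grouped by region. A small sanity check, assuming dplyr and tidyr are installed: it reruns the pipeline above with an extra ungroup() that is not part of the original answer, then confirms the row totals match the number of records per region in the raw data (totals and records_per_region are illustrative names).

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(
  year    = c(1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985, 1990,
              1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,
              1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985),
  species = c('A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A',
              'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A', 'B', 'B', 'B',
              'C', 'C', 'C', 'A'),
  region  = c(1, 1, 1, 3, 2, 3, 3, 2, 1, 1, 3, 3, 3, 2, 2, 1, 1, 1, 1, 3, 3,
              3, 2, 2, 1, 1, 1)
)

# The accepted answer's pipeline, plus ungroup() (an addition here) so that
# select(-region) below really drops the region column
df2 <- df %>%
  mutate(sp_year = paste(species, year, sep = "_")) %>%
  group_by(region) %>%
  count(sp_year) %>%
  spread(sp_year, n) %>%
  ungroup()

# Each NA is a combination with no records, so summing across the count
# columns with na.rm = TRUE gives the total number of records per region.
totals <- rowSums(select(df2, -region), na.rm = TRUE)

# Compare with the raw record counts (both are ordered by region here)
records_per_region <- as.vector(table(df$region))
stopifnot(all(totals == records_per_region))
totals
```

On a very large dataset, a check like this is a cheap way to confirm that no records were dropped during the reshape.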

answered Nov 18 at 0:59 by wl1234

It's also possible to use tidyr::unite (see ?tidyr::unite) instead of mutate(paste(...)). It would be less verbose, at the very least. – Shree, Nov 18 at 1:36
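A minimal sketch comparing the two ways of building the combined key, assuming dplyr and tidyr are installed (via_paste and via_unite are illustrative names). Both give the same sp_year values; the difference is that unite() also drops the columns it combined, because its remove argument defaults to TRUE.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(
  year    = c(1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985, 1990,
              1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,
              1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985),
  species = c('A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A',
              'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A', 'B', 'B', 'B',
              'C', 'C', 'C', 'A'),
  region  = c(1, 1, 1, 3, 2, 3, 3, 2, 1, 1, 3, 3, 3, 2, 2, 1, 1, 1, 1, 3, 3,
              3, 2, 2, 1, 1, 1)
)

# Accepted answer: add the key column with paste(); species and year stay
via_paste <- df %>% mutate(sp_year = paste(species, year, sep = "_"))

# Comment's suggestion: unite() builds the same key in one call and, with
# the default remove = TRUE, drops the species and year columns
via_unite <- df %>% unite(sp_year, species, year, sep = "_")

names(via_paste)
names(via_unite)
```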


Score: 5

Similar to wl1234's answer, but more concise. We can use unite to combine the species and year columns, and we can pass both sp_year and region to count instead of calling group_by first. Finally, setting fill = 0 in spread replaces the NAs with 0.

    library(tidyverse)

    df2 <- df %>%
      unite(sp_year, species, year, sep = "_") %>%
      count(sp_year, region) %>%
      spread(sp_year, n, fill = 0)

    df2
    # # A tibble: 3 x 10
    #   region A_1980 A_1985 A_1990 B_1980 B_1985 B_1990 C_1980 C_1985 C_1990
    #    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
    # 1      1      1      3      3      0      1      0      2      2      0
    # 2      2      0      0      0      0      0      3      0      1      2
    # 3      3      2      0      0      3      2      0      1      0      1
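tidyr has since superseded spread() with pivot_wider(). A sketch of the same reshaping with it, assuming a recent tidyr (names_sort and a scalar values_fill need, I believe, tidyr 1.1.0 or later); df_wide is an illustrative name. Sorting the new columns by year first, while naming them species_year, reproduces the year-by-year column order shown in the question.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(
  year    = c(1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985, 1990,
              1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,
              1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985),
  species = c('A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A',
              'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A', 'B', 'B', 'B',
              'C', 'C', 'C', 'A'),
  region  = c(1, 1, 1, 3, 2, 3, 3, 2, 1, 1, 3, 3, 3, 2, 2, 1, 1, 1, 1, 3, 3,
              3, 2, 2, 1, 1, 1)
)

df_wide <- df %>%
  count(region, species, year) %>%      # long counts, one row per combination
  pivot_wider(
    names_from  = c(year, species),     # new columns keyed by year, then species
    names_glue  = "{species}_{year}",   # but named species_year, e.g. "A_1980"
    names_sort  = TRUE,                 # order columns by year, then species
    values_from = n,
    values_fill = 0                     # absent combinations become 0
  )

df_wide
```

As with fill = 0 in spread(), values_fill = 0 is needed because count() only emits combinations that actually occur.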

answered Nov 18 at 1:35 by www (edited Nov 18 at 1:40)

This is awesome, and I love the NA => 0 addition as well! Thank you! – cb14, Nov 18 at 1:53

I didn't know about unite. I will use that instead of paste next time. – wl1234, Nov 18 at 3:45
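The second answer's point that count() does not need a preceding group_by() can be checked directly. A minimal sketch, assuming dplyr is installed (grouped, direct, and the *_cmp names are illustrative): both approaches yield the same counts and differ only in column order, row order, and leftover grouping.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  year    = c(1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985, 1990,
              1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,
              1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985),
  species = c('A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A',
              'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A', 'B', 'B', 'B',
              'C', 'C', 'C', 'A'),
  region  = c(1, 1, 1, 3, 2, 3, 3, 2, 1, 1, 3, 3, 3, 2, 2, 1, 1, 1, 1, 3, 3,
              3, 2, 2, 1, 1, 1)
)
df$sp_year <- paste(df$species, df$year, sep = "_")

# Accepted answer: group by region, then count sp_year within each group
grouped <- df %>% group_by(region) %>% count(sp_year) %>% ungroup()

# Second answer: count both variables in one call, no group_by() needed
direct <- df %>% count(sp_year, region)

# Put both in the same column and row order before comparing
grouped_cmp <- grouped %>% select(region, sp_year, n) %>% arrange(region, sp_year)
direct_cmp  <- direct  %>% select(region, sp_year, n) %>% arrange(region, sp_year)
```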
