How to get sum of values in column based on variables in other column separately? [duplicate]

up vote
4
down vote

favorite

This question already has an answer here:

How to calculate the sum of the data that have the same ID in the first column?

4 answers

I have a table data like below

abc 1   1   1

bcd 2   2   4

bcd 12  23  3

cde 3   5   5

cde 3   4   5

cde 14  2   25

I want the sum of values in each column based on variables in first column and desired result is like below:

abc 1   1   1

bcd 14  25  7

cde 20  11  35

I used awk command like this

awk -F"t" '{for(n=2;n<=NF; ++n)a[$1]+=$n}END{for(i in a ) print i, a[i] }' tablefilepath

and I got a result below:

abc 3

bcd 46

cde 66

I think the end of my code is wrong but don't know how to fix it.
I need some directions to fix the code.

edited Nov 27 at 11:40

terdon♦

126k31244421

asked Nov 27 at 6:05

awkprob

232

New contributor

marked as duplicate by Jeff Schaller, elbarna, RalfFriedl, roaima, Isaac Nov 27 at 23:37

This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.

add a comment |

up vote
4
down vote

favorite

This question already has an answer here:

How to calculate the sum of the data that have the same ID in the first column?

4 answers

I have a table data like below

abc 1   1   1

bcd 2   2   4

bcd 12  23  3

cde 3   5   5

cde 3   4   5

cde 14  2   25

I want the sum of values in each column based on variables in first column and desired result is like below:

abc 1   1   1

bcd 14  25  7

cde 20  11  35

I used awk command like this

awk -F"t" '{for(n=2;n<=NF; ++n)a[$1]+=$n}END{for(i in a ) print i, a[i] }' tablefilepath

and I got a result below:

abc 3

bcd 46

cde 66

I think the end of my code is wrong but don't know how to fix it.
I need some directions to fix the code.

edited Nov 27 at 11:40

terdon♦

126k31244421

asked Nov 27 at 6:05

awkprob

232

New contributor

marked as duplicate by Jeff Schaller, elbarna, RalfFriedl, roaima, Isaac Nov 27 at 23:37

This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.

add a comment |

up vote
4
down vote

favorite

This question already has an answer here:

How to calculate the sum of the data that have the same ID in the first column?

4 answers

I have a table data like below

abc 1   1   1

bcd 2   2   4

bcd 12  23  3

cde 3   5   5

cde 3   4   5

cde 14  2   25

I want the sum of values in each column based on variables in first column and desired result is like below:

abc 1   1   1

bcd 14  25  7

cde 20  11  35

I used awk command like this

awk -F"t" '{for(n=2;n<=NF; ++n)a[$1]+=$n}END{for(i in a ) print i, a[i] }' tablefilepath

and I got a result below:

abc 3

bcd 46

cde 66

I think the end of my code is wrong but don't know how to fix it.
I need some directions to fix the code.

edited Nov 27 at 11:40

terdon♦

126k31244421

asked Nov 27 at 6:05

awkprob

232

New contributor

This question already has an answer here:

How to calculate the sum of the data that have the same ID in the first column?

4 answers

I have a table data like below

abc 1   1   1

bcd 2   2   4

bcd 12  23  3

cde 3   5   5

cde 3   4   5

cde 14  2   25

I want the sum of values in each column based on variables in first column and desired result is like below:

abc 1   1   1

bcd 14  25  7

cde 20  11  35

I used awk command like this

awk -F"t" '{for(n=2;n<=NF; ++n)a[$1]+=$n}END{for(i in a ) print i, a[i] }' tablefilepath

and I got a result below:

abc 3

bcd 46

cde 66

I think the end of my code is wrong but don't know how to fix it.
I need some directions to fix the code.

This question already has an answer here:

How to calculate the sum of the data that have the same ID in the first column?

4 answers

shell-script text-processing awk numeric-data

edited Nov 27 at 11:40

terdon♦

126k31244421

asked Nov 27 at 6:05

awkprob

232

New contributor

edited Nov 27 at 11:40

terdon♦

126k31244421

asked Nov 27 at 6:05

awkprob

232

New contributor

edited Nov 27 at 11:40

terdon♦

126k31244421

edited Nov 27 at 11:40

terdon♦

126k31244421

edited Nov 27 at 11:40

terdon♦

126k31244421

asked Nov 27 at 6:05

awkprob

232

New contributor

asked Nov 27 at 6:05

awkprob

232

asked Nov 27 at 6:05

awkprob

232

New contributor

awkprob is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.

marked as duplicate by Jeff Schaller, elbarna, RalfFriedl, roaima, Isaac Nov 27 at 23:37

This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.

marked as duplicate by Jeff Schaller, elbarna, RalfFriedl, roaima, Isaac Nov 27 at 23:37

This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.

add a comment |

3 Answers
3

active

oldest

votes

up vote
4
down vote

accepted

You were fairly close.
You see what you were doing wrong, don't you?
You were keeping one total for each column 1 value,
when you should have been keeping three.

This is similar to Inian's answer,
but trivially extendable to handle any number of columns:

awk -F"t" '{for(n=2;n<=NF; ++n) a[$1][n]+=$n}

        END {for(i in a) {

                printf "%s", i

                for (n=2; n<=4; ++n) printf "t%s", a[i][n]

                printf "n"

             }

        }'

Rather than keep three arrays, like Inian's answer,
it keeps a two-dimensional array.

answered Nov 27 at 6:27

Scott

6,77642650

Why limit it at all? Why not awk '{for(n=2;n<=NF; ++n){a[$1][n]+=$n}}END{for(i in a){ printf "%s ", i; for(k in a[i]){printf "%s ",a[i][k]} print ""}}'? I mean, why use for (n=2; n<=4; ++n) in the END{} block instead of just iterating over the array so you don't need to keep track of its size?
– terdon♦
Nov 27 at 11:46

@terdon: Thanks for dropping by. "for (variable in array) [which] shall iterate, assigning each index of array to variable in an unspecified order." — POSIX Inian and I failed to mention that our answers produce output in random order (specifically, I get bcd, abc, cde); but that can be fixed by piping awk into sort. Your enhancement would output the columns in random order, with no way to fix it by post-processing.
– Scott
Nov 27 at 19:10

Ah, yes indeed. Fair point.
– terdon♦
Nov 27 at 19:23

@Scott: Thanks for the direction!. Now I can see what was wrong with my code. But when I try your code, I get syntax error message "awk: line 1: syntax error at or near [ ". Is this caused by variables expansion problem or escaping problem? It's difficult to find the reason.
– awkprob
Nov 28 at 1:55

@Scott: I'm running Linux ubuntu 14.04 and after gnu awk installation, 'awk --version' say GNU Awk 3.1.8. But still have syntax error
– awkprob
Nov 28 at 4:54

|
show 3 more comments

up vote
4
down vote

So long as your file is tab-delimited, datamash is a good fit for this.

$ datamash groupby 1 sum 2 sum 3 sum 4 < tablefilepath

abc     1       1       1

bcd     14      25      7

cde     20      11      35

Datamash can also work with non-tabs, if you specify -t <delimiter>. But tabs seem closest to the example input you have provided.

Datamash won't work if your input is delimited by arbitrary whitespace (i.e. possible multiple spaces intended to "look like" a tab). Still, even if that is what your data looks like, it is easily munged into the form expected by datamash:

sed -i 's/ +/t/g' tablefilepath

answered Nov 27 at 6:12

cryptarch

3766

1

At least in recent versions, there's a -W (--whitespace) option that should allow arbitrary whitespace delimiters
– steeldriver
Nov 27 at 6:17

@steeldriver Thanks!
– cryptarch
Nov 27 at 6:57

add a comment |

up vote
2
down vote

Using awk summing up the columns 2-4 based on 1.

awk -v FS="t" -v OFS="t" '{ col1[$1]+=$2; col2[$1]+=$3; col3[$1]+=$4; next } END { for ( i in col1) print i, col1[i], col2[i], col3[i]  }' file

answered Nov 27 at 6:17

Inian

3,805824

add a comment |

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

up vote
4
down vote

accepted

You were fairly close.
You see what you were doing wrong, don't you?
You were keeping one total for each column 1 value,
when you should have been keeping three.

This is similar to Inian's answer,
but trivially extendable to handle any number of columns:

awk -F"t" '{for(n=2;n<=NF; ++n) a[$1][n]+=$n}

        END {for(i in a) {

                printf "%s", i

                for (n=2; n<=4; ++n) printf "t%s", a[i][n]

                printf "n"

             }

        }'

Rather than keep three arrays, like Inian's answer,
it keeps a two-dimensional array.

answered Nov 27 at 6:27

Scott

6,77642650

Why limit it at all? Why not awk '{for(n=2;n<=NF; ++n){a[$1][n]+=$n}}END{for(i in a){ printf "%s ", i; for(k in a[i]){printf "%s ",a[i][k]} print ""}}'? I mean, why use for (n=2; n<=4; ++n) in the END{} block instead of just iterating over the array so you don't need to keep track of its size?
– terdon♦
Nov 27 at 11:46

@terdon: Thanks for dropping by. "for (variable in array) [which] shall iterate, assigning each index of array to variable in an unspecified order." — POSIX Inian and I failed to mention that our answers produce output in random order (specifically, I get bcd, abc, cde); but that can be fixed by piping awk into sort. Your enhancement would output the columns in random order, with no way to fix it by post-processing.
– Scott
Nov 27 at 19:10

Ah, yes indeed. Fair point.
– terdon♦
Nov 27 at 19:23

@Scott: Thanks for the direction!. Now I can see what was wrong with my code. But when I try your code, I get syntax error message "awk: line 1: syntax error at or near [ ". Is this caused by variables expansion problem or escaping problem? It's difficult to find the reason.
– awkprob
Nov 28 at 1:55

@Scott: I'm running Linux ubuntu 14.04 and after gnu awk installation, 'awk --version' say GNU Awk 3.1.8. But still have syntax error
– awkprob
Nov 28 at 4:54

|
show 3 more comments

up vote
4
down vote

accepted

You were fairly close.
You see what you were doing wrong, don't you?
You were keeping one total for each column 1 value,
when you should have been keeping three.

This is similar to Inian's answer,
but trivially extendable to handle any number of columns:

awk -F"t" '{for(n=2;n<=NF; ++n) a[$1][n]+=$n}

        END {for(i in a) {

                printf "%s", i

                for (n=2; n<=4; ++n) printf "t%s", a[i][n]

                printf "n"

             }

        }'

Rather than keep three arrays, like Inian's answer,
it keeps a two-dimensional array.

answered Nov 27 at 6:27

Scott

6,77642650

Why limit it at all? Why not awk '{for(n=2;n<=NF; ++n){a[$1][n]+=$n}}END{for(i in a){ printf "%s ", i; for(k in a[i]){printf "%s ",a[i][k]} print ""}}'? I mean, why use for (n=2; n<=4; ++n) in the END{} block instead of just iterating over the array so you don't need to keep track of its size?
– terdon♦
Nov 27 at 11:46

@terdon: Thanks for dropping by. "for (variable in array) [which] shall iterate, assigning each index of array to variable in an unspecified order." — POSIX Inian and I failed to mention that our answers produce output in random order (specifically, I get bcd, abc, cde); but that can be fixed by piping awk into sort. Your enhancement would output the columns in random order, with no way to fix it by post-processing.
– Scott
Nov 27 at 19:10

Ah, yes indeed. Fair point.
– terdon♦
Nov 27 at 19:23

@Scott: Thanks for the direction!. Now I can see what was wrong with my code. But when I try your code, I get syntax error message "awk: line 1: syntax error at or near [ ". Is this caused by variables expansion problem or escaping problem? It's difficult to find the reason.
– awkprob
Nov 28 at 1:55

@Scott: I'm running Linux ubuntu 14.04 and after gnu awk installation, 'awk --version' say GNU Awk 3.1.8. But still have syntax error
– awkprob
Nov 28 at 4:54

|
show 3 more comments

up vote
4
down vote

accepted

You were fairly close.
You see what you were doing wrong, don't you?
You were keeping one total for each column 1 value,
when you should have been keeping three.

This is similar to Inian's answer,
but trivially extendable to handle any number of columns:

awk -F"t" '{for(n=2;n<=NF; ++n) a[$1][n]+=$n}

        END {for(i in a) {

                printf "%s", i

                for (n=2; n<=4; ++n) printf "t%s", a[i][n]

                printf "n"

             }

        }'

Rather than keep three arrays, like Inian's answer,
it keeps a two-dimensional array.

answered Nov 27 at 6:27

Scott

6,77642650

You were fairly close.
You see what you were doing wrong, don't you?
You were keeping one total for each column 1 value,
when you should have been keeping three.

This is similar to Inian's answer,
but trivially extendable to handle any number of columns:

awk -F"t" '{for(n=2;n<=NF; ++n) a[$1][n]+=$n}

        END {for(i in a) {

                printf "%s", i

                for (n=2; n<=4; ++n) printf "t%s", a[i][n]

                printf "n"

             }

        }'

Rather than keep three arrays, like Inian's answer,
it keeps a two-dimensional array.

answered Nov 27 at 6:27

Scott

6,77642650

answered Nov 27 at 6:27

Scott

6,77642650

answered Nov 27 at 6:27

Scott

6,77642650

answered Nov 27 at 6:27

Scott

6,77642650

Why limit it at all? Why not awk '{for(n=2;n<=NF; ++n){a[$1][n]+=$n}}END{for(i in a){ printf "%s ", i; for(k in a[i]){printf "%s ",a[i][k]} print ""}}'? I mean, why use for (n=2; n<=4; ++n) in the END{} block instead of just iterating over the array so you don't need to keep track of its size?
– terdon♦
Nov 27 at 11:46

@terdon: Thanks for dropping by. "for (variable in array) [which] shall iterate, assigning each index of array to variable in an unspecified order." — POSIX Inian and I failed to mention that our answers produce output in random order (specifically, I get bcd, abc, cde); but that can be fixed by piping awk into sort. Your enhancement would output the columns in random order, with no way to fix it by post-processing.
– Scott
Nov 27 at 19:10

Ah, yes indeed. Fair point.
– terdon♦
Nov 27 at 19:23

@Scott: Thanks for the direction!. Now I can see what was wrong with my code. But when I try your code, I get syntax error message "awk: line 1: syntax error at or near [ ". Is this caused by variables expansion problem or escaping problem? It's difficult to find the reason.
– awkprob
Nov 28 at 1:55

@Scott: I'm running Linux ubuntu 14.04 and after gnu awk installation, 'awk --version' say GNU Awk 3.1.8. But still have syntax error
– awkprob
Nov 28 at 4:54

|
show 3 more comments

Why limit it at all? Why not awk '{for(n=2;n<=NF; ++n){a[$1][n]+=$n}}END{for(i in a){ printf "%s ", i; for(k in a[i]){printf "%s ",a[i][k]} print ""}}'? I mean, why use for (n=2; n<=4; ++n) in the END{} block instead of just iterating over the array so you don't need to keep track of its size?
– terdon♦
Nov 27 at 11:46

@terdon: Thanks for dropping by. "for (variable in array) [which] shall iterate, assigning each index of array to variable in an unspecified order." — POSIX Inian and I failed to mention that our answers produce output in random order (specifically, I get bcd, abc, cde); but that can be fixed by piping awk into sort. Your enhancement would output the columns in random order, with no way to fix it by post-processing.
– Scott
Nov 27 at 19:10

Ah, yes indeed. Fair point.
– terdon♦
Nov 27 at 19:23

@Scott: Thanks for the direction!. Now I can see what was wrong with my code. But when I try your code, I get syntax error message "awk: line 1: syntax error at or near [ ". Is this caused by variables expansion problem or escaping problem? It's difficult to find the reason.
– awkprob
Nov 28 at 1:55

@Scott: I'm running Linux ubuntu 14.04 and after gnu awk installation, 'awk --version' say GNU Awk 3.1.8. But still have syntax error
– awkprob
Nov 28 at 4:54

Why limit it at all? Why not

awk '{for(n=2;n<=NF; ++n){a[$1][n]+=$n}}END{for(i in a){ printf "%s ", i; for(k in a[i]){printf "%s ",a[i][k]} print ""}}'

? I mean, why use for (n=2; n<=4; ++n) in the END{} block instead of just iterating over the array so you don't need to keep track of its size?
– terdon♦
Nov 27 at 11:46

Why limit it at all? Why not

awk '{for(n=2;n<=NF; ++n){a[$1][n]+=$n}}END{for(i in a){ printf "%s ", i; for(k in a[i]){printf "%s ",a[i][k]} print ""}}'

? I mean, why use for (n=2; n<=4; ++n) in the END{} block instead of just iterating over the array so you don't need to keep track of its size?
– terdon♦
Nov 27 at 11:46

@terdon: Thanks for dropping by. "for (variable in array) [which] shall iterate, assigning each index of array to variable in an unspecified order." — POSIX Inian and I failed to mention that our answers produce output in random order (specifically, I get bcd, abc, cde); but that can be fixed by piping awk into sort. Your enhancement would output the columns in random order, with no way to fix it by post-processing.
– Scott
Nov 27 at 19:10

Ah, yes indeed. Fair point.
– terdon♦
Nov 27 at 19:23

@Scott: Thanks for the direction!. Now I can see what was wrong with my code. But when I try your code, I get syntax error message "awk: line 1: syntax error at or near [ ". Is this caused by variables expansion problem or escaping problem? It's difficult to find the reason.
– awkprob
Nov 28 at 1:55

@Scott: I'm running Linux ubuntu 14.04 and after gnu awk installation, 'awk --version' say GNU Awk 3.1.8. But still have syntax error
– awkprob
Nov 28 at 4:54

|
show 3 more comments

up vote
4
down vote

So long as your file is tab-delimited, datamash is a good fit for this.

$ datamash groupby 1 sum 2 sum 3 sum 4 < tablefilepath

abc     1       1       1

bcd     14      25      7

cde     20      11      35

Datamash can also work with non-tabs, if you specify -t <delimiter>. But tabs seem closest to the example input you have provided.

sed -i 's/ +/t/g' tablefilepath

answered Nov 27 at 6:12

cryptarch

3766

1

At least in recent versions, there's a -W (--whitespace) option that should allow arbitrary whitespace delimiters
– steeldriver
Nov 27 at 6:17

@steeldriver Thanks!
– cryptarch
Nov 27 at 6:57

add a comment |

up vote
4
down vote

So long as your file is tab-delimited, datamash is a good fit for this.

$ datamash groupby 1 sum 2 sum 3 sum 4 < tablefilepath

abc     1       1       1

bcd     14      25      7

cde     20      11      35

Datamash can also work with non-tabs, if you specify -t <delimiter>. But tabs seem closest to the example input you have provided.

sed -i 's/ +/t/g' tablefilepath

answered Nov 27 at 6:12

cryptarch

3766

1

At least in recent versions, there's a -W (--whitespace) option that should allow arbitrary whitespace delimiters
– steeldriver
Nov 27 at 6:17

@steeldriver Thanks!
– cryptarch
Nov 27 at 6:57

add a comment |

up vote
4
down vote

So long as your file is tab-delimited, datamash is a good fit for this.

$ datamash groupby 1 sum 2 sum 3 sum 4 < tablefilepath

abc     1       1       1

bcd     14      25      7

cde     20      11      35

Datamash can also work with non-tabs, if you specify -t <delimiter>. But tabs seem closest to the example input you have provided.

sed -i 's/ +/t/g' tablefilepath

answered Nov 27 at 6:12

cryptarch

3766

So long as your file is tab-delimited, datamash is a good fit for this.

$ datamash groupby 1 sum 2 sum 3 sum 4 < tablefilepath

abc     1       1       1

bcd     14      25      7

cde     20      11      35

Datamash can also work with non-tabs, if you specify -t <delimiter>. But tabs seem closest to the example input you have provided.

sed -i 's/ +/t/g' tablefilepath

answered Nov 27 at 6:12

cryptarch

3766

answered Nov 27 at 6:12

cryptarch

3766

answered Nov 27 at 6:12

cryptarch

3766

answered Nov 27 at 6:12

cryptarch

3766

1

At least in recent versions, there's a -W (--whitespace) option that should allow arbitrary whitespace delimiters
– steeldriver
Nov 27 at 6:17

@steeldriver Thanks!
– cryptarch
Nov 27 at 6:57

add a comment |

1

At least in recent versions, there's a -W (--whitespace) option that should allow arbitrary whitespace delimiters
– steeldriver
Nov 27 at 6:17

@steeldriver Thanks!
– cryptarch
Nov 27 at 6:57

At least in recent versions, there's a -W (--whitespace) option that should allow arbitrary whitespace delimiters
– steeldriver
Nov 27 at 6:17

@steeldriver Thanks!
– cryptarch
Nov 27 at 6:57

add a comment |

up vote
2
down vote

Using awk summing up the columns 2-4 based on 1.

awk -v FS="t" -v OFS="t" '{ col1[$1]+=$2; col2[$1]+=$3; col3[$1]+=$4; next } END { for ( i in col1) print i, col1[i], col2[i], col3[i]  }' file

answered Nov 27 at 6:17

Inian

3,805824

add a comment |

up vote
2
down vote

Using awk summing up the columns 2-4 based on 1.

awk -v FS="t" -v OFS="t" '{ col1[$1]+=$2; col2[$1]+=$3; col3[$1]+=$4; next } END { for ( i in col1) print i, col1[i], col2[i], col3[i]  }' file

answered Nov 27 at 6:17

Inian

3,805824

add a comment |

up vote
2
down vote

Using awk summing up the columns 2-4 based on 1.

awk -v FS="t" -v OFS="t" '{ col1[$1]+=$2; col2[$1]+=$3; col3[$1]+=$4; next } END { for ( i in col1) print i, col1[i], col2[i], col3[i]  }' file

answered Nov 27 at 6:17

Inian

3,805824

Using awk summing up the columns 2-4 based on 1.

awk -v FS="t" -v OFS="t" '{ col1[$1]+=$2; col2[$1]+=$3; col3[$1]+=$4; next } END { for ( i in col1) print i, col1[i], col2[i], col3[i]  }' file

answered Nov 27 at 6:17

Inian

3,805824

answered Nov 27 at 6:17

Inian

3,805824

answered Nov 27 at 6:17

Inian

3,805824

answered Nov 27 at 6:17

Inian

3,805824

add a comment |

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Vrftjkry