How to get sum of values in column based on variables in other column separately? [duplicate]
This question already has an answer here:
How to calculate the sum of the data that have the same ID in the first column?
4 answers
I have tabular data like this:
abc 1 1 1
bcd 2 2 4
bcd 12 23 3
cde 3 5 5
cde 3 4 5
cde 14 2 25
I want the sum of the values in each column, grouped by the value in the first column. The desired result is:
abc 1 1 1
bcd 14 25 7
cde 20 11 35
I used an awk command like this:
awk -F"\t" '{for(n=2;n<=NF; ++n)a[$1]+=$n}END{for(i in a ) print i, a[i] }' tablefilepath
and got this result:
abc 3
bcd 46
cde 66
I think the end of my code is wrong, but I don't know how to fix it.
I would appreciate some pointers on fixing the code.
shell-script text-processing awk numeric-data
edited Nov 27 at 11:40
terdon♦
asked Nov 27 at 6:05
awkprob
marked as duplicate by Jeff Schaller, elbarna, RalfFriedl, roaima, Isaac Nov 27 at 23:37
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
3 Answers
accepted
You were fairly close.
You see what you were doing wrong, don't you?
You were keeping one total for each column 1 value,
when you should have been keeping three.
This is similar to Inian's answer,
but trivially extendable to handle any number of columns:
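awk -F"\t" '
    # a[key][n] is the running sum of column n for each column-1 key
    # (arrays of arrays require GNU awk 4.0 or later)
    {for (n=2; n<=NF; ++n) a[$1][n] += $n}
END {for (i in a) {
        printf "%s", i
        for (n=2; n<=4; ++n) printf "\t%s", a[i][n]
        printf "\n"
     }
    }'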
Rather than keep three arrays, like Inian's answer,
it keeps a two-dimensional array.
answered Nov 27 at 6:27
Scott
Why limit it at all? Why not awk '{for(n=2;n<=NF; ++n){a[$1][n]+=$n}}END{for(i in a){ printf "%s ", i; for(k in a[i]){printf "%s ",a[i][k]} print ""}}'? I mean, why use for (n=2; n<=4; ++n) in the END{} block instead of just iterating over the array so you don't need to keep track of its size?
– terdon♦
Nov 27 at 11:46
@terdon: Thanks for dropping by. "for (variable in array) [which] shall iterate, assigning each index of array to variable in an unspecified order." (POSIX) Inian and I failed to mention that our answers produce output in random order (specifically, I get bcd, abc, cde); but that can be fixed by piping awk into sort. Your enhancement would output the columns in random order, with no way to fix it by post-processing.
– Scott
Nov 27 at 19:10
Ah, yes indeed. Fair point.
– terdon♦
Nov 27 at 19:23
@Scott: Thanks for the direction! Now I can see what was wrong with my code. But when I try your code, I get the syntax error message "awk: line 1: syntax error at or near [ ". Is this caused by a variable expansion problem or an escaping problem? It's difficult to find the reason.
– awkprob
Nov 28 at 1:55
@Scott: I'm running Ubuntu 14.04 Linux, and after installing GNU awk, 'awk --version' says GNU Awk 3.1.8. But I still get the syntax error.
– awkprob
Nov 28 at 4:54
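The syntax error reported above is what you would expect from an awk without arrays of arrays: the a[$1][n] form needs GNU awk 4.0 or later. As a possible workaround, here is a minimal sketch that uses only standard awk features. It keeps one array with a composite subscript, sum[key, n], records keys in first-seen order, and tracks the widest row so that no column count is hard-coded. The names sum, seen, order, nkeys and maxnf, and the temporary sample file, are chosen here for illustration and do not come from the answer.

```shell
#!/bin/sh
# Sample data from the question, written with explicit tabs.
data=$(mktemp)
printf 'abc\t1\t1\t1\nbcd\t2\t2\t4\nbcd\t12\t23\t3\ncde\t3\t5\t5\ncde\t3\t4\t5\ncde\t14\t2\t25\n' > "$data"

# sum[key, n]  running total of column n for each column-1 key; the comma
#              subscript joins key and n with SUBSEP (standard awk).
# order[]      keys in the order they were first seen, so output order is
#              deterministic without sorting.
# maxnf        widest row seen, so the END loop is not limited to 4 columns.
result=$(awk -F'\t' '
    {
        if (!($1 in seen)) { seen[$1] = 1; order[++nkeys] = $1 }
        if (NF > maxnf) maxnf = NF
        for (n = 2; n <= NF; ++n) sum[$1, n] += $n
    }
    END {
        for (j = 1; j <= nkeys; ++j) {
            k = order[j]
            printf "%s", k
            # "+ 0" prints 0 for a column that a short row never filled in
            for (n = 2; n <= maxnf; ++n) printf "\t%s", sum[k, n] + 0
            printf "\n"
        }
    }' "$data")

printf '%s\n' "$result"
rm -f "$data"
```

On the sample data this prints the three rows abc 1 1 1, bcd 14 25 7 and cde 20 11 35, tab-separated. Looping n from 2 to maxnf keeps the columns in numeric order, which a for (k in ...) loop over the subscripts would not guarantee.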
So long as your file is tab-delimited, datamash is a good fit for this.
$ datamash groupby 1 sum 2 sum 3 sum 4 < tablefilepath
abc 1 1 1
bcd 14 25 7
cde 20 11 35
Datamash can also work with other delimiters if you specify -t <delimiter>. But tabs seem closest to the example input you have provided.
Datamash won't work if your input is delimited by arbitrary whitespace (i.e. possibly multiple spaces intended to "look like" a tab). Still, even if that is what your data looks like, it is easily munged into the form expected by datamash (with GNU sed):
sed -i 's/ \+/\t/g' tablefilepath
answered Nov 27 at 6:12
cryptarch
At least in recent versions, there's a -W (--whitespace) option that should allow arbitrary whitespace delimiters
– steeldriver
Nov 27 at 6:17
@steeldriver Thanks!
– cryptarch
Nov 27 at 6:57
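If neither GNU sed nor a datamash with -W is available, the whitespace cleanup can also be sketched in standard awk. This is only an illustration under stated assumptions: the sample input and the variable names spaced and tabbed are made up here, and the approach assumes that no field contains a space itself. Unlike sed -i, it writes to standard output and leaves the input file unchanged.

```shell
#!/bin/sh
# Hypothetical input whose columns are separated by runs of spaces.
spaced=$(mktemp)
printf 'abc   1  1 1\nbcd 2    2  4\n' > "$spaced"

# With the default field separator awk splits on any run of blanks.
# Assigning $1 to itself makes awk rebuild the record using OFS (a tab).
tabbed=$(awk -v OFS='\t' '{ $1 = $1; print }' "$spaced")

printf '%s\n' "$tabbed"
rm -f "$spaced"
```

The result has exactly one tab between fields, which is the form the datamash command above expects.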
Using awk, summing columns 2-4 grouped by column 1:
awk -v FS="\t" -v OFS="\t" '{ col1[$1]+=$2; col2[$1]+=$3; col3[$1]+=$4; next } END { for ( i in col1) print i, col1[i], col2[i], col3[i] }' file
answered Nov 27 at 6:17


Inian
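Because for (i in col1) visits the keys in an unspecified order, the rows of this answer can come out in any order. As noted in the comments on the accepted answer, piping the output into sort fixes that. A minimal sketch, assuming the sample data from the question in a temporary file (the variable names data and sorted are illustrative):

```shell
#!/bin/sh
# Sample data from the question, written with explicit tabs.
data=$(mktemp)
printf 'abc\t1\t1\t1\nbcd\t2\t2\t4\nbcd\t12\t23\t3\ncde\t3\t5\t5\ncde\t3\t4\t5\ncde\t14\t2\t25\n' > "$data"

# One array per summed column, keyed by column 1; sort makes the row
# order deterministic regardless of how awk walks the array.
sorted=$(awk -F'\t' -v OFS='\t' '
    { col1[$1] += $2; col2[$1] += $3; col3[$1] += $4 }
    END { for (i in col1) print i, col1[i], col2[i], col3[i] }
' "$data" | LC_ALL=C sort)

printf '%s\n' "$sorted"
rm -f "$data"
```

Sorting works here because each key appears on exactly one output line; the columns within a line are already in a fixed order since they are printed explicitly.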