Como calcular o valor médio (por coluna) dos dados que possuem o mesmo ID na primeira coluna?

2

Eu quero calcular o valor médio dos meus dados quando eles tiverem o mesmo ID na primeira coluna. Eu quero calcular os valores médios para as colunas restantes que têm o mesmo valor na primeira coluna. Eu sei como fazer isso por uma coluna, mas tenho dificuldade em fazer a média para todas as colunas que tenho.

Aqui estão meus dados de amostra Nome do arquivo: abraham

year    1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
1929    7.5 0   7.78    3.695   1.139   0.186   -0.074  -0.205  -0.323  -0.434  -0.539  -0.639  -0.735  -0.916  -1.085  -1.241  -1.387  -1.522  -1.818  -2.058  -2.247  -2.389  -2.488  -2.551  -2.582  -2.521  -2.404  -2.262  -2.118  -1.987  -1.872  -1.775  -1.691  -1.628  -1.578  -1.484  -1.395  -1.311  -1.229  -1.151  -1.075  -1.002  -0.859  -0.721  -0.587  -0.456  -0.328  -0.171  -0.017  0.068   0.151   0.308   0.445   0.559
1929    6.11    0   6.112   3.536   1.227   0.252   -0.063  -0.201  -0.32   -0.43   -0.535  -0.634  -0.73   -0.91   -1.078  -1.234  -1.379  -1.512  -1.806  -2.046  -2.236  -2.378  -2.478  -2.54   -2.573  -2.515  -2.402  -2.262  -2.119  -1.988  -1.873  -1.776  -1.692  -1.629  -1.578  -1.484  -1.395  -1.311  -1.229  -1.151  -1.075  -1.002  -0.859  -0.721  -0.587  -0.456  -0.328  -0.171  -0.017  0.068   0.151   0.308   0.445   0.559
1929    2.22    0   2.227   2.542   1.022   0.222   -0.058  -0.198  -0.317  -0.427  -0.531  -0.63   -0.725  -0.905  -1.072  -1.227  -1.371  -1.505  -1.798  -2.038  -2.227  -2.37   -2.471  -2.533  -2.567  -2.512  -2.4    -2.262  -2.12   -1.989  -1.874  -1.776  -1.693  -1.63   -1.579  -1.485  -1.395  -1.311  -1.229  -1.151  -1.075  -1.002  -0.859  -0.721  -0.587  -0.456  -0.328  -0.171  -0.017  0.068   0.151   0.308   0.445   0.559
1929    2.78    0   2.779   1.993   0.819   0.18    -0.056  -0.195  -0.314  -0.424  -0.527  -0.626  -0.721  -0.9    -1.066  -1.221  -1.364  -1.498  -1.79   -2.03   -2.219  -2.362  -2.464  -2.526  -2.562  -2.509  -2.399  -2.262  -2.12   -1.99   -1.875  -1.777  -1.693  -1.63   -1.579  -1.485  -1.396  -1.311  -1.229  -1.151  -1.075  -1.002  -0.859  -0.721  -0.587  -0.456  -0.328  -0.171  -0.017  0.068   0.151   0.308   0.445   0.559
1930    0.83    0   0.833   0.85    0.354   0.073   -0.052  -0.187  -0.305  -0.414  -0.517  -0.614  -0.708  -0.885  -1.049  -1.202  -1.344  -1.477  -1.768  -2.006  -2.196  -2.34   -2.442  -2.508  -2.549  -2.5    -2.395  -2.261  -2.122  -1.992  -1.877  -1.779  -1.695  -1.631  -1.58   -1.485  -1.396  -1.311  -1.23   -1.151  -1.076  -1.002  -0.859  -0.721  -0.587  -0.456  -0.328  -0.171  -0.017  0.068   0.151   0.308   0.445   0.559
1930    -0.28   0   -0.278  0.461   0.242   0.049   -0.052  -0.185  -0.302  -0.411  -0.513  -0.61   -0.704  -0.88   -1.044  -1.196  -1.338  -1.47   -1.761  -1.999  -2.188  -2.332  -2.435  -2.502  -2.544  -2.498  -2.394  -2.261  -2.122  -1.992  -1.878  -1.779  -1.695  -1.631  -1.581  -1.486  -1.396  -1.312  -1.23   -1.152  -1.076  -1.002  -0.859  -0.721  -0.587  -0.456  -0.328  -0.171  -0.017  0.068   0.151   0.308   0.445   0.559
1930    0.56    0.002   0.511   0.348   0.165   0.033   -0.052  -0.183  -0.299  -0.407  -0.509  -0.607  -0.7    -0.875  -1.038  -1.19   -1.332  -1.463  -1.753  -1.991  -2.18   -2.325  -2.429  -2.496  -2.54   -2.495  -2.393  -2.261  -2.123  -1.993  -1.878  -1.78   -1.696  -1.632  -1.581  -1.486  -1.396  -1.312  -1.23   -1.152  -1.076  -1.002  -0.859  -0.721  -0.587  -0.456  -0.328  -0.171  -0.017  0.068   0.151   0.308   0.445   0.559
1930    3.89    0   3.884   0.96    0.243   0.041   -0.052  -0.181  -0.297  -0.404  -0.506  -0.603  -0.696  -0.87   -1.033  -1.184  -1.325  -1.457  -1.746  -1.983  -2.173  -2.318  -2.422  -2.49   -2.536  -2.492  -2.391  -2.26   -2.123  -1.994  -1.879  -1.78   -1.696  -1.632  -1.581  -1.486  -1.397  -1.312  -1.23   -1.152  -1.076  -1.002  -0.86   -0.721  -0.587  -0.456  -0.328  -0.171  -0.017  0.068   0.151   0.308   0.445   0.558
1931    -7.22   0.057   -4.563  -0.299  0.035   0.007   -0.054  -0.175  -0.286  -0.391  -0.49   -0.585  -0.676  -0.847  -1.007  -1.156  -1.295  -1.425  -1.711  -1.947  -2.136  -2.282  -2.389  -2.46   -2.514  -2.478  -2.384  -2.259  -2.125  -1.996  -1.882  -1.783  -1.698  -1.633  -1.582  -1.487  -1.397  -1.312  -1.231  -1.152  -1.076  -1.003  -0.86   -0.722  -0.587  -0.456  -0.328  -0.171  -0.017  0.068   0.151   0.307   0.444   0.558
1931    -6.11   0.057   -4.1    -0.663  0.004   -0.001  -0.055  -0.174  -0.284  -0.388  -0.487  -0.581  -0.672  -0.843  -1.002  -1.151  -1.289  -1.419  -1.704  -1.94   -2.129  -2.275  -2.382  -2.454  -2.51   -2.476  -2.383  -2.258  -2.125  -1.997  -1.883  -1.783  -1.698  -1.633  -1.583  -1.487  -1.397  -1.312  -1.231  -1.152  -1.076  -1.003  -0.86   -0.722  -0.587  -0.456  -0.328  -0.171  -0.017  0.068   0.151   0.307   0.444   0.558
1931    -10.28  0.057   -6.362  -1.527  0   -0.001  -0.055  -0.173  -0.283  -0.386  -0.484  -0.578  -0.668  -0.839  -0.997  -1.145  -1.283  -1.413  -1.697  -1.933  -2.122  -2.268  -2.376  -2.448  -2.506  -2.473  -2.382  -2.258  -2.125  -1.998  -1.883  -1.784  -1.698  -1.633  -1.583  -1.487  -1.398  -1.312  -1.231  -1.152  -1.076  -1.003  -0.86   -0.722  -0.587  -0.456  -0.328  -0.171  -0.017  0.068   0.151   0.307   0.444   0.558
1931    -10 0.057   -7.018  -2.259  -0.001  -0.001  -0.056  -0.172  -0.281  -0.384  -0.481  -0.575  -0.665  -0.834  -0.992  -1.14   -1.278  -1.406  -1.691  -1.926  -2.115  -2.261  -2.369  -2.443  -2.501  -2.47   -2.38   -2.258  -2.125  -1.998  -1.884  -1.784  -1.699  -1.634  -1.583  -1.487  -1.398  -1.313  -1.231  -1.152  -1.076  -1.003  -0.86   -0.722  -0.588  -0.456  -0.328  -0.171  -0.017  0.068   0.151   0.307   0.444   0.558

saída desejada

year    1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
1929    4.6525  0   4.7245  2.9415  1.05175 0.21    -0.06275    -0.19975    -0.3185 -0.42875    -0.533  -0.63225    -0.72775    -0.90775    -1.07525    -1.23075    -1.37525    -1.50925    -1.803  -2.043  -2.23225    -2.37475    -2.47525    -2.5375 -2.571  -2.51425    -2.40125    -2.262  -2.11925    -1.9885 -1.8735 -1.776  -1.69225    -1.62925    -1.5785 -1.4845 -1.39525    -1.311  -1.229  -1.151  -1.075  -1.002  -0.859  -0.721  -0.587  -0.456  -0.328  -0.171  -0.017  0.068   0.151   0.308   0.445   0.559
1930    1.25    0.0005  1.2375  0.65475 0.251   0.049   -0.052  -0.184  -0.30075    -0.409  -0.51125    -0.6085 -0.702  -0.8775 -1.041  -1.193  -1.33475    -1.46675    -1.757  -1.99475    -2.18425    -2.32875    -2.432  -2.499  -2.54225    -2.49625    -2.39325    -2.26075    -2.1225 -1.99275    -1.878  -1.7795 -1.6955 -1.6315 -1.58075    -1.48575    -1.39625    -1.31175    -1.23   -1.15175    -1.076  -1.002  -0.85925    -0.721  -0.587  -0.456  -0.328  -0.171  -0.017  0.068   0.151   0.308   0.445   0.55875
1931    -8.4025 0.057   -5.51075    -1.187  0.0095  0.001   -0.055  -0.1735 -0.2835 -0.38725    -0.4855 -0.57975    -0.67025    -0.84075    -0.9995 -1.148  -1.28625    -1.41575    -1.70075    -1.9365 -2.1255 -2.2715 -2.379  -2.45125    -2.50775    -2.47425    -2.38225    -2.25825    -2.125  -1.99725    -1.883  -1.7835 -1.69825    -1.63325    -1.58275    -1.487  -1.3975 -1.31225    -1.231  -1.152  -1.076  -1.003  -0.86   -0.722  -0.58725    -0.456  -0.328  -0.171  -0.017  0.068   0.151   0.307   0.444   0.558
    
por AiB 07.10.2013 / 08:05

3 respostas

2

Aqui está uma solução Perl. Ele não espera que os anos sejam classificados, mas precisa ler o arquivo inteiro (bem, as somas de cada coluna, pelo menos) na memória. Contanto que seu arquivo não seja muito grande, ele deve fazer o que você deseja:

#!/usr/bin/env perl

## Read the input file, line by line
while (<>){
    ## Print and skip the 1st line
    if($.==1){
    print; 
    next
    }; 
    ## Split the fileds into the array @F
    @F=split(/\s+/);

    ## $k (%k) is a hash of hashes. The first element of 
    ## $F ($F[0]) is the year, so $k{$year}{tot} will be 
    ## the nuber of lines for each year.
    $k{$F[0]}{tot}++;

    ## Go through each field add its value
    ## to $k{$year}{field number}
    for($i=1;$i<=$#F;$i++){
    $k{$F[0]}{$i}+=$F[$i];
    }
 }
## Now print the data for each year
foreach $y (sort keys(%k)){
    ## $y is the year
    print "$y "; 
    ## $k{$y}{$i} is the sum of the values for this column/year
    ## and $k{$y}{tot} is the number of lines for this year
    for($i=1;$i<=$#F;$i++){
    print " " . $k{$y}{$i}/$k{$y}{tot};

    }
    print "\n";
}

E aqui está o mesmo script que um forro (longo):

$ perl -ane '$.==1 && do {print; next}; $k{$F[0]}{tot}++;
           for($i=1;$i<=$#F;$i++){$k{$F[0]}{$i}+=$F[$i];}
           END{foreach $y (sort keys(%k)){
                    print "$y "; 
                    for($i=1;$i<=$#F;$i++){
                      print " " . $k{$y}{$i}/$k{$y}{tot};
                    } print "\n";} }' abraham 
year    1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
1929  4.6525 0 4.7245 2.9415 1.05175 0.21 -0.06275 -0.19975 -0.3185 -0.42875 -0.533 -0.63225 -0.72775 -0.90775 -1.07525 -1.23075 -1.37525 -1.50925 -1.803 -2.043 -2.23225 -2.37475 -2.47525 -2.5375 -2.571 -2.51425 -2.40125 -2.262 -2.11925 -1.9885 -1.8735 -1.776 -1.69225 -1.62925 -1.5785 -1.4845 -1.39525 -1.311 -1.229 -1.151 -1.075 -1.002 -0.859 -0.721 -0.587 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.308 0.445 0.559
1930  1.25 0.0005 1.2375 0.65475 0.251 0.049 -0.052 -0.184 -0.30075 -0.409 -0.51125 -0.6085 -0.702 -0.8775 -1.041 -1.193 -1.33475 -1.46675 -1.757 -1.99475 -2.18425 -2.32875 -2.432 -2.499 -2.54225 -2.49625 -2.39325 -2.26075 -2.1225 -1.99275 -1.878 -1.7795 -1.6955 -1.6315 -1.58075 -1.48575 -1.39625 -1.31175 -1.23 -1.15175 -1.076 -1.002 -0.85925 -0.721 -0.587 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.308 0.445 0.55875
1931  -8.4025 0.057 -5.51075 -1.187 0.0095 0.001 -0.055 -0.1735 -0.2835 -0.38725 -0.4855 -0.57975 -0.67025 -0.84075 -0.9995 -1.148 -1.28625 -1.41575 -1.70075 -1.9365 -2.1255 -2.2715 -2.379 -2.45125 -2.50775 -2.47425 -2.38225 -2.25825 -2.125 -1.99725 -1.883 -1.7835 -1.69825 -1.63325 -1.58275 -1.487 -1.3975 -1.31225 -1.231 -1.152 -1.076 -1.003 -0.86 -0.722 -0.58725 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.307 0.444 0.558
    
por 07.10.2013 / 16:47
2

Você pode usar uma matriz para armazenar uma soma dos valores de cada coluna e, quando $ 1 for alterado, obtenha a média a partir de lá.

# print header
NR==1 {
    print
    next
}

# print average of each column per year
#  then, reset columns sums and number of lines
function print_sum() {
    printf prev
    # needs GNU awk, for length of array
    for (i=2; i < length(sum) + 2; i++) {
            printf FS sum[i]/nlines
            sum[i] = 0
    }
    printf ORS
    nlines = 0
}

# print average when $1 changes, but not the first time
# also, on end of script
NR>2 && prev!=$1 { print_sum() }
END              { print_sum() }

# for every line with the same $1, sum column values, increment number of lines
{
    prev=$1;
    nlines++
    for (i=2; i <= NF; i++) {
            sum[i]+=$i
    }
}
    
por 07.10.2013 / 12:55
1

Com gnu datamash :

{ head -n 1; datamash -s -g 1 $(printf 'mean %s ' {2..54}); } <infile

em que 54 é o número de colunas. Isso pressupõe que seus campos são separados por tabulações. Se eles são separados por espaço

{ head -n 1; datamash -t ' ' -s -g 1 $(printf 'mean %s ' {2..54}); } <infile

ou se estiverem separados por vários espaços em branco (espaços / tabulações):

{ head -n 1; datamash -W -s -g 1 $(printf 'mean %s ' {2..54}); } <infile

Observe que a saída será classificada pela primeira coluna.

    
por 27.09.2015 / 22:25