Python (e a biblioteca pandas
em particular é muito adequada para esse tipo de trabalho
data = """count id type
588 10 | 3
10 12 | 3
883 14 | 3
98 17 | 3
17 18 | 1
77598 18 | 3
10000 21 | 3
17892 2 | 3
20000 23 | 3
63 27 | 3
6 3 | 3
2446 35 | 3
14 4 | 3
15 4 | 1
253 4 | 2
19857 4 | 3
1000 5 | 3"""
import pandas as pd
from io import StringIO # to read from string, not needed to read from file
df = pd.read_csv(StringIO(data), sep=sep='\s+\|?\s*', index_col=None, engine='python')
Isto lê os dados csv para um pandas DataFrame
count id type
0 588 10 3
1 10 12 3
2 883 14 3
3 98 17 3
4 17 18 1
5 77598 18 3
6 10000 21 3
7 17892 2 3
8 20000 23 3
9 63 27 3
10 6 3 3
11 2446 35 3
12 14 4 3
13 15 4 1
14 253 4 2
15 19857 4 3
16 1000 5 3
Depois, agrupamos esses dados por id
e calculamos a soma da coluna count
df_sum = df.groupby(('type', 'id'))['count'].sum().unstack('type').fillna(0)
A unstack
remodela isso para mova os id's para as colunas, e o fillna
preenche os campos vazios com 0's
df_sum.to_csv()
Isso retorna
id,1,2,3
2,0.0,0.0,17892.0
3,0.0,0.0,6.0
4,15.0,253.0,19871.0
5,0.0,0.0,1000.0
10,0.0,0.0,588.0
12,0.0,0.0,10.0
14,0.0,0.0,883.0
17,0.0,0.0,98.0
18,17.0,0.0,77598.0
21,0.0,0.0,10000.0
23,0.0,0.0,20000.0
27,0.0,0.0,63.0
35,0.0,0.0,2446.0
Como o dataframe contém dados ausentes (combinações de tipo de ID vazias), os pandas transformam o int
s em float
(limitação do funcionamento interno)
Se você souber que as entradas serão apenas int, você poderá alterar a penúltima linha para df_sum = df.groupby(('type', 'id'))['count'].sum().unstack('type').fillna(0).astype(int)