Data Wrangling with Python datatable: Aggregate columns into new columns based on prefix
# import libraries
from collections import defaultdict
from datatable import dt, f
df = dt.Frame({'sn': [1, 2, 3],
               'C1-1': [4, 2, 1],
               'C1-2': [3, 2, 2],
               'C1-3': [5, 0, 0],
               'H2-1': [4, 2, 0],
               'H2-2': [1, 0, 2],
               'K3-1': [4, 1, 1],
               'K3-2': [2, 2, 2]})
df
   |    sn   C1-1   C1-2   C1-3   H2-1   H2-2   K3-1   K3-2
   | int32  int32  int32  int32  int32  int32  int32  int32
-- + -----  -----  -----  -----  -----  -----  -----  -----
 0 |     1      4      3      5      4      1      4      2
 1 |     2      2      2      0      2      0      1      2
 2 |     3      1      2      0      0      2      1      2
[3 rows x 8 columns]
Create a dictionary where each key is the new column name (total_ plus the prefix) and each value is the list of f-expressions for the columns that start with that prefix. The first column, sn, is skipped.
mapping = defaultdict(list)
for entry in df.names[1:]:
    key = entry.split("-")[0]
    key = f"total_{key}" # f-strings
    mapping[key].append(f[entry]) # f-expressions
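To see how the loop groups the names, here is a minimal sketch in plain Python that stores the column names as strings instead of f-expressions. The variables names and groups are illustrative and not part of the original code; the list of names is copied by hand from the frame above.

```python
from collections import defaultdict

# Column names copied from the frame above; 'sn' is the identifier column.
names = ['sn', 'C1-1', 'C1-2', 'C1-3', 'H2-1', 'H2-2', 'K3-1', 'K3-2']

groups = defaultdict(list)
for name in names[1:]:               # skip 'sn'
    prefix = name.split("-")[0]      # 'C1-1' -> 'C1'
    groups[f"total_{prefix}"].append(name)

print(dict(groups))
# {'total_C1': ['C1-1', 'C1-2', 'C1-3'],
#  'total_H2': ['H2-1', 'H2-2'],
#  'total_K3': ['K3-1', 'K3-2']}
```

The datatable version has the same shape, except that each list holds f[name] expressions rather than plain strings.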
Replace each list with a single f-expression that computes the row-wise sum of its columns:
mapping = {key: dt.rowsum(value)
           for key, value in mapping.items()}
mapping
{'total_C1': Expr:rowsum([FExpr<f['C1-1']>, FExpr<f['C1-2']>, FExpr<f['C1-3']>]; ),
'total_H2': Expr:rowsum([FExpr<f['H2-1']>, FExpr<f['H2-2']>]; ),
'total_K3': Expr:rowsum([FExpr<f['K3-1']>, FExpr<f['K3-2']>]; )}
Select sn and extend it with the mapping to create the new columns:
df[:, f.sn.extend(mapping)]
   |    sn  total_C1  total_H2  total_K3
   | int32     int32     int32     int32
-- + -----  --------  --------  --------
 0 |     1        12         5         6
 1 |     2         4         2         3
 2 |     3         3         2         3
[3 rows x 4 columns]
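As a sanity check, the totals can be recomputed without datatable. This is a rough sketch in plain Python under the assumption that the data is held in an ordinary dict of lists; the variables data and totals are illustrative only.

```python
from collections import defaultdict

# Same values as the frame at the top, held as a plain dict of lists.
data = {'sn': [1, 2, 3],
        'C1-1': [4, 2, 1],
        'C1-2': [3, 2, 2],
        'C1-3': [5, 0, 0],
        'H2-1': [4, 2, 0],
        'H2-2': [1, 0, 2],
        'K3-1': [4, 1, 1],
        'K3-2': [2, 2, 2]}

n_rows = len(data['sn'])
totals = defaultdict(lambda: [0] * n_rows)

for name, values in data.items():
    if name == 'sn':                 # identifier column, not summed
        continue
    key = f"total_{name.split('-')[0]}"
    # Add this column to the running row-wise total for its prefix.
    totals[key] = [t + v for t, v in zip(totals[key], values)]

print(dict(totals))
# {'total_C1': [12, 4, 3], 'total_H2': [5, 2, 2], 'total_K3': [6, 3, 3]}
```

The lists match the total_C1, total_H2 and total_K3 columns in the result above.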