Data Wrangling with Python Datatable - Transformations Within a GroupBy

  • Task: Group data by Date and Zip and get the ratio of Price to maximum Price per group.
from datatable import dt, f, update, by

df = dt.Frame( { "Date": ["2019-01-01", "2019-01-01", 
                          "2019-01-01",  "2019-01-01",
                          "2019-01-01"],
                 "Zip": [90102, 90102, 90102, 90102, 90103],
                 "Price": [58.02, 81.55, 11.97, 93.23, 13.68]})
   | Date          Zip    Price
   | str32       int32  float64
-- + ----------  -----  -------
 0 | 2019-01-01  90102    58.02
 1 | 2019-01-01  90102    81.55
 2 | 2019-01-01  90102    11.97
 3 | 2019-01-01  90102    93.23
 4 | 2019-01-01  90103    13.68
[5 rows x 3 columns]

SOLUTION

df[:, update(Ratio=f.Price / dt.max(f.Price)), by("Date", "Zip")]
df

   | Date          Zip    Price     Ratio
   | str32       int32  float64   float64
-- + ----------  -----  -------  --------
 0 | 2019-01-01  90102    58.02  0.622332
 1 | 2019-01-01  90102    81.55  0.874718
 2 | 2019-01-01  90102    11.97  0.128392
 3 | 2019-01-01  90102    93.23  1       
 4 | 2019-01-01  90103    13.68  1       
[5 rows x 4 columns]

Resources: