Data Wrangling with Python Datatable - Row-wise Transformations

datatable docs

Task: For each row, compute the difference between the maximum and minimum of the Value columns.

from datatable import dt, f, update

df = dt.Frame({'Ind': [1, 2, 3],
               'Department': ['Electronics', 'Clothing', 'Grocery'],
               'Value1': [5, 4, 3],
               'Value2': [4, 3, 3],
               'Value3': [3, 2, 5],
               'Value4': [2, 1, 1]})
   |   Ind  Department   Value1  Value2  Value3  Value4
   | int32  str32         int32   int32   int32   int32
-- + -----  -----------  ------  ------  ------  ------
 0 |     1  Electronics       5       4       3       2
 1 |     2  Clothing          4       3       2       1
 2 |     3  Grocery           3       3       5       1
[3 rows x 6 columns]

SOLUTION

  • Step 1: Select the columns whose names start with "Value" and wrap each name in the f symbol
value_columns = [f[name] for name in df.names if name.startswith("Value")]
value_columns

[FExpr<f['Value1']>,
 FExpr<f['Value2']>,
 FExpr<f['Value3']>,
 FExpr<f['Value4']>]
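The filter in Step 1 is ordinary Python over df.names. Only the f[...] wrapping is datatable-specific. The sketch below shows why a prefix test matches the step's description more precisely than a substring test. The "OldValue" column is hypothetical and does not appear in the example frame; it is there only to show where the two tests disagree.

```python
# Column names from the example frame, plus a hypothetical "OldValue"
# column (not in the original frame) to show where the filters differ.
names = ["Ind", "Department", "Value1", "Value2", "Value3", "Value4", "OldValue"]

# Substring test: matches "Value" anywhere in the name.
contains = [name for name in names if "Value" in name]

# Prefix test: matches only names that begin with "Value".
prefix = [name for name in names if name.startswith("Value")]

print(contains)  # ['Value1', 'Value2', 'Value3', 'Value4', 'OldValue']
print(prefix)    # ['Value1', 'Value2', 'Value3', 'Value4']
```

On the example frame both tests select the same four columns. They diverge only when a name contains "Value" somewhere other than at the start.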
  • Step 2: Create an f-expression for the difference between the row maximum and the row minimum of value_columns. Nothing is computed at this point. An f-expression is evaluated only when it is used inside the square brackets of a datatable Frame, as in df[i, j].
max_min_diff = dt.rowmax(value_columns) - dt.rowmin(value_columns)
max_min_diff
FExpr<(?) - (?)>
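The following plain-Python sketch shows what the expression will compute once it is evaluated. For each row it takes the largest value across the selected columns and subtracts the smallest. It assumes there are no missing values and does not model how datatable treats NAs in rowmax/rowmin. The helper name row_max_min_diff is illustrative only.

```python
# Value columns from the example frame, one list per column.
columns = {
    "Value1": [5, 4, 3],
    "Value2": [4, 3, 3],
    "Value3": [3, 2, 5],
    "Value4": [2, 1, 1],
}

def row_max_min_diff(columns):
    """Return max(row) - min(row) for every row across the given columns.

    Hypothetical helper; assumes equal-length columns and no missing values.
    """
    # zip(*...) turns the per-column lists into per-row tuples.
    rows = zip(*columns.values())
    return [max(row) - min(row) for row in rows]

print(row_max_min_diff(columns))  # [3, 3, 4]
```

These are the values that appear in the new column after Step 3.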
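  • Step 3: Apply max_min_diff to the datatable frame to get the results. update() adds the new column to df in place (the call itself returns None), so df is displayed afterwards.
df[:, update(difference=max_min_diff)]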
df

   |   Ind  Department   Value1  Value2  Value3  Value4  difference
   | int32  str32         int32   int32   int32   int32       int32
-- + -----  -----------  ------  ------  ------  ------  ----------
 0 |     1  Electronics       5       4       3       2           3
 1 |     2  Clothing          4       3       2       1           3
 2 |     3  Grocery           3       3       5       1           4
[3 rows x 7 columns]

Resources: