Data Wrangling with Python Datatable - Row-wise Transformations
Link to Source data
Task: Get the difference between the maximum and minimum values in each row, across the Value columns.
from datatable import dt, f, update

df = dt.Frame({'Ind': [1, 2, 3],
               'Department': ['Electronics', 'Clothing', 'Grocery'],
               'Value1': [5, 4, 3],
               'Value2': [4, 3, 3],
               'Value3': [3, 2, 5],
               'Value4': [2, 1, 1]})
| Ind Department Value1 Value2 Value3 Value4
| int32 str32 int32 int32 int32 int32
-- + ----- ----------- ------ ------ ------ ------
0 | 1 Electronics 5 4 3 2
1 | 2 Clothing 4 3 2 1
2 | 3 Grocery 3 3 5 1
[3 rows x 6 columns]
SOLUTION
- Step 1: Filter for the columns whose names start with "Value", and prefix each one with the f symbol.
value_columns = [f[name] for name in df.names if name.startswith("Value")]
value_columns
[FExpr<f['Value1']>,
FExpr<f['Value2']>,
FExpr<f['Value3']>,
FExpr<f['Value4']>]
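The step asks for names that start with "Value". A substring test ("Value" in name) and a prefix test (str.startswith) give the same result on this frame, but they can diverge on other data. The sketch below uses plain Python and a hypothetical extra column, "TotalValue", that is not in the frame above, to show the difference.

```python
# Hypothetical column names: the first six mirror the frame in this post,
# "TotalValue" is invented to show where the two filters disagree.
names = ["Ind", "Department", "Value1", "Value2", "Value3", "Value4", "TotalValue"]

# Substring test: keeps any name that contains "Value" anywhere.
contains = [name for name in names if "Value" in name]

# Prefix test: keeps only names that begin with "Value", as the step describes.
prefixed = [name for name in names if name.startswith("Value")]

print(contains)  # ['Value1', 'Value2', 'Value3', 'Value4', 'TotalValue']
print(prefixed)  # ['Value1', 'Value2', 'Value3', 'Value4']
```

If the data could contain names like "TotalValue", the prefix test is the safer choice.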
- Step 2: Create an f-expression for the difference between the row maximum and the row minimum of value_columns. Note that nothing is executed at this point; an f-expression is evaluated only when it is used inside the square brackets of a datatable Frame.
max_min_diff = dt.rowmax(value_columns) - dt.rowmin(value_columns)
max_min_diff
FExpr<(?) - (?)>
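To make the idea of deferred execution more concrete, here is a toy model in plain Python. It is only an analogy, not datatable's actual implementation: the classes Expr, Col, RowReduce and BinOp and the helpers rowmax and rowmin are hypothetical names made up for this sketch. Building max_min_diff only assembles a small expression tree; values are computed when the tree is evaluated against concrete rows, which loosely mirrors what happens when an f-expression is placed inside df[...].

```python
import operator


class Expr:
    """Base class for a toy deferred expression (hypothetical, not datatable's API)."""

    def __sub__(self, other):
        # Writing "a - b" only records the operation in a new node; nothing is computed.
        return BinOp(operator.sub, self, other)

    def evaluate(self, row):
        raise NotImplementedError


class Col(Expr):
    """Refers to a column by name; the value is looked up only at evaluation time."""

    def __init__(self, name):
        self.name = name

    def evaluate(self, row):
        return row[self.name]


class RowReduce(Expr):
    """Applies a reducer such as max or min across several column expressions of one row."""

    def __init__(self, reducer, exprs):
        self.reducer = reducer
        self.exprs = list(exprs)

    def evaluate(self, row):
        return self.reducer(e.evaluate(row) for e in self.exprs)


class BinOp(Expr):
    """Binary operation node whose operands are evaluated lazily."""

    def __init__(self, op, left, right):
        self.op = op
        self.left = left
        self.right = right

    def evaluate(self, row):
        return self.op(self.left.evaluate(row), self.right.evaluate(row))


# Hypothetical helpers that only mimic the shape of dt.rowmax / dt.rowmin.
def rowmax(exprs):
    return RowReduce(max, exprs)


def rowmin(exprs):
    return RowReduce(min, exprs)


value_columns = [Col(f"Value{i}") for i in range(1, 5)]
max_min_diff = rowmax(value_columns) - rowmin(value_columns)  # a tree, no data touched yet

# Evaluation happens only when the expression is applied to rows of data.
rows = [
    {"Value1": 5, "Value2": 4, "Value3": 3, "Value4": 2},
    {"Value1": 4, "Value2": 3, "Value3": 2, "Value4": 1},
    {"Value1": 3, "Value2": 3, "Value3": 5, "Value4": 1},
]
print([max_min_diff.evaluate(r) for r in rows])  # [3, 3, 4]
```

Separating "describe the computation" from "run the computation" is what lets a library like datatable receive the whole expression at once and decide how to evaluate it over the frame.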
- Step 3: Apply max_min_diff to the datatable Frame to get the results. update() adds the new column to df in place, so displaying df afterwards shows the result.
df[:, update(diff=max_min_diff)]
df
| Ind Department Value1 Value2 Value3 Value4 diff
| int32 str32 int32 int32 int32 int32 int32
-- + ----- ----------- ------ ------ ------ ------ ----------
0 | 1 Electronics 5 4 3 2 3
1 | 2 Clothing 4 3 2 1 3
2 | 3 Grocery 3 3 5 1 4
[3 rows x 7 columns]
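As a quick cross-check that does not depend on datatable, the same per-row differences can be computed in plain Python from the raw Value1 to Value4 lists used to build the frame. This is a verification sketch only, not a replacement for the datatable approach.

```python
# Raw Value1..Value4 data from the frame above, stored column by column.
values = {
    "Value1": [5, 4, 3],
    "Value2": [4, 3, 3],
    "Value3": [3, 2, 5],
    "Value4": [2, 1, 1],
}

# zip(*...) transposes the column lists into one tuple per row.
rows = list(zip(*values.values()))

# Row-wise maximum minus row-wise minimum, matching the diff column.
diff = [max(row) - min(row) for row in rows]
print(diff)  # [3, 3, 4]
```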
Resources: