Data Wrangling with Python Datatable - Conditional Statements

Transformation based on a Single Condition

Source data

Task: if c is positive, then value should be a - b, else b - a.

from datatable import dt, f, update, ifelse

df = dt.Frame({"a": [1, 1, 2, 4], 
               "b": [2, 2, 3, 2], 
               "c": [3, -3, 2, -1]})
df

   |     a      b      c
   | int32  int32  int32
-- + -----  -----  -----
 0 |     1      2      3
 1 |     1      2     -3
 2 |     2      3      2
 3 |     4      2     -1
[4 rows x 3 columns]

The ifelse function handles conditional transformations. It is similar to python's if...else statements or SQL's case_when function.

BREAKDOWN OF SOLUTION

  • Step 1 : Define the condition, with the True and False values.
condition = f.c >= 0 # positive values
true = f.a - f.b
false = f.b - f.a
  • Step 2 : Create the ifelse expression.
if_statement = ifelse(condition, true, false)
  • Step 3: Apply the if_statement to the datatable frame to get the results
df[:, update(b = if_statement)]
df

   |     a      b      c
   | int32  int32  int32
-- + -----  -----  -----
 0 |     1     -1      3
 1 |     1      1     -3
 2 |     2     -1      2
 3 |     4     -2     -1
[4 rows x 3 columns]

Transformation based on Multiple Conditions

Source data

df = dt.Frame({"Type": ["A", "B", "B", "C"], 
               "Set": ["Z", "Z", "X", "Y"]})
df

   | Type   Set  
   | str32  str32
-- + -----  -----
 0 | A      Z    
 1 | B      Z    
 2 | B      X    
 3 | C      Y    
[4 rows x 2 columns]

Task:

  1. if Set is equal to 'Z' and Type is equal to 'A' then assign 'yellow' to color.
  2. If Set is equal to 'Z' and Type is equal to 'B' then assign 'blue' to color.
  3. If Type is equal to 'B' then assign 'purple' to color.
  4. Otherwise, assign 'black' to color.

BREAKDOWN OF SOLUTION

  • Step 1 : Define the conditions, with the True and False values.
condition1 = (f.Set == "Z") & (f.Type == "A")
true1 = "yellow"
condition2 = (f.Set == "Z") & (f.Type == "B")
true2 = "blue"
condition3 = f.Type == "B"
true3 = "purple"
false = "black"
  • Step 2 : Create the ifelse expression.
if_statements = ifelse(condition1, true1, 
                       condition2, true2, 
                       condition3, true3, 
                       false)
  • Step 3: Apply the if_statements to the datatable frame to get the results
df[:, update(color = if_statements)]
df

   | Type   Set    color 
   | str32  str32  str32 
-- + -----  -----  ------
 0 | A      Z      yellow
 1 | B      Z      blue  
 2 | B      X      purple
 3 | C      Y      black 
[4 rows x 3 columns]

Resources: