Data Wrangling with Python Datatable - Replicate Pandas' Map Function

Task: Create a new column, state, by mapping the values in the city column through the city_to_state dictionary.

from datatable import dt, f

raw_data = {"first_name": ["Jason", "Molly", 
                           "Tina", "Jake", "Amy"],
            "last_name": ["Miller", "Jacobson", 
                          "Ali", "Milner", "Cooze"],
            "age": [42, 52, 36, 24, 73],
            "city": ["San Francisco", "Baltimore", 
                     "Miami", "Douglas", "Boston"] }

df = dt.Frame(raw_data)
df

   | first_name  last_name    age  city         
   | str32       str32      int32  str32        
-- + ----------  ---------  -----  -------------
 0 | Jason       Miller        42  San Francisco
 1 | Molly       Jacobson      52  Baltimore    
 2 | Tina        Ali           36  Miami        
 3 | Jake        Milner        24  Douglas      
 4 | Amy         Cooze         73  Boston
[5 rows x 4 columns]

city_to_state = { "San Francisco": "California",
                  "Baltimore": "Maryland",
                  "Miami": "Florida",
                  "Douglas": "Arizona",
                  "Boston": "Massachusetts" }
city_to_state

{'San Francisco': 'California',
 'Baltimore': 'Maryland',
 'Miami': 'Florida',
 'Douglas': 'Arizona',
 'Boston': 'Massachusetts'}
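
For reference, the pandas operation this task sets out to replicate can be sketched as follows. This is a minimal illustration, not part of the datatable solution; the variable name pdf is an assumption chosen to avoid clashing with the datatable frame df, and only two of the original columns are reproduced.

```python
import pandas as pd

# Same city data as above, held in a pandas DataFrame (name `pdf` is illustrative)
pdf = pd.DataFrame({
    "first_name": ["Jason", "Molly", "Tina", "Jake", "Amy"],
    "city": ["San Francisco", "Baltimore", "Miami", "Douglas", "Boston"],
})

city_to_state = {"San Francisco": "California",
                 "Baltimore": "Maryland",
                 "Miami": "Florida",
                 "Douglas": "Arizona",
                 "Boston": "Massachusetts"}

# Series.map looks up each city in the dictionary and returns the matching state
pdf["state"] = pdf["city"].map(city_to_state)
print(pdf)
```

With pandas, a city that is missing from the dictionary would map to NaN.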

Solution:

  • Create a temporary one-column frame, m, to hold the values of the city column.
m = df['city']
m
   | city         
   | str32        
-- + -------------
 0 | San Francisco
 1 | Baltimore    
 2 | Miami        
 3 | Douglas      
 4 | Boston       
[5 rows x 1 column]
  • Replace the values in m by passing city_to_state to the replace function. No assignment is needed, because replace modifies the frame in place:
m.replace(city_to_state)
m

   | city         
   | str32        
-- + -------------
 0 | California   
 1 | Maryland     
 2 | Florida      
 3 | Arizona      
 4 | Massachusetts
[5 rows x 1 column]
  • Assign m to a new column, state, in df:
df["state"] = m
df

   | first_name  last_name    age  city           state        
   | str32       str32      int32  str32          str32        
-- + ----------  ---------  -----  -------------  -------------
 0 | Jason       Miller        42  San Francisco  California   
 1 | Molly       Jacobson      52  Baltimore      Maryland     
 2 | Tina        Ali           36  Miami          Florida      
 3 | Jake        Milner        24  Douglas        Arizona      
 4 | Amy         Cooze         73  Boston         Massachusetts
[5 rows x 5 columns]