Image for post
Image for post

One alternative to using a loop to iterate over a DataFrame is to use the pandas .apply() method. This function acts as a map() function in Python. It takes a function as an input and applies this function to an entire DataFrame.

If you are working with tabular data, you must specify an axis you want your function to act on (0 for columns; and 1 for rows).

Much like the map() function, the apply()method can also be used with anonymous functions or lambda functions. Let's look at some apply() examples using baseball data.

Much like the map() function, the apply()method can also be used with anonymous functions or lambda functions. Let's look at some apply() examples using baseball data.

Calculating Run Differentials With .apply()

First, you will call the .apply() method on the basebal_df dataframe. Then use the lambdafunction to iterate over the rows of the dataframe. For every row, we grab the RS and RA columns and pass them to the calc_run_diff function. Finally, you will specify the axis=1 to tell the .apply() method that we want to apply it on the rows instead of columns.

Image for post
Image for post

You’ll notice that we don’t need to use a forloop. You can collect the run differentials directly into an object called run_diffs_apply. After creating a new column and printing the dataframe you will notice that the results are similar to what you will get with the .iterrows() method.

Image for post
Image for post

Example Using .apply()

The Tampa Bay Rays want you to analyze their data.

They’d like the following metrics:

  • The sum of each column in the data
  • The total amount of runs scored in a year ('RS' + 'RA' for each year)
  • The 'Playoffs' column in text format rather than using 1's and 0's

The below function can be used to convert the 'Playoffs' column to text:

Image for post
Image for post

Use .apply() to get these metrics. A DataFrame (rays_df) has been printed below. This DataFrame is indexed on the 'Year'column.

Image for post
Image for post
  • Apply sum() to each column of the rays_dfto collect the sum of each column. Be sure to specify the correct axis.
Image for post
Image for post

When we run the above code, it produces the following result:

Image for post
Image for post

RELATED LINKS

Written by

Data Scientist & Machine Learning Engineer

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store