Python Select Columns

If we have a pandas DataFrame & would like to select specific rows or columns from it, we can use square brackets or the more advanced loc & iloc indexers.

Selecting Columns Using Square Brackets

Now suppose that we want to select the country column from the brics DataFrame. To achieve this, we type brics & then the column label, as a string, inside square brackets: brics["country"].

Selecting a Column
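
As a minimal sketch, the selection could look like the code below. The contents of brics are an assumption here (a small stand-in with BR, RU, IN, CH & SA as row labels), since the original data isn't reproduced in this text.

```python
import pandas as pd

# Stand-in for the brics DataFrame; the exact contents are assumed here.
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],  # million square km
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# Single square brackets with one column label select that column.
country = brics["country"]
print(country)
```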

Checking the Type of the Object

Let’s check the type of the object that gets returned with the type function.
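
A quick way to try this, again using an assumed stand-in for brics:

```python
import pandas as pd

# Stand-in for the brics DataFrame; the exact contents are assumed here.
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

country = brics["country"]

# type() reports the class of the returned object.
print(type(country))

# isinstance() is a convenient way to check the same thing in code.
print(isinstance(country, pd.Series))
```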

The result is a pandas Series! A Series can be thought of as a one-dimensional array that is labeled, much like a single column of a DataFrame.

If we want to select data & keep it in a DataFrame, we need to use double square brackets, as in brics[["country"]].
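
Here is a small sketch of the double-bracket version, with an assumed stand-in for brics. It also checks the type of the result.

```python
import pandas as pd

# Stand-in for the brics DataFrame; the exact contents are assumed here.
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# The inner brackets build a list of labels; the outer brackets do the selection.
country_df = brics[["country"]]
print(country_df)
print(type(country_df))
```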

If we check the type of this output, it’s a DataFrame, though with only one column.

Selecting Multiple Columns

We can extend this call to select two columns. Let’s try to select country & capital.
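
A sketch of the two-column selection, using an assumed stand-in for brics with one extra column (area) so that the selection actually drops something:

```python
import pandas as pd

# Stand-in for the brics DataFrame; the exact contents are assumed here.
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],  # million square km
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# A list of two labels inside the selection brackets keeps both columns.
country_capital = brics[["country", "capital"]]
print(country_capital)
```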

If we look at this closely, we are actually putting a list of column labels inside another set of square brackets & end up with a sub-DataFrame containing only the country & capital columns.

Selecting Rows Using Square Brackets

Square brackets can do more than just selecting columns. We can also use them to get rows, or observations, from a DataFrame.

Example

With plain square brackets, we select rows by specifying a slice, like 0:4; a single integer such as brics[1] would be looked up as a column label instead. Also, we’re using the integer positions of the rows here, not the row labels!

To get the second, third, & fourth rows of the brics DataFrame, we use the slice 1:4. Remember that the end of the slice is exclusive, & the index starts at zero.
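
As a sketch, with an assumed stand-in for brics whose row labels are BR, RU, IN, CH & SA:

```python
import pandas as pd

# Stand-in for the brics DataFrame; the exact contents are assumed here.
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# Positions 1, 2 & 3 are kept; position 4 (the slice end) is excluded.
middle_rows = brics[1:4]
print(middle_rows)
```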

These square brackets work, but they only offer limited functionality. Ideally, we would want something similar to 2D NumPy arrays, where we also use square brackets: the index, or slice, before the comma refers to the rows, & the slice after the comma refers to the columns.

Example of a 2D NumPy array:
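
The array below is a made-up illustration (its name & values are not from the original example), just to show the rows-before-comma, columns-after-comma pattern.

```python
import numpy as np

# Hypothetical 3x3 array, used only to illustrate 2D indexing.
np_2d = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

# Rows 0 & 1 (before the comma), columns 1 & 2 (after the comma).
sub = np_2d[0:2, 1:3]
print(sub)
```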

If we want to do something similar with pandas, we need to look at loc & iloc. Both are used with square brackets rather than called with parentheses.

  • loc: label-based
  • iloc: integer position-based

loc Function

loc is a technique to select parts of our data based on labels. Let's look at the brics DataFrame & get the row for Russia.

To achieve this, we will put the label of interest in square brackets after loc.

Selecting Rows
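
A minimal sketch, assuming the Russia row carries the label RU (the rest of the stand-in brics data is also assumed):

```python
import pandas as pd

# Stand-in for the brics DataFrame; the exact contents are assumed here.
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# A single row label inside loc[...] returns that row as a Series.
russia = brics.loc["RU"]
print(russia)
```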

We get a pandas Series containing all of the row's information; inconveniently, though, it is shown on different lines. To get a DataFrame, we have to put the "RU" string in another pair of brackets. We can also select multiple rows at the same time. Suppose we want to also include India & China. Simply add those row labels to the list.
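
Both variants are sketched below. The labels IN & CH for India & China are assumptions, as is the rest of the stand-in data.

```python
import pandas as pd

# Stand-in for the brics DataFrame; the exact contents are assumed here.
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# A one-element list of labels keeps the result as a DataFrame.
russia_df = brics.loc[["RU"]]
print(russia_df)

# More labels in the list select more rows.
three_rows = brics.loc[["RU", "IN", "CH"]]
print(three_rows)
```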

The difference between using loc & basic square brackets is that we can extend the selection with a comma & a specification of the columns of interest.

Selecting Rows & Columns

Let’s extend the previous call to only include the country & capital columns. We add a comma & list the column labels we want to keep. The intersection gets returned.
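
A sketch of the combined selection, on an assumed stand-in for brics with an extra area column:

```python
import pandas as pd

# Stand-in for the brics DataFrame; the exact contents are assumed here.
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],  # million square km
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# Row labels before the comma, column labels after it.
subset = brics.loc[["RU", "IN", "CH"], ["country", "capital"]]
print(subset)
```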

We can also use loc to select all rows but only specific columns. Simply replace the first list that specifies the row labels with a colon, which is a slice going from beginning to end. This time, we get back all of the rows but only two columns.

Selecting All Rows & Specific Columns
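
As a sketch, again with an assumed stand-in for brics:

```python
import pandas as pd

# Stand-in for the brics DataFrame; the exact contents are assumed here.
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],  # million square km
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# The bare colon keeps every row; the list keeps only two columns.
all_rows = brics.loc[:, ["country", "capital"]]
print(all_rows)
```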

iloc Function

iloc allows us to subset a pandas DataFrame based on integer position: rows & columns are counted from zero, regardless of their labels.

Selecting Rows

Let’s use the same data & similar examples as we did for loc. Let's start by getting the row for Russia.
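
In the assumed stand-in for brics below, Russia is the second row, so it sits at position 1:

```python
import pandas as pd

# Stand-in for the brics DataFrame; the exact contents are assumed here.
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# A single position returns the row as a Series.
russia = brics.iloc[1]
print(russia)

# Wrapping the position in a list keeps the result as a DataFrame.
russia_df = brics.iloc[[1]]
print(russia_df)
```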

To get the rows for Russia, India, & China, we can now use a list of the positions 1, 2, & 3.
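
A sketch with an assumed stand-in for brics, in which those three countries occupy positions 1 to 3:

```python
import pandas as pd

# Stand-in for the brics DataFrame; the exact contents are assumed here.
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# A list of integer positions selects several rows at once.
three_rows = brics.iloc[[1, 2, 3]]
print(three_rows)
```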

Selecting Rows & Columns

Similar to loc, we can also select both rows & columns using iloc. Here, we will select rows for Russia, India, & China and columns country & capital.
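
A sketch of this, assuming country & capital are the first two columns (positions 0 & 1) of a stand-in brics:

```python
import pandas as pd

# Stand-in for the brics DataFrame; the exact contents are assumed here.
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],  # million square km
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# Row positions before the comma, column positions after it.
subset = brics.iloc[[1, 2, 3], [0, 1]]
print(subset)
```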

Selecting All Rows & Specific Columns

Finally, if we wanted to select all rows but just keep the country & capital columns, we can pass a colon for the rows & the column positions after the comma, as in brics.iloc[:, [0, 1]].
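
As a sketch, assuming country & capital sit at column positions 0 & 1 of a stand-in brics:

```python
import pandas as pd

# Stand-in for the brics DataFrame; the exact contents are assumed here.
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],  # million square km
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# The bare colon keeps every row; [0, 1] keeps the first two columns.
all_rows = brics.iloc[:, [0, 1]]
print(all_rows)
```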

loc & iloc are pretty similar. The only difference is how we refer to rows & columns: loc uses their labels, while iloc uses their integer positions.

Interactive Example on Selecting a Subset of Data

In the following example, the cars data is imported from a CSV file as a Pandas DataFrame. To select only the cars_per_cap column from cars, we can use either cars["cars_per_cap"] or cars[["cars_per_cap"]].

The single bracket version gives a Pandas Series; the double bracket version gives a Pandas DataFrame.
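
To try both versions without the original CSV file, the sketch below reads a tiny inline stand-in. The rows & values are placeholders, not the article's actual data; only the column names cars_per_cap, country & drives_right come from the text.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file; rows and values are placeholders.
csv_text = """,cars_per_cap,country,drives_right
US,809,United States,True
AUS,731,Australia,False
JAP,588,Japan,False
"""
cars = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Single brackets: a Series.
per_cap_series = cars["cars_per_cap"]
print(type(per_cap_series))

# Double brackets: a one-column DataFrame.
per_cap_df = cars[["cars_per_cap"]]
print(type(per_cap_df))
```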

  • We will use single square brackets to print out the country column of cars as a Pandas Series.
  • Then use double square brackets to print out the country column of cars as a Pandas DataFrame.
  • Finally, use the double square brackets to print out a DataFrame with both the country & drives_right columns of cars, in this order.
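
The three steps above can be sketched as follows. The inline CSV is a hypothetical stand-in for the original file, with placeholder rows & values.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file; rows and values are placeholders.
csv_text = """,cars_per_cap,country,drives_right
US,809,United States,True
AUS,731,Australia,False
JAP,588,Japan,False
"""
cars = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Step 1: country column as a Series.
country_series = cars["country"]
print(country_series)

# Step 2: country column as a DataFrame.
country_df = cars[["country"]]
print(country_df)

# Step 3: DataFrame with country & drives_right, in this order.
country_drives = cars[["country", "drives_right"]]
print(country_drives)
```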

Running these three steps prints the country column as a Series, then the same column as a one-column DataFrame, & finally a DataFrame with the country & drives_right columns.
