Python Select Columns

If we have a DataFrame & would like to select specific rows or columns from it, we can use square brackets or more advanced methods such as loc & iloc.
Selecting Columns Using Square Brackets
Now suppose that we want to select the country column from the brics DataFrame. To achieve this, we will type brics & then the column label inside the square brackets.
Selecting a Column
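The selection can be sketched as follows. The brics data itself is not shown in this section, so the DataFrame below is a hypothetical stand-in: the row labels (BR, RU, IN, CH, SA), the extra area & population columns, & all values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-in for brics; labels and values are illustrative only
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],
        "population": [200.4, 143.5, 1252.0, 1357.0, 52.98],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# Single square brackets with one column label select that column
country = brics["country"]
print(country)
```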

Checking the Type of the Object
Let’s check the type of the object that gets returned with the type function.
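A short sketch of that check, again on a hypothetical two-column stand-in for brics (labels & values are assumptions):

```python
import pandas as pd

# Hypothetical stand-in for brics; labels and values are illustrative only
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# type() reports the class of the object returned by single brackets
selected_type = type(brics["country"])
print(selected_type)  # <class 'pandas.core.series.Series'>
```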

The returned type shows that we are dealing with a pandas Series here! A Series can be thought of as a one-dimensional array whose entries are labeled by an index, just like the rows of a DataFrame.
If we want to select data & keep it in a DataFrame, we will need to use double square brackets:
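As a sketch, on a hypothetical stand-in for brics (labels & values are assumptions), the double-bracket form looks like this:

```python
import pandas as pd

# Hypothetical stand-in for brics; labels and values are illustrative only
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# The inner brackets build a list of labels; the outer brackets do the selection
country_df = brics[["country"]]
print(country_df)
print(type(country_df))  # <class 'pandas.core.frame.DataFrame'>
```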

If we check the type of this output, it’s a DataFrame! With only one column, though.

Selecting Multiple Columns
We can extend this call to select two columns. Let’s try to select country & capital.
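A minimal sketch of the two-column selection, assuming a hypothetical brics with the columns below (the area column & all values are illustrative):

```python
import pandas as pd

# Hypothetical stand-in for brics; labels and values are illustrative only
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# A list with two labels returns a DataFrame with those two columns
country_capital = brics[["country", "capital"]]
print(country_capital)
```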

If we look at this closely, we are actually putting a list with column labels inside another set of square brackets & end up with a sub DataFrame containing only the country & capital columns.
Selecting Rows Using Square Brackets
Square brackets can do more than just select columns. We can also use them to get rows, or observations, from a DataFrame.
Example
We can only select rows using square brackets if we specify a slice, like 0:4. Also, we’re using the integer positions of the rows here, not the row labels!
To get the second, third, & fourth rows of the brics DataFrame, we use the slice 1 through 4. Remember that the end of the slice is exclusive, & the index starts at zero.
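The slice can be sketched like this, assuming a hypothetical brics whose rows are ordered Brazil, Russia, India, China, South Africa (row labels & values are illustrative):

```python
import pandas as pd

# Hypothetical stand-in for brics; labels and values are illustrative only
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# Integer slice: positions 1, 2 and 3 are returned; position 4 is excluded
rows = brics[1:4]
print(rows)
```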

These square brackets work, but they only offer limited functionality. Ideally, we would want something similar to 2D NumPy arrays, where we also use square brackets. The index, or slice, before the comma refers to the rows, & the slice after the comma refers to the columns.
Example of 2D NumPy array:
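A small sketch of that row-comma-column indexing; the array name & its values are made up for illustration:

```python
import numpy as np

# Illustrative 3x3 array (values are arbitrary)
my_array = np.array([[1, 2, 3],
                     [4, 5, 6],
                     [7, 8, 9]])

# Before the comma: rows. After the comma: columns.
second_row = my_array[1, :]     # row at position 1, all columns
sub_block = my_array[1:3, 0:2]  # rows 1-2, columns 0-1
print(second_row)
print(sub_block)
```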

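If we want to do something similar with pandas, we need to look at using loc & iloc. Strictly speaking, these are indexers used with square brackets rather than functions called with parentheses, although they are often loosely called functions.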
- loc: label-based
- iloc: integer position-based
loc Function
loc
is a technique to select parts of our data based on labels. Let's look at the brics DataFrame & get the rows for Russia.
To achieve this, we will put the label of interest in square brackets after loc
.
Selecting Rows
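A minimal sketch of the label lookup. The RU label comes from the text; the remaining row labels & all values in this stand-in for brics are assumptions:

```python
import pandas as pd

# Hypothetical stand-in for brics; labels and values are illustrative only
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# A single label inside loc[...] returns that row as a Series
russia = brics.loc["RU"]
print(russia)
```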

We get a pandas Series containing all of the row's information; inconveniently, though, it is shown on different lines. To get a DataFrame, we have to put the RU string in another pair of brackets. We can also select multiple rows at the same time. Suppose we want to also include India & China. Simply add those row labels to the list.
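Both variants can be sketched as below. The IN & CH labels for India & China are assumptions, as are the values in this stand-in for brics:

```python
import pandas as pd

# Hypothetical stand-in for brics; labels and values are illustrative only
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# A list with one label keeps the result as a one-row DataFrame
russia_df = brics.loc[["RU"]]

# A list with several labels returns those rows, in the order listed
three_rows = brics.loc[["RU", "IN", "CH"]]
print(russia_df)
print(three_rows)
```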

The difference between using loc & basic square brackets is that we can extend the selection with a comma & a specification of the columns of interest.
Selecting Rows & Columns
Let’s extend the previous call to only include the country & capital columns. We add a comma & list the column labels we want to keep. The intersection gets returned.
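Sketched on a hypothetical stand-in for brics (the row labels other than RU, the area column, & all values are assumptions), the row-and-column selection could look like this:

```python
import pandas as pd

# Hypothetical stand-in for brics; labels and values are illustrative only
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# Row labels before the comma, column labels after it
subset = brics.loc[["RU", "IN", "CH"], ["country", "capital"]]
print(subset)
```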

We can also use loc to select all rows but only specific columns. Simply replace the first list that specifies the row labels with a colon, which is a slice going from beginning to end. This time, we get back all of the rows but only two columns.
Selecting All Rows & Specific Columns
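A minimal sketch with loc, using a hypothetical stand-in for brics (labels & values are assumptions):

```python
import pandas as pd

# Hypothetical stand-in for brics; labels and values are illustrative only
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# The bare colon means "every row"; the list picks the columns
all_rows = brics.loc[:, ["country", "capital"]]
print(all_rows)
```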

iloc Function
The iloc function allows us to subset pandas DataFrames based on the integer position of their rows & columns.
Selecting Rows
Let’s use the same data & similar examples as we did for loc. Let's start by getting the row for Russia.
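A sketch with iloc, assuming Russia is the second row (position 1) of a hypothetical stand-in for brics; labels & values are illustrative:

```python
import pandas as pd

# Hypothetical stand-in for brics; labels and values are illustrative only
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# Position 1 is the second row; the list keeps the result a DataFrame
russia_df = brics.iloc[[1]]

# Without the inner list, the same row comes back as a Series
russia_series = brics.iloc[1]
print(russia_df)
```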

To get the rows for Russia, India, & China, we can now use a list of the integer positions 1, 2, 3.
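Sketched on a hypothetical stand-in for brics in which those three countries occupy positions 1 to 3 (labels & values are assumptions):

```python
import pandas as pd

# Hypothetical stand-in for brics; labels and values are illustrative only
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# A list of integer positions selects the matching rows
three_rows = brics.iloc[[1, 2, 3]]
print(three_rows)
```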

Selecting Rows & Columns
Similar to loc, we can also select both rows & columns using iloc. Here, we will select rows for Russia, India, & China and columns country & capital.
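A minimal sketch, assuming country & capital are the first two columns (positions 0 & 1) of a hypothetical stand-in for brics; labels & values are illustrative:

```python
import pandas as pd

# Hypothetical stand-in for brics; labels and values are illustrative only
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# Row positions before the comma, column positions after it
subset = brics.iloc[[1, 2, 3], [0, 1]]
print(subset)
```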

Selecting All Rows & Specific Columns
Finally, if we wanted to select all rows but just keep the country & capital columns, we can:
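One way to sketch this with iloc, again assuming country & capital sit at column positions 0 & 1 in a hypothetical stand-in for brics:

```python
import pandas as pd

# Hypothetical stand-in for brics; labels and values are illustrative only
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# The bare colon keeps every row; [0, 1] keeps the first two columns
all_rows = brics.iloc[:, [0, 1]]
print(all_rows)
```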

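The loc & iloc functions are pretty similar. The main difference is how we refer to rows & columns: loc uses labels, while iloc uses integer positions. One more thing to keep in mind is that a label slice with loc includes its end label, while a position slice with iloc excludes its end position.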
Interactive Example on Selecting a Subset of Data
In the following example, the cars data is imported from a CSV file as a Pandas DataFrame. To select only the cars_per_cap column from cars, we can use:
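A sketch of both forms. The CSV file name is not given here, so this example reads a small in-memory CSV instead; the row labels & values are assumptions, while the column names come from the text.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for the real file (values illustrative)
csv_text = """,cars_per_cap,country,drives_right
US,809,United States,True
AUS,731,Australia,False
JPN,588,Japan,False
"""

# index_col=0 uses the first CSV column as row labels
cars = pd.read_csv(io.StringIO(csv_text), index_col=0)

as_series = cars["cars_per_cap"]     # single brackets
as_frame = cars[["cars_per_cap"]]    # double brackets
print(as_series)
print(as_frame)
```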

The single bracket version gives a Pandas Series; the double bracket version gives a Pandas DataFrame.
- We will use single square brackets to print out the country column of cars as a Pandas Series.
- Then use double square brackets to print out the country column of cars as a Pandas DataFrame.
- Finally, use the double square brackets to print out a DataFrame with both the country & drives_right columns of cars, in this order.
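The three steps could be sketched as follows. Since the CSV file name is not given, the data is read from an in-memory CSV whose row labels & values are assumptions:

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for the real file (values illustrative)
csv_text = """,cars_per_cap,country,drives_right
US,809,United States,True
AUS,731,Australia,False
JPN,588,Japan,False
"""
cars = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Step 1: country column as a Series
country_series = cars["country"]
print(country_series)

# Step 2: country column as a DataFrame
country_frame = cars[["country"]]
print(country_frame)

# Step 3: country and drives_right, in this order
country_drives = cars[["country", "drives_right"]]
print(country_drives)
```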

