Adventures in Machine Learning

Mastering Data Selection in Pandas with loc and iloc Functions

Overview of loc and iloc functions

The loc and iloc indexers are Pandas DataFrame attributes used for selecting (indexing and slicing) elements of a DataFrame by index label or by integer position. Although this article calls them functions, both are used with square brackets rather than parentheses, as in data.loc[0]. The loc indexer works with index labels, while iloc works with integer positions.

The loc function takes up to two selectors: one for rows and, optionally, one for columns. Each can be a single label, a list of labels, a label slice, or a Boolean array.

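You can use the colon (:) to select ranges of rows and columns. The iloc function follows the same pattern as loc but takes integer positions instead of index labels, with one important difference: a loc slice includes its end label, while an iloc slice excludes its end position.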
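To make the label-versus-position distinction concrete, here is a minimal sketch. The DataFrame and the names df, by_label, and by_position are made up for illustration and are not part of the article's dataset.

```python
import pandas as pd

# Hypothetical toy data, used only to contrast the two indexers.
df = pd.DataFrame(
    {"Name": ["Adam", "Bailey", "Charles", "David"], "Age": [27, 34, 19, 44]},
    index=["A", "B", "C", "D"],
)

by_label = df.loc["A":"C"]     # label slice: includes both "A" and "C"
by_position = df.iloc[0:2]     # position slice: includes 0 and 1, excludes 2

print(by_label)
print(by_position)
```

The label slice returns three rows and the position slice returns two, which is the main behavioral difference to keep in mind when switching between the two.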

Use of loc function

The loc function is incredibly useful for row and column selection, as well as filtering data based on criteria. Let’s see some examples:

Creating the DataFrame for loc example

Pandas is all about DataFrames. Before we explore the loc function in detail, we need a DataFrame to work with.

Let’s create one using Pandas’ built-in ‘read_csv’ function.

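import pandas as pd
# 'mydata.csv' is a placeholder file; the examples below assume it has Name, Age, and Gender columns
data = pd.read_csv('mydata.csv')
print(data.head())  # preview the first five rows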

Using loc for row selection

The loc function can select specific rows based on index labels or Boolean arrays. By default, Pandas assigns an integer index to each row, starting from 0, but this can be overridden with custom index labels.

To select a row using loc, we need to provide a single index label that specifies the row we want. Let’s select the first row of our DataFrame, which has an index label of 0:

data.loc[0]
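As a small sketch of how label-based lookup behaves once the default index is replaced with custom labels, the following uses a hypothetical three-row DataFrame. The names people and by_name are illustrative and stand in for whatever mydata.csv actually contains.

```python
import pandas as pd

# Hypothetical data standing in for the contents of mydata.csv.
people = pd.DataFrame(
    {"Name": ["Adam", "Bailey", "Charles"], "Age": [27, 34, 19]}
)

first_default = people.loc[0]        # label 0 exists in the default integer index

by_name = people.set_index("Name")   # use the Name column as the index labels
bailey = by_name.loc["Bailey"]       # select a row by its custom label

print(first_default)
print(bailey)
```

In both cases a single label returns one row as a Series whose index is made up of the column names.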

Using loc for row and column selection

Let’s say we want to select specific rows and columns from our DataFrame. We can use the loc function with Boolean arrays or index labels to filter our data.

Here’s an example:

data.loc[(data['Age'] > 30) & (data['Gender'] == 'Male'), ['Name', 'Age']]

In the example above, we used Boolean filters to select only the rows where Age is greater than 30 and Gender is ‘Male’. Each condition is wrapped in parentheses because & binds more tightly than the comparison operators. We also specified two columns – Name and Age – to display by passing a list of column names as the second parameter.
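Because the filter above depends on a CSV file that may not be at hand, here is a self-contained sketch of the same idea. The sample values and the names people, mask, and older_men are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data with the columns used by the filter.
people = pd.DataFrame({
    "Name": ["Adam", "Bailey", "Charles", "David", "Emily"],
    "Age": [27, 34, 19, 44, 38],
    "Gender": ["Male", "Female", "Male", "Male", "Female"],
})

# Each comparison yields a Boolean Series; & combines them element-wise.
mask = (people["Age"] > 30) & (people["Gender"] == "Male")

# Rows where the mask is True, restricted to the Name and Age columns.
older_men = people.loc[mask, ["Name", "Age"]]
print(older_men)
```

Storing the mask in a variable first tends to make longer filters easier to read and reuse.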

Using loc with : for range selection

The colon (:) can be used with the loc function to select a range of rows or columns based on index labels. To select a range of rows, we provide two index labels separated by a colon.

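Here’s an example that selects the rows labeled 3 through 7. Unlike ordinary Python slices, a loc slice includes both endpoints, so with the default integer index this returns five rows: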

data.loc[3:7]

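To select a range of columns, we provide two column labels separated by a colon. Here’s an example that selects every column from ‘Name’ through ‘Age’, inclusive, in the order the columns appear in the DataFrame (so ‘Name’ must come before ‘Age’ to get a non-empty result):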

data.loc[:, 'Name':'Age']
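The following sketch shows both label slices on a small made-up DataFrame. The column layout (Name, Gender, Age, City) is an assumption for illustration, since the real column order depends on the CSV file.

```python
import pandas as pd

# Hypothetical data with a default integer index (labels 0 through 9).
data = pd.DataFrame({
    "Name": [f"person_{i}" for i in range(10)],
    "Gender": ["Male", "Female"] * 5,
    "Age": list(range(20, 30)),
    "City": ["Chicago"] * 10,
})

rows = data.loc[3:7]               # labels 3, 4, 5, 6 and 7: five rows
cols = data.loc[:, "Name":"Age"]   # Name, Gender and Age, in column order

print(rows)
print(cols.columns.tolist())
```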

Use of iloc function

The iloc function is incredibly useful when you need to select specific rows and columns from a DataFrame based on their integer positions. Here are some examples:

Using iloc for row selection

We can use the iloc function to select specific rows from the DataFrame based on their integer position. To select the first row of our DataFrame, which has an integer position of 0, we can use the following code:

data.iloc[0]

Using iloc for row and column selection

We can use the iloc function to select specific rows and columns from the DataFrame based on their integer position. Here’s an example:

data.iloc[[0, 2, 4], [1, 3]]

In the example above, we used lists of integer positions to select the first, third, and fifth rows (positions 0, 2, and 4) and the second and fourth columns (positions 1 and 3).

A list passed as the first parameter selects multiple rows, and a list passed as the second parameter selects multiple columns.

Using iloc with : for range selection

The iloc function can be used to select a range of rows or columns based on their integer positions.

Here’s an example:

data.iloc[3:7] # selects the rows at positions 3 through 6 (position 7 is excluded)
data.iloc[:, 1:4] # selects the columns at positions 1 through 3 (position 4 is excluded)
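Here is a runnable sketch of the same two slices on a made-up DataFrame. The single-letter column names are placeholders. It also checks the result against the matching loc slice, which needs a different end value because loc includes its endpoint.

```python
import pandas as pd

# Hypothetical 10-row, 5-column frame with a default integer index.
data = pd.DataFrame({
    "a": list(range(10)),
    "b": list(range(10, 20)),
    "c": list(range(20, 30)),
    "d": list(range(30, 40)),
    "e": list(range(40, 50)),
})

rows = data.iloc[3:7]        # positions 3, 4, 5 and 6: four rows
cols = data.iloc[:, 1:4]     # columns at positions 1, 2 and 3: b, c, d
same_rows = data.loc[3:6]    # with the default index, labels 3 through 6 match

print(rows.shape, cols.columns.tolist())
```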

Conclusion

Pandas’ loc and iloc functions are powerful tools for selecting elements from DataFrames. The loc function is used with index labels, while iloc is used with integer positions.

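Both functions can select specific rows and columns by using labels (loc) or integer positions (iloc), and both accept Boolean arrays, although iloc expects a plain Boolean array or list rather than a Boolean Series. Use the colon (:) with either one to select ranges of rows and columns, keeping in mind that loc slices include the end label while iloc slices exclude the end position.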

With this knowledge, you should be able to navigate and analyze complex datasets more effectively.

Example 2 – How to Use iloc in Pandas

In our previous section, we learned that iloc is used for row and column selection based on integer positions. In this section, we will explore how to use iloc to filter our data based on those specific positions.

We will use a similar DataFrame to Example 1 but with more data.

Creating the DataFrame for iloc example

Let’s create a new DataFrame to work with:

import pandas as pd

data = {'Name': ['Adam', 'Bailey', 'Charles', 'David', 'Emily'],
        'Age': [27, 34, 19, 44, 38],
        'Gender': ['Male', 'Female', 'Male', 'Male', 'Female'],
        'Salary': [50000, 60000, 40000, 80000, 65000],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Seattle', 'Denver']}

df = pd.DataFrame(data, index=['A', 'B', 'C', 'D', 'E'])

Let’s see some use cases of the iloc function.

Using iloc for row selection

We can select rows from a DataFrame based on their integer position, regardless of the index labels. For example, to select the first row of our DataFrame (labeled ‘A’), we can use the following code:

df.iloc[0]

Output:

Name          Adam
Age             27
Gender        Male
Salary       50000
City      New York
Name: A, dtype: object

To select multiple rows, we can use iloc with a list of integer positions.

df.iloc[[1, 3, 4]]

Output:

     Name  Age  Gender  Salary         City
B  Bailey   34  Female   60000  Los Angeles
D   David   44    Male   80000      Seattle
E   Emily   38  Female   65000       Denver
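Two related details are worth a quick sketch: the rows come back in the order the positions are listed, and negative positions count from the end, as with Python lists. The variable names reordered and last_row are illustrative only, and the DataFrame is a trimmed copy of the one above.

```python
import pandas as pd

# Trimmed copy of the example DataFrame (two columns only).
df = pd.DataFrame(
    {"Name": ["Adam", "Bailey", "Charles", "David", "Emily"],
     "Age": [27, 34, 19, 44, 38]},
    index=["A", "B", "C", "D", "E"],
)

reordered = df.iloc[[4, 1]]   # rows in the order listed: E first, then B
last_row = df.iloc[-1]        # negative positions count from the end

print(reordered)
print(last_row)
```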

Using iloc for row and column selection

We can use iloc to select specific rows and columns of a DataFrame. For example, to select the first two rows of our DataFrame, and their first two columns, we can use the following code:

df.iloc[0:2, 0:2]

Output:

     Name  Age
A    Adam   27
B  Bailey   34

In the above example, we used integer position slices for both the rows (0:2) and the columns (0:2). Note that the upper bound of an iloc slice is exclusive, so each slice covers positions 0 and 1 only.
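As a quick check of how the two slicing conventions line up, the sketch below rebuilds the DataFrame and compares the iloc slice with the equivalent loc slice. The names by_position and by_label are illustrative.

```python
import pandas as pd

data = {'Name': ['Adam', 'Bailey', 'Charles', 'David', 'Emily'],
        'Age': [27, 34, 19, 44, 38],
        'Gender': ['Male', 'Female', 'Male', 'Male', 'Female'],
        'Salary': [50000, 60000, 40000, 80000, 65000],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Seattle', 'Denver']}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D', 'E'])

by_position = df.iloc[0:2, 0:2]            # end positions are excluded
by_label = df.loc['A':'B', 'Name':'Age']   # end labels are included

print(by_position.equals(by_label))
```

Both expressions select rows A and B and the Name and Age columns, so the comparison prints True.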

We can also use iloc with a list of integer positions to select specific rows and columns:

df.iloc[[1, 3], [0, 3]]

Output:

     Name  Salary
B  Bailey   60000
D   David   80000
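When you know the column names but want to select rows by position, one option is to translate the names into positions first. The sketch below uses Index.get_indexer for that. It is one possible approach and not the only way to mix labels and positions.

```python
import pandas as pd

data = {'Name': ['Adam', 'Bailey', 'Charles', 'David', 'Emily'],
        'Age': [27, 34, 19, 44, 38],
        'Gender': ['Male', 'Female', 'Male', 'Male', 'Female'],
        'Salary': [50000, 60000, 40000, 80000, 65000],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Seattle', 'Denver']}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D', 'E'])

# Look up the integer positions of the named columns.
col_positions = df.columns.get_indexer(['Name', 'Salary'])

# Rows by position, columns by the positions found above.
subset = df.iloc[[1, 3], col_positions]
print(subset)
```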

Using iloc with : for range selection

We can use iloc with the : operator to select a range of rows or columns. For example, to return the first three rows of the DataFrame, we can use the following code:

df.iloc[:3,:]

Output:

      Name  Age  Gender  Salary         City
A     Adam   27    Male   50000     New York
B   Bailey   34  Female   60000  Los Angeles
C  Charles   19    Male   40000      Chicago

Similarly, we can select a range of columns by specifying the range of column positions.

df.iloc[:, 1:4]

Output:

   Age  Gender  Salary
A   27    Male   50000
B   34  Female   60000
C   19    Male   40000
D   44    Male   80000
E   38  Female   65000
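Position slices also accept a step and negative bounds, following the usual Python slicing rules. Here is a brief sketch on the same DataFrame. The names every_other and last_two_cols are illustrative.

```python
import pandas as pd

data = {'Name': ['Adam', 'Bailey', 'Charles', 'David', 'Emily'],
        'Age': [27, 34, 19, 44, 38],
        'Gender': ['Male', 'Female', 'Male', 'Male', 'Female'],
        'Salary': [50000, 60000, 40000, 80000, 65000],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Seattle', 'Denver']}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D', 'E'])

every_other = df.iloc[::2]        # rows at positions 0, 2 and 4
last_two_cols = df.iloc[:, -2:]   # the final two columns

print(every_other)
print(last_two_cols)
```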

Additional Resources

Pandas is a powerful library with a variety of complex functions that you can use to work with data effectively. If you are new to pandas, you might want to consider starting with these common operations and functions:

  • Data input/output (reading and writing data from and to sources such as CSV files, Excel spreadsheets, and databases)
  • Data selection (loc, iloc, boolean indexing)
  • Data aggregation (groupby, pivot tables)
  • Data cleaning and manipulation (merging, concatenating, pivoting, reshaping data)
  • Data visualization

If you’re looking to learn more about these operations, there are many comprehensive tutorials and documentation available.

Don’t forget to practice your new skills with Pandas in Jupyter notebooks. Jupyter notebooks allow you to experiment and iterate quickly, visualize data in real-time, and document your analysis in code.

In this article, we explored the difference between loc and iloc functions in Pandas data selection. While loc is used for indexing by label, iloc is used for indexing by integer position.

We learned how to use both functions to select and filter rows and columns from a DataFrame, and how to use them to select ranges of rows and columns. By understanding loc and iloc and their usage, you can effectively extract meaningful insights from your datasets.

Remember to consult additional resources such as the Pandas documentation, tutorials, and cheat sheet to further improve your Pandas knowledge and skills. With these takeaway points, you can apply Pandas effectively to your data analysis tasks.
