Pandas (Part 3): Data selection and indexing
Following our lesson on reading and writing data, this article covers data selection and indexing in Pandas. Selecting and filtering data is one of the most common tasks in data analysis.
📚 Prerequisites
- Understanding of Pandas DataFrames.
🎯 Article Outline: What You'll Master
- ✅ Selecting Columns: How to select a single column or multiple columns.
- ✅ Selecting Rows: Using .loc for label-based indexing and .iloc for position-based indexing.
- ✅ Boolean Indexing: Filtering data based on conditions.
- ✅ Setting Data: How to set new values in a DataFrame.
🧠 Section 1: The Core Concepts of Data Selection
Pandas provides a variety of ways to select data. The most common are:
- []: Selects columns by name (and filters rows when given a boolean mask).
- .loc[]: Selects rows and columns by label.
- .iloc[]: Selects rows and columns by integer position.
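To make the difference between labels and positions concrete, here is a minimal sketch. The DataFrame named scores and its index values are made up for illustration and are not part of the walkthrough in the next section. The index labels are deliberately chosen so that they do not match the row positions.

```python
import pandas as pd

# Hypothetical data: the index labels are integers that do NOT match positions.
scores = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cid"], "score": [90, 75, 82]},
    index=[10, 20, 30],
)

by_label = scores.loc[10]       # the row whose label is 10
by_position = scores.iloc[0]    # the first row, whatever its label is
last_row = scores.iloc[-1]      # the last row

# Label slices include both endpoints; position slices exclude the stop.
label_slice = scores.loc[10:20]     # labels 10 and 20
position_slice = scores.iloc[0:1]   # position 0 only

# Rows and columns can be selected together.
cell = scores.loc[20, "score"]      # row label 20, column "score"
same_cell = scores.iloc[1, 1]       # second row, second column

print(by_label["name"], last_row["name"], len(label_slice), len(position_slice), cell)
```

With a default 0..n-1 index, labels and positions happen to coincide, which is why the two indexers are easy to confuse until the index changes.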
💻 Section 2: Deep Dive - Implementation and Walkthrough
import pandas as pd
import numpy as np
# Create a sample DataFrame
dates = pd.date_range('20230101', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
# Selecting a single column
print(df['A'])
# Selecting multiple columns
print(df[['A', 'B']])
# Selecting rows by label (a label slice includes both endpoints)
print(df.loc['20230102':'20230104'])
# Selecting a single row by position (the fourth row, returned as a Series)
print(df.iloc[3])
# Boolean indexing
print(df[df['A'] > 0])
# Setting a new column
df['F'] = ['foo', 'bar', 'baz', 'qux', 'quux', 'corge']
print(df)
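The walkthrough above only adds a new column. Existing values can be changed with the same indexers. The following is a small sketch that rebuilds a similar frame with a fixed seed so the result is repeatable; the names demo and rng are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Rebuild a small frame like the one above, seeded so results are repeatable.
rng = np.random.default_rng(0)
dates = pd.date_range("20230101", periods=6)
demo = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list("ABCD"))

# Set a single value by label (row label, column label).
demo.loc[dates[0], "A"] = 0.0

# Set a single value by position (second row, second column, i.e. column "B").
demo.iloc[1, 1] = 0.0

# Set a whole column to a constant.
demo["D"] = 5.0

# Set values where a condition holds: replace negative entries in "C" with 0.
demo.loc[demo["C"] < 0, "C"] = 0.0

print(demo)
```

Doing the conditional update in a single .loc[mask, column] assignment is a deliberate choice here. Chained forms such as demo[demo["C"] < 0]["C"] = 0.0 may write to a temporary copy and leave the original frame unchanged.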
💡 Conclusion & Key Takeaways
You've learned the fundamental techniques for selecting and filtering data in a Pandas DataFrame.
Let's summarize the key takeaways:
- Use [] for selecting columns.
- Use .loc for label-based selection and .iloc for position-based selection.
- Boolean indexing is a powerful way to filter data.
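As a brief illustration of the last point, conditions can be combined, negated, or matched against a list of values. This is a sketch on a hypothetical orders table; the column names and values are assumptions made for the example.

```python
import pandas as pd

# Hypothetical data for illustration only.
orders = pd.DataFrame(
    {
        "city": ["Oslo", "Rome", "Oslo", "Lima"],
        "amount": [120, 80, 45, 200],
    }
)

# Combine conditions with & (and) or | (or); each condition needs parentheses.
big_oslo = orders[(orders["city"] == "Oslo") & (orders["amount"] > 100)]

# Negate a condition with ~.
not_oslo = orders[~(orders["city"] == "Oslo")]

# Match against several values with isin().
selected = orders[orders["city"].isin(["Rome", "Lima"])]

# Filter rows and pick a column in one step with .loc.
amounts = orders.loc[orders["amount"] >= 80, "amount"]

print(big_oslo)
print(amounts.tolist())
```

The Python keywords and, or, and not do not work element-wise on a Series, so the operators &, |, and ~ are used instead.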
➡️ Next Steps
In the next article, "Pandas (Part 4): Data cleaning and preparation", we'll learn how to handle missing data and perform other data cleaning tasks.
Glossary
- Indexing: The process of selecting data from a DataFrame.
- Label: The name of a row or column.
- Position: The integer location of a row or column.