Pandas (Part 1): Introduction to Series and DataFrames

Following our exploration of NumPy, this article introduces the two core data structures of Pandas: Series and DataFrames. Pandas is a fast, powerful, flexible, and easy-to-use open-source data analysis and manipulation tool, built on top of the Python programming language.


📚 Prerequisites

  • Basic understanding of Python.
  • Familiarity with NumPy is helpful but not required.

🎯 Article Outline: What You'll Master

  • Installation: How to install Pandas.
  • Core Concepts: What Series and DataFrames are.
  • Creating Series and DataFrames: Different ways to create these data structures.
  • Basic Inspection: How to view the first few rows, get summary statistics, and see the data types.

🧠 Section 1: The Core Concepts of Series and DataFrames

  • Series: A one-dimensional labeled array capable of holding any data type. The labels are collectively called the index.
  • DataFrame: A two-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or a SQL table. It is the most commonly used pandas object.
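To make the two definitions concrete, here is a minimal sketch. The names and ages are the ones used later in this article; the variable names and the `AtLeast20` column are illustrative assumptions, added only to show a second column with a different type.

```python
import pandas as pd

# A Series: one-dimensional values plus an index of labels.
ages = pd.Series([20, 21, 19], index=["Tom", "Nick", "John"], name="Age")

# Values can be looked up by label rather than by position.
nick_age = ages["Nick"]

# A DataFrame: several columns that share one index and may differ in type.
# "AtLeast20" is a hypothetical boolean column derived from the ages.
people = pd.DataFrame({"Age": ages, "AtLeast20": ages >= 20})

# Selecting a single column returns a Series.
age_column = people["Age"]

print(nick_age)
print(type(age_column).__name__)
print(people)
```

Selecting one column of a DataFrame gives back a Series, which is why the two structures are usually learned together.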

💻 Section 2: Deep Dive - Implementation and Walkthrough

2.1 - Installation

pip install pandas
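
One quick way to confirm the installation worked is to import the library and print its version. This is only a sanity check; the version string you see depends on your environment.

```python
import pandas as pd

# If this import succeeds, pandas is installed in the active environment.
print(pd.__version__)
```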

2.2 - Creating Series and DataFrames

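import numpy as np
import pandas as pd

# Creating a Series; np.nan marks a missing value, so the dtype becomes float64
s = pd.Series([1, 3, 5, np.nan, 6, 8])

# Creating a DataFrame from a dictionary: keys become column names,
# and each list supplies that column's values
data = {'Name': ['Tom', 'Nick', 'John', 'Tom'],
        'Age': [20, 21, 19, 20]}
df = pd.DataFrame(data)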
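
A list and a dictionary of lists are not the only starting points. The sketch below shows three other common constructions; the variable names, the labels `a`, `b`, `c`, and the column names `x` and `y` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Series from a dict: the keys become the index labels.
s_from_dict = pd.Series({"a": 1, "b": 3, "c": 5})

# DataFrame from a list of dicts: one dict per row, keys become columns.
rows = [{"Name": "Tom", "Age": 20}, {"Name": "Nick", "Age": 21}]
df_from_rows = pd.DataFrame(rows)

# DataFrame from a 2-D NumPy array, with column names supplied explicitly.
arr = np.arange(6).reshape(3, 2)
df_from_array = pd.DataFrame(arr, columns=["x", "y"])

print(s_from_dict)
print(df_from_rows)
print(df_from_array)
```

Which form is most convenient usually depends on how the data arrives: row-oriented records suit the list of dicts, while numeric arrays map naturally onto the NumPy route.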

2.3 - Basic Inspection

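# View the first rows (head() returns up to 5 by default; df has only 4)
print(df.head())

# View the last 3 rows
print(df.tail(3))

# Get descriptive statistics (numeric columns only by default)
print(df.describe())

# Get information about the DataFrame (info() prints directly and returns None)
df.info()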
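
A few attributes complement the methods above when you only need the shape or the data types. The sketch below rebuilds the same small DataFrame so it can run on its own.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Tom", "Nick", "John", "Tom"],
                   "Age": [20, 21, 19, 20]})

# shape is a (rows, columns) tuple.
print(df.shape)

# dtypes lists the data type of each column.
print(df.dtypes)

# columns holds the column labels.
print(df.columns.tolist())

# describe() skips non-numeric columns by default;
# include="all" also summarizes columns such as Name.
print(df.describe(include="all"))
```

Note that `shape`, `dtypes`, and `columns` are attributes, not methods, so they are written without parentheses.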

💡 Conclusion & Key Takeaways

You've learned about the two primary data structures in Pandas: Series and DataFrames. You've also seen how to create them and perform some basic inspections.

Let's summarize the key takeaways:

  • Pandas is the go-to library for data manipulation and analysis in Python.
  • Series are for one-dimensional data, and DataFrames are for two-dimensional data.
  • Pandas provides many convenient functions for quickly inspecting your data.

➡️ Next Steps

In the next article, "Pandas (Part 2): Reading and writing data (CSV, Excel)", we'll learn how to get data into and out of Pandas.


Glossary

  • Pandas: A Python library for data manipulation and analysis.
  • Series: A one-dimensional labeled array.
  • DataFrame: A two-dimensional labeled data structure.
