Pandas (Part 2): Reading and writing data (CSV, Excel)
Following our introduction to Series and DataFrames, this article covers reading and writing data with Pandas, focusing on CSV and Excel files. Loading data from files and saving results back to disk is one of the most common tasks in data science.
📚 Prerequisites
- Understanding of Pandas DataFrames.
🎯 Article Outline: What You'll Master
- ✅ Reading CSV Files: How to read data from a CSV file into a DataFrame.
- ✅ Writing to CSV Files: How to save a DataFrame to a CSV file.
- ✅ Reading Excel Files: How to read data from an Excel file.
- ✅ Writing to Excel Files: How to save a DataFrame to an Excel file.
🧠 Section 1: The Core Concepts of Data I/O
Pandas provides a rich set of functions for reading and writing data in various formats. The most common are read_csv and to_csv for CSV files, and read_excel and to_excel for Excel files. Each of these functions accepts many optional parameters (for example sep, header, and encoding on read_csv) to handle different file layouts and text encodings.
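As a rough illustration of how those parameters come into play, here is a minimal sketch that reads a semicolon-delimited file saved in Latin-1. The file name people.csv and its contents are made up for this example; sep, encoding, usecols, and dtype are standard read_csv parameters.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data: semicolon-delimited text with accented characters
raw = "name;city;age\nZoë;Köln;31\nRené;Orléans;27\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "people.csv")

    # Save the sample as Latin-1 to simulate a file that is not UTF-8
    with open(path, "w", encoding="latin-1", newline="") as f:
        f.write(raw)

    people = pd.read_csv(
        path,
        sep=";",                  # columns are separated by semicolons, not commas
        encoding="latin-1",       # must match the encoding the file was saved with
        usecols=["name", "age"],  # load only the columns we need
        dtype={"age": "int64"},   # state the column type explicitly
    )

print(people)
```

If the encoding argument does not match the file, read_csv will typically either raise a UnicodeDecodeError or return garbled characters, so it is worth setting explicitly whenever the source of a file is unknown.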
💻 Section 2: Deep Dive - Implementation and Walkthrough
Pandas relies on the openpyxl package to read and write .xlsx files, so you may need to install it first:
pip install openpyxl
import pandas as pd
# Create a sample DataFrame
data = {'Name': ['Tom', 'Nick', 'John'], 'Age': [20, 21, 19]}
df = pd.DataFrame(data)
# --- CSV ---
# Write to a CSV file
df.to_csv('data.csv', index=False)
# Read from a CSV file
df_from_csv = pd.read_csv('data.csv')
print(df_from_csv)
# --- Excel ---
# Write to an Excel file
df.to_excel('data.xlsx', sheet_name='Sheet1', index=False)
# Read from an Excel file
df_from_excel = pd.read_excel('data.xlsx', sheet_name='Sheet1')
print(df_from_excel)
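Excel workbooks often contain more than one sheet. The sketch below shows one way to handle that, assuming openpyxl is installed: pd.ExcelWriter writes several DataFrames into one workbook, and sheet_name=None reads every sheet back at once. The scores table, the sheet names, and the file name school.xlsx are invented for illustration.

```python
import os
import tempfile

import pandas as pd

students = pd.DataFrame({"Name": ["Tom", "Nick", "John"], "Age": [20, 21, 19]})
# Hypothetical second table, made up for this example
scores = pd.DataFrame({"Name": ["Tom", "Nick", "John"], "Score": [88, 92, 79]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "school.xlsx")

    # ExcelWriter keeps one workbook open so each DataFrame gets its own sheet
    with pd.ExcelWriter(path) as writer:
        students.to_excel(writer, sheet_name="Students", index=False)
        scores.to_excel(writer, sheet_name="Scores", index=False)

    # sheet_name=None returns a dict that maps sheet names to DataFrames
    sheets = pd.read_excel(path, sheet_name=None)

print(list(sheets))
print(sheets["Scores"])
```

Calling to_excel twice with the same file path instead of a shared writer would overwrite the workbook on the second call, which is why the writer object is used here.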
💡 Conclusion & Key Takeaways
You've learned how to read data from and write data to CSV and Excel files using Pandas. This is a fundamental skill for any data analysis task.
Let's summarize the key takeaways:
- Pandas provides easy-to-use functions for data I/O.
- read_csv and to_csv are for CSV files.
- read_excel and to_excel are for Excel files.
- The index=False parameter is often used to avoid writing the DataFrame index to the file.
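To see what index=False actually changes, this small sketch writes the same DataFrame both ways and reads each result back. It uses in-memory strings rather than files purely to keep the example self-contained; the behavior is the same when writing to disk.

```python
import io

import pandas as pd

df = pd.DataFrame({"Name": ["Tom", "Nick", "John"], "Age": [20, 21, 19]})

# to_csv returns the CSV text as a string when no path is given
with_index = df.to_csv()
without_index = df.to_csv(index=False)

# Read both versions back to compare the resulting columns
round_trip_with = pd.read_csv(io.StringIO(with_index))
round_trip_without = pd.read_csv(io.StringIO(without_index))

print(list(round_trip_with.columns))     # ['Unnamed: 0', 'Name', 'Age']
print(list(round_trip_without.columns))  # ['Name', 'Age']

# If the index was written, index_col=0 restores it instead of adding a column
restored = pd.read_csv(io.StringIO(with_index), index_col=0)
print(list(restored.columns))            # ['Name', 'Age']
```

Keeping the index makes sense when it carries real information, such as dates or IDs. For a default 0, 1, 2 index it mostly adds a stray column on the way back in.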
➡️ Next Steps
In the next article, "Pandas (Part 3): Data selection and indexing", we will learn how to select and filter data from a DataFrame.
Glossary
- CSV: Comma-Separated Values. A common text file format for tabular data.
- I/O: Input/Output.