
Pandas (Part 2): Reading and writing data (CSV, Excel)

Following our Introduction to Series and DataFrames, this article covers reading and writing data with Pandas, focusing on CSV and Excel files. Loading data from files and saving results back to disk is one of the most common tasks in data science.


📚 Prerequisites

  • Understanding of Pandas DataFrames.

🎯 Article Outline: What You'll Master

  • Reading CSV Files: How to read data from a CSV file into a DataFrame.
  • Writing to CSV Files: How to save a DataFrame to a CSV file.
  • Reading Excel Files: How to read data from an Excel file.
  • Writing to Excel Files: How to save a DataFrame to an Excel file.

🧠 Section 1: The Core Concepts of Data I/O

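Pandas provides functions for reading and writing data in many formats. The most common are read_csv and to_csv for CSV files, and read_excel and to_excel for Excel files. The read_* functions are called on the pandas module and return a DataFrame, while the to_* functions are DataFrame methods that write to a file. Each accepts many optional parameters for handling different file structures, for example sep and encoding for CSV files or sheet_name for Excel files.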
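As a small illustration of those parameters, the sketch below writes a semicolon-delimited, Latin-1 encoded CSV file and reads it back. The sample names, the file name, and the choice of sep=";" and encoding="latin-1" are assumptions made for this example; real files will dictate their own values.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data with accented characters to exercise the encoding
df = pd.DataFrame({"Name": ["José", "Zoë"], "Age": [20, 21]})

# A temporary directory keeps the example from leaving files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "people_semicolon.csv")

    # Write with a non-default delimiter and encoding
    df.to_csv(path, sep=";", encoding="latin-1", index=False)

    # The reader must be given the same delimiter and encoding
    df_back = pd.read_csv(path, sep=";", encoding="latin-1")

print(df_back)
```

If the delimiter or encoding passed to read_csv does not match the file, pandas will typically either raise a decoding error or load every row into a single column, so these two parameters are usually the first to check when a file does not load as expected.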


💻 Section 2: Deep Dive - Implementation and Walkthrough

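Reading and writing .xlsx files requires an Excel engine such as openpyxl, which is not installed with pandas by default. Install it from the command line:

pip install openpyxl

The following Python script writes a small DataFrame to a CSV file and an Excel file, then reads each file back:

import pandas as pd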

# Create a sample DataFrame
data = {'Name': ['Tom', 'Nick', 'John'], 'Age': [20, 21, 19]}
df = pd.DataFrame(data)

# --- CSV ---

# Write to a CSV file
df.to_csv('data.csv', index=False)

# Read from a CSV file
df_from_csv = pd.read_csv('data.csv')
print(df_from_csv)

# --- Excel ---

# Write to an Excel file
df.to_excel('data.xlsx', sheet_name='Sheet1', index=False)

# Read from an Excel file
df_from_excel = pd.read_excel('data.xlsx', sheet_name='Sheet1')
print(df_from_excel)
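One way to confirm that a round trip preserved the data is to compare the reloaded DataFrame with the original. The sketch below does this for the CSV case only and writes to a temporary directory rather than the working directory; the variable names csv_path and round_trip_ok are illustrative.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"Name": ["Tom", "Nick", "John"], "Age": [20, 21, 19]})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "data.csv")
    df.to_csv(csv_path, index=False)
    df_from_csv = pd.read_csv(csv_path)

# DataFrame.equals checks that shape, values, and column dtypes all match
round_trip_ok = df.equals(df_from_csv)
print(round_trip_ok)

# Inspecting dtypes shows how pandas interpreted each column on read
print(df_from_csv.dtypes)
```

A check like this is most useful with real data, where dates, leading zeros, or missing values may not survive a CSV round trip unchanged.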

💡 Conclusion & Key Takeaways

You've learned how to read data from and write data to CSV and Excel files using Pandas. This is a fundamental skill for any data analysis task.

Let's summarize the key takeaways:

  • Pandas provides easy-to-use functions for data I/O.
  • read_csv and to_csv are for CSV files.
  • read_excel and to_excel are for Excel files.
  • By default, to_csv and to_excel write the DataFrame index as the first column; pass index=False to leave it out of the file.
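To make the last point concrete, here is a minimal sketch comparing the CSV text produced with and without the index. It relies on to_csv returning a string when no path is given, and uses io.StringIO to read that text back without touching the disk; the variable names are illustrative.

```python
import io

import pandas as pd

df = pd.DataFrame({"Name": ["Tom", "Nick"], "Age": [20, 21]})

# With no path argument, to_csv returns the CSV text as a string
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index)     # header line starts with an empty field for the index
print(without_index)  # header line contains only the column names

# Reading the indexed text back adds an extra, unnamed column
df_extra = pd.read_csv(io.StringIO(with_index))
print(df_extra.columns.tolist())
```

If the index does carry meaning, an alternative to index=False is to keep it in the file and pass index_col=0 to read_csv so that it is restored as the index instead of appearing as an extra column.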

➡️ Next Steps

In the next article, "Pandas (Part 3): Data selection and indexing", we will learn how to select and filter data from a DataFrame.


Glossary

  • CSV: Comma-Separated Values. A common text file format for tabular data.
  • I/O: Input/Output.

Further Reading