PyMongo Basics: Connect, Insert, Query Documents
PyMongo is the official Python driver for MongoDB. It lets you connect to MongoDB servers, insert documents into collections, and query them with filters—all without writing SQL. PyMongo translates Python dictionaries to BSON (MongoDB's binary format) and back, making database code feel like native Python.
I first used PyMongo in 2019 when our team migrated from SQLAlchemy to MongoDB. The experience taught me that PyMongo's simplicity masks powerful features: automatic connection pooling, type hints, and async support through Motor or, in recent releases, PyMongo's own async API. This guide walks you through the essential operations.
Installing PyMongo and Connecting to MongoDB
PyMongo is installed via pip. MongoDB can run locally (via Docker or a system package) or in the cloud (MongoDB Atlas, which is free for small deployments).
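pip install "pymongo[srv]"
The [srv] extra pulls in dnspython, which is needed for mongodb+srv:// connection strings such as those used by MongoDB Atlas. Recent PyMongo 4.x releases install dnspython by default, so the extra is harmless but optional there. The quotes keep shells like zsh from interpreting the square brackets.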
Once MongoDB is running, connect with PyMongo:
from pymongo import MongoClient
# Local MongoDB (default: localhost:27017)
client = MongoClient('mongodb://localhost:27017/')
# MongoDB Atlas (cloud)
# client = MongoClient('mongodb+srv://username:[email protected]/dbname?retryWrites=true&w=majority')
# Get a database
db = client['my_app']
# Get a collection (like a table)
users = db['users']
print(f"Connected. Collections: {db.list_collection_names()}")
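PyMongo's MongoClient manages a connection pool by default. Do not create a new client for every operation; create one at startup and reuse it. The client connects lazily, so an unreachable server surfaces on the first real operation (list_collection_names() in the snippet above), not in the constructor. Call client.close() when your application shuts down, or use the client as a context manager, rather than relying on process exit to release the connections.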
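One way to enforce the single-client rule is to hide construction behind a cached accessor. The sketch below is only an illustration: get_client is a hypothetical helper name, the 5-second serverSelectionTimeoutMS is an arbitrary choice, and the default URI assumes a local server.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_client(uri: str = "mongodb://localhost:27017/"):
    """Return one shared MongoClient per URI (hypothetical helper)."""
    # Imported here so this sketch stays importable without PyMongo installed.
    from pymongo import MongoClient

    # MongoClient connects lazily; no network traffic happens on this line.
    # serverSelectionTimeoutMS=5000 is an assumed value, not a recommendation.
    return MongoClient(uri, serverSelectionTimeoutMS=5000)


# Usage (requires a reachable server for the second line to succeed):
# users = get_client()["my_app"]["users"]
# print(users.count_documents({}))
```

Every call with the same URI returns the same client object, so request handlers and background jobs can call get_client() freely without opening extra pools.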
Inserting Documents: insert_one and insert_many
A document is a Python dictionary. If you do not provide an _id field, PyMongo generates an ObjectId for it before sending the document and also adds it to the dictionary you passed in.
from datetime import datetime, timezone
from pymongo import MongoClient
client = MongoClient('mongodb://localhost:27017/')
db = client['blog_app']
users = db['users']
# Insert a single document
user = {
    'name': 'Alice Chen',
    'email': '[email protected]',
    'age': 28,
    # Use a timezone-aware value: PyMongo treats naive datetimes as UTC
    'created_at': datetime.now(timezone.utc),
    'tags': ['python', 'mongodb']
}
result = users.insert_one(user)
print(f"Inserted document ID: {result.inserted_id}")
# Insert many documents
many_users = [
    {'name': 'Bob Smith', 'email': '[email protected]', 'age': 32},
    {'name': 'Carol Johnson', 'email': '[email protected]', 'age': 25},
    {'name': 'David Park', 'email': '[email protected]', 'age': 35},
]
result = users.insert_many(many_users)
print(f"Inserted {len(result.inserted_ids)} documents")
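insert_one returns an InsertOneResult with an inserted_id attribute; insert_many returns an InsertManyResult with a list of inserted_ids. insert_many is faster for bulk writes because it sends many documents per network round-trip instead of one. The driver automatically splits a very large list into batches that fit the server's limits, so you do not have to cap the list at a fixed size yourself.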
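When the documents come from a generator or a large file, you may still want to control the batch size yourself so that only one chunk is held in memory at a time. The following is a minimal sketch of that idea; batched and insert_in_batches are hypothetical helper names, the default size of 1000 is an assumption rather than a PyMongo constant, and collection is assumed to be a PyMongo Collection.

```python
from itertools import islice
from typing import Any, Iterable, Iterator


def batched(docs: Iterable[dict], size: int) -> Iterator[list]:
    """Yield successive lists of at most `size` documents."""
    iterator = iter(docs)
    while batch := list(islice(iterator, size)):
        yield batch


def insert_in_batches(collection: Any, docs: Iterable[dict], size: int = 1000) -> int:
    """Insert `docs` chunk by chunk and return how many were inserted."""
    inserted = 0
    for batch in batched(docs, size):
        # ordered=False lets the server attempt every document in the batch
        # instead of stopping at the first failure; errors are still reported
        # afterwards through an exception.
        result = collection.insert_many(batch, ordered=False)
        inserted += len(result.inserted_ids)
    return inserted


# Usage: insert_in_batches(users, ({'n': i} for i in range(10_000)))
```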
Querying Documents: find_one and find
Query documents using find_one() (returns one) or find() (returns a cursor that iterates through results).
# Find a single document by email
user = users.find_one({'email': '[email protected]'})
if user is not None:  # find_one returns None when nothing matches
    print(f"Found user: {user['name']}")
# Find all users older than 30
cursor = users.find({'age': {'$gt': 30}})
for user in cursor:
    print(f"{user['name']}: {user['age']}")
# Find with multiple conditions (AND); 'tags': 'python' matches arrays containing 'python'
young_pythonistas = users.find({'age': {'$lt': 30}, 'tags': 'python'})
# Find with OR condition
new_or_old = users.find({
    '$or': [
        {'age': {'$lt': 25}},
        {'age': {'$gt': 35}}
    ]
})
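MongoDB uses operators such as $gt (greater than), $lt (less than), $in (value is in a list), and $or (logical OR). PyMongo passes them through unchanged: a query is the same document you would write in the MongoDB shell, expressed as a Python dictionary. Listing several fields in one filter, as in the young_pythonistas query, is an implicit AND.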
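Because filters are plain dictionaries, they can be assembled programmatically. The sketch below shows one way to combine a range on age with an $in match on tags; build_user_filter is a hypothetical helper, and the field names age and tags are taken from the examples above.

```python
from typing import Optional, Sequence


def build_user_filter(
    min_age: Optional[int] = None,
    max_age: Optional[int] = None,
    tags: Optional[Sequence[str]] = None,
) -> dict:
    """Build a MongoDB filter document from optional criteria."""
    query: dict = {}
    age: dict = {}
    if min_age is not None:
        age["$gte"] = min_age
    if max_age is not None:
        age["$lte"] = max_age
    if age:
        query["age"] = age  # two operators on one field form a range
    if tags:
        query["tags"] = {"$in": list(tags)}  # matches if any listed tag is present
    return query


# Usage: oldest ten matching users first
# cursor = users.find(build_user_filter(min_age=25, tags=["python"])).sort("age", -1).limit(10)
```

An empty result from build_user_filter() is the filter {}, which matches every document, so callers that must not scan the whole collection should check for that case.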
Updating Documents: update_one and update_many
Update a document using update_one() (first match) or update_many() (all matches). Use the $set operator to update specific fields without replacing the entire document.
from datetime import datetime, timezone
# Update a single document
result = users.update_one(
    {'email': '[email protected]'},
    {'$set': {'age': 29}}
)
print(f"Matched: {result.matched_count}, Modified: {result.modified_count}")
# Update many documents (add a 'verified' field)
result = users.update_many(
    {'age': {'$gte': 30}},
    {'$set': {'verified': True, 'updated_at': datetime.now(timezone.utc)}}
)
print(f"Modified {result.modified_count} users")
# Increment a field (creates login_count if it does not exist yet)
users.update_one({'email': '[email protected]'}, {'$inc': {'login_count': 1}})
# Add a tag to an array only if it is not already present
users.update_one(
    {'email': '[email protected]'},
    {'$addToSet': {'tags': 'database'}}
)
Both methods return an UpdateResult. Its matched_count attribute is how many documents matched the filter, and modified_count is how many were actually changed; the two differ when a matched document already holds the new values. A matched_count of zero means the filter matched nothing, which is useful for detecting an update that silently did nothing.
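The two counters can be folded into a single outcome that calling code branches on. This is a small sketch, not part of PyMongo: classify_update and its three labels are hypothetical, and result is assumed to be the object returned by update_one or update_many.

```python
from typing import Any


def classify_update(result: Any) -> str:
    """Summarize an UpdateResult-like object as one of three outcomes."""
    if result.matched_count == 0:
        return "not_found"  # the filter matched no document
    if result.modified_count == 0:
        return "unchanged"  # matched, but the values were already set
    return "modified"


# Usage:
# outcome = classify_update(users.update_one({'name': 'Alice Chen'}, {'$set': {'age': 29}}))
# if outcome == "not_found":
#     ...  # e.g. return a 404 from a web handler
```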
Deleting Documents: delete_one and delete_many
Delete documents with delete_one() or delete_many().
# Delete a single user
result = users.delete_one({'email': '[email protected]'})
print(f"Deleted {result.deleted_count} user")
# Delete all unverified users
result = users.delete_many({'verified': {'$ne': True}})
print(f"Deleted {result.deleted_count} unverified users")
# Delete all documents in a collection
users.delete_many({})
A filter of {} matches all documents. Be careful with delete_many({}) in production—consider adding a safeguard or using soft deletes (a deleted_at field instead).
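Both suggestions, the safeguard and the soft delete, can be sketched in a few lines. The deleted_at field name comes from the paragraph above; soft_delete and ACTIVE are hypothetical names, and collection is assumed to be a PyMongo Collection.

```python
from datetime import datetime, timezone
from typing import Any

# Filter fragment that selects documents that have not been soft-deleted.
ACTIVE = {"deleted_at": {"$exists": False}}


def soft_delete(collection: Any, query: dict) -> int:
    """Mark matching live documents as deleted; return how many were marked."""
    if not query:
        # Safeguard: refuse the match-everything filter.
        raise ValueError("refusing to soft-delete with an empty filter")
    result = collection.update_many(
        {**query, **ACTIVE},
        {"$set": {"deleted_at": datetime.now(timezone.utc)}},
    )
    return result.modified_count


# Usage:
# soft_delete(users, {'verified': {'$ne': True}})
# live_users = users.find({**ACTIVE, 'age': {'$gte': 30}})
```

The trade-off is that every read must now include the ACTIVE fragment, otherwise soft-deleted documents reappear in results.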
Handling Errors and Connection Loss
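PyMongo raises exceptions from pymongo.errors when the MongoDB server is unreachable or when a write violates a constraint such as a unique index. Wrap operations that can fail in try-except blocks, and catch the specific exception types you can handle rather than a bare Exception.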
import time
from datetime import datetime, timezone
from bson import ObjectId
from bson.errors import InvalidId
from pymongo.errors import DuplicateKeyError, ServerSelectionTimeoutError
try:
    users.insert_one({'email': '[email protected]'})  # Assume unique index on email
except DuplicateKeyError:
    print("Error: email already exists")
    # Fall back to touching the existing document instead
    users.update_one({'email': '[email protected]'}, {'$set': {'updated_at': datetime.now(timezone.utc)}})
try:
    # ObjectId() raises InvalidId unless given 12 bytes or a 24-character hex string
    user = users.find_one({'_id': ObjectId('invalid')})
except InvalidId as e:
    print(f"Invalid ObjectId format: {e}")
# Retry on connection loss
max_retries = 3
for attempt in range(max_retries):
    try:
        result = users.insert_one({'name': 'Test'})
        print("Success")
        break
    except ServerSelectionTimeoutError:
        if attempt < max_retries - 1:
            time.sleep(2 ** attempt)  # Exponential backoff: 1s, then 2s
        else:
            raise  # Out of attempts: let the caller see the error
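The retry loop above can be factored into a reusable helper so the backoff logic is written once. This is a generic sketch rather than a PyMongo feature: with_retries and its parameters are hypothetical, and the injectable sleep argument exists only to make the helper easy to test.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    operation: Callable[[], T],
    retryable: Tuple[Type[BaseException], ...],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on `retryable` errors with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return operation()
        except retryable:
            if attempt == max_retries - 1:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** attempt)  # base_delay, then 2x, 4x, ...
    raise ValueError("max_retries must be at least 1")


# Usage:
# from pymongo.errors import ServerSelectionTimeoutError
# with_retries(lambda: users.insert_one({'name': 'Test'}), (ServerSelectionTimeoutError,))
```

Only retry errors that are safe to repeat. Passing a broad exception type here could re-run a write that already succeeded on the server.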
Key Takeaways
- PyMongo translates Python dictionaries to MongoDB documents; install with pip install pymongo
- Create one MongoClient at app startup; reuse it across your application
- Insert documents with insert_one() or insert_many(); use operators like $gt, $in, $or for complex queries
- Update with $set, $inc, $addToSet operators; delete with delete_one() or delete_many()
- Always handle exceptions (DuplicateKeyError, ServerSelectionTimeoutError) with try-except blocks
- MongoDB's flexible schema allows different documents in the same collection, but adopt semi-schemas in code (Pydantic models) to catch type errors early
Frequently Asked Questions
What is ObjectId and why is it automatic?
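ObjectId is a 12-byte unique identifier. It consists of a 4-byte creation timestamp, a 5-byte random value generated once per process, and a 3-byte counter that starts at a random value, which makes collisions practically impossible without a central authority. If you do not provide an _id, the driver generates one before sending the document. You can query by ObjectId by passing a 24-character hex string, for example users.find_one({'_id': ObjectId('507f1f77bcf86cd799439011')}), with ObjectId imported from bson. Any other string, such as 'xyz', raises bson.errors.InvalidId.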
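The timestamp occupies the first four bytes of an ObjectId, so the creation time can be read straight from the hex string. The sketch below decodes it by hand purely to illustrate the layout; objectid_created_at is a hypothetical helper, and in real code the generation_time attribute of bson's ObjectId gives the same information.

```python
from datetime import datetime, timezone


def objectid_created_at(hex_id: str) -> datetime:
    """Decode the creation time from a 24-character ObjectId hex string."""
    if len(hex_id) != 24:
        raise ValueError("an ObjectId hex string has exactly 24 characters")
    # The first 4 bytes (8 hex digits) are seconds since the Unix epoch.
    seconds = int(hex_id[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Usage:
# print(objectid_created_at(str(result.inserted_id)))
```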
Is PyMongo thread-safe?
Yes. MongoClient is thread-safe and maintains an internal connection pool (maxPoolSize defaults to 100 connections per server). Each thread checks a connection out of the pool, uses it, and returns it. Share the single MongoClient between threads instead of creating one client per thread.
How do I handle concurrent writes?
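MongoDB guarantees that a write to a single document is atomic. For high-contention counters, use atomic operators like $inc. For read-modify-write cycles, a common application-level pattern is optimistic concurrency control: store a _version field, include the version you read in the update filter, and retry if nothing matched. For complex multi-step updates that span documents, wrap them in a multi-document transaction (MongoDB 4.0+ on replica sets, 4.2+ on sharded clusters).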
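The _version pattern can be sketched as a single conditional update. This is an application-level convention rather than a MongoDB feature: update_if_unchanged is a hypothetical helper, documents are assumed to carry an integer _version field, and changes must not itself contain _version.

```python
from typing import Any


def update_if_unchanged(
    collection: Any, doc_id: Any, expected_version: int, changes: dict
) -> bool:
    """Apply `changes` only if the document still has `expected_version`."""
    result = collection.update_one(
        {"_id": doc_id, "_version": expected_version},
        {"$set": changes, "$inc": {"_version": 1}},
    )
    # matched_count == 0 means another writer bumped the version first
    # (or the document no longer exists); the caller should re-read and retry.
    return result.matched_count == 1


# Usage:
# doc = users.find_one({'name': 'Alice Chen'})
# ok = update_if_unchanged(users, doc['_id'], doc['_version'], {'age': doc['age'] + 1})
```

Because the filter and the update are sent as one operation, the check and the write cannot be interleaved with another writer's update to the same document.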
Can I use PyMongo with async/await?
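Yes. The long-standing option is Motor, an async driver built on PyMongo: you write await users.insert_one(doc) instead of users.insert_one(doc). The API mirrors PyMongo closely but is not a literal drop-in replacement, because cursors are consumed with async for or await cursor.to_list(...). Recent PyMongo releases also ship a native async API (AsyncMongoClient), which MongoDB positions as Motor's successor. Either works with async web frameworks like FastAPI and Sanic.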
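The difference between awaited calls and async cursors looks roughly like this. The sketch assumes users is an async collection obtained from Motor's AsyncIOMotorClient or PyMongo's AsyncMongoClient; add_and_list_names is a hypothetical function name.

```python
from typing import Any, List


async def add_and_list_names(users: Any, doc: dict) -> List[str]:
    """Insert `doc`, then return every user's name without blocking the event loop."""
    await users.insert_one(doc)  # single operations are awaited
    names = []
    # find() itself is not awaited; the cursor is consumed with `async for`.
    async for user in users.find({}, {"name": 1}):
        names.append(user["name"])
    return names


# Usage from synchronous code:
# import asyncio
# asyncio.run(add_and_list_names(users, {'name': 'Erin'}))
```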
What is the difference between find_one and find?
find_one() returns a single dictionary (or None when nothing matches). find() returns a cursor that evaluates lazily: nothing is sent to the server until you start iterating, and results then arrive in batches rather than all at once. For single lookups, use find_one(); for large result sets, iterate over find() to keep memory use low.