
LangChain Output Parsers: Extracting Structured Data

LLMs output text. When you need structured data (JSON, CSV, function calls, Python objects), you have to parse that text and validate the result. LangChain's output parsers automate this, converting unstructured LLM responses into validated schemas. Some of them, such as RetryOutputParser, can also re-prompt the model when parsing fails, which removes much of the manual error handling.

I spent a day building custom regex and JSON parsing logic until a colleague showed me PydanticOutputParser. One line replaced my entire parsing module, and schema validation came built in.

String and JSON Output Parsers

The simplest parsers extract strings or JSON:

```python
from langchain_core.output_parsers import StrOutputParser, JsonOutputParser
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

model = ChatOpenAI(model="gpt-4o-mini", temperature=0.5)

# StrOutputParser extracts the message content as a plain string
chain = (
    ChatPromptTemplate.from_template("Summarize: {text}")
    | model
    | StrOutputParser()
)

result = chain.invoke({"text": "LangChain is..."})
print(result)  # Plain string

# JsonOutputParser parses the JSON reply into a Python dict
json_parser = JsonOutputParser()
chain = (
    ChatPromptTemplate.from_template("""
Return a JSON object with 'sentiment' and 'confidence':
{text}
""")
    | model
    | json_parser
)

result = chain.invoke({"text": "This is amazing!"})
print(result)  # e.g. {'sentiment': 'positive', 'confidence': 0.95}
```

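StrOutputParser returns the text content of the model's message as a plain string. JsonOutputParser parses the response into a Python object and also accepts JSON wrapped in a Markdown code block. If the text is not valid JSON, it raises OutputParserException.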

Structured Output with Pydantic Models

For type-safe, validated structured data, use PydanticOutputParser with Pydantic models:

```python
from langchain_core.output_parsers import PydanticOutputParser
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field  # Pydantic v2; langchain_core.pydantic_v1 is deprecated

# Define the output schema
class SentimentAnalysis(BaseModel):
    text: str = Field(description="The input text analyzed")
    sentiment: str = Field(
        description="The sentiment: positive, negative, or neutral"
    )
    confidence: float = Field(
        description="Confidence score between 0 and 1"
    )
    keywords: list[str] = Field(
        description="List of key topics"
    )

# Create parser
parser = PydanticOutputParser(pydantic_object=SentimentAnalysis)

# Format prompt with parser's instructions
prompt = ChatPromptTemplate.from_template("""
Analyze the sentiment of the following text.
{format_instructions}

Text: {text}
""")

model = ChatOpenAI(model="gpt-4o-mini", temperature=0.5)

# Chain it: fill in the format instructions once, leaving {text} for invoke()
chain = (
    prompt.partial(
        format_instructions=parser.get_format_instructions()
    )
    | model
    | parser
)

result = chain.invoke({
    "text": "LangChain makes LLM apps so much easier!"
})

print(repr(result))
# Example output (values depend on the model):
# SentimentAnalysis(
#     text="LangChain...",
#     sentiment="positive",
#     confidence=0.98,
#     keywords=["LangChain", "LLM", "ease"]
# )

# Access typed attributes
print(f"Sentiment: {result.sentiment}")
print(f"Keywords: {result.keywords}")
```

parser.get_format_instructions() generates instructions that embed your model's JSON schema, telling the LLM exactly which fields and types to return. The parser then validates the response against that schema. If the JSON is malformed or a field fails validation, it raises OutputParserException.

Comma-Separated Value (CSV) and List Parsers

For comma-separated outputs:

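```python
from langchain_core.output_parsers import CommaSeparatedListOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o-mini", temperature=0.5)
parser = CommaSeparatedListOutputParser()

chain = (
    ChatPromptTemplate.from_template(
        "List 5 design patterns commonly used in Python. {format_instructions}"
    ).partial(
        format_instructions=parser.get_format_instructions()
    )
    | model
    | parser
)

# The prompt has no remaining input variables, so invoke it with an empty dict
result = chain.invoke({})
print(result)
# e.g. ['Singleton', 'Factory', 'Observer', 'Strategy', 'Decorator']
```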

This is simpler than JSON for flat lists. The parser splits the reply on commas, strips surrounding whitespace, and returns a list of strings.

Combining Parsers with Retry Logic

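When parsing fails, the parser raises OutputParserException. Parsing the same text again fails the same way, so a retry has to call the model again. You can loop over the whole chain yourself, or use the Runnable method with_retry(). Alternatively, with_fallbacks() switches to a different chain instead of retrying: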

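```python
from langchain_core.exceptions import OutputParserException

# Reuses prompt, model, and parser from the Pydantic example above
chain = (
    prompt.partial(format_instructions=parser.get_format_instructions())
    | model
    | parser
)

def invoke_with_retry(chain, inputs, max_retries=3):
    """Re-run the whole chain (model call + parse) up to max_retries times.

    Re-parsing the same text would fail identically, so every attempt
    asks the model for a fresh response.
    """
    for attempt in range(max_retries):
        try:
            return chain.invoke(inputs)
        except OutputParserException:
            if attempt == max_retries - 1:
                raise
            print(f"Parse attempt {attempt + 1} failed, retrying...")

result = invoke_with_retry(chain, {"text": "LangChain makes LLM apps so much easier!"})

# Built-in equivalent: retry only on parsing errors, up to 3 attempts in total
retrying_chain = chain.with_retry(
    retry_if_exception_type=(OutputParserException,),
    stop_after_attempt=3,
)
result = retrying_chain.invoke({"text": "LangChain makes LLM apps so much easier!"})
```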

A more robust option is RetryOutputParser. When parsing fails, it sends the original prompt and the failed completion back to the LLM and asks for a corrected answer:

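```python
from langchain.output_parsers import RetryOutputParser
from langchain_core.output_parsers import PydanticOutputParser, StrOutputParser
from langchain_core.runnables import RunnableLambda, RunnableParallel

# Reuses prompt, model, and SentimentAnalysis from the Pydantic example above
parser = PydanticOutputParser(pydantic_object=SentimentAnalysis)
filled_prompt = prompt.partial(format_instructions=parser.get_format_instructions())

retry_parser = RetryOutputParser.from_llm(parser=parser, llm=model, max_retries=3)

# RetryOutputParser needs the original prompt as well as the failed completion,
# so it is called through parse_with_prompt() rather than piped after the model
chain = RunnableParallel(
    completion=filled_prompt | model | StrOutputParser(),
    prompt_value=filled_prompt,
) | RunnableLambda(lambda x: retry_parser.parse_with_prompt(**x))

result = chain.invoke({"text": "LangChain makes LLM apps so much easier!"})
```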

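Each retry shows the LLM what it produced and asks it to satisfy the format constraints in the prompt. If the output still fails to parse after max_retries attempts, OutputParserException is raised, so failures stay visible instead of passing silently.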

Custom Output Parsers

For specialized needs, subclass BaseOutputParser:

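```python
from langchain_core.exceptions import OutputParserException
from langchain_core.output_parsers import BaseOutputParser

class TabSeparatedValuesParser(BaseOutputParser[dict[str, str]]):
    """Parse 'key<TAB>value' lines into a dict."""

    def parse(self, text: str) -> dict[str, str]:
        result = {}
        for line in text.strip().splitlines():
            if not line.strip():
                continue  # skip blank lines
            if "\t" not in line:
                # OutputParserException lets retry tooling recognize the failure
                raise OutputParserException(f"Expected a tab-separated line, got: {line!r}")
            key, value = line.split("\t", 1)  # split on the first tab only
            result[key.strip()] = value.strip()
        return result

    @property
    def _type(self) -> str:
        return "tab_separated_values"

parser = TabSeparatedValuesParser()
result = parser.parse("Name\tAlice\nAge\t30")
print(result)  # {'Name': 'Alice', 'Age': '30'}
```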

Parser Comparison Table

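| Parser | Input | Output | Use case |
|---|---|---|---|
| StrOutputParser | Model message or text | String | Simple text extraction |
| JsonOutputParser | JSON text (optionally in a Markdown code block) | dict (or list) | Untyped JSON structures |
| PydanticOutputParser | JSON text | Pydantic model instance | Typed, validated structures |
| CommaSeparatedListOutputParser | Comma-separated text | List of strings | Flat lists, categories |
| RetryOutputParser | Failed completion + original prompt, plus an LLM | Wrapped parser's output | Re-prompting after failed parses |
| Custom (BaseOutputParser subclass) | Any text format | Any type | Domain-specific formats |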

Key Takeaways

  • StrOutputParser extracts plain text from LLM responses
  • JsonOutputParser validates and parses JSON objects
  • PydanticOutputParser ensures type-safe, validated structured outputs using Pydantic schemas
  • CommaSeparatedListOutputParser splits comma-separated text into lists
  • RetryOutputParser re-prompts the LLM with the original prompt and the failed output when parsing fails
  • Custom parsers let you handle domain-specific formats
  • Use get_format_instructions(), where the parser provides it, to guide the LLM toward valid output

Frequently Asked Questions

What happens if the LLM doesn't produce valid JSON?

JsonOutputParser raises OutputParserException. Use RetryOutputParser to ask the LLM to fix it, or catch the exception and retry manually.

Can I parse nested JSON with PydanticOutputParser?

Yes. Define nested Pydantic models: for example, a Response model with fields metadata: dict and items: list[Item], where Item is itself a BaseModel. The parser validates the entire tree.

How do I make some fields optional?

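Give the field a default: score: Optional[float] = Field(default=None). In Pydantic v2, Optional[float] on its own only means the value may be None; the field is still required unless it has a default. The model's JSON can omit any field that has a default.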

Does parsing add latency?

Minimal. Parsing is plain text/JSON processing in Python, and the LLM API call dominates latency. The exception is retry logic: each retry triggers another LLM call.

Can I validate the parsed output after parsing?

Yes. Add custom validation to the Pydantic model with @field_validator (the Pydantic v2 form; @validator is the deprecated v1 decorator), or subclass a parser and override parse() to add post-processing logic.

Further Reading