Prompt Templates and Engineering for LLMs
Prompt engineering is the practice of structuring text input to guide LLMs toward desired outputs. Well-designed prompts can improve output quality (by roughly 10–30% on some tasks, though the gain depends heavily on the task and model), reduce hallucinations, and enable few-shot learning, where the model picks up a pattern from examples in the prompt itself. Prompt templates keep this structure repeatable across your application.
This tutorial covers basic prompt design principles, instruction formatting, few-shot examples, system messages, output parsing, and common patterns like chain-of-thought. By the end, you'll be able to systematically improve LLM output quality for your specific use case.
Basic Prompt Structure
A well-designed prompt has three parts: context, instruction, and formatting:
[Context] Explain [subject] in the context of [domain].
[Instruction] Be concise and cite sources.
[Formatting] Format your response as: 1. [key point] 2. [key point]
Here's a Python template:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the model and its tokenizer (weights are downloaded on first use)
model_id = "mistral-community/Mistral-7B-Instruct-v0.3"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Template with placeholders
template = """You are a Python expert. Explain {topic} in simple terms suitable for beginners.
Format: 1 sentence definition, 2 practical examples, 1 common pitfall.
Topic: {topic}
Explanation:"""

# Fill placeholders
prompt = template.format(topic="list comprehensions")
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    # do_sample=True is required for temperature to take effect;
    # max_new_tokens limits only the generated text, not prompt + output
    output_ids = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)

response = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(response)
The template makes the same structure reusable: pass a different topic (or add placeholders for audience or domain) and the rest of the prompt stays fixed. This improves consistency and removes the need to hand-craft each prompt.
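One way to make the three-part structure explicit is a small helper that fails early when a placeholder is left unfilled. This is a minimal sketch using only the standard library; the names `PROMPT_TEMPLATE` and `render_prompt` are illustrative and not part of any library.

```python
from string import Formatter

# Hypothetical three-part template: context, instruction, formatting
PROMPT_TEMPLATE = (
    "Explain {subject} in the context of {domain}.\n"
    "Be concise and cite sources.\n"
    "Format your response as: 1. [key point] 2. [key point]"
)


def render_prompt(template: str, **values: str) -> str:
    """Fill a template, raising a clear error if a placeholder is missing."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion)
    required = {name for _, name, _, _ in Formatter().parse(template) if name}
    missing = required - set(values)
    if missing:
        raise ValueError(f"Missing template values: {sorted(missing)}")
    return template.format(**values)


prompt = render_prompt(PROMPT_TEMPLATE, subject="decorators", domain="web frameworks")
print(prompt)
```

Checking placeholders before formatting turns a silent `KeyError` deep in application code into an error message that names the missing fields.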
Few-Shot Learning
Show the model 1–3 examples of the desired behavior, and it learns the pattern without fine-tuning:
prompt = """Convert natural language to Python code. Include comments.
Example 1:
Natural language: "Check if a number is even"
Python code:
def is_even(n):
    return n % 2 == 0  # Remainder 0 means divisible by 2
Example 2:
Natural language: "Sum all odd numbers from 1 to 10"
Python code:
total = sum(i for i in range(1, 11) if i % 2 == 1)  # Generator expression
Your task:
Natural language: "Count vowels in a string"
Python code:"""

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    # Greedy decoding: do_sample=False gives deterministic output
    # (temperature=0 is not a valid sampling temperature in transformers)
    output_ids = model.generate(**inputs, max_new_tokens=150, do_sample=False)

response = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(response)
Output:
def count_vowels(s):
    vowels = "aeiouAEIOU"
    return sum(1 for c in s if c in vowels)  # Count characters in vowel set
Few-shot prompting is often more effective than zero-shot (no examples) because it demonstrates the expected format and reasoning style. For many tasks 2–3 examples capture most of the benefit; additional examples tend to give diminishing returns while consuming tokens.
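To keep the number of examples easy to vary during testing, the few-shot prompt can be assembled from a list rather than written by hand. The sketch below assumes the same labels as the prompt above; `build_few_shot_prompt` is a hypothetical helper, not a library function.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    instruction: str,
    examples: List[Tuple[str, str]],
    query: str,
) -> str:
    """Assemble an instruction, worked examples, and the final query."""
    parts = [instruction, ""]
    for i, (text, code) in enumerate(examples, start=1):
        parts += [f"Example {i}:", f'Natural language: "{text}"', "Python code:", code, ""]
    # End on the same label the examples use, so the model continues the pattern
    parts += ["Your task:", f'Natural language: "{query}"', "Python code:"]
    return "\n".join(parts)


examples = [
    ("Check if a number is even", "def is_even(n):\n    return n % 2 == 0"),
    ("Sum all odd numbers from 1 to 10", "total = sum(i for i in range(1, 11) if i % 2 == 1)"),
]
prompt = build_few_shot_prompt(
    "Convert natural language to Python code. Include comments.",
    examples,
    "Count vowels in a string",
)
print(prompt)
```

Slicing the list (for instance `examples[:1]`) gives a one-shot variant, which makes it straightforward to compare how many examples a given task actually needs.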
System Messages for Role-Based Prompts
A system message sets the persona and rules for the model's responses:
from transformers import pipeline

# Text-generation pipeline; the system message is supplied through the prompt below
pipe = pipeline("text-generation", model="mistral-community/Mistral-7B-Instruct-v0.3")

system_prompt = """You are a debugging assistant. When given code, identify bugs and suggest fixes.
Be concise and focus on logic errors. Do not rewrite the entire code; highlight the specific issue."""

# In the sample code, return None is indented inside the loop,
# so the function gives up after checking only the first element
user_prompt = """Here's buggy code:
def get_first_even(nums):
    for i in nums:
        if i % 2 == 0:
            return i
        return None
nums = [1, 3, 5, 7]
result = get_first_even(nums)
print(result)
What's wrong?"""

# Combine into Mistral Instruct format: the system text is prepended to the user turn
full_prompt = f"[INST]{system_prompt}\n\n{user_prompt}[/INST]"
result = pipe(full_prompt, max_new_tokens=200)
print(result[0]["generated_text"])
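System messages work because instruction-tuned models (Mistral-Instruct, Llama-2-Chat) are trained to follow instructions presented in their prompt format. Llama-2-Chat has a dedicated system section, while for Mistral-Instruct the system text is simply placed at the start of the user turn inside [INST]. Base models (Mistral-base, Llama-2) are not trained on these formats and tend to treat a system message as ordinary text to continue.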
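The difference between the two formats can be sketched with plain string formatting. These helpers are illustrative only: exact whitespace and special tokens vary between model versions, the beginning-of-sequence token is left to the tokenizer, and in practice `tokenizer.apply_chat_template` in transformers builds the string from the template shipped with the model.

```python
def format_mistral_instruct(system: str, user: str) -> str:
    """Mistral-Instruct style: system text is prepended inside the [INST] block."""
    return f"[INST] {system}\n\n{user} [/INST]"


def format_llama2_chat(system: str, user: str) -> str:
    """Llama-2-Chat style: system text goes in a dedicated <<SYS>> section."""
    return f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"


system = "You are a debugging assistant."
user = "Why does this loop exit early?"
print(format_mistral_instruct(system, user))
print(format_llama2_chat(system, user))
```

Keeping the formatting in one function per model family makes it easier to reuse the same system and user text when switching models.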
Chain-of-Thought Prompting
Ask the model to explain its reasoning before giving the final answer. This improves accuracy on complex tasks:
# Without chain-of-thought
prompt_direct = """If Alice has 3 apples and buys 2 more, then gives 1 to Bob, how many does she have?
Answer:"""

# With chain-of-thought: the prompt ends with the reasoning cue, so the model
# writes the steps itself before giving the answer
prompt_cot = """If Alice has 3 apples and buys 2 more, then gives 1 to Bob, how many does she have?
Let's think step by step:"""

# The kind of reasoning the cue is meant to elicit:
# 1. Alice starts with 3 apples.
# 2. She buys 2 more, so she has 3 + 2 = 5 apples.
# 3. She gives 1 to Bob, so she has 5 - 1 = 4 apples.
# Answer: 4

for label, prompt in [("Direct", prompt_direct), ("CoT", prompt_cot)]:
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        # Greedy decoding (do_sample=False) keeps the comparison deterministic
        output_ids = model.generate(**inputs, max_new_tokens=100, do_sample=False)
    print(f"{label}:", tokenizer.decode(output_ids[0], skip_special_tokens=True))
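Chain-of-thought can improve accuracy on math, logic, and multi-step reasoning tasks. Treat the 5–15% figure as a rough guide: Wei et al. (2022) found that the gain depends strongly on the model and task, and in the models they tested it appeared mainly at large scale. Writing out intermediate steps lets the model work through the problem and makes its errors easier to spot.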
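Because a chain-of-thought response mixes reasoning with the result, downstream code usually needs to pull out the final answer. A minimal sketch, assuming the response ends with an "Answer:" line as in the worked steps above; `extract_final_answer` is a hypothetical helper and the sample text is hand-written rather than real model output.

```python
import re
from typing import Optional


def extract_final_answer(response: str) -> Optional[str]:
    """Return the text after the last 'Answer:' marker, or None if absent."""
    matches = re.findall(r"Answer:\s*(.+)", response)
    return matches[-1].strip() if matches else None


# Hand-written stand-in for a model's chain-of-thought output
sample = """Let's think step by step:
1. Alice starts with 3 apples.
2. She buys 2 more, so she has 3 + 2 = 5 apples.
3. She gives 1 to Bob, so she has 5 - 1 = 4 apples.
Answer: 4"""

print(extract_final_answer(sample))  # 4
```

Taking the last match rather than the first guards against prompts that already contain an "Answer:" label earlier in the text.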
Structured Output with Parsing
Guide the model to output JSON or other structured formats, then parse:
import json

prompt = """Extract the entities from the text and output JSON.
Text: "Alice works at Google as a Senior Engineer. She lives in San Francisco."
Output JSON:
{"person": "Alice", "company": "Google", "role": "Senior Engineer", "location": "San Francisco"}
Text: "Bob is a data scientist at Meta in New York."
Output JSON:"""

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=100, do_sample=False)

# Decode only the newly generated tokens; the prompt itself contains an
# example JSON object that would otherwise be picked up by the search below
prompt_len = inputs["input_ids"].shape[1]
response = tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True)

# Extract JSON from response
try:
    # Find JSON in response (model might add extra text).
    # The expected object is flat, so the first '}' after '{' closes it.
    json_start = response.find('{')
    json_end = response.find('}', json_start) + 1
    json_str = response[json_start:json_end]
    parsed = json.loads(json_str)
    print(parsed)
except json.JSONDecodeError:
    print("Failed to parse JSON:", response)
Output:
{'person': 'Bob', 'company': 'Meta', 'role': 'Data Scientist', 'location': 'New York'}
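Structured prompts that include an example of the target JSON format make parsing succeed far more often than prompts that only describe the format. The figures of ~30% (unprompted) and ~80% (with examples) are rough; actual rates depend on the model and the schema.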
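Even with an example in the prompt, responses sometimes wrap the JSON in extra text or omit a field, so it helps to scan for the first decodable object and check its keys. The sketch below uses `json.JSONDecoder.raw_decode` from the standard library; the helper name, the `REQUIRED_KEYS` set, and the sample response are assumptions made for illustration.

```python
import json
from typing import Optional

REQUIRED_KEYS = {"person", "company", "role", "location"}


def extract_first_json_object(text: str) -> Optional[dict]:
    """Return the first decodable JSON object in text, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores whatever text follows it
            obj, _ = decoder.raw_decode(text, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass
        start = text.find("{", start + 1)
    return None


# Hand-written stand-in for a model response with extra text around the JSON
response = 'Sure! {"person": "Bob", "company": "Meta", "role": "Data Scientist", "location": "New York"} Hope that helps.'
parsed = extract_first_json_object(response)
missing = REQUIRED_KEYS - set(parsed or {})
print(parsed, "missing keys:", sorted(missing))
```

Unlike a search for the first and last brace, this approach also handles nested objects and responses that contain more than one JSON object.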
Common Prompt Patterns
Pattern 1: Instruct + Constraint
prompt = """Write a Python function to reverse a list.
Constraint: Do not use built-in reverse() or slicing ([::-1]).
Function:"""
Pattern 2: Role + Task + Format
prompt = """
Role: You are a code reviewer.
Task: Review this function for bugs and efficiency.
Format: List each issue as a bullet point.
Code:
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)  # Very slow for large n
Issues:"""
Pattern 3: Analogy + Explanation
prompt = """Explain neural networks using an analogy.
Analogy: A neural network is like a recipe that gets refined through practice.
The ingredients are input features, the steps are layers of computation,
and tasting the result teaches us how to adjust the recipe.
Explain how backpropagation works using this analogy:"""
Avoiding Common Pitfalls
Pitfall 1: Ambiguous Instructions
# Bad: "Write code"
bad_prompt = "Write some Python code."
# Good: "Write Python code to..."
good_prompt = "Write Python code to count the frequency of each word in a text file."
Pitfall 2: Missing Context
# Bad: Assumes model knows you're writing about AI
bad_prompt = "Explain the transformer architecture."
# Good: Sets context explicitly
good_prompt = "As a machine learning engineer explaining to a junior colleague, describe how the transformer architecture works in NLP."
Pitfall 3: Over-Specification
# Bad: Too many constraints, conflicting
bad_prompt = "Write a short but comprehensive guide to Python. Include all basics. Use < 100 words but cover lists, dicts, functions, classes, etc."
# Good: Prioritize constraints
good_prompt = "Write a brief guide to Python lists (< 150 words). Include: definition, 2 examples, 1 common pitfall."
Prompt Engineering Workflow
- Write a baseline prompt — Define the task, role, and output format.
- Test on 5–10 examples — See where the model fails.
- Add examples (few-shot) — Include 1–3 examples of correct behavior.
- Refine instructions — Remove ambiguity; add constraints.
- Test edge cases — Try inputs the model might struggle with.
- Iterate on temperature — Increase for creativity, decrease for consistency.
- Optimize token count — Remove unnecessary words to save inference cost.
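The test-and-refine steps above can be sketched as a small evaluation loop that scores each prompt variant against a fixed set of cases. Everything here is a stand-in: `fake_generate` replaces a real model call so the example runs without downloading weights, and the substring check is a deliberately simple scoring rule.

```python
from typing import Callable, Dict, List, Tuple


def pass_rate(
    template: str,
    cases: List[Tuple[str, str]],
    generate: Callable[[str], str],
) -> float:
    """Fraction of test cases whose output contains the expected substring."""
    hits = 0
    for text, expected in cases:
        output = generate(template.format(input=text))
        hits += expected.lower() in output.lower()
    return hits / len(cases)


def fake_generate(prompt: str) -> str:
    """Stand-in for a real model call: answers only when given a format hint."""
    return "positive" if "Answer with one word" in prompt else "I think it is good."


cases = [("I loved it", "positive"), ("Great value", "positive")]
templates: Dict[str, str] = {
    "baseline": "Classify the sentiment: {input}",
    "refined": "Classify the sentiment. Answer with one word.\nText: {input}",
}
for name, template in templates.items():
    print(name, pass_rate(template, cases, fake_generate))
```

Swapping `fake_generate` for a function that calls the real model keeps the rest of the loop unchanged, so each prompt revision can be compared on the same cases.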
Key Takeaways
- Structure prompts with context, instruction, and formatting sections.
- Use few-shot examples to teach the model the desired behavior without fine-tuning.
- System messages set the role; Instruct-tuned models respect them.
- Chain-of-thought can improve accuracy on reasoning tasks (roughly 5–15%, depending on model and task).
- Structured output (JSON) is more reliable when the prompt includes examples of the target format; 2–3 is a reasonable starting point.
Frequently Asked Questions
How many examples should I include in few-shot prompts?
1–3 examples are typical. Beyond that, returns usually diminish while token cost keeps growing, so add more only if testing shows a gain. Zero-shot (no examples) works for simple tasks; complex tasks usually benefit from at least 2 examples.
Does prompt order matter?
Yes. Put the most important instructions and examples first and edge cases last. Many models weight early text heavily (a primacy effect), but position sensitivity varies by model, so test alternative orderings if results are inconsistent.
How does temperature affect prompt engineering?
Temperature does not change what the prompt asks for; it controls how random the sampling is. Use temperature=0 (greedy decoding, which is do_sample=False in transformers) for deterministic tasks (code generation, JSON parsing) and temperature=0.7–0.9 for creative tasks (brainstorming, storytelling).
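The effect of temperature can be seen directly on a toy distribution: the logits are divided by the temperature before the softmax, so low values concentrate probability on the top token and high values flatten it. A minimal sketch with made-up logits for three candidate tokens:

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use greedy decoding for 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.2, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature approaches zero the distribution approaches a one-hot choice of the highest-scoring token, which is why greedy decoding is the usual stand-in for temperature=0.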
Can I use the same prompt for different models?
Partially. Instruction-following varies by model; a prompt crafted for Mistral-Instruct may need tweaking for Llama-2-Chat. Test and adjust per model.
How do I handle model hallucinations in prompts?
- Provide grounding facts in the prompt.
- Ask the model to cite sources.
- Use chain-of-thought to encourage reasoning.
- Reduce temperature to decrease randomness.
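- Do not expect a smaller or more heavily quantized model to help: reducing model size or precision generally does not lower hallucination rates and can raise them.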
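The first two suggestions, grounding facts and source citation, can be combined in one template. This is a sketch under assumptions: `build_grounded_prompt` is a hypothetical helper, and the instruction to say "do not know" is an added convention rather than something the list above prescribes.

```python
from typing import List


def build_grounded_prompt(question: str, facts: List[str]) -> str:
    """Put numbered facts ahead of the question and ask for citations by number."""
    numbered = "\n".join(f"[{i}] {fact}" for i, fact in enumerate(facts, start=1))
    return (
        "Answer using only the facts below and cite the fact numbers you used.\n"
        "If the facts do not contain the answer, say you do not know.\n\n"
        f"Facts:\n{numbered}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


facts = [
    "Python 3.0 was released in December 2008.",
    "Python 2.7 reached end of life in January 2020.",
]
print(build_grounded_prompt("When did Python 2.7 reach end of life?", facts))
```

Numbering the facts gives the model something concrete to cite and makes it possible to check afterwards whether the cited fact actually supports the answer.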
Further Reading
- Chain-of-Thought Prompting Paper: original research (Wei et al., 2022).
- OpenAI Prompt Engineering Guide: best practices (applies to open models too).
- Mistral Instruct Format: official Mistral prompt format.
- In-Context Learning Survey: deep dive into few-shot mechanisms.