
Prompt Templates and Engineering for LLMs

Prompt engineering is the practice of structuring text input to guide LLMs toward desired outputs. Well-designed prompts can improve output quality noticeably (on the order of 10–30% on some tasks, depending on the model and metric), reduce hallucinations, and enable few-shot learning, where the model picks up a pattern from examples in the prompt itself. Prompt templates keep this structured and repeatable across your application.

This tutorial covers basic prompt design principles, instruction formatting, few-shot examples, system messages, output parsing, and common patterns like chain-of-thought. By the end, you'll be able to improve LLM output quality systematically for your specific use case.

Basic Prompt Structure

A well-designed prompt has three parts: context, instruction, and formatting. For example:

[Context] Explain [subject] in the context of [domain].
[Instruction] Be concise and cite sources.
[Formatting] Format your response as: 1. [key point] 2. [key point]

Here's a Python template:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Official Mistral AI repository on the Hugging Face Hub (gated: accept the license first)
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")

# Template with placeholders
template = """You are a Python expert. Explain {topic} in simple terms suitable for beginners.
Format: 1 sentence definition, 2 practical examples, 1 common pitfall.

Topic: {topic}
Explanation:"""

# Fill placeholders
prompt = template.format(topic="list comprehensions")

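inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    # max_new_tokens limits only the generated text; temperature takes effect only when sampling
    output_ids = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
response = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(response)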

The template makes the same structure reusable across topics: change the value passed to format() and the rest of the prompt stays identical. This improves consistency and reduces the cognitive load of hand-crafting each prompt.
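
One practical wrinkle with str.format templates is that a missing or misspelled placeholder only shows up as a KeyError when the prompt is built. The sketch below is an illustration, not part of any library: it lists a template's placeholders with the standard-library string.Formatter and fills the template only when every field is supplied. The helper names placeholders and fill_template, and the extra {audience} field, are assumptions made for the example.

```python
from string import Formatter


def placeholders(template: str) -> set[str]:
    """Return the named fields that appear in a str.format-style template."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


def fill_template(template: str, **values: str) -> str:
    """Fill the template, failing early with a clear message if a field is missing."""
    missing = placeholders(template) - set(values)
    if missing:
        raise ValueError(f"missing template values: {sorted(missing)}")
    return template.format(**values)


# Hypothetical template in the same style as the tutorial's example
template = (
    "You are a Python expert. Explain {topic} in simple terms suitable for {audience}.\n"
    "Format: 1 sentence definition, 2 practical examples, 1 common pitfall.\n\n"
    "Topic: {topic}\nExplanation:"
)

prompt = fill_template(template, topic="list comprehensions", audience="beginners")
print(prompt)
```

Checking fields up front is a design choice that pays off mainly once templates live in config files or are edited by several people.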

Few-Shot Learning

Show the model 1–3 examples of the desired behavior, and it learns the pattern without fine-tuning:

prompt = """Convert natural language to Python code. Include comments.

Example 1:
Natural language: "Check if a number is even"
Python code:
def is_even(n):
    return n % 2 == 0  # Remainder 0 means divisible by 2

Example 2:
Natural language: "Sum all odd numbers from 1 to 10"
Python code:
total = sum(i for i in range(1, 11) if i % 2 == 1)  # Generator expression

Your task:
Natural language: "Count vowels in a string"
Python code:"""

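inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    # do_sample=False selects greedy decoding, the deterministic setting
    output_ids = model.generate(**inputs, max_new_tokens=150, do_sample=False)
# Decode only the newly generated tokens so the prompt is not echoed back
new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(response)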

Output:

def count_vowels(s):
    vowels = "aeiouAEIOU"
    return sum(1 for c in s if c in vowels)  # Count characters in vowel set

Few-shot prompting is often more effective than zero-shot (no examples) because it demonstrates the expected format and reasoning style. For many tasks 2–3 examples capture most of the benefit; additional examples tend to give diminishing returns while consuming tokens.
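
Writing few-shot prompts by hand gets error-prone once the examples change often. As a minimal sketch, the hypothetical helper build_few_shot_prompt below assembles the same layout as the prompt above from a list of (input, output) pairs; the function name and its label parameters are assumptions for illustration.

```python
def build_few_shot_prompt(instruction, examples, query,
                          input_label="Natural language", output_label="Python code"):
    """Assemble instruction + numbered examples + the open task."""
    parts = [instruction, ""]
    for i, (source, target) in enumerate(examples, start=1):
        parts += [f"Example {i}:", f'{input_label}: "{source}"', f"{output_label}:", target, ""]
    # End on the output label so the model continues with the answer
    parts += ["Your task:", f'{input_label}: "{query}"', f"{output_label}:"]
    return "\n".join(parts)


examples = [
    ("Check if a number is even",
     "def is_even(n):\n    return n % 2 == 0  # Remainder 0 means divisible by 2"),
    ("Sum all odd numbers from 1 to 10",
     "total = sum(i for i in range(1, 11) if i % 2 == 1)  # Generator expression"),
]

prompt = build_few_shot_prompt(
    "Convert natural language to Python code. Include comments.",
    examples,
    "Count vowels in a string",
)
print(prompt)
```

Keeping examples in a list also makes it easy to compare one, two, and three examples on the same test inputs.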

System Messages for Role-Based Prompts

A system message sets the persona and rules for the model's responses:

from transformers import pipeline

# Text-generation pipeline; the system prompt is prepended to the user turn further down
pipe = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.3")

system_prompt = """You are a debugging assistant. When given code, identify bugs and suggest fixes.
Be concise and focus on logic errors. Do not rewrite the entire code; highlight the specific issue."""

user_prompt = """Here's buggy code:
def get_first_even(nums):
    for i in nums:
        if i % 2 == 0:
            return i
    return None

nums = [1, 3, 5, 7]
result = get_first_even(nums)
print(result)

What's wrong?"""

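# Combine into Mistral Instruct format: the system prompt is prepended to the user turn
full_prompt = f"[INST] {system_prompt}\n\n{user_prompt} [/INST]"

# max_new_tokens limits the reply only; max_length would also count the prompt tokens
result = pipe(full_prompt, max_new_tokens=200)
print(result[0]["generated_text"])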

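System messages work because instruction-tuned chat models (Mistral-Instruct, Llama-2-Chat) are trained on conversations that follow a fixed prompt format, so they treat text in the system position as standing instructions. Llama-2-Chat has a dedicated system slot; for Mistral-Instruct the usual convention, used above, is to prepend the system text to the first user turn. Base models (Mistral-7B, Llama-2) have no such training: they treat the whole prompt as text to continue, so a system message is not reliably followed.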
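
To keep role handling separate from string formatting, the system and user text can be stored as role-tagged messages and rendered in a second step. The sketch below is a hand-rolled illustration: build_messages and render_inst are hypothetical helpers, and the [INST] layout mirrors the f-string used in this section; it is not a guaranteed match for any model's official template.

```python
def build_messages(system_prompt, user_prompt):
    """Role-tagged messages in the list-of-dicts shape used by chat templates."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


def render_inst(messages):
    """Fold the system text into the user turn, mirroring the f-string in this section.

    Assumption: this layout only approximates a model's real chat template.
    """
    system = "\n\n".join(m["content"] for m in messages if m["role"] == "system")
    user = "\n\n".join(m["content"] for m in messages if m["role"] == "user")
    body = f"{system}\n\n{user}" if system else user
    return f"[INST] {body} [/INST]"


messages = build_messages(
    "You are a debugging assistant. Be concise and focus on logic errors.",
    "What's wrong with this function?",
)
print(render_inst(messages))
```

With a loaded Hugging Face tokenizer, tokenizer.apply_chat_template(messages, tokenize=False) renders messages with the template stored alongside the model; whether a given template accepts a system role depends on the model, so it is worth checking before relying on it.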

Chain-of-Thought Prompting

Ask the model to explain its reasoning before giving the final answer. This improves accuracy on complex tasks:

# Without chain-of-thought
prompt_direct = """If Alice has 3 apples and buys 2 more, then gives 1 to Bob, how many does she have?
Answer:"""

# With chain-of-thought: stop at the reasoning cue so the model writes the steps itself
prompt_cot = """If Alice has 3 apples and buys 2 more, then gives 1 to Bob, how many does she have?
Let's think step by step:"""

# A good completion for prompt_cot looks like:
# 1. Alice starts with 3 apples.
# 2. She buys 2 more, so she has 3 + 2 = 5 apples.
# 3. She gives 1 to Bob, so she has 5 - 1 = 4 apples.
# Answer: 4

inputs = tokenizer(prompt_direct, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print("Direct:", tokenizer.decode(output_ids[0], skip_special_tokens=True))

inputs = tokenizer(prompt_cot, return_tensors="pt")
with torch.no_grad():
    # Leave room for the reasoning steps as well as the final answer
    output_ids = model.generate(**inputs, max_new_tokens=150, do_sample=False)
print("CoT:", tokenizer.decode(output_ids[0], skip_special_tokens=True))

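Chain-of-thought can improve accuracy on math, logic, and multi-step reasoning tasks, with gains on the order of 5–15% (Wei et al., 2022), though the size of the effect varies with the task and is larger for bigger models. Writing out intermediate steps lets the model condition its final answer on them instead of jumping straight to a guess.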
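
A chain-of-thought completion mixes reasoning with the result, so downstream code usually needs to pull out just the final answer. The sketch below assumes the model ends with a line of the form "Answer: ..." as in the example above; extract_final_answer is a hypothetical helper and the completion string is written by hand, not real model output.

```python
import re


def extract_final_answer(completion: str):
    """Return the text after the last 'Answer:' line, or None if there is none."""
    matches = re.findall(r"^Answer:[ \t]*(.+?)[ \t]*$", completion, flags=re.MULTILINE)
    return matches[-1] if matches else None


# Hand-written completion for illustration
completion = """1. Alice starts with 3 apples.
2. She buys 2 more, so she has 3 + 2 = 5 apples.
3. She gives 1 to Bob, so she has 5 - 1 = 4 apples.
Answer: 4"""

print(extract_final_answer(completion))
```

Taking the last match is deliberate: if the model revises itself mid-reasoning, the final "Answer:" line is the one it settled on.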

Structured Output with Parsing

Guide the model to output JSON or other structured formats, then parse:

import json

prompt = """Extract the entities from the text and output JSON.

Text: "Alice works at Google as a Senior Engineer. She lives in San Francisco."

Output JSON:
{"person": "Alice", "company": "Google", "role": "Senior Engineer", "location": "San Francisco"}

Text: "Bob is a data scientist at Meta in New York."

Output JSON:"""

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=100, do_sample=False)
# Decode only the generated tokens; otherwise the example JSON in the prompt is matched too
new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)

# Extract JSON from response
try:
    # Find JSON in response (model might add extra text)
    json_start = response.find('{')
    # The objects here are flat, so the first closing brace ends the first object
    json_end = response.find('}', json_start) + 1
    json_str = response[json_start:json_end]
    parsed = json.loads(json_str)
    print(parsed)
except json.JSONDecodeError:
    print("Failed to parse JSON:", response)

Output:

{'person': 'Bob', 'company': 'Meta', 'role': 'Data Scientist', 'location': 'New York'}

Structured prompts that include an example of the target JSON can raise parsing success substantially, on the order of ~30% (unprompted) to ~80% (with examples), though the exact rates depend on the model and the schema.
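
Brace-searching breaks down when the response contains nested objects or stray braces in surrounding text. A somewhat sturdier sketch uses json.JSONDecoder.raw_decode from the standard library, which parses one JSON value starting at a given index and ignores whatever follows. The helper name extract_json_object, its required_keys parameter, and the sample response are assumptions for illustration.

```python
import json


def extract_json_object(response: str, required_keys=()):
    """Return the first JSON object in the text that has all required keys, else None."""
    decoder = json.JSONDecoder()
    start = response.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value beginning at `start`
            obj, _ = decoder.raw_decode(response, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and all(key in obj for key in required_keys):
            return obj
        start = response.find("{", start + 1)
    return None


# Hand-written response with chatter around the JSON
response = (
    "Sure! Here it is:\n"
    '{"person": "Bob", "company": "Meta", "role": "data scientist", "location": "New York"}\n'
    "Let me know if you need more."
)

print(extract_json_object(response, required_keys=("person", "company", "role", "location")))
```

Checking required keys catches the case where the model returns valid JSON with the wrong shape, which a bare json.loads call would accept.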

Common Prompt Patterns

Pattern 1: Instruct + Constraint

prompt = """Write a Python function to reverse a list.
Constraint: Do not use list.reverse(), reversed(), or slicing ([::-1]).
Function:"""

Pattern 2: Role + Task + Format

prompt = """
Role: You are a code reviewer.
Task: Review this function for bugs and efficiency.
Format: List each issue as a bullet point.

Code:
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)  # Very slow for large n

Issues:"""

Pattern 3: Analogy + Explanation

prompt = """Explain neural networks using an analogy.

Analogy: A neural network is like a recipe that gets refined through practice.
The ingredients are input features, the steps are layers of computation,
and tasting the result teaches us how to adjust the recipe.

Explain how backpropagation works using this analogy:"""
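
Patterns 1 and 2 share the same skeleton: optional role, task, optional constraints and format, then a cue for the answer. As a rough sketch, that skeleton can be captured in a small dataclass; PromptSpec and its field names are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    """Hypothetical container for the pieces used by Patterns 1 and 2."""
    task: str
    role: str = ""
    output_format: str = ""
    constraints: list[str] = field(default_factory=list)
    payload: str = ""            # e.g. the code to review
    answer_cue: str = "Answer:"  # last line, where the model starts writing

    def render(self) -> str:
        lines = []
        if self.role:
            lines.append(f"Role: {self.role}")
        lines.append(f"Task: {self.task}")
        lines += [f"Constraint: {c}" for c in self.constraints]
        if self.output_format:
            lines.append(f"Format: {self.output_format}")
        if self.payload:
            lines += ["", "Code:", self.payload]
        # Finish with a cue so the model continues in the expected slot
        lines += ["", self.answer_cue]
        return "\n".join(lines)


spec = PromptSpec(
    role="You are a code reviewer.",
    task="Review this function for bugs and efficiency.",
    output_format="List each issue as a bullet point.",
    payload="def fibonacci(n):\n    if n <= 1:\n        return n\n    return fibonacci(n-1) + fibonacci(n-2)",
    answer_cue="Issues:",
)
print(spec.render())
```

Omitted fields are simply left out of the rendered prompt, so the same class covers the constraint-only form of Pattern 1.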

Avoiding Common Pitfalls

Pitfall 1: Ambiguous Instructions

# Bad: "Write code"
bad_prompt = "Write some Python code."

# Good: "Write Python code to..."
good_prompt = "Write Python code to count the frequency of each word in a text file."

Pitfall 2: Missing Context

# Bad: Assumes model knows you're writing about AI
bad_prompt = "Explain the transformer architecture."

# Good: Sets context explicitly
good_prompt = "As a machine learning engineer explaining to a junior, describe how the transformer architecture works in NLP."

Pitfall 3: Over-Specification

# Bad: Too many constraints, conflicting
bad_prompt = "Write a short but comprehensive guide to Python. Include all basics. Use < 100 words but cover lists, dicts, functions, classes, etc."

# Good: Prioritize constraints
good_prompt = "Write a brief guide to Python lists (< 150 words). Include: definition, 2 examples, 1 common pitfall."

Prompt Engineering Workflow

  1. Write a baseline prompt — Define the task, role, and output format.
  2. Test on 5–10 examples — See where the model fails.
  3. Add examples (few-shot) — Include 1–3 examples of correct behavior.
  4. Refine instructions — Remove ambiguity; add constraints.
  5. Test edge cases — Try inputs the model might struggle with.
  6. Iterate on temperature — Increase for creativity, decrease for consistency.
  7. Optimize token count — Remove unnecessary words to save inference cost.
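
Steps 2 and 5 of the workflow are easier to repeat with a small harness that runs one template over a fixed set of test cases and reports what failed. The sketch below is one possible shape, not a standard tool: evaluate_prompt is a hypothetical helper, and fake_generate is a stub standing in for a real model call so the example runs without downloading anything.

```python
def evaluate_prompt(template, cases, generate, check):
    """Run one template over test cases and return (pass_rate, failures).

    generate: callable(prompt) -> str, standing in for a real model call (assumption).
    check: callable(output, expected) -> bool, the task-specific success test.
    """
    failures = []
    for inputs, expected in cases:
        output = generate(template.format(**inputs))
        if not check(output, expected):
            failures.append((inputs, expected, output))
    pass_rate = 1 - len(failures) / len(cases) if cases else 0.0
    return pass_rate, failures


# Stub model for illustration only: it upper-cases just the last word of the prompt,
# so multi-word inputs act as a failing edge case
def fake_generate(prompt):
    return prompt.split()[-1].upper()


cases = [
    ({"word": "cat"}, "CAT"),
    ({"word": "dog"}, "DOG"),
    ({"word": "ice cream"}, "ICE CREAM"),
]
template = "Upper-case the text on the next line.\n{word}"

rate, failures = evaluate_prompt(template, cases, fake_generate, lambda out, exp: out == exp)
print(f"pass rate: {rate:.0%}, failures: {len(failures)}")
for inputs, expected, output in failures:
    print("failed:", inputs, "expected", expected, "got", output)
```

Keeping the case list fixed between prompt revisions is what makes the pass rates comparable from one iteration to the next.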

Key Takeaways

  • Structure prompts with context, instruction, and formatting sections.
  • Use few-shot examples to teach the model the desired behavior without fine-tuning.
  • System messages set the role; Instruct-tuned models respect them.
  • Chain-of-thought can improve accuracy on reasoning tasks (on the order of 5–15%, varying with model and task).
  • Structured output (JSON) works best when the prompt includes examples of the target format.

Frequently Asked Questions

How many examples should I include in few-shot prompts?

1–3 examples are typical. Beyond that, returns usually diminish: extra examples cost tokens and, if they are inconsistent with each other, can confuse the model. Zero-shot (no examples) works for simple tasks; complex tasks usually benefit from at least 2 examples.

Does prompt order matter?

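Yes. Put the most important instructions and examples first, and put edge cases last. Models do not weight all positions in a prompt equally: text at the start (and often at the very end) tends to influence the output more than text buried in the middle, so it is worth testing a couple of orderings on your model.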

How does temperature affect prompt engineering?

Temperature does not change what the prompt asks for; it controls how random the sampling is. Use greedy decoding (temperature 0, or do_sample=False in transformers) for deterministic tasks (code generation, JSON extraction) and temperature 0.7–0.9 for creative tasks (brainstorming, storytelling).
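
To make "controls randomness" concrete: during sampling, the model's scores (logits) for each candidate token are divided by the temperature before the softmax. The sketch below uses made-up logits for three tokens and a hypothetical helper, softmax_with_temperature, to show how lower temperatures concentrate probability on the top token.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use greedy decoding instead of T=0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for t in (0.2, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature approaches zero the distribution collapses onto the highest-scoring token, which is why greedy decoding is treated as the temperature-0 case.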

Can I use the same prompt for different models?

Partially. Instruction-following varies by model; a prompt crafted for Mistral-Instruct may need tweaking for Llama-2-Chat. Test and adjust per model.

How do I handle model hallucinations in prompts?

  1. Provide grounding facts in the prompt.
  2. Ask the model to cite sources.
  3. Use chain-of-thought to encourage reasoning.
  4. Reduce temperature to decrease randomness.
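  5. Tell the model to say it doesn't know when the answer is not in the provided context. (Switching to a smaller or quantized model generally does not reduce hallucinations.)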
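
Items 1 and 2 in the list above can be combined in a single template: number the grounding facts and ask the model to cite them by number. This is a minimal sketch; build_grounded_prompt is a hypothetical helper, the facts are made up for illustration, and the exact instruction wording is an assumption to adapt and test.

```python
def build_grounded_prompt(question, facts):
    """Number the supplied facts and ask for answers that cite them by number."""
    fact_lines = [f"[{i}] {fact}" for i, fact in enumerate(facts, start=1)]
    return "\n".join([
        "Answer the question using only the facts below.",
        "Cite the fact number after each claim, e.g. [1].",
        'If the facts do not contain the answer, reply "I don\'t know."',
        "",
        "Facts:",
        *fact_lines,
        "",
        f"Question: {question}",
        "Answer:",
    ])


# Made-up facts for illustration
facts = [
    "The service's rate limit is 100 requests per minute.",
    "Requests over the limit receive HTTP status 429.",
]
print(build_grounded_prompt("What happens when a client exceeds the rate limit?", facts))
```

Numbered citations also make answers checkable: a cited number that does not exist in the fact list is an immediate signal that the model went beyond the supplied context.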

Further Reading