Structured Outputs with Large Language Models (LLMs) - A Practical Guide
Achieve consistent and reliable outputs from LLMs using advanced prompting techniques.
Introduction
I’ve been working with large language models (LLMs) professionally since the early days of GPT-3 (specifically text-davinci-002). At the time, being able to condense 3-5 pages of text into a concise summary felt almost magical. Not long after, with the release of the GPT-3.5 model series and ChatGPT, and as people became more familiar with the capabilities of LLMs, expectations quickly shifted. Summarization was no longer enough; suddenly, the demand expanded to more complex use cases like automating complex workflows, retrieval-augmented generation (RAG), augmented analytics, and beyond.
One of the key challenges when working with LLMs, and especially with more complex tasks, is consistency. Because LLMs are inherently probabilistic, the same prompt can yield different outputs each time it is run. A slight change in wording or phrasing may be acceptable, but when a system’s workflow, UI, or logic depends on a specific structure or format in the LLM’s output, this variability can lead to significant issues.
Early on, a common approach to address this was few-shot prompting. By providing the model with a few examples of the desired output format and asking it to “respond only in valid JSON,” the model could learn to mimic that structure in its responses. While this method can be effective for simpler use cases, its limitations show as the complexity of the output increases.
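For context, a few-shot setup from that era looked roughly like this. This is a minimal sketch with illustrative wording and examples, not a specific production prompt:

```python
# Few-shot prompting: show the model examples of the desired JSON shape
# and instruct it to follow along. Nothing here *guarantees* valid JSON.
FEW_SHOT_PROMPT = """
Extract book information from the text. Respond only in valid JSON.

Text: Dune, a 1965 science fiction novel by American author Frank Herbert.
Output: {"title": "Dune", "author": "Frank Herbert", "publication_year": 1965}

Text: {input_text}
Output:"""  # {input_text} is a placeholder to be substituted at call time
```

The model usually complies, but it can still wrap the JSON in prose, rename a key, or drop a field, and nothing in the API will stop it.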
More recently, a more robust solution has emerged: structured outputs. This feature was first introduced in OpenAI’s API with the gpt-4o model and has since been adopted by other providers. Structured outputs let us define a schema for the expected output, and the model is then guided to generate responses that adhere to this schema. This approach significantly improves the reliability and consistency of the outputs, making it much easier to integrate LLMs into complex systems.
In this post, I’ll give an overview of structured outputs, along with common pitfalls, limitations, and best practices to keep in mind when using them in real-world production applications.
Getting Started with Structured Outputs
You can define a schema using JSON Schema, which is a powerful way to describe the structure and constraints of JSON data.
For this example, let’s say we want to extract information about a book from a given text. We can define a schema that specifies the required fields and their types.
{
  "type": "object",
  "properties": {
    "title": {
      "type": "string",
      "description": "The title of the book"
    },
    "author": {
      "type": "string",
      "description": "The author of the book"
    },
    "publication_year": {
      "type": "integer",
      "description": "The year the book was published"
    },
    "genres": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "description": "A list of genres the book belongs to"
    }
  },
  "required": ["title", "author", "publication_year"]
}
Or simply with Python’s `pydantic` library:
from pydantic import BaseModel, Field

class Book(BaseModel):
    title: str = Field(..., description="The title of the book")
    author: str = Field(..., description="The author of the book")
    publication_year: int = Field(..., description="The year the book was published")
    genres: list[str] = Field([], description="A list of genres the book belongs to")
As you can see, the schema defines an object with three required fields (`title`, `author`, and `publication_year`) and one optional field (`genres`). Each field has a specified type and a description.
By providing descriptions for each field, we give the model additional context about the expected data, allowing it to produce accurate, well-structured outputs, without the need for few-shot examples or an explicit system prompt.
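Since the descriptions live in the schema itself, it is easy to double-check exactly what the model will be constrained by. A quick sanity check, assuming pydantic v2’s `model_json_schema` method:

```python
import json

# Dump the JSON Schema that pydantic generates from the Book model;
# it should closely match the hand-written schema above, descriptions included.
print(json.dumps(Book.model_json_schema(), indent=2))
```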
Let’s see how we can use this schema with OpenAI’s API to generate structured outputs.
A Simple Example with OpenAI’s API
For this first simple example, we’ll use OpenAI’s GPT-5-nano model, which natively supports structured outputs.
We’ll provide a paragraph from a book review sourced from Britannica about George Orwell’s 1984 and ask the model to extract the relevant information based on our defined schema.
Finally, to demonstrate the power of structured outputs, we will not provide any system prompt, relying solely on the schema to guide the model’s output.
from openai import OpenAI
from pydantic import BaseModel, Field
from dotenv import load_dotenv
import os

load_dotenv()

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

class Book(BaseModel):
    title: str = Field(..., description="The title of the book")
    author: str = Field(..., description="The author of the book")
    publication_year: int = Field(..., description="The year the book was published")
    genres: list[str] = Field([], description="A list of genres the book belongs to")

POST_BODY = """
Nineteen Eighty-four, novel by English author George Orwell published in 1949 as
a warning against totalitarianism. The novel's chilling dystopia made a deep
impression on readers, and Orwell's ideas entered mainstream culture in a way
achieved by very few books. The book's title and many of its concepts, such as
Big Brother and the Thought Police, are instantly recognized and understood,
often as bywords for modern social and political abuses.
"""

response = client.responses.parse(
    model="gpt-5-nano-2025-08-07",
    input=[
        {
            "role": "user",
            "content": POST_BODY,
        }
    ],
    text_format=Book,
)
Thanks to OpenAI's Python SDK and its `pydantic` integration, the response is automatically parsed into an instance of the `Book` model.
Let’s first inspect the type of the parsed output.
print(type(response.output_parsed))
<class '__main__.Book'>
Now, let’s print the output as JSON:
print(response.output_parsed.model_dump_json(indent=2))
{
  "title": "Nineteen Eighty-four",
  "author": "George Orwell",
  "publication_year": 1949,
  "genres": ["Dystopian", "Science Fiction", "Political Fiction"]
}
As you can see, the model has successfully extracted the relevant information from the provided text and structured it according to our defined schema. The output is a valid JSON object that conforms to the schema, making it easy to work with in our application.
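And because `output_parsed` is a real `Book` instance rather than a plain dict, downstream code gets typed attribute access for free, for example:

```python
book = response.output_parsed

# Typed attribute access; a typo here fails loudly instead of returning None.
print(f"{book.title} ({book.publication_year}) by {book.author}")
print("Genres:", ", ".join(book.genres))
```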
Improving Output Reliability with typing.Literal
Next, we’ll take this a step further and define the possible genres using `Literal` from the `typing` module to constrain the values the model can return.
from typing import Literal

...

class Book(BaseModel):
    title: str = Field(..., description="The title of the book")
    author: str = Field(..., description="The author of the book")
    publication_year: int = Field(..., description="The year the book was published")
    genres: list[
        Literal[
            "Classic",
            "Dystopian",
            "Science Fiction",
            "Fantasy",
            "Non-Fiction",
            "Mystery",
            "Romance",
            "Horror",
        ]
    ] = Field(..., description="The genres of the book")
Let’s see how the output looks now.
{
  "title": "Nineteen Eighty-Four",
  "author": "George Orwell",
  "publication_year": 1949,
  "genres": ["Dystopian", "Science Fiction", "Classic"]
}
With this change, the model is now constrained to return only genres that are part of the defined `Literal` set. This helps ensure that the output is not only structured but also semantically valid according to our application’s requirements.
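Under the hood, pydantic expresses the `Literal` as a JSON Schema `enum`, which is the constraint the API actually enforces during decoding. You can confirm this by inspecting the generated schema (assuming pydantic v2):

```python
import json

# The Literal surfaces as an "enum" of the eight allowed genre strings
# inside the array's "items" definition.
print(json.dumps(Book.model_json_schema()["properties"]["genres"], indent=2))
```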
Enhancing Model Guidance with System Messages
In the previous examples, we did not provide any system prompt; the schema alone was sufficient to guide the model’s output. However, if we take a closer look at the generated response, we might find areas for improvement.
Let’s first check the `type` of the generated response:
print(response.output[0].type)
reasoning
Let’s also inspect the `usage` details:
print(response.usage.model_dump_json(indent=2))
{
  "input_tokens": 236,
  "input_tokens_details": {
    "cached_tokens": 0
  },
  "output_tokens": 1899,
  "output_tokens_details": {
    "reasoning_tokens": 1856
  },
  "total_tokens": 2135
}
The GPT-5 series of models, including `gpt-5-nano-2025-08-07`, are designed to provide detailed reasoning behind their outputs. This can be beneficial in scenarios where understanding the model’s thought process is important. However, in simpler use cases such as this one, the additional reasoning is unnecessary and can increase latency, token usage, and cost.

In this particular case, the total output tokens are 1899, with 1856 of those used for reasoning alone. While the model successfully extracted the relevant information, the reasoning effort is excessive for such a simple task.
An easy way to address this is to simply set the reasoning effort to `minimal`, and the model will not produce any reasoning in the output.
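If you go that route, the Responses API exposes a `reasoning` parameter. A minimal sketch (check your SDK version for the exact parameter shape):

```python
response = client.responses.parse(
    model="gpt-5-nano-2025-08-07",
    input=[{"role": "user", "content": POST_BODY}],
    text_format=Book,
    # Skip the lengthy reasoning phase for this simple extraction task.
    reasoning={"effort": "minimal"},
)
```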
However, let’s leave reasoning as is and use this opportunity to demonstrate how we can provide additional context to the model using a `system` message, potentially improving the quality and reliability of the output while also reducing overall token usage.
...

SYSTEM_PROMPT = """
You are an expert in extracting book information from text. Your task is to read the provided text and extract the relevant information about the book, including its title, author, publication year, and genres.
"""

response = client.responses.parse(
    model="gpt-5-nano-2025-08-07",
    input=[
        {
            "role": "system",
            "content": SYSTEM_PROMPT,
        },
        {
            "role": "user",
            "content": POST_BODY,
        }
    ],
    text_format=Book,
)
The system prompt provides the model with additional context about its role and the task at hand. This gives the model a clearer understanding of what is expected, which can lead to fewer reasoning tokens being generated.
Let’s inspect the `usage` details again:
{
  "input_tokens": 311,
  "input_tokens_details": {
    "cached_tokens": 0
  },
  "output_tokens": 744,
  "output_tokens_details": {
    "reasoning_tokens": 704
  },
  "total_tokens": 1055
}
And the output:
{
  "title": "Nineteen Eighty-Four",
  "author": "George Orwell",
  "publication_year": 1949,
  "genres": ["Dystopian", "Science Fiction", "Classic"]
}
As you can see, the total output tokens have been reduced from 1899 to 744, with reasoning tokens dropping from 1856 to 704. The output remains consistent and valid according to our schema, but with significantly less reasoning effort and token usage.
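When tuning prompts like this, it can be useful to log what fraction of output tokens goes to reasoning on each run. A small helper built on the `usage` fields shown above:

```python
def reasoning_share(response) -> float:
    """Fraction of output tokens spent on reasoning rather than the final answer."""
    usage = response.usage
    return usage.output_tokens_details.reasoning_tokens / usage.output_tokens

# For the run above: 704 / 744 ≈ 0.95
print(f"{reasoning_share(response):.0%} of output tokens were reasoning")
```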
Handling Missing Data and Hallucinations
When working with real-world data, it’s common to encounter situations where the input text may not contain all the required information. In such cases, the model may attempt to infer or “hallucinate” missing data to satisfy the schema.
To illustrate this scenario, let’s modify our schema and prompt to include one additional required field that is not present in the input text: `setting`.
...

class Book(BaseModel):
    title: str = Field(..., description="The title of the book")
    author: str = Field(..., description="The author of the book")
    publication_year: int = Field(..., description="The year the book was published")
    genres: list[
        Literal[
            "Classic",
            "Dystopian",
            "Science Fiction",
            "Fantasy",
            "Non-Fiction",
            "Mystery",
            "Romance",
            "Horror",
        ]
    ] = Field(..., description="The genres of the book")
    setting: str = Field(
        ...,
        description="The main setting or location where the story takes place"
    )

...

SYSTEM_PROMPT = """
You are an expert in extracting book information from text.
You will be provided with a short summary of a book. Your task is to extract the
relevant information about the book from the provided text, such as its title,
author, publication year, genres and setting.
For the genres, choose only from the available options provided in the schema.
"""
Now, let’s see how the model handles this situation.
{
  "title": "Nineteen Eighty-four",
  "author": "George Orwell",
  "publication_year": 1949,
  "genres": ["Dystopian", "Classic", "Science Fiction"],
  "setting": "Airstrip One, Oceania"
}
As you can see, the model has filled in the `setting` field with “Airstrip One, Oceania”, which is a reasonable inference based on the context of the book. However, it’s important to note that this information was not explicitly provided in the input text.
In real-world applications, it’s crucial to handle such cases appropriately. Depending on the use case, you may choose to:
- Make uncertain fields optional in the schema.
- Instruct the model to return `null` or a specific value (e.g., “unknown”) when it cannot find the information in the input text.
- Implement additional validation or verification steps to ensure the accuracy of the extracted data (see the sketch below).
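To illustrate the third option, a lightweight grounding check can flag extracted string values that never appear in the source text. A rough heuristic sketch (the helper name is mine), not a complete verification step:

```python
def ungrounded_fields(book: Book, source_text: str) -> list[str]:
    """Return the names of string fields whose values don't appear in the source."""
    haystack = source_text.lower()
    return [
        name
        for name, value in book.model_dump().items()
        if isinstance(value, str) and value.lower() not in haystack
    ]

# "Airstrip One, Oceania" appears nowhere in POST_BODY, so "setting" is flagged.
print(ungrounded_fields(response.output_parsed, POST_BODY))  # ['setting']
```

The first option is usually the simplest, so let’s start there.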
Let’s modify the schema to make `setting` optional and see how the model responds.
...

class Book(BaseModel):
    title: str = Field(..., description="The title of the book")
    author: str = Field(..., description="The author of the book")
    publication_year: int = Field(..., description="The year the book was published")
    genres: list[
        Literal[
            "Classic",
            "Dystopian",
            "Science Fiction",
            "Fantasy",
            "Non-Fiction",
            "Mystery",
            "Romance",
            "Horror",
        ]
    ] = Field(..., description="The genres of the book")
    setting: str | None = Field(
        ...,
        description="The main setting or location where the story takes place"
    )
Now that the `setting` field is optional, let’s review the output once again:
{
  "title": "Nineteen Eighty-four",
  "author": "George Orwell",
  "publication_year": 1949,
  "genres": ["Dystopian", "Classic", "Science Fiction"],
  "setting": null
}
As you can see, the model has returned `null` for the `setting` field, indicating that it could not find this information in the input text. This approach helps prevent the model from hallucinating or inferring data that may not be accurate.
Similar results can be achieved by giving the model explicit instructions through the system prompt.
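For instance, appending something like the following to the system prompt tends to produce the same behavior (illustrative wording; adjust to taste):

```python
SYSTEM_PROMPT += """
If a piece of information is not explicitly stated in the provided text,
return null for that field instead of guessing.
"""
```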
Nuances, Limitations, and Practical Considerations
Missing data and hallucinations are just two of the many nuances to consider when working with structured outputs. Here are some additional points to keep in mind:
- Schema complexity: The more fields or nested structures a schema has, the higher the likelihood of inaccuracies. Keep schemas simple and modular when possible. Different models have different tolerances for complexity, so it’s worth testing with your target model and use case.
- Overly strict constraints: Using constructs like `Literal` improves reliability but can reduce flexibility. If new genres or categories appear, the model will reject or ignore them until the schema is updated. Balance strictness with adaptability based on your application’s needs.
- Performance impact: Schema enforcement introduces minor latency (due to retries and validation) and slightly increases token usage, especially on smaller models.
- Cross-model differences: Not all models adhere to schemas equally. Larger models (like GPT-5) are highly compliant; smaller ones may occasionally produce invalid structures, requiring fallback validation on your end (see the sketch below).
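Such a fallback can be as simple as validating the raw response yourself and retrying on failure. A hedged sketch for providers without native schema enforcement (OpenAI’s `responses.parse` already validates against the model for you, so you wouldn’t need this there):

```python
from typing import Callable
from pydantic import ValidationError

def parse_with_retry(generate: Callable[[], str], attempts: int = 3) -> Book | None:
    """Call the model up to `attempts` times until its raw text validates as a Book."""
    for _ in range(attempts):
        raw = generate()  # the provider call returning the model's raw text output
        try:
            return Book.model_validate_json(raw)
        except ValidationError:
            continue  # malformed structure; request a fresh attempt
    return None
```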
Structured outputs deliver predictability and structure, but they are not a substitute for proper testing, validation, and responsible system design.
Final Thoughts and Responsible Use
Structured outputs are a major step toward making LLM behavior more predictable and reliable. By defining explicit schemas, we can transform free-form model responses into validated, machine-readable data: a critical capability for production systems.
Schema complexity, prompt phrasing, and model version can all influence results in subtle ways. Don’t assume that a schema that works perfectly with one model or prompt will behave the same in another context. Always test, observe, and iterate.
But most importantly, use AI responsibly. As developers and engineers, we have a duty to understand what the models we use are doing, validate what they produce, and design our systems so that humans remain in control of critical decisions. Structured outputs can make LLMs safer and more useful, but thoughtful engineering is what makes them trustworthy.
Resources and Example Scripts
- Nineteen Eighty-four
- Structured Outputs with OpenAI
- GPT-5-nano
Python script:
from openai import OpenAI
from pydantic import BaseModel, Field
from dotenv import load_dotenv
from typing import Literal
import os

load_dotenv()

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

class Book(BaseModel):
    title: str = Field(..., description="The title of the book")
    author: str = Field(..., description="The author of the book")
    publication_year: int = Field(..., description="The year the book was published")
    genres: list[
        Literal[
            "Classic",
            "Dystopian",
            "Science Fiction",
            "Fantasy",
            "Non-Fiction",
            "Mystery",
            "Romance",
            "Horror",
        ]
    ] = Field(..., description="The genres of the book")
    setting: str | None = Field(
        ...,
        description="The main setting or location where the story takes place"
    )

POST_BODY = """
Nineteen Eighty-four, novel by English author George Orwell published in 1949 as
a warning against totalitarianism. The novel's chilling dystopia made a deep
impression on readers, and Orwell's ideas entered mainstream culture in a way
achieved by very few books. The book's title and many of its concepts, such as
Big Brother and the Thought Police, are instantly recognized and understood,
often as bywords for modern social and political abuses.
"""

SYSTEM_PROMPT = """
You are an expert in extracting book information from text.
You will be provided with a short summary of a book. Your task is to extract the
relevant information about the book from the provided text, such as its title,
author, publication year, genres and setting.
For the genres, choose only from the available options provided in the schema.
"""

response = client.responses.parse(
    model="gpt-5-nano-2025-08-07",
    input=[
        {
            "role": "system",
            "content": SYSTEM_PROMPT,
        },
        {
            "role": "user",
            "content": POST_BODY,
        },
    ],
    text_format=Book,
)

# Print the type of the output response
print(type(response.output_parsed))

# Print the output response as JSON
print(response.output_parsed.model_dump_json(indent=2))

# Print the type of the generated response
print(response.output[0].type)

# Print the usage information as JSON
print(response.usage.model_dump_json(indent=2))