Using Exa with instructor to generate structured outputs from web content.
EXA_API_KEY
and OPENAI_API_KEY
.
{ "type": "json_object" }
which will make the LLM return a JSON object. But for this, we would need to provide a JSON schema, which can get large and complex.
Instead of doing this, we can use Instructor. Instructor is powered by pydantic, which means that it integrates with your IDE. We use pydantic’s BaseModel
to define the output model:
QuantumComputingAdvancement
class that inherits from BaseModel
from Pydantic. This class will be used by Instructor to validate the output from the LLM and for the LLM as a response model. We also implement the __str__()
method for easy printing of the output. We then initialize OpenAI()
and wrap instructor on top of it with instructor.from_openai
to create a client that will return structured outputs. If the output is not structured as our class, Instructor makes the LLM retry until max_retries is reached. You can read more about how Instructor retries here.
This example demonstrates how to use Exa to search for content about quantum computing advancements and structure the output using Instructor.
field_validator
, we can create our own rules to validate each field to be exactly what we want, so that we can work with predictable data even though we are using an LLM. Additionally, implementing the __str__()
method allows for more readable and convenient output formatting. Read more about different pydantic validators here. Because we don’t specify that the Title
should be in uppercase in the prompt, this will result in at least two API calls. You should avoid using field_validator
s as the only means to get the data in the right format; instead, you should include instructions in the prompt, such as specifying that the Title
should be in uppercase/all-caps.
This advanced example demonstrates how to use Exa and Instructor to analyze multiple research papers, extract structured information, and provide a comprehensive summary of the findings.
pip install rich
.