Example project using the Exa Python SDK

Introduction

In this example, we use Exa to automatically research and generate reports on any topic. In a few lines of code, you can feed your LLM high-quality, up-to-date web content. If you just want to see the code, check out the Google Colab.

This tutorial requires an Exa API key and an OpenAI API key. The Exa API uses embeddings to retrieve the best content on the web for your AI apps. Get 1000 free searches per month just for signing up!

!pip install exa_py openai

from exa_py import Exa
from openai import OpenAI, AzureOpenAI
import json

EXA_API_KEY = ''
exa = Exa(api_key = EXA_API_KEY)

OPENAI_API_KEY = ''
client = OpenAI(api_key = OPENAI_API_KEY)
model = 'gpt-4-turbo'

# OPTIONAL: If you want to use Azure instead

# AZURE_OPENAI_API_KEY = ''
# client = AzureOpenAI(
#     api_key = AZURE_OPENAI_API_KEY,
#     azure_endpoint = '',
#     api_version = '2023-05-15'
#     )
# model = 'gpt4'

Getting Started

First, let's use an LLM to generate a research report without Exa's web search. We use the example topic "The latest startups using AI to optimize renewable energy systems".

example_topic = 'The latest startups using AI to optimize renewable energy systems'

def generate_report_without_exa(topic):
  content = f"Write a comprehensive and professional three paragraph research report about {topic}. Include citations with source, month, and year."
  completion = client.chat.completions.create(
      model = model,
      messages=[
          {"role": "user", "content": content},
      ],
      temperature = 0
      )
  return completion.choices[0].message.content


report = generate_report_without_exa(example_topic)
report

As of early 2023, the integration of artificial intelligence (AI) into renewable energy systems has seen significant advancements, spearheaded by innovative startups aiming to optimize energy production, distribution, and storage. One notable player in this field is Heliogen, a California-based startup that utilizes AI to enhance solar energy capture. Heliogen's technology employs computer vision software to precisely align a large array of mirrors to reflect sunlight towards a single point, thus generating heat that can be used to produce electricity or stored for later use. This method significantly increases the efficiency of solar thermal power plants by optimizing the concentration of solar energy, which can be particularly beneficial in utility-scale installations (Source: EnergyTech Magazine, March 2023).

Another pioneering startup, WeaveGrid, focuses on the integration of electric vehicles (EVs) into the energy grid. WeaveGrid develops software that uses AI to manage and optimize the charging of electric vehicles based on various factors such as energy demand, grid capacity, and renewable energy availability. This not only helps in smoothing out the demand spikes on the electrical grid but also enhances the incorporation of renewable energy sources by aligning EV charging times with periods of high renewable energy production, such as during peak solar generation hours. This approach aids in reducing reliance on non-renewable energy sources and stabilizing grid operations (Source: Renewable Energy World, February 2023).

Lastly, the startup Xpansiv is making strides in the digital transformation of renewable energy markets. Xpansiv's platform leverages AI to provide a digital infrastructure for environmental commodities like renewable energy certificates and carbon credits. By using AI to analyze market trends and data, Xpansiv facilitates more efficient trading and better pricing transparency in these markets. This not only helps in promoting the adoption of renewable energy by making it more economically viable but also ensures that the environmental impact of energy production is accurately accounted for and minimized. The platform's innovative use of AI in environmental commodity markets is a testament to the potential of digital technologies in advancing the sustainability agenda (Source: GreenTech Media, January 2023).

Unfortunately, every source cited is more than a year old. Also, without the ability to review relevant documents, the LLM can easily hallucinate inaccurate information. A research report needs reliable, up-to-date information about companies, news, papers, and so on. Let's fix this by supercharging our report with Exa's semantic web search.

Adding Web Search

To generate a more up-to-date research report, let's use Exa to find web content we can give our LLM. To start, we take our original topic and create a list of 6 relevant web search queries to make sure our final report is comprehensive.

def create_custom_function(num_subqueries):
    properties = {}
    for i in range(1, num_subqueries + 1):
        key = f'subquery_{i}'
        properties[key] = {
            'type': 'string',
            'description': 'Search queries that would be useful for generating a report on my main topic'
        }

    custom_function = {
        'name': 'generate_exa_search_queries',
        'description': 'Generates Exa search queries to investigate the main topic',
        'parameters': {
            'type': 'object',
            'properties': properties
        }
    }

    return [custom_function]

# example
custom_functions = create_custom_function(6)
custom_functions
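The schema above lists each subquery as a property but does not mark any of them as mandatory. One optional refinement, not part of the original example, is to add the JSON Schema `required` keyword so the model is asked to fill in every subquery. The function name `create_custom_function_with_required` is hypothetical, and the model may still deviate from the schema, so treat this as a sketch rather than a guarantee.

```python
def create_custom_function_with_required(num_subqueries):
    """Hypothetical variant of create_custom_function that marks every subquery as required."""
    # Build one key per subquery: subquery_1 ... subquery_N
    keys = [f'subquery_{i}' for i in range(1, num_subqueries + 1)]
    properties = {
        key: {
            'type': 'string',
            'description': 'Search query that would be useful for generating a report on my main topic',
        }
        for key in keys
    }
    custom_function = {
        'name': 'generate_exa_search_queries',
        'description': 'Generates Exa search queries to investigate the main topic',
        'parameters': {
            'type': 'object',
            'properties': properties,
            'required': keys,  # JSON Schema keyword listing the mandatory properties
        },
    }
    return [custom_function]


schema = create_custom_function_with_required(3)
print(schema[0]['parameters']['required'])
```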

def generate_subqueries_from_topic(topic, num_subqueries=6):
    content =  f"I'm going to give you a topic I want to research. I want you to generate {num_subqueries} interesting, diverse search queries that would be useful for generating a report on my main topic. Here is the main topic: {topic}."
    custom_functions = create_custom_function(num_subqueries)
    completion = client.chat.completions.create(
        model = model,
        messages=[
            {"role": "user", "content": content},
        ],
        temperature=0,
        functions = custom_functions,
        function_call = {'name': 'generate_exa_search_queries'},  # force the function call so arguments are always returned
    )

    json_response = json.loads(completion.choices[0].message.function_call.arguments)
    return list(json_response.values())


# example
example_topic = 'The latest startups using AI to optimize renewable energy systems'
example_subqueries = generate_subqueries_from_topic(example_topic)
example_subqueries
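The function-call arguments are JSON text written by the model, so they can occasionally be malformed or miss a key. A minimal sketch of a more defensive parser follows. The helper name `parse_subqueries` and the sample argument string are illustrative assumptions, not part of the original notebook.

```python
import json


def parse_subqueries(arguments_json, num_subqueries):
    """Hypothetical helper: return the non-empty subqueries in subquery_1..N order."""
    try:
        parsed = json.loads(arguments_json)
    except (TypeError, json.JSONDecodeError):
        # None (no function call) or invalid JSON: nothing usable
        return []
    if not isinstance(parsed, dict):
        return []
    subqueries = []
    for i in range(1, num_subqueries + 1):
        value = parsed.get(f'subquery_{i}')
        # Skip missing keys, non-string values, and blank strings
        if isinstance(value, str) and value.strip():
            subqueries.append(value.strip())
    return subqueries


# Made-up arguments string standing in for a model response
sample_arguments = (
    '{"subquery_1": "AI startups solar forecasting", '
    '"subquery_2": "", '
    '"subquery_3": "grid optimization machine learning"}'
)
print(parse_subqueries(sample_arguments, 3))
```

An empty list signals that the caller should retry or fall back to the original topic as the only query.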

Search and Contents

Our queries are looking good! Next, let's use Exa's search_and_contents function to find the relevant URLs for each query.

Retrieving Contents

For each search result, Exa can also provide:

  • the site's full text content
  • a highlight, the portion of text most relevant to our query.

Here, we use Exa's highlights, which we can easily pass to our LLM to generate the research report. Highlights are customizable: we specify highlights={"num_sentences": 5} so that, for each URL, Exa returns a 5-sentence highlight.

We also specify num_results=5 so that for each search query, we get the top 5 URL results.

# Exa searches each subtopic
def exa_search_each_subquery(subqueries):
  list_of_query_exa_pairs = []
  for query in subqueries:
    search_response = exa.search_and_contents(
      query,
      num_results=5,
      use_autoprompt=True,
      start_published_date="2023-06-01", # To give us only recent information post-June 2023
      highlights={"num_sentences": 5},
    )
    query_object = {
        'subquery': query,
        'results': search_response.results
    }
    list_of_query_exa_pairs.append(query_object)
  return list_of_query_exa_pairs

example_list_of_query_exa_pairs = exa_search_each_subquery(example_subqueries)
example_list_of_query_exa_pairs
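The search above hard-codes start_published_date="2023-06-01", which becomes less "recent" as time passes. One way to keep the window rolling is to compute the date relative to today. The helper `recent_start_date` is a hypothetical name, and the sketch assumes the same YYYY-MM-DD string format used in the call above.

```python
from datetime import date, timedelta


def recent_start_date(days_back, today=None):
    """Hypothetical helper: ISO date (YYYY-MM-DD) that lies days_back days before today."""
    today = today or date.today()
    return (today - timedelta(days=days_back)).isoformat()


# A fixed "today" makes the example reproducible
print(recent_start_date(30, today=date(2024, 3, 31)))  # 2024-03-01
```

The returned string could then be passed as start_published_date in place of the literal.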

Let's now reformat Exa's results to pass into the LLM. For each subquery's results, we keep the URL, highlight, and publish date.

# Reformat Exa results into a string
def format_exa_results_for_llm(list_of_query_exa_pairs, content_slice=750):
    formatted_string = ""
    for pair in list_of_query_exa_pairs:
        formatted_string += f"[{pair['subquery']}]:\n"
        for result in pair['results']:
            # Prefer full text (cut to content_slice characters) when it was requested;
            # otherwise join the highlights. getattr guards against results without a text attribute.
            text = getattr(result, 'text', None)
            content = text[:content_slice] if text else " ".join(result.highlights)
            publish_date = result.published_date
            formatted_string += f"URL: {result.url}\nContent: {content}\nPublish Date: {publish_date}\n"
        formatted_string += "\n"

    return formatted_string

#example
print(format_exa_results_for_llm(example_list_of_query_exa_pairs))
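Different subqueries may return the same page, in which case the formatted string repeats it and spends prompt space on duplicates. A possible pre-processing step, sketched here as an assumption rather than as part of the original pipeline, keeps only the first occurrence of each URL. The helper `dedupe_results_by_url` is hypothetical, and the stand-in objects carry only a `url` attribute so the example runs without an API call.

```python
from types import SimpleNamespace


def dedupe_results_by_url(list_of_query_exa_pairs):
    """Hypothetical helper: keep only the first occurrence of each URL across all subqueries."""
    seen_urls = set()
    deduped_pairs = []
    for pair in list_of_query_exa_pairs:
        unique_results = []
        for result in pair['results']:
            if result.url not in seen_urls:
                seen_urls.add(result.url)
                unique_results.append(result)
        deduped_pairs.append({'subquery': pair['subquery'], 'results': unique_results})
    return deduped_pairs


# Stand-in results (not real Exa objects) for an offline demonstration
pairs = [
    {'subquery': 'a', 'results': [SimpleNamespace(url='https://example.com/1'),
                                  SimpleNamespace(url='https://example.com/2')]},
    {'subquery': 'b', 'results': [SimpleNamespace(url='https://example.com/2')]},
]
deduped = dedupe_results_by_url(pairs)
print([len(p['results']) for p in deduped])  # [2, 0]
```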

Finally, let's pass everything into an LLM to generate our final research report with citations.

def generate_report_from_exa_results(topic, list_of_query_exa_pairs):
  formatted_exa_content = format_exa_results_for_llm(list_of_query_exa_pairs)
  content = f"Write a comprehensive and professional three paragraph research report about {topic} based on the provided information. Include citations in the text using footnote notation ([citation #]), for example [2]. First provide the report, followed by a single `References` section that only lists the URLs (and their published date) used, in the format [#] <url>. For the published date, only include the month and year. Reset the citations index and ignore the order of citations in the provided information. Here is the information: {formatted_exa_content}."

  completion = client.chat.completions.create(
      model=model,
      messages=[
          {"role": "user", "content": content},
      ],
  )

  return completion.choices[0].message.content

Putting it all together, we get:

topic = 'The latest startups using AI to optimize renewable energy systems'

def generate_report(topic):
  subqueries = generate_subqueries_from_topic(topic)
  list_of_query_exa_pairs = exa_search_each_subquery(subqueries)
  report = generate_report_from_exa_results(topic, list_of_query_exa_pairs)
  return report

example_report = generate_report(topic)
print(example_report)

In recent advancements, startups utilizing AI to optimize renewable energy systems have produced notable innovations across various facets of the energy sector. SparkHQ.AI has emerged as a significant player, collaborating with utility-scale developers to expedite solar deployment and harness the potential of advanced platforms [1]. Concurrently, the AI infrastructure provided by Crusoe.AI utilizes underutilized energy sources for high-performance computing, thereby reducing greenhouse gas emissions and enhancing resource efficiency [2]. Verse Inc.'s Aria platform, harnessing generative AI, has significantly streamlined the procurement of clean energy, drastically reducing transaction costs and expediting contracting processes [3]. These transformations underscore a pivotal shift toward integrating AI to achieve efficiency and sustainability in renewable energy systems.

Moreover, the application of AI in renewable energy extends to predictive analysis and operational optimization. Ogre.AI offers cutting-edge forecasting tools and digital solutions that are reshaping utility management and renewable energy forecasts [4]. These algorithms facilitate precise energy production predictions, aiding in effective grid management and resilience against varying environmental conditions. Another startup, PlanetTE.AI, leverages AI to assess multiple environmental risks, crucial for strategic planning and management in renewable energy operations, illustrating the comprehensive role AI plays beyond mere energy production [5].

The AI-driven revolution in renewable energy is bolstered by research into novel technologies and methodologies. Recent studies highlight AI's role in enhancing the efficiency of solar cell production, particularly perovskite cells, which are pivotal for next-generation solar technologies [6]. AI is instrumental in identifying optimal manufacturing processes and materials, which are critical for maximizing energy conversion efficiency and reducing waste during production. Such innovations are critical for scaling up renewable energy solutions, thereby fostering a more sustainable energy landscape that aligns with global climate goals.

References:
[1] https://www.sparkhq.ai/ (January 2024)
[2] https://crusoe.ai/ (October 2023)
[3] https://verse.inc/ (June 2023)
[4] https://www.ogre.ai/ (July 2023)
[5] https://www.planette.ai/ (October 2023)
[6] https://www.kit.edu/kit/english/pi_2023_94_ai-for-perovskite-solar-cells-key-to-better-manufacturing.php (November 2023)
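Since the prompt asks for references in a fixed shape, the References section can be checked programmatically, for example to confirm that every cited URL really came from the Exa results. The sketch below assumes the model followed the `[#] <url> (Month Year)` layout seen in the output above, which is not guaranteed. The helper `parse_references` is hypothetical.

```python
import re

# Matches lines such as "[1] https://www.sparkhq.ai/ (January 2024)"
REFERENCE_PATTERN = re.compile(
    r'^\[(\d+)\]\s+(\S+)\s+\(([A-Za-z]+ \d{4})\)[ \t]*$',
    re.MULTILINE,
)


def parse_references(report_text):
    """Hypothetical helper: return (index, url, published) tuples from a References section."""
    return [
        (int(number), url, published)
        for number, url, published in REFERENCE_PATTERN.findall(report_text)
    ]


sample = """References:
[1] https://www.sparkhq.ai/ (January 2024)
[2] https://crusoe.ai/ (October 2023)
"""
print(parse_references(sample))
```

Lines that do not match the pattern are silently skipped, so an empty result is a hint that the model used a different layout.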

And that's it!

Sweet! We can now generate comprehensive, up-to-date research reports with citations. You can use this to do in-depth research on the latest ML innovations, market industries, startups, global affairs, anything!

You can sign up for the Exa API here and automatically get 1000 requests/month free!