Example project using the Exa Node SDK

In this example, we will build Exa Researcher, a JavaScript app that, given a research topic, automatically searches for different sources about the topic with Exa and synthesizes the searched contents into a research report.

This interactive notebook was made with the Deno JavaScript kernel for Jupyter. Check out the plain JS version if you prefer a regular JavaScript file you can run with NodeJS, or want to skip to the final result. If you'd like to run this notebook locally, installing Deno and connecting Deno to Jupyter is fast and easy.

To play with this code, first we need an Exa API key and an OpenAI API key. Get 1000 Exa searches per month free just for signing up!

Let's import the Exa and OpenAI SDKs and put in our API keys to create a client object for each.

Make sure to pick the right imports for your runtime and paste or load your API keys.

// Deno imports
import Exa from 'npm:exa-js';
import OpenAI from 'npm:openai';

// NodeJS imports
//import Exa from 'exa-js';
//import OpenAI from 'openai';


const EXA_API_KEY = ""; // insert or load your API key here
const OPENAI_API_KEY = ""; // insert or load your API key here

const exa = new Exa(EXA_API_KEY);
const openai = new OpenAI({ apiKey: OPENAI_API_KEY });
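A safer alternative to hardcoding keys is to load them from environment variables. This is a minimal sketch (the `loadKey` helper is ours, not part of the SDKs); in Deno you'd use `Deno.env.get(name)` instead of `process.env`:

```javascript
// Load an API key from the environment instead of hardcoding it.
// Warns (but doesn't throw) when the variable is unset, so the notebook
// still loads and you can paste a key manually if you prefer.
function loadKey(name) {
    const key = process.env[name] ?? '';
    if (!key) console.warn(`${name} is not set`);
    return key;
}

// const exa = new Exa(loadKey('EXA_API_KEY'));
// const openai = new OpenAI({ apiKey: loadKey('OPENAI_API_KEY') });
```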

Since we'll be making several calls to the OpenAI API to get completions from GPT-3.5 Turbo, let's write a simple utility wrapper function so we can pass in the system and user messages directly and get the LLM's response back as a string.

async function getLLMResponse({system = 'You are a helpful assistant.', user = '', temperature = 1, model = 'gpt-3.5-turbo'}){
    const completion = await openai.chat.completions.create({
        model,
        temperature,
        messages: [
            {'role': 'system', 'content': system},
            {'role': 'user', 'content': user},
        ]
    });
    return completion.choices[0].message.content;
}
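API calls like these can occasionally fail transiently (rate limits, network blips). If you hit that, one option is to wrap `getLLMResponse` in a small retry helper. This is a hypothetical addition, not part of the original notebook:

```javascript
// Retry an async function with exponential backoff.
// Retries up to `retries` times, doubling the delay after each failure.
async function withRetry(fn, retries = 3, delayMs = 500) {
    for (let attempt = 0; ; attempt++) {
        try {
            return await fn();
        } catch (err) {
            if (attempt >= retries) throw err;
            await new Promise(r => setTimeout(r, delayMs * 2 ** attempt));
        }
    }
}

// Usage: const text = await withRetry(() => getLLMResponse({ user: 'Hi' }));
```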

Okay, great! Now let's start building Exa Researcher. The app should be able to automatically generate research reports for all kinds of different topics. Here are two to start:

const SAMA_TOPIC = 'Sam Altman';
const ART_TOPIC = 'renaissance art';

The first thing our app has to do is decide what kind of search to do for the given topic.

Exa offers two kinds of search: neural and keyword search. Here's how we decide:

  • Neural search is preferred when the query is broad and complex because it lets us retrieve high quality, semantically relevant data. Neural search is especially suitable when a topic is well-known and popularly discussed on the Internet, allowing the machine learning model to retrieve contents which are more likely recommended by real humans.
  • Keyword search is useful when the topic is specific, local or obscure. If the query is a specific person's name, an identifier, or an acronym, such that relevant results will contain the query itself, keyword search may do well. And if the machine learning model doesn't know about the topic, but relevant documents can be found by directly matching the search query, keyword search may be necessary.

So, Exa Researcher is going to get a query, and it needs to automatically decide whether to use keyword or neural search to research the query based on the criteria above. Sounds like a job for the LLM! But we need to write a prompt that tells it about the difference between keyword and neural search. Oh wait, we have a perfectly good explanation right there.

// Let's generalize the prompt and call the search types (1) and (2) in case the LLM is sensitive to the names. We can replace them with different names programmatically to see what works best.
const SEARCH_TYPE_EXPLANATION = `- (1) search is usually preferred when the query is a broad topic or semantically complex because it lets us retrieve high quality, semantically relevant data. (1) search is especially suitable when a topic is well-known and popularly discussed on the Internet, allowing the machine learning model to retrieve contents which are more likely recommended by real humans.  
- (2) search is useful when the topic is specific, local or obscure. If the query is a specific person's name, an identifier, or an acronym, such that relevant results will contain the query itself, (2) search may do well. And if the machine learning model doesn't know about the topic, but relevant documents can be found by directly matching the search query, (2) search may be necessary.
`;
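The substitution itself can be factored into a tiny helper (a hypothetical convenience, not in the original code) so it's easy to try different name pairs:

```javascript
// Substitute concrete search-type names for the (1)/(2) placeholders
// in a generalized prompt template.
function substituteNames(template, names) {
    return template.replaceAll('(1)', names[0]).replaceAll('(2)', names[1]);
}

// Usage: substituteNames(SEARCH_TYPE_EXPLANATION, ['neural', 'keyword'])
```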

Here's a function that instructs the LLM to choose between the search types and give its answer in a single word. Based on its choice, we return keyword or neural.

async function decideSearchType(topic, choiceNames = ['neural', 'keyword']){
    let userMessage = 'Decide whether to use (1) or (2) search for the provided research topic. Output your choice in a single word: either "(1)" or "(2)". Here is a guide that will help you choose:\n';
    userMessage += SEARCH_TYPE_EXPLANATION;
    userMessage += `Topic: ${topic}\n`;
    userMessage += `Search type: `;
    userMessage = userMessage.replaceAll('(1)', choiceNames[0]).replaceAll('(2)', choiceNames[1]);

    const response = await getLLMResponse({
        system: 'You will be asked to make a choice between two options. Answer with your choice in a single word.',
        user: userMessage,
        temperature: 0
    });
    const useKeyword = response.trim().toLowerCase().startsWith(choiceNames[1].toLowerCase());
    return useKeyword ? 'keyword' : 'neural';
}

Let's test it out:

console.log(SAMA_TOPIC, 'expected: keyword, got:', await decideSearchType(SAMA_TOPIC));
console.log(ART_TOPIC, 'expected: neural, got:', await decideSearchType(ART_TOPIC));
Sam Altman expected: keyword, got: keyword
renaissance art expected: neural, got: neural

Great! Now we have to craft some search queries for the topic and the search type. There are two cases here: keyword search and neural search. Let's do the easy one first. LLMs already know what Google-like keyword searches look like. So let's just ask the LLM for what we want:

function createKeywordQueryGenerationPrompt(topic, n){
    return `I'm writing a research report on ${topic} and need help coming up with Google keyword search queries.
Google keyword searches should just be a few words long. They should not be complete sentences.
Please generate a diverse list of ${n} Google keyword search queries that would be useful for writing a research report on ${topic}. Do not add any formatting or numbering to the queries.`
}

console.log(await getLLMResponse({
    system: 'The user will ask you to help generate some search queries. Respond with only the suggested queries in plain text with no extra formatting, each on its own line.',
    user: createKeywordQueryGenerationPrompt(SAMA_TOPIC, 3),
}));

Sam Altman biography
Y Combinator founder
Investments made by Sam Altman

Those are some good ideas!

Now we have to handle the neural Exa search. This is tougher: you can read all about crafting good Exa searches here. But this is actually a really good thing: making the perfect Exa search is hard because Exa is so powerful! Exa allows us to express so much more nuance in our searches and gives us unparalleled ability to steer our search queries towards our real objective.

We need our app to understand our goal, what Exa is, and how to use it to achieve the goal. So let's just tell the LLM everything it needs to know.

function createNeuralQueryGenerationPrompt(topic, n){
    return `I'm writing a research report on ${topic} and need help coming up with Exa neural search queries.
Exa is a fully neural search engine that uses an embeddings based approach to search. Exa was trained on how people refer to content on the internet. The model is trained given the description to predict the link. For example, if someone tweets "This is an amazing, scientific article about Roman architecture: <link>", then our model is trained given the description to predict the link, and it is able to beautifully and super strongly learn associations between descriptions and the nature of the content (style, tone, entity type, etc) after being trained on many many examples. Because Exa was trained on examples of how people talk about links on the Internet, the actual Exa queries must actually be formed as if they are content recommendations that someone would make on the Internet where a highly relevant link would naturally follow the recommendation, such as the example shown above.
Exa neural search queries should be phrased like a person on the Internet indicating a webpage to a friend by describing its contents. It should end in a colon :.
Please generate a diverse list of ${n} Exa neural search queries for informative and trustworthy sources useful for writing a research report on ${topic}. Do not add any quotations or numbering to the queries.`
}

console.log(await getLLMResponse({
    system: 'The user will ask you to help generate some search queries. Respond with only the suggested queries in plain text with no extra formatting, each on its own line.',
    user: createNeuralQueryGenerationPrompt(ART_TOPIC, 3),
    //model: 'gpt-4'
}));
Hey, check out this comprehensive guide to Renaissance art:
Can you recommend any scholarly articles on Renaissance art?
I found an excellent website that explores the influence of religion on Renaissance art:

Now let's put them together into a function that generates queries for the right search mode.

async function generateSearchQueries(topic, n, searchType){
    if(searchType !== 'keyword' && searchType !== 'neural'){
        throw new Error('invalid searchType');
    }
    const userPrompt = searchType === 'neural' ? createNeuralQueryGenerationPrompt(topic, n) : createKeywordQueryGenerationPrompt(topic, n);
    const completion = await getLLMResponse({
        system: 'The user will ask you to help generate some search queries. Respond with only the suggested queries in plain text with no extra formatting, each on its own line.',
        user: userPrompt,
        temperature: 1
    });
    const queries = completion.split('\n').filter(s => s.trim().length > 0).slice(0, n);
    return queries;
}
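The post-processing step at the end deserves a quick look on its own: it splits the LLM's plain-text reply into lines, drops blank lines, and keeps at most n queries. Extracted as a standalone helper (our naming, for illustration):

```javascript
// Split an LLM's plain-text reply into at most n non-empty query lines.
function parseQueries(completion, n) {
    return completion.split('\n').filter(s => s.trim().length > 0).slice(0, n);
}
```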

Let's make sure it works, and check out some more queries:

const samaQueries = await generateSearchQueries(SAMA_TOPIC, 3, 'keyword');
const artQueries = await generateSearchQueries(ART_TOPIC, 3, 'neural');
console.log(samaQueries);
console.log(artQueries);
[
  "Sam Altman biography",
  "Y Combinator founder",
  "Sam Altman startup advice"
]
[
  "Check out this comprehensive guide to Renaissance art:",
  "Discover the key characteristics of Renaissance art and its influential artists.",
  "Explore the development of perspective and human anatomy in Renaissance paintings."
]

Now it's time to use Exa to do the search, either neural or keyword. Using searchAndContents, we can get clean text contents bundled with each link.

async function getSearchResults(queries, type, linksPerQuery=2){
    let results = [];
    for (const query of queries){
        const searchResponse = await exa.searchAndContents(query, { type, numResults: linksPerQuery, useAutoprompt: false });
        results.push(...searchResponse.results);
    }
    return results;
}
const artLinks = await getSearchResults(artQueries, 'neural');
console.log(artLinks[0]); // first result of six
{
  title: "How to Look at and Understand Great Art",
  url: "https://www.wondrium.com/how-to-look-at-and-understand-great-art?lec=29%3Futm_source%3DSocialMedia&p"... 5 more characters,
  publishedDate: "2013-11-19",
  author: "Doc",
  id: "dq0L1GOKroUBuryT3ypSsQ",
  text: "\n" +
    " \n" +
    " \n" +
    " \n" +
    " \n" +
    " \n" +
    " \n" +
    " Trailer\n" +
    " \n" +
    " \n" +
    "\n" +
    " \n" +
    " \n" +
    " \n" +
    " \n" +
    " \n" +
    " \n" +
    " 01: The Importance of First Impressions\n" +
    " \n" +
    " Examine the conte"... 14543 more characters,
  score: 0.1785949170589447
}

In just a couple of lines of code, we've used Exa to go from some search queries to useful Internet content.
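One refinement worth considering before synthesis: different queries can surface the same link, so the result list may contain duplicates. A small dedupe pass by URL (a hypothetical helper, not in the original notebook) keeps the prompt from repeating content:

```javascript
// Drop duplicate search results, keeping the first occurrence of each URL.
function dedupeByUrl(results) {
    const seen = new Set();
    return results.filter(r => {
        if (seen.has(r.url)) return false;
        seen.add(r.url);
        return true;
    });
}

// Usage: const uniqueLinks = dedupeByUrl(artLinks);
```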

The final step is to instruct the LLM to synthesize the content into a research report, including citations of the original links. We can do that by pairing the content and the urls and writing them into the prompt.

async function synthesizeReport(topic, searchContents, contentSlice = 750){
    const inputData = searchContents.map(item => `--START ITEM--\nURL: ${item.url}\nCONTENT: ${item.text.slice(0, contentSlice)}\n--END ITEM--\n`).join('');
    return await getLLMResponse({
        system: 'You are a helpful research assistant. Write a report according to the user\'s instructions.',
        user: 'Input Data:\n' + inputData + `Write a two paragraph research report about ${topic} based on the provided information. Include as many sources as possible. Provide citations in the text using footnote notation ([#]). First provide the report, followed by a single "References" section that lists all the URLs used, in the format [#] <url>.`,
        //model: 'gpt-4' //want a better report? use gpt-4
    });
}
const artReport = await synthesizeReport(ART_TOPIC, artLinks);
console.log(artReport);
Research Report: Renaissance Art

Renaissance art is a significant period in the history of art characterized by technical innovation and a richly symbolic visual language. It is known for combining the advancements in technique with the exploration of deeper layers of meaning in artworks. In Renaissance paintings, the identification of the patron or donor often provides insights into the intended message of the artwork. For example, a painting discussed in an article from the Royal Academy[^1] was commissioned by Jacopo Pesaro, the Bishop of Paphos on the island of Cyprus, and was likely painted by Titian during his early twenties. Analyzing the patronage and symbolism in Renaissance paintings allows for a better understanding of the multifaceted meanings conveyed by the artists.

Another key aspect of studying Renaissance art is understanding the contexts and environments in which art is encountered and viewed. The influence of the viewer's point of view and focal point plays a critical role in shaping the experience of art. Lectures provided by Wondrium[^3] discuss the importance of first impressions and explore how the artist positions the viewer with respect to the image. Additionally, the genres of Western art and the artist's media, tools, and techniques are explored in these lectures, providing a comprehensive understanding of the various elements that contribute to the creation and perception of Renaissance art.

Overall, studying Renaissance art encompasses an exploration of not only the technical skills and innovations of the artists but also the cultural and historical contexts in which the artworks were created. By analyzing the patronage, symbolism, and viewing experience, researchers gain a deeper appreciation and interpretation of this significant period in the history of art.

References:
[1] How to Read a Renaissance Painting. (2016, April 1). Royal Academy. Retrieved from https://www.royalacademy.org.uk/article/how-to-read-a-renaissance-painting
[3] How to Look at and Understand Great Art (#29). Wondrium. Retrieved from https://www.wondrium.com/how-to-look-at-and-understand-great-art?lec=29

Let's wrap up by putting it all together into one researcher() function that starts from a topic and returns the finished report. We can also let Exa Researcher generate a report about our keyword search topic.

async function researcher(topic){
    const searchType = await decideSearchType(topic);
    const searchQueries = await generateSearchQueries(topic, 3, searchType);
    console.log(searchQueries);
    const searchResults = await getSearchResults(searchQueries, searchType);
    console.log(searchResults[0]);
    const report = await synthesizeReport(topic, searchResults);
    return report;
}
console.log(await researcher(SAMA_TOPIC));
[
  "Sam Altman biography",
  "Y Combinator founder",
  "Sam Altman startup advice"
]
{
  title: "Sam Altman - Wikipedia",
  url: "https://en.wikipedia.org/wiki/Sam_Altman",
  author: null,
  id: "8942e4e1-a37d-42fd-bec1-cf1715ef8d35",
  text: "\n" +
    "From Wikipedia, the free encyclopedia\n" +
    "\n" +
    "Sam AltmanAltman in 2019BornSamuel Harris AltmanApril 22, 19"... 7424 more characters
}
Research Report: Sam Altman

Sam Altman is an American entrepreneur, investor, and former CEO of OpenAI[^1^]. He is widely known for his significant contributions as the president of Y Combinator from 2014 to 2019[^1^]. Altman was born on April 22, 1985, in Chicago, Illinois[^2^]. He grew up in St. Louis, Missouri, where he attended John Burroughs School[^3^]. Altman's interest in computers began at a young age, and he received his first computer, an Apple Macintosh, at the age of eight[^3^]. His childhood idol was Steve Jobs[^4^]. Sam Altman dropped out of Stanford University after one year to pursue his entrepreneurial endeavors[^1^].

Altman's career has been marked by his involvement in various tech ventures. He notably served as the CEO of OpenAI from 2019 to 2023[^1^]. Additionally, Altman played a pivotal role as the president of Y Combinator, a startup accelerator[^1^]. His influence in the tech industry has drawn comparisons to renowned figures like Steve Jobs and Bill Gates[^2^]. Altman firmly believes in the potential of artificial general intelligence (AGI) and its ability to accomplish tasks comparable to those performed by humans[^2^].

References:
[^1^] Wikipedia. (n.d.). Sam Altman. Retrieved from https://en.wikipedia.org/wiki/Sam_Altman
[^2^] Britannica. (n.d.). Sam Altman. Retrieved from https://www.britannica.com/biography/Sam-Altman

For a link to a complete, cleaned up version of this project that you can execute in your NodeJS environment, check out the alternative JS-only version.