# Score Deprecation in Auto and Keyword Search

Source: https://docs.exa.ai/changelog/auto-keyword-score-deprecation

We're deprecating relevance scores in Auto and Keyword search types due to architectural improvements. Scores will remain available in Neural search.

***

**Date: July 21, 2025**

We're rolling out a major update to Auto search in our API. The new architecture can't produce meaningful relevance scores for results, so we're removing scores from Auto and Keyword search types. Scores in Neural search results will remain unchanged and continue to work exactly as before.

## What Changed

Previously, all search types (Auto, Keyword, and Neural) returned relevance scores - a number from 0 to 1 representing similarity between the query and each result. With our new Auto search architecture, we can no longer generate meaningful scores for Auto and Keyword search results.

The search functionality works exactly the same way as it did before - you'll still get the same high-quality results, just without the `score` field in the response.

## What This Means for You

1. **Auto search**: The `score` field will no longer be returned in search results
2. **Keyword search**: The `score` field will no longer be returned in search results
3. **Neural search**: Scores continue to work exactly as before with no changes
4. **Migration needed**: If your application relies on scores from Auto or Keyword search, you should migrate as soon as possible

## How to Update Your Code

If you currently use scores from Auto or Keyword search, here is what you can do:

### Remove Score Dependencies

```python Python
# Before: Code that depends on scores
result = exa.search("AI startups", type="auto")
sorted_results = sorted(result.results, key=lambda x: x.score, reverse=True)

# After: Use results in the order returned (already optimally ranked)
result = exa.search("AI startups", type="auto")
# Results are already ranked by relevance, no need to sort by score
for item in result.results:
    print(f"Title: {item.title}")
```

## Response Structure Changes

### Auto and Keyword Search (New)

```json
{
  "results": [
    {
      "title": "Example AI Startup",
      "url": "https://example-startup.com",
      "id": "abc123",
      "publishedDate": "2024-01-15",
      "author": "John Doe"
      // Note: No 'score' field
    }
  ]
}
```

### Neural Search (Unchanged)

```json
{
  "results": [
    {
      "score": 0.8756,
      "title": "Example AI Startup",
      "url": "https://example-startup.com",
      "id": "abc123",
      "publishedDate": "2024-01-15",
      "author": "John Doe"
    }
  ]
}
```

## Need Help with Migration?

If you have questions about migrating from Auto/Keyword search scores or need help determining the best search type for your use case, please reach out to [hello@exa.ai](mailto:hello@exa.ai). We're here to help ensure a smooth transition.

# Auto search as Default

Source: https://docs.exa.ai/changelog/auto-search-as-default

Auto search, which intelligently combines Exa's proprietary neural search with traditional keyword search, is now the default search type for all queries.

***

The change to Auto search as default leverages the best of both Exa's proprietary neural search and industry-standard keyword search to give you the best results. Out of the box, Exa now automatically routes your queries to the best search type. Read our documentation on Exa's different search types [here](/reference/exas-capabilities-explained).

## What This Means for You
1. **Enhanced results**: Auto search automatically routes queries to the most appropriate search type (neural or keyword), optimizing your search results without any extra effort on your part.
2. **No action required**: If you want to benefit from Auto search, you don't need to change anything in your existing implementation. It'll just work!
3. **Maintaining current behavior**: If you prefer to keep your current search behavior, here's how:
   * For neural search: Just set `type="neural"` in your search requests.
   * For keyword search: As always, add `type="keyword"` to your search requests.

## Quick Example

Here's what this means for your code when the default switches over:

```Python Python
# New default behavior (Auto search)
result = exa.search_and_contents("hottest AI startups")

# Explicitly use neural search
result = exa.search_and_contents("hottest AI startups", type="neural")

# Use keyword search
result = exa.search_and_contents("hottest AI startups", type="keyword")
```

We're confident this update will significantly improve your search experience. If you have any questions or want to chat about how this might impact your specific use case, please reach out to [hello@exa.ai](mailto:hello@exa.ai). We can't wait for you to try out the new Auto search as default!

# Contents Endpoint Status Changes

Source: https://docs.exa.ai/changelog/contents-endpoint-status-changes

The /contents endpoint now returns detailed status information for each URL instead of HTTP error codes, providing better visibility into individual content fetch results.

***

**Date: 22 May 2025**

We've updated the `/contents` endpoint to provide more granular status information for each URL you request. Instead of returning HTTP error codes directly, the endpoint now includes a `statuses` field that gives you detailed information about each content fetch operation.

The `/contents` endpoint will now only return an error if there's an internal issue on our end. All other cases are handled through the new `statuses` field.

## What Changed

Previously, the `/contents` endpoint would return HTTP error codes when content fetching failed. This approach had limitations when multiple URLs failed for different reasons, making it unclear which specific error to return.

Now, the endpoint returns a `statuses` field containing individual status information for each URL, allowing you to handle different failure scenarios appropriately.
## Response Structure

The new response structure includes:

```json
{
  "results": [...],
  "statuses": [
    {
      "id": "https://example.com",
      "status": "success" | "error",
      "error": {
        "tag": "CRAWL_NOT_FOUND" | "CRAWL_TIMEOUT" | "SOURCE_NOT_AVAILABLE" | "CRAWL_UNKNOWN_ERROR",
        "httpStatusCode": 404 | 408 | 403 | 500
      }
    }
  ]
}
```

### Status Fields Explained

* **id**: The URL that was requested
* **status**: Either `"success"` or `"error"`
* **error** (optional): Only present when status is `"error"`
  * **tag**: Specific error type
    * `CRAWL_NOT_FOUND`: Content not found (404)
    * `CRAWL_TIMEOUT`: Request timed out (408)
    * `SOURCE_NOT_AVAILABLE`: Access forbidden or source unavailable (403)
    * `CRAWL_UNKNOWN_ERROR`: Other errors (500+)
  * **httpStatusCode**: The corresponding HTTP status code

## How to Update Your Code

Instead of catching HTTP errors, you should now check the `statuses` field:

```python Python
# Old approach (no longer recommended)
try:
    result = exa.get_contents(["https://example.com"])
except HTTPError as e:
    print(f"Error: {e.status_code}")

# New approach
result = exa.get_contents(["https://example.com"])
for status in result.statuses:
    if status.status == "error":
        print(f"Error for {status.id}: {status.error.tag} ({status.error.httpStatusCode})")
```

## Need More Information?

If you'd like more information about the status of a crawl or have specific use cases that require additional status details, please contact us at [hello@exa.ai](mailto:hello@exa.ai) with your use case.

# Geolocation Filter Support

Source: https://docs.exa.ai/changelog/geolocation-filter-support

`userLocation` added to the search API to bias search results based on geographic location.

***

**Date: July 30, 2025**

We're excited to announce a new `userLocation` parameter that lets you bias search results based on a user's geographic region. The location is passed as an [ISO 3166-1 alpha-2](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2) country code (e.g., "fr" for France, "us" for the United States). If this field is provided, search will return results that are more relevant to users in the provided region.

## When to Use Geolocation Filter

The `userLocation` parameter is particularly useful for:

1. **Multi-regional applications**: Show users content that's relevant to their region
2. **Language-specific content**: Prioritize content in regional languages
3. **Local discovery**: Surface products or businesses relevant to the user's region

Consider using geolocation filtering when the user's physical location or regional context significantly impacts the relevance of search results.

## How To Use Geolocation Filter

Here's how to implement the new `userLocation` parameter:

```python Python
result = exa.search_and_contents(
    "football rules",
    type="auto",
    livecrawl="never",
    userLocation="us",  # ISO 3166-1 alpha-2 country code
    num_results=10
)
```

```javascript JavaScript
const result = await exa.searchAndContents(
  "football rules",
  {
    type: "auto",
    livecrawl: "never",
    userLocation: "us", // ISO 3166-1 alpha-2 country code
    numResults: 10
  }
);
```

```bash cURL
curl -X POST https://api.exa.ai/search \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "football rules",
    "type": "auto",
    "userLocation": "us",
    "numResults": 10
  }'
```

## Response Structure Changes

The response structure remains unchanged - geolocation filtering affects result ranking and relevance scoring, but doesn't modify the response format.

## Need Help?
If you have any questions about location filtering or need help with your specific use case, please reach out to [hello@exa.ai](mailto:hello@exa.ai).

# New Livecrawl Option: Preferred

Source: https://docs.exa.ai/changelog/livecrawl-preferred-option

Introducing the 'preferred' livecrawl option that tries to fetch fresh content but gracefully falls back to cached results when crawling fails, providing the best of both worlds.

***

**Date: 7 June 2025**

We've added a new `livecrawl` option called `"preferred"` that provides a more resilient approach to content fetching. This option attempts to crawl fresh content but gracefully falls back to cached results when live crawling fails.

The `preferred` option is now available in both `/contents` and `/search_and_contents` endpoints.

## What's New

The new `livecrawl: "preferred"` option provides intelligent fallback behavior:

* **First**: Attempts to crawl fresh content from the live webpage
* **If crawling succeeds**: Returns the fresh, up-to-date content
* **If crawling fails but cached content exists**: Returns cached content instead of failing
* **If crawling fails and no cached content exists**: Returns the crawl error

## How It Differs from "Always"

The key difference between `"preferred"` and `"always"`:

| Option        | Crawl Fails + Cache Available | Crawl Fails + No Cache |
| ------------- | ----------------------------- | ---------------------- |
| `"preferred"` | Returns cached content        | Returns crawl error    |
| `"always"`    | Returns crawl error           | Returns crawl error    |

This makes `"preferred"` more resilient for production applications where you want fresh content when possible, but don't want requests to fail when websites are temporarily unavailable. If content freshness is critical and you would rather have a request fail than receive cached content, `"always"` is the better choice.

## When to Use "Preferred"

The `"preferred"` option is ideal when:

* You want the freshest content available but need reliability
* Building production applications that can't afford to fail on crawl errors
* Content freshness is important but not critical enough to fail the request
* You're crawling websites that might be occasionally unavailable

## Complete Livecrawl Options Overview

Here are all four livecrawl options and their behaviors:

| Option        | Crawl Behavior   | Cache Fallback              | Best For                                             |
| ------------- | ---------------- | --------------------------- | ---------------------------------------------------- |
| `"always"`    | Always crawls    | Never falls back            | Critical real-time data, willing to accept failures  |
| `"preferred"` | Always crawls    | Falls back on crawl failure | Fresh content with reliability                       |
| `"fallback"`  | Only if no cache | Uses cache first            | Balanced speed and freshness                         |
| `"never"`     | Never crawls     | Always uses cache           | Maximum speed                                        |

## Migration Guide

If you're currently using `livecrawl: "always"` but experiencing reliability issues:

```python
# Before - fails when crawling fails
result = exa.get_contents(urls, livecrawl="always")

# After - more resilient with cache fallback
result = exa.get_contents(urls, livecrawl="preferred")
```

This change maintains your preference for fresh content while improving reliability.

# Markdown Contents as Default

Source: https://docs.exa.ai/changelog/markdown-contents-as-default

Markdown content is now the default format for all Exa API endpoints, providing cleaner, more readable content that's ideal for AI applications and text processing.
***

**Date: 23 June 2025**

We've updated all Exa API endpoints to return content in markdown format by default. This change provides cleaner, more structured content that's optimized for AI applications, RAG systems, and general text processing workflows.

All endpoints now process webpage content into clean markdown format by default. Use the `includeHtmlTags` parameter to control content formatting.

## What Changed

Previously, our endpoints returned content in various formats depending on the specific endpoint configuration. Now, all endpoints consistently return content processed into clean markdown format, making it easier to work with the data across different use cases.

## Content Processing Behavior

The `includeHtmlTags` parameter now controls how we process webpage content:

* **`includeHtmlTags=false` (default)**: We process webpage content into clean markdown format
* **`includeHtmlTags=true`**: We return content as HTML without processing to markdown

In all cases, we remove extraneous data, advertisements, navigation elements, and other boilerplate content, keeping only what we detect as the main content of the page.

**No action required** if you want the new markdown format - it's now the default! If you need HTML content instead, set `includeHtmlTags=true` in your request.

## Benefits of Markdown Default

1. **Better for AI applications**: Markdown format is more structured and easier for LLMs to process
2. **Improved readability**: Clean formatting without HTML tags makes content more readable
3. **RAG optimization**: Markdown content chunks more naturally for retrieval systems

If you have any questions about this change or need help adapting your implementation, please reach out to [hello@exa.ai](mailto:hello@exa.ai). We're excited for you to experience the improved content quality with markdown as the default!

# New Fast Search Type

Source: https://docs.exa.ai/changelog/new-fast-search-type

Introducing Exa Fast: The world's fastest search API.

***

**Date: July 29, 2025**

We're excited to introduce **Exa Fast** - the fastest search API in the world. Exa Fast uses streamlined versions of our neural and keyword search with p50 latency below 425ms. Fast search is available immediately on all API plans.

[Try Fast search in the dashboard →](https://dashboard.exa.ai/playground/search?q=blog%20post%20about%20AI\&filters=%7B%22text%22%3A%22true%22%2C%22type%22%3A%22fast%22%2C%22livecrawl%22%3A%22never%22%7D)

## What's New

The Fast search type provides:

* **Speed**: p50 latency below 425ms - that's 30% faster than Brave and Google Serp
* **Exa Index**: Uses the same index of high-quality content as our neural search
* **Customization**: Full compatibility with all the same parameters as our other search types

## When to Use Fast Search

Fast search is ideal for:

1. **Fast web grounding**: Integrate real-time web information into responses without sacrificing speed or degrading user experience
2. **Agentic workflows**: AI agents like deep research that use dozens or hundreds of search calls where milliseconds add up
3. **Low-latency AI products**: Latency-sensitive applications like AI voice companions where every millisecond matters

## How to Use Fast Search

Using Fast search is simple - just add `type="fast"` to your search requests:

```python Python
result = exa.search_and_contents(
    "latest AI news",
    type="fast",
    livecrawl="never",
)
```

```javascript JavaScript
const result = await exa.searchAndContents(
  "latest AI news",
  {
    type: "fast",
    livecrawl: "never"
  }
);
```

```bash cURL
curl -X POST https://api.exa.ai/search \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "latest AI news",
    "type": "fast",
    "livecrawl": "never"
  }'
```

## Options That Impact Latency

While Fast search is optimized for speed, certain options can increase response times:

* **Live crawling**: Fetching content live requires real-time web requests. Set `livecrawl="never"` to use cached content and maintain optimal speed.
* **AI summaries**: Requesting AI-generated summaries requires LLM processing, which adds significant latency to your requests.
* **Complex date filters**: Using wide date ranges or multiple date constraints requires additional filtering that can slow down results.
* **Include/exclude text**: Text-based content filtering requires scanning through results, which impacts response times.
* **Subpages**: Including subpages in your search requires additional processing and can significantly increase latency.

For the fastest possible performance, use Fast search with minimal parameters and rely on cached content.

# Company Analyst

Source: https://docs.exa.ai/examples/company-analyst

Example project using the Exa Python SDK.

***

## What this doc covers

1. Using Exa's link similarity search to find related links
2. Using the keyword search setting with Exa search\_and\_contents

***

In this example, we'll build a company analyst tool that researches companies relevant to what you're interested in. If you just want to see the code, check out the [Colab notebook](https://colab.research.google.com/drive/1VROD6zsaDh%5FrSmogSpSn9FJCwmJO8TSi?here).

The code requires an [Exa API key](https://dashboard.exa.ai/api-keys) and an [OpenAI API key](https://platform.openai.com/api-keys). Get 1000 free Exa searches per month just for [signing up](https://dashboard.exa.ai/overview)!

## Shortcomings of Google

Say we want to find companies similar to [Thrifthouse](https://thrift.house/), a platform for selling secondhand goods on college campuses. Unfortunately, googling “[companies similar to Thrifthouse](https://www.google.com/search?q=companies+similar+to+Thrifthouse)” doesn't do a very good job. Traditional search engines rely heavily on keyword matching. In this case we get results about physical thrift stores. Hm, that's not really what I want.

Let’s try again, this time searching based on a description of the company, like by googling “[community based resale apps](https://www.google.com/search?q=community+based+resale+apps).” But this isn’t very helpful either and just returns premade SEO-optimized listicles...

![](https://mintlify.s3.us-west-1.amazonaws.com/exa-52/images/0bb023a-Screenshot_2024-02-06_at_11.22.28_AM.png)

What we really need is neural search.

## What is neural search?

Exa is a fully neural search engine built using a foundational embeddings model trained for webpage retrieval. It’s capable of understanding entity types (company, blog post, GitHub repo), descriptors (funny, scholastic, authoritative), and any other semantic qualities inside of a query.
Neural search can be far more useful than traditional keyword-based searches for these complex queries.

## Finding companies with Exa link similarity search

Let's try Exa, using the Python SDK! We can use the `find_similar_and_contents` function to find similar links and get contents from each link. The input is simply a URL, [https://thrift.house](https://thrift.house), and we set `num_results=10` (this is customizable up to thousands of results in Exa). By specifying `highlights={"num_sentences":2}` for each search result, Exa will also identify and return a two-sentence excerpt from the content that's relevant to our query. This will allow us to quickly understand each website that we find.

```Python Python
!pip install exa-py

from exa_py import Exa
import os

EXA_API_KEY = os.environ.get("EXA_API_KEY")
exa = Exa(api_key=EXA_API_KEY)

input_url = "https://thrift.house"

search_response = exa.find_similar_and_contents(
    input_url,
    highlights={"num_sentences":2},
    num_results=10)

companies = search_response.results
print(companies[0])
```

This is an example of the full first result:

```
[Result(url='https://www.mystorestash.com/', id='lMTt0MBzc8ztb6Az3OGKPA', title='The Airbnb of Storage', score=0.758899450302124, published_date='2023-01-01', author=None, text=None, highlights=["I got my suitcase picked up right from my dorm and didn't have to worry for the whole summer.Angela Scaria /Still have questions?Where are my items stored?"], highlight_scores=[0.23423566609247845])]
```

And here are the 10 titles and URLs I got:

```Python Python
# to just see the 10 titles and urls
for c in companies:
    print(c.title + ':' + c.url)
```

```
rumie - College Marketplace:https://www.rumieapp.com/
The Airbnb of Storage:https://www.mystorestash.com/
Bunction.net:https://bunction.net/
Home - Community Gearbox:https://communitygearbox.com/
NOVA SHOPPING:https://www.novashoppingapp.com/
Re-Fridge: Buy, sell, or store your college fridge - Re-Fridge:https://www.refridge.com/
Jamble: Social Fashion Resale:https://www.jambleapp.com/
Branded Resale | Treet:https://www.treet.co/
Swapskis:https://www.swapskis.co/
Earn Money for Used Clothing:https://www.thredup.com/cleanout?redirectPath=%2Fcleanout%2Fsell
```

Looks pretty darn good! As a bonus specifically for companies data, specifying `category="company"` in the SDK will search across a curated, larger companies dataset - if you're interested in this, let us know at [hello@exa.ai](mailto:hello@exa.ai)!

Now that we have 10 companies we want to dig into further, let’s do some research on each of these companies.

## Finding additional info for each company

Now let's get more information by finding additional webpages about each company. To do this, we're going to do a keyword search of each company's URL. We're using keyword because we want to find webpages that exactly match the company we're inputting. We can do this with the `search_and_contents` function, and specify `type="keyword"` and `num_results=5`. This will give us 5 websites about each company.

```python Python
# doing an example with the first company
c = companies[0]
all_contents = ""

search_response = exa.search_and_contents(
    c.url,  # input the company's URL
    type="keyword",
    num_results=5
)

research_response = search_response.results
for r in research_response:
    all_contents += r.text
```

Here's an example of the first result for the first company, Rumie App. You can see the first result is the actual link contents itself.

```

The key to your college experience.


Access the largest college exclusive marketplace to buy, sell, and rent with other students.

320,000+

Users in Our Network

Selling is just a away.

Snap a pic, post a listing, and message buyers all from one intuitive app.

Quick setup and .edu verification

Sell locally or ship to other campuses

Trade with other students like you

. From local businesses around your campus

Get access to student exclusive discounts

rumie students get access to student exclusive discounts from local and national businesses around their campus.

Rent dresses from

Wear a new dress every weekend! Just rent it directly from a student on your campus.

Make money off of the dresses you've already worn

rumie rental guarantee ensures your dress won't be damaged

Find a new dress every weekend and save money

. The only place to buy student tickets at student prices

Buy or Sell students Football and Basketball tickets with your campus

rumie students get access to the first-ever student ticket marketplace. No more getting scammed trying to buy tickets from strangers on the internet.

Secure

.edu authentication and buyer protection on purchases.

Lightning-fast

Post your first listing in under a minute.

Verified Students

Trade with other students, not strangers.

Intuitive

List an item in a few simple steps. Message sellers with ease.

Download the app now

Trusted by students.

Saves me money

Facebook Marketplace and Amazon are great but often times you have to drive a long way to meet up or pay for shipping. rumie let’s me know what is available at my school… literally at walking distance.

5 stars!

Having this app as a freshman is great! It makes buying and selling things so safe and easy! Much more efficient than other buy/sell platforms!

Amazing!

5 stars for being simple, organized, safe, and a great way to buy and sell in your college community.. much more effective than posting on Facebook or Instagram!

The BEST marketplace for college students!!!

Once rumie got to my campus, I was excited to see what is has to offer! Not only is it safe for students like me, but the app just has a great feel and is really easy to use. The ONLY place I’ll be buying and selling while I’m a student.

Easier to than GroupMe or Instagram.

Forget clothing instas, selling groupme's, and stress when buying and selling. Do it all from the rumie app.

```

## Creating a report with LLMs

Finally, let's create a summarized report that lists our 10 companies and gives us an easily digestible summary of each company. We can input all of this web content into an LLM and have it generate a nice report!

```Python Python
import textwrap
import openai
import os

SYSTEM_MESSAGE = "You are a helpful assistant writing a research report about a company. Summarize the users input into multiple paragraphs. Be extremely concise, professional, and factual as possible. The first paragraph should be an introduction and summary of the company. The second paragraph should include pros and cons of the company. Things like what are they doing well, things they are doing poorly or struggling with. And ideally, suggestions to make the company better."

openai.api_key = os.environ.get("OPENAI_API_KEY")

completion = openai.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": SYSTEM_MESSAGE},
        {"role": "user", "content": all_contents},
    ],
)

summary = completion.choices[0].message.content

print(f"Summary for {c.url}:")
print(textwrap.fill(summary, 80))
```

```
Summary for https://www.rumieapp.com/:
Rumie is a college-exclusive marketplace app that allows students to buy, sell, and rent items with other students. It has over 320,000 users in its network and offers features such as quick setup, .edu verification, local and campus-wide selling options, and exclusive discounts from local businesses. Students can also rent dresses from other students, buy or sell student tickets at student prices, and enjoy secure and intuitive transactions. The app has received positive feedback from users for its convenience, safety, and effectiveness in buying and selling within the college community.

Pros of Rumie include its focus on college students' needs, such as providing a safe platform and exclusive deals for students. The app offers an intuitive and fast setup process, making it easy for students to start buying and selling. The option to trade with other students is also appreciated. Users find it convenient that they can sell locally or ship items to other campuses. The app's rental guarantee for dresses provides assurance to users that their dresses won't be damaged. Overall, Rumie is highly regarded as a simple, organized, and safe platform for college students to buy and sell within their community.

Suggestions to improve Rumie include expanding its reach to more colleges and universities across the nation and eventually internationally. Enhancing marketing efforts and fundraising can aid in raising awareness among college students. Additionally, incorporating features such as improved search filters and a rating/review system for buyers and sellers could enhance the user experience. Continual updates and improvements to the app's interface and functionality can also ensure that it remains user-friendly and efficient.
```

And we’re done! We’ve built an app that takes in a company webpage and uses Exa to

1. Discover similar startups
2. Find information about each of those startups
3. Gather useful content and summarize it with OpenAI

Hopefully you found this tutorial helpful and are ready to start building your very own company analyst! Whether you want to generate sales leads or research competitors to your own company, Exa's got you covered.
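To tie the three steps together, here's a minimal sketch of the full pipeline that loops over every company returned by the similarity search, rather than just the first one. It assumes the `exa` client, `openai` setup, `SYSTEM_MESSAGE`, and `companies` list are defined as in the cells above:

```Python Python
# A minimal sketch of the full pipeline, assuming `exa`, `openai`,
# `SYSTEM_MESSAGE`, and `companies` are defined as in the cells above.
import textwrap

for c in companies:
    # Step 2: find webpages about this company with a keyword search on its URL
    search_response = exa.search_and_contents(c.url, type="keyword", num_results=5)
    all_contents = "".join(r.text or "" for r in search_response.results)

    # Step 3: summarize the gathered content with OpenAI
    completion = openai.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": SYSTEM_MESSAGE},
            {"role": "user", "content": all_contents},
        ],
    )
    print(f"Summary for {c.url}:")
    print(textwrap.fill(completion.choices[0].message.content, 80))
    print()
```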
# Chat app

Source: https://docs.exa.ai/examples/demo-chat

# Company researcher

Source: https://docs.exa.ai/examples/demo-company-researcher

# Writing Assistant

Source: https://docs.exa.ai/examples/demo-exa-powered-writing-assistant

[Click here to try the Exa-powered Writing Assistant](https://demo.exa.ai/writing)

[Click here to see the relevant GitHub repo and hosting instructions](https://github.com/exa-labs/exa-writing-assist)

## What this doc covers

* Live demo link for hands-on experience (above!)
* Overview of a real-time writing assistant using Exa and Claude
* Breakdown of Exa query prompt engineering and generative AI system prompt

## Demo overview

## High-level overview

This demo showcases a real-time writing assistant that uses Exa's search capabilities to provide relevant information and citations as a user writes. The system combines Exa's neural search with Anthropic's Claude AI model to generate contextually appropriate content and citations.

![Conceptual block diagram of how the writing assistant works](https://mintlify.s3.us-west-1.amazonaws.com/exa-52/images/77dd3c1-image.png)

Conceptual block diagram of how the writing assistant works

## Exa prompting and query style

The Exa search is performed using a unique query style that appends the user's input with a prompt for continuation. Here's the relevant code snippet:

```JavaScript JavaScript
let exaQuery = conversationState.length > 1000
  ? (conversationState.slice(-1000))+"\n\nIf you found the above interesting, here's another useful resource to read:"
  : conversationState+"\n\nIf you found the above interesting, here's another useful resource to read:"

let exaReturnedResults = await exa.searchAndContents(
  exaQuery,
  {
    type: "neural",
    numResults: 10,
    highlights: {
      numSentences: 1,
      highlightsPerUrl: 1
    }
  }
)
```

**Key aspects of this query style:**

* **Continuation prompt:** The crucial post-pend "If you found the above interesting, here's another useful resource to read:"
  * This prompt is designed to find sources that can logically continue the user's writing when passed to an LLM to generate content.
  * It leverages Exa's ability to understand context and find semantically relevant results.
  * By framing the query as a request for continuation, it aligns with how people naturally share helpful links.
* **Length limitation:** It caps the query at 1000 characters to maintain relevance and continue writing just based on the last section of the text.

Note this prompt is not a hard and fast rule for this use-case - we encourage experimentation with query styles to get the best results for your specific use case. For instance, you could further constrain down to just research papers.

## Prompting Claude with Exa results

The Claude AI model is prompted with a carefully crafted system message and passed the above formatted Exa results. Here is an example system prompt:

```typescript TypeScript
const systemPrompt = `You are an essay-completion bot that continues/completes a sentence given some input stub of an essay/prose. You only complete 1-2 SHORT sentence MAX. If you get an input of a half sentence or similar, DO NOT repeat any of the preceding text of the prose. THIS MEANS DO NOT INCLUDE THE STARTS OF INCOMPLETE SENTENCES IN YOUR RESPONSE. This is also the case when there is a spelling, punctuation, capitalization or other error in the starter stub - e.g.:

USER INPUT: pokemon is a
YOUR CORRECT OUTPUT: Japanese franchise created by Satoshi Tajiri.
NEVER/INCORRECT: Pokémon is a Japanese franchise created by Satoshi Tajiri.
USER INPUT: Once upon a time there
YOUR CORRECT OUTPUT: was a princess.
NEVER/INCORRECT: Once upon a time, there was a princess.

USER INPUT: Colonial england was a
YOUR CORRECT OUTPUT: time of great change and upheaval.
NEVER/INCORRECT: Colonial England was a time of great change and upheaval.

USER INPUT: The fog in san francisco
YOUR CORRECT OUTPUT: is a defining characteristic of the city's climate.
NEVER/INCORRECT: The fog in San Francisco is a defining characteristic of the city's climate.

Once you have made one citation, stop generating. BE PITHY. Where there is a full sentence fed in, you should continue on the next sentence as a generally good flowing essay would. You have a specialty in including content that is cited. Given the following two items, (1) citation context and (2) current essay writing, continue on the essay or prose inputting in-line citations in parentheses with the author's name, right after that followed by the relevant URL in square brackets. THEN put a parentheses around all of the above. If you cannot find an author (sometimes it is empty), use the generic name 'Source'.

Sample citation for you to follow the structure of: ((AUTHOR_X, 2021)[URL_X]). If there are more than 3 author names to include, use the first author name plus 'et al'`
```

This prompt ensures that:

* Claude will only do completions, not parrot back the user query like in a typical chat-based scenario. Note the inclusion of multiple examples that demonstrate Claude should not reply back with the stub even if there are errors, like spelling or grammar, in the input text (which we found to be a common issue)
* We define the citation style and formatting. We also tell the bot when to collapse authors into 'et al' style citations, as some webpages have many authors

Once again, experimenting with this prompt is crucial to getting the best results for your particular use case.

## Conclusion

This demo illustrates the power of combining Exa's advanced search capabilities with generative AI to create a writing assistant. By leveraging Exa's neural search and content retrieval features, the system can provide relevant, up-to-date information to any AI model, resulting in contextually appropriate content generation with citations. This approach showcases how Exa can be integrated into AI-powered applications to enhance user experiences and productivity.

[Click here to try the Exa-powered Writing Assistant](https://demo.exa.ai/writing)

# Hallucination Detector

Source: https://docs.exa.ai/examples/demo-hallucination-detector

A live demo that detects hallucinations in content using Exa's search.
***

We built a live hallucination detector that uses Exa to verify LLM-generated content. When you input text, the app breaks it into individual claims, searches for evidence to verify each one, and returns relevant sources with a verification confidence score. A claim is a single, verifiable statement that can be proven true or false - like "The Eiffel Tower is in Paris" or "It was built in 1822."

This document explains the functions behind the three steps of the fact-checker:

1. The LLM extracts verifiable claims from your text
2. Exa searches for relevant sources for each claim
3. The LLM evaluates each claim against its sources, returning whether or not it's true, along with a confidence score.

See the full [step-by-step guide](/examples/identifying-hallucinations-with-exa) and [GitHub repo](https://github.com/exa-labs/exa-hallucination-detector) if you'd like to recreate it.

***

## Function breakdown

The `extract_claims` function uses an LLM (Anthropic's, in this case) to identify distinct, verifiable statements from your inputted text, returning these claims as a JSON array of strings. For simplicity, we did not include a try/catch block in the code below. However, if you are building your own hallucination detector, you should include one that catches any errors in the LLM parsing and uses a regex method that treats each sentence (text between capital letter and end punctuation) as a claim.

```python Python
# Imports assume recent LangChain packages; `llm` is assumed to be an
# initialized chat model (e.g., Anthropic's via LangChain).
import json
from typing import List

from langchain_core.messages import SystemMessage, HumanMessage

def extract_claims(text: str) -> List[str]:
    """Extract factual claims from the text using an LLM."""
    system_message = SystemMessage(content="""
    You are an expert at extracting claims from text.
    Your task is to identify and list all claims present, true or false, in the given text. Each claim should be a single, verifiable statement. Present the claims as a JSON array of strings.
    """)
    human_message = HumanMessage(content=f"Extract factual claims from this text: {text}")
    response = llm.invoke([system_message, human_message])
    claims = json.loads(response.content)
    return claims
```

The `exa_search` function uses Exa search to find evidence for each extracted claim. For every claim, it retrieves the 5 most relevant sources, formats them with their URLs and content (`text`), passing them to the next function for verification.

```python Python
from langchain_exa import ExaSearchRetriever
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnableLambda

def exa_search(query: str) -> List[str]:
    """Retrieve relevant documents using Exa's semantic search."""
    search = ExaSearchRetriever(k=5, text=True)
    document_prompt = PromptTemplate.from_template("""
    {url}
    {text}
    """)
    parse_info = RunnableLambda(
        lambda document: {
            "url": document.metadata["url"],
            "text": document.page_content or "No text available",
        }
    )
    document_chain = (parse_info | document_prompt)
    search_chain = search | document_chain.map()
    documents = search_chain.invoke(query)
    return [str(doc) for doc in documents]
```

The `verify_claim` function checks each claim against the sources from `exa_search`. It uses an LLM to determine if the sources support or refute the claim and returns a decision with a confidence score. If no sources are found, it returns "insufficient information".

```python Python
from typing import Any, Dict

def verify_claim(claim: str, sources: List[str]) -> Dict[str, Any]:
    """Verify a single claim using combined Exa search sources."""
    if not sources:
        return {
            "claim": claim,
            "assessment": "Insufficient information",
            "confidence_score": 0.5,
            "supporting_sources": [],
            "refuting_sources": []
        }

    combined_sources = "\n\n".join(sources)

    system_message = SystemMessage(content="""
    You are an expert fact-checker.
    Given a claim and sources, determine whether the claim is supported, refuted, or lacks sufficient evidence. Provide your answer as a JSON object with assessment and confidence score.
    """)
    human_message = HumanMessage(content=f'Claim: "{claim}"\nSources:\n{combined_sources}')
    response = llm.invoke([system_message, human_message])
    return json.loads(response.content)
```

Using LLMs to extract claims and verify them against Exa search sources is a simple way to detect hallucinations in content. If you'd like to recreate it, the full documentation for the script is [here](/examples/identifying-hallucinations-with-exa) and the GitHub repo is [here](https://github.com/exa-labs/exa-hallucination-detector).

# Websets News Monitor

Source: https://docs.exa.ai/examples/demo-websets-news-monitor

A live demo that monitors the web semantically using the Websets API.

***

# Overview

We created a Websets News Monitor that uses the Websets API to monitor the web semantically for queries like "startup funding round announcements" or "new product launches." Each tab uses a different Webset that updates daily using a monitor. It demonstrates best practices for news monitoring including:

* Deduplicating articles about the same story
* Filtering out low-quality data sources
* Receiving real-time updates via webhooks

[View the full source code on GitHub](https://github.com/exa-labs/websets-news-monitor).

# How it Works

[Webhooks](/websets/api/webhooks) allow you to subscribe to real-time updates as your Websets run. We want to know when a Webset is created and items finish enriching, so we'll subscribe to `webset.created` and `webset.item.enriched`.

```javascript Javascript
const exa = new Exa(process.env.EXA_API_KEY);

const webhookUrl = 'https://smee.io/123abc456def'; // Replace with your webhook handler endpoint

const webhook = await exa.websets.webhooks.create({
  url: webhookUrl,
  events: [
    EventType.webset_created,
    EventType.webset_item_enriched,
  ],
});

console.log(`✅ Webhook created with ID: ${webhook.id}`);
console.log(`WEBHOOK_SECRET=${webhook.secret}`);
```

Save `webhook.secret`; we'll use it later to validate incoming webhook requests.

Now we'll create a Webset that searches for the types of articles we are looking for. Use `query` to direct the search and `criteria` to narrow down the results. In this example we're looking for articles about recent startup fundraises.

```javascript Javascript
const webset = await exa.websets.create({
  search: {
    query: "Startups that raised a funding round in the last 24 hours",
    criteria: [
      {
        description: "Article is about a startup raising a funding round of at least $1M",
      },
      {
        description: "Article published in a top 20 tech publication (TechCrunch, The Verge, Wired, etc.)",
      },
      {
        description: "Article was published in the last 24 hours",
      }
    ],
    entity: { type: "article" },
    behavior: "append",
    count: 25
  },
  enrichments: [
    {
      description: "One sentence summary of the article using content not in the title",
      format: "text",
    }
  ]
});

console.log(`✅ Webset created with ID: ${webset.id}`);
```

We want our Webset to update with new articles daily, so we'll create a monitor with the `webset.id`. We set the `cadence` parameter to run daily and the `search` behavior so it looks for new results. By default, monitors use the last search the Webset ran. When we created the Webset we used "in the last 24 hours" so it's always relative to when the monitor runs.
```javascript Javascript
const monitor = await exa.websets.monitors.create({
  websetId: webset.id,
  behavior: {
    type: "search",
    config: { count: 10 }
  },
  cadence: {
    cron: "0 0 * * *", // Every day
    timezone: "UTC"
  }
});

console.log(`✅ Monitor created with ID: ${monitor.id}`);
```

Lastly, we need to create an endpoint to handle the webhook requests. We'll set up a Next.js route to handle POST requests and parse the event data. For security purposes, you should verify the request's signature using the webhook secret from the first step. See the [signature verification guide](https://docs.exa.ai/websets/api/webhooks/verifying-signatures) for more info.

```javascript Javascript
// app/api/webhook/route.ts
import { NextRequest, NextResponse } from 'next/server';
import { prisma } from '@/lib/prisma';
import { verifyWebhookSignature } from '@/lib/webhook';
import { exa } from '@/lib/exa';
import { embedText } from '@/lib/openai';
import { isDuplicate } from '@/lib/dedupe';

export async function POST(request: NextRequest) {
  // Get the raw body for signature verification
  const rawBody = await request.text();
  const signatureHeader = request.headers.get('exa-signature') || '';
  const webhookSecret = process.env.WEBHOOK_SECRET;

  // Verify webhook signature
  if (!verifyWebhookSignature(rawBody, signatureHeader, webhookSecret)) {
    console.error('Invalid webhook signature');
    return NextResponse.json({ error: 'Invalid signature' }, { status: 400 });
  }

  const body = JSON.parse(rawBody);

  switch (body.type) {
    case 'webset.created':
      // Handle new Webset
      break;
    case 'webset.item.enriched':
      // Handle new enriched item
      break;
    default:
      break;
  }

  return NextResponse.json({
    received: true,
    type: body.type,
    timestamp: new Date().toISOString()
  });
}
```

View the full route implementation [here](https://github.com/exa-labs/websets-news-monitor/blob/main/src/app/api/webhook/route.ts).

# Semantic Whitelisting

We want our feeds to contain high-quality links and avoid SEO spam. This would normally require manually maintaining lists of domains to include/exclude from your results, but with Websets it's simple. You can create criteria that function as a *semantic whitelist*, telling the LLM what kinds of articles to allow. Here's an example:

```
Article published in a top 20 tech publication (TechCrunch, The Verge, Wired, etc.)
```

You can see all of the criteria used in the demo [here](https://github.com/exa-labs/websets-news-monitor/blob/main/scripts/setup-websets.js).

# Storyline Deduplication

A common issue when monitoring news is handling multiple articles about the same storyline. Often you want to group articles by storyline or remove duplicates so users don't see repeated content. In our demo, we solve this using embeddings, vector search, and an LLM to classify duplicates.

First, we'll embed the article's title using OpenAI's embedding API. We'll use the `text-embedding-3-small` model that produces vectors optimized for similarity comparisons.

```javascript Javascript
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

const response = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: title,
  dimensions: 1536,
});

const embedding = response.data[0].embedding;
```

Next, we use PostgreSQL's `pgvector` extension to find the 10 most similar articles from the last week.
```javascript Javascript
import { prisma } from '@/lib/prisma';

// `<+>` is pgvector's L1 distance operator; smaller distances mean more similar titles
const query = `
  SELECT id, title, "publishedAt", embedding <+> $1::vector AS distance
  FROM "Articles"
  WHERE "publishedAt" >= NOW() - INTERVAL '7 days'
  ORDER BY embedding <+> $1::vector
  LIMIT 10;
`;

const similarArticles = await prisma.$queryRawUnsafe(query, embedding);
```

Finally, we'll use an LLM with structured outputs to classify whether the article is a duplicate. The LLM will look at the titles of similar articles and determine if they are about the same event.

```javascript Javascript
import { z } from 'zod';
import { zodTextFormat } from 'openai/helpers/zod';

const DuplicateCheck = z.object({
  is_duplicate: z.boolean(),
});

const response = await openai.responses.parse({
  model: 'gpt-4o-mini',
  input: [
    {
      role: 'system',
      content: 'You are a news deduplication assistant. Determine if stories are about the same event.'
    },
    {
      role: 'user',
      content: `Is this story a duplicate of any in the list? \nQuery story: "${title}" \nSimilar stories: ${similarArticles.map(item => item.title).join('\n')}`
    }
  ],
  text: {
    format: zodTextFormat(DuplicateCheck, "duplicate_check"),
  },
});

const isDuplicate = response.output_parsed.is_duplicate;
```

You can view the complete deduplication implementation [here](https://github.com/exa-labs/websets-news-monitor/blob/main/src/lib/dedupe.ts).

# Exa MCP

Source: https://docs.exa.ai/examples/exa-mcp

Exa MCP Server enables AI assistants like Claude to perform real-time web searches through the Exa Search API, allowing them to access up-to-date information from the internet. It is open source; check out the code on [GitHub](https://github.com/exa-labs/exa-mcp-server/).

## Remote Exa MCP

Connect directly to Exa's hosted MCP server (instead of running it locally).

### Remote Exa MCP URL

```
https://mcp.exa.ai/mcp?exaApiKey=your-exa-api-key
```

Get your API key from [dashboard.exa.ai/api-keys](https://dashboard.exa.ai/api-keys).

### Claude Desktop Configuration

Add this to your Claude Desktop configuration file:

```json
{
  "mcpServers": {
    "exa": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "https://mcp.exa.ai/mcp?exaApiKey=your-exa-api-key"
      ]
    }
  }
}
```

## Available Tools

Exa MCP includes several specialized search tools:

| Tool | Description |
| ----------------------- | ----------- |
| `deep_researcher_start` | Start a smart AI researcher for complex questions. The AI will search the web, read many sources, and think deeply about your question to create a detailed research report |
| `deep_researcher_check` | Check if your research is ready and get the results. Use this after starting a research task to see if it's done and get your comprehensive report |
| `web_search_exa` | Performs real-time web searches with optimized results and content extraction |
| `company_research` | Comprehensive company research tool that crawls company websites to gather detailed information about businesses |
| `crawling` | Extracts content from specific URLs, useful for reading articles, PDFs, or any web page when you have the exact URL |
| `linkedin_search` | Search LinkedIn for companies and people using Exa AI. Simply include company names, person names, or specific LinkedIn URLs in your query |
## Usage Examples

Once configured, you can ask Claude to perform searches:

* "Research the company exa.ai and find information about their pricing"
* "Start a deep research project on the impact of artificial intelligence on healthcare, then check when it's complete to get a comprehensive report"

## Local Installation

### Prerequisites

* [Node.js](https://nodejs.org/) v18 or higher.
* [Claude Desktop](https://claude.ai/download) installed (optional; Exa MCP also works with other MCP-compatible clients like Cursor, Windsurf, and more).
* An [Exa API key](https://dashboard.exa.ai/api-keys).

### Using Claude Code

The quickest way to set up Exa MCP is using Claude Code:

```bash
claude mcp add exa -e EXA_API_KEY=YOUR_API_KEY -- npx -y exa-mcp-server
```

Replace `YOUR_API_KEY` with your actual Exa API key from [dashboard.exa.ai/api-keys](https://dashboard.exa.ai/api-keys).

### Using NPX

The simplest way to install and run Exa MCP is via NPX:

```bash
# Install globally
npm install -g exa-mcp-server

# Or run directly with npx
npx exa-mcp-server
```

To specify which tools to enable:

```bash
# Enable only web search
npx exa-mcp-server --tools=web_search

# Enable deep researcher tools
npx exa-mcp-server --tools=deep_researcher_start,deep_researcher_check

# List all available tools
npx exa-mcp-server --list-tools
```

## Configuring Claude Desktop

To configure Claude Desktop to use Exa MCP:

1. **Enable Developer Mode in Claude Desktop**
   * Open Claude Desktop
   * Click on the top-left menu
   * Enable Developer Mode

2. **Open the Configuration File**
   * After enabling Developer Mode, go to Settings
   * Navigate to the Developer Option
   * Click "Edit Config" to open the configuration file

   Alternatively, you can open it directly:

   **macOS:**

   ```bash
   code ~/Library/Application\ Support/Claude/claude_desktop_config.json
   ```

   **Windows:**

   ```powershell
   code %APPDATA%\Claude\claude_desktop_config.json
   ```

3. **Add Exa MCP Configuration**

   Add the following to your configuration:

   ```json
   {
     "mcpServers": {
       "exa": {
         "command": "npx",
         "args": [
           "-y",
           "exa-mcp-server"
         ],
         "env": {
           "EXA_API_KEY": "your-api-key-here"
         }
       }
     }
   }
   ```

   Replace `your-api-key-here` with your actual Exa API key. You can get your [Exa API key here](https://dashboard.exa.ai/api-keys).

4. **Enabling Specific Tools**

   To enable only specific tools:

   ```json
   {
     "mcpServers": {
       "exa": {
         "command": "npx",
         "args": [
           "-y",
           "exa-mcp-server",
           "--tools=web_search"
         ],
         "env": {
           "EXA_API_KEY": "your-api-key-here"
         }
       }
     }
   }
   ```

   To enable deep researcher tools:

   ```json
   {
     "mcpServers": {
       "exa": {
         "command": "npx",
         "args": [
           "-y",
           "exa-mcp-server",
           "--tools=deep_researcher_start,deep_researcher_check"
         ],
         "env": {
           "EXA_API_KEY": "your-api-key-here"
         }
       }
     }
   }
   ```

5. **Restart Claude Desktop**
   * Completely quit Claude Desktop (not just close the window)
   * Start Claude Desktop again
   * Look for the 🔌 icon to verify the Exa server is connected

## Troubleshooting

### Common Issues

1. **Server Not Found**
   * Ensure the npm package is correctly installed

2. **API Key Issues**
   * Confirm your EXA\_API\_KEY is valid
   * Make sure there are no spaces or quotes around the API key

3. **Connection Problems**
   * Restart Claude Desktop completely

## Additional Resources

For more information, visit the [Exa MCP Server GitHub repository](https://github.com/exa-labs/exa-mcp-server/).
# RAG Q&A

Source: https://docs.exa.ai/examples/exa-rag

Using Exa to enable retrieval-augmented generation.

***

### What this doc covers

1. Using Exa search\_and\_contents to find relevant webpages for a query and get their contents
2. Performing Exa search based on text similarity rather than a search query

The Jupyter notebook for this tutorial is available on [Colab](https://colab.research.google.com/drive/1iXfXg9%5F-MEmhwW1a0WRHHbMl21jSxjO7?usp=sharing) for easy experimentation.

## Answer your questions with context

LLMs are powerful because they compress large amounts of data into a format that allows convenient access, but this compression isn't lossless. LLMs are prone to hallucination, corrupting facts and details from training data. To get around this fundamental issue with LLM reliability, we can use Exa to bring the most relevant data into context—a fancy way of saying: put the info in the LLM prompt directly. This lets us combine the compressed data and *reasoning abilities* of the LLM with a curated selection of uncompressed, accurate data for the problem at hand for the best answers possible.

Exa's SDKs make incorporating quality data into your LLM pipelines quick and painless. Install the SDK by running this command in your terminal:

```Shell Shell
pip install exa-py
```

```Python Python
# Now, import the Exa class and pass your API key to it.
from exa_py import Exa

my_exa_api_key = "YOUR_API_KEY_HERE"
exa = Exa(my_exa_api_key)
```

For our first example, we'll set up Exa to answer questions with OpenAI's popular GPT-3.5 Turbo model. (You can use GPT-4 or another model if you prefer!) We'll use Exa's `highlights` feature, which directly returns relevant text of customizable length for a query. You'll need to run `pip install openai` to get access to OpenAI's SDK if you haven't used it before. More information about the OpenAI Python SDK can be found [here](https://platform.openai.com/docs/quickstart?context=python).

```Python Python
# Set up OpenAI's SDK
from openai import OpenAI

openai_api_key = "YOUR_API_KEY_HERE"
openai_client = OpenAI(api_key=openai_api_key)
```

Now, we just need some questions to answer!

```Python Python
questions = [
    "How did bats evolve their wings?",
    "How did Rome defend Italy from Hannibal?",
]
```

While LLMs can answer some questions on their own, they have limitations:

* LLMs don't have knowledge past when their training was stopped, so they can't know about recent events
* If an LLM doesn't know the answer, it will often 'hallucinate' a correct-sounding response, and it can be difficult and inconvenient to distinguish these from correct answers
* Because of the opaque manner of generation and the problems mentioned above, it is difficult to trust an LLM's responses when accuracy is [important](https://www.forbes.com/sites/mollybohannon/2023/06/08/lawyer-used-chatgpt-in-court-and-cited-fake-cases-a-judge-is-considering-sanctions/?sh=27194eb67c7f)

Robust retrieval helps solve all of these issues by providing quality sources of ground truth for the LLM (and their human users) to leverage and cite. Let's use Exa to get some information to answer our questions:

```Python Python
# Parameters for our Highlights search
highlights_options = {
    "num_sentences": 7,  # how long our highlights should be
    "highlights_per_url": 1,  # just get the best highlight for each URL
}

# Let the magic happen!
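# For each question, fetch the top 3 results and keep the single best highlight from each.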
info_for_llm = []
for question in questions:
    search_response = exa.search_and_contents(question, highlights=highlights_options, num_results=3)
    info = [sr.highlights[0] for sr in search_response.results]
    info_for_llm.append(info)
```

```Python Python
info_for_llm
```

```
[['As the only mammals with powered flight, the evolutionary\xa0history of their wings has been poorly understood. However, research published Monday in Nature and PLoS Genetics has provided the first comprehensive look at the genetic origins of their incredible wings.But to appreciate the genetics of their wing development, it’s important to know how crazy a bat in flight truly\xa0looks.Try a little experiment: Stick your arms out to the side, palms facing forward, thumbs pointing up toward the ceiling. Now imagine that your fingers are\xa0long, arching down toward the floor like impossibly unkempt fingernails — but still made of bone, sturdy and spread apart. Picture the sides of your body connecting to your hands, a rubbery membrane attaching your leg and torso to those long fingers, binding you with strong, stretchy skin. Then, finally, imagine using your muscles to flap those enormous hands.Bats, man.As marvelous as bat flight is to behold, the genetic origins of their storied wings has remained murky. However, new findings from an international team of researchers led by Nadav Ahituv, PhD, of the University of California at San Francisco, Nicola Illing, PhD, of the University of Cape Town\xa0in\xa0South Africa\xa0and Katie Pollard, PhD of the UCSF-affiliated Gladstone Institutes has shed new light on how, 50 million years ago, bats took a tetrapod blueprint for arms and legs and went up into the sky.Using a sophisticated set of genetic tools, researchers approached the question of how bats evolved flight by looking not only at which genes were used in the embryonic development of wings, but at what point during development the genes were turned on and off, and — critically — what elements in the genome were regulating the expression of these genes. Genes do not just turn themselves on without input; genetic switches, called enhancers, act to regulate the timing and levels of gene expression in the body.', "Since flight evolved millions of years ago in all of the groups that are capable of flight today, we can't observe the changes in behavior and much of the morphology that the evolution of flight involves. We do have the fossil record, though, and it is fairly good for the three main groups that evolved true flight. We'll spare you an in-depth description of how each group evolved flight for now; see the later exhibits for a description of each group and how they developed flight.", "It's easy to forget that one in five species of mammal on this planet have wings capable of delivering spectacularly acrobatic flying abilities. Equally incredibly, two-thirds of these 1,200 species of flying mammal can fly in the dark, using exquisite echolocation to avoid obstacles and snatch airborne prey with stunning deftness. These amazing feats have helped make bats the focus not only of folkloric fascination, but also of biological enquiry and mimicry by human engineers from Leonardo da Vinci onwards. Recent research in PLOS journals continues to add surprising new findings to what we know about bats, and how they might inspire us to engineer manmade machines such as drones to emulate their skills. Bats, unlike most birds and flying insects, have relatively heavy wings – something that might appear disadvantageous.
But a recent study in PLOS Biology by Kenny Breuer and colleagues shows that bats can exploit the inertia of the wings to make sharp turns that would be near-impossible using aerodynamic forces alone. The authors combined high-speed film of real bats landing upside-down on ceiling roosts with computational modelling to tease apart aerodynamic and inertial effects."], ["things, gold and silver, could buy a victory. And this Other Italian cities, inspired by Rome's example, overpowered occupying troops, shut their gates again and invited a second siege. Hannibal could not punish them without dividing his he had no competent leadership to do so, what with one member of", 'A group of Celts known as the Senone was led through Italy by their commander, Brennus. The Senone Gauls were threatening the nearby town of Clusium, when Roman Ambassadors from the Fabii family were sent to negotiate peace for Clusium. The Romans were notoriously aggressive, and so it is only a little surprising that when a scuffle broke out between the Gauls and Clusians, the Fabii joined in and actually killed a Senone chieftain. The Roman people voted to decide the fate of those who broke the sacred conduct of ambassadors, but the Fabii were so popular that they were instead voted to some of the highest positions in Rome. This absolutely infuriated Brennus and his people and they abandoned everything and headed straight for Rome. Rome was woefully unprepared for this sudden attack. The Gauls had marched with purpose, declaring to all the towns they passed that they would not harm them, they were heading straight for Rome.', "Hannibal had no intention to sit and recieve the romans in spain.Hannibal clearly considered the nature of roman power-and came to the conclusion that Rome could only be defeated in Italy.The cornerstone of Rome's power was a strategic manpower base that in theory could produce 7,00,000 infantry and 70,000 cavalry.More than half of this manpower base (4,00,000) was provided by rome's Italian allies,who paid no taxes but had to render military service to rome's armies.Not all were content.Carthage on the other hand rarely used its own citizens for war,bulk of its army being mercenaries.In any case its manpower could never even come close to Rome,the fact that had aided roman victory in the 1st Punic war.Hannibal thus understood that Rome could afford to raise and send army after army to spain and take losses. Meanwhile any carthiginian losses in spain would encourage the recently conquered iberian tribes to defect. The only way to defeat Rome,was to fight in italy itself.By winning battle after battle on italian soil and demonstrating to the italian allies rome's inability to protect them and weakness,he could encourage them to break free of Rome eroding Rome's manpower to sizeable proportions. But there was one problem,his fleet was tiny and Rome ruled the seas.By land,the coastal route would be blocked by Roman forces and her ally-the great walled city of massalia.Hannibal thus resolved to think and do the impossible - move thousands of miles by land through the pyranees mountains,uncharted territory inhabited by the fierce gauls ,then through the Alps mountains and invade italy. Even before the siege of Saguntum had concluded,Hannibal had set things in motion.Having sent a number of embassies to the Gallic tribes in the Po valley with the mission of establishing a safe place for Hannibal to debouch from the Alps into the Po valley. 
He did not desire to cross this rugged mountain chain and to descend into the Po valley with exhausted troops only to have to fight a battle.Additionally the fierce gauls would provide a source of manpower for Hannibal's army.The romans had recently conquered much territory from the gauls in this area,brutally subjagating them ,seizing their land and redistributing it to roman colonists.Thus securing an alliance proved to be easy. After the sack of Saguntum he dismissed his troops to their own localities."]] ``` Now, let's give the context we got to our LLM so it can answer our questions with solid sources backing them up! ```Python Python responses = [] for question, info in zip(questions, info_for_llm): system_prompt = "You are RAG researcher. Read the provided contexts and, if relevant, use them to answer the user's question." user_prompt = f"""Sources: {info} Question: {question}""" completion = openai_client.chat.completions.create( model="gpt-3.5-turbo", messages=[ {"role": "system", "content": system_prompt}, {"role": "user", "content": user_prompt}, ] ) response = f""" Question: {question} Answer: {completion.choices[0].message.content} """ responses.append(response) ``` ```Python Python from pprint import pprint # pretty print pprint(responses) ``` ```['\n' ' Question: How did bats evolve their wings?\n' ' Answer: Recent research has shed new light on how bats evolved their ' 'wings. An international team of researchers used genetic tools to study the ' 'embryonic development of bat wings and the genes involved in their ' 'formation. They also investigated the regulatory elements in the genome that ' 'control the expression of these genes. By analyzing these factors, the ' 'researchers discovered that bats took a tetrapod blueprint for arms and legs ' 'and adapted it to develop wings, allowing them to fly. This research ' 'provides a comprehensive understanding of the genetic origins of bat wings ' 'and how they evolved over 50 million years ago.\n' ' ', '\n' ' Question: How did Rome defend Italy from Hannibal?\n' ' Answer: Rome defended Italy from Hannibal by using various strategies. One ' 'of the main defenses relied on the Roman manpower base, which consisted of a ' 'large army made up of Roman citizens and Italian allies who were obligated ' "to render military service. Rome's strategic manpower base was a cornerstone " 'of their power, as it could produce a significant number of infantry and ' 'cavalry. This posed a challenge for Hannibal, as Carthage relied heavily on ' "mercenaries and could not match Rome's manpower.\n" '\n' 'Hannibal realized that in order to defeat Rome, he needed to fight them in ' 'Italy itself. His plan was to win battles on Italian soil and demonstrate ' "Rome's inability to protect their Italian allies, with the intention of " "encouraging them to break free from Rome. This would erode Rome's manpower " 'base to a sizeable proportion. However, Hannibal faced several obstacles. ' 'Rome ruled the seas, making it difficult for him to transport troops and ' 'supplies by sea. Additionally, the coastal route to Italy would be blocked ' 'by Roman forces and their ally, the walled city of Massalia.\n' '\n' 'To overcome these challenges, Hannibal devised a daring plan. He decided to ' 'lead his troops on a treacherous journey through the Pyrenees mountains, ' 'inhabited by fierce Gauls, and then through the Alps mountains to invade ' 'Italy. 
He sent embassies to Gallic tribes in the Po valley, securing ' 'alliances and establishing a safe place for his army to enter the Po valley ' 'from the Alps.\n' '\n' 'Overall, Rome defended Italy from Hannibal by leveraging their manpower ' 'base, their control of the seas, and their strategic alliances with Italian ' 'allies. They also had the advantage of better infrastructure and control ' 'over resources within Italy itself. These factors ultimately played a ' "significant role in Rome's defense against Hannibal's invasion.\n" ' ']
```

## Beyond Question Answering: Text Similarity Search

Exa can be used for more than simple question answering. One superpower of Exa's special embeddings-based search is that we can search for websites containing text with similar meaning to a given paragraph or essay! Instead of providing a standard query like "a research paper about Georgism", we can provide Exa with a paragraph about Georgism and find websites with similar contents. This is useful for finding additional sources for your research paper, finding alternatives/competitors for a product, etc.

```Python Python
paragraph = """
Georgism, also known as Geoism, is an economic philosophy and ideology named after the American political economist Henry George (1839–1897). This doctrine advocates for the societal collective, rather than individual property owners, to capture the economic value derived from land and other natural resources. To this end, Georgism proposes a single tax on the unimproved value of land, known as a "land value tax," asserting that this would deter speculative land holding and promote efficient use of valuable resources. Adherents argue that because the supply of land is fundamentally inelastic, taxing it will not deter its availability or use, unlike other forms of taxation. Georgism differs from Marxism and capitalism, underscoring the distinction between common and private property while largely contending that individuals should own the fruits of their labor."""

georgism_search_response = exa.search_and_contents(paragraph, highlights=highlights_options, num_results=5)
```

```Python Python
for result in georgism_search_response.results:
    print(result.title)
    print(result.url)
    pprint(result.highlights)
```

```
Henry George
https://www.newworldencyclopedia.org/entry/Henry_George
["George's theory of interest is nowadays dismissed even by some otherwise " 'Georgist authors, who see it as mistaken and irrelevant to his ideas about ' 'land and free trade. The separation of the value of land into improved and ' "unimproved is problematic in George's theory. Once construction has taken " 'place, not only the land on which such improvements were made is affected, ' 'the value of neighboring, as yet unimproved, land is impacted. Thus, while ' 'the construction of a major attraction nearby may increase the value of ' 'land, the construction of factories or nuclear power plants decreases its ' 'value. Indeed, location is the single most important asset in real estate. ' 'George intended to propose a tax that would have the least negative impact ' 'on productive activity.
However, even unimproved land turns out to be ' 'affected in value by productive activity in the neighborhood.'] Wikiwand https://www.wikiwand.com/en/Georgism ['Georgism is concerned with the distribution of economic rent caused by land ' 'ownership, natural monopolies, pollution rights, and control of the commons, ' 'including title of ownership for natural resources and other contrived ' 'privileges (e.g. intellectual property). Any natural resource which is ' 'inherently limited in supply can generate economic rent, but the classical ' 'and most significant example of land monopoly involves the extraction of ' 'common ground rent from valuable urban locations. Georgists argue that ' 'taxing economic rent is efficient, fair and equitable. The main Georgist ' 'policy recommendation is a tax assessed on land value, arguing that revenues ' 'from a land value tax (LVT) can be used to reduce or eliminate existing ' 'taxes (such as on income, trade, or purchases) that are unfair and ' 'inefficient. Some Georgists also advocate for the return of surplus public ' "revenue to the people by means of a basic income or citizen's dividend. The " 'concept of gaining public revenues mainly from land and natural resource ' 'privileges was widely popularized by Henry George through his first book, ' 'Progress and Poverty (1879).'] Henry George https://www.conservapedia.com/Henry_George ['He argued that land, unlike other factors of production, is supplied by ' 'nature and that rent is unearned surplus. The landless deserve their share ' 'of this surplus as a birthright, according to George. Henry George was born ' 'in Philadelphia, Pennsylvania, on the 2nd of September 1839. He settled in ' 'California in 1858; then later removed to New York in 1880; was first a ' 'printer, then an editor, but finally devoted all his life to economic and ' 'social questions. In 1860, George met Annie Corsina Fox. Her family was very ' 'opposed to the relationship, and in 1861 they eloped. In 1871 he published ' 'Our Land Policy, which, as further developed in 1879 under the title of ' 'Progress and Poverty, speedily attracted the widest attention both in ' 'America and in Europe.'] Georgism - Wikipedia https://en.wikipedia.org/wiki/Georgism ['A key issue to the popular adoption of Georgism is that homes are illiquid ' 'yet governments need cash every year. Some economists have proposed other ' 'ways of extracting value from land such as building government housing and ' 'selling homes to new buyers in areas of fast-rising land value. The ' 'government would theoretically collect revenue from home sales without much ' 'cost to current homeowners while slowing down land value appreciation in ' 'high-demand areas. Henry George, whose writings and advocacy form the basis ' 'for Georgism Georgist ideas heavily influenced the politics of the early ' '20th century. Political parties that were formed based on Georgist ideas ' 'include the Commonwealth Land Party in the United States, the Henry George ' 'Justice Party in Victoria, the Single Tax League in South Australia, and the ' "Justice Party in Denmark. 
In the United Kingdom, George's writings were " 'praised by emerging socialist groups in 1890s such as the Independent Labour ' 'Party and the Fabian Society, which would each go on to help form the ' 'modern-day Labour Party.']
Georgism
https://rationalwiki.org/wiki/Georgism
['Even with mostly primitive methods, land values are already assessed around ' 'the world wherever property/council taxes exist, and some municipalities ' 'even collect all their revenue from land values. Though these are ' 'market-based measures, they can still prove difficult and require upfront ' 'investment. Georgists believe that the potential value of land is greater ' 'than the current sum of government spending, since the abolition of taxes on ' 'labor and investment would further increase the value of land. Conversely, ' 'the libertarian strain in Georgism is evident in the notion that their land ' 'tax utopia also entails reducing or eliminating the need for many of the ' 'things governments currently supply, such as welfare, infrastructure to ' 'support urban sprawl, and military & foreign aid spending to secure ' "resources abroad. Therefore, many Georgists propose a citizen's dividend. " 'This is a similar concept to basic income but its proponents project its ' 'potential to be much larger due to supposedly huge takings from the land ' 'tax, combined with lowered government spending. It has been recognized since ' 'Adam Smith and David Ricardo that a tax on land value itself cannot be ' 'passed on to tenants, but instead would be paid for by the owners of the ' 'land:']
```

Using Exa, we can easily find related papers, either for further research or to provide a source for our claims. This is just a brief intro to what Exa can do. For a look at how you can leverage getting full contents, check out [this article](/search-api/get-contents-of-documents-many-different-types).

# Recruiting Agent

Source: https://docs.exa.ai/examples/exa-recruiting-agent

***

## What this doc covers

1. Using Exa search with `include_domains` to only retrieve search results from a specified domain
2. Using Exa keyword search to find specific people by name
3. Using `exclude_domains` to ignore certain low-signal domains
4. Using Exa link similarity search to find similar websites

***

## Introduction

In this tutorial, we use Exa to **automate** the process of **discovering**, **researching**, and **evaluating** exceptional candidates. If you just want to see the code, check out the [Colab notebook](https://colab.research.google.com/drive/1a-7niLbCtIEjZnPz-qXPS3XwckPgIMrV?usp=sharing).

Here's what we're going to do:

1. Candidate research: Identify potential candidates and use Exa to find additional details, such as personal websites, LinkedIn profiles, and their research topics.
2. Candidate evaluation: Evaluate candidates using an LLM to score their fit to our hiring criteria.
3. Finding more candidates: Discover more candidates similar to our top picks.

This project requires an [Exa API key](https://dashboard.exa.ai/api-keys) and an [OpenAI API key](https://platform.openai.com/api-keys). Get 1000 Exa searches per month free just for [signing up](https://dashboard.exa.ai/overview)!

```Python Python
# install dependencies
!pip install exa_py openai pandas matplotlib tqdm

import pandas as pd
from exa_py import Exa
import openai

EXA_API_KEY = ''
OPENAI_API_KEY = ''

exa = Exa(api_key=EXA_API_KEY)
openai.api_key = OPENAI_API_KEY
```

## Initial Candidates

Suppose I'm building Simile, an AI startup for web retrieval.
My hiring criteria are:

* AI experience
* interest in retrieval, databases, and knowledge
* available to work now or soon

We start with 13 example PhD students recommended by friends. All I have is their name and email.

```Python Python
# Usually you would upload a csv of students
# df = pd.read_csv('./students.csv')

# TODO: add your own candidates
sample_data = {
    "Name": [
        "Kristy Choi",
        "Jiaming Song",
        "Brice Huang",
        "Andi Peng",
        "Athiya Deviyani",
        "Hao Zhu",
        "Zana Bucinca",
        "Usha Bhalla",
        "Kia Rahmani",
        "Jingyan Wang",
        "Jun-Kun Wang",
        "Sanmi Koyejo",
        "Erik Jenner"
    ],
    "Email": [
        "[email protected]",
        "[email protected]",
        "[email protected]",
        "[email protected]",
        "[email protected]",
        "[email protected]",
        "[email protected]",
        "[email protected]",
        "[email protected]",
        "[email protected]",
        "[email protected]",
        "[email protected]",
        "[email protected]"
    ]
}

# Creating the DataFrame
students_df = pd.DataFrame(sample_data)
students_df
```

## Information Enrichment

Now, let's add more information about the candidates: current school, LinkedIn, and personal website.

First, we'll define a helper function to call OpenAI -- we'll use this for many of our later functions.

```Python Python
def get_openai_response(input_text):
    # if contents is empty
    if not input_text:
        return ""
    completion = openai.chat.completions.create(
        model="gpt-3.5-turbo-0125",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": input_text},
        ],
        temperature=0
    )
    return completion.choices[0].message.content
```

We'll ask GPT to extract the candidate's school from their email address.

```Python Python
def extract_school_from_email(email):
    content = f"I'm going to give you a student's email. I want you to figure out what school they go to. For example, if the email is [email protected] you should return 'CMU' and nothing else. Only return the name of the school. Here is their email: {email}"
    return get_openai_response(content)

# Example
extract_school_from_email('[email protected]')
```

Now that we have their school, let's use Exa to find their LinkedIn and personal website too. Here, we're passing in `type="keyword"` to do an Exa keyword search because we want our results to have the exact name in the result. We also specify `include_domains=['linkedin.com']` to restrict the results to LinkedIn profiles.

```Python Python
def get_linkedin_from_name(name, school=''):
    query = f"{name} {school}"
    keyword_search = exa.search(query, num_results=1, type="keyword", include_domains=['linkedin.com'])
    if keyword_search.results:
        result = keyword_search.results[0]
        return result.url
    print(f"No LinkedIn found for: {name}")
    return None

print("LinkedIn:", get_linkedin_from_name('Sarah Chieng', 'MIT'))
```

To now find the candidate's personal website, we can use the same Exa query, but we want to also scrape the website's contents. To do this, we use `search_and_contents`. We can also exclude some misleading websites with `exclude_domains=['linkedin.com', 'github.com', 'twitter.com']`. Whatever's left has a good chance of being their personal site!
```Python Python
# given a name, returns their personal website if we can find it
def exa_search_personal_website(name, school=''):
    query = f"{name} {school}"
    keyword_search = exa.search_and_contents(query,
                                             type="keyword",
                                             text={"include_html_tags": False},
                                             num_results=1,
                                             exclude_domains=['linkedin.com', 'github.com', 'twitter.com'])
    if keyword_search.results:
        result = keyword_search.results[0]
        return result.url, result.text
    print(f"No personal website found for: {name}")
    return (None, None)

# example
personal_website_url, personal_website_text = exa_search_personal_website('Aryaman Arora', 'Stanford')
personal_website_url
```

Now that we have each candidate's personal website, we can use Exa and GPT-3.5 to answer questions like:

* what are they doing now? Or what class year are they?
* where did they do their undergrad?
* what topics do they research?
* are they an AI researcher?

Once we have all of the page's contents, let's start asking some questions:

```Python Python
def extract_undergrad_from_contents(contents):
    contents = f"""I'm going to give you some information I found online about a person. Based on the provided information, determine where they went to college for undergrad. Some examples are \"MIT\" or \"Harvard.\" You should answer only in the example format, or return \"not sure\" if you're not sure. Do not return any other text. Here is the information I have scraped: {contents}."""
    return get_openai_response(contents)

def extract_current_role_from_contents(contents):
    contents = f"""I'm going to give you some information I found online about a person. Based on the provided information, determine where they are currently working or if they are still a student, what their current year of study is. Some examples are \"OpenAI\" or \"first year PHD.\" You should answer only in the example format, or return \"not sure\" if you're not sure. Do not return any other text. Here is the information I have scraped: {contents}."""
    return get_openai_response(contents)

def extract_research_topics_from_contents(contents):
    contents = f"""I'm going to give you some information I found online about a person. Based on the provided information, determine what fields they research. Some examples are \"RAG, retrieval, and databases\" or \"Diffusion models.\" You should answer only in the example format, or return \"not sure\" if you're not sure. Do not return any other text. Here is the information I have scraped: {contents}."""
    return get_openai_response(contents)

def extract_is_ai_from_contents(contents):
    contents = f"""I'm going to give you some information I found online about a person. Based on the provided information, determine whether they are an AI researcher. You should only return \"yes\" or \"no\", or return \"not sure\" if you're not sure. Do not return any other text. Here is the information I have scraped: {contents}."""
    return get_openai_response(contents)

# Example
personal_website_url, personal_website_text = exa_search_personal_website('Aryaman Arora', 'Stanford')  # Note: this is a random person I found online using an Exa search

undergrad = extract_undergrad_from_contents(personal_website_text)
current = extract_current_role_from_contents(personal_website_text)
topics = extract_research_topics_from_contents(personal_website_text)
ai = extract_is_ai_from_contents(personal_website_text)

# Printing the information using f-string formatting
print(f"Personal Site: {personal_website_url}")
print(f"Undergrad: {undergrad}")
print(f"Current: {current}")
print(f"Topics: {topics}")
print(f"AI: {ai}")
```

## Candidate Evaluation

Next, we use GPT-3.5 to score candidates 1-10 based on fit. This way, we can use Exa to find more folks similar to our top-rated candidates.

```Python Python
# TODO: change these to fit your own criteria
def calculate_score(info, undergrad, year, researchTopics, AI):
    contents = f"""I'm going to provide some information about an individual, and I want you to rate on a scale of 1 to 10 how good of a hiring candidate they are. I am hiring for AI researchers. A 10 is someone who went to an incredible college, is graduating soon (final year PhD ideally) or is already graduated, is definitely an AI researcher, has a lot of experience and seems really smart, and a nice bonus is if their research is related to retrieval, search, databases. Only return an integer from 1 to 10. Do not return anything else. This candidate did undergrad at {undergrad} and their current role is {year}. Are they an AI researcher? {AI}. They do research in {researchTopics}. Here are some other things I know about them: {info}"""
    try:
        return int(get_openai_response(contents))
    except (ValueError, TypeError):
        return None
```

Finally, let's enrich our dataframe of people. We define a function `enrich_row` that uses all the functions we defined to learn more about a candidate, then sort by score to get the most promising candidates.

```Python Python
# Set up progress bar
from tqdm.auto import tqdm
tqdm.pandas()

def enrich_row(row):
    row['School'] = extract_school_from_email(row['Email'])
    linkedIn_info = get_linkedin_from_name(row['Name'], row['School'])
    if linkedIn_info:
        row['LinkedIn'] = linkedIn_info
    website_url, website_info = exa_search_personal_website(row['Name'], row['School'])
    row['ExaWebsite'] = website_url
    row['ContentInfo'] = website_info
    row['Undergrad'] = extract_undergrad_from_contents(row['ContentInfo'])
    row['Role'] = extract_current_role_from_contents(row['ContentInfo'])
    row['ResearchTopics'] = extract_research_topics_from_contents(row['ContentInfo'])
    row['AI'] = extract_is_ai_from_contents(row['ContentInfo'])
    row['Score'] = calculate_score(row['ContentInfo'], row['Undergrad'], row['Role'], row['ResearchTopics'], row['AI'])
    return row

enriched_df = students_df.progress_apply(enrich_row, axis=1)
sorted_df = enriched_df.sort_values(by='Score', ascending=False).reset_index(drop=True)
sorted_df
```

## Finding more candidates

Now that we know how to research candidates, let's find some more! We'll take each of the top candidates (score above 7), and use Exa to find similar profiles.

Exa's `find_similar` allows us to search a URL and find semantically similar URLs. For example, I could search 'hinge.co' and it'll return the homepages of similar dating apps. In this case, we'll pass in the homepages of our top candidates to find similar profiles.
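To see the primitive on its own, here's a minimal sketch of a bare `find_similar` call using the dating-app example above (the URL and result count are just illustrative, and it assumes the `exa` client we created earlier):

```Python Python
# Find homepages semantically similar to a given URL
similar = exa.find_similar("https://hinge.co", num_results=3)
for res in similar.results:
    print(res.title, res.url)
```

Below, we use the `find_similar_and_contents` variant instead, so we get each similar page's text back in the same call.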
```Python Python
# given a homepage, get homepages of similar candidates
def get_more_candidates(homepageURL):
    new_homepages = []
    if not homepageURL:
        return None
    similarity_search = exa.find_similar_and_contents(homepageURL,
                                                      num_results=3,
                                                      text={"include_html_tags": False},
                                                      exclude_domains=['linkedin.com', 'github.com', 'twitter.com'])
    # return a list of (url, text) pairs
    for res in similarity_search.results:
        new_homepages.append((res.url, res.text))
    return new_homepages

# we can already get things like role and education, but we need to get the name and email this time
def get_name_from_contents(contents):
    content = f"""I'm going to give you some information I found online about a person. Based on the provided information, figure out their full name. Some examples are \"Sarah Chieng\" or \"Will Bryk.\" You should answer only in the example format, or return \"not sure\" if you're not sure. Do not return any other text. Here is the information I have scraped: {contents}."""
    return get_openai_response(content)

def get_email_from_contents(contents):
    content = f"""I'm going to give you some information I found online about a person. Based on the provided information, figure out their email. Some examples are \"[email protected]\" or \"[email protected].\" You should answer only in the example format, or return \"not sure\" if you're not sure. Do not return any other text. Here is the information I have scraped: {contents}."""
    return get_openai_response(content)

# Example
example_homepage = 'https://winniexu.ca/'
additional_homepages = get_more_candidates(example_homepage)
new_candidate_url, new_candidate_content = additional_homepages[0]
name = get_name_from_contents(new_candidate_content)
email = get_email_from_contents(new_candidate_content)

print(f"Additional Homepages: {additional_homepages}")
print(f"Name: {name}")
print(f"Email: {email}")
```

Final stretch -- let's put it all together. Let's find and add our new candidates to our original dataframe.

```Python Python
def new_candidates_df(df):
    # get the websites of our top candidates
    top_candidates_df = df[df['Score'] > 7]
    websites_list = top_candidates_df['ExaWebsite'].tolist()

    # use those top candidates to find new candidates
    new_candidates = set()
    for url in websites_list:
        homepages = get_more_candidates(url)
        if homepages:
            new_candidates.update(homepages)

    # for each new candidate, get their information and add them to the dataframe
    names = []
    emails = []
    urls = []
    for url, content in tqdm(new_candidates):
        names.append(get_name_from_contents(content))
        emails.append(get_email_from_contents(content))
        urls.append(url)

    new_df = pd.DataFrame({
        'Name': names,
        'Email': emails,
        'ExaWebsite': urls,
    })
    return new_df

new_df = new_candidates_df(sorted_df)
new_df
```

Alrighty, that's it! We've just built an automated way of finding, researching, and evaluating candidates. You can use this for recruiting, or tailor this to find customers, companies, etc.

And the best part is that every time you use Exa to find new candidates, you can do more `find_similar(new_candidate_homepage)` searches with the new candidates as well -- helping you build an infinite list!

Hope this tutorial was helpful and don't forget, you can get started with [Exa for free](https://dashboard.exa.ai/overview) :)

# Exa Researcher - JavaScript

Source: https://docs.exa.ai/examples/exa-researcher

Example project using the Exa JS SDK.

***

## What this doc covers

1. Using Exa's Auto search to pick the best search setting for each query (keyword or neural)
2. Using searchAndContents() through Exa's JavaScript SDK

***

In this example, we will build Exa Researcher, a JavaScript app that, given a research topic, automatically searches for relevant sources with Exa's [**Auto search**](/v2.0/reference/magic-search-as-default) and synthesizes the information into a reliable research report.

Fastest setup: Interact with the code in your browser with this Replit [template](https://replit.com/@olafblitz/exa-researcher?v=1).

Alternatively, this [interactive notebook](https://github.com/exa-labs/exa-js/tree/master/examples/researcher/researcher.ipynb) was made with the Deno JavaScript kernel for Jupyter so you can easily run it locally. Check out the [plain JS version](https://github.com/exa-labs/exa-js/tree/master/examples/researcher/researcher.mjs) if you prefer a regular JavaScript file you can run with NodeJS, or want to skip to the final result.

If you'd like to run this notebook locally, [Installing Deno](https://docs.deno.com/runtime/manual/getting%5Fstarted/installation) and [connecting Deno to Jupyter](https://docs.deno.com/runtime/manual/tools/jupyter) is fast and easy.

To play with this code, first we need an [Exa API key](https://dashboard.exa.ai/api-keys) and an [OpenAI API key](https://platform.openai.com/api-keys).

## Setup

Let's import the Exa and OpenAI SDKs and put in our API keys to create a client object for each. Make sure to pick the right imports for your runtime and paste or load your API keys.

```TypeScript TypeScript
// Deno imports
import Exa from 'npm:exa-js';
import OpenAI from 'npm:openai';

// NodeJS imports
//import Exa from 'exa-js';
//import OpenAI from 'openai';

// Replit imports
//const Exa = require("exa-js").default;
//const OpenAI = require("openai");

const EXA_API_KEY = "" // insert or load your API key here
const OPENAI_API_KEY = "" // insert or load your API key here

const exa = new Exa(EXA_API_KEY);
const openai = new OpenAI({ apiKey: OPENAI_API_KEY });
```

Since we'll be making several calls to the OpenAI API to get a completion from GPT-3.5 Turbo, let's make a simple utility function so we can pass in the system and user messages directly, and get the LLM's response back as a string.

```TypeScript TypeScript
async function getLLMResponse({system = 'You are a helpful assistant.', user = '', temperature = 1, model = 'gpt-3.5-turbo'}){
    const completion = await openai.chat.completions.create({
        model,
        temperature,
        messages: [
            {'role': 'system', 'content': system},
            {'role': 'user', 'content': user},
        ]
    });
    return completion.choices[0].message.content;
}
```

Okay, great! Now let's start building Exa Researcher.

## Exa Auto search

The researcher should be able to automatically generate research reports for all kinds of different topics. Here are two to start:

```TypeScript TypeScript
const SAMA_TOPIC = 'Sam Altman';
const ART_TOPIC = 'renaissance art';
```

The first thing our researcher has to do is decide what kind of search to do for the given topic. Exa offers two kinds of search: **neural** and **keyword** search. Here's how we decide:

* Neural search is preferred when the query is broad and complex because it lets us retrieve high quality, semantically relevant data. Neural search is especially suitable when a topic is well-known and popularly discussed on the Internet, allowing the machine learning model to retrieve contents which are more likely recommended by real humans.
* Keyword search is useful when the topic is specific, local or obscure.
If the query is a specific person's name, an identifier, or an acronym, such that relevant results will contain the query itself, keyword search may do well. And if the machine learning model doesn't know about the topic, but relevant documents can be found by directly matching the search query, keyword search may be necessary.

Conveniently, Exa's Auto search feature (on by default) will automatically decide whether to use `keyword` or `neural` search for each query. For example, if a query is a specific person's name, Exa would decide to use keyword search.

Now, we'll create a helper function to generate search queries for our topic.

```TypeScript TypeScript
async function generateSearchQueries(topic, n){
    const userPrompt = `I'm writing a research report on ${topic} and need help coming up with diverse search queries. Please generate a list of ${n} search queries that would be useful for writing a research report on ${topic}. These queries can be in various formats, from simple keywords to more complex phrases. Do not add any formatting or numbering to the queries.`;
    const completion = await getLLMResponse({
        system: 'The user will ask you to help generate some search queries. Respond with only the suggested queries in plain text with no extra formatting, each on its own line.',
        user: userPrompt,
        temperature: 1
    });
    return completion.split('\n').filter(s => s.trim().length > 0).slice(0, n);
}
```

Next, let's write another function that actually calls the Exa API to perform searches using Auto search.

```TypeScript TypeScript
async function getSearchResults(queries, linksPerQuery=2){
    let results = [];
    for (const query of queries){
        const searchResponse = await exa.searchAndContents(query, {
            numResults: linksPerQuery
        });
        results.push(...searchResponse.results);
    }
    return results;
}
```

## Writing a report with GPT-3.5 Turbo

The final step is to instruct the LLM to synthesize the content into a research report, including citations of the original links. We can do that by pairing the content and the URLs and writing them into the prompt.

```TypeScript TypeScript
async function synthesizeReport(topic, searchContents, contentSlice = 750){
    const inputData = searchContents.map(item => `--START ITEM--\nURL: ${item.url}\nCONTENT: ${item.text.slice(0, contentSlice)}\n--END ITEM--\n`).join('');
    return await getLLMResponse({
        system: 'You are a helpful research assistant. Write a report according to the user\'s instructions.',
        user: 'Input Data:\n' + inputData + `Write a two paragraph research report about ${topic} based on the provided information. Include as many sources as possible. Provide citations in the text using footnote notation ([#]). First provide the report, followed by a single "References" section that lists all the URLs used, in the format [#] .`,
        //model: 'gpt-4' //want a better report? use gpt-4 (but it costs more)
    });
}
```

## All Together Now

Now, let's just wrap everything into one Researcher function that strings together all the functions we've written. Given a user's research topic, the Researcher will generate search queries, feed those queries to Exa Auto search, and finally use an LLM to synthesize the retrieved information. Three simple steps!

```TypeScript TypeScript
async function researcher(topic){
    console.log(`Starting research on topic: "${topic}"`);
    const searchQueries = await generateSearchQueries(topic, 3);
    console.log("Generated search queries:", searchQueries);
    const searchResults = await getSearchResults(searchQueries);
    console.log(`Found ${searchResults.length} search results. Here's the first one:`, searchResults[0]);
    console.log("Synthesizing report...");
    const report = await synthesizeReport(topic, searchResults);
    return report;
}
```

In just a couple lines of code, we've used Exa to go from a research topic to a valuable essay with up-to-date sources.

```TypeScript TypeScript
async function runExamples() {
    console.log("Researching Sam Altman:");
    const samaReport = await researcher(SAMA_TOPIC);
    console.log(samaReport);

    console.log("\n\nResearching Renaissance Art:");
    const artReport = await researcher(ART_TOPIC);
    console.log(artReport);
}

// To use the researcher on the examples, simply call the runExamples() function:
runExamples();

// Or, to research a specific topic:
researcher("llama antibodies").then(console.log);
```

For a link to a complete, cleaned up version of this project that you can execute in your NodeJS environment, check out the [alternative JS-only version](https://github.com/exa-labs/exa-js/tree/master/examples/researcher/researcher.mjs).

# Exa Researcher - Python

Source: https://docs.exa.ai/examples/exa-researcher-python

***

## What this doc covers

1. Using Exa's Auto search to pick the best search setting for each query (keyword or neural)
2. Using search\_and\_contents() through Exa's Python SDK

***

In this example, we will build Exa Researcher, a Python app that, given a research topic, automatically searches for relevant sources with Exa's [Auto search](../reference/exa-s-capabilities-explained) and synthesizes the information into a reliable research report.

To run this code, first we need an [Exa API key](https://dashboard.exa.ai/api-keys) and an [OpenAI API key](https://platform.openai.com/api-keys).

If you would like to see the full code for this tutorial as a Colab notebook, [click here](https://colab.research.google.com/drive/1Aj6bBptSHWxZO7GVG2RoWtQSEkpabuaF?usp=sharing).

## Setup

Let's import the Exa and OpenAI SDKs and set up our API keys to create client objects for each. We'll use environment variables to securely store our API keys.

```Python Python
import os

import exa_py
from openai import OpenAI

EXA_API_KEY = os.environ.get('EXA_API_KEY')
OPENAI_API_KEY = os.environ.get('OPENAI_API_KEY')

exa = exa_py.Exa(EXA_API_KEY)
openai_client = OpenAI(api_key=OPENAI_API_KEY)
```

Since we'll be making several calls to the OpenAI API to get a completion from GPT-3.5 Turbo, let's make a simple utility function so we can pass in the system and user messages directly, and get the LLM's response back as a string.

```Python Python
def get_llm_response(system='You are a helpful assistant.', user='', temperature=1, model='gpt-3.5-turbo'):
    completion = openai_client.chat.completions.create(
        model=model,
        temperature=temperature,
        messages=[
            {'role': 'system', 'content': system},
            {'role': 'user', 'content': user},
        ]
    )
    return completion.choices[0].message.content
```

Okay, great! Now let's start building Exa Researcher.

## Exa Auto search

The researcher should be able to automatically generate research reports for all kinds of different topics. Here are two to start:

```Python Python
SAMA_TOPIC = 'Sam Altman'
ART_TOPIC = 'renaissance art'
```

The first thing our researcher has to do is decide what kind of search to do for the given topic. Exa offers two kinds of search: **neural** and **keyword** search. Here's how we decide:

* Neural search is preferred when the query is broad and complex because it lets us retrieve high quality, semantically relevant data.
Neural search is especially suitable when a topic is well-known and popularly discussed on the Internet, allowing the machine learning model to retrieve contents which are more likely recommended by real humans.
* Keyword search is useful when the topic is specific, local or obscure. If the query is a specific person's name, an identifier, or an acronym, such that relevant results will contain the query itself, keyword search may do well. And if the machine learning model doesn't know about the topic, but relevant documents can be found by directly matching the search query, keyword search may be necessary.

Conveniently, Exa's [Auto search](../reference/exa-s-capabilities-explained) feature (on by default) will automatically decide whether to use `keyword` or `neural` search for each query. For example, if a query is a specific person's name, Exa would decide to use keyword search.

Now, we'll create a helper function to generate search queries for our topic.

```Python Python
def generate_search_queries(topic, n):
    user_prompt = f"""I'm writing a research report on {topic} and need help coming up with diverse search queries. Please generate a list of {n} search queries that would be useful for writing a research report on {topic}. These queries can be in various formats, from simple keywords to more complex phrases. Do not add any formatting or numbering to the queries."""
    completion = get_llm_response(
        system='The user will ask you to help generate some search queries. Respond with only the suggested queries in plain text with no extra formatting, each on its own line.',
        user=user_prompt,
        temperature=1
    )
    return [s.strip() for s in completion.split('\n') if s.strip()][:n]
```

Next, let's write another function that actually calls the Exa API to perform searches using Auto search.

```Python Python
def get_search_results(queries, links_per_query=2):
    results = []
    for query in queries:
        search_response = exa.search_and_contents(query, num_results=links_per_query)
        results.extend(search_response.results)
    return results
```

## Writing a report with GPT-3.5 Turbo

The final step is to instruct the LLM to synthesize the content into a research report, including citations of the original links. We can do that by pairing the content and the URLs and writing them into the prompt.

```Python Python
def synthesize_report(topic, search_contents, content_slice=750):
    input_data = '\n'.join([f"--START ITEM--\nURL: {item.url}\nCONTENT: {item.text[:content_slice]}\n--END ITEM--\n" for item in search_contents])
    return get_llm_response(
        system='You are a helpful research assistant. Write a report according to the user\'s instructions.',
        user=f'Input Data:\n{input_data}Write a two paragraph research report about {topic} based on the provided information. Include as many sources as possible. Provide citations in the text using footnote notation ([#]). First provide the report, followed by a single "References" section that lists all the URLs used, in the format [#] .',
        # model='gpt-4'  # want a better report? use gpt-4 (but it costs more)
    )
```

## All Together Now

Now, let's just wrap everything into one Researcher function that strings together all the functions we've written. Given a user's research topic, the Researcher will generate search queries, feed those queries to Exa Auto search, and finally use an LLM to synthesize the retrieved information. Three simple steps!
```Python Python
def researcher(topic):
    print(f'Starting research on topic: "{topic}"')

    search_queries = generate_search_queries(topic, 3)
    print("Generated search queries:", search_queries)

    search_results = get_search_results(search_queries)
    print(f"Found {len(search_results)} search results. Here's the first one:", search_results[0])

    print("Synthesizing report...")
    report = synthesize_report(topic, search_results)

    return report
```

In just a couple lines of code, we've used Exa to go from a research topic to a valuable essay with up-to-date sources.

```Python Python
def run_examples():
    print("Researching Sam Altman:")
    sama_report = researcher(SAMA_TOPIC)
    print(sama_report)

    print("\n\nResearching Renaissance Art:")
    art_report = researcher(ART_TOPIC)
    print(art_report)

# To use the researcher on the examples, simply call the run_examples() function:
if __name__ == "__main__":
    run_examples()

# Or, to research a specific topic:
# print(researcher("llama antibodies"))
```

This Python implementation of Exa Researcher demonstrates how to leverage Exa's Auto search feature and the OpenAI API to create an automated research tool. By combining Exa's powerful search capabilities with GPT-3.5 Turbo's language understanding and generation, we've created a system that can quickly gather and synthesize information on any given topic.

# Structured Outputs with Instructor

Source: https://docs.exa.ai/examples/getting-started-with-exa-in-instructor

Using Exa with Instructor to generate structured outputs from web content.

## What this doc covers

* Setting up Exa to use [Instructor](https://python.useinstructor.com/) for structured output generation
* Practical examples of using Exa and Instructor together

## Guide

## 1. Pre-requisites and installation

Install the required libraries:

```shell Shell
pip install exa_py instructor openai
```

Ensure API keys are initialized properly. The environment variable names are `EXA_API_KEY` and `OPENAI_API_KEY`.

## 2. Why use Instructor?

Instructor is a Python library that allows you to generate structured outputs from a language model.

We could instruct the LLM to return a structured output, but the output will still be a string, which we need to convert to a dictionary. What if the dictionary is not structured as we want? What if the LLM forgot to add the last "}" in the JSON? We would have to handle all of these errors manually.

We could use [JSON mode](https://platform.openai.com/docs/guides/structured-outputs/json-mode) with `{ "type": "json_object" }`, which will make the LLM return a JSON object. But for this, we would need to provide a JSON schema, which can get [large and complex](https://python.useinstructor.com/why/#pydantic-over-raw-schema).

Instead of doing this, we can use Instructor. Instructor is powered by [pydantic](https://docs.pydantic.dev/latest/), which means that it integrates with your IDE. We use pydantic's `BaseModel` to define the output model:
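For instance, a response model can be as simple as this (a minimal, illustrative sketch; the `Company` fields are hypothetical):

```python Python
from pydantic import BaseModel

# Instructor validates the LLM's output against these typed fields
# and retries the call if validation fails.
class Company(BaseModel):
    name: str
    founded_year: int
```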
## 3. Setup and Basic Usage

Let's set up Exa and Instructor:

```python Python
import os

import instructor
from exa_py import Exa
from openai import OpenAI
from pydantic import BaseModel

exa = Exa(os.environ["EXA_API_KEY"])
client = instructor.from_openai(OpenAI())

search_results = exa.search_and_contents(
    "Latest advancements in quantum computing",
    type="neural",
    text=True,
)
# Combine the result texts and limit the context to a maximum of 20,000 characters
search_context = "\n\n".join(result.text for result in search_results.results)[:20000]

class QuantumComputingAdvancement(BaseModel):
    technology: str
    description: str
    potential_impact: str

    def __str__(self):
        return (
            f"Technology: {self.technology}\n"
            f"Description: {self.description}\n"
            f"Potential Impact: {self.potential_impact}"
        )

structured_output = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=QuantumComputingAdvancement,
    messages=[
        {
            "role": "user",
            "content": f"Based on the provided context, describe a recent advancement in quantum computing.\n\n{search_context}",
        }
    ],
)

print(structured_output)
```

Here we define a `QuantumComputingAdvancement` class that inherits from `BaseModel` from Pydantic. This class will be used by Instructor to validate the output from the LLM and for the LLM as a response model. We also implement the `__str__()` method for easy printing of the output. We then initialize `OpenAI()` and wrap Instructor on top of it with `instructor.from_openai` to create a client that will return structured outputs. If the output is not structured as our class, Instructor makes the LLM retry until max\_retries is reached. You can read more about how Instructor retries [here](https://python.useinstructor.com/why/#retries).

This example demonstrates how to use Exa to search for content about quantum computing advancements and structure the output using Instructor.
## 4. Advanced Example: Analyzing Multiple Research Papers

Let's create a more complex example: we'll analyze multiple research papers on a specific topic and use pydantic's own validators to constrain the structured data, showing how we can be *even* more fine-grained:

```python Python
import os
from typing import List

import instructor
from exa_py import Exa
from openai import OpenAI
from pydantic import BaseModel, field_validator

exa = Exa(os.environ["EXA_API_KEY"])
client = instructor.from_openai(OpenAI())

class ResearchPaper(BaseModel):
    title: str
    authors: List[str]
    key_findings: List[str]
    methodology: str

    @field_validator("title")
    @classmethod
    def validate_title(cls, v):
        if v.upper() != v:
            raise ValueError("Title must be in uppercase.")
        return v

    def __str__(self):
        return (
            f"Title: {self.title}\n"
            f"Authors: {', '.join(self.authors)}\n"
            f"Key Findings: {', '.join(self.key_findings)}\n"
            f"Methodology: {self.methodology}"
        )

class ResearchAnalysis(BaseModel):
    papers: List[ResearchPaper]
    common_themes: List[str]
    future_directions: str

    def __str__(self):
        return (
            f"Common Themes:\n- {', '.join(self.common_themes)}\n"
            f"Future Directions: {self.future_directions}\n"
            f"Analyzed Papers:\n" + "\n".join(str(paper) for paper in self.papers)
        )

# Search for recent AI ethics research papers
search_results = exa.search_and_contents(
    "Recent AI ethics research papers",
    type="neural",
    text=True,
    num_results=5,  # Limit to 5 papers for this example
)

# Combine all search results into one string
combined_results = "\n\n".join([result.text for result in search_results.results])

structured_output = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=ResearchAnalysis,
    max_retries=5,
    messages=[
        {
            "role": "user",
            "content": f"Analyze the following AI ethics research papers and provide a structured summary:\n\n{combined_results}",
        }
    ],
)

print(structured_output)
```

By using pydantic's `field_validator`, we can create our own rules to validate each field to be exactly what we want, so that we can work with predictable data even though we are using an LLM. Additionally, implementing the `__str__()` method allows for more readable and convenient output formatting. Read more about different pydantic validators [here](https://docs.pydantic.dev/latest/concepts/validators/#field-validators). Because we don't specify that the `Title` should be in uppercase in the prompt, this will result in *at least* two API calls. You should avoid using `field_validator`s as the *only* means to get the data in the right format; instead, you should include instructions in the prompt, such as specifying that the `Title` should be in uppercase/all-caps.

This advanced example demonstrates how to use Exa and Instructor to analyze multiple research papers, extract structured information, and provide a comprehensive summary of the findings.

## 5. Streaming Structured Outputs

Instructor also supports streaming structured outputs, which is useful for getting partial results as they're generated. (Streaming does not support validators due to the nature of streaming responses; you can read more about it [here](https://python.useinstructor.com/concepts/partial/).)

To make the output easier to see, we will use the [rich](https://pypi.org/project/rich/) Python package. It should already be installed, but if it isn't, just run `pip install rich`.
```python Python
import os
from typing import List

import instructor
from exa_py import Exa
from openai import OpenAI
from pydantic import BaseModel
from rich.console import Console

exa = Exa(os.environ["EXA_API_KEY"])
client = instructor.from_openai(OpenAI())

class AIEthicsInsight(BaseModel):
    topic: str
    description: str
    ethical_implications: List[str]

    def __str__(self):
        return (
            f"Topic: {self.topic}\n"
            f"Description: {self.description}\n"
            f"Ethical Implications:\n- {', '.join(self.ethical_implications or [])}"
        )

# Search for recent AI ethics research papers
search_results = exa.search_and_contents(
    "Recent AI ethics research papers",
    type="neural",
    text=True,
    num_results=5,  # Limit to 5 papers for this example
)

# Combine all search results into one string
combined_results = "\n\n".join([result.text for result in search_results.results])

structured_output = client.chat.completions.create_partial(
    model="gpt-3.5-turbo",
    response_model=AIEthicsInsight,
    messages=[
        {
            "role": "user",
            "content": f"Provide insights on AI ethics based on the following research:\n\n{combined_results}",
        }
    ],
    stream=True,
)

console = Console()

for output in structured_output:
    console.clear()
    console.print(output)
    if (
        output.topic
        and output.description
        and output.ethical_implications is not None
        and len(output.ethical_implications) >= 4
    ):
        break
```

```Text stream output
topic='AI Ethics in Mimetic Models' description='Exploring the ethical implications of AI that simulates the decisions and behavior of specific individuals, known as mimetic models, and the social impact of their availability in various domains such as game-playing, text generation, and artistic expression.' ethical_implications=['Deception Concerns: Mimetic models can potentially be used for deception, leading to misinformation and challenges in distinguishing between a real individual and a simulated model.', 'Normative Issues: Mimetic models raise normative concerns related to the interactions between the target individual, the model operator, and other entities that interact with the model, impacting transparency, authenticity, and ethical considerations in various scenarios.', 'Preparation and End-Use: Mimetic models can be used as preparation for real-life interactions or as an end in themselves, affecting interactions, personal relationships, labor dynamics, and audience engagement, leading to questions about consent, labor devaluation, and reputation consequences.', ''] Final Output: Topic: AI Ethics in Mimetic Models Description: Exploring the ethical implications of AI that simulates the decisions and behavior of specific individuals, known as mimetic models, and the social impact of their availability in various domains such as game-playing, text generation, and artistic expression. Ethical Implications: - Deception Concerns: Mimetic models can potentially be used for deception, leading to misinformation and challenges in distinguishing between a real individual and a simulated model. - Normative Issues: Mimetic models raise normative concerns related to the interactions between the target individual, the model operator, and other entities that interact with the model, impacting transparency, authenticity, and ethical considerations in various scenarios.
- Preparation and End-Use: Mimetic models can be used as preparation for real-life interactions or as an end in themselves, affecting interactions, personal relationships, labor dynamics, and audience engagement, leading to questions about consent, labor devaluation, and reputation consequences.
```

This example shows how to stream partial results and break the loop when certain conditions are met.

## 6. Writing Results to CSV

After generating structured outputs, you might want to save the results for further analysis or record-keeping. Here's how you can write the results to a CSV file:

```python Python
import csv
import os
from typing import List

import instructor
from exa_py import Exa
from openai import OpenAI
from pydantic import BaseModel

exa = Exa(os.environ["EXA_API_KEY"])
client = instructor.from_openai(OpenAI())

class AIEthicsInsight(BaseModel):
    topic: str
    description: str
    ethical_implications: List[str]

# Search for recent AI ethics research papers
search_results = exa.search_and_contents(
    "Recent AI ethics research papers",
    type="neural",
    text=True,
    num_results=5,  # Limit to 5 papers for this example
)

# Combine all search results into one string
combined_results = "\n\n".join([result.text for result in search_results.results])

def write_to_csv(insights: List[AIEthicsInsight], filename: str = "ai_ethics_insights.csv"):
    with open(filename, mode='w', newline='', encoding='utf-8') as file:
        writer = csv.writer(file)
        writer.writerow(['Topic', 'Description', 'Ethical Implications'])

        for insight in insights:
            writer.writerow([
                insight.topic,
                insight.description,
                '; '.join(insight.ethical_implications)
            ])

    print(f"Results written to {filename}")

# Generate multiple insights
num_insights = 5
insights = []
for _ in range(num_insights):
    insight = client.chat.completions.create(
        model="gpt-3.5-turbo",
        response_model=AIEthicsInsight,
        messages=[
            {
                "role": "user",
                "content": f"Provide an insight on AI ethics based on the following research:\n\n{combined_results}",
            }
        ],
    )
    insights.append(insight)

# Write insights to CSV
write_to_csv(insights)
```

After running the code, you'll have a CSV file named "ai\_ethics\_insights.csv". Here's an example of what the contents might look like:

```csv
Topic,Description,Ethical Implications
Algorithmic Bias,"This research challenges the assumption that algorithms can replace human decision-making and remain unbiased. It identifies three forms of outrage-intellectual, moral, and political-when reacting to algorithmic bias and suggests practical approaches like clarifying language around bias, developing new auditing methods, and building certain capabilities in AI systems.",Potential perpetuation of existing biases if not addressed; Necessity for transparency in AI system development; Impact on fairness and justice in societal decision-making processes; Importance of inclusive stakeholder engagement in AI design and implementation
Algorithmic Bias and Ethical Interview,"Artificial intelligence and machine learning are used to offload decision making from humans, with a misconception that machines can be unbiased. This paper critiques this assumption and discusses forms of outrage towards algorithmic biases, identifying three types: intellectual, moral, and political outrage. It suggests practical approaches such as clarifying language around bias, auditing methods, and building specific capabilities to address biases.
The overall discussion urges for greater insight into conversations around algorithmic bias and its implications.","Algorithms can perpetuate and even amplify existing biases in data.; There can be a misleading assumption that machines are inherently fair and unbiased.; Algorithmic biases can trigger intellectual, moral, and political outrage, affecting societal trust in AI systems." Algorithmic Bias and Human Decision Making,"This research delves into the misconceptions surrounding the belief that algorithms can replace human decision-making because they are inherently fair and unbiased. The study highlights the flaws in this rationale by showing that algorithms are not free from bias. It explores three types of outrage—intellectual, moral, and political—that arise when people confront algorithmic bias. The paper recommends addressing algorithmic bias through clearer language, better auditing methods, and enhanced system capabilities.","Algorithms can perpetuate and exacerbate existing biases rather than eliminate them.; The misconception that algorithms are unbiased may lead to a false sense of security in their use.; There is a need for the AI community to adopt clearer language and terms when discussing bias to prevent misunderstanding and misuse.; Enhancing auditing methods and system capabilities can help identify and address biases.; Decisions made through biased algorithms can have unjust outcomes, affecting public trust and leading to social and ethical implications." Algorithmic Bias in AI,"Artificial intelligence and machine learning are increasingly used to offload decision making from people. In the past, one of the rationales for this replacement was that machines, unlike people, can be fair and unbiased. Evidence suggests otherwise, indicating that algorithms can be biased. The study investigates how bias is perceived in algorithmic decision-making, proposing clarity in the language around bias and suggesting new auditing methods for intelligent systems to address this concern.",Algorithms may inherit or exacerbate existing biases.; Misleading assumptions about AI's objectivity can lead to unfair outcomes.; Need for transparent language and robust auditing to mitigate bias. Algorithmic Bias in AI Systems,"This research explores the misconception that algorithms can replace humans in decision-making without bias. It sheds light on the absurdity of assuming that algorithms are inherently unbiased and discusses emotional responses to algorithmic bias. The study suggests clarity in language about bias, new auditing methods, and capacity-building in AI systems to address bias concerns.",Misleading perception of unbiased AI leading to potential unfairness in decision-making.; Emotional and ethical concerns due to algorithmic bias perceived unfairness.; Need for consistent auditing methods to ensure fairness in AI systems. ``` Instructor makes it possible to generate structured data that can be stored in tabular form, e.g. in a CRM or similar system. By combining Exa’s powerful search capabilities with Instructor’s predictable output generation, you can extract and analyze information from web content efficiently and accurately.
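If you later want to load the file back for analysis, any CSV reader works. A minimal sketch using pandas (pandas is an assumption here, not part of the example above):

```python
import pandas as pd

# Read back the insights written by write_to_csv above
df = pd.read_csv("ai_ethics_insights.csv")

# The implications column was joined with "; ", so split it back into lists
df["Ethical Implications"] = df["Ethical Implications"].str.split("; ")
print(df[["Topic", "Description"]].head())
```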
# Build a Retrieval Agent with LangGraph Source: https://docs.exa.ai/examples/getting-started-with-rag-in-langgraph *** ## What this doc covers * Brief intro to LangGraph * How to set up an agent in LangGraph with Exa search as a tool *** ## Guide This guide will show you how you can define and use Exa search within the LangGraph framework. This framework provides a straightforward way for you to define an AI agent and for it to retrieve high-quality, semantically matched content via Exa search. ## Brief Intro to LangGraph Before we dive into our implementation, a quick primer on the LangGraph framework. LangGraph is a powerful tool for building complex LLM-based agents. It allows for cyclical workflows, gives you granular control, and offers built-in persistence. This means you can create reliable agents with intricate logic, pause and resume execution, and even incorporate human oversight. Read more about LangGraph [here](https://langchain-ai.github.io/langgraph/). ## Our Research Assistant Workflow For our AI-powered research assistant, we're leveraging LangGraph's capabilities to create a workflow that combines an AI model (Claude) with a web search retrieval tool powered by Exa's API, to find, fetch, and analyze documents (in this case, research on climate tech). Here's a visual representation of our workflow: ![Alt text](https://files.readme.io/a2674bdce9b576860cd8eeec735ebd8959e8a8b41d4e5fab829dbbdcae37d6b0-Screenshot_2024-08-22_at_11.50.08.png) This diagram illustrates how our workflow takes advantage of LangGraph's cycle support, allowing the agent to repeatedly use tools and make decisions until it has gathered sufficient information to provide a final response. ## Let's break down what's happening in this simple workflow: 1. We start at the Entry Point with a user query (e.g., "Latest research papers on climate technology"). 2. The Agent (our AI model) receives the query and decides what to do next. 3. If the Agent needs more information, it uses the Web Search Retriever Tool to search for relevant documents. 4. The Web Search Retriever Tool fetches information using Exa's semantic search capabilities. 5. The Agent receives the fetched information and analyzes it. 6. This process repeats until the Agent has enough information to provide a final response. In the following sections, we'll explore the code implementation in detail, showing how we leverage LangGraph's features to create this advanced research assistant. ## 1. Prerequisites and Installation Before starting, ensure you have the required packages installed: ```shell pip install langchain-anthropic langchain-exa langgraph ``` Make sure to set up your API keys. For LangChain libraries, the environment variables should be named `ANTHROPIC_API_KEY` and `EXA_API_KEY` for Anthropic and Exa keys respectively. ```bash export ANTHROPIC_API_KEY= export EXA_API_KEY= ``` ## 2. Set Up Exa Search as a LangChain Tool After setting the environment variables, we can start configuring a search tool using `ExaSearchRetriever`. This tool ([read more here](https://api.python.langchain.com/en/latest/retrievers/langchain_exa.retrievers.ExaSearchRetriever.html)) will help retrieve relevant documents based on a query. First, we need to import the required libraries: ```python from typing import List from langchain_exa import ExaSearchRetriever from langchain_core.prompts import PromptTemplate from langchain_core.runnables import RunnableLambda from langchain_core.tools import tool ``` After we have imported the necessary libraries, we need to define and register a tool so that the agent knows what tools it can use. We use LangChain's `tool` decorator, which you can read more about [here](https://python.langchain.com/v0.1/docs/modules/tools/custom_tools/#tool-decorator). The decorator uses the function name as the tool name. The docstring provides the agent with a tool description.
The `retriever` is where we initialize the Exa search retriever and configure it with parameters such as `highlights=True`. You can read more about all the available parameters [here](https://docs.exa.ai/reference/python-sdk-specification#input-parameters-1). ```python @tool def retrieve_web_content(query: str) -> List[str]: """Function to retrieve usable documents for AI assistant""" # Initialize the Exa Search retriever retriever = ExaSearchRetriever(k=3, highlights=True) # Define how to extract relevant metadata from the search results document_prompt = PromptTemplate.from_template( """ {url} {highlights} """ ) # Create a chain to process the retrieved documents document_chain = ( RunnableLambda( lambda document: { "highlights": document.metadata.get("highlights", "No highlights"), "url": document.metadata["url"], } ) | document_prompt ) # Execute the retrieval and processing chain retrieval_chain = retriever | document_chain.map() # Retrieve and return the documents documents = retrieval_chain.invoke(query) return documents ``` Here, `ExaSearchRetriever` is set to fetch 3 documents; it reads the `EXA_API_KEY` environment variable we set earlier. Then we use LangChain's `PromptTemplate` to structure the results from Exa in a more AI-friendly way. Creating and using this template is optional, but recommended. Read more about PromptTemplate [here](https://python.langchain.com/v0.1/docs/modules/model_io/prompts/quick_start/). We also use a `RunnableLambda` to extract the necessary metadata (like URL and highlights) from the search results and format it using the prompt template. Finally, we run the retrieval and processing chain and store the results in the `documents` variable, which is returned. ## 3. Creating a Toolchain with LangGraph Now let's set up the complete toolchain using LangGraph. ```python from typing import Literal from langchain_anthropic import ChatAnthropic from langchain_core.messages import HumanMessage from langgraph.checkpoint.memory import MemorySaver from langgraph.graph import END, MessagesState, StateGraph from langgraph.prebuilt import ToolNode # Define and bind the AI model model = ChatAnthropic(model="claude-3-5-sonnet-20240620", temperature=0).bind_tools([retrieve_web_content]) ``` Here, `ChatAnthropic` is bound to our Exa search tool, ready to generate responses based on the context provided. ## Define Workflow Functions Create functions to manage the workflow: ```python # Determine whether to continue or end def should_continue(state: MessagesState) -> Literal["tools", END]: messages = state["messages"] last_message = messages[-1] return "tools" if last_message.tool_calls else END # Function to generate model responses def call_model(state: MessagesState): messages = state["messages"] response = model.invoke(messages) return {"messages": [response]} ``` ## Build the Workflow Graph ```python # Define the workflow graph workflow = StateGraph(MessagesState) workflow.add_node("agent", call_model) workflow.add_node("tools", ToolNode([retrieve_web_content])) workflow.set_entry_point("agent") workflow.add_conditional_edges("agent", should_continue) workflow.add_edge("tools", "agent") # Initialize memory checkpointer = MemorySaver() # Compile the workflow into a runnable app = workflow.compile(checkpointer=checkpointer) ``` This sets up a state machine that switches between generating responses and retrieving documents, with memory to maintain context (this is a key advantage of LangGraph).
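Before running the workflow, you can double-check that the compiled graph is wired the way the diagram above shows. A quick sketch, assuming the optional `grandalf` package is installed for ASCII rendering:

```python
# Visualize the compiled workflow as ASCII art
# (assumes the optional grandalf package: pip install grandalf)
app.get_graph().print_ascii()
```

You should see the agent and tools nodes connected in a cycle, matching the diagram.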
## 4. Running Your Workflow We are approaching the finish line of our Exa-powered search agent. ## Invoke and run ```python final_state = app.invoke( {"messages": [HumanMessage(content="Latest research papers on climate technology")]}, config={"configurable": {"thread_id": 44}}, ) print(final_state["messages"][-1].content) ``` ```Text output Thank you for your patience. I've retrieved some information about the latest research papers on climate technology. Let me summarize the key findings for you: 1. Research and Development Investment Strategy for Paris Climate Agreement: - Source: Nature Communications (2023) - URL: https://www.nature.com/articles/s41467-023-38620-4.pdf - Key points: - The study focuses on research and development (R&D) investment strategies to achieve the goals of the Paris Climate Agreement. - It highlights that some low-carbon options are still not available at large scale or are too costly. - The research emphasizes the importance of government decisions in incentivizing R&D for climate technologies. - Current assessments of climate neutrality often don't include research-driven innovation, which this paper addresses. 2. Impact of Green Innovation on Emissions: - Source: SSRN (Social Science Research Network) - URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4212567 - Key points: - This study examines the effect of green innovation on direct and indirect emissions across various sectors worldwide. - Surprisingly, it finds that green innovation does not significantly affect emissions in the short term (one year after filing a green patent) or medium term (three to five years after filing). - The research touches on concepts like the path dependence of innovation and the Jevons paradox in relation to green technology. 3. Comprehensive Study on Green Technology: - Source: Taylor & Francis Online - URL: https://www.tandfonline.com/doi/pdf/10.1080/1331677X.2023.2178017 - Key points: - This paper provides a comprehensive review of literature on green technology. - It includes sections on research methods, measurement of variables, and data analysis techniques related to green technology. - The study offers policy recommendations and discusses limitations in the field of green technology research. These papers represent some of the latest research in climate technology, covering topics from R&D investment strategies to the actual impact of green innovations on emissions. They highlight the complexity of the field, showing that while there's significant focus on developing new technologies, the real-world impact of these innovations may be more nuanced than expected. Would you like more information on any specific aspect of these studies or climate technology in general? ``` ## 5. (Optional) Streaming the output ```python for chunk in app.stream({"messages": [HumanMessage(content="Latest research papers on climate technology")]}, config={"configurable": {"thread_id": 42}}): print(chunk, end="|", flush=True) ``` Or asynchronously: ```python import asyncio async def async_streamer(): async for chunk in app.astream({"messages": [HumanMessage(content="Latest research papers on climate technology")]}, config={"configurable": {"thread_id": 42}}): print(chunk, end="|", flush=True) asyncio.run(async_streamer()) ``` That's it! You have now created a super-powered search agent with the help of LangGraph and Exa. Modify the code to fit your needs and you can create an Exa-powered agent for any task you can think of.
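For example, narrowing the retriever is often the quickest way to specialize the agent. A sketch of a research-focused variant (the filter parameter names are assumptions about your installed `langchain_exa` version; check the retriever reference linked earlier):

```python
# Hypothetical: bias retrieval toward recent academic sources
retriever = ExaSearchRetriever(
    k=5,
    highlights=True,
    include_domains=["arxiv.org", "nature.com"],  # assumed parameter
    start_published_date="2024-01-01",  # assumed parameter
)
```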
## Full Code ```python from typing import List, Literal from langchain_anthropic import ChatAnthropic from langchain_core.messages import HumanMessage from langchain_core.prompts import PromptTemplate from langchain_core.runnables import RunnableLambda from langchain_core.tools import tool from langchain_exa import ExaSearchRetriever from langgraph.checkpoint.memory import MemorySaver from langgraph.graph import END, MessagesState, StateGraph from langgraph.prebuilt import ToolNode @tool def retrieve_web_content(query: str) -> List[str]: """Function to retrieve usable documents for AI assistant""" # Initialize the Exa Search retriever retriever = ExaSearchRetriever(k=3, highlights=True) # Define how to extract relevant metadata from the search results document_prompt = PromptTemplate.from_template( """ {url} {highlights} """ ) # Create a chain to process the retrieved documents document_chain = ( RunnableLambda( lambda document: { "highlights": document.metadata.get("highlights", "No highlights"), "url": document.metadata["url"], } ) | document_prompt ) # Execute the retrieval and processing chain retrieval_chain = retriever | document_chain.map() # Retrieve and return the documents documents = retrieval_chain.invoke(query) return documents # Define and bind the AI model model = ChatAnthropic(model="claude-3-5-sonnet-20240620", temperature=0).bind_tools( [retrieve_web_content] ) # Determine whether to continue or end def should_continue(state: MessagesState) -> Literal["tools", END]: messages = state["messages"] last_message = messages[-1] return "tools" if last_message.tool_calls else END # Function to generate model responses def call_model(state: MessagesState): messages = state["messages"] response = model.invoke(messages) return {"messages": [response]} # Define the workflow graph workflow = StateGraph(MessagesState) workflow.add_node("agent", call_model) workflow.add_node("tools", ToolNode([retrieve_web_content])) workflow.set_entry_point("agent") workflow.add_conditional_edges("agent", should_continue) workflow.add_edge("tools", "agent") # Initialize memory checkpointer = MemorySaver() # Compile the workflow into a runnable app = workflow.compile(checkpointer=checkpointer) final_state = app.invoke( { "messages": [ HumanMessage(content="Latest research papers on climate technology") ] }, config={"configurable": {"thread_id": 44}}, ) print(final_state["messages"][-1].content) ``` Full code in Google Colab [here](https://docs.exa.ai/reference/getting-started-with-rag-in-langgraph). # Building a Hallucination Checker Source: https://docs.exa.ai/examples/identifying-hallucinations-with-exa Learn how to build an AI-powered system that identifies and verifies claims using Exa and LangGraph. *** We'll build a hallucination detection system using Exa's search capabilities to verify AI-generated claims. The system works in three steps: 1. Extract claims from text 2. Search for evidence using Exa 3. Verify claims against evidence This combines RAG with LangGraph to fact-check AI outputs and reduce hallucinations by grounding claims in real-world data. *** ## Get Started Install the required packages: ```shell pip install langchain-core langgraph langchain-exa langchain-anthropic pydantic ``` You'll need both an Exa API key and an Anthropic API key to run this example. You can get your Anthropic API key [here](https://console.anthropic.com/).
Set up your API keys: ```python Python import os import re import json from typing import Dict, Any, List, Annotated from pydantic import BaseModel from langchain_core.tools import StructuredTool from langgraph.graph import StateGraph, END from langgraph.graph.message import add_messages from langchain_core.messages import HumanMessage, SystemMessage, AIMessage from langchain_exa import ExaSearchRetriever from langchain_core.runnables import RunnableLambda from langchain_core.prompts import PromptTemplate from langchain_anthropic import ChatAnthropic # Check for API keys assert os.getenv("EXA_API_KEY"), "Please set the EXA_API_KEY environment variable" assert os.getenv("ANTHROPIC_API_KEY"), "Please set the ANTHROPIC_API_KEY environment variable" # Set up the LLM (ChatAnthropic) llm = ChatAnthropic(model="claude-3-5-sonnet-20240620", temperature=0) ``` First, we'll create functions to extract factual claims from the text: ```python Python def extract_claims_regex(text: str) -> List[str]: """Fallback function to extract claims using regex.""" pattern = r'([A-Z][^.!?]*?[.!?])' matches = re.findall(pattern, text) # Each match already ends in sentence punctuation, so no suffix is needed return [match.strip() for match in matches] def extract_claims(text: str) -> List[str]: """Extract factual claims from the text using an LLM.""" system_message = SystemMessage(content=""" You are an expert at extracting claims from text. Your task is to identify and list all claims present, true or false, in the given text. Each claim should be a single, verifiable statement. Consider various forms of claims, including assertions, statistics, and quotes. Do not skip any claims, even if they seem obvious. Do not include in the list 'The text contains a claim that needs to be checked for hallucinations' - this is not a claim. Present the claims as a JSON array of strings, and do not include any additional text. """) human_message = HumanMessage(content=f"Extract factual claims from this text: {text}") response = llm.invoke([system_message, human_message]) try: claims = json.loads(response.content) if not isinstance(claims, list): raise ValueError("Response is not a list") except (json.JSONDecodeError, ValueError): # Fallback to regex extraction if LLM response is not valid JSON claims = extract_claims_regex(text) return claims ``` We include a regex-based fallback method in case the LLM response isn't properly formatted. This ensures our system remains robust even if the LLM output is unexpected. Create a function to search for evidence using Exa: ```python Python def exa_search(query: str) -> List[str]: """Function to retrieve usable documents for AI assistant.""" search = ExaSearchRetriever(k=5, text=True) print("Query: ", query) document_prompt = PromptTemplate.from_template( """ {url} {text} """ ) parse_info = RunnableLambda( lambda document: { "url": document.metadata["url"], "text": document.page_content or "No text available", } ) document_chain = (parse_info | document_prompt) search_chain = search | document_chain.map() documents = search_chain.invoke(query+".\n Here is a web page to help verify this claim:") print("Documents: ", documents) return [str(doc) for doc in documents] ``` We format each source with its URL and content for easy reference in the verification step. The print statements help with debugging and understanding the search process.
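Before wiring these helpers into a graph, it's worth sanity-checking them on a small input. An illustrative run (the exact sources returned will vary with Exa's index):

```python
# Quick manual check of the extraction and search helpers defined above
sample = "The Eiffel Tower was completed in 1889. It is located in Paris."
for claim in extract_claims(sample):
    sources = exa_search(claim)
    print(f"{claim} -> retrieved {len(sources)} sources")
```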
Build a function to analyze the evidence and assess each claim: ```python Python def verify_claim(claim: str, sources: List[str]) -> Dict[str, Any]: """Verify a single claim using combined Exa search sources.""" if not sources: # If no sources are returned, default to insufficient information return { "claim": claim, "assessment": "Insufficient information", "confidence_score": 0.5, "supporting_sources": [], "refuting_sources": [] } # Combine the sources into one text combined_sources = "\n\n".join(sources) system_message = SystemMessage(content=""" You are an expert fact-checker. Given a claim and a set of sources, determine whether the claim is supported, refuted, or if there is insufficient information in the sources to make a determination. For your analysis, consider all the sources collectively. Provide your answer as a JSON object with the following structure: { "claim": "...", "assessment": "supported" or "refuted" or "Insufficient information", "confidence_score": a number between 0 and 1 (1 means fully confident the claim is true, 0 means fully confident the claim is false), "supporting_sources": [list of sources that support the claim], "refuting_sources": [list of sources that refute the claim] } Do not include any additional text. """) human_message = HumanMessage(content=f""" Claim: "{claim}" Sources: {combined_sources} Based on the above sources, assess the claim. """) response = llm.invoke([system_message, human_message]) try: result = json.loads(response.content) if not isinstance(result, dict): raise ValueError("Response is not a JSON object") except (json.JSONDecodeError, ValueError): # If parsing fails, default to insufficient information result = { "claim": claim, "assessment": "Insufficient information", "confidence_score": 0.5, "supporting_sources": [], "refuting_sources": [] } return result ``` The verifier includes robust error handling and defaults to "Insufficient information" if there are issues with the LLM response or source processing. Set up the LangGraph workflow to orchestrate the process: ```python Python def hallucination_check(text: str) -> Dict[str, Any]: """Check a given text for hallucinations using Exa search.""" claims = extract_claims(text) claim_verifications = [] for claim in claims: sources = exa_search(claim) verification_result = verify_claim(claim, sources) claim_verifications.append(verification_result) return { "claims": claim_verifications } def hallucination_check_tool(text: str) -> Dict[str, Any]: """Assess the given text for hallucinations using Exa search.""" return hallucination_check(text) structured_tool = StructuredTool.from_function( func=hallucination_check_tool, name="hallucination_check", description="Assess the given text for hallucinations using Exa search." 
) class State(BaseModel): messages: Annotated[List, add_messages] analysis_result: Dict[str, Any] = {} def call_model(state: State): # Simulate the assistant calling the tool return {"messages": state.messages + [AIMessage(content="Use hallucination_check tool", additional_kwargs={"tool_calls": [{"type": "function", "function": {"name": "hallucination_check"}}]})]} def run_tool(state: State): text_to_check = next((m.content for m in reversed(state.messages) if isinstance(m, HumanMessage)), "") tool_output = structured_tool.invoke(text_to_check) return {"messages": state.messages + [AIMessage(content=str(tool_output))], "analysis_result": tool_output} def use_analysis(state: State) -> str: return "tools" workflow = StateGraph(State) workflow.add_node("agent", call_model) workflow.add_node("tools", run_tool) workflow.add_node("process_result", lambda x: x) workflow.set_entry_point("agent") workflow.add_conditional_edges("agent", use_analysis, { "tools": "tools" }) workflow.add_edge("tools", "process_result") workflow.add_edge("process_result", END) graph = workflow.compile() ``` Let's try it with a sample text about the Eiffel Tower: ```python Python initial_state = State(messages=[ SystemMessage(content="You are a helpful assistant."), HumanMessage(content="Check this text for hallucinations: The Eiffel Tower, an iconic iron lattice structure located in Paris, was originally constructed as a giant sundial in 1822.") ]) final_state = graph.invoke(initial_state) ``` Sample output: ``` Workflow executed successfully Final state: Messages: SystemMessage: You are a helpful assistant.... HumanMessage: Check this text for hallucinations: The Eiffel Tower, an iconic iron lattice structure located in Pa... AIMessage: Use hallucination_check tool... AIMessage: {'claims': [{'claim': 'The Eiffel Tower is an iconic iron lattice structure', 'assessment': 'support... Analysis Result: Claim: The Eiffel Tower is an iconic iron lattice structure Assessment: supported Confidence Score: 1 Supporting Sources: - https://www.toureiffel.paris/en/news/130-years/what-eiffel-tower-made... - https://thechalkface.net/resources/melting_the_eiffel_tower.pdf... - https://datagenetics.com/blog/april22016/index.html... - https://engineering.purdue.edu/MSE/aboutus/gotmaterials/Buildings/patel.html... - https://www.toureiffel.paris/en/news/130-years/how-long-can-tower-last... Refuting Sources: Claim: The Eiffel Tower is located in Paris Assessment: supported Confidence Score: 1 Supporting Sources: - https://hoaxes.org/weblog/comments/is_the_eiffel_tower_copyrighted... - https://www.toureiffel.paris/en... - http://www.eiffeltowerguide.com/... - https://www.toureiffel.paris/en/the-monument... Refuting Sources: Claim: The Eiffel Tower was originally constructed as a giant sundial Assessment: refuted Confidence Score: 0.05 Supporting Sources: Refuting Sources: - https://www.whycenter.com/why-was-the-eiffel-tower-built/... - https://www.sciencekids.co.nz/sciencefacts/engineering/eiffeltower.html... - https://corrosion-doctors.org/Landmarks/eiffel-history.htm... Claim: The Eiffel Tower was constructed in 1822 Assessment: refuted Confidence Score: 0 Supporting Sources: Refuting Sources: - https://www.eiffeltowerfacts.org/eiffel-tower-history/... - https://www.whycenter.com/why-was-the-eiffel-tower-built/... - https://www.sciencekids.co.nz/sciencefacts/engineering/eiffeltower.html... 
``` Through this combination of Exa's search capabilities and LangGraph's workflow management, we've created a powerful system for identifying and verifying claims in any text. The system successfully identified both true claims (structure and location) and false claims (construction date and purpose) about the Eiffel Tower. # Job Search with Exa Source: https://docs.exa.ai/examples/job-search-with-exa Tutorial for simple Exa searches on our front-end. ## What This Doc Covers * The problem with traditional job search tools * How to use Exa, an AI-powered search engine, for job hunting * Other cool ways to use Exa beyond job searching Finding a job is way harder than it should be. Tools like LinkedIn, Handshake, or Google are supposed to solve this problem, but they're filled with too many noisy results to actually be useful. Here's how you can use AI to find hundreds of hidden job listings in less than 5 minutes. At a high level, Exa is a search engine that understands your query. So, when searching for "ML internships for new grads in San Francisco", here's what gets returned: ![](https://mintlify.s3.us-west-1.amazonaws.com/exa-52/images/5d5309c-Screenshot_2024-07-18_at_12.44.34.png) And by filtering for recently posted listings, you can make sure the positions are new and haven't been filled yet. But there's actually an even better way to take advantage of Exa. You can just paste a job posting and get similar ones: ![](https://mintlify.s3.us-west-1.amazonaws.com/exa-52/images/eb97595-Screenshot_2024-07-18_at_12.40.27.png) ## More than just jobs Job search is really just one use case of Exa. Exa is a search engine built using novel representation learning techniques. For example, Exa excels at finding similar things. * **Shopping**: if you want a similar (but cheaper) shirt, paste a link to your shirt and it'll give you hundreds like it * **Research**: paste a link to a research paper to find hundreds of other relevant papers * **Startups**: if you're building a startup, find your competitors by searching a link to your startup
Your website will automatically update to get the newest content on whatever topic you choose. ![](https://mintlify.s3.us-west-1.amazonaws.com/exa-52/images/315a2e9-Screenshot_2024-07-14_at_7.49.35_PM.png) ## Getting Started First, grab a free Exa API key by signing up [here](https://exa.ai/). You get 1000 free queries a month. Next, fork (clone) our [template](https://replit.com/@olafblitz/exa-hackernews-demo-nodejs?v=1) on Replit. Once you've forked the template, go to the lower left corner of the screen and scroll through the options until you see "Secrets" (where you manage environment variables like API keys). ![Click on Secrets](https://mintlify.s3.us-west-1.amazonaws.com/exa-52/images/0screenshot_2024-05-15_at_11.12.21___pm.png) Add your Exa API key as a secret named "EXA\_API\_KEY" (original, we know). ![Add your API key!](https://mintlify.s3.us-west-1.amazonaws.com/exa-52/images/0screenshot_2024-05-15_at_11.13.34___pm.png) After you've added your API key, click the green Run button in the top center of the window. ![Run button](https://mintlify.s3.us-west-1.amazonaws.com/exa-52/images/0screenshot_2024-05-15_at_10.08.03___pm.png) After a few seconds, a Webview window will pop up with your website. You'll see a website that vaguely resembles Hacker News. It's a basic Express.js app with some CSS styling. ![What you should see](https://mintlify.s3.us-west-1.amazonaws.com/exa-52/images/0screenshot_2024-05-15_at_10.12.09___pm.png) ## How Exa works In the index.js file (should be open by default), scroll to **line 19**. This is the brains of the site. It's where we call the Exa API with a custom prompt to get back Hacker News-style content. ```javascript const response = await fetch('https://api.exa.ai/search', { method: 'POST', headers: { 'Content-Type': 'application/json', // Add your API key named "EXA_API_KEY" to Repl.it Secrets 'x-api-key': process.env.EXA_API_KEY, }, body: JSON.stringify({ // change this prompt! query: 'here is a really interesting tech article:', // specify the maximum number of results to retrieve (10 is the limit for free API users) numResults: 10, // Set the start date for the article search startPublishedDate: startPublishedDate, // Set the end date for the article search endPublishedDate: endPublishedDate, }), }); ``` The prompt is set to "here is a really interesting tech article:". This is because of how Exa works behind the scenes. Exa uses embeddings to help predict which links would naturally follow a query. For example, on the Internet, you'll frequently see people recommend great content like this: "this tutorial really helped me understand linked lists: linkedlisttutorial.com". When you prompt Exa, you pretend to be someone recommending what you're looking for. In this case, our prompt nudges Exa to find links that someone would share when discussing a "really interesting tech article". Check out the [results](https://exa.ai/search?q=here%20is%20a%20really%20interesting%20tech%20article%3A\&filters=%7B%22numResults%22%3A30%2C%22useAutoprompt%22%3Afalse%2C%22domainFilterType%22%3A%22include%22%7D) Exa returns for our prompt. Aren't they nice?
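The same recommendation-style prompting works outside the Replit template too. Here's a quick sketch using the `exa_py` SDK from the other examples in these docs (illustrative only):

```python
from exa_py import Exa

exa = Exa("YOUR_API_KEY_HERE")

# Phrase the query as if you were recommending the content you want
result = exa.search("here is a really interesting tech article:", num_results=10)
for r in result.results:
    print(r.title, r.url)
```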
More example prompts to help you get a sense of prompting with Exa: * [this gadget saves me so much time:](https://exa.ai/search?c=all\&q=this%20gadget%20saves%20me%20so%20much%20time%3A\&filters=%7B%22domainFilterType%22%3A%22include%22%2C%22timeFilterOption%22%3A%22any%5Ftime%22%2C%22activeTabFilter%22%3A%22all%22%7D) * [i loved my wedding dress from this boutique:](https://exa.ai/search?c=all\&q=i%20loved%20my%20wedding%20dress%20from%20this%20boutique%3A\&filters=%7B%22domainFilterType%22%3A%22include%22%2C%22timeFilterOption%22%3A%22any%5Ftime%22%2C%22activeTabFilter%22%3A%22all%22%7D) * [this video helped me understand attention mechanisms:](https://exa.ai/search?c=all\&q=this%20video%20helped%20me%20understand%20attention%20mechanisms%3A\&filters=%7B%22domainFilterType%22%3A%22include%22%2C%22timeFilterOption%22%3A%22any%5Ftime%22%2C%22activeTabFilter%22%3A%22all%22%7D) More examples in the Exa [docs](/reference/the-metaphor-index). At this point, please craft your own Exa prompt for your Hacker News site. It can be about anything you find interesting. Example ideas: * [this is a really exciting machine learning paper:](https://exa.ai/search?c=all\&q=this%20is%20a%20really%20exciting%20machine%20learning%20paper%3A\&filters=%7B%22domainFilterType%22%3A%22include%22%2C%22timeFilterOption%22%3A%22past%5Fday%22%2C%22activeTabFilter%22%3A%22all%22%7D) * [here's a delicious new recipe:](https://exa.ai/search?c=all\&q=here%27s%20a%20delicious%20new%20recipe%3A\&filters=%7B%22domainFilterType%22%3A%22include%22%2C%22timeFilterOption%22%3A%22any%5Ftime%22%2C%22activeTabFilter%22%3A%22all%22%7D) * [this company just got acquired:](https://exa.ai/search?c=all\&q=this%20company%20just%20got%20acquired%3A\&filters=%7B%22domainFilterType%22%3A%22include%22%2C%22timeFilterOption%22%3A%22past%5Fday%22%2C%22activeTabFilter%22%3A%22all%22%7D) * [here's how the basketball game went:](https://exa.ai/search?c=all\&q=here%27s%20how%20the%20basketball%20game%20went%3A\&filters=%7B%22domainFilterType%22%3A%22include%22%2C%22timeFilterOption%22%3A%22past%5Fday%22%2C%22activeTabFilter%22%3A%22all%22%7D) Once you have your prompt, replace the old one (line 28 of index.js). Hit the Stop button (where the Run button was) and hit Run again to restart your site with the new prompt. Feel free to keep tweaking your prompt until you get results you like. ## Customize your site Now, other things you can modify in the site template include the time window to search over, the number of results to return, the text on the site (title, description, footer), and styling (colors, fonts, etc.). By default, the site asks the Exa API to get the ten most relevant results from the last 24 hours every time you visit the site. On the free plan, you can only get up to ten results, so you'll have to sign up for an Exa plan to increase this. You *can* tweak the time window though. Lines 12 to 17 in index.js are where we set the time window. You can adjust this as you like to get results from the last week, month, year, etc. Note that you don't have to search starting from the current date. You can search between any arbitrary dates, like October 31, 2015 and January 1, 2018. To adjust the site title and other text, go to line 51 in index.js where the dynamic HTML starts. You can Ctrl-F "change" to find all the places where you can edit the text. If orange isn't your vibe, go to styles.css. To get there, go to the left side panel on Replit and click on the "public" folder.
To keep your site running all the time, you'll need to deploy it on Replit using Deployments. Click Deploy in the top right corner and select Autoscale. You can leave the default settings and click Deploy. This does cost money though. Alternatively, you can deploy the site on your own. It's only two files (index.js and public/styles.css). Well, there you have it! You just made your very own Hacker News-style site using the Exa API. Share it on X and [tag us](https://x.com/ExaAILabs) for a retweet! # Phrase Filters: Niche Company Finder Source: https://docs.exa.ai/examples/niche-company-finder-with-phrase-filters *** ## What this doc covers 1. What Phrase filters are and how they work 2. Using 'Phrase Filters' to find specific results, in this case filtering by a foreign company suffix In this simple example, we'll demonstrate a company discovery search that helps find relevant companies incorporated in Germany (and a few nearby countries) via Phrase Filters. This example will use the fact that companies incorporated in these locations [have a suffix of GmbH](https://en.wikipedia.org/wiki/GmbH), which is a term in the region similar to the US 'incorporated'. ## How Phrase Filters work Exa's search combines semantic relevance with precise filtering: a neural query first retrieves contextually relevant documents, then a phrase filter refines these results by checking for specific text in the content. This two-stage approach delivers highly targeted outputs by leveraging both semantic understanding and exact text matching. ![](https://mintlify.s3.us-west-1.amazonaws.com/exa-52/images/1864e57-Screenshot_2024-07-16_at_05.41.13.png) ## Running a query with phrase filter Using Phrase Filters is super simple. As usual, install the `exa_py` library with `pip install exa_py`. Then instantiate the client: ```Python Python # Now, import the Exa class and pass your API key to it. from exa_py import Exa my_exa_api_key = "YOUR_API_KEY_HERE" exa = Exa(my_exa_api_key) ``` Make a query, in this example searching for the most innovative climate tech companies. To use Phrase Filters, specify a string corresponding to the `include_text` input parameter: ```Python Python result = exa.search_and_contents( "Here is an innovative climate technology company", type="neural", num_results=10, text=True, include_text=["GmbH"] ) print(result) ``` Which outputs: ``` { "results": [ { "score": 0.4694329500198364, "title": "Sorption Technologies |", "id": "https://sorption-technologies.com/", "url": "https://sorption-technologies.com/", "publishedDate": "2024-02-10", "author": null, "text": "" }, { "score": 0.46858930587768555, "title": "FenX | VentureRadar", "id": "https://www.ventureradar.com/organisation/FenX/364b6fb7-0033-4c88-a4e9-9c3b1f530d72", "url": "https://www.ventureradar.com/organisation/FenX/364b6fb7-0033-4c88-a4e9-9c3b1f530d72", "publishedDate": "2023-03-28", "author": null, "text": "Follow\n\nFollowing\n\nLocation: Switzerland\n\nFounded in 2019\n\nPrivate Company\n\n\"FenX is a Spinoff of ETH Zurich tackling the world’s energy and greenhouse gas challenges by disrupting the building insulation market. Based on a innovative foaming technique, the company produces high-performance insulation foams made from abandoned waste materials such as fly ash from coal power stations.
The final products are fully recyclable, emit low CO2 emissions and are economically competitive.\"\n Description Source: VentureRadar Research / Company Website\n\nExport Similar Companies Similar Companies\n\nCompany \n Country\n Status\n Description\n\nVecor Australia Australia n/a Every year the world’s coal-fired power stations produce approximately 1 billion tonnes of a very fine ash called fly ash. This nuisance ash, which resembles smoke, can be... MCC Technologies USA Private MCC Technologies builds, owns and operates processing plants utilizing coal fly ash waste from landfills and ash ponds. The company processes large volumes of low-quality Class F... Climeworks GmbH Switzerland Private Climeworks has developed an ecologically and economically attractive method to extract CO2 from ambient air. Our goal is to deliver CO2 for the production of synthetic liquid... Errcive Inc USA Private The company is involved in developing a novel fly ash based material to mitigate exhaust pollution. The commercial impact of the work is to allow: the reduction of exhaust fumes... 4 Envi Denmark n/a Danish 4 Envi develops a system for the cleaning and re-use of biomass-fuelled plant’s fly ash. After cleaning, the ash and some of its components can be reused as fertilizers,... Neolithe France n/a Néolithe wants to reduce global greenhouse gas emissions by 5% by tackling a problem that concerns us all: waste treatment! They transform non-recyclable waste into aggregates...\n\nShow all\n\nWebsite Archive\n\nInternet Archive snapshots for |\n\nhttps://fenx.ch/\n\nThe archive allows you to go back in time and view historical versions of the company website\n\nThe site\n\nhttps://fenx.ch/\n\nwas first archived on\n\n4th Jul 2019\n\nIs this your company? Claim this profile andupdate details for free\n\nSub-Scores\n\nPopularity on VentureRadar\n\nWebsite Popularity\n\nLow Traffic Sites\n Low\n\nHigh Traffic Sites\n High\n\nAlexa Global Rank:\n\n3,478,846 | \n fenx.ch\n\nAuto Analyst Score\n\n68\n\nAuto Analyst Score:\n 68 | \n fenx.ch\n\nVentureRadar Popularity\n\nHigh\n\nVentureRadar Popularity:\n High The popularity score combines profile views, clicks and the number of times the company appears in search results.\n\nor\n\nTo continue, please confirm you\n are not a robot" }, { "score": 0.4682174026966095, "title": "intelligent fluids | LinkedIn", "id": "https://www.linkedin.com/company/intelligentfluids", "url": "https://www.linkedin.com/company/intelligentfluids", "publishedDate": "2023-06-08", "author": null, "text": "Sign in to see who you already know at intelligent fluids GmbH (SMARTCHEM)\n\nWelcome back\n\nEmail or phone\n\nPassword\n\nForgot password?\n\nor\n\nNew to LinkedIn? Join now\n\nor\n\nNew to LinkedIn? Join now" }, { "score": 0.46611523628234863, "title": "justairtech GmbH – Umweltfreundliche Kühlsysteme mit Luft als Kältemittel", "id": "https://www.justairtech.de/", "url": "https://www.justairtech.de/", "publishedDate": "2024-06-13", "author": null, "text": "decouple cooling from climate change with air as refrigerant.\n\nWir entwickeln eine hocheffziente Kühlanlage, die Luft als Kältemittel verwendet. Wieso? Die Welt verändert sich tiefgreifender und schneller als in allen Generationen vor uns. Wir sehen darin nicht nur eine Bedrohung, sondern begreifen dies auch als Chance, Prozesse nachhaltig zu gestalten.\n\nUnsere Arbeit konzentriert sich auf die Revolutio­nie­rung der Kühlung für Ziel­tempera­turen von 0–40 °C bei beliebiger Umwelt­temperatur. 
Dabei verwenden wir Luft als Kältemittel.\n\nzielgruppe\n\nDer globale Kühlbedarf macht aktuell 10% des weltweiten Strom­bedarfs aus und steigt rasant an. Es werden zwischen 2020 und 2070 knapp 10 Klima­anlagen pro Sekunde verkauft (viele weitere Zahlen und Statistiken rund um das Thema Kühlung findest Du bei der International Energy Agency ) . Mit unserer Technologie können wir verhindern, dass der Strom­verbrauch und die CO2-Emissionen propor­tional mit der Anzahl der verkauften Anlagen wächst.\n\nWir entwickeln eine Technologie, die 4–5 mal so effizient wie konventio­nelle Kühlanlagen arbeitet. Außerdem verwendet sie Luft als Kühlmittel. Luft ist ein natürliches Kältemittel, ist unbegrenzt frei verfügbar und hat ein Global Warming Potential von 0 (mehr zu natürlichen Kältemittel bei der Green Cooling Initiative) . Der Einsatz von Luft als Kältemittel ist nicht neu, aber mit konventio­nellen Anlagen im Ziel­temperatur­bereich nicht wettbewerbs­fähig umsetzbar. Unser erstes Produkt wird für die Kühlung von Rechen­zentren ausgelegt. Weitere Produkte im Bereich der gewerblichen und industriellen Kälte­erzeugung werden folgen.\n\nroadmap\n\n06/2020 \n Q4 2020 erste Seed-Finanzierungsrunde Q4 2020 \n 10/2020 erste Patentanmeldungen 10/2020 \n Q4 2021 zweite Seed-Finanzierungsrunde Q4 2021 \n Q4 2021 erste Patenterteilungen beantragt Q4 2021 \n 05/2022 Prototyp des fraktalen Wärmetauschers 05/2022 \n Q3 2022 Start-Up-Finanzierungsrunde Q3 2022\n\nQ4 2023 per CCS ausgeblendet Q4 2023 \n Q4 2023 physischer Anlagenprototyp Q4 2023 \n Q3 2024 Serienüberleitung und Beta-Tests Q3 2024 \n Q3 2025 \n ab 2025\n\nour core values\n\nWe love innovation. And disruption is even better! Failing is part of the game, but we are curious and continuous learners. \n We help and enable each other. Cooperative interaction with our clients, our partners and our colleagues is central. \n We are pragmatic. Our goals always remain our focus. We are dedicated team players. \n We interact respectfully. With each other and our environment.\n\nteam\n\nGerrit Barth Product Development & Technology \n Anna Herzog Head of Sales & Marketing, PR \n Bikbulat Khabibullin Product Development & Technology\n\nJohannes Lampl Product Development & Technology \n Anne Murmann Product Development & Technology \n Jens Schäfer Co-Founder and CEO\n\nHolger Sedlak Inventor, Co-Founder and CTO \n Adrian Zajac Product Development & Technology\n\nstellenangebote\n\npartner & förderungen" }, { "score": 0.4648257791996002, "title": "Let’s capture CO2 and tackle climate change", "id": "https://blancair.com/", "url": "https://blancair.com/", "publishedDate": "2023-03-01", "author": null, "text": "Let’s capture CO2 and tackle climate change\n\nWe need to keep global warming below 1.5°C. This requires a deployment of Negative Emission Technologies (NETs) of around 8 Gt of CO2 in 2050. Natural Climate solutions cannot do it alone.Technology has to give support. BLANCAIR can turn back human-emitted carbon dioxide from our atmosphere by capturing it and sequestering it back into the planet.\n\nGet to know us, our Hamburg team, partnerships and network\n\nTake a look at the BLANCAIR technology, our milestones & our next goals\n\nJoin our BLANCAIR team & help us to fight climate change!" }, { "score": 0.46323055028915405, "title": "bionero - Der Erde zuliebe. Carbon Removal | Terra Preta", "id": "https://www.bionero.de/", "url": "https://www.bionero.de/", "publishedDate": "2023-10-28", "author": null, "text": "Mehr Wachstum. Echter Klimaschutz. 
bionero ist eines der ersten Unternehmen weltweit, das zertifiziert klimapositiv arbeitet. Das Familienunternehmen, das in der Nähe von Bayreuth beheimatet ist, stellt qualitativ höchstwertige Erden und Substrate her, die durch das einzigartige Produktionsverfahren aktiv CO2 aus der Atmosphäre entziehen und gleichzeitig enorm fruchtbar sind. Aus Liebe und der Ehrfurcht zur Natur entwickelte bionero ein hochmodernes, industrialisiertes Verfahren, das aus biogenen Reststoffen eine höchstwertige Pflanzenkohle herstellt und zu fruchtbaren Schwarzerden made in Germany verwandelt. Hier kannst du bionero im Einzelhandel finden Wir liefern Gutes aus der Natur, für die Natur. Terra Preta (portugiesisch für \"Schwarze Erde\") gilt als \"wiederentdeckte Wundererde\". Sie wurde vor circa 40 Jahren in den Tiefen des Amazonasgebiets entdeckt und intensiv erforscht. Das Besondere an ihr ist ihre Fruchtbarkeit. Tatsächlich gilt dieser Boden als der fruchtbarste unseres Planeten. bionero hat gemeinsam mit Professor Bruno Glaser, einem weltweit anerkannten Experten für Terra Preta, das Herstellungsverfahren dieser besonderen Erde transformiert, optimiert und industrialisiert. Der wesentliche Wirk- und Inhaltsstoff ist eine sog. Pflanzenkohle. Sie sorgt dank ihrer enorm großen spezifischen Oberfläche für optimale Nährstoff- und Wasserspeicherfähigkeiten im Boden und bietet zusätzlich Lebensraum für wertvolle Mikroorganismen. Das Ergebnis ist ein stetiger Humusaufbau und eine dauerhafte Bodenfruchtbarkeit. Das Einzigartige an bionero? Die bionero Wertschöpfungskette ist vollständig klimapositiv! bioneros Produkte bieten einer Branche, die stark in die Kritik geraten ist, einen Weg in eine nachhaltige Zukunft. Während der Herstellung unserer hochwertigen Terra Preta leisten wir einen aktiven Beitrag zum Klimaschutz. Durch die Produktion unserer wichtigsten Zutat, der Pflanzenkohle, wird dem atmosphärischen Kohlenstoffkreislauf aktiv Kohlenstoff entzogen. Der Kohlenstoff, welcher anfangs in den biogenen Reststoffen gespeichert war, wird während des Pyrolyseprozesses für mehrere Jahrtausende in der Pflanzenkohle fixiert und gelangt somit nicht als Kohlenstoffdioxid zurück in unsere Atmosphäre. Das Erstaunliche: Die Pflanzenkohle entzieht der Atmosphäre das bis zu dreieinhalbfache ihres Eigengewichts an CO2! Die entstandenen Kohlenstoffsenken sind dabei transparent quantifizierbar und zertifiziert. Tatsächlich vereint bionero als erstes Unternehmen weltweit alle notwendigen Verfahrensschritte zu einer echten Kohlenstoffsenke gemäß EBC. Der Kohlenstoff ist am Ende der bionero Wertschöpfungskette in einer stabilen Matrix fixiert. Torf ist bis heute der meistgenutzte Rohstoff bei der Herstellung von Pflanzsubstraten. Schon beim Abbau werden Unmengen an CO2 freigesetzt. Moore sind einer der wichtigsten Kohlenstoff-Speicher unseres Planeten. Moore speichern 700 Tonnen Kohlenstoff je Hektar, sechsmal mehr als ein Hektar Wald! Durch die Trockenlegung und den Abbau für die Gewinnung von Torf können diese gewaltigen Mengen Kohlenstoff wieder zu CO2-reagieren und gelangen in die Atmosphäre. Hinzu kommen enorm weite Transportwege. Der Torfabbau findet zu großen Teilen in Osteuropa statt. Um einerseits die natürlichen Ökosysteme zu schützen und andererseits lange Transportwege zu vermeiden, setzen wir auf regional anfallende Roh- und Reststoffe. In langen Reifeprozessen verarbeiten wir natürliche Reststoffe zu hochwertigen Ausgangsstoffen für unsere Produkte. 
Bei der Auswahl aller Inputstoffe schauen wir genau hin und arbeiten nach dem Prinzip “regional, nachhaltig, umwelt- und klimaschonend“. Nur, wenn diese Voraussetzungen ausnahmslos gewährleistet sind, findet ein Rohstoff letztlich seinen Weg in unsere Produkte. bionero - Mehr Wachstum. Echter Klimaschutz. Erhalte spannende Einblicke in die Abläufe unseres Start-Ups und unsere hochmodernen Verfahren. Hier gibt es die neuesten Trends, aktuelle Tipps, hilfreiche Pflanz- und Pflegeanleitungen und interessante Videos." }, { "score": 0.4623781740665436, "title": "Green City Solutions", "id": "https://www.greentalents.de/green-city-solutions.php", "url": "https://www.greentalents.de/green-city-solutions.php", "publishedDate": "2022-04-12", "author": null, "text": "In their devices, called CityTrees, they combine the natural ability of moss to clean and cool the air with Internet of Things technology to control irrigation and ventilation. In March 2014, Green City Solutions GmbH was founded by Peter Sänger and his friend Liang Wu in Dresden. They set up a team of young experts from the fields of horticulture/biology, computer science, architecture, and mechanical engineering. The knowledge of the individuals was bundled to realise a device that combines nature and technology: the CityTree.\n\nThe living heart of CityTrees is moss cultivated on hanging textile mats. The moss mats are hidden behind wooden bars that provide sufficient shade for these plants, which naturally grow mainly in forests. Sensors are measuring various parameters such as temperature, humidity, and concentration of particulates. This data is used to regulate ventilation and irrigation. Behind the moss mats are large vents that create an airflow through the moss. In this way, the amount of air cleaned by the device can be increased when pollution levels are high, such as during rush hours.\n\nGreen City Solutions collaborates with several partners in Germany and abroad. Scientific partners include the Leibniz Institute for Tropospheric Research (TROPOS) and the Dresden University of Applied Sciences (HTW Dresden), both located in Germany. Green City Solutions has been awarded the Seal of Excellence by the European Commission. This is a European Union quality label for outstanding ideas worthy of funding.\n\nThe work of Green City Solutions mainly contributes to the Sustainable Development Goals 3, 11, 13, and 15:" }, { "score": 0.4593821167945862, "title": "No.1 DAC manufacturer from Germany - DACMA GmbH", "id": "https://dacma.com/", "url": "https://dacma.com/", "publishedDate": "2024-03-02", "author": null, "text": "Reach net zero goal with BLANCAIR by DACMA – a proven direct air capture technology with maximum CO2 uptake and minimal energy demand.\n\nDACMA GmbH, headquartered in Hamburg, Germany, is a pioneering DAC manufacturer with cutting-edge technology. With a proven track record, our first machines were delivered in 2023. 
Our scalable design reaches gigaton capacities, ensuring high CO2 uptake with minimal energy demand.\n\nGet to know us, our team, partnerships and network\n\nLearn more about the status quo of DAC technologies and our BLANCAIR solution\n\nJoin our DACMA team – help us to reach net zero and fight climate change!\n\nWhy BLANCAIR by DACMA:\n\nNo.1 DAC manufacturer from Germany – leveraging decades of aerospace – innovation\n\nDeliverable: proven technology in the market\n\nInterchangeable adsorbents for continuous performance improvement\n\nPatented reactor design with optimized air flow\n\nUniversal application for different climate conditions\n\n“In just one year, DACMA GmbH have achieved an exponential progress in the atmospheric carbon capture journey. The strategic alliance with Repsol (both in Venturing Capital and projects) will boost the pace of this highly focused group of outstanding engineers that are persistently looking for every angle of the technology improvement. Take the time to celebrate, acknowledge your success and keep going!!!”\n\n“One of the most relevant projects related to the development of technologies with a negative CO2 effect, the ONLY project in Brazil on Direct Air Capture multi-country Spain, Brazil Germany in Open Innovation. Repsol Sinopec Brazil Corporation, Start Up DACMA and PUC Rio Grande do Sul University. A disruptive commitment to a more decarbonized world. Being part of this project is a privilege and a unique opportunity to add value to society.”\n\n“In collaboration with Phoenix Contact, DACMA has developed an application that contributes to CO2 decarbonization. This technology makes a significant contribution to sector coupling in the All Electric Society and to the sustainable use of energy. I am delighted that two technology-driven companies are working together so efficiently.”\n\n“The DACMA GmbH with Jörg Spitzner and his team are not only valuable partners in our network, but also key initiators and innovators who, with BLANCAIR, are driving forward DAC system engineering in the Hamburg metropolitan region – an essential future climate change mitigation technology.”\n\n“Together with our partner DACMA GmbH, we are delighted to be building the first DAC machine on the HAMBURG BLUE HUB site in the Port of Hamburg. The 30-60 tons output of CO2 annually of the BLANCAIR machine can later be used to produce e-methanol for the Port of Hamburg, for example. 
This is a joint milestone, as it fits in with the plan to purchase large volumes of synthetic fuels from Power-to-X plants in Africa and South America for Germany through the HAMBURG BLUE HUB”.\n\nBacked by strong investors & partners:\n\nassociations & supporters:" }, { "score": 0.4587157368659973, "title": "Heatrix GmbH Decarbonizing Industry – We decarbonize high temperature industrial heat.", "id": "https://heatrix.de/", "url": "https://heatrix.de/", "publishedDate": "2024-02-28", "author": null, "text": "Our mission\n\nis to competitively replace fossil fuels in energy intensive industriesby converting renewable electricity into storable, high-temperature process heat.\n\n11% of global CO2 emissions is caused byhigh-temperature industrial heat.\n\nNo carbon-neutral, cost-competitive and easy\nto\nintegrate solution exists yet.\n\n11%\nof global CO2 emissions is caused by\nhigh-temperature industrial heat.\n\nNo carbon-neutral, cost-competitive and easy\nto integrate solution exists yet.\n\nOur solution\n\nThe Heatrix system combines an electric heater, utilizing off-grid solar or wind \nelectricity, with a thermal energy storage to provide continuous high-temperature \nprocess heat. With an outlet temperature of up to 1500 °C, Heatrix has the potential to \ndecarbonize the majority of high emission industries.\n\nHeatrix technology perfectly fulfils customers' \nrequirements – CO2 free continuous and easily integrated process heat at competitive cost.\n\nCarbon-free green heat, \nreducing CO2 emissions \n up to 100%\n\nProcess heat (hot air) \nup to 1500 °C\n\nThermal storage up\n to 20 hours to \ndeliver green heat 24/7\n\nHigh efficiency up \nto 90% based on \nresistance heating\n\nCost competitive vs. \nfossil fuels and substantially \ncheaper than green hydrogen\n\nModular container\nsystem enables \neasy scalability\n\nEasy integration \nwith minimal \nretrofitting needs\n\nApplications for Heatrix\n\nCalcination\n\nReplacing fossil fuel burners and reducing fuel consumption in calcination processes by integrating Heatrix heat to shaft calciners or precalciners of rotary kilns.\n\nHeat Treatment\n\nInducing required process temperatures via hot air flow from Heatrix replacing fossil fuel burners in heat treatment ovens.\n\nSintering & Pelletization\n\nReduced fuel gas & coke usage by providing Heatrix heat to sintering or pelletization plants.\n\nPreheating\n\nCombined with existing burner system, Heatrix technology can be used to preheat materials and reduce fuel consumption in the actual process.\n\nThis is us\n\nStrategy & Operations\n\nInnovator / Inventor / Sold first tech start-up in 2021 / Ph.D. from RWTH Aachen\n\nTechnology & Product\n\nTech Lead / Fluid dynamics expert / Energy technologies / Ph.D. from University Bremen\n\nBusiness & Finance\n\n2nd-time Founder / former VC-Investor / MBA from Tsinghua, MIT & HEC Paris\n\nContact us\n\nLooking for more information about Heatrix and our technology? 
We’d love to get in touch!\n\nHeatrix ensures defensibility through modular product, ease of integration, technological advantages and compelling business model.\n\nModular Product\n\n• Avoids individual design process – fits in standard containers• Industry-agnostic solution• Modular configuration to meet customer needs\n\nEasy Interaction\n\n• Rapid deployment• Focus on minimal plant downtime• Compatible to back-up for guaranteed production\n\nBusiness Model\n\n• Ongoing customer relationship and revenue \n• Large growth potential\n• Maximal impact on CO2\nreduction\n\nTechnical Advantage\n\n• Unique system design integrating electric heater and thermal storage \n• IP application in preparation for unique heater and storage design" }, { "score": 0.45255395770072937, "title": "vabeck® GmbH - Grüne Prozesstechnik für den Umweltschutz", "id": "https://www.vabeck.com/en", "url": "https://www.vabeck.com/en", "publishedDate": "2022-01-01", "author": null, "text": "" } ], "requestId": "a02fd414d9ca16454089e8720cd6ed2b" }
```

Nice! On inspection, these results include companies located in Hamburg, Munich, and other nearby European locations. This example can be extended to any key phrase: have a play with filtering via [other company suffixes](https://en.wikipedia.org/wiki/List%5Fof%5Flegal%5Fentity%5Ftypes%5Fby%5Fcountry) and see what interesting results you get back!

# Building a News Summarizer

Source: https://docs.exa.ai/examples/recent-news-summarizer

Learn how to build an AI-powered news summarizer that searches and summarizes recent articles using Exa and GPT.

***

In this example, we will build an LLM-based news summarizer with the Exa API to keep us up to date with the latest news on a given topic. We'll do this in three steps:

1. Generate search queries for Exa using an LLM
2. Retrieve relevant URLs and their contents using Exa
3. Summarize webpage contents using GPT-3.5 Turbo

This is a form of Retrieval Augmented Generation (RAG), combining Exa's search capabilities with GPT's summarization abilities. The Jupyter notebook for this tutorial is available on [Colab](https://colab.research.google.com/drive/1uZ0kxFCWmCqozl3ArTJohNpRbeEYlwlT?usp=sharing) for easy experimentation. You can also [check it out on Github](https://github.com/exa-labs/exa-py/tree/master/examples/newssummarizer/summarizer.ipynb), including a [plain Python version](https://github.com/exa-labs/exa-py/tree/master/examples/newssummarizer/summarizer.py) if you want to skip to the complete product.

***

## Get Started

Install the required packages:

```bash
pip install exa_py openai
```

You'll need both an Exa API key and an OpenAI API key to run this example. You can get your OpenAI API key [here](https://platform.openai.com/api-keys).

Set up your API keys:

```python
from google.colab import userdata  # comment this out if you're not using Colab

EXA_API_KEY = userdata.get('EXA_API_KEY')  # replace with your Exa API key
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')  # replace with your OpenAI API key
```

Import and set up both the OpenAI and Exa clients:

```python
import openai
from exa_py import Exa

openai.api_key = OPENAI_API_KEY
exa = Exa(EXA_API_KEY)
```

First, we'll use GPT to generate an optimized search query based on the user's question:

```python
SYSTEM_MESSAGE = "You are a helpful assistant that generates search queries based on user questions. Only generate one search query."
USER_QUESTION = "What's the recent news in physics this week?"
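# Why ask an LLM for the query? Exa tends to work best with short, declarative
# search phrases rather than conversational questions, so we first have GPT
# rewrite the user's question into a single search-friendly query.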
completion = openai.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": SYSTEM_MESSAGE},
        {"role": "user", "content": USER_QUESTION},
    ],
)
search_query = completion.choices[0].message.content

print("Search query:")
print(search_query)
```

Now we'll use Exa to search for recent articles, filtering by publication date:

```python
from datetime import datetime, timedelta

# Only consider articles published within the last week
one_week_ago = datetime.now() - timedelta(days=7)
date_cutoff = one_week_ago.strftime("%Y-%m-%d")

search_response = exa.search_and_contents(
    search_query, start_published_date=date_cutoff
)

urls = [result.url for result in search_response.results]
print("URLs:")
for url in urls:
    print(url)
```

We use `start_published_date` to filter for recent content. Exa's `search_and_contents` has already retrieved the article contents for us, so we can access them directly:

```python
results = search_response.results
result_item = results[0]
print(f"{len(results)} items total, printing the first one:")
print(result_item.text)
```

Unlike traditional search engines that only return URLs, Exa gives us direct access to the webpage contents, eliminating the need for web scraping.

Finally, we'll use GPT to create a concise summary of the article:

```python
import textwrap

SYSTEM_MESSAGE = "You are a helpful assistant that briefly summarizes the content of a webpage. Summarize the user's input."

completion = openai.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": SYSTEM_MESSAGE},
        {"role": "user", "content": result_item.text},
    ],
)

summary = completion.choices[0].message.content
print(f"Summary for {urls[0]}:")
print(result_item.title)
print(textwrap.fill(summary, 80))
```

And we're done! We've built an app that translates a question into a search query, uses Exa to search for useful links and their contents, and summarizes the content to effortlessly answer questions about the latest news. **Through Exa, we have given our LLM access to the entire Internet.** The possibilities are endless.

# CrewAI Docs

Source: https://docs.exa.ai/integrations/crew-ai-docs

Learn how to use Exa's search API with CrewAI.

CrewAI has a dedicated Exa tool that enables AI agents to perform web searches. For detailed instructions on using Exa with CrewAI, visit the [CrewAI documentation](https://docs.crewai.com/tools/exasearchtool).

# IBM WatsonX

Source: https://docs.exa.ai/integrations/ibm-watsonx-docs

Combine IBM WatsonX's AI with Exa's web search to build a smart assistant that can search the internet and answer questions.