Our investment in oslo

Meet the company building evaluation and real-time optimisation for generative AI.


Alister Coleman

March 21, 2024

Folklore is thrilled to announce our investment into oslo, an evaluation and real-time optimisation platform for generative AI, and Folklore's 11th investment in AI startups.

Led by PhD co-founders Indigo Orton (CEO) and Pierre Thodoroff (CTO), oslo's vision is to help businesses incorporate AI into their products and move from early demo to deployment, balancing their resources without compromising on quality.

Why this matters

Whilst generative AI (genAI) is not new, the recent and rapid advancements in large language models (LLMs) and platform technology have captured the world's attention, and for good reason. The truly impactful possibilities opened up by LLMs are yet to be understood, potentially even yet to be explored. At a practical level, many people have already tried harnessing LLMs to create an image or written content from a text prompt, and even in 2024, this capability is still an eye-opening novelty.

As this genAI wave moves from R&D and into commercialisation, every founding team and board is asking, “what’s our AI strategy?” Businesses are itching to get the latest tech in the hands of customers - whether that be genAI-native startups looking to sell their products to businesses, or incumbents looking to enhance their existing products with genAI features. 

This represents an exciting era of experimentation in which it has never been easier for businesses to quickly impress their customers with a demo built on powerful foundation models. But as any technical lead trying to incorporate genAI into their product will know, moving past that demo to a real-world deployment can seem insurmountable under resource constraints.

This leads us to the two core problems oslo is aiming to address.

Problem 1: from demo to deployment

The latest developments in foundation models have been a significant catalyst for businesses to get in front of a customer and demonstrate their MVP. However, barriers to deployment remain: inference costs can become prohibitive at scale; latency may be intolerable for the intended use case; risk-averse customers may require models to run on local infrastructure; uncaught edge cases can cause bugs in end products; and many more. Importantly, most of these challenges are technical and typically model-agnostic.

Once we understand that genAI is primarily a function with two core parameters, the prompt and the model, the underlying problem becomes one of optimisation. That is, what combination of prompts and model configurations allows businesses to address these cost, latency and data-privacy concerns without compromising on the quality that impressed their customers at demo? Furthermore, which of these combinations actually work when deployed, and which will break the underlying genAI system?

Arriving at the 'goldilocks solution' requires searching through the space of every possible input, each driven by a unique prompt and model configuration. This gives rise to an inherent challenge (and beauty) of genAI: there is an infinite number of configurations, each producing outputs of varying quality. Optimising the configuration therefore requires an effective evaluation system to assess each output. Evaluation is to generative AI what testing is to software: it lets you know whether incremental changes to the prompt or the model will lead to a better outcome, and where your genAI system is going to fail.
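To make the prompt-and-model framing concrete, here is a minimal sketch of configuration search, assuming hypothetical `generate` and `evaluate` functions (invented for illustration; not oslo's API):

```python
from itertools import product

# A minimal sketch, treating genAI as a function of (prompt, model config).
# `generate` and `evaluate` are illustrative stand-ins, not oslo's API.

def generate(prompt: str, config: dict) -> str:
    """Stub for a foundation-model call."""
    return f"[{config['model']} @ T={config['temperature']}] {prompt}"

def evaluate(output: str) -> float:
    """Stub quality score in [0, 1]; a real system would score outputs
    against human-labelled ground truth."""
    return min(len(output) / 100.0, 1.0)

prompts = [
    "Summarise the report.",
    "Summarise the report in one sentence.",
]
configs = [
    {"model": "small", "temperature": 0.2},
    {"model": "large", "temperature": 0.7},
]

# Exhaustive search is only feasible for a toy space like this one; the
# real space of prompts and configurations is effectively infinite.
best_prompt, best_config, best_score = max(
    ((p, c, evaluate(generate(p, c))) for p, c in product(prompts, configs)),
    key=lambda t: t[2],
)
```

Even this toy version shows why evaluation is the bottleneck: the search is only as good as the scoring function it maximises.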

Problem 2: from deployment to development

As AI adoption continues to grow, we have no doubt that the models and platforms will continue to evolve and will produce better outputs more efficiently; in fact, we're counting on it. As the performance of foundation models increases, so too will users' expectations: the novelty of creating 'something' will wear off, and 'good enough' will no longer cut it. This gives rise to the second core issue oslo addresses: how can companies incorporating genAI not only keep up with the latest advancements, but also get better over time across many model modalities?

The solution: enter oslo

oslo is focused on enabling businesses to deploy their genAI products faster and more confidently, while balancing resource constraints and output quality at scale.

Evaluation requires the unison of human and machine. oslo's platform allows businesses to set a ground truth of 'what good looks like' by sampling outputs based on the prompt and model, and allowing users to provide feedback. To take this to scale, oslo's auto-evaluation function extrapolates from the ground truth to provide feedback on sampled outputs and assess whether they align with the human evaluator. Evaluation also acts as the genAI equivalent of automated test coverage: it gives businesses a better understanding of which edge cases can break the system.
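The idea of extrapolating human judgments to unseen outputs can be sketched very simply. The nearest-neighbour-by-token-overlap approach below is an assumption made purely for illustration, not a description of oslo's actual method:

```python
# A hedged sketch of auto-evaluation: give a new output the label of its
# most similar human-judged example. Similarity metric and examples are
# invented for illustration.

def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity over lowercase whitespace tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

# Ground truth: outputs a human evaluator has already scored.
ground_truth = [
    ("The quarterly report shows revenue grew 12%.", 1.0),  # judged good
    ("Revenue revenue revenue report growth.", 0.0),        # judged bad
]

def auto_evaluate(output: str) -> float:
    """Score a new output with the label of its nearest human-judged
    neighbour in the ground-truth set."""
    _, label = max(ground_truth, key=lambda ex: token_overlap(output, ex[0]))
    return label
```

A production system would use far richer similarity signals, but the shape is the same: a small amount of human feedback anchors an automated scorer that can run at scale.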

Nailing evaluation enables optimisation. With one half of the space-search problem solved, oslo has developed search algorithms to efficiently identify the configuration that addresses the earlier barriers, such as cost and latency, whilst maintaining the requisite quality. In other words, with optimisation comes deployment.
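The barrier trade-off can be framed as constrained search: among configurations that clear a quality floor, prefer the cheapest and fastest. A minimal sketch, with all fields and numbers invented for illustration:

```python
# Illustrative constrained configuration search, assuming each candidate
# already carries an evaluation-derived quality score.

candidates = [
    {"model": "large",  "quality": 0.95, "cost": 0.020, "latency_ms": 900},
    {"model": "medium", "quality": 0.91, "cost": 0.004, "latency_ms": 300},
    {"model": "small",  "quality": 0.70, "cost": 0.001, "latency_ms": 80},
]

# Don't compromise on the quality that impressed customers at demo.
QUALITY_FLOOR = 0.9

viable = [c for c in candidates if c["quality"] >= QUALITY_FLOOR]
chosen = min(viable, key=lambda c: (c["cost"], c["latency_ms"]))
```

Here the mid-sized model clears the floor at a fraction of the largest model's cost and latency, which is exactly the kind of trade the search is meant to surface.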

And with deployment comes development. By utilising oslo's API, customers are able to test new models and tools as they're released, as well as personalise their product using live human feedback. oslo has built a set of algorithms that can deconstruct user feedback and attribute it to discrete components of the output, in real time. As a result, this production environment and its data can be used to continuously improve the prompt and the underlying model without requiring redeployment, using reinforcement learning to configure hyper-parameters and to fine-tune.

The team

From the first time we met the founding team, we loved Indigo and Pierre's technical capability and their comprehension of the far-reaching weaknesses of existing AI, drawn from their PhDs in computer science and a deep understanding of the intersection between human behaviour and machine learning.

Having first met at Cambridge University during their PhDs, Indigo Orton and Pierre Thodoroff joined forces to leverage the intersection of their respective research areas and industry expertise. Indigo is a second-time founder with a research and commercial focus on high-performance concurrency. Pierre has over a decade of experience researching AI/ML, with a particular focus on reinforcement learning, at Cambridge and MILA, a leading deep-learning institute in Montreal.

The future

This generative AI wave is growing rapidly, and we see potential for real value accrual to businesses that capitalise on the latest developments. We believe the problems oslo is solving will only grow alongside these tailwinds, particularly as the conversation moves from getting to an outcome quickly, to quickly getting to the right outcome.

With this new round of funding, the team will continue R&D to improve the core optimisation functionality, with a commercial focus on getting in front of early adopters.

Folklore is excited to join oslo's journey, and we look forward to supporting Indigo, Pierre and the rest of the team as they scale their vision.
