Not lost in translation: mapping LLM outputs to insurance APIs
Conversational AI speaks in natural, flowing sentences. Insurance systems want tidy codes and strict formats. Getting the two to work together is harder than it looks, and classifiers are a big part of the solution we're developing at Prompted.
The mismatch problem
Insurance APIs tend to be unforgiving. A policy pricing endpoint might expect a standard two-letter country code, a fixed list of options, or a field in a very specific format. For years, insurers and comparison sites sidestepped this problem entirely by putting the burden onto the user. Rather than accepting free text, they presented a long dropdown list of countries and asked customers to scroll through and pick theirs. It worked, technically, but it was a clunky experience that most users simply tolerated.
Conversational AI changes that expectation. When a customer can type "I'm travelling through France and a bit of Belgium" and have a conversation about their cover, a dropdown feels like a step backwards. The challenge is that the conversational AI's natural, flexible output still needs to be converted into whatever the insurer's system is expecting, and that conversion is not always straightforward.
The opportunities for failure can be surprisingly subtle. A field that expects FR might simply reject FRA, throw a vague error, or silently map to the wrong record entirely. At scale, these small mismatches will affect pricing, cause compliance issues, frustrate users and, worst of all, could leave customers uninsured.
Where classifiers help
A classifier sits between conversational AI and the insurer API call. Its job is to take a vague or messy input and return a reliable, standardised value. Rather than hoping the model always formats things correctly (which it will not), a classifier treats this conversion as a separate step that can be tested and improved on its own. This makes the whole process much easier to audit: we can log every decision the classifier makes, measure how accurate it is, and update it without changing anything else in our pipeline.
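As a sketch of that separation, the classifier below is a deliberately tiny, hypothetical stand-in: it exposes one function that takes raw text and returns a standardised value plus a confidence score, and it logs every decision so the step can be audited and measured independently of the rest of the pipeline. The `classify_country` name, the dataclass, and the toy lookup are all illustrative assumptions, not our production code.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("country-classifier")

@dataclass
class Classification:
    value: Optional[str]  # standardised value, e.g. an ISO 3166-1 alpha-2 code
    score: float          # confidence in [0, 1]

def classify_country(raw: str) -> Classification:
    """Hypothetical classifier step: free text in, standardised value out."""
    KNOWN = {"france": "FR", "belgium": "BE", "spain": "ES"}  # toy data
    code = KNOWN.get(raw.strip().lower())
    result = Classification(value=code, score=1.0 if code else 0.0)
    # Every decision is logged, so accuracy can be measured offline.
    log.info("input=%r -> value=%r score=%.2f", raw, result.value, result.score)
    return result
```

The insurer API call only ever sees `result.value`, never the raw conversational text, which is what makes the conversion step testable and replaceable on its own.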
The challenge of building a classifier
Building a classifier that actually works well is not straightforward. The range of things users might type is enormous: abbreviations, misspellings, city names, words from other languages, and informal nicknames all need to map to the right value. A simple rules-based approach, such as a lookup table or a pattern-matching script, breaks down quickly because it cannot handle inputs it has never seen before. A more advanced machine learning approach needs good training data, which can be time-consuming to collect, and can still return a confident-looking but incorrect answer.
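The weakness of the rules-based approach is easy to demonstrate with a toy lookup table: it handles the exact strings it was built for and returns nothing for everything else.

```python
# A minimal rules-based matcher (illustrative only): exact lookup table.
LOOKUP = {"france": "FR", "belgium": "BE"}

def rules_based(raw: str):
    """Returns an ISO code for an exact (case-insensitive) match, else None."""
    return LOOKUP.get(raw.strip().lower())

rules_based("France")   # matched: returns "FR"
rules_based("Franse")   # misspelling it has never seen: returns None
rules_based("Paris")    # city name, not in the table: returns None
```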
One of the trickiest parts is getting the classifier to recognise when it is not sure. A wrong answer delivered with high confidence is far more dangerous than an honest "I don't know." The best systems return a range of possible answers with a score for each one, so that when confidence is low, the application can ask the user to clarify rather than guessing.
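One way to sketch that behaviour, using Python's standard-library `difflib` as a stand-in similarity measure (the names and the `0.8` threshold are illustrative assumptions):

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

COUNTRIES = {"france": "FR", "belgium": "BE", "greece": "GR", "germany": "DE"}

def rank_candidates(raw: str) -> List[Tuple[str, float]]:
    """Score every known country against the input; best candidate first."""
    text = raw.strip().lower()
    scored = [(code, SequenceMatcher(None, text, name).ratio())
              for name, code in COUNTRIES.items()]
    return sorted(scored, key=lambda c: c[1], reverse=True)

def resolve(raw: str, threshold: float = 0.8) -> Optional[str]:
    best_code, best_score = rank_candidates(raw)[0]
    if best_score >= threshold:
        return best_code
    # Low confidence: return None so the application can ask the user
    # to clarify, rather than sending a guess to the insurer's API.
    return None
```

The important design choice is the explicit abstention path: a `None` result is an honest "I don't know" that the application can turn into a clarifying question.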
Solving the country problem
Country identification is a real problem and a good example of how a classifier can make a clunky and risky process feel seamless.
Consider the kinds of inputs a real user travelling to France might provide:
- "France"
- "Fra"
- "Paris"
- "Franse"
- "Frankreich" (a German speaker using an English-language service)
All of these need to map to FR, the ISO country code for France, or the classifier needs to make clear that it cannot interpret the user's input!
At Prompted, we take a practical approach. Our location matching classifier is built on a curated dataset of global place names, covering countries, islands, and larger towns - in short, anywhere a traveller is likely to name. Inputs are first checked for an exact match against this dataset, with any ambiguity resolved through a pre-defined ranking system. Where no exact match is found, a scoring algorithm identifies the closest candidate.
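That two-stage pipeline can be sketched as follows. The tiny dataset, the `rank` field, and the use of `difflib` for scoring are all illustrative assumptions standing in for the curated place-name dataset, ranking system, and scoring algorithm described above.

```python
from difflib import SequenceMatcher

# Hypothetical slice of a curated place-name dataset. Each entry maps a
# name to an ISO country code; `rank` resolves ambiguity between entries
# sharing a name (lower rank wins, e.g. a country beats a town).
PLACES = [
    {"name": "france",     "code": "FR", "rank": 0},  # country
    {"name": "paris",      "code": "FR", "rank": 1},  # capital city
    {"name": "paris",      "code": "US", "rank": 2},  # Paris, Texas
    {"name": "frankreich", "code": "FR", "rank": 1},  # German exonym
]

def match_place(raw: str) -> str:
    text = raw.strip().lower()
    # Stage 1: exact match, with ambiguity resolved by the ranking.
    exact = [p for p in PLACES if p["name"] == text]
    if exact:
        return min(exact, key=lambda p: p["rank"])["code"]
    # Stage 2: no exact match, so score every entry and take the closest.
    best = max(PLACES, key=lambda p: SequenceMatcher(None, text, p["name"]).ratio())
    return best["code"]
```

In a real system the second stage would also return its score, so that weak matches can trigger a clarifying question rather than a silent guess.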