When bankers don't trust the AI, the AI may as well not exist.
I led the trust layer and digital core UX for the SBS Banking AI Assistant — a three-phase agentic platform spanning generative answers, agentic actions, and predictive simulations. We rejected confidence scores and probability bands. We shipped chain of thought, source-grounded answers, and a human-in-the-loop checkpoint model. A client said: "We'll wait months to onboard the core platform — but can you give us this AI agent now?"
Platform Evolution
A client at the Annual Summit told us: "Our bank will take months to onboard the SBS core platform — until then, can you give us this AI agent? It will drastically improve our operations." The assistant became a faster commercial pathway into the platform than the platform itself.
The Impact
I led the trust layer and digital core UX for the SBS Banking AI Assistant — an enterprise agentic platform that lets banking users query data in natural language, propose actions that wait for approval, and run predictive simulations before committing to commercial decisions. The work was not "add a chatbot." It was designing the interaction model that lets bankers trust AI in an environment where one wrong answer can leak revenue, trigger an audit, or expose customer data.

The assistant became the most demanded asset in the SBS portfolio. At the 2025 Annual Summit a client told us: "Our bank will take months to onboard the SBS core platform. Until then, can you give us this AI agent? It will drastically improve our operations." The assistant locked in EUR 20M ARR — and opened a faster commercial pathway into the platform than the platform itself.
The Stakes
AI in consumer products fails by being annoying. AI in banking fails by being expensive. We were designing for users who handle C1- and C2-level customer data, execute money-movement workflows, and answer to regulators. The cost of a hallucination is not a bad chat experience. It is regulatory exposure, audit findings, reputational damage, and lost revenue.
In banking AI, trust is the design problem. Every other feature is downstream of it. A faster query is worthless if the banker can't trust the result enough to act on it.
The strategic context for this work was set by SBS's Chief Science Officer, who published the company's position on agentic AI: it must be safe, explainable, and human-in-the-loop. The architecture team built the deterministic Fortress beneath. My job was to design the experience that made both worlds cohere from the user's point of view.
The Three Phases
The platform was architected as three phases of AI maturity — each one moving the autonomy-trust slider further right, each requiring its own trust contract with the user. We shipped them in order because trust compounds: a user who trusts the answers will eventually trust the actions. A user who trusts the actions might eventually trust the predictions. Autonomy isn't granted — it's earned, phase by phase.
Phase 01 — Generative · live
Natural language queries. "Show the performance of Savings Plus since January 2024, highlight any unusual trends." The AI retrieves real data, returns a chart with the anomaly flagged, and lets the user switch to a table, change chart type, export to PDF, or share with peers. The chain of thought is visible. The SQL is auditable. The response is grounded in real records or it doesn't exist.

Phase 02 — Agentic · rolling out
The user requests an action: "Create a credit transfer of EUR 10 from C006 to C115." The AI drafts the operation, surfaces an approval card, shows its reasoning and source references, and waits. Nothing moves until a human approves. Money never moves autonomously. Override and reject carry the same visual weight as approve in the UI, so refusing the AI is as fast and friction-free as accepting it.

Phase 03 — Predictive simulation · in design
A pricing manager asks: "If I raise the Visa Classic tariff to €20, what happens to revenue?" The AI doesn't do the naive multiplication. It models customer retention curves, predicts adoption shift, and surfaces an outcome range grounded in historical signals — citing the comparable scenarios it learned from. The user sees a forecast, the assumptions behind it, and the data points it leaned on. They decide.
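The Phase 02 checkpoint can be illustrated with a minimal sketch. This is not the production implementation, which belonged to the architects; every class, field, and method name below is hypothetical, and the only behaviour taken from the case study is that a drafted action cannot execute until a human approves it.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class ProposedAction:
    """A drafted operation waiting at a human checkpoint (hypothetical shape)."""
    description: str
    reasoning: list[str]   # steps shown on the approval card
    sources: list[str]     # references shown on the approval card
    decision: Decision = Decision.PENDING
    executed: bool = False

    def approve(self) -> None:
        self.decision = Decision.APPROVED

    def reject(self) -> None:
        # Rejecting is a single call, the same cost as approving.
        self.decision = Decision.REJECTED

    def execute(self) -> None:
        # Nothing moves unless a human has explicitly approved.
        if self.decision is not Decision.APPROVED:
            raise PermissionError("Action has not been approved by a human")
        self.executed = True


draft = ProposedAction(
    description="Credit transfer of EUR 10 from C006 to C115",
    reasoning=["Parsed amount and accounts from the request"],
    sources=["account:C006", "account:C115"],  # illustrative identifiers
)
draft.approve()
draft.execute()
```

The design choice worth noting is that the guard lives in `execute`, not in the UI: a pending or rejected draft fails closed even if some caller skips the approval card.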
Where I Came In
This was a multi-team initiative. The UX Director owned the org-wide vision across every SBS product. The architects owned the orchestration logic, memory retrieval, and system steering. The AI team owned the model behavior. My territory was the digital core — the data-and-query surface of the assistant — and within that, the experience and trust layer.
What I owned
The interaction model · confidence states · chain-of-thought UX · disambiguation flows · response format flexibility · agent creation UX · the human-in-the-loop checkpoint model · trust design across the three phases · personas, research, and use case prioritisation for Digital Core.
What I didn't own, and won't claim: orchestration logic, fallback systems, agent workflows, memory architecture, model selection, platform-level governance. Those belonged to the architects and the AI team. I shaped how the system *felt* to the human using it, not how the model thought beneath. That boundary mattered. Owning trust design requires being honest about where it ends.
The Trust Problem
When you talk to a banking professional about AI, they don't ask "is it accurate?" They ask "how do I know?" The question isn't about model performance — it's about evidence. A back-office user processing 200 transactions a day cannot stop and verify each one if the system can't show its work. And they will not approve money movements based on a system whose reasoning is opaque.
In testing, the moment that reframed the project came from a back-office user reviewing a generative answer. They said: "It looks right. But how do I know it's right?" That question — *how do I know* — became the design brief for everything that followed.
The naive solution would have been to show confidence scores. "87% confident." "High certainty." Probability bands. We tested them. They failed badly. Users either ignored the percentages (numbers with no operational meaning) or treated them as guarantees (which they fundamentally are not). Worse, compliance pushed back hard. In a regulated environment, a probability is a liability: it suggests the bank stands behind a confidence level it cannot defend in an audit.
Confidence scores were the obvious answer. They were also the wrong one. Trust doesn't come from a percentage — it comes from being able to verify.
The Work
The answer to "how do I know" was not a number. It was making the AI's reasoning observable, the sources verifiable, and the actions reversible. I designed five trust surfaces that appear on every response. Together they form the contract between the AI and the user.
01 · Chain of thought, always visible

A collapsible block above every response showing the step-by-step reasoning. Which files the AI opened. Which rules it applied. Which records it filtered. The user can audit the logic before trusting the conclusion.
Adjacent to the answer, not buried in a settings panel. Trust has to be one click away, not a feature toggle.
02 · Source-grounded answers
Every claim cites a specific record, contract, or rule. "€25.45 was charged because of a 2.5% debit interest component and an unauthorised overdraft from Aug 10–20." Both reasons link to the source. No grounded reference → the AI refuses to answer rather than guessing.
This was the answer to "how do I know." Not certainty as a number. Citations as a habit.
03 · Query transparency
A "View entire SQL" link on every data retrieval. The actual query the AI generated is auditable. Compliance officers, power users, and audit teams can verify how data was retrieved — not just what came back.
Built for the regulators reading over the banker's shoulder. The interface treats audit as a first-class user need.
04 · Format flexibility
One-click toggle between chart type, table view, plain-language explanation, export to PDF, and share. The same answer, in the format the user needs to act on it. Reduces friction without changing the data underneath.
Speed for power users; clarity for new users. Same trust contract for both.
05 · Feedback + agentification

Thumbs up/down on every response feeds back into model training. "Create this every quarter" / "Create this every time I log in" — one click turns a trusted query into a recurring agent. Users build their own AI infrastructure out of queries they trust.
Trust earned once = trust automated forever.
Beneath these five surfaces sits a single principle: AI is allowed to refuse to answer. When the references are insufficient, when entities are ambiguous, when the request falls outside the role's permissions — the assistant says so. It does not guess. It does not pad. The refusal itself is a trust signal.
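The "grounded answer or refusal" rule can be sketched in a few lines. This is an illustration under stated assumptions, not the assistant's actual logic: `Claim`, `Response`, `compose_response`, and the source identifiers are all hypothetical names, and the sketch only covers the missing-reference case, not ambiguity or permission checks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Claim:
    text: str
    source_id: Optional[str]  # None means no record backs this claim


@dataclass(frozen=True)
class Response:
    answered: bool
    text: str
    citations: tuple[str, ...]


REFUSAL = "I don't have enough to answer."


def compose_response(claims: list[Claim]) -> Response:
    """Return a fully cited answer, or refuse if any claim is ungrounded."""
    if not claims or any(c.source_id is None for c in claims):
        return Response(answered=False, text=REFUSAL, citations=())
    text = " ".join(c.text for c in claims)
    # dict.fromkeys de-duplicates citations while keeping first-seen order.
    citations = tuple(dict.fromkeys(c.source_id for c in claims))
    return Response(answered=True, text=text, citations=citations)


grounded = compose_response([
    Claim("A 2.5% debit interest component applied.", "rule:debit-interest"),
    Claim("An unauthorised overdraft ran from Aug 10 to 20.", "record:overdraft"),
])
refused = compose_response([
    Claim("A 2.5% debit interest component applied.", "rule:debit-interest"),
    Claim("The fee will be waived next month.", None),
])
```

Note that one ungrounded claim refuses the whole response. A partial answer with a gap would reintroduce the "maybe" the trust model is meant to remove.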
Mindful friction — what stays human by design
Every approval card, every disambiguation prompt, every refusal-to-answer is an intentional pause. Not a UX failure — a design principle. The pipeline handles NL understanding, data retrieval, formatting, chain of thought generation, anomaly surfacing, and refusing to answer when uncertain. Humans lead on approving money-movement, overriding AI suggestions, reviewing exceptions, validating reasoning, creating agents, and rejecting outputs entirely. The AI gets faster. Humans get final say.
The Hard Call — No Probabilistic Outputs
The single most consequential design decision on this project was a rule that runs through every screen: nothing probabilistic is shown to back-office users. No confidence percentages. No "I think." No hedged language. No probability bands. The AI gives a grounded answer with citations, or it doesn't answer.
The rule came from compliance, with strong support from the Chief Science Officer and the AI team. I pushed initial designs that showed confidence states — and they were right to push back. In regulated banking, probability is liability.
I initially designed confidence indicators. They felt useful. They tested badly. Compliance flagged the legal exposure: showing "87% confident" implies the bank stands behind that number in an audit. Users either over-trusted high scores or developed superstitions about which percentages were "safe." The signal was noisy and the liability was real.
We replaced confidence scores with chain of thought plus source references. Trust through transparency, not trust through self-reported certainty. The AI shows its work. The user verifies. If the work doesn't hold up, the user rejects. If it does, the user acts. No middle ground of "maybe."
This was hard to give up. Confidence states are everywhere in consumer AI products — they feel modern, transparent, helpful. In enterprise banking they're wrong. The lesson: AI patterns that work in consumer contexts don't automatically transfer to regulated ones. Sometimes the most senior design decision is letting go of a pattern you fought for.
Roles & Permissions
Trust isn't universal — it's scoped to what each user is allowed to do. The assistant inherits the bank's existing role-based access. AI never gains permissions a human doesn't already have. I designed for three Digital Core personas, with the AI Admin / Compliance Manager as the elevated platform operator above them.
A back-office user with no transfer authority cannot use the AI to move money. A customer service agent who cannot modify pricing cannot ask the AI to modify pricing. The system enforces the existing org chart. AI is a multiplier, not a privilege escalator.
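A minimal sketch of "multiplier, not a privilege escalator": the assistant's permission check is simply the user's existing role check. The role names, action names, and the `assistant_can` helper are hypothetical, chosen only to mirror the two examples above.

```python
# Hypothetical role table; in practice this would be the bank's existing
# role-based access configuration, not something the assistant defines.
ROLE_PERMISSIONS: dict[str, frozenset[str]] = {
    "back_office_no_transfer": frozenset({"query_data"}),
    "payments_officer": frozenset({"query_data", "create_transfer"}),
    "customer_service_agent": frozenset({"query_data"}),
    "pricing_manager": frozenset({"query_data", "modify_pricing"}),
}


def assistant_can(role: str, action: str) -> bool:
    """The assistant may do an action only if the user's role already allows it."""
    # Unknown roles get an empty permission set, so the check fails closed.
    return action in ROLE_PERMISSIONS.get(role, frozenset())


can_move_money = assistant_can("back_office_no_transfer", "create_transfer")
can_reprice = assistant_can("customer_service_agent", "modify_pricing")
officer_can_transfer = assistant_can("payments_officer", "create_transfer")
```

There is deliberately no assistant-specific permission anywhere in the table, so the AI cannot hold a capability that no human role carries.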

The Agent Lifecycle
One of the most distinctive moves in the platform was letting users create their own AI infrastructure. When a banker runs a query they like, they can click "Create this every quarter" and that query becomes a versioned, tested, deployed agent — running on schedule, governed by the same rules as every other agent.
Spark — A query worth keeping
Users encounter the agent creation surface inline, exactly when they need it — at the moment they trust an answer enough to want it again.
No separate "create agent" mode. The agent is born from the conversation that produced it.
Configure — Shape the agent

Name, prompt, chart type, dashboard placement, role-based audience, schedule (daily, weekly, monthly, quarterly), and permissions inherited from the user's role. The configuration UI reuses the SBS design system — no new mental model.
Test — Validate before launch

Every agent passes through automated testing: question generation, RAGAS quality metrics, safety risk scoring, hallucination check. The same compliance bar applied to every agent in the platform.
Deploy — Goes live

Compliance review, approval, version tag. The agent appears in the user's dashboard, runs on schedule, and gets sentiment-tracked from day one.
Govern — Forever, not just at launch
Usage telemetry, thumbs up/down sentiment, version control with V1→V2→V3 lifecycle, and graceful archive/retire when an agent's usefulness ends.
Every agent has a state, a history, and a future. The lifecycle is the UX.
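The five stages can be read as a small state machine. The sketch below is an assumption-laden illustration, not the platform's workflow engine: the transition table, the `retired` state name, and the rule that reopening a governed agent bumps its version are my own simplifications of the V1→V2→V3 lifecycle.

```python
from dataclasses import dataclass, field

# Allowed transitions between lifecycle stages (assumed, for illustration).
ALLOWED: dict[str, set[str]] = {
    "spark": {"configure"},
    "configure": {"test"},
    "test": {"deploy", "configure"},     # a failed test returns to configure
    "deploy": {"govern"},
    "govern": {"configure", "retired"},  # revise as a new version, or retire
    "retired": set(),
}


@dataclass
class Agent:
    name: str
    stage: str = "spark"
    version: int = 1
    history: list[str] = field(default_factory=lambda: ["spark"])

    def advance(self, target: str) -> None:
        if target not in ALLOWED[self.stage]:
            raise ValueError(f"Cannot move from {self.stage} to {target}")
        # Reopening a live agent for changes starts the next version.
        if self.stage == "govern" and target == "configure":
            self.version += 1
        self.stage = target
        self.history.append(target)


agent = Agent("Savings Plus quarterly report")  # illustrative name
for step in ("configure", "test", "deploy", "govern"):
    agent.advance(step)
agent.advance("configure")  # V1 -> V2
```

Because every transition is appended to `history`, the agent carries its own audit trail, which is the point of treating the lifecycle as the UX.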
How I Led It
This was a designer-led contribution inside a larger multi-team initiative. The UX Director set the cross-product vision; the architects owned the backend; the AI team owned the model. My job was to lead the Digital Core experience and design the trust contract — and to do it in a way that influenced patterns adopted across the rest of the assistant.
Persona research → use case prioritisation
Identified the 9 highest-impact use cases for Digital Core by working directly with bankers across product management, back office operations, and customer service. Mapped each to the AI maturity phase it required.
Not every use case needed the full agentic stack. Some only needed generative. Knowing the difference was design work.
Changed course when the obvious answer was wrong
Initially designed confidence scores. Compliance pushed back. I revised the entire trust model around chain of thought + source references instead — and the new pattern became the platform-wide standard.
Senior UX work includes letting go of your own designs when better thinking arrives.
Designed for emotional moments, not just rational ones
On one screen, I deliberately introduced a small emotional cue — language that acknowledged the banker's position when an answer was uncertain. This wasn't in the spec. It was a UX call about how the system should feel when a user is exposed.
AI design isn't just logic flows. It's how the system holds a hand when the stakes are high.
Aligned with architects on the trust boundary
Worked closely with the architecture team to ensure the experience-side trust patterns (chain of thought, source citations, refusal-to-answer) had backend hooks that supported them — and that the AI team's steering rules enforced "no probabilistic outputs" at the model layer, not just the UI layer.
A trust pattern that exists only in the UI is decoration. A trust pattern that's enforced from the model upward is governance.
Translated executive vision into shipped patterns
The Chief Science Officer's public framing — "safe, explainable, human-in-the-loop" — was the strategic context. My design work was the operational answer: chain of thought operationalises explainability; checkpoint UX operationalises human-in-the-loop; refusal-to-answer operationalises safety.
Senior design is the bridge between executive vision and user reality.
Stakeholder demo that secured EUR 20M ARR
Co-presented the assistant at the SBS 2025 Annual Summit. The client demand that emerged — "give us the AI assistant before the core platform" — validated the assistant as a standalone commercial offering.
Design work that opens new commercial pathways is the highest form of impact a UX leader can deliver.
Enterprise AI UX is collaborative by necessity — no single designer owns every surface. The senior move is knowing what you own, doing it at the highest possible bar, and being explicit about what you don't.
Reflection
Most AI products optimise for capability. Banking AI has to optimise for verifiability. Industry research frames this as calibrated trust — trust that matches the actual trustworthiness of the system, neither over- nor under-extended. The interface I designed never tries to be impressive — it tries to be inspectable. Every clever answer is one chain-of-thought toggle away from being checked. Every action is one approval card away from being undone. The system gets out of the user's way precisely because it stays in their hands.
What I took away
Trust is the UX
In regulated AI, the trust layer isn't a feature. It's the product. Everything else — chart types, export buttons, agent creation — is downstream of whether the user trusts the answer enough to act on it.
Calibrated trust beats blind trust
The goal isn't maximum trust — it's trust that matches what the system can actually deliver. Over-trust leads to unverified actions. Under-trust leads to abandonment. The trust layer's real job is keeping the user calibrated: trusting what's grounded, questioning what's ambiguous, rejecting what's wrong.
Confidence is not the same as trust
Confidence scores tell a user how the AI feels. Source references tell a user what the AI used. The second is auditable. The first is liability. Designing for regulated environments means knowing the difference.
Refusal is a feature
The AI being allowed to say "I don't have enough to answer" is a stronger trust signal than any confidence percentage. Refusal demonstrates restraint. Restraint earns trust.
Scope honestly
On a multi-team initiative, the senior move is being clear about what you own and what you don't. Overclaiming weakens the work. Honest scoping makes the contribution legible — and credible.
Design for the question users won't ask
Bankers won't ask "is this AI accurate?" out loud. They'll ask "how do I know?" silently, every single time they use it. The design must answer that question continuously, without being asked.
A client wanted the assistant before the core platform. That outcome wasn't about features. It was about the experience earning enough trust that bankers wanted it in their hands now — even before the foundation underneath was in place. That is what enterprise AI design at its best can deliver: not faster answers, but answers people can actually use.