Evaluating Trust in Emerging AI (LLMs)
This PhD has just commenced, and the project scope is actively being refined.
Large Language Models increasingly tackle complex decisions under uncertainty (e.g., planning, reasoning chains). This creates an urgent need for rigorous trust frameworks that combine normative theory (what trust should mean for AI), evaluation methods (how to measure it), and human-centred benchmarks (tests aligning AI behaviour with human expectations) to enable safer real-world deployment.
Normative Model of Trust
- Define trust as justified reliance under uncertainty (one possible formalisation is sketched below).
- Tailor the model to AI contexts such as planning domains and LLM decision-making.
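As a starting point, "justified reliance under uncertainty" admits a decision-theoretic reading. The formalisation below is an illustrative assumption, not the project's committed definition; the symbols H, A, T, b, theta, and U are introduced here purely for exposition.

```latex
% Illustrative only: trust as justified reliance under uncertainty.
% A human H justifiably relies on agent A for task T, given beliefs b
% about the agent's unknown competence theta, iff relying has higher
% expected utility than not relying.
\[
\mathrm{Rely}(H, A, T) \iff
\mathbb{E}_{\theta \sim b}\bigl[ U(\mathrm{rely} \mid \theta) \bigr]
> \mathbb{E}_{\theta \sim b}\bigl[ U(\neg \mathrm{rely} \mid \theta) \bigr].
\]
```

On this reading, reliance is "justified" exactly when the belief state b warrants the decision to rely, which is what the evaluation framework below would need to measure.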
Evaluation Framework
- Operationalise the model into measurable dimensions: risk exposure, verifiability, and context-dependence.
- Support multi-interaction scenarios in which trust emerges (or erodes) over repeated exchanges (a minimal computational sketch follows this list).
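To make the multi-interaction idea concrete, here is a minimal Python sketch in which a trust estimate is updated after each interaction via a Beta-Bernoulli posterior, with observations down-weighted by how verifiable they were. The class name, the verifiability weighting, and the prior are all expository assumptions, not the framework this project will propose.

```python
from dataclasses import dataclass

@dataclass
class TrustTracker:
    """Illustrative multi-interaction trust estimate (Beta-Bernoulli).

    All names and weightings here are expository assumptions, not the
    evaluation framework this project will actually define.
    """
    successes: float = 1.0  # Beta prior alpha (pseudo-count of successes)
    failures: float = 1.0   # Beta prior beta (pseudo-count of failures)

    def update(self, outcome_ok: bool, verifiability: float) -> None:
        # Weight each observation by how verifiable it was: an
        # unverifiable "success" should move trust less than a checked one.
        w = max(0.0, min(1.0, verifiability))
        if outcome_ok:
            self.successes += w
        else:
            self.failures += w

    @property
    def trust(self) -> float:
        # Posterior mean of the Beta distribution over reliability.
        return self.successes / (self.successes + self.failures)


# Trust rises with verified successes and erodes on failures.
tracker = TrustTracker()
for ok, verifiability in [(True, 0.9), (True, 0.8), (False, 1.0), (True, 0.3)]:
    tracker.update(ok, verifiability)
    print(f"trust = {tracker.trust:.2f}")
```

The risk-exposure dimension could enter the same loop, e.g. by raising the reliance threshold as stakes grow, but that is omitted to keep the sketch short.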
Human-Centred Benchmarks
- Propose standardised tests for LLMs and planners, e.g. trust calibration in uncertain planning scenarios (partial observability, non-determinism); a calibration sketch follows this list.
- Align AI outputs with human trust judgements for safer deployment.
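One established way to quantify trust calibration is expected calibration error (ECE): the gap between a model's stated confidence and its empirical success rate. Using ECE here, and the toy episode data below, are assumptions for illustration; the project may well define its own human-centred metric.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard ECE: gap between stated confidence and empirical accuracy.

    Adopting ECE as the calibration measure is an illustrative choice,
    not the benchmark this project commits to.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            # Weight each bin's confidence/accuracy gap by its sample share.
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

# Hypothetical benchmark episode: an LLM planner states confidence in each
# plan; we record whether the plan actually succeeded under non-determinism.
conf = [0.9, 0.8, 0.95, 0.6, 0.7]
succ = [1,   1,   0,    1,   0]
print(f"ECE = {expected_calibration_error(conf, succ):.3f}")
```

A well-calibrated planner (low ECE) is one whose stated confidence can be taken at face value when humans decide whether to rely on it, which is the alignment goal stated above.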
Expected Impact: taxonomies and tools for LLM benchmarking, and a bridge between AI planning and the cognitive science of trust.