Themes
Research
My research asks how autonomous systems can remain aligned, trustworthy, and auditable as they are deployed, adapted, and subjected to adversarial pressure. The themes below cut across alignment theory, multi-agent systems, security, and governance.
AI Alignment & Safety
Formalizing what it means for a deployed system to remain aligned with a declared value specification, and designing evaluations that expose failure before users do.
specification · evaluation · robustness
Moral Alignment in Autonomous Agents
Operationalizing moral constraints as measurable properties of agent behavior, and tracking how those properties drift under deployment and adaptation.
values · drift · autonomy
Multi-Agent Systems
Studying how multiple learning agents interact, cooperate, and compete — and how alignment guarantees compose (or fail to compose) across agents.
cooperation · emergence · equilibria
Game Theory for Security
Modeling adversarial settings — IoT security, hardware Trojans, supply-chain attacks — using both classical and behavioral game theory to design robust defenses.
adversarial · behavioral GT · defense
Auditable AI & Blockchain for Governance
Designing governance layers that make AI behavior independently verifiable: cryptographic commitments, tamper-evident audit trails, and policy-update accountability.
audit · governance · verifiability
Reward Manipulation & Grader Reliability
Auditing the evaluators that train and certify AI systems, and characterizing how policies can manipulate reward signals in realistic pipelines.
RLHF · evaluation · auditing
In progress
Current directions
- Stability metrics for alignment: moving from point-in-time evaluation to longitudinal guarantees.
- Composable alignment in multi-agent systems, where guarantees that hold for individual agents do not automatically carry over to the system as a whole.
- Auditing infrastructure that external parties — not only the operator — can trust.
- Empirical benchmarks for grader reliability under adversarial and distributional pressure.