Research

Themes

My research asks how autonomous systems can remain aligned, trustworthy, and auditable as they are deployed, adapted, and subjected to adversarial pressure. The themes below cut across alignment theory, multi-agent systems, security, and governance.

AI Alignment & Safety
Formalizing what it means for a deployed system to remain aligned with a declared value specification, and designing evaluations that expose failure before users do.
specification, evaluation, robustness
Moral Alignment in Autonomous Agents
Operationalizing moral constraints as measurable properties of agent behavior, and tracking how those properties drift under deployment and adaptation.
values, drift, autonomy
Multi-Agent Systems
Studying how multiple learning agents interact, cooperate, and compete — and how alignment guarantees compose (or fail to compose) across agents.
cooperation, emergence, equilibria
Game Theory for Security
Modeling adversarial settings — IoT security, hardware Trojans, supply-chain attacks — using both classical and behavioral game theory to design robust defenses.
adversarial, behavioral GT, defense
Auditable AI & Blockchain for Governance
Designing governance layers that make AI behavior independently verifiable: cryptographic commitments, tamper-evident audit trails, and policy-update accountability.
audit, governance, verifiability
Reward Manipulation & Grader Reliability
Auditing the evaluators that train and certify AI systems, and characterizing how policies can manipulate reward signals in realistic pipelines.
RLHF, evaluation, auditing

In progress

Current directions

Stability metrics for alignment: moving from point-in-time evaluation to longitudinal guarantees.
Composable alignment in multi-agent systems, where individual guarantees do not automatically lift to the system as a whole.
Auditing infrastructure that external parties — not only the operator — can trust.
Empirical benchmarks for grader reliability under adversarial and distributional pressure.
