Soraya Partow

Themes

Research

My research asks how autonomous systems can remain aligned, trustworthy, and auditable as they are deployed, adapted, and subjected to adversarial pressure. The themes below cut across alignment theory, multi-agent systems, security, and governance.

  • AI Alignment & Safety

    Formalizing what it means for a deployed system to remain aligned with a declared value specification, and designing evaluations that expose failure before users do.

    specificationevaluationrobustness
  • Moral Alignment in Autonomous Agents

    Operationalizing moral constraints as measurable properties of agent behavior, and tracking how those properties drift under deployment and adaptation.

    valuesdriftautonomy
  • Multi-Agent Systems

    Studying how multiple learning agents interact, cooperate, and compete — and how alignment guarantees compose (or fail to compose) across agents.

    cooperationemergenceequilibria
  • Game Theory for Security

    Modeling adversarial settings — IoT security, hardware Trojans, supply-chain attacks — using both classical and behavioral game theory to design robust defenses.

    adversarialbehavioral GTdefense
  • Auditable AI & Blockchain for Governance

    Designing governance layers that make AI behavior independently verifiable: cryptographic commitments, tamper-evident audit trails, and policy-update accountability.

    auditgovernanceverifiability
  • Reward Manipulation & Grader Reliability

    Auditing the evaluators that train and certify AI systems, and characterizing how policies can manipulate reward signals in realistic pipelines.

    RLHFevaluationauditing

In progress

Current directions

  • Stability metrics for alignment: moving from point-in-time evaluation to longitudinal guarantees.
  • Composable alignment in multi-agent systems where individual guarantees do not automatically lift to the system.
  • Auditing infrastructure that external parties — not only the operator — can trust.
  • Empirical benchmarks for grader reliability under adversarial and distributional pressure.