Summer research fellowship for computer science students
The Cambridge AI Safety Hub would like to invite exceptional computer science students at UF to apply to the upcoming iteration of Mentorship for Alignment Researchers (MARS), an AI safety fellowship that pairs promising students and early-career researchers with experienced researchers from AI labs, think tanks, and academia. In July, we will fly selected students and working professionals to the United Kingdom for a "sprint week," where they will begin a research project that they will then continue remotely through September.
We'll have more than 20 projects spanning multiple disciplines, but a few
projects we think are especially interesting to computer science students are:
• Research with Yossi Gandelsman (Reve) on whether LLMs can predict the
layer at which their own neurons appear, detect polysemantic neurons, identify
causal connections between two neurons in their own architecture, or anticipate
their own attention patterns.
• A project with Lindley Lentati (Cambridge Inference) on reproducible
white-box jailbreak monitoring, covering automated attack generation,
multi-layer probe aggregation, and streaming token-by-token detection.
• An investigation with Rhea Karty and Jacob Davis (ERA; LASR Labs) of whether
steering vectors for traits like confidence and honesty are context-independent
or persona-dependent, using LoRA adapters for character-trained models and
tracking trait geometry across training checkpoints.
• Work with James Lucassen (Redwood Research) on deferral protocols for AI
control: implementing defer-to-trusted in BashArena, developing usefulness
monitors, and building methodology to evaluate them.
• Work with Shivam Raval and Luiza Corpaci (Harvard; AMD) on detecting
unfaithful formal translations, using Lean-verified equational theories as
ground truth and mech-interp methods to locate where translation failures
occur.
Applications close on May 3rd. Students can find more information on our
program's webpage, caish [dot] org [slash] mars.
Go Gators!
Justin Dollman
Co-Director @ Cambridge AI Safety Hub