Securing AI from exfiltration
Protecting model weights and algorithms from theft
This is part of a series on Key problems in AI infrastructure Security, published to mark the launch of Navigators, our leadership incubator.
What we want to avoid:
A terrorist actor gains access to a powerful model without guardrails or safety training, and uses it to create a bioweapon
A well-resourced state or corporate actor exfiltrates a model capable of recursive self-improvement, gives it access to compute without appropriate safety or control measures, and uses it to create a superintelligence that it loses control over
A malicious state actor gains access to a frontier model and deploys it in the military, for surveillance, and for economic purposes to establish stable totalitarianism
A rogue AI system self-exfiltrates and proliferates, leading to catastrophic loss of control
What’s happening right now:
RAND’s report Securing AI Model Weights lays out what is necessary to prevent the theft of model weights by a malicious human actor, with graded security levels ranging from resilience against low-resource, low-skill attackers (SL1) up to resilience against expert nation-state attackers (SL5).
The SL5 Task Force has been doing R&D and prototyping to turn SL5 into an adoptable standard for AI companies, and the Intelligence Security Lab is working on developing highly secure clusters. AI lab security teams are pursuing stronger security while trying to avoid slowing down development. Most frontier Chinese models that we know of are fully open-weight and unprotected.
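As a minimal sketch of how such a grading can be made operational, the scale can be encoded as an ordered mapping from security level to the attacker class it is meant to resist. Only the SL1 and SL5 characterizations come from the report description above; the SL2-SL4 descriptions below are our illustrative assumptions.

```python
# Toy encoding of a graded security scale (SL2-SL4 descriptions assumed).
SECURITY_LEVELS = {
    "SL1": "low-resource, low-skill attackers",
    "SL2": "capable individuals or small groups (assumed)",
    "SL3": "organized criminal groups and insider threats (assumed)",
    "SL4": "well-resourced state-backed operations (assumed)",
    "SL5": "expert nation-state attackers",
}

def is_resilient(achieved_level: str, attacker_level: str) -> bool:
    """A system graded at level X is taken to resist attackers up to tier X."""
    order = list(SECURITY_LEVELS)
    return order.index(achieved_level) >= order.index(attacker_level)
```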
Relevant backgrounds:
Experience building SL5 environments
Experience developing security standards
Experience conducting security audits for high-risk environments
Experience with securing AI infrastructure (datacenters and AI hardware)
Possible directions:
Mapping out threat actors, kill chains, and their likelihoods at various levels of security and model capability (a toy sketch of such a map follows this list)
Adapting national security paradigms to a frontier lab’s development requirements
Developing or commissioning R&D, for example on projects similar to those scoped in SL5 Novel Recommendations - Security Framework
Thinking through the adoption plan for higher security standards
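The sketch below illustrates one way the first direction's map could be structured: threat actors carry multi-stage kill chains, and each stage has an assumed success probability per security level. Every name, stage, and number here is a placeholder, not a real estimate.

```python
from dataclasses import dataclass

@dataclass
class KillChainStage:
    name: str                    # e.g. "initial access", "weight staging"
    p_success: dict[str, float]  # security level -> assumed success probability

@dataclass
class ThreatActor:
    name: str
    resources: str               # rough SL-style attacker tier
    chain: list[KillChainStage]

    def p_exfiltration(self, security_level: str) -> float:
        """Naive model: every stage must succeed, independently."""
        p = 1.0
        for stage in self.chain:
            p *= stage.p_success.get(security_level, 0.0)
        return p

# Placeholder actor with made-up probabilities.
insider = ThreatActor(
    name="compromised insider",
    resources="SL3-class",
    chain=[
        KillChainStage("obtain weight access", {"SL3": 0.3, "SL4": 0.05}),
        KillChainStage("stage and exfiltrate", {"SL3": 0.5, "SL4": 0.1}),
    ],
)
print(insider.p_exfiltration("SL4"))  # ~0.005 under these toy numbers
```

Even a crude model like this makes the key comparisons explicit: which stages drive the overall risk, and how much each step up in security level buys.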
Concrete project ideas:
Map out kill chains for self-exfiltration under SL3-4 controls
Design new network security protocols for inference servers to prevent exfiltration (a minimal egress-control sketch follows below)
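As one illustration of the second idea, here is a minimal sketch of an egress budget for an inference host, assuming a token-bucket limiter and made-up numbers. The intuition: model responses are kilobytes, while frontier weight files run to hundreds of gigabytes, so a tight cap on sustained outbound bandwidth makes bulk weight theft slow and conspicuous. This is a sketch of the general control, not a vetted protocol design.

```python
import time

class EgressBudget:
    """Token-bucket limiter on bytes leaving an inference host."""

    def __init__(self, bytes_per_second: float, burst_bytes: float):
        self.rate = bytes_per_second
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.last = time.monotonic()

    def allow(self, n_bytes: int) -> bool:
        # Refill tokens for the time elapsed, capped at burst capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if n_bytes <= self.tokens:
            self.tokens -= n_bytes
            return True
        return False  # over budget: hold the data and alert, don't send

# Illustrative numbers: at a 1 MB/s sustained cap, moving 1 TB of weights
# would take more than ten days of continuous, anomalous traffic.
budget = EgressBudget(bytes_per_second=1_000_000, burst_bytes=5_000_000)

def send_if_allowed(payload: bytes) -> bool:
    if budget.allow(len(payload)):
        return True   # hand off to the network stack as usual
    return False      # raise an alert for human review instead
```

A real design would also need to cover side channels and distributed, low-and-slow exfiltration, which is exactly the kind of open problem this project idea points at.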
Interested in working on this problem? Apply to join Navigators, our leadership incubator!