Programming with Proofs for High-assurance Software

Microsoft Research · Intermediate ·☁️ DevOps & Cloud ·5y ago

Key Takeaways

The video discusses programming with proofs for high-assurance software, highlighting recent accomplishments in the F* proof assistant and Project Everest, including the development of verification libraries and stacks of communication components with full formal proofs.

Full Transcript

hi there um i'm really happy to be um to have the opportunity to send this video to this uh to your purple seminar um my name is nick swami i'm a researcher at the rise group at microsoft research in redmond and i'm going to tell you a bit about some work we've been doing in the past few years on programming with proofs for high assurance software so what do i mean by high assurance software there's software in a number of domains for which the correctness and security requirements are at a fairly high bar where system functionality really depends critically on getting this code right for instance in cloud infrastructure things like the virtualization technology used to secure the platform for instance at microsoft it's microsoft hyper-v securing the azure cloud or in secure communications things like crypto protocols like tls or wire guard a kernel vpn and linux and in other domains like financial technology or even e-voting software plays a critical role in ensuring system correctness and security and getting this code right is really important but of course as we know there's bugs everywhere in in in software written today and um this manifests itself in a variety of things from say in the financial domain from stock market flash crashes to um confidentiality leaks in in cloud providers so um what we've been looking at is you know a long-standing agenda can formal proofs about software come to the rescue here and allow us to build this kind of software with really high assurance now i'd be thinking well computer scientists have been talking about doing formal proofs about programs for many decades and is this can this really work at scale and i think we've made a lot of progress in the last decade or so decade and a half where as a community we've begun to be able to apply our program proof techniques to genuinely large systems and at my at msr in the last few years we've had this project called project everest where we've been looking at building um a uh verification libraries and a stack of communication components with full form proofs and these days we um under in our ci system we routinely build multiple times a day a system that is about 600 000 lines of code and growing with full formal proofs of correctness and security and that's at that scale uh things only work if they're modular and uh we do proof set um the modular proofs at at a large scale and keep chugging along so uh we're beginning to approach a scale where um i think it's it's not uh infeasible to think that in a year or two this will be a couple of million lines of code so using our uh our our tools um uh the tool uh that i've been with collaborators been building for several years now is a is a programming language and program verification tool called f star and f-star is the is the um is the verification engine behind our uh our verified code base and our tools have been uh our code has been deployed in a number of places so um in a sense um proofs and programs developed in f-star are being used by billions of people today even if they don't know it so for instance in in hyper-v we use code produced by f-star for securing communication between host and guest in the secure communication space we are we have proofs about the transport layer of quick and tls and uh our verified crypto runs in a number of places including firefox embed tls signal the linux kernel with wire guard and um um and in number of other scenarios we have verified merkle trees for enterprise blockchains in the financial technology space several third parties use our tools for proving their code correct and in e-voting um um microsoft has an uh an open offering called election guard and sdk for building auditable um uh elections and um uh we provide the crypto for them uh for that sdk and even in other domains that things like verifiable multi-party computations they're proven correct um things like proven correct high performance verifiable key value stores developed using our our tool chain so this stuff is real and uh program verification and program proofs that um are at a point where this can be applied to real systems and genuinely move the needle on improving the security and assurance of the entire system this is based on decade a decade plus work on on this our tool chain building on the research that's come out of the pl community um over several decades for instance with um star itself is based on probably a 10 or so popple papers we've repeatedly improved the system trying to address the needs of our applications and simultaneously as we improve the the foundations and the tooling we've been developing applications to to to drive the tooling in a way it's been quite synergistic um and uh f-star broadly is a is a programming language that looks a bit like o'camel or f sharp but it comes with a type system that lets you do that that looks more like or agda in that it comes with full dependent types and it's it's backed by smt automation to help you to proofs in many cases by smt rather than by by tactics although you can use tactics too so you write your program in f-star and you can compile it by default to ocaml or f-sharp but we have one thing we've been using f-star for as a lot is as a framework to embed dsls domain-specific languages so we have dsl embeddings in f-star for for several dsls one of them is called low star which is a dsl for c like programming embedded in evstar and if you write programs in low star you can do proofs about against a low level memory model dealing with things like manual memory management and low level representations of memory stack allocation these kinds of things and extract your code to c or to web assembly we have a dsl called veil that is intended for assembly level programming in f star this has been really crucial to get high performance crypto both low star and veil high performance crypto out of our tool chain uh we have a dsl dedicated for parsing and serialization called evoparse which we used extensively and that too produces a c code and recently we've been kind of the frontier of our of our tool chain these days is exploring concurrent separation logic embedded in f-star for concurrent and distributed programming with proofs and we are considering uh additional back-ends potentially for instance extracting our code to rust is something we've been thinking about recently this is based on an um a large team of of people and um collaborations across many institutions um at um it's too much for me to just to speak to everybody here but uh let me just highlight our my colleagues at msr redmond chris jonathan and tahina and asim at msr india who's been doing a lot of work on the tool chain as well but this is collaborations with msr cambridge indriya cmu edinburgh rosario in argentina and several visitors and and interns and visiting researchers and so on um um that's a picture of us a year or so ago at uh maybe two years ago at cambridge um we welcome collaborations and and contributions of this kind of thing interests you please reach out so i thought maybe just in a minute or two i'd give you a very whirlwind kind of taste of the some of the techniques that we use so broadly we've been building um many kinds of secure communication components and broadly the structure of such an a component is that there is an application it's trying to communicate some structured message say a key value pair across the network to appear so what it does is it uses a message formatter turns this structured message into some binary formatted message which is then signed and encrypted and then there's a second a wire format that happens dictated by some protocol a protocol format and then a wire format encrypted message encrypted sign message is sent across the network on an untrusted network and the other side reconstitute reconstitutes the the high-level message by by parsing and decrypting and verifying a signature and so on so having built a number of applications like this we have various components notably in order to orchestrate all of this we need a library of you need a way to write state machines and deal with concurrency and you need a way to do message formatting so we have ever parse which is this parser generator to deal with the the formatting and parsing we have a library of verified cryptographic primitives that's our main crypto provider called evercrypt and we've been developing tools to deal in this new dsl called steel to deal with concurrency and distribution and state machines um ever passed broadly our goal there is is um is to build um to bring parser generators to low level uh programming just as we use password generators for uh in in compiler development let's say uh currently people writing low level protocols because they're either performance there are performance constraints or because of various other deployment constraints like they must run in the kernel or something like this they tend to write their parsing code by hand and this is a can be a recipe for disaster um where if you are trying to parse adversarial input manually you can easily get this wrong and at that low level any flaw can result in a system takeover so our goal there with everpass is to kind of abolish writing low level binary parsers by hand and to instead generate high performance verified code from a high level specification of message formats in a way that integrates seamlessly with existing code bases and our toolchain produces code that's um memory save arithmetically safe functionally correct and also free from double fetches so broadly what you do is you you write a a high level specification and from the specification we have a tool that will produce f star code and proofs where you can the the proofs establish the for instance that a passer and the serializer are mutually inverse and and then you can extract the code to uh to memory save low level c code that has the same functional correctness property the result of doing this is that you get parcels that are that are low level and correct and secure uh and with performance that can be remarkably fast even when compared to handwritten parsers so for instance here in comparison to handwritten um parsers for c plus for um bitcoin transactions written in c plus plus um everpass generated parsers can be 13 times faster than handwritten c plus plus code for evercrypt evercrypt is a is a broad-ranging cryptographic provider that provides a a large suite of cryptographic algorithms in a customized to various platforms implemented in a variety of implemented in c and in assembly and with full proofs of correctness and security um evercrypt is used in a number of places ranging from linux to firefox to various azure components uh it comes out to about the library itself at the end is about 43 000 lines of c code and 15 000 lines of assembly code so it's quite a substantial effort with full form approves and uh we've really been kind of uh aiming one criterion for adding an algorithm to evercrypt is to ensure that it is both correct and um as fast if not faster than the best unverified implementations out there so recently for instance uh in in evercrypt we have a implementation of aes gcm which is the the the the constru the authenticated encryption construction that secures 90 of tls connections um and we recently produced verified code for this that matches and that slightly exceeds the performance of unverified openssl code for the same thing um uh finally just a word about steel um this is kind of where the the cutting edge of where um on the language design and aside f-star is evolving we have we've been working on embedding a concurrent separation logic in f-star and we have a couple of recent papers describing this which you can learn more about from that link we see this as a way to scale proofs beyond what is done by what can be done with smt alone so using steel we do proofs use in concurrent separation logic using a mixture of tactics for doing separation logic proofs and smt for automating arithmetic and other parts of the proof so just some takeaways we verify and deploy reusable critical software components at scale and our goal is to fully verify or harden critical subsystems of existing in existing software while achieving high performance and usability um in the future we aim to be applying many of the techniques that we've developed over the years to aim to reduce the bar on building program proofs further and we see that kind of evolving in three directions one is to do more proof and code generation from domain-specific languages evoparce has been a great example of that to produce verified parsers at um there are push button with push button proofs we expect to do more of that a lot of the code that we another big hammer that we use these days is meta programming so that we write code once in a very abstract generic way and then meta program to specialize and partially evaluate and produce many a large amount of verified code from a single very generic proof object and um another thing is as exemplified by things like steel we're aiming to sort of raise the level of abstraction and allow you to program against a a stack of verified abstractions things like channels and state machines producing low-level code through the tool chain rather than being forcing the programmer to work at the lowest level of abstraction so do reach out if you're curious to hear more about this and thanks for the opportunity to share some of our work at your seminar thank you

Original Description

Programming critical systems with proofs, a long-standing goal of computer science, is beginning to come within reach of modern programming languages and proof assistants. I provide a brief overview of recent accomplishments in this space, related to work in the F* proof assistant and Project Everest, one of its flagship applications. Programs developed in F* with proofs of correctness are now deployed in wide variety of settings, ranging from Microsoft Windows and Hyper-V, Microsoft Azure, the Linux kernel, Firefox, mbedTLS, and several others production systems. F* functional programming language: https://fstar-lang.org Project Everest: https://www.microsoft.com/en-us/research/project/project-everest-verified-secure-implementations-https-ecosystem/
Watch on YouTube ↗ (saves to browser)
Sign in to unlock AI tutor explanation · ⚡30

Playlist

Uploads from Microsoft Research · Microsoft Research · 23 of 60

1 Frontiers in ML: Learning from Limited Labeled Data: Challenges and Opportunities for NLP
Frontiers in ML: Learning from Limited Labeled Data: Challenges and Opportunities for NLP
Microsoft Research
2 Frontiers in Machine Learning: Climate Impact of Machine Learning
Frontiers in Machine Learning: Climate Impact of Machine Learning
Microsoft Research
3 Frontiers in Machine Learning: Security and Machine Learning
Frontiers in Machine Learning: Security and Machine Learning
Microsoft Research
4 Hope Speech and Help Speech: Surfacing Positivity Amidst Hate
Hope Speech and Help Speech: Surfacing Positivity Amidst Hate
Microsoft Research
5 Early Indicators of the Effect of the Global Shift to Remote Work on People with Disabilities
Early Indicators of the Effect of the Global Shift to Remote Work on People with Disabilities
Microsoft Research
6 Remote Work and Well-Being
Remote Work and Well-Being
Microsoft Research
7 Challenges and Gratitude of Software Developers During COVID-19 Working From Home
Challenges and Gratitude of Software Developers During COVID-19 Working From Home
Microsoft Research
8 Towards a Practical Virtual Office for Mobile Knowledge Workers
Towards a Practical Virtual Office for Mobile Knowledge Workers
Microsoft Research
9 Impact of COVID-19 crisis on the future of work in India
Impact of COVID-19 crisis on the future of work in India
Microsoft Research
10 Empowering and Supporting Remote Software Development Team Members through a Culture of Allyship
Empowering and Supporting Remote Software Development Team Members through a Culture of Allyship
Microsoft Research
11 How Work From Home Affects Collaboration: Information Workers in a Natural Experiment During COVID19
How Work From Home Affects Collaboration: Information Workers in a Natural Experiment During COVID19
Microsoft Research
12 Phong Surface: Efficient 3D Model Fitting using Lifted Optimization
Phong Surface: Efficient 3D Model Fitting using Lifted Optimization
Microsoft Research
13 Managing Tasks Across the Work-Life Boundary: Opportunities, Challenges, and Directions
Managing Tasks Across the Work-Life Boundary: Opportunities, Challenges, and Directions
Microsoft Research
14 Microsoft Urban Futures Summer Workshop | Data Driven Urban Transformation [Day 1]
Microsoft Urban Futures Summer Workshop | Data Driven Urban Transformation [Day 1]
Microsoft Research
15 Microsoft Urban Futures Summer Workshop | Sensors and Data [Day 2]
Microsoft Urban Futures Summer Workshop | Sensors and Data [Day 2]
Microsoft Research
16 Microsoft Urban Futures Summer Workshop | Policy and Social Impact [Day 3]
Microsoft Urban Futures Summer Workshop | Policy and Social Impact [Day 3]
Microsoft Research
17 Directions in ML: Algorithmic foundations of neural architecture search
Directions in ML: Algorithmic foundations of neural architecture search
Microsoft Research
18 MineRL Competition 2020
MineRL Competition 2020
Microsoft Research
19 Can we make better software by using ML and AI techniques? With Chandra Maddila and Chetan Bansal
Can we make better software by using ML and AI techniques? With Chandra Maddila and Chetan Bansal
Microsoft Research
20 From Paper to Product
From Paper to Product
Microsoft Research
21 SkinnerDB: Regret Bounded Query Evaluation using RL
SkinnerDB: Regret Bounded Query Evaluation using RL
Microsoft Research
22 From SqueezeNet to SqueezeBERT: Developing Efficient Deep Neural Networks
From SqueezeNet to SqueezeBERT: Developing Efficient Deep Neural Networks
Microsoft Research
Programming with Proofs for High-assurance Software
Programming with Proofs for High-assurance Software
Microsoft Research
24 Platform for Situated Intelligence Overview
Platform for Situated Intelligence Overview
Microsoft Research
25 Directional Sources & Listeners in Interactive Sound Propagation using Reciprocal Wave Field Coding
Directional Sources & Listeners in Interactive Sound Propagation using Reciprocal Wave Field Coding
Microsoft Research
26 Galactic Bell Star Music Demo
Galactic Bell Star Music Demo
Microsoft Research
27 Importing Animations in Microsoft Expressive Pixels (9 of 9)
Importing Animations in Microsoft Expressive Pixels (9 of 9)
Microsoft Research
28 Welcome to Microsoft Expressive Pixels (1 of 9)
Welcome to Microsoft Expressive Pixels (1 of 9)
Microsoft Research
29 Getting Started with Microsoft Expressive Pixels (2 of 9)
Getting Started with Microsoft Expressive Pixels (2 of 9)
Microsoft Research
30 Creating an Image in Microsoft Expressive Pixels (3 of 9)
Creating an Image in Microsoft Expressive Pixels (3 of 9)
Microsoft Research
31 Creating Animations in Microsoft Expressive Pixels (4 of 9)
Creating Animations in Microsoft Expressive Pixels (4 of 9)
Microsoft Research
32 Managing Animation Galleries in Microsoft Expressive Pixels (5 of 9)
Managing Animation Galleries in Microsoft Expressive Pixels (5 of 9)
Microsoft Research
33 Creating Fragments in Microsoft Expressive Pixels (6 of 9)
Creating Fragments in Microsoft Expressive Pixels (6 of 9)
Microsoft Research
34 Using Layers in Microsoft Expressive Pixels (7 of 9)
Using Layers in Microsoft Expressive Pixels (7 of 9)
Microsoft Research
35 Exporting Animations with Microsoft Expressive Pixels (8 of 9)
Exporting Animations with Microsoft Expressive Pixels (8 of 9)
Microsoft Research
36 What Kind of Computation is Human Cognition? A Brief History of Thought (Episode 2/2)
What Kind of Computation is Human Cognition? A Brief History of Thought (Episode 2/2)
Microsoft Research
37 What Kind of Computation is Human Cognition? A Brief History of Thought (Episode 1/2)
What Kind of Computation is Human Cognition? A Brief History of Thought (Episode 1/2)
Microsoft Research
38 Planeverb: Interactive sound propagation for dynamic scenes using 2D wave simulation
Planeverb: Interactive sound propagation for dynamic scenes using 2D wave simulation
Microsoft Research
39 Making cryptography accessible, efficient, and scalable with Dr. Divya Gupta and Dr. Rahul Sharma
Making cryptography accessible, efficient, and scalable with Dr. Divya Gupta and Dr. Rahul Sharma
Microsoft Research
40 Beyond the mega-data center: networking multi-data center regions (SIGCOMM 2020 Talk)
Beyond the mega-data center: networking multi-data center regions (SIGCOMM 2020 Talk)
Microsoft Research
41 Optics for the cloud – Light at the end of the tunnel? (SIGCOMM 2020 Workshop)
Optics for the cloud – Light at the end of the tunnel? (SIGCOMM 2020 Workshop)
Microsoft Research
42 Beyond the mega-data center: networking multi-data center regions (SIGCOMM 2020 short talk)
Beyond the mega-data center: networking multi-data center regions (SIGCOMM 2020 short talk)
Microsoft Research
43 Sirius: A Flat Datacenter Network with Nanosecond Optical Switching (SIGCOMM 2020 short talk)
Sirius: A Flat Datacenter Network with Nanosecond Optical Switching (SIGCOMM 2020 short talk)
Microsoft Research
44 Novel Image Captioning
Novel Image Captioning
Microsoft Research
45 Forest Sound Scene Simulation and Bird Localization with Distributed Microphone Arrays
Forest Sound Scene Simulation and Bird Localization with Distributed Microphone Arrays
Microsoft Research
46 Decoding Music Attention from “EEG headphones”: a User-friendly Auditory Brain-computer Interface
Decoding Music Attention from “EEG headphones”: a User-friendly Auditory Brain-computer Interface
Microsoft Research
47 How does holographic storage work?
How does holographic storage work?
Microsoft Research
48 The physics of hologram formation in iron doped lithium niobate
The physics of hologram formation in iron doped lithium niobate
Microsoft Research
49 Introduction to coax: A Modular RL Package
Introduction to coax: A Modular RL Package
Microsoft Research
50 Directions in ML: "Neural architecture search: Coming of age"
Directions in ML: "Neural architecture search: Coming of age"
Microsoft Research
51 Microsoft Research AI Breakthroughs 2020: 20 minute research talks + Q&A panel
Microsoft Research AI Breakthroughs 2020: 20 minute research talks + Q&A panel
Microsoft Research
52 Fireside Chat with Johannes Gehrke during Microsoft Research AI Breakthroughs 2020
Fireside Chat with Johannes Gehrke during Microsoft Research AI Breakthroughs 2020
Microsoft Research
53 Fireside Chat with Susan Dumais during Microsoft Research AI Breakthroughs 2020
Fireside Chat with Susan Dumais during Microsoft Research AI Breakthroughs 2020
Microsoft Research
54 Microsoft Research AI Breakthroughs 2020: 20 minute research talks, Q&A panel, and event wrap-up
Microsoft Research AI Breakthroughs 2020: 20 minute research talks, Q&A panel, and event wrap-up
Microsoft Research
55 Clinical Research with FHIR
Clinical Research with FHIR
Microsoft Research
56 Soundscape Street Preview
Soundscape Street Preview
Microsoft Research
57 Tilt-Responsive Techniques for Digital Drawing Boards
Tilt-Responsive Techniques for Digital Drawing Boards
Microsoft Research
58 SurfaceFleet: Exploring Distributed Interactions Unbounded from Device, Application, User, and Time
SurfaceFleet: Exploring Distributed Interactions Unbounded from Device, Application, User, and Time
Microsoft Research
59 Haptic PIVOT: On-Demand Handhelds in VR
Haptic PIVOT: On-Demand Handhelds in VR
Microsoft Research
60 SurfaceFleet Supplemental Video Demonstration (UIST 2020)
SurfaceFleet Supplemental Video Demonstration (UIST 2020)
Microsoft Research

The video discusses the use of formal proofs in programming to develop high-assurance software, highlighting recent accomplishments in the F* proof assistant and Project Everest. The speaker provides an overview of the tools and techniques used to develop verification libraries and stacks of communication components with full formal proofs. The video also covers the applications of verified code in various industries, including Hyper-V, Firefox, and Linux kernel.

Key Takeaways
  1. Write a high-level specification of message formats
  2. Use Everpass to generate high-performance verified code
  3. Prove the correctness of the generated code
  4. Extract the code to memory-safe low-level C code
  5. Design and conduct experiments on programming with proofs
  6. Analyze the results of research papers on programming with proofs
💡 The use of formal proofs in programming can ensure the correctness and security of software, and recent advancements in tools and techniques have made it possible to develop high-assurance software.

Related AI Lessons

5 Best BrowserStack Alternatives to Optimize Your Testing Infrastructure
Discover the top 5 BrowserStack alternatives to optimize testing infrastructure for better execution speed, pricing, and test management
Medium · DevOps
️ The Lifecycle Symphony: A Senior SRE’s Deep Dive into Init and Sidecar Containers
Learn how to optimize container initialization and sidecar containers for resilient multi-cloud platforms
Medium · DevOps
`wrangler dev --remote` silently writes to your production KV namespace — here's the fix
Learn how to safely use wrangler dev --remote with live KV namespaces without overwriting production data
Dev.to · 강해수
Qwen 3.6 27B Is the Local Dev Sweet Spot — Here's Why
Discover why Qwen 3.6 27B is the ideal choice for local development, and how it can boost your productivity
Dev.to · Carter May
Up next
Containers on Amazon ECS with Mama J
AWS Developers
Watch →