John Li CV

I'm a PhD student in programming languages working with Amal Ahmed and Steven Holtzen at Northeastern University. I'm generally interested in logic, semantics, and mechanized proof; lately I've been thinking about connections between mutable state and probability theory. Before starting my PhD, I built a verified-optimization-pass generator as part of the CertiCoq project. When I'm not thinking about research, I like learning math and playing bullet chess, violin, and ping pong.


A Nominal Approach to Probabilistic Separation Logic LICS 2024
John M. Li, Jon Aytac, Philip Johnson-Freyd, Amal Ahmed, and Steven Holtzen.
Abstract    arXiv
Currently, there is a gap between the tools used by probability theorists and those used in formal reasoning about probabilistic programs. On the one hand, a probability theorist decomposes probabilistic state along the simple and natural product of probability spaces. On the other hand, recently developed probabilistic separation logics decompose state via relatively unfamiliar measure-theoretic constructions for computing unions of sigma-algebras and probability measures. We bridge the gap between these two perspectives by showing that these two methods of decomposition are equivalent up to a suitable equivalence of categories. Our main result is a probabilistic analog of the classic equivalence between the category of nominal sets and the Schanuel topos. Through this equivalence, we validate design decisions in prior work on probabilistic separation logic and create new connections to nominal-set-like models of probability.
Lilac: A Modal Separation Logic for Conditional Probability PLDI 2023
John M. Li, Amal Ahmed, and Steven Holtzen.
Abstract    Slides    arXiv    DOI    Local copy    Notes
We present Lilac, a separation logic for reasoning about probabilistic programs where separating conjunction captures probabilistic independence. Inspired by an analogy with mutable state where sampling corresponds to dynamic allocation, we show how probability spaces over a fixed, ambient sample space appear to be the natural analogue of heap fragments, and present a new combining operation on them such that probability spaces behave like heaps and measurability of random variables behaves like ownership. This combining operation forms the basis for our model of separation, and produces a logic with many pleasant properties. In particular, Lilac has a frame rule identical to the ordinary one, and naturally accommodates advanced features like continuous random variables and reasoning about quantitative properties of programs. Then we propose a new modality based on disintegration theory for reasoning about conditional probability. We show how the resulting modal logic validates examples from prior work, and give a formal verification of an intricate weighted sampling algorithm whose correctness depends crucially on conditional independence structure.

Since publication, Jialu Bao has pointed out that the interpretation of almost-sure equality of random variables does not validate the expected substitution rule, due to a subtle point about negligibility. This issue is corrected in the local copy above and on arXiv; see Appendix B.5 for details.

We did not attempt to fully mechanize Lilac, but we do have a mechanization of our main result that probability spaces form a Kripke resource monoid (Theorem 2.4 in the paper). See this blog post for links to the mechanization, and for a PL-brained presentation of the pi-lambda theorem, a workhorse of measure-theoretic probability.

Deriving Efficient Program Transformations from Rewrite Rules ICFP 2021
John M. Li and Andrew W. Appel.
Abstract    Slides    DOI    Local copy
An efficient optimizing compiler can perform many cascading rewrites in a single pass, using auxiliary data structures such as variable binding maps, delayed substitutions, and occurrence counts. Such optimizers often perform transformations according to relatively simple rewrite rules, but the subtle interactions between the data structures needed for efficiency make them tricky to write and trickier to prove correct. We present a system for semi-automatically deriving both an efficient program transformation and its correctness proof from a list of rewrite rules and specifications of the auxiliary data structures it requires. Dependent types ensure that the holes left behind by our system (for the user to fill in) are filled in correctly, allowing the user low-level control over the implementation without having to worry about getting it wrong. We implemented our system in Coq (though it could be implemented in other logics as well), and used it to write optimization passes that perform uncurrying, inlining, dead code elimination, and static evaluation of case expressions and record projections. The generated implementations are sometimes faster, and at most 40% slower, than hand-written counterparts on a small set of benchmarks; in some cases, they require significantly less code to write and prove correct.
Compositional Optimizations for CertiCoq ICFP 2021
Zoe Paraskevopoulou, John M. Li, and Andrew W. Appel.
Abstract    DOI    Local copy
Compositional compiler verification is a difficult problem that focuses on separate compilation of program components with possibly different verified compilers. Logical relations are widely used in proving correctness of program transformations in higher-order languages; however, they do not scale to compositional verification of multi-pass compilers due to their lack of transitivity. The only known technique to apply to compositional verification of multi-pass compilers for higher-order languages is parametric inter-language simulations (PILS), which is however significantly more complicated than traditional proof techniques for compiler correctness. In this paper, we present a novel verification framework for lightweight compositional compiler correctness. We demonstrate that by imposing the additional restriction that program components are compiled by pipelines that go through the same sequence of intermediate representations, logical relation proofs can be transitively composed in order to derive an end-to-end compositional specification for multi-pass compiler pipelines. Unlike traditional logical-relation frameworks, our framework supports divergence preservation---even when transformations reduce the number of program steps. We achieve this by parameterizing our logical relations with a pair of relational invariants.

We apply this technique to verify a multi-pass, optimizing middle-end pipeline for CertiCoq, a compiler from Gallina (Coq's specification language) to C. The pipeline optimizes and closure-converts an untyped functional intermediate language (ANF or CPS) to a subset of that language without nested functions, which can be easily code-generated to low-level languages. Notably, our pipeline performs more complex closure-allocation optimizations than the state of the art in verified compilation. Using our novel verification framework, we prove an end-to-end theorem for our pipeline that covers both termination and divergence and applies to whole-program and separate compilation, even when different modules are compiled with different optimizations. Our results are mechanized in the Coq proof assistant.


Towards a Categorical Model of the Lilac Separation LogicLAFI @ POPL 2024
John M. Li, Jon Aytac, Philip Johnson-Freyd, Amal Ahmed, and Steven Holtzen.
Extended Abstract    Slides
New Foundations for Probabilistic Separation LogicLAFI @ POPL 2023
John M. Li, Amal Ahmed, and Steven Holtzen.
Extended Abstract    Slides    Poster

m o