Statement of Academic Purpose 2023

2022-12-29

Introduction

My general ambition is to accelerate software development. At the moment I am particularly interested in abstractions capturing the trifecta of correctness, performance, and interpretability. Some inspiring examples include the pipes streaming framework designed in the vein of applied category theory (Gonzalez 2012), Chlipala’s synthesized crypto routines (Erbsen et al. 2019), and metacompilation work with Python (Bolz and Tratt 2015) and Lua (Xu 2022). pipes’ design elegantly avoids other libraries’ unsoundness without compromising performance; Chlipala synthesizes from high-level machine-checked code a fastest-known crypto routine implementation; metatracing and metacompilation yield competitive JITs at a fraction of effort. The common thread unifying these is applying foundational theory to well-chosen areas and producing practical results outperforming prior heuristic approaches.

Experience

PL

Let me showcase my experience.

I had previously worked on a security policy to runtime monitor compiler in the vein of SELinux1 for Kaspersky OS2 (Kaspersky 2015). The project used to compile PSL the Policy Specification Language to C directly, which lead to binary blowup for large and/or complicated policies like LTL3 with regexps. Our team thus sketched out a subset of PSL in Agda and designed a simple stack IR. With this we both decoupled type elaboration from codegen (sic) and were free to implement optimization passes, reducing the binary size severalfold.

Our project was modular w.r.t. policy engines: that is, providers could be plugged into the compiler to imbue particular PSL syntax with semantics. Providers’ codegen and composition are implemented with nk-sketch4, a Nix-style AST build system inspired by the SKETCH synthesis tool (Solar-Lezama et al. 2006). A sketch is a template with holes that the compilation process fills (formally, a sketch is a key, a list of dependencies, and a cache of holes). Templates were written in C before the IR project, but nk-sketch is fully general.

Before joining Kaspersky, I did a research internship at Jetbrains’ MLSE5 lab, fuzzing the Kotlin compiler for performance; that is, to find anomalously slow-to-compile inputs. Whitebox methods were dismissed for Kotlin is complicated and lacks a formal description; blackbox neural generation with heuristic filter à la (Cummins et al. 2018) seemed viable but dumb; so I tried out AFL6. With instrumentation available at the time (Padhye, Lemieux, and Sen 2019), that came down7 to 5 \mathrm{\frac{samples}{sec \ast core}}. Prior work on domain-general semantic fuzzing (Padhye et al. 2019) didn’t work well with such slow feedback; so I tried bootstrapping from the compiler performance test suite and implementing Kotlin-specific mutators. That’s when my internship ran out, with inconclusive results. One thing I learned from that is to estimate the research-readiness of the subject: I spent an inordinate amount of time patching an unfamiliar industrial compiler having expected to work at a much higher level than that. Still, that work could further be fleshed out into a “CompilerGym”8 of sorts.

Networks

Prior to having pivoted to PL work, I did networks: a joint project and an independent one.

My group at eQualitie9 approached the censorship resistance problem from an unusual angle: in lieu of an ever-more-sophisticated DPI10-resistant proxy, we served the static Web with BitTorrent (eQualitie 2018). That let us reduce the traffic that needed to be expensively proxied and handle networks temporarily or semi-permanently separated from the global Internet (such as a country flipping the Internet switch). My particular contribution was in the distributed naming scheme we used and adjusting the P2P code to mobile clients. This work was presented as (Ivan Vilata-i-Balaguer 2019).

I got the eQualitie contract after presenting gnunet-hs (Volkov 2019) (Volkov 2018), an experiment in wrapping GNUnet (Grothoff 2017), an intimidating C P2P application runtime complete with a CPS11 scheduler, into a simple imperative Haskell API. This was engineering and API design work directed to lowering the barrier to P2P application development and raising the exposure of a productive Haskell+C PL combination (Volkov 2020), culminating in a port of Cloud Haskell12 (Epstein, Black, and Peyton-Jones 2011) to GNUnet’s CADET13.

An anecdote: upon reading the Jitk (Wang et al. 2014) and Firecracker (Agache et al. 2020) papers back in 2020 – briefly, Jitk runs a Coq→OCaml extracted BPF JIT in userspace; kernel downcalls to it; Firecracker is a lightweight virtualization system powering AWS Lambda – I had the idea of running a BPF VM as a unikernel with virtio network as an alternative to the likes of DPDK and prototyped14 this up to virtio-net initialization in baremetal Rust before life intervened. I imagine grad school is where life doesn’t intervene so.

PLV outreach

Besides that, my teaching experience totals ca. 400 hours, most of that intensive instruction in functional programming and distributed systems at summer schools, both to students and seasoned software engineers. In 2021 I organized Lalambda, a two-week summer school on formal methods hosting 70 participants in collaboration with Anton Trunov.

Research interests

As stated earlier, I aspire to do applied theoretic work yielding correct and performant artifacts. Particular areas I find curious are gradual typing, program synthesis and extensible compilation, and specification logics. Some particular ideas follow:

Static typing limits expressiveness; typing statically what can be thus typed ergonomically and leaving the rest to dynamic checks (Siek and Taha 2006) is a compromise whose value is attested to by TypeScript and mypy. There is practical work to be done with these to add soundness without sacrificing performance (Rastogi et al. 2015). Another exciting direction is recent work in gradualizing Coq’s CIC15 (Lennon-Bertrand et al. 2022); this suggests a new sort of proof assistants and dependently-typed languages and is very curious.

Synthesis and extensible compilation both make code easier to reason about and verify, faster to develop, and offer very real performance gains – as evidenced by Halide and Exo. I am particularly interested in verified synthesis with the Fiat (Delaware et al. 2015) refinement framework, and further Fiat-to-Facade and the Rupicola relational compiler. These are ripe for application to further targets (like synthesizing an SMT solver) and development of the corresponding theory.

On the other hand, some systems are intractable to synthesize: prominently, DNN control systems. One avenue is furthering simulation; another is to scale direct verification of DNNs via SMT solvers (Katz et al. 2021), in particular reducing case-splitting by better conflict prediction, or providing better soundness guarantees by extending Barrett’s work (Katz et al. 2016) to support the full input language of CVC4.

Iris16’s (Jung et al. 2015) success suggests the right logic can be tremendously valuable to reason about a wide range of domains from weak memory models (Mével and Jourdan 2021) to CRDTs17 (Nieto et al. 2022) to OCAPs18 (Swasey, Garg, and Dreyer 2017), and even build executable specifications of distributed systems (Krogh-Jespersen et al. 2020) (Hawblitzel et al. 2015). Clearly, there is no shortage of research to be done here; one exciting problem is generalizing Simuliris’s (Gäher et al. 2022) results from a particular program transformation to general optimizations or adapting it to a weak memory model.

For another direction, staged compilation seems to be the next evolution step of domain-specific metaprogramming systems like FFTW, ATLAS, or Halide (Kiselyov 2018) and further makes some styles of program analysis and synthesis more viable as evidenced by recent work in staged abstract and relational interpreters (Wei, Chen, and Rompf 2019). One imagines extending that work to richer languages, such as higher-order ones.

Finally, I feel putting engineering effort into tooling (c.f. Proof General, Alectryon, Dotty decompiler) can improve many an academic’s research or teaching experience.

One comparative advantage I bring to the table is six years’ worth of systems experience. That means I’m proficient with pushing out code; what I’m looking for is a mentor to guide my research and an org process optimized for crystallizing and publishing quality ideas.

References

Agache, Alexandru, Marc Brooker, Andreea Florescu, Alexandra Iordache, Anthony Liguori, Rolf Neugebauer, Phil Piwonka, and Diana-Maria Popa. 2020. “Firecracker: Lightweight Virtualization for Serverless Applications.” In Proceedings of the 17th Usenix Conference on Networked Systems Design and Implementation, 419–34. NSDI’20. USA: USENIX Association.
Bolz, Carl Friedrich, and Laurence Tratt. 2015. “The Impact of Meta-Tracing on VM Design and Implementation.” Science of Computer Programming 98: 408–21. https://doi.org/https://doi.org/10.1016/j.scico.2013.02.001.
Brockman, Greg, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. 2016. “OpenAI Gym.” arXiv. https://doi.org/10.48550/ARXIV.1606.01540.
Cummins, Chris, Pavlos Petoumenos, Alastair Murray, and Hugh Leather. 2018. “Compiler Fuzzing Through Deep Learning.” In Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis, 95–105. ISSTA 2018. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3213846.3213848.
Delaware, Benjamin, Clément Pit-Claudel, Jason Gross, and Adam Chlipala. 2015. “Fiat: Deductive Synthesis of Abstract Data Types in a Proof Assistant.” In Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 689–700. POPL ’15. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/2676726.2677006.
Epstein, Jeff, Andrew P. Black, and Simon Peyton-Jones. 2011. “Towards Haskell in the Cloud.” SIGPLAN Not. 46 (12): 118–29. https://doi.org/10.1145/2096148.2034690.
eQualitie. 2018. “Ouinet.” Git Repository. https://github.com/equalitie/ouinet.
Erbsen, Andres, Jade Philipoom, Jason Gross, Robert Sloan, and Adam Chlipala. 2019. “Simple High-Level Code for Cryptographic Arithmetic - with Proofs, Without Compromises.” In 2019 IEEE Symposium on Security and Privacy (SP), 1202–19. https://doi.org/10.1109/SP.2019.00005.
Gäher, Lennard, Michael Sammler, Simon Spies, Ralf Jung, Hoang-Hai Dang, Robbert Krebbers, Jeehoon Kang, and Derek Dreyer. 2022. “Simuliris: A Separation Logic Framework for Verifying Concurrent Program Optimizations.” Proc. ACM Program. Lang. 6 (POPL). https://doi.org/10.1145/3498689.
Gonzalez, Gabriella. 2012. “Pipes.” Hackage Repository. https://hackage.haskell.org/package/pipes.
Grothoff, Christian. 2017. The GNUnet System.” Habilitation á diriger des recherches, Université de Rennes 1. https://hal.inria.fr/tel-01654244.
Hawblitzel, Chris, Jon Howell, Manos Kapritsos, Jay Lorch, Bryan Parno, Justine Stephenson, Srinath Setty, and Brian Zill. 2015. “IronFleet: Proving Practical Distributed Systems Correct.” In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP), Proceedings of the ACM Symposium on Operating Systems Principles (SOSP). ACM - Association for Computing Machinery. https://www.microsoft.com/en-us/research/publication/ironfleet-proving-practical-distributed-systems-correct/.
Ivan Vilata-i-Balaguer, Dmitriy Volkov, Peter Jankuliak. 2019. “CENO and Ouinet: Ignore Censorship with P2P-backed Web Browsing.” TICS. https://tics-conf.github.io/proceedings/2019a/icn_2019_vil2019.pdf.
Jung, Ralf, David Swasey, Filip Sieczkowski, Kasper Svendsen, Aaron Turon, Lars Birkedal, and Derek Dreyer. 2015. “Iris: Monoids and Invariants as an Orthogonal Basis for Concurrent Reasoning.” In Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 637–50. POPL ’15. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/2676726.2676980.
Kaspersky. 2015. “Kaspersky OS.” https://os.kaspersky.com.
Katz, Guy, Clark Barrett, David Dill, Kyle Julian, and Mykel Kochenderfer. 2021. “Reluplex: A Calculus for Reasoning about Deep Neural Networks.” Formal Methods in System Design, July, 1–30. https://doi.org/10.1007/s10703-021-00363-7.
Katz, Guy, Clark Barrett, Cesare Tinelli, Andrew Reynolds, and Liana Hadarean. 2016. “Lazy Proofs for DPLL(t)-Based SMT Solvers.” In 2016 Formal Methods in Computer-Aided Design (FMCAD), 93–100. https://doi.org/10.1109/FMCAD.2016.7886666.
Kiselyov, Oleg. 2018. “Reconciling Abstraction with High Performance: A MetaOCaml Approach.” Foundations and Trends® in Programming Languages 5 (1): 1–101. https://doi.org/10.1561/2500000038.
Krogh-Jespersen, Morten, Amin Timany, Marit Edna Ohlenbusch, Simon Oddershede Gregersen, and Lars Birkedal. 2020. “Aneris: A Mechanised Logic for Modular Reasoning about Distributed Systems.” In Programming Languages and Systems: 29th European Symposium on Programming, ESOP 2020, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2020, Dublin, Ireland, April 25–30, 2020, Proceedings, 336–65. Berlin, Heidelberg: Springer-Verlag. https://doi.org/10.1007/978-3-030-44914-8_13.
Lennon-Bertrand, Meven, Kenji Maillard, Nicolas Tabareau, and Éric Tanter. 2022. “Gradualizing the Calculus of Inductive Constructions.” ACM Trans. Program. Lang. Syst. 44 (2). https://doi.org/10.1145/3495528.
Mével, Glen, and Jacques-Henri Jourdan. 2021. “Formal Verification of a Concurrent Bounded Queue in a Weak Memory Model.” Proc. ACM Program. Lang. 5 (ICFP). https://doi.org/10.1145/3473571.
Nieto, Abel, Léon Gondelman, Alban Reynaud, Amin Timany, and Lars Birkedal. 2022. “Modular Verification of Op-Based CRDTs in Separation Logic.” Proc. ACM Program. Lang. 6 (OOPSLA2). https://doi.org/10.1145/3563351.
Padhye, Rohan, Caroline Lemieux, and Koushik Sen. 2019. “JQF: Coverage-Guided Property-Based Testing in Java.” In Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, 398–401. ISSTA 2019. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3293882.3339002.
Padhye, Rohan, Caroline Lemieux, Koushik Sen, Mike Papadakis, and Yves Le Traon. 2019. “Semantic Fuzzing with Zest.” In Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, 329–40. ISSTA 2019. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3293882.3330576.
Rastogi, Aseem, Nikhil Swamy, Cédric Fournet, Gavin Bierman, and Panagiotis Vekris. 2015. “Safe and Efficient Gradual Typing for TypeScript.” SIGPLAN Not. 50 (1): 167–80. https://doi.org/10.1145/2775051.2676971.
Siek, Jeremy, and Walid Taha. 2006. “Gradual Typing for Functional Languages.” In Scheme and Functional Programming.
Solar-Lezama, Armando, Liviu Tancau, Rastislav Bodik, Sanjit Seshia, and Vijay Saraswat. 2006. “Combinatorial Sketching for Finite Programs.” In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, 404–15. ASPLOS XII. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/1168857.1168907.
Swasey, David, Deepak Garg, and Derek Dreyer. 2017. “Robust and Compositional Verification of Object Capability Patterns.” Proc. ACM Program. Lang. 1 (OOPSLA). https://doi.org/10.1145/3133913.
Volkov, Dmitriy. 2018. “Practical Cloudless Applications.” CryptoInstallFest. Moscow, Russia.
———. 2019. gnunet-hs.” Git Repository. https://git.sr.ht/~wldhx/gnunet-hs.
———. 2020. “Haskell Prototypes ’round C Applications.” FPure. Kazan, Russia.
Wang, Xi, David Lazar, Nickolai Zeldovich, Adam Chlipala, and Zachary Tatlock. 2014. “Jitk: A Trustworthy in-Kernel Interpreter Infrastructure.” In OSDI.
Wei, Guannan, Yuxuan Chen, and Tiark Rompf. 2019. “Staged Abstract Interpreters: Fast and Modular Whole-Program Analysis via Meta-Programming.” Proc. ACM Program. Lang. 3 (OOPSLA). https://doi.org/10.1145/3360552.
Xu, Haoran. 2022. “Luajit-Remake.” Git Repository. https://github.com/luajit-remake/luajit-remake.

  1. Security-Enhanced Linux, a rich multimodal kernel-enforced security policy framework↩︎

  2. a lean separation kernel OS for secure applications↩︎

  3. Linear Temporal Logic↩︎

  4. Developed in-house at Kaspersky primarily by Alexander Bondarenko; going through legal to be open-sourced at the moment. Not a synthesis system.↩︎

  5. Machine Learning Methods in Software Engineering, headed by Timofey Bryksin↩︎

  6. https://lcamtuf.coredump.cx/afl/, a well-known graybox fuzzer↩︎

  7. with “daemonized” kotlinc; default AFL restarts the fuzzed program, which kills JVM software performance↩︎

  8. A la OpenAI Gym (Brockman et al. 2016): a high-level experimentation environment abstracting out the minutiae of interfacing with StarCraft in OpenAI’s case, and kotlinc in mine.↩︎

  9. supported by the Open Technology Fund and the NSF↩︎

  10. Deep Packet Inspection, i.e. traffic analysis. Deployed to filter VPN traffic, for instance. A good overview of the state of art in DPI and its circumvention is https://www.pluggabletransports.info/transports/.↩︎

  11. Continuation-Passing Style↩︎

  12. Erlang-in-Haskell, complete with closure serialization↩︎

  13. double ratchet-encrypted sockets going over an overlay network↩︎

  14. https://git.sr.ht/~wldhx/runfw↩︎

  15. Calculus of Inductive Constructions, a higher-order typed lambda calculus with inductive types↩︎

  16. higher order concurrent separation logic↩︎

  17. Conflict-free Replicated Data Types, a family of eventually consistent data structures↩︎

  18. Object CAPabilities, a name for handles both referring to an object and authority; also topic of my senior thesis↩︎