Thoughts pushing software forward, including consensus, CRDTs, formal methods, & probabilistic programming
Track: Modern CS in the Real World
Location: Soho Complex, 7th fl.
Day of week:
Track Host: Werner Schuster
Werner Schuster (@murphee) sometimes writes software, sometimes writes about software. He focuses on languages, VMs and compilers, HTML5/JavaScript, and recently more on performance optimisation.
10:35am - 11:25am
Fast Log Analysis by Automatically Parsing Heterogeneous Log
Most log analysis tools provide platforms for indexing, monitoring, and visualizing logs. Although these tools make it relatively easy to run ad-hoc queries and define rules that generate alerts, they do not provide automated log parsing support. In particular, most of these systems use regular expressions (RegEx) to parse log messages. They assume that users know how to work with RegEx and make them manually parse or define the fields of interest. By definition, these tools support only supervised parsing, as human input is essential. However, human involvement clearly does not scale for heterogeneous and continuously evolving log message formats in systems such as IoT and custom applications: it is impossible to manually review the sheer number of logs generated in an hour, let alone days and weeks. On top of that, writing RegEx-based parsing rules is a long, frustrating, and error-prone process, as RegEx rules may conflict with each other. In this talk, we present a solution inspired by unsupervised machine learning techniques for automatically generating RegEx rules from a set of logs with no (or minimal) human involvement. Human involvement is limited to providing a set of training logs. In addition, we present a demo illustrating how to integrate our solution with the popular Elasticsearch-Logstash-Kibana (ELK) stack to analyze logs collected from real-world applications.
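As a rough illustration of what "generating RegEx rules from a set of logs" can look like, here is a minimal Python sketch. It is not the speakers' method: it uses a simple hand-written heuristic (generalize tokens that look like numbers or IPv4 addresses, keep everything else literal) rather than unsupervised learning, and the names `VARIABLE_TOKENS`, `line_to_regex`, and `infer_rules`, as well as the sample log lines, are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical rules: a token fully matching the first regex is generalized
# to the pattern string on the right. Order matters (IPv4 before integer).
VARIABLE_TOKENS = [
    (re.compile(r"\d{1,3}(?:\.\d{1,3}){3}"), r"\d{1,3}(?:\.\d{1,3}){3}"),  # IPv4 address
    (re.compile(r"\d+"), r"\d+"),                                          # plain integer
]

def token_to_pattern(token: str) -> str:
    """Generalize variable-looking tokens; escape everything else literally."""
    for matcher, pattern in VARIABLE_TOKENS:
        if matcher.fullmatch(token):
            return pattern
    return re.escape(token)

def line_to_regex(line: str) -> str:
    """Build one regex for a whitespace-tokenized log line."""
    return r"\s+".join(token_to_pattern(tok) for tok in line.split())

def infer_rules(lines):
    """Group training lines by the regex they generalize to."""
    rules = defaultdict(list)
    for line in lines:
        rules[line_to_regex(line)].append(line)
    return dict(rules)

# Made-up training logs, for illustration only.
training_logs = [
    "Connected to 10.0.0.1 port 22",
    "Connected to 10.0.0.7 port 8080",
    "Disk usage 91 percent",
]
rules = infer_rules(training_logs)
for pattern, examples in rules.items():
    print(pattern, "<-", len(examples), "line(s)")
```

The two "Connected" lines collapse into a single rule, which then also matches unseen lines of the same shape. A real system would need to discover the variable fields from the data itself instead of relying on a fixed token list.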
11:50am - 12:40pm
Git Gud with Property-Based Testing
Traditional unit tests force developers to make difficult design decisions on their own, meaning their tests can be only as robust as the human imagination allows. In searching for an alternative to this limiting style of testing, I’ve found Property-Based Testing to be a meaningful one. Property-Based Testing marks a clear shift from the typical, awkward style of test creation: test cases are programmatically generated to explore not only execution paths but also the input domain. This increases the chance of exposing previously unidentified problematic inputs and states.
Typically, Property-Based Testing is applied close to the code in order to exercise functions. I’ve found that it can also be meaningfully used for testing at the integration level. Over the course of my discussion I will cover my experience using both methods as a way to find casting and overflow bugs in PolySync’s Open Source Car Control (OSCC) project, as well as for work involving the validation of security research and hardening strategies for git repositories.
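To make the idea concrete, here is a minimal, hand-rolled sketch of a property-based check in plain Python. It is an illustration only and is not taken from the OSCC project: `wrapping_cast` and `saturating_cast` are hypothetical stand-ins for a 16-bit integer conversion, and `check_property` is a toy runner, not a real testing library.

```python
import random

INT16_MIN, INT16_MAX = -32768, 32767

def wrapping_cast(value: int) -> int:
    """Hypothetical buggy conversion: wraps around silently, like a C-style cast."""
    return ((value - INT16_MIN) % 65536) + INT16_MIN

def saturating_cast(value: int) -> int:
    """Hypothetical fixed conversion: clamps to the int16 range."""
    return max(INT16_MIN, min(INT16_MAX, value))

def gen_int(rng: random.Random) -> int:
    """Generate inputs, mixing boundary values with wide random integers."""
    if rng.random() < 0.2:
        return rng.choice([INT16_MIN - 1, INT16_MIN, -1, 0, 1, INT16_MAX, INT16_MAX + 1])
    return rng.randint(-2**40, 2**40)

def sign_preserved(cast):
    """Property: converting a value must never flip its sign."""
    return lambda x: (cast(x) >= 0) == (x >= 0)

def check_property(prop, gen, trials=1000, seed=0):
    """Return the first generated input that violates `prop`, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = gen(rng)
        if not prop(x):
            return x
    return None

print("wrapping_cast counterexample:", check_property(sign_preserved(wrapping_cast), gen_int))
print("saturating_cast counterexample:", check_property(sign_preserved(saturating_cast), gen_int))
```

The test author states one property ("the sign never flips") and lets generated inputs search for violations, instead of guessing individual overflow cases by hand. Dedicated libraries add features this sketch lacks, such as shrinking a counterexample to a minimal one.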
1:40pm - 2:30pm
Real-Time, Fine-Grained Version Control With CRDTs
Last year, GitHub released Teletype, a package that adds support for real-time collaborative editing to the Atom text editor. Historically, similar systems such as Google Docs have based their implementations on Operational Transformation, which has been an area of active CS research for nearly 30 years. For Teletype, we tried a different approach, building on a newer theoretical framework called Conflict-Free Replicated Data Types, or CRDTs. Introduced in 2011, CRDTs offer a simple and general framework for synchronizing distributed replicas of non-trivial data structures, and they proved a great fit for collaborative editing. Having validated CRDTs in a production setting with Teletype, we're now exploring how we can take them further with an experimental system called Eon. Eon will enable what can best be described as "real-time version control". In this talk, I'll cover the foundations of CRDTs, then explore how we're using them in Eon to synchronize and persist changes to a repository at the granularity of individual keystrokes.
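For readers new to CRDTs, the core idea can be shown with one of the simplest examples, a grow-only counter (G-Counter). This sketch is only an illustration of the general framework: it is not the sequence CRDT used by Teletype or Eon, and the class and replica names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class GCounter:
    """Grow-only counter: each replica increments only its own slot."""
    replica_id: str
    counts: dict = field(default_factory=dict)

    def increment(self, n: int = 1) -> None:
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative, and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)

# Two replicas diverge while offline, then exchange state in either order.
a, b = GCounter("a"), GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
b.merge(a)  # merging the same state again changes nothing
print(a.value(), b.value())
```

Because the merge function needs no coordination and tolerates duplicated or reordered messages, both replicas end up with the same value. Text-editing CRDTs apply the same convergence principle to ordered sequences of characters, which takes considerably more machinery.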
2:55pm - 3:45pm
AutoCAD & WebAssembly: Moving a 30 Year Code Base to the Web
AutoCAD is a computer-aided design desktop software application that was first released in 1982. With the advent of the internet age, there comes a need to extend AutoCAD's capabilities to the browser. However, the massive, complex, and constantly changing code base makes it impractical to rewrite everything in JavaScript. Therefore, the question remains: Can we really find an elegant way to leverage AutoCAD on the Web?
Enter WebAssembly, a compilation target for languages such as C/C++ that runs in modern browsers. For the first time in history, legacy code bases can now run on the Web at near native speed with the help of the Emscripten compiler. Nevertheless, there are mismatches between the programming paradigms of the desktop and Web worlds which greatly complicate the porting effort. These include the use of synchronous blocking calls and shared memory on the desktop.
The goals of this session are two-fold. Firstly, the solutions for overcoming the above challenges will be explored in the context of existing Web APIs. Secondly, both the build time and performance implications of porting such a large code base will be addressed as well. As such, this talk will be helpful for developers who aspire to reuse their legacy software on the Web.
4:10pm - 5:00pm
Probabilistic Programming from Scratch
This talk is for anyone who deals with real-world data. Such data is always incomplete or imperfect in some way. Bayesian inference is a framework that allows us to draw conclusions from that data. And despite a reputation for mathematical and computational complexity, you don’t need a statistics background to understand Bayes at a conceptual level. We’ll develop that understanding by building a lightweight probabilistic programming system from scratch with simple Python. We’ll use the code we write to solve two real data problems: an A/B test and the German Tank problem. We’ll also look at how we’d solve those problems using PyMC3, a much more powerful, fully-featured probabilistic programming system.
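As a taste of what Bayesian inference looks like in simple Python, here is a minimal sketch of the German Tank problem solved by brute-force enumeration. It is not the system built in the talk. It assumes distinct serial numbers starting at 1, a uniform prior over the total number of tanks up to a hypothetical cap `max_tanks`, and made-up observed serials.

```python
from math import comb

def german_tank_posterior(serials, max_tanks=1000):
    """Posterior over the total number of tanks N, given observed serial numbers.

    Assumes serials are distinct draws from 1..N and a uniform prior on
    N in 1..max_tanks (the cap is an arbitrary choice for this sketch).
    """
    k = len(serials)
    largest = max(serials)
    # Likelihood of seeing this particular set of k serials: 1 / C(N, k)
    # when N is at least the largest serial, and 0 otherwise.
    weights = {
        n: (1 / comb(n, k) if n >= largest else 0.0)
        for n in range(1, max_tanks + 1)
    }
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}

# Made-up observations, for illustration only.
observed = [10, 256, 202, 97]
posterior = german_tank_posterior(observed)

most_likely = max(posterior, key=posterior.get)
mean = sum(n * p for n, p in posterior.items())
print("most likely N:", most_likely)
print("posterior mean of N:", round(mean, 1))
```

The single most likely total is always the largest serial seen, but the posterior mean sits above it, reflecting the chance that higher-numbered tanks exist and were simply not observed. The result depends on the prior cap, which is one reason a full probabilistic programming system is useful for exploring such assumptions.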
5:25pm - 6:15pm
Introduction to gVisor: Sandboxed Linux Container Runtime
Linux containers are a lightweight and portable way to run your services at scale. However, since they share the same host OS, they are considered to provide weaker isolation than virtual machines. gVisor is a user-space kernel that implements a substantial portion of the Linux system interface to provide isolation between applications and the host kernel. This session will introduce the architecture of gVisor and its benefits, and discuss how it differs from other isolation mechanisms.