Envision VM (part 1), Environment and General Architecture
This article is the first of a four-part series on the Envision virtual machine’s inner workings: the software that runs Envision scripts. See part 2, part 3 and part 4. This series doesn’t cover the Envision compiler (maybe some other time), so let’s just assume that the script has somehow been converted to the bytecode that the Envision virtual machine takes as input.
A Supply Chain Optimization pipeline covers a wide range of data processing needs: data ingestion and augmentation, feature extraction, probabilistic forecasting, producing optimal decisions under constraints, data exports, analytics, and dashboard creation. Each company’s pipeline is different, with its own inputs, rules, constraints and outputs. The volume of data is large: from Lokad’s experience, even our smallest accounts must process gigabytes of data each day, and our larger accounts reach well over a terabyte per day. As the optimization pipeline usually waits for daily inputs from the rest of the company’s internal data processing, it only has a few hours of computation time to optimize tomorrow’s decisions based on today’s data.
It’s not hard. Not really. Processing several terabytes in a few hours is a performance target well within the reach of a team of skilled software engineers.
Lokad’s challenge is to build such pipelines without software engineers. Of course, there are software engineers working at Lokad, but they are developing tools and infrastructure for the entire company, not for individual customers. Instead, we have our Supply Chain Scientists, a team of supply chain experts who grok the specific problems of each customer and design the solutions. In a more traditional company structure, these would be the product managers, the ones who listen to customers and then tell the software engineers, in great detail, exactly what needs to be implemented. Our development philosophy, and the reason for creating Envision, our own programming language, is that it should be faster for a Supply Chain Scientist to implement the solution themselves than to write the specifications for someone else to implement.
In order to achieve this without sacrificing scale or performance, there are three features that Envision must bring to the table for free [1], without any effort from the Supply Chain Scientist.
- Memory management must be automatic. This obviously includes having a garbage collector, but it also means Envision should support large-scale data sets transparently. Creating a ten-gigabyte array is table stakes: that’s not even three billion numbers! The ability to work with a data set larger than a single machine’s memory is expected. In fact, Envision supports data sets that are larger than the memory of the entire cluster, by cleverly spilling to NVMe drives.
- Multi-core and multi-machine parallelism must be automatic. Embarrassingly parallel operations should be distributed to as many cores as possible on the cluster, without human intervention. A script should survive the crash of a single machine in the cluster without having to start over. A script should be able to complete even if the cluster is reduced to a single machine. And, of course, two Envision runs should be able to execute concurrently on the same cluster.
- We expect Envision to require little hardware to run. Many performance problems can be solved by spending millions of dollars on high-grade hardware and/or server licenses, but we would rather avoid a situation where clicking a script’s “Run” button costs hundreds of dollars.
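The "not even three billion numbers" figure in the first point can be checked with a few lines of arithmetic. This is only a sketch: it assumes each number takes 4 bytes (for instance a 32-bit float), which the article does not state, and the helper name is made up for the illustration.

```python
# Assumption: each number is stored as a 4-byte value (e.g. a 32-bit float).
BYTES_PER_NUMBER = 4

def numbers_in(size_bytes: int) -> int:
    """How many fixed-width numbers fit in a buffer of the given size."""
    return size_bytes // BYTES_PER_NUMBER

ten_gb = 10 * 10**9    # ten decimal gigabytes
ten_gib = 10 * 2**30   # ten binary gibibytes

print(numbers_in(ten_gb))   # 2500000000
print(numbers_in(ten_gib))  # 2684354560
```

Under that assumption, the count stays below three billion whichever definition of "gigabyte" is used.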
General-purpose programming languages do not provide these features, and even though they can usually be combined with frameworks that do (Scala + Spark, Python + Dask, and others), this leaves too many sharp edges exposed to the user. In this sense, Envision is more similar to SQL running on BigQuery.
Environment
Envision cannot be installed or run locally. Instead, all users connect to Lokad’s online platform, which provides a browser-based IDE to edit and execute scripts. The data and the dashboards are also stored online and accessed through a web interface, as well as through SFTP and FTPS.
When a user runs a script through the web interface, it creates a mission that is dispatched to an Envision cluster for execution.
Envision runs in batch mode: each execution reads the entirety of the input data and produces a complete output. This can take anywhere from 5 seconds for a very simple script running on little data, to 30-40 minutes for a large data augmentation script, and even several hours for some machine learning tasks.
Other execution modes, such as stream processing (listening for new input and producing the corresponding output on the fly) or transactional access (reading only a few lines of data, writing a few lines back) are not supported: the language primitives and the low-level implementation details involved in running these modes with a decent performance run counter to those involved in batch mode processing.
Cluster Structure
As of 2021, all of Envision runs on .NET 5, hosted on Ubuntu 20.04 virtual machines in the Microsoft Azure cloud. A cluster is composed of between 2 and 6 Standard_L32s_v2 instances: 32× AMD EPYC 7551 cores, 256GiB of memory, and 4× NVMe drives totaling 7.68TB of storage space. We call these machines workers.
The cluster holds an M-to-N association between workers and missions: a single worker can execute several Envision scripts concurrently, and if a single script is assigned to several workers, they will cooperate to finish it faster.
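One way to picture an M-to-N association is as a pair of indexes that are kept in sync. The sketch below is purely illustrative: the class, method names and identifiers are hypothetical and say nothing about how the Envision cluster actually tracks assignments.

```python
from collections import defaultdict

class Assignments:
    """Many-to-many mapping between workers and missions (illustrative only)."""

    def __init__(self):
        self._missions_by_worker = defaultdict(set)
        self._workers_by_mission = defaultdict(set)

    def assign(self, worker, mission):
        # Record the link in both directions so either side can be queried.
        self._missions_by_worker[worker].add(mission)
        self._workers_by_mission[mission].add(worker)

    def unassign(self, worker, mission):
        self._missions_by_worker[worker].discard(mission)
        self._workers_by_mission[mission].discard(worker)

    def missions_of(self, worker):
        return set(self._missions_by_worker.get(worker, ()))

    def workers_of(self, mission):
        return set(self._workers_by_mission.get(mission, ()))

a = Assignments()
a.assign("worker-1", "mission-A")   # one worker runs several missions...
a.assign("worker-1", "mission-B")
a.assign("worker-2", "mission-A")   # ...and one mission spans several workers
```

Here `a.missions_of("worker-1")` contains both missions, while `a.workers_of("mission-A")` contains both workers.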
Each cluster also has a scheduler, a smaller Standard_B2ms with 2× cores and 8GiB of memory. The scheduler provides the API endpoints for external applications to submit new missions and to collect the results of finished missions. It is also responsible for dispatching missions to one or more machines on the cluster. Depending on the load, and the degree of parallelism available to each script at any point in time, the scheduler can add or remove workers from a mission.
The entire system was designed to be resilient: once a mission has been assigned to a worker, even if all the other workers in the cluster, as well as the scheduler, go offline, the surviving worker will still be able to complete the mission. As such, multi-worker cooperation and the mission re-assignments performed by the scheduler are performance optimizations: they are not necessary for the mission’s completion.
Blob Storage
Lokad does not use SQL databases for customer data. The hosted solutions cannot easily hold the datasets of our larger customers (they tend to tap out between 4TB and 16TB), and running our own servers would require effort that we would rather expend elsewhere.
On the other hand, Envision runs in batch mode, which eliminates the need for queries more complex than “Read column X between lines L and M”: once the input data has been loaded, the worker will be able to index and re-process it as necessary.
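To see why such a query needs nothing more than a seekable blob, here is a minimal sketch. It assumes a column stored as consecutive fixed-width values (little-endian 32-bit floats) and 0-based, inclusive line bounds. Both are assumptions made for the example, not a description of Envision's actual storage format, and `read_column_range` is a hypothetical helper.

```python
import io
import struct

VALUE_FORMAT = "<f"                          # assumption: little-endian 32-bit floats
VALUE_SIZE = struct.calcsize(VALUE_FORMAT)   # 4 bytes per value

def read_column_range(blob, first_line, last_line):
    """Return the values of lines first_line..last_line (inclusive, 0-based)."""
    count = last_line - first_line + 1
    blob.seek(first_line * VALUE_SIZE)       # jump straight to the first wanted value
    raw = blob.read(count * VALUE_SIZE)      # read only the requested byte range
    return [v for (v,) in struct.iter_unpack(VALUE_FORMAT, raw)]

# An in-memory stand-in for a stored column holding six values.
column_x = io.BytesIO(struct.pack("<6f", 1.0, 2.0, 3.0, 4.0, 5.0, 6.0))
print(read_column_range(column_x, 2, 4))  # [3.0, 4.0, 5.0]
```

With fixed-width values, the byte offset is a simple multiplication, so no query engine or index is needed to serve the request.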
Because of this, we use Azure Blob Storage as our primary storage. It lets us store in excess of a petabyte at less than 1% of the cost of hosted SQL databases, and read query performance reliably stays between 30MB/s and 60MB/s.
We have also made our blobs immutable: it is possible to create new blobs, but not to change existing ones. This ensures that the inputs of a script cannot be changed while the script is running, and that the outputs of a script cannot be seen until the execution completes and returns the new blobs’ identifiers.
To be exact, Lokad has built a Content-Addressable Store on top of Azure Blob Storage. It is open source, and available as a pair of NuGet packages Lokad.ContentAddr and Lokad.ContentAddr.Azure. Knowing the hash of individual files lets Envision determine that some of its inputs have not changed, so it can reuse computed values kept in cache from a previous run.
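The idea of immutable, content-addressed blobs, and the cache reuse they enable, can be sketched in a few lines. This is not the API of Lokad.ContentAddr: the class, the choice of SHA-256, the in-memory dictionaries and the `run_step` function are all assumptions made for the illustration.

```python
import hashlib

class ContentStore:
    """In-memory content-addressed store: a blob's identifier is the hash of its bytes."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()   # assumption: SHA-256 as the hash
        self._blobs.setdefault(key, data)        # existing blobs are never overwritten
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

store = ContentStore()
cache = {}                      # hypothetical cache: input hash -> output hash
stats = {"computations": 0}     # counts how often the real work is performed

def run_step(input_key: str) -> str:
    """Recompute only if this exact input content has not been seen before."""
    if input_key not in cache:
        stats["computations"] += 1
        result = store.get(input_key).upper()    # stand-in for real work
        cache[input_key] = store.put(result)
    return cache[input_key]

first = run_step(store.put(b"sku,qty\na,1\n"))
second = run_step(store.put(b"sku,qty\na,1\n"))  # same content, hence same hash
```

Because the second input hashes to the same identifier as the first, the step is served from the cache and the work is done only once.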
Deployment
Envision does not use any containerization (such as Docker), because the benefits of containers do not justify the additional complexity involved.
First, high-performance computing requires every last drop of CPU, RAM and storage from our workers, and so it is not possible to run several applications on the same machine.
Second, for packaging an application along with all its dependencies, in a platform-independent way, we have found dotnet publish to be sufficient (and, in fact, faster than docker build). With .NET 5, Microsoft provides outstanding cross-platform support, and it is enough to literally copy the results of a build from a Windows machine to a Linux host.
Finally, quickly creating new instances from scratch is something we actively avoid. While we can shut down workers to reduce costs, creating new clusters or adding more workers to existing clusters is a financial decision: we do not bill our customers based on resource usage, so we have no way to forward the additional costs.
Next week, we will dive into Envision’s execution model, and how the work to be done is represented inside the cluster.
Shameless plug: we are hiring software engineers. Remote work is possible.
[1] Nothing is really free, and we pay for it by sacrificing Envision’s ability to act as a general-purpose programming language. For instance, automatic parallelization means we will never support explicit control over threads in Envision; automatic multi-machine processing means there will never be the concept of a “local machine” in Envision.