A tour of the Vector Virtual Machine
Empirical’s Vector Virtual Machine (VVM) is the core component that makes both a responsive REPL and automated compile-time function evaluation (CTFE) possible. All Empirical code is compiled to VVM’s register-based bytecode. Launching Empirical with --dump-vvm will print the disassembled bytecode. For example:
>>> 13 + 17
becomes:
add_i64s_i64s 13 17 %0
The %0 is a temporary/local register that stores the result. If we had assigned the value to a variable, it would likely resolve to @1, a global register. (@0 is argv, the list of user arguments on the command line for a script.)
Instructions
As with any virtual machine, VVM operates on instructions. Each instruction starts with an opcode, like add_i64s_i64s, and continues with a pre-determined number of operands. Addition takes two inputs and one output, so add_i64s_i64s will always take three operands.
An operand is almost always a register, but certain instructions can take a boolean or a small integer directly. Any other literal is stored in the constant pool, a set of global registers:
@1 = "Hello World"
@2 = 1.25
@3 = 4611686018427387904
The opcode is often specialized for its input types. The add_i64s_i64s indicates that addition takes two 64-bit integer scalars. As of this writing, there are a total of 52 addition instructions, including:
add_f64v_i64s - floating-point vector and integer scalar
add_Sv_Sv - two string vectors
add_Ts_Dv - timestamp scalar and timedelta vector
add_DAs_TIs - date scalar and time scalar
Concatenating two strings:
>>> "Hello " + "World"
results in storing the constants before invoking the add instruction:
@4 = "Hello "
@5 = "World"
add_Ss_Ss @4 @5 %10
Type codes
Opcodes can be specialized ahead of time for builtin types, but user-defined types require a different approach. For that reason, some instructions take a type code.
>>> var x: Int64
becomes:
alloc i64s @6
The alloc instruction must be able to take any type. For example:
>>> data Person: name: String, age: Int64 end
>>> var me: Person
becomes:
$0 = {"name": Ss, "age": i64s}
alloc $0 @7
The $0 is a user-defined type code for VVM. The type’s definition (layout and member names) is mapped to a code that can be passed as an operand. Passing the type as a parameter means that alloc can handle anything.
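As a rough sketch (these are not VVM’s actual data structures), interning a type definition so its index can travel as a single operand might look like:
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: store each user-defined type's layout in a
// table so that its index ($0, $1, ...) can be passed as an operand.
struct TypeDef {
  std::vector<std::pair<std::string, std::string>> members;  // name, type code
};

std::vector<TypeDef> user_types_;  // index n is the $n code

size_t intern_type(TypeDef def) {
  user_types_.push_back(std::move(def));
  return user_types_.size() - 1;  // $0 for the first user-defined type
}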
Dispatch
Each opcode maps to a C++ function in the runtime. Since many opcodes perform similar actions with different inputs, the corresponding C++ routine is often a templated function. Our add_i64s_i64s invokes:
add_ss<int64_t, int64_t, int64_t>(...);
This C++ function adds the values in the first two operands and stores the result in the third operand. We will investigate the code for this later, but for now we will see how the dispatch takes place.
The bytecode is the set of instructions, and each instruction is an opcode followed by a pre-determined number of operands. By making all opcodes and operands integers of the same size, we can define the whole set of instructions as a single integer array. And indeed that’s what happens in C++:
typedef std::vector<size_t> instructions_t;
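To make that layout concrete, here is a minimal sketch (using a hypothetical opcodes enum; how immediates are distinguished from registers is a VVM detail not shown here) of our earlier addition flattened into such an array:
enum class opcodes : size_t { halt, alloc, add_i64s_i64s /* ... */ };

// A sketch, not VVM's actual encoding: the instruction
//   add_i64s_i64s 13 17 %0
// flattened into consecutive integers, followed by halt.
instructions_t code = {
    static_cast<size_t>(opcodes::add_i64s_i64s),  // opcode
    13,                                           // left operand (immediate)
    17,                                           // right operand (immediate)
    0,                                            // result register %0
    static_cast<size_t>(opcodes::halt),           // stop the interpreter
};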
To perform the dispatch, we simply jump to the logic that sets up the function call for each opcode:
void dispatch(const instructions_t& code) {
  size_t p;
  ip_ = 0;
  while (true) {
    switch (opcodes(code[ip_])) {
      case opcodes::halt:
        return;
      case opcodes::alloc:
        p = ip_;
        ip_ += 3;  // opcode + 2 operands
        alloc(code[p + 1], code[p + 2]);
        break;
      ...
      case opcodes::add_i64s_i64s:
        p = ip_;
        ip_ += 4;  // opcode + 3 operands
        add_ss<int64_t, int64_t, int64_t>(code[p + 1], code[p + 2], code[p + 3]);
        break;
      ...
    }
  }
}
The ip_ above is the instruction pointer, also known as a program counter (PC). It is a global variable that stores the current index into our bytecode array. (The reason it’s a global variable is so that branching instructions and function calls can manipulate our location in the bytecode.) By jumping to the appropriate function-call logic for each opcode, we can invoke the C++ function with the correct number of arguments.
The halt opcode is a hidden instruction that appears at the end of every function definition and bytecode array. It is generated by the Empirical compiler to tell our interpreter to stop, which means that we don’t need any boundary checking for ip_.
While the above code is pretty fast, there is an optimization in C++ for compilers that support computed gotos. We can use direct threading (which has nothing to do with multithreading) to jump to the function-call logic without needing the branches that are required by the break and the while.
void dispatch(const instructions_t& code) {
  static void* opcode_labels[] = { &&halt, &&alloc, ..., &&add_i64s_i64s, ... };
  size_t p;
  ip_ = 0;
  goto *opcode_labels[code[ip_]];  // jump to the first instruction
halt:
  return;
alloc:
  p = ip_;
  ip_ += 3;  // opcode + 2 operands
  alloc(code[p + 1], code[p + 2]);
  goto *opcode_labels[code[ip_]];  // jump straight to the next instruction
  ...
add_i64s_i64s:
  p = ip_;
  ip_ += 4;  // opcode + 3 operands
  add_ss<int64_t, int64_t, int64_t>(code[p + 1], code[p + 2], code[p + 3]);
  goto *opcode_labels[code[ip_]];
  ...
}
Now each block of function-call logic is responsible for dispatching the next instruction in the bytecode. (This also avoids the boundary checking that a switch performs.) Computed gotos are available in Clang and GCC, but not in Visual Studio.
Registers
We can see from the dispatch code that the operands are passed to the C++ function. Let’s take a look at one possible implementation of scalar addition:
template<class T, class U, class V>
void add_ss(operand_t left, operand_t right, operand_t result) {
  T& x = get_reference<T>(left);    // resolve each register...
  U& y = get_reference<U>(right);
  V& z = get_reference<V>(result);  // ...to a typed reference
  z = x + y;
}
The register code is simply an index into an array. %1 is really:
local_registers_[1];
And @5 is:
global_registers_[5];
Each individual register is a pointer (void*) whose interpretation is handled by the invoked function. The add_ss() above converts these pointers to C++ references for easier handling. The actual addition is a single C++ add, but there is overhead from indexing the register arrays and dereferencing the pointers.
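A get_reference() consistent with this description might look like the following sketch, where is_global() and index() are hypothetical stand-ins for however VVM actually decodes an operand:
// Hypothetical get_reference(): pick the register file named by the
// operand and cast the stored void* to the requested type.
template<class T>
T& get_reference(operand_t op) {
  void* ptr = is_global(op) ? global_registers_[index(op)]  // @n
                            : local_registers_[index(op)];  // %n
  return *static_cast<T*>(ptr);
}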
VVM’s add_i64s_i64s is allowed an embedded integer for the first two values. As we saw earlier:
add_i64s_i64s 13 17 %0
Since these immediate values are not registers, the C++ function must know how to get the values directly. The actual C++ function is more like:
template<class T, class U, class V>
void add_ss(operand_t left, operand_t right, operand_t result) {
  T x = get_value<T>(left);   // copy: may be an immediate
  U y = get_value<U>(right);
  V& z = get_reference<V>(result);
  z = x + y;
}
The function first checks whether the operand is an embedded value. If the operand is instead a register, then it dereferences the pointer for the underlying value. add_ss() is able to do this because it knows it is getting a scalar value, so the copying is fine. (By contrast, the vectorized add_vv() knows it cannot have an immediate value since the operand must be a vector.)
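Under those assumptions, get_value() might be sketched like this, with is_immediate() and immediate_bits() standing in for whatever tagging scheme VVM’s operands use:
// Hypothetical get_value(): return an embedded literal directly,
// otherwise copy the scalar out of the register.
template<class T>
T get_value(operand_t op) {
  if (is_immediate(op)) {
    return static_cast<T>(immediate_bits(op));  // embedded literal
  }
  return get_reference<T>(op);  // dereference the register
}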
Nil
At this point, it is worth looking at how Empirical handles missing values, which can come from a blank entry in a CSV file, a mismatched join, or a bad cast.
>>> Int64("75")
75
>>> Int64("7b")
nil
VVM represents missing numbers with a sentinel value. Floating points are simply the IEEE nan, while integers are the numeric limit (a size-appropriate INT_MAX). Determining whether a value is missing is just a comparison. Here is a simplified version:
bool is_nil(int64_t x) {
  return x == std::numeric_limits<int64_t>::max();
}
bool is_nil(double x) {
  return std::isnan(x);
}
The function to sum a vector discards missing values; it is similar to:
template<class T, class U>
void sum_v(operand_t left, operand_t result) {
  std::vector<T>& xs = get_reference<std::vector<T>>(left);
  U& y = get_reference<U>(result);
  y = 0;
  for (auto x: xs) {
    if (!is_nil(x)) {  // skip missing entries
      y = y + x;
    }
  }
}
Going back to our add_ss() function, we want to ensure that a missing value propagates. So the actual addition is more like:
z = (is_nil(x) || is_nil(y)) ? nil_value<V>() : x + y;
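The nil_value() helper isn’t defined in the text, but mirroring the is_nil() overloads above, plausible specializations would be:
#include <cstdint>
#include <limits>

// Plausible nil_value() specializations, mirroring is_nil(); the
// actual VVM definitions may differ.
template<class T> T nil_value();

template<> int64_t nil_value<int64_t>() {
  return std::numeric_limits<int64_t>::max();  // integer sentinel
}
template<> double nil_value<double>() {
  return std::numeric_limits<double>::quiet_NaN();  // IEEE nan
}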
That check seems quite slow in a loop. Fortunately there is an optimization: IEEE nan is propagated in hardware! We must check for integers in software, but floating-point numbers can be added directly in hardware. For that we have a special function:
constexpr bool is_int_nil(int64_t x) {
  return x == std::numeric_limits<int64_t>::max();
}
constexpr bool is_int_nil(double x) {
  return false;  // let the hardware propagate nan
}
And then we change the addition line to:
z = (is_int_nil(x) || is_int_nil(y)) ? nil_value<V>() : x + y;
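Putting the pieces together, the scalar addition would then read roughly like this (a sketch assembled from the earlier snippets, not VVM’s verbatim source):
template<class T, class U, class V>
void add_ss(operand_t left, operand_t right, operand_t result) {
  T x = get_value<T>(left);
  U y = get_value<U>(right);
  V& z = get_reference<V>(result);
  // For doubles, is_int_nil() is constexpr false, so the guard
  // vanishes and the hardware add propagates nan on its own.
  z = (is_int_nil(x) || is_int_nil(y)) ? nil_value<V>() : x + y;
}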
With an optimizing C++ compiler, the is_int_nil() guard will disappear for floating points, leaving only the addition. The nan propagation is done entirely in hardware. (I first saw this trick in Python datatable, though I don’t know its origins.)
Repr
Let’s take one more look at our humble addition. If we enable --dump-vvm for our original example, the full disassembled bytecode is:
add_i64s_i64s 13 17 %0
repr %0 i64s %1
save %1
These last two lines are how Empirical and VVM display a result to the console during interactive mode (i.e., in the REPL). The repr instruction converts a value (%0) of a given type (i64s) into a string (%1). The save instruction then copies this string to a designated variable in VVM; it is this saved string that is returned when evaluating a user expression.
The repr for a builtin type is pretty straightforward: convert to a string and then surround it with whatever is needed to make that string a valid Empirical expression. For example, a timestamp is:
Timestamp("2020-09-03 22:43:05.894561")
The repr for a Dataframe is a lot more involved. It determines the size of the user’s console, which is needed to cut off the overflowing rows and columns. It generates each column as a vector of strings, and then transposes them so that each row can be printed line by line. And it must trim excess zeros in unison from floating points and timestamps, which is why those columns line up their decimal points.
Here are the first few lines of the trades table if we dramatically shrink the console window:
>>> trades
symbol                  timestamp  price ...
  AAPL 2019-05-01 09:30:00.578802 210.52 ...
  AAPL 2019-05-01 09:30:00.580485 210.81 ...
   BAC 2019-05-01 09:30:00.629205  30.25 ...
   CVX 2019-05-01 09:30:00.944122 117.80 ...
   ...                        ...    ... ...
Expanding the window a little bit gives us:
>>> trades
symbol                  timestamp    price size
  AAPL 2019-05-01 09:30:00.578802 210.5200  780
  AAPL 2019-05-01 09:30:00.580485 210.8100  390
   BAC 2019-05-01 09:30:00.629205  30.2500  510
   CVX 2019-05-01 09:30:00.944122 117.8000 5860
  AAPL 2019-05-01 09:30:01.002405 211.1300  320
  AAPL 2019-05-01 09:30:01.066917 211.1186  310
  AAPL 2019-05-01 09:30:01.118968 211.0000  730
   ...                        ...      ...  ...
Our repr does this resizing automatically.