Benchmarks

Cyber is fast and memory efficient. Here are some benchmarks against similar languages. Each benchmark compares either the VM or the JIT. Read more about Performance.

Note: Cyber's JIT was only recently started, so there aren't as many benchmarks for it. Cyber's memory footprint is slightly higher than Lua and other small embedded languages because its CLI includes an HTTP lib and libtcc for FFI. The embedded version of Cyber takes up less memory.

Showing script time (orange), load time (gray), and peak memory usage. Load times <4ms are not labeled. Final metrics were computed after measuring each script body several times on a MacBook Pro M2. Details are documented in bench.cy. Versions used in benchmarks: Node 21.2.0, quickjs 2021-03-27, Python 3.12.0, luajit 2.1.17, lua 5.4.6, luau 0.604, ruby 3.2.2, wren 0.4, Oracle Java 21.0.1, wasm3 0.5.0.
Fibers Start/Resume (VM) source
This tests spawning fibers and context switching.
Language    Script time   Load time   Peak memory
cyber       8ms           -           27.8 MB
wren        15ms          8ms         28.1 MB
luajit      20ms          5ms         52.1 MB
luau        39ms          15ms        124.0 MB
quickjs     53ms          9ms         29.3 MB
lua         48ms          14ms        110.7 MB
node        22ms          41ms        65.2 MB
python3     34ms          44ms        28.8 MB
Recursive Fibonacci (VM) source
This tests how fast function calls are with a growing call stack.
Language    Script time   Load time   Peak memory
cyber       19ms          -           2.9 MB
luajit      21ms          -           1.4 MB
wasm3       31ms          -           1.4 MB
luau        34ms          -           2.1 MB
lua         39ms          -           1.3 MB
quickjs     57ms          -           1.9 MB
wren        71ms          -           1.5 MB
java        44ms          31ms        35.2 MB
python3     70ms          15ms        10.2 MB
ruby        54ms          31ms        29.6 MB
node        56ms          39ms        31.9 MB
Recursive Fibonacci (JIT) source
Language    Script time   Load time   Peak memory
cyber       5ms           -           2.9 MB
luajit      5ms           -           1.6 MB
luau        20ms          -           2.3 MB
java        3ms           31ms        38.3 MB
node        6ms           35ms        34.0 MB
ruby-yjit   13ms          30ms        30.1 MB
For Range/Iterator (VM) source
This tests basic iterations with counters and also iterable objects.
Language    Script time   Load time   Peak memory
luajit      11ms          -           9.9 MB
luau        10ms          -           27.3 MB
cyber       12ms          -           20.3 MB
lua         27ms          -           26.6 MB
wren        44ms          -           9.5 MB
ruby        50ms          31ms        45.2 MB
quickjs     78ms          -           25.0 MB
node        57ms          43ms        96.6 MB
python3     87ms          23ms        57.9 MB
Max-heap Insert/Pop (VM) source
The max-heap was implemented using nodes instead of an array to test operations on objects.
Language    Script time   Load time   Peak memory
cyber       40ms          -           4.9 MB
luajit      52ms          -           4.9 MB
luau        66ms          -           6.0 MB
java        43ms          30ms        36.0 MB
python3     69ms          15ms        13.3 MB
lua         82ms          -           5.7 MB
node        63ms          40ms        34.9 MB
quickjs     115ms         -           4.8 MB
wren        123ms         -           2.8 MB

Performance

Cyber was designed with performance in mind from the start. The language, type system, memory management, VM, and JIT were carefully considered to enable optimizations.

Crafty register VM.

Cyber's VM is register-based, so most bytecode instructions have a destination operand. This reduces the number of CPU cycles and memory accesses compared to a stack-based VM. Having registers as operands enables allocation strategies that reduce the number of instructions the VM has to run.

Unlike physical registers, virtual registers are reserved as stack slots. This allows fibers to be swapped in and out by just replacing the current stack pointer.
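To make the idea concrete, here is a minimal sketch of a register-style interpreter loop in C. The instruction set, the `Inst` layout, and the names `run` and `eval_demo` are hypothetical and chosen for illustration; they are not Cyber's actual bytecode. Dispatch uses a plain switch to keep the sketch short.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical three-operand instruction set, for illustration only. */
enum { OP_ADD, OP_MUL, OP_RET };

typedef struct {
    uint8_t op;
    uint8_t dst; /* destination register */
    uint8_t a;   /* first source register */
    uint8_t b;   /* second source register */
} Inst;

/* Virtual registers are just slots relative to the frame pointer `fp`. */
int64_t run(const Inst *code, int64_t *fp) {
    for (const Inst *pc = code;; pc++) {
        switch (pc->op) {
        case OP_ADD: fp[pc->dst] = fp[pc->a] + fp[pc->b]; break;
        case OP_MUL: fp[pc->dst] = fp[pc->a] * fp[pc->b]; break;
        case OP_RET: return fp[pc->dst];
        default: assert(0 && "unknown opcode"); return 0;
        }
    }
}

/* Computes (a + b) * a in three instructions. A stack machine would
   typically need six: load a, load b, add, load a, mul, ret. */
int64_t eval_demo(int64_t a, int64_t b) {
    static const Inst code[] = {
        {OP_ADD, 2, 0, 1}, /* r2 = r0 + r1 */
        {OP_MUL, 2, 2, 0}, /* r2 = r2 * r0 */
        {OP_RET, 2, 0, 0}, /* return r2 */
    };
    int64_t stack[8] = {0};
    int64_t *fp = &stack[4]; /* a frame can start at any stack slot */
    fp[0] = a;
    fp[1] = b;
    return run(code, fp);
}
```

Because every register access goes through `fp`, running the same code against a different stack only requires passing a different pointer, which is the property the fiber swap described above relies on.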

Efficient call convention.

Call arguments are assigned unique stack slots, which reduces copy instructions. The compiler also arranges the return slot to feed directly as an argument to a parent function call. This makes composing functions fast, which suits declarative programming paradigms. The call stack is also repurposed for storing call frames, which increases cache locality.

In many dynamic languages, functions and fields are looked up in a hash map. In Cyber, they are indexed in an array by a symbol ID, which is faster.

Inline caching.

Cyber optimizes instructions by patching bytecode at runtime. Object operations tend to involve more lookups and checks since values can have a dynamic type. Caching the lookup results in the bytecode lets the instruction run faster the next time it executes. In the rare case of a cache miss, the deoptimized version uses an MRU table for object types and symbols.
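A rough sketch of the caching idea in C follows. All types and names here (`TypeInfo`, `Object`, `GetFieldInst`, `get_field`) are hypothetical, and the slow path is a simple linear scan used as a stand-in; it does not reproduce the MRU table mentioned above.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_FIELDS 4

typedef struct {
    uint32_t type_id;                /* must be nonzero; 0 marks an empty cache */
    uint32_t num_fields;
    uint32_t field_syms[MAX_FIELDS]; /* symbol ID of each field, by slot */
} TypeInfo;

typedef struct {
    const TypeInfo *type;
    int64_t fields[MAX_FIELDS];
} Object;

/* A "get field" instruction with cache space embedded in it. */
typedef struct {
    uint32_t sym;          /* symbol ID of the field being read */
    uint32_t cached_type;  /* type seen on the last slow lookup */
    uint32_t cached_index; /* field slot resolved for that type */
    uint32_t slow_lookups; /* counter kept only for this demo */
} GetFieldInst;

int64_t get_field(GetFieldInst *inst, const Object *obj) {
    const TypeInfo *t = obj->type;
    if (inst->cached_type == t->type_id) {
        return obj->fields[inst->cached_index]; /* fast path: one compare */
    }
    inst->slow_lookups++;
    for (uint32_t i = 0; i < t->num_fields; i++) {
        if (t->field_syms[i] == inst->sym) {
            inst->cached_type = t->type_id; /* patch the instruction */
            inst->cached_index = i;
            return obj->fields[i];
        }
    }
    assert(0 && "object has no such field");
    return 0;
}

/* Reads the same field 1000 times and reports how many slow lookups ran. */
uint32_t demo_slow_lookups(void) {
    static const TypeInfo point = {1, 2, {100, 101}};
    Object p = {&point, {10, 20}};
    GetFieldInst inst = {101, 0, 0, 0};
    int64_t sum = 0;
    for (int i = 0; i < 1000; i++) sum += get_field(&inst, &p);
    assert(sum == 20000);
    return inst.slow_lookups;
}
```

In this sketch only the first access takes the slow path; the remaining 999 hit the cached type check.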

Compact values.

In Cyber, all values are 8 bytes and use NaN tagging to represent primitive types or heap objects. Having a compact value representation simplifies the data structures used in the VM. It's also easier to align them in memory to improve cache locality.
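The following C sketch shows one common way to do NaN tagging. The source does not describe Cyber's actual bit layout, so the constants and helper names below are assumptions borrowed from a widely used scheme: doubles are stored as-is, and other values live in the payload bits of a quiet NaN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t Value; /* every value is exactly 8 bytes */

/* Assumed layout: exponent bits, the quiet bit, and one extra bit all set
   means "not a double". The sign bit then marks a heap pointer. */
#define QNAN      UINT64_C(0x7ffc000000000000)
#define SIGN_BIT  UINT64_C(0x8000000000000000)
#define VAL_FALSE (QNAN | 2)
#define VAL_TRUE  (QNAN | 3)

static inline Value from_double(double d) {
    Value v;
    memcpy(&v, &d, sizeof v); /* bit copy without aliasing problems */
    return v;
}
static inline double to_double(Value v) {
    double d;
    memcpy(&d, &v, sizeof d);
    return d;
}
static inline bool is_double(Value v) { return (v & QNAN) != QNAN; }

static inline Value from_bool(bool b) { return b ? VAL_TRUE : VAL_FALSE; }
static inline bool is_bool(Value v) { return (v | 1) == VAL_TRUE; }
static inline bool to_bool(Value v) { return v == VAL_TRUE; }

static inline Value from_ptr(void *p) {
    uint64_t bits = (uint64_t)(uintptr_t)p;
    /* Assumes the address fits below the tag bits, as it does for
       user-space pointers on mainstream 64-bit platforms. */
    assert((bits & (SIGN_BIT | QNAN)) == 0);
    return SIGN_BIT | QNAN | bits;
}
static inline bool is_ptr(Value v) {
    return (v & (SIGN_BIT | QNAN)) == (SIGN_BIT | QNAN);
}
static inline void *to_ptr(Value v) {
    return (void *)(uintptr_t)(v & ~(SIGN_BIT | QNAN));
}
```

A type check is a mask and a compare, and numbers need no unboxing at all, which is part of why this representation keeps the interpreter's data structures simple.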

Small objects are allocated from object pools, which is fast since they don't require much bookkeeping. Other heap memory is allocated with mimalloc, which has proven to be fast and reliable.
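As an illustration of why pool allocation needs so little bookkeeping, here is a minimal fixed-size pool with an intrusive free list in C. The slot size, capacity, and function names are hypothetical; this is a generic sketch of the technique, not Cyber's allocator.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SLOTS 1024
#define SLOT_SIZE  40 /* hypothetical small-object size in bytes */

/* A free slot stores the link to the next free slot inside itself,
   so the pool needs no side tables. */
typedef union Slot {
    union Slot *next_free;
    unsigned char payload[SLOT_SIZE];
} Slot;

typedef struct {
    Slot slots[POOL_SLOTS];
    Slot *free_list;
} Pool;

void pool_init(Pool *p) {
    for (size_t i = 0; i + 1 < POOL_SLOTS; i++) {
        p->slots[i].next_free = &p->slots[i + 1];
    }
    p->slots[POOL_SLOTS - 1].next_free = NULL;
    p->free_list = &p->slots[0];
}

/* Allocation pops the head of the free list. */
void *pool_alloc(Pool *p) {
    Slot *s = p->free_list;
    if (s == NULL) return NULL; /* exhausted; a real VM would grow or fall back */
    p->free_list = s->next_free;
    return s;
}

/* Freeing pushes the slot back onto the free list. */
void pool_free(Pool *p, void *ptr) {
    assert(ptr != NULL);
    Slot *s = ptr;
    s->next_free = p->free_list;
    p->free_list = s;
}
```

Both operations are a couple of pointer moves with no size lookup or searching, and a freed slot is the first one handed out again, so it is likely still warm in cache.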

Fast dispatch.

Computed gotos allow the next bytecode to be dispatched using a jump table directly from the end of each bytecode segment. This keeps dispatch work to a minimum and leverages the CPU's branch prediction.

In contrast, a switch statement performs additional bounds checks and funnels each bytecode segment to the same place, which is unfavorable for branch prediction.

Precompiled JIT.

Cyber's JIT is implemented using precompiled stencils from LLVM. With only a small assembler, the stencils for each bytecode instruction can be stitched together at runtime to generate performant machine code while still remaining fast to compile. More details can be found in the v0.3 Release Notes.

Compiled using Zig/LLVM.

Cyber itself was written in Zig, a systems programming language that makes writing performant software easier. Zig leverages modern compiler features from LLVM, which produces fast machine instructions for most targets.