Cyber Performance

Benchmarks

Cyber is fast and efficient with memory. Here are some benchmarks against similar languages. Each benchmark compares either the VM or the JIT. Read more about Performance.

Note: Cyber's JIT was only recently started, so there aren't as many benchmarks for it. Cyber's memory footprint is slightly higher than that of Lua and other small embedded languages because its CLI includes an HTTP lib and libtcc for FFI. The embedded version of Cyber takes up less memory.

Showing script time (orange), load time (gray), and peak memory usage. Load times <4ms are not labeled. Final metrics were computed after measuring each script body several times on a MacBook Pro M2. Details are documented in bench.cy. Versions used in benchmarks: Node 21.2.0, quickjs 2021-03-27, Python 3.12.0, luajit 2.1.17, lua 5.4.6, luau 0.604, ruby 3.2.2, wren 0.4, Oracle Java 21.0.1, wasm3 0.5.0.

Fibers Start/Resume (VM) source

This tests spawning fibers and context switching.

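cyber: 8ms script, 27.8 MB peak memory

wren: 15ms script, 8ms load, 28.1 MB peak memory

luajit: 20ms script, 5ms load, 52.1 MB peak memory

luau: 39ms script, 15ms load, 124.0 MB peak memory

quickjs: 53ms script, 9ms load, 29.3 MB peak memory

lua: 48ms script, 14ms load, 110.7 MB peak memory

node: 22ms script, 41ms load, 65.2 MB peak memory

python3: 34ms script, 44ms load, 28.8 MB peak memory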

Recursive Fibonacci (VM) source

This tests how fast function calls are with a growing call stack.

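cyber: 19ms script, 2.9 MB peak memory

luajit: 21ms script, 1.4 MB peak memory

wasm3: 31ms script, 1.4 MB peak memory

luau: 34ms script, 2.1 MB peak memory

lua: 39ms script, 1.3 MB peak memory

quickjs: 57ms script, 1.9 MB peak memory

wren: 71ms script, 1.5 MB peak memory

java: 44ms script, 31ms load, 35.2 MB peak memory

python3: 70ms script, 15ms load, 10.2 MB peak memory

ruby: 54ms script, 31ms load, 29.6 MB peak memory

node: 56ms script, 39ms load, 31.9 MB peak memory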

Recursive Fibonacci (JIT) source

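cyber: 5ms script, 2.9 MB peak memory

luajit: 5ms script, 1.6 MB peak memory

luau: 20ms script, 2.3 MB peak memory

java: 3ms script, 31ms load, 38.3 MB peak memory

node: 6ms script, 35ms load, 34.0 MB peak memory

ruby-yjit: 13ms script, 30ms load, 30.1 MB peak memory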

For Range/Iterator (VM) source

This tests basic iterations with counters and also iterable objects.

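luajit: 11ms script, 9.9 MB peak memory

luau: 10ms script, 27.3 MB peak memory

cyber: 12ms script, 20.3 MB peak memory

lua: 27ms script, 26.6 MB peak memory

wren: 44ms script, 9.5 MB peak memory

ruby: 50ms script, 31ms load, 45.2 MB peak memory

quickjs: 78ms script, 25.0 MB peak memory

node: 57ms script, 43ms load, 96.6 MB peak memory

python3: 87ms script, 23ms load, 57.9 MB peak memory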

Max-heap Insert/Pop (VM) source

The max-heap was implemented using nodes instead of an array to test operations on objects.

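cyber: 40ms script, 4.9 MB peak memory

luajit: 52ms script, 4.9 MB peak memory

luau: 66ms script, 6.0 MB peak memory

java: 43ms script, 30ms load, 36.0 MB peak memory

python3: 69ms script, 15ms load, 13.3 MB peak memory

lua: 82ms script, 5.7 MB peak memory

node: 63ms script, 40ms load, 34.9 MB peak memory

quickjs: 115ms script, 4.8 MB peak memory

wren: 123ms script, 2.8 MB peak memory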

Performance

Cyber was designed with performance in mind from the start. The language, type system, memory management, VM, and JIT were carefully considered to enable optimizations.

Crafty register VM.

Cyber's VM is register-based, so most bytecode instructions have a destination operand. This reduces the number of CPU cycles and memory accesses compared to a stack-based VM. Having registers as operands enables allocation strategies that reduce the number of instructions the VM has to run.

Unlike physical registers, virtual registers are reserved as stack slots. This allows fibers to be swapped in and out by just replacing the current stack pointer.
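As a rough illustration of the instruction-count difference, the sketch below evaluates a + b * c on a toy stack VM and a toy register VM. It is not Cyber's bytecode: the opcode names, the tuple encoding, and the helper names run_stack_vm and run_register_vm are made up for this example. Registers are modeled as slots of one list, loosely mirroring the idea that virtual registers are stack slots.

```python
# Toy VMs for illustration only. Opcode names and the tuple encoding are
# hypothetical and do not reflect Cyber's real bytecode.

def run_stack_vm(code, local_vars):
    """Stack VM: operands move through an implicit operand stack."""
    stack = []
    executed = 0
    for op, *args in code:
        executed += 1
        if op == "load":
            # Copy a local onto the operand stack.
            stack.append(local_vars[args[0]])
        elif op == "add":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs + rhs)
        elif op == "mul":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs * rhs)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop(), executed


def run_register_vm(code, slots, result_slot):
    """Register VM: every instruction names its destination and source slots."""
    executed = 0
    for op, dst, lhs, rhs in code:
        executed += 1
        if op == "add":
            slots[dst] = slots[lhs] + slots[rhs]
        elif op == "mul":
            slots[dst] = slots[lhs] * slots[rhs]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return slots[result_slot], executed


# Evaluate a + b * c with a=2, b=3, c=4.
stack_code = [("load", 0), ("load", 1), ("load", 2), ("mul",), ("add",)]
stack_result = run_stack_vm(stack_code, [2, 3, 4])

# Slots 0..2 hold a, b, c; slot 3 is a temporary that also receives the result.
register_code = [("mul", 3, 1, 2), ("add", 3, 0, 3)]
register_result = run_register_vm(register_code, [2, 3, 4, None], result_slot=3)

print(stack_result, register_result)
```

Both VMs compute 14, but the stack version executes five instructions and the register version two, because the register instructions read their operands in place instead of first copying them onto an operand stack.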

Efficient call convention.

Call arguments are assigned unique stack slots, which reduces copy instructions. The compiler also arranges the return slot to feed directly as an argument to a parent function call. This makes composing functions fast, which suits declarative programming paradigms. The call stack is also repurposed for storing call frames, which increases cache locality.

In many dynamic languages, functions and fields are looked up in a hash map. In Cyber, they are indexed in an array by a symbol ID, which is faster.
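The idea of indexing by symbol ID can be sketched as follows. This is a minimal illustration, not Cyber's implementation: SymbolTable, define, and call are hypothetical names, and the sketch assumes names are interned to dense integer IDs once, ahead of time, so that call sites only carry the integer.

```python
# Hypothetical sketch: names are resolved to integer IDs once ("compile time"),
# and the runtime lookup is a plain array index.

class SymbolTable:
    """Interns names to dense integer IDs."""

    def __init__(self):
        self._ids = {}

    def intern(self, name):
        # A new name receives the next unused ID; a known name keeps its ID.
        return self._ids.setdefault(name, len(self._ids))


symbols = SymbolTable()
funcs = []  # funcs[symbol_id] -> callable


def define(name, fn):
    sym = symbols.intern(name)
    while len(funcs) <= sym:
        funcs.append(None)
    funcs[sym] = fn
    return sym


SQUARE = define("square", lambda x: x * x)
NEGATE = define("negate", lambda x: -x)


def call(sym, arg):
    # No hashing or string comparison at runtime, only an index.
    return funcs[sym](arg)


print(call(SQUARE, 7), call(NEGATE, 3))
```

In Python itself the speed difference between a list index and a dict lookup is small; the sketch only shows the shape of the technique, where the hash lookup happens once per name instead of once per call.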

Inline caching.

Cyber optimizes instructions by patching bytecode at runtime. Object operations tend to involve more lookups and checks since values can have a dynamic type. By caching the lookup results in the bytecode, the instruction will run faster. In the rare case of a cache miss, the deoptimized version uses an MRU table for object types and symbols.
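A minimal sketch of an inline cache for a field read is shown below. The classes ObjType, Obj, and GetField are hypothetical, and the slow path here is a plain dictionary lookup rather than the MRU table mentioned above; the point is only that the instruction stores the last seen type and field offset in its own operands.

```python
# Hypothetical sketch of inline caching. The instruction object stands in for
# a bytecode instruction whose operands are patched at runtime.

class ObjType:
    def __init__(self, name, fields):
        self.name = name
        # Slow-path lookup table: field name -> slot offset.
        self.offsets = {field: i for i, field in enumerate(fields)}


class Obj:
    def __init__(self, type_, values):
        self.type = type_
        self.values = list(values)


class GetField:
    """A 'get field' instruction with a one-entry inline cache."""

    def __init__(self, field):
        self.field = field
        self.cached_type = None   # patched after the first lookup
        self.cached_offset = -1
        self.hits = 0
        self.misses = 0

    def execute(self, obj):
        if obj.type is self.cached_type:
            # Fast path: one identity check, then a direct slot read.
            self.hits += 1
            return obj.values[self.cached_offset]
        # Slow path: full lookup, then patch the cache for next time.
        self.misses += 1
        offset = obj.type.offsets[self.field]
        self.cached_type = obj.type
        self.cached_offset = offset
        return obj.values[offset]


point = ObjType("Point", ["x", "y"])
vec3 = ObjType("Vec3", ["z", "y", "x"])

instr = GetField("x")
results = [instr.execute(Obj(point, [i, 0])) for i in range(3)]
other = instr.execute(Obj(vec3, [7, 8, 9]))  # different type: cache miss

print(results, other, instr.hits, instr.misses)
```

The first read of a Point misses and fills the cache, the next two hit, and the Vec3 read misses again because its type differs from the cached one.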

Compact values.

In Cyber, all values are 8 bytes and use NaN tagging to represent primitive types or heap objects. Having a compact value representation simplifies the data structures used in the VM. It's also easier to align them in memory to improve cache locality.
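NaN tagging can be sketched as follows. The bit layout here is an assumption chosen for illustration (a quiet-NaN prefix, a 2-bit tag in bits 48 to 49, and a 48-bit payload) and is not taken from Cyber's source. The helper names are likewise hypothetical. Doubles are stored as their raw bits; every other value hides inside the unused payload of a NaN.

```python
import struct

# Assumed layout for illustration only, not Cyber's actual encoding.
QNAN = 0x7FFC_0000_0000_0000        # exponent all ones plus the top two mantissa bits
TAG_SHIFT = 48
TAG_NONE, TAG_BOOL, TAG_INT, TAG_PTR = 0, 1, 2, 3
PAYLOAD_MASK = (1 << 48) - 1


def box_float(x):
    # Reinterpret the 8 bytes of a double as an unsigned 64-bit integer.
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def _box(tag, payload):
    return QNAN | (tag << TAG_SHIFT) | (payload & PAYLOAD_MASK)


def box_none():
    return _box(TAG_NONE, 0)


def box_bool(b):
    return _box(TAG_BOOL, 1 if b else 0)


def box_int(i):
    # Store a 32-bit two's complement integer in the payload.
    return _box(TAG_INT, i & 0xFFFF_FFFF)


def box_ptr(addr):
    # Assumes addresses fit in 48 bits.
    return _box(TAG_PTR, addr)


def is_float(bits):
    # Anything that does not carry the full QNAN prefix is an ordinary double.
    return (bits & QNAN) != QNAN


def unbox(bits):
    if is_float(bits):
        return struct.unpack("<d", struct.pack("<Q", bits))[0]
    tag = (bits >> TAG_SHIFT) & 0x3
    payload = bits & PAYLOAD_MASK
    if tag == TAG_NONE:
        return None
    if tag == TAG_BOOL:
        return payload == 1
    if tag == TAG_INT:
        # Sign-extend the 32-bit payload.
        return payload - (1 << 32) if payload & 0x8000_0000 else payload
    return ("ptr", payload)


values = [box_float(1.5), box_int(-5), box_bool(True), box_ptr(0x1234), box_none()]
print([unbox(v) for v in values])
```

Every boxed value fits in one 64-bit word, so arrays of values are uniform and a type check is a mask and compare. A real VM would also need to canonicalize NaNs produced by arithmetic so they cannot collide with tagged values; this sketch does not do that.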

Small objects are allocated from object pools, which is fast because pools require little bookkeeping. Heap memory is allocated with mimalloc, which has proven to be fast and reliable.
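To show why pool allocation needs so little bookkeeping, here is a small free-list pool sketch. ObjectPool is a hypothetical class that uses list indices in place of pointers; it is not modeled on Cyber's allocator or on mimalloc. Allocation and release each touch only the head of the free list.

```python
# Hypothetical fixed-size object pool with a free list. Indices stand in for
# pointers; a real VM would manage raw memory.

class ObjectPool:
    def __init__(self, capacity):
        self.slots = [None] * capacity
        # next_free[i] is the index of the next free slot after i; -1 ends the list.
        self.next_free = [i + 1 for i in range(capacity)]
        if capacity:
            self.next_free[-1] = -1
        self.head = 0 if capacity else -1

    def alloc(self, value):
        if self.head == -1:
            raise MemoryError("pool exhausted")
        idx = self.head
        self.head = self.next_free[idx]   # pop the free list: O(1)
        self.slots[idx] = value
        return idx

    def free(self, idx):
        self.slots[idx] = None
        self.next_free[idx] = self.head   # push onto the free list: O(1)
        self.head = idx


pool = ObjectPool(2)
a = pool.alloc("a")
b = pool.alloc("b")
pool.free(a)
c = pool.alloc("c")  # reuses the slot that "a" released

print(a, b, c)
```

Because every slot has the same size, there is no searching for a fitting block and no splitting or coalescing, which is the bookkeeping a general-purpose allocator has to do.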

Fast dispatch.

Computed gotos allow the next bytecode to be dispatched using a jump table directly from the end of each bytecode segment. This does the least amount of work and leverages the CPU's branch prediction.

In contrast, a switch statement performs additional bounds checks and funnels each bytecode segment to the same place, which is unfavorable for branch prediction.

Precompiled JIT.

Cyber's JIT is implemented using precompiled stencils from LLVM. With only a small assembler, each bytecode instruction can be stitched together at runtime to generate performant machine code while still remaining fast to compile. More details can be found in the v0.3 Release Notes.

Compiled using Zig/LLVM.

Cyber itself was written in Zig, a systems programming language that makes writing performant software easier. Zig leverages modern compiler features from LLVM, which produces fast machine instructions for most targets.