CS2D: JIT-powered parallelism

If you'd like to skip right to the code, scroll down to the "Recipe" section.

In 2020, I thought it would be nice to revisit the game that shaped at least the first half of the previous decade of my life. It's a little crazy to realise how far it's come since the early 2010s: full 1.0 release, published on Steam, and a whole slew of Lua scripting tools added (I even worked on some in 2017-2018, so I'm especially fond of them). A significant part of the community has graduated from CS2D onto other games and, indeed, other work, as with yours truly – but there is still fresh blood coming in, and older members of the community making new things on a whole new level.

One such tool, recently publicly released, is CS2D JIT by MikuAuahDark, a genius tool that replaces CS2D's built-in BlitzMax Lua 5.1 engine with LuaJIT. This increases Lua performance massively, and adds working support for external C modules and FFI.

External code support, whether in the shape of C modules or FFI, makes it possible to create a whole new class of scripts; however, there is one big problem. Due to the limitations of BlitzMax and CS2D's aged codebase, both the dedicated server and client are compiled without multithreading support. This means that not only does CS2D itself run in a single thread, but any attempts to create native threads (e.g. pthread) from within CS2D fail. This makes many tasks, like listening to messages in a queue or network, impossible to perform within a CS2D Lua script – busy-waiting would lock up the game and make it impossible for players to actually play.

It's possible to have companion programs running alongside your server and communicating with it via file I/O, or using some sort of orchestrator that CS2D can poll using reqhttp/httpdata. But how far can we go within CS2D itself, not using any external programs or resources?

There is a fairly simple answer: with LuaJIT's FFI support, it becomes trivial to fork the server process and perform busy-waiting async tasks there. The question then is, how do we have the parent and child processes communicate? Probably the simplest (and most portable) option is using file I/O. Libraries like luafilesystem can make working with files even easier, and using lockfiles will prevent race conditions. However, in my experiments, I wanted to go further and use as few external resources as possible.
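For comparison, the file-based route can be sketched in plain Lua. This is a minimal sketch, not what I ended up using. Instead of a lockfile it relies on rename(2) being atomic on POSIX systems when source and target are on the same filesystem. The helper names publish and consume are hypothetical.

```lua
-- Hypothetical helpers, not part of any library.
-- Assumption: path and its ".tmp"/".reading" siblings live on the same
-- POSIX filesystem, so os.rename maps to an atomic rename(2).

-- Producer side (e.g. the child process): write the whole message to a
-- temporary file, then rename it into place so a reader never sees a
-- half-written file.
local function publish(path, data)
    local tmp = path .. ".tmp"
    local f = assert(io.open(tmp, "wb"))
    f:write(data)
    f:close()
    assert(os.rename(tmp, path))
end

-- Consumer side (e.g. called from a CS2D hook): claim the file by renaming
-- it first, so the producer can publish a new one while we read.
-- Returns nil when no message is waiting.
local function consume(path)
    local claimed = path .. ".reading"
    if not os.rename(path, claimed) then
        return nil -- nothing published since the last consume
    end
    local f = assert(io.open(claimed, "rb"))
    local data = f:read("*a")
    f:close()
    os.remove(claimed)
    return data
end
```

Like the shared memory version later on, this keeps only the most recent message if the producer publishes twice before the consumer runs, because the second rename replaces the first file.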

The setup I used for my experiments was a Linux host running my CS2D JIT Docker image, which meant I didn't have to make my solution cross-platform. This led me to another fairly simple option: shared memory via the mmap(2) system call. I envisioned a simple reusable library: create a shared memory buffer, use it in parent and child processes for reading and writing data. Simple and volatile, no need to create and clean up files.

I spent a while trying to get it to work using LuaJIT's FFI, but sadly kept running into segfaults. I am not particularly skilled in low-level programming, so I may have been doing something wrong, but eventually I gave up on trying to use FFI for it and built a simple C binding. I have published this binding as lua-shmem and made it available in LuaRocks under the same name.

Once I had my shared memory binding, I could finally make the example script I had in mind: a very simple UDP server running in a child process, which prints inbound messages directly to the in-game chat.


Recipe

The easiest way to reproduce this recipe is using a configured engin33r/cs2djit container. I used the following libraries available from LuaRocks:

  • luasocket (UDP server)
  • lua-protobuf (formatting interprocess messaging)
  • lua-shmem (shared memory)

If Docker is not an option, make sure you have a Linux environment (I tested on Debian 10) fully configured for running CS2D. This involves installing 32-bit libraries for CS2D (gcc-multilib is the easiest option), building CS2D JIT, and installing and configuring LuaRocks to compile 32-bit libraries. This setup process will likely be the subject of a separate blog post.

While not strictly necessary, I made a reusable component for spawning child processes for this project. I may upload it to the file archive if there is enough demand. For now, you're free to use the code below:

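local ffi = require("ffi")

ffi.cdef([[
typedef int32_t pid_t;
pid_t fork(void);
void _exit(int status);
]])

-- Run func in a forked child process; returns the child's PID in the parent
local function spawn(func)
    local pid = ffi.C.fork()
    if pid < 0 then
        error("fork failed, errno " .. ffi.errno())
    elseif pid == 0 then
        -- Child: run the task, then exit immediately. pcall stops an error in
        -- func from unwinding back into the cloned CS2D server, and _exit skips
        -- the atexit handlers and stdio buffers inherited from the parent.
        local ok, err = pcall(func)
        if not ok then
            io.stderr:write("spawned task failed: ", tostring(err), "\n")
        end
        ffi.C._exit(ok and 0 or 1)
    end
    return pid
end

return spawn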

Saving this as sys/lua/spawn.lua allows you to use it as a library via require. This will be our final requirement.
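If require("spawn") can't find the module on your setup, the Lua search path probably doesn't include sys/lua/. A minimal fix, assuming CS2D is started from its installation directory so the relative path resolves, is to prepend that directory to package.path once, before the first require:

```lua
-- Assumption: the working directory is the CS2D installation directory.
-- Add sys/lua/?.lua to the front of the module search path, but only if
-- it is not already present, so reloading the script doesn't stack copies.
local entry = "sys/lua/?.lua"
local padded = ";" .. package.path .. ";"
if not padded:find(";" .. entry .. ";", 1, true) then
    package.path = entry .. ";" .. package.path
end
```

After this runs, require("spawn") also looks for sys/lua/spawn.lua.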

Setup

Let's start by importing all of our libraries.

local socket = require("socket")
local pb = require("pb")
local protoc = require("protoc")
local shmem = require("shmem")
local spawn = require("spawn")

We'll set up a shared memory buffer, and define a protobuf message type that describes a "chat message".

local shared = shmem.new(4096) -- 4 kilobytes should be enough
local proto = protoc.new()
-- Load type definitions into pb
proto:load([[
message Chat {
    required string name = 1;
    required string body = 2;
}
]])
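To get a feel for what pb.encode writes into the buffer, here is a hand-rolled encoder for the Chat message using the protobuf wire format. Each string field is a key ((field number << 3) | wire type 2), a varint length, then the raw bytes. This is only an illustration with hypothetical function names. It isn't lua-protobuf's implementation, and the library's output isn't guaranteed to match it byte for byte.

```lua
-- Encode a non-negative integer as a protobuf base-128 varint:
-- 7 bits per byte, least significant group first, high bit set on
-- every byte except the last.
local function varint(n)
    local bytes = {}
    repeat
        local b = n % 128
        n = math.floor(n / 128)
        if n > 0 then
            b = b + 128 -- more bytes follow
        end
        bytes[#bytes + 1] = string.char(b)
    until n == 0
    return table.concat(bytes)
end

-- Length-delimited field (wire type 2): key varint, length varint, raw bytes.
local function lenField(fieldNumber, s)
    return varint(fieldNumber * 8 + 2) .. varint(#s) .. s
end

-- Hypothetical equivalent of encoding Chat { name = 1; body = 2; }.
local function encodeChat(name, body)
    return lenField(1, name) .. lenField(2, body)
end
```

For the 4096-byte buffer above, this means the encoded message is at most six bytes longer than name and body combined (one key byte plus one or two length bytes per field, for fields shorter than 16 KB).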

Now let's set up the UDP server. Our spawn module takes a plain function to run as the body of the child process, so we'll write the whole server as a local function. Each Chat message uses the sender's IP:port as its name and the packet contents as its body, and the binary-encoded protobuf message is written into shared memory.

local function udpServe()
    -- Listen on all addresses, port 12345, never time out
    local udp = socket.udp()
    udp:setsockname("*", 12345)
    udp:settimeout()
    while true do
        local data, ip, port = udp:receivefrom()
        shared:write(pb.encode("Chat", {
            name = tostring(ip) .. ":" .. tostring(port),
            body = tostring(data)
        }))
    end
end
spawn(udpServe) -- Start the UDP server in a child process as soon as this script is loaded

Finally, on the parent process's side, we'll set up the code that reads these messages and prints them to the in-game chat. Since our UDP server "publishes" messages to our shared memory "message queue", we need a "consumer". Busy-waiting would lock up CS2D, so instead we poll from the ms100 hook, which CS2D calls every 100 milliseconds.

addhook("ms100", "__test_queueconsume")
function __test_queueconsume()
    local mesg = shared:read()
    if (#mesg > 0) then -- Only try to decode if there is a message waiting
        local chat = pb.decode("Chat", mesg)
        msg(chat.name .. ": " .. chat.body)
        shared:clear() -- We have consumed this message, remove it from shared memory
    end
end
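The payload comes straight off the network, so it's worth cleaning it up before it reaches msg. For instance, the echo command used below appends a newline, so the body that actually arrives is "hello" followed by "\n". Here is a minimal sketch of a hypothetical sanitize helper, with an arbitrary length cap:

```lua
-- Hypothetical helper: clean up an untrusted UDP payload before it is
-- shown in chat. Removes control characters (including the trailing
-- newline that echo appends) and caps the length.
local MAX_CHAT_LEN = 120 -- arbitrary limit, chosen for illustration

local function sanitize(text)
    text = text:gsub("%c", "") -- strip \n, \r, \t, \0 and other controls
    if #text > MAX_CHAT_LEN then
        text = text:sub(1, MAX_CHAT_LEN)
    end
    return text
end
```

In the hook, the msg line would then become msg(sanitize(chat.name .. ": " .. chat.body)). The cap counts bytes, so it can cut a multi-byte UTF-8 character in half.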

Once this script is running on our server, we can test it out using a simple bash command in the CS2D JIT Docker container to send a message:

$ docker exec -it cs2djit bash -c "echo 'hello' > /dev/udp/127.0.0.1/12345"

On the server side, we'll see something like this in the chat (the port will be different):

127.0.0.1:33927: hello

Success!


Closing thoughts

  • The "message queue" implementation above is absolutely trivial. If several packets arrive between two runs of the ms100 hook, only the latest will be parsed, and a packet that arrives between the read and clear calls is wiped before it is ever read. You may be able to use the always hook, which runs every frame (roughly every 16.7 ms at 60 FPS), but the same caveats apply. Some sort of FIFO mechanism with locking for the shared memory would get around this; I opted to skip that to keep it simple.
  • Protobuf is overkill for this particular use case. However, protobuf is a good solution for cross-platform communication that doesn't use too much traffic thanks to its binary encoding. You can build upon this scaffold to implement a proper communications channel using your own standardised protobuf message types.
  • The child process is essentially a clone of the CS2D dedicated server at the moment of forking, with a whole parallel game state of its own; this is why we can't call msg from the child process directly. I haven't tested for inconsistencies caused by this parallel game state, but I wouldn't expect any significant problems.
  • EDIT: Thanks to MikuAuahDark for pointing out the mistakes in my shmem FFI implementation. The lua-shmem package is still available, but you can use the FFI version for better performance and one fewer binding to compile.
  • Feedback always welcome!
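The framing half of such a FIFO can be sketched in plain Lua. Store each message with a 2-byte length prefix so several can share one buffer, and have the consumer drain every complete frame on each poll. The helper names are hypothetical, and this sketch deliberately leaves out the locking: the child would still need some way (a lock, or separate read and write offsets) to avoid appending while the parent is draining.

```lua
-- Length-prefixed framing: each message is stored as a 2-byte big-endian
-- length followed by the payload, so several messages can share one buffer.

-- Append one message to an existing buffer string; returns the new buffer.
local function pushFrame(buffer, payload)
    local n = #payload
    assert(n <= 65535, "payload too large for a 2-byte length prefix")
    return buffer .. string.char(math.floor(n / 256), n % 256) .. payload
end

-- Split a buffer into complete messages; returns the list of payloads and
-- any trailing bytes that do not yet form a complete frame.
local function drainFrames(buffer)
    local frames, pos = {}, 1
    while pos + 1 <= #buffer do
        local hi, lo = buffer:byte(pos, pos + 1)
        local n = hi * 256 + lo
        if pos + 1 + n > #buffer then
            break -- incomplete frame: leave it for the next poll
        end
        frames[#frames + 1] = buffer:sub(pos + 2, pos + 1 + n)
        pos = pos + 2 + n
    end
    return frames, buffer:sub(pos)
end
```

In this design the child would append with pushFrame and write the result back, and the ms100 hook would print every frame and write back whatever drainFrames returns as leftover bytes.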

Full code

local socket = require("socket")
local pb = require("pb")
local protoc = require("protoc")
local shmem = require("shmem")
local spawn = require("spawn")

local shared = shmem.new(4096) -- 4 kilobytes should be enough
local proto = protoc.new()
-- Load type definitions into pb
proto:load([[
message Chat {
    required string name = 1;
    required string body = 2;
}
]])

local function udpServe()
    -- Listen on all addresses, port 12345, never time out
    local udp = socket.udp()
    udp:setsockname("*", 12345)
    udp:settimeout()
    while true do
        local data, ip, port = udp:receivefrom()
        shared:write(pb.encode("Chat", {
            name = tostring(ip) .. ":" .. tostring(port),
            body = tostring(data)
        }))
    end
end
spawn(udpServe) -- Start the UDP server in a child process as soon as this script is loaded

addhook("ms100", "__test_queueconsume")
function __test_queueconsume()
    local mesg = shared:read()
    if (#mesg > 0) then -- Only try to decode if there is a message waiting
        local chat = pb.decode("Chat", mesg)
        msg(chat.name .. ": " .. chat.body)
        shared:clear() -- We have consumed this message, remove it from shared memory
    end
end