Intimacy at scale: Building an architecture for density

06.02.21

By Rob Whitehead, Improbable Co-founder and Chief Product Officer.

If you were lucky enough to be part of our ScavLab event, you’ll have witnessed some thrillingly unusual spectacles, from tumbling masses of players powersliding down treacherous sheets of ice, to an epic battle with 10,000 AI-controlled zombies (aka ‘Thresh’) that rained from the sky.

It was an unprecedented event that seemed to teeter between orchestration and chaos. At its peak, ScavLab hosted 4,144 concurrent players, and at any moment you could see hundreds of avatars – each representing a live, participating human – swarming in massive groups.

But what’s happening behind the scenes in ScavLab, and why did we do it?

Scale and the problem of density

Since Improbable began, we’ve always been about creating large-scale game experiences.

By large-scale, we mean game worlds where the total number of players, AI entities and physical objects is far higher than in typical games. We’ve achieved much larger game worlds than a typical game server could handle, working on projects where over 10,000 players can connect into the same space.

However, the natural question that always comes up when designing games at large scale is about density: what happens when all of your players try to gather at the same point in the world?

Creating dense multiplayer game worlds – those where every player can see and interact with every other player at the same time, especially at the fidelity of action games like PUBG, Fortnite and Apex Legends – is a full-stack problem that can’t be solved with a single technical component.

So any solution needs to address unique requirements that touch pretty much every part of the game tech stack – areas such as:

Networking. How data about the world is sent from the server out to clients – the required data grows quadratically as you scale up (10x players = 100x more data).

Rendering. Drawing huge numbers of animated characters on screen, each with unique animation states and potential customisations, while integrating with existing content pipelines.

Simulation. Building a server architecture that can handle the density of the game logic, AI and physics of a world, while retaining a development experience that’s familiar to designers and gameplay programmers.

Orchestration. Building server infrastructure that can scale and adapt to changing compute requirements.


Strangely, the midst of the Covid-19 pandemic proved a fitting time to start grappling with the difficulties of density. As we all adjusted to social distancing and hunkered down to remote working, it seemed a shame that most virtual worlds shared the same limitation as the real world – it was hard for people to share the same space at the same time.


Building an architecture for density

So in 2020 we assembled a multidisciplinary game prototyping team to explore a high-density social environment – combining our knowledge across networking, rendering, simulation and orchestration.

The result was a UE4-based shared social environment built to handle 10,000 player connections, where every player updates at up to 30Hz and can see every other player in real time.

The architecture behind it was built upon existing technical foundations with new components designed specifically for high-density gameplay: namely high-density networking and large-scale character rendering.

From idea to application

We worked to integrate this new architecture into the Scavengers game world. The result is ScavLab.

ScavLab and its technology allow us to do something we’ve never been able to do before: experiment with massive interactive events that involve the game’s community, running alongside the main Scavengers game.

It’s relatively uncharted territory for game design, and so the things we’re playing with don’t always go so smoothly in these events. But that’s fine – it’s all part of the game. The important part is that we get the chance to tweak our game ideas and follow the fun with our player community.

Creating an intense emotional reaction

ScavLab is the perfect place to play with new game ideas at high scale and density. Bernd Diemer is a Creative Director who worked on designing ScavLab events, and was inspired by how the technology “allowed us to create the emotional magic of a live crowd.”

“Imagine a virtual world that can give you the intense emotional reaction of being in a crowd of people experiencing a large-scale event” said Bernd. “It’s a unique, ephemeral feeling that we described internally as ‘intimacy at scale’.”

For the ScavLab events, Bernd wanted players to “feel the goosebumps you get when you experience something amazing in a crowd of people – that shared feeling of being able to react and participate in that moment.” It’s this feeling – enabled by our architecture for density – that ScavLab is creating and experimenting with in our live events.

A deep dive into density

Earlier, I explained how simulating density requires a full stack solution across four main components: rendering, networking, simulation and orchestration. At a deeper level, what’s going on under the hood in ScavLab to make this possible?

Networking

Networking is primarily the problem of ensuring that every player in the world gets the remote data to see what they need to see – simulated entities and other players – when they need to see it.

As more players are added to a single game world, special techniques are used to prioritise and limit what each player can see to only the area near them – these techniques are often called Interest Management, Net Relevancy or Net Priority.
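To make that concrete, here is a minimal sketch of distance-based interest management in Python (ScavLab itself is built on UE4/C++; the function name and radius here are illustrative, not Improbable’s implementation):

```python
import math

def relevant_entities(player_pos, entities, radius=500.0):
    """Return the IDs of entities within a player's interest radius.

    A minimal distance-based interest-management filter: entities beyond
    `radius` generate no network updates for this player at all.
    """
    px, py = player_pos
    return [eid for eid, (ex, ey) in entities.items()
            if math.hypot(ex - px, ey - py) <= radius]

# A player at the origin only receives updates for the nearby entity.
entities = {"near": (100.0, 0.0), "far": (9000.0, 0.0)}
print(relevant_entities((0.0, 0.0), entities))  # ['near']
```

Real systems layer prioritisation and rate-limiting on top of a simple culling test like this, but the core idea is the same: never send what a client can’t meaningfully see.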

For most game worlds, there is a distance beyond which you stop receiving any updates for players or other entities – this is usually a critical optimisation for building high scale game environments.

Creating a space where every player can see every other player is very much a degenerate case for typical game networking, and requires the problem of raw networking performance to be tackled head-on.

The issue is that going from a 100-player full-density space to a 10,000-player one means 100x the players – and, because of that quadratic growth, 10,000x more updates to be sent from the server.

So even if you need only infrequent updates from most of those players (for example, two updates a second), we’re still talking about hundreds of millions of updates a second that need to be sent from the server.

Using traditional networking approaches, sending data updates for so many objects at once isn’t possible.

Additionally, there are client-side bandwidth constraints to think about. The typical bandwidth consumed by playing a traditional large-scale action game is roughly 100MB per hour (around 222 kilobits per second).

If we were to naively send UE4 FRepMovement (quantized rotation + position) updates for 10,000 players at 2Hz, we’re looking at ~1,920 kilobits per second – almost 10x a traditional game’s bandwidth. Any system working at such a scale needs to perform heavy compression to fit within typical bandwidth consumption.
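Those figures are easy to sanity-check with a back-of-envelope calculation (assuming ~12 bytes per quantized position + rotation update, which is what makes the numbers above come out – the actual FRepMovement payload varies with settings):

```python
# Naive per-client downlink for 10,000 players at 2Hz.
PLAYERS = 10_000
RATE_HZ = 2               # updates per remote player per second
BYTES_PER_UPDATE = 12     # assumed size of a quantized pos + rot update

naive_kbps = PLAYERS * RATE_HZ * BYTES_PER_UPDATE * 8 / 1000
print(f"naive downlink: {naive_kbps:.0f} kilobits/s")  # 1920

# Typical large-scale action game: ~100MB per hour of play.
typical_kbps = 100 * 8 * 1000 / 3600
print(f"typical game:   {typical_kbps:.0f} kilobits/s")  # 222
```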

The networking technology at the heart of the ScavLab experience is explicitly built for such high-density action experiences – where every player updates often and can see a huge number of other players. While the technology is still in its early stages, we’re seeing some promising performance characteristics:

Every player is able to see every other player, NPC and object. The fidelity of what you can see varies every second, depending on the relative priority of each entity to the player.

Players receive updates from other players between 10Hz and 2Hz. Updates are received at 10Hz for ~500 high priority characters, and at 2Hz for ~9500 lower priority characters.

The backend sends over 250 million networking updates a second at 10,000 players. Note this figure grows quadratically (for example, adding 10x players needs 100x more updates), so a 10,000 player game would require roughly 10,000x more updates than a 100 player game.

Player bandwidth peaks around 350 kilobits a second in a 10,000 player test. This figure is comparable to the bandwidth for typical 64+ player arena games like Apex Legends and Battlefield V.

UE4 actor replication workflows. Gameplay programmers and technical designers can continue to author network-aware code using blueprints and C++, marking UPROPERTYs and UFUNCTIONs as replicated.
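The 10Hz/2Hz split described above can be sketched as a simple priority-bucketing pass (the tier sizes come from the figures above; ranking purely by distance is a simplification – the real system weighs more signals than that):

```python
def assign_update_rates(distances, high_priority_count=500,
                        high_hz=10, low_hz=2):
    """Bucket remote entities into update-rate tiers for one client.

    Entities are ranked by distance (closest first); the nearest
    `high_priority_count` update at `high_hz`, everyone else at `low_hz`.
    """
    ranked = sorted(distances, key=distances.get)
    return {eid: (high_hz if rank < high_priority_count else low_hz)
            for rank, eid in enumerate(ranked)}

# With 10,000 remote entities, one client receives:
distances = {f"e{i}": float(i) for i in range(10_000)}
rates = assign_update_rates(distances)
print(sum(rates.values()))  # 500*10 + 9500*2 = 24,000 updates/s
```

Multiplied across 10,000 connected clients, that per-client figure lands in the same ballpark as the quarter-billion server-side updates per second quoted above.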


Rendering

While the task of rendering tens of thousands of fully animated characters is a slightly more well-trodden path than networking, it still relies on a relatively niche set of techniques used largely by large-scale RTS titles like the Total War series and Ultimate Epic Battle Simulator.

While modern commercial game engines can easily render tens of thousands of instances of simple geometry such as foliage or debris, character animation and rendering is still heavily CPU-based, and the out-of-the-box techniques start to hit their limits at a few hundred characters on screen.

To create the high-density scenes of ScavLab, we combine multiple character rendering techniques together. For a 10,000 player demo, the techniques range from the full character UE4 blueprints used in the main game mode for the most relevant ~50 players through to a very efficient low-fidelity technique for the least relevant ~9,000 players.

Crucially, this was achieved by creating an automated pipeline that takes a traditional UE4 Skeletal Mesh and generates these efficient intermediate character representations.
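As a sketch of what such a tiered pipeline selects between (the tier boundaries follow the rough counts above; the middle tier and all the names are illustrative assumptions, not UE4 API):

```python
def pick_character_technique(relevance_rank):
    """Choose a rendering path for a character by relevance rank.

    Rank 0 is the character most relevant to the local player. The
    ~50 most relevant use the game's normal full-fidelity path; the
    least relevant thousands use the cheapest representation.
    """
    if relevance_rank < 50:
        return "full_skeletal_mesh"      # standard Blueprint characters
    elif relevance_rank < 1000:
        return "mid_fidelity_instanced"  # hypothetical middle tier
    else:
        return "low_fidelity_instanced"  # minimal per-character cost

print(pick_character_technique(10))    # full_skeletal_mesh
print(pick_character_technique(5000))  # low_fidelity_instanced
```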

Server-side simulation

Many subsystems of existing commercial game engines are single-threaded at their heart, and are not built to sustain the type of scale we want to achieve in ScavLab.

The ScavLab project makes use of a number of different techniques to scale the server-side simulation to the level we needed:

Multi-threading Actors. Gameplay APIs provide a ParallelTick method that allows expensive computations that need to run every frame to be easily offloaded to the TaskGraph. This worked best for computationally expensive but relatively isolated work.

Lightweight Actor Representations. The traditional architecture of a UE4 application has the server running a near-identical representation of Actors in the world to what clients see. We found it was often preferable to have a more efficient lightweight representation on the server – for example, the data pickups in the map.

Dynamically scaled simulation servers. For gameplay systems that require use of the existing high-fidelity UE4 simulation such as AI characters and physics, the system can spool up and down additional simulation resources.
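The ParallelTick idea can be roughly illustrated in plain Python (the real API lives in UE4’s gameplay layer and TaskGraph; ThreadPoolExecutor here just stands in for the engine’s worker system):

```python
from concurrent.futures import ThreadPoolExecutor

def expensive_tick(state, dt):
    """Stand-in for isolated, costly per-actor work (e.g. steering)."""
    x, vx = state
    return (x + vx * dt, vx)

def parallel_tick(actor_states, dt, pool):
    """Fan isolated per-actor tick work out across workers, then join.

    This only works because each actor's computation is independent -
    the same property that makes ParallelTick-style offloading safe.
    """
    return list(pool.map(lambda s: expensive_tick(s, dt), actor_states))

with ThreadPoolExecutor(max_workers=4) as pool:
    print(parallel_tick([(0.0, 1.0), (10.0, -2.0)], dt=0.5, pool=pool))
    # [(0.5, 1.0), (9.0, -2.0)]
```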


Orchestration

Live events require a lot of compute and have to be very resilient. We developed several ways to dynamically scale that compute and improve the resilience of the server-side simulation:

Leveraging Scavengers’ server orchestration platform. We used the Kubernetes-based Improbable Multiplayer Services server orchestration as a baseline. This allowed us to extend what was a typically smaller-scale, session-based orchestration system to additionally support super-scale event spaces, while being able to administer it as if it were part of the main game.

Dynamically scaled server resources. The underlying simulation architecture of ScavLab allows the dynamic addition and removal of compute resources. The actual requisitioning of resources was handled by the underlying clusters themselves – scaling from zero to hundreds of simulation processes at the crescendo of our ScavLab event, with 10,000 Thresh zombies raining from the sky.

Failover and hot backups. Building live events to host thousands of players requires seriously robust and reliable game server hosting – a feat we were able to achieve using dynamically scaled compute and new server failover strategies.

Scale testing and simulated players. A critical part of running live events of this scale is our ability to regularly test with large simulated workloads. We created a simulated player framework which allows us to test the load of over ten thousand players at the event. We run these tests multiple times a day to ensure the events are performant.
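A minimal sketch of the scale-to-zero behaviour described above (the thresholds and per-process capacity are illustrative assumptions, not values from the Scavengers stack):

```python
import math

def desired_sim_processes(entity_count, entities_per_process=500,
                          max_processes=200):
    """Derive a target simulation-process count from entity load.

    Idle events scale to zero; otherwise one process per
    `entities_per_process` simulated entities, up to a hard cap.
    """
    if entity_count == 0:
        return 0
    return min(max_processes, math.ceil(entity_count / entities_per_process))

print(desired_sim_processes(0))       # 0  - nothing running between events
print(desired_sim_processes(10_000))  # 20 - e.g. the Thresh zombie drop
```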


These components – the networking, rendering, server-side simulation and orchestration techniques – combine to allow many things to happen at once, and open up many ideas for the future.

There is still so much to discover in terms of the architecture’s potential, not least in exploiting the realm of massive interactive live events (MILEs), where mastering fidelity and density at scale are critical to success.

An architecture for the future

On 29 May 2021, with ScavLab fully operational, we moved from the hypothetical to the real. Over 4,000 people joined and played in the event simultaneously on production servers in a live game environment, and loved it.

The ScavLab events show us new ways to play and have fun with these large-scale, densely populated worlds. It’s a momentous time, and we’re thrilled by what we’ve achieved so far – it’s surpassed many of our expectations. And seeing 5,000 Thresh zombie AI accidentally get lured into a jump cannon and catapulted across the map never gets old!

The ScavLab teams ought to be proud of what they’ve accomplished. I’d like to thank them for all the hard work, creativity and passion they’ve put into building these amazing events – I’m really looking forward to seeing where we’ll go next.

Needless to say, we’ll continue the live experimentation that the new architecture allows with unprecedented density and fidelity – especially the possibilities it opens up for new kinds of games, new modes, virtual social spaces and the future of live interaction on a large scale.