RPC Part 1: Message Passing
Intro
In the previous post I discussed ordering, concurrency, and interleavings. In this post I’ll delve into some of the design goals for the RPC package that forms the backbone of all our software built on the MSC (Marymoor Studios Core libraries).
Distributed Systems
Traditionally we define a Distributed System as a set of program components executing on one or more computers all of which are connected together by a network. The components work together to achieve a higher level goal such as providing a service like a database or a game server. Most distributed systems appear logically as a single unit to their user. There is a well-studied field in computer science called Distributed Computing dedicated to the design, development, and understanding of distributed systems. Distributed computing offers a rich body of learnings and best practices for building robust distributed systems.
But, how is distributed computing relevant to game design? Sure, there can be highly scalable game servers, and some games have multiplayer systems. Obviously, developing those would benefit from distributed computing expertise. However, from my perspective, game design (or pretty much any modern software) has much more to learn from distributed computing than just that.
In my Ordering Post we discussed Activities and recognized that games can be made of multiple concurrent activities. Concurrent activities can be independent or can close over shared state (as we saw in the example where two activities shared the same “weapon charge” variable). Furthermore, we saw that when multiple activities share state there is a higher degree of interleaving nondeterminism making those programs harder to write and harder to debug. But if we only use pure independent activities (i.e. a bunch of concurrent components executing with completely separate state and never touching each other) then we don’t end up with a very interesting game - unless - they can talk to each other, say, through message passing of some kind!
If we define a distributed system slightly differently:
Distributed System
A system with multiple independent concurrent activities which communicate with each other by message passing so as to coordinate their operation to achieve a higher goal.
By this definition, all our games are actually distributed systems with many different independent activities executing concurrently and communicating with each other to implement the game logic. We can leverage the learnings from distributed computing to make games that are easier to build, more correct, more robust, and more bug free!
Global vs. Local Reasoning
Why should we choose message passing over shared state? What’s wrong with two activities both mutating the same variable? Nothing! In fact, most of our game components are made of multiple activities that close over the state of that component. But, as components grow larger and more complicated it becomes more and more difficult to reason about the correctness of activity interactions over the state they share.
Consider the case where the entire game has just one big global block of state and all activities making up the game are running concurrently and mutating that state as they go. Reasoning about the correctness of the program requires us to look at all of these activities together. Reasoning about the correctness of any one activity will NOT be sufficient even to determine whether there is a bug in that activity. Some other activity could, during its turn, mutate the shared state in a way that this activity doesn’t expect or doesn’t want. Looking broadly across many activities at the same time is called Global Reasoning. You have to think globally about the entire program (or a large portion of it) when deciding if the program logic is correct.
However, if the game were to divide the state up into smaller pieces, each could be owned by only a single component. Think of this as a set of islands of computation: each island has its own state and activities, but the islands communicate with each other through message passing. When we reason about the correctness of a single activity, we need to consider at most those other activities that have access to the same pieces of state (i.e. those on the same island). We can safely ignore all other activities in the program because they MUST be independent of the activity we are considering. This leads to Local Reasoning, where we only have to look at a small subset of the program to determine if that part of the program is correct.
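To make the island idea concrete, here's a minimal sketch in plain C# (using System.Threading.Channels rather than anything from MSC; the ScoreKeeper component and its ScoreEvent message are invented for illustration):

```csharp
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

// A hypothetical message: the only way other islands can influence the scorekeeper.
public sealed record ScoreEvent(string Player, int Points);

// An "island": it owns its state privately and exposes no shared variables.
public sealed class ScoreKeeper
{
    private readonly Channel<ScoreEvent> _inbox = Channel.CreateUnbounded<ScoreEvent>();
    private readonly Dictionary<string, int> _totals = new();

    // Other activities communicate only by writing messages into the inbox.
    public void Post(ScoreEvent e) => _inbox.Writer.TryWrite(e);

    // The island's single activity: drains the inbox and mutates its own state.
    public async Task RunAsync()
    {
        await foreach (ScoreEvent e in _inbox.Reader.ReadAllAsync())
        {
            _totals[e.Player] = _totals.GetValueOrDefault(e.Player) + e.Points;
        }
    }
}
```

The correctness of ScoreKeeper can be judged by reading this class alone: no other activity can reach _totals, so the only interactions we need to reason about are the messages in its inbox.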
In general, local reasoning scales much better than global reasoning due to inherent physical limitations (human cognitive load, screen size when coding, test execution latency, etc.). Maximizing the opportunities for local reasoning while minimizing the places where global reasoning is required will make our programs easier to build, easier to maintain, and more correct.
These islands of computation, however, do need to communicate with each other. And when they do, they will need some mechanism for message passing. What properties and characteristics should we look for in a message passing system?
Message Specifications
Though there are different coding styles, personally, I strongly prefer a formal and explicit mechanism for message passing (like structured RPC) over informal or ad hoc ones (like key/value pairs, dictionaries, property bags, or untyped JSON). Formal mechanisms require you to define explicitly which messages a component can accept and what the shape of those messages will be. This is usually done through some written specification. Message specifications are great! Just writing the specification forces me to think through what is really needed. They formalize the information being exchanged. They clarify the requirements and assumptions of both the caller (sender) and the callee (receiver). They form a contract between the caller and the callee. And lastly, the specification itself becomes a physical artifact in source control that tracks the evolution of these contracts between components making their review and the avoidance of unintentional breaking changes easier. All of these things are particularly important when the caller component and the callee component are being written by different people or different teams or at different times.
As great as they are, a specification shouldn’t be a burden to write. It shouldn’t require an obscure syntax or an extra programming language that forces the developer outside of their normal flow. Ideally, it should just be like writing any other code. Anyone who has worked with me knows that I am a big fan of type systems. So for me, the specification should integrate with the type system that the rest of the code uses. It should aid in development and encourage correctness by construction. Any fan of type systems will tell you: if you can’t write wrong code, then you can’t ship wrong code.
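As a tiny illustration of the difference (plain C#, independent of any particular messaging system; the MoveRequest message is invented for this example):

```csharp
using System.Collections.Generic;

// Informal: an ad hoc property bag.  Nothing tells the sender which keys the
// receiver expects, what their types are, or which ones are required.
Dictionary<string, object> informal = new()
{
    ["kind"] = "move",
    ["x"]    = 3,
    ["y"]    = "4",   // Wrong type; discovered (maybe) at runtime.
};

// Formal: the message shape is a type.  The declaration documents the contract,
// and the compiler enforces it: a wrong shape simply won't compile.
MoveRequest formal = new(X: 3, Y: 4);

public sealed record MoveRequest(int X, int Y);
```

The record declaration is itself the written specification: it lives in source control, it documents the contract, and a sender that violates it can’t even compile.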
Message Interleaving
A good message passing mechanism should provide some control over when incoming messages are dispatched. Remember our discussion from the Ordering Post regarding nondeterminism. If incoming messages can be dispatched at any time and in parallel with existing executing activities, then each incoming message will create additional interleaving nondeterminism. Ideally, new incoming messages are only dispatched between turns. This means that all message handlers will have the Top of Turn property. This property is particularly important in preventing the reentrancy issues that so frequently occur in games (and other programs) when one component calls another only for it to immediately call back into the first. If these calls were regular functions instead of messages, or if the messages were dispatched in parallel with existing activities, then the incoming callback would reenter the first component, perhaps before it has restored its invariants. Whether this leads to corruption depends on the particular interleaving chosen (leading us back to timing issues and rare interleavings). If locks are used to protect the shared state, then reentrancy may also lead to deadlocks. By deferring message dispatch to the next top of turn, reentrancy and deadlock issues are made impossible by construction.
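Here is a deliberately simplified sketch of the reentrancy hazard and the turn-based fix, in plain C# with a hand-rolled queue standing in for a real scheduler (the Inventory component and its invariant are invented for illustration):

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: a toy component whose invariant (_weight == _items * 10)
// is briefly broken in the middle of a turn.
public sealed class Inventory
{
    private readonly Queue<Action> _inbox = new();   // Messages deferred to the next turn.
    private int _items;
    private int _weight;

    // Hazardous pattern: notifying synchronously while the invariant is broken.
    // The callback can reenter this component and observe the half-finished state.
    public void AddItemWithSynchronousCallback(Action onAdded)
    {
        _items++;        // Invariant broken here...
        onAdded();       // ...and the reentrant callback sees it.
        _weight += 10;   // Invariant restored too late.
    }

    // Turn-based pattern: finish the turn (invariant intact), then queue the
    // notification so it is dispatched at the next top of turn.
    public void AddItem(Action onAdded)
    {
        _items++;
        _weight += 10;
        _inbox.Enqueue(onAdded);
    }

    // A stand-in for the scheduler: pending messages run only between turns,
    // so every handler starts with the component's invariants intact.
    public void DrainInbox()
    {
        while (_inbox.TryDequeue(out var next)) next();
    }
}
```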
Location Independence
Another property I find important for dependency management and code reusability is what I’ll call Location Independence. Location independence is when the programming model exposed by the messaging system (for either the caller, the callee, or both) abstracts away the details about the relative location of the caller and callee. By “location” here we mean whether the caller and the callee are in the same SIP (the same scheduler), different SIPs in the same process (cross-thread), different processes on the same machine (cross-process), or different machines (cross-network).
The more the call model exposes types and concepts from the message system, the more the components must take a direct dependency on the message system itself. For example, if it exposes TCP network types in the call interface, then all components MUST take a compile-time dependency on the TCP libraries, even when they are only using messaging to nearby components. This negatively impacts component evolution, reusability, and build times. To achieve location independence a message system design must consider a few key questions. What is the shape of the call model? How does it represent arguments and return values? How does it track completion? How does it represent errors?
When I think back to early RPC systems, I remember that they tried to make “remote calls” look like “local calls”. This quickly exposed several ugly seams. First, remote calls have significantly different latency characteristics. Second, remote calls have significantly different failure modes. The local call model doesn’t provide good extensibility points to represent these extra failure modes. And local calls (regular functions) block the stack until they return. Long blocking waits are a poor fit for modern systems with asynchronous IO (as we discussed previously in the Ordering Post). Modern RPC systems seem to go in the other direction. Typically they give up on the local call illusion entirely and instead simply provide an async remote call model that supports remote failure modes, but exposes all the details of the remoting wire protocol. With either of these solutions it remains difficult to switch between the local and remote models within a single piece of code. Each has its own unique style, types, and requirements that permeate the code.
In my ideal world, calls to nearby components would look and behave identically to remote calls to far away components (with the exception of the actual latency). This would guarantee that the code for the caller will look no different whether the call is nearby or far away. The differences in the programming model, such as how errors are handled, how completion is tracked or awaited, or how the results are consumed, MUST be minimized or eliminated. This would make it possible to easily change the topology of connected components - perhaps even dynamically at runtime! To move a component from the same SIP to different SIPs (cross-thread) only if the machine happens to have more cores. To move a component to a child process to achieve greater dependency isolation during an evolution. To move a component to another machine to achieve greater scale. To move a component behind a load balancer to achieve fault tolerance. All of these topology changes could be made without altering the implementation of either the caller or the callee. This would provide powerful degrees of freedom for evolving components and for component reuse.
Scaling and Performance
If the messaging system is easy to use, and provides benefits to correctness and maintainability, then I’d want to use it everywhere in my code. I’d want to use it at all component boundaries. If I didn’t do that then I wouldn’t be able to take full advantage of location independence when I later decide to refactor the topology of my game. But is it possible to use it everywhere? What is the cost? Do those costs scale proportionally? How does the cost of sending a message scale along the spectrum between the nearby (same SIP) case and the far away (cross-network) case? How does the latency scale? How much overhead is there on the nearby side of the spectrum (i.e. is it pay-for-play or a high upfront cost)? How impactful is the programming model on the design of the callee?
Ideally, when the caller and callee are nearby, the cost of sending a message is very cheap, the latency is very small, there are minimal upfront costs, and the impact to the callee’s programming model is negligible. Unfortunately, for most modern messaging systems the upfront costs are high, the latency is not proportional, and the impact to the callee programming model is extremely asymmetric, requiring complex listeners with networking, security, and resource contention considerations. In short, the complexity is upfront and in your face in the exposed programming model. You wouldn’t naturally reach for these tools to talk to another component in the same SIP or even the same process. Which means you’ll end up with one coding style for nearby messaging and a very different one for far away messaging. If you need to change the topology (or support more than one), then you’ll have to rewrite the code.
Testability and Logging
Despite all the advice I can give on managing order, the principles for controlling nondeterminism, and the techniques for state isolation, writing distributed systems can still be hard. Therefore, efficient and low cost diagnostic logging is essential.
An important observation about a system formed from independent components that communicate only through message passing is:
A given sequence of messages represents the entire interaction with a component.
If the component has a high degree of determinism, then a particular playthrough of that component can easily be reproduced by applying the same initial conditions and a replay of the message sequence. This is true regardless of the internal states of all other components (because an independent component takes no direct dependency on the state of any other component).
This means that independent components designed around message passing can easily be tested in isolation. Unit tests need not execute or even instantiate any other components. They need only reproduce the desired message sequences. Being able to test large parts of the game without having to actually execute the game is a huge aid to correctness. If we can leverage location independence to write small, fast unit tests for components that might be deployed in production in more complex topologies, all the better for test development, execution, and maintenance.
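For example, a unit test for a hypothetical game-logic component might look like the sketch below (plain C# with xUnit, and Task in place of the MSC Promise types; GameLogic and its members are invented for illustration):

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Xunit;

// A tiny, hypothetical game-logic component.  It owns its state, is deterministic
// given its initial conditions, and is driven purely by messages (modeled here as
// async method calls).
public sealed class GameLogic
{
    private readonly Dictionary<string, int> _scores = new();

    public GameLogic(int seed) { /* the seed would drive any internal randomness */ }

    public Task JoinAsync(string player)
    {
        _scores[player] = 0;
        return Task.CompletedTask;
    }

    public Task MoveAsync(string player, int x, int y)
    {
        _scores[player] += x + y;   // Stand-in for real game rules.
        return Task.CompletedTask;
    }

    public string CurrentLeader => _scores.OrderByDescending(kv => kv.Value).First().Key;
}

public sealed class GameLogicTests
{
    [Fact]
    public async Task ReplayingTheSameMessageSequenceReproducesTheSameState()
    {
        // Only the component under test is instantiated: no display, no network,
        // no other components.  The entire interaction is a sequence of messages.
        var game = new GameLogic(seed: 42);

        await game.JoinAsync("alice");
        await game.JoinAsync("bob");
        await game.MoveAsync("alice", x: 3, y: 4);

        // Same initial conditions + same message sequence => same result.
        Assert.Equal("alice", game.CurrentLeader);
    }
}
```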
Ideally, the logging produced by the messaging system also supports multiple levels of detail that can optionally capture either only the message sequence (without payloads, to preserve privacy) or the full message content (for easy debuggability and replay). If logging, at least at the most basic level, is cheap enough that it can be turned on in production scenarios, then it can be a valuable tool in diagnosing customer issues in the field. In a system built of independent components, the message system’s logging can provide significant insight into a customer’s issue without additional diagnostics. If the logging system captures and reports in near real-time, it can also be used interactively during development and testing to provide insight into whether the code is executing as intended.
Promise RPC
So to summarize, we are looking for a message passing system that:
- Has formal explicit specifications. Is super easy for C# developers to use. One that doesn’t require learning a new specification language or using unusual 3rd party tools.
- Works with our existing scheduler. Doesn’t introduce new interleaving nondeterminism, reentrancy, or locking issues.
- Delivers location independence such that we can write components once for both nearby and far away use cases.
- Is cheap and easy enough to use between components in the same SIP that we wouldn’t hesitate to use it everywhere.
- Scales up enough to handle cross-thread, cross-process, and cross-network scenarios without introducing a lot of additional complexity.
- Works in unit testing for component isolation and validation without incurring a lot of compile-time dependencies.
- Has an efficient logging system with near real-time interactivity.
This is quite a daunting list of requirements. Honestly we weren’t able to find one that quite met all of our needs. So, we wrote our own. From this requirements list was born the Promise RPC library. Together with the Promise library discussed in the previous post and other foundational technologies, it forms the MSC (Marymoor Studios Core).
Promise RPC provides formal explicit specifications through the definition of C# interfaces with the [Eventual] attribute. The specification language is just normal idiomatic C# supporting almost all familiar syntax features including method overloads, optional arguments, nullable reference types, parameter attributes, doc-comments, and even generic interfaces. Flexible parameter modeling support (provided by the MSC Serialization package which I’ll talk more about in a future post) allows parameters of any primitive type, user-defined data contract types, sequences, byte streams, and capabilities (which I’ll also talk more about in another post).
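As a rough illustration, a specification might look something like the sketch below. Everything here other than the [Eventual] attribute and the Promise-based return types described above is invented for illustration: the IGame interface, its methods, and the PlayerId contract are hypothetical, and real signatures in the library may differ.

```csharp
// Hypothetical specification, written as ordinary C#.
[Eventual]
public interface IGame
{
    /// <summary>Adds a player to the game and returns their assigned id.</summary>
    Promise<PlayerId> JoinAsync(string name);

    /// <summary>Applies a move; completes when the move has been accepted.</summary>
    Promise MoveAsync(PlayerId player, int x, int y);
}

/// <summary>An ordinary user-defined data contract used as a parameter/return type.</summary>
public readonly record struct PlayerId(int Value);
```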
The Promise RPC system includes a Roslyn-integrated code generator that automatically generates proxy and stub types during the execution of the normal C# compiler toolchain. By simply including a reference to the generator NuGet package, Visual Studio (or any dotnet SDK-based build system) generates proxy/stub code on demand as you type and provides full completion and doc-comment-based IntelliSense. Due to this unique compiler integration, eventual interfaces can be both defined and implemented in the same assembly, making them simple and easy to use on the fly in production code or tests.
All eventual methods return Promise-based asynchronous values that isolate both return value tracking and failure discovery. The shape of these methods exactly matches the shape of local, Promise-based, async methods, providing true location independence. This also integrates callers with the Promise Scheduler (discussed in the previous post) and supports call concurrency, completion tracking, awaiting, and first-class Promise-based value composition. The generated stubs provide abstract classes that can easily be derived from and support async method implementations for message dispatch. The RPC dispatcher is also transparently integrated with the Promise Scheduler and dispatches incoming messages as top-level activities. Like any other Promise-based activity, these provide the same top-of-turn, single-threaded, cooperative semantics with guaranteed freedom from reentrancy and locking issues.
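Putting the pieces together, a callee and a caller might look roughly like the sketch below. The generated type names (GameStub, GameProxy) and the exact signatures are guesses made for illustration; only the overall shape (derive from a generated stub, implement async Promise methods, await calls through a proxy) comes from the description above.

```csharp
using System.Collections.Generic;
// (MSC library usings omitted; the exact namespaces aren't shown in this post.)

// Hypothetical callee: derives from a generated stub base class (name guessed as
// GameStub) and implements each eventual method as an async, Promise-returning method.
public sealed class GameServer : GameStub
{
    private readonly Dictionary<PlayerId, string> _players = new();
    private int _nextId;

    public override async Promise<PlayerId> JoinAsync(string name)
    {
        // Runs as a top-of-turn activity on the Promise Scheduler: single-threaded,
        // cooperative, and free to touch state owned solely by this component.
        var id = new PlayerId(_nextId++);
        _players[id] = name;
        return id;
    }

    public override async Promise MoveAsync(PlayerId player, int x, int y)
    {
        // ... apply the move to this component's own state ...
    }
}

// Hypothetical caller: it holds only a proxy (type name guessed as GameProxy) and
// awaits results.  The same code compiles and runs whether the callee lives in the
// same SIP, another thread, another process, or another machine.
public static class Client
{
    public static async Promise PlayAsync(GameProxy game)
    {
        PlayerId alice = await game.JoinAsync("alice");
        await game.MoveAsync(alice, x: 3, y: 4);
    }
}
```

The important property is that PlayAsync takes only a proxy: nothing in the caller’s code changes when the GameServer instance moves to another SIP, process, or machine.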
Behind the scenes, the runtime provides automatic routing and guaranteed at-most-once message delivery. Local calls (same SIP) are very cheap, incurring only two allocations (a resolver for the result and a message class). Cross-thread and cross-process dispatches proportionally add serialization and in-memory queuing. Networking overhead is incurred only by messages that actually go remote. This cost profile makes it easy to use RPC everywhere in our designs for component isolation. This concentrates any concerns over topology configuration into a few small places in the program where the SIPs and channels are orchestrated. Once the topology is established, none of the components have runtime visibility into that topology and so take no dependency on it.
Example
As an example of how all these pieces come together to make writing games easier for us, consider the impact on the component that implements the game logic when it is orchestrated in three multiplayer modes: local co-op, LAN hosted, and remote game server. All three modes are implemented using the exact same IGame interface and game component implementation. The key differences lie in where that component is instantiated.

In local co-op mode, the game component is instantiated within the same SIP as the display, which renders the game board and interacts with the players. The display is given only an IGame proxy for the game instance. In LAN hosted mode, the game component is instantiated in another SIP running on another core but in the same process as the display. That other SIP also hosts the networking component which accepts connections from the remote players. The display is still given only an IGame proxy to the game instance, calls on which now lead to cross-thread marshalling. The display component does not know the difference. In remote game server mode, the game component is instantiated on another machine, but the display is still given only an IGame proxy, which now leads to cross-network communication. Again, the display component does not know the difference.
We leverage the Promise RPC system to write one implementation of both the display and game components. We test these implementations independently of each other and independently of the actual game topology at runtime. We leverage location independence to dynamically choose a topology at runtime that meets the needs of the current game mode.
Conclusion
In this post we discussed distributed systems and message passing mechanisms. We itemized some desirable properties that we’d look for in an ideal message passing system. And we introduced the Promise RPC library which attempts to meet some of these requirements.
This is only Part 1 of our look at the MSC RPC system. In Part 2 we will continue to dive deeper. We’ll take a look at how Ordering once again rears its ugly head. We’ll see how message order impacts RPCs and then provide some solutions. In future posts, we’ll also look at some unique features of Promise RPC including capability exchange, streaming with sequences and bytes, and finally we’ll see how channel lifetime relates to aborts and cancellation. Until next time, code on!