RPC Part 3: Capability Exchange

Intro

In the previous post we examined the phases of an RPC call’s lifecycle and how they related to message ordering. We saw how non-sequential ordering was desirable in some cases, but undesirable in others. Finally we proposed a rule, Interface Order, for when dispatch order should match call order. The rule supported developers’ intuitions about causal ordering and therefore increased the likelihood of code correctness. In this post we will further refine our ideas around ordering to a narrower and yet more powerful scope, and introduce a new concept called capabilities.

Instance Order

In the previous post we talked about using Interface Order to impose a total ordering on the messages sent to a chat room embedded in a game lobby. The ordering guarantees prevented undesirable reordering of either two messages from the same user, or replies by other users. For this discussion let’s simplify this example by defining a pure chat interface (without the game lobby parts). This might look like:

[Eventual]
interface IChat
{
  Promise Chat(string message);
}

Assuming c is a ChatProxy, we can write code like:

await c.Chat("Let's get going.  We need to:");
await c.Chat(" * Put gas in the car.");
await c.Chat(" * Drive to the event.");

(It might be unfortunate if those last two messages were reordered!)

What if we want to support more than one chat room at the same time (perhaps with different groups of people, talking about different things)? We’ll need some way to indicate which chat room each message was intended for. We could add a parameter to the Chat method:

[Eventual]
interface IChat
{
  Promise Chat(int chatRoomId, string message);
}

This works… but has some issues. How does the caller learn the id? Who allocates the id? How do you prevent the same id from being used for multiple rooms at the same time? When a room is destroyed, how to you reuse the id (so you don’t run out)? During message dispatch, how do you find the data structure that manages that specific room’s data? How do you prevent malicious users from guessing the id of rooms they aren’t a member of? I’m sure you can imagine solutions to some of these problems, but let’s ignore them for now and look at how this interface might be used. We can now write:

c.Chat(videoGameReviewRoomId, "Marymoor Studios [b]rocks[/b]!");
c.Chat(cookingRecipesRoomId, "No, the secret is to use half-and-half instead of milk!");

Interface Order guarantees that all calls to the same interface are ordered. These are both calls to the IChat interface. But, do we really need our messages to the Cooking Recipes room to be ordered with messages to the Video Game Review room? Probably not. It would be better if our ordering guarantees applied only to the chat room, not the entire interface. We could define two different interfaces:

[Eventual]
interface IVideoGameReviewChat
{
  Promise Chat(string message);
}

[Eventual]
interface ICookingRecipesChat
{
  Promise Chat(string message);
}

Now we have the ordering scope that we want. But this clearly won’t scale. We don’t really want a static set of chat rooms defined at compile-time. We want the system to be able to create (and destroy) new chat rooms at will, but we still want to get ordering guarantees. What we want is more than one Instance of the same Interface. Then we could define Instance Order instead of Interface Order. Instance Order says that calls are ordered if they are made on the same proxy instance (rather than to the same Interface).

If we had instances then we can go back to the original IChat design we started with above, but with two instances of it instead of one. We would have two proxies, each of which is of type ChatProxy, and we are able to simplify the interface once again by removing those troublesome chatRoomId parameters. We can write this code:

await videoGameReview.Chat("Marymoor Studios [b]rocks[/b]!");
await cookingRecipes.Chat("No, the secret is to use half-and-half instead of milk!");

Now, we get exactly and only the ordering that we desire. All calls made on the same chat room are ordered, but calls made to different rooms are independent.

Capability Exchange

But, how did we get the two proxy instances to begin with? The caller can’t just create them because the callee has to “know” which chat room each refers to. The caller needs the callee to provide them imbued with the proper chat room association. To make this possible our message passing system will need a new feature not yet discussed: Capability Exchange. Capability Exchange allows the payload of a message (including response messages) to carry a Capability.

Like Distributed Computing, there is an entire field of computer science focused on studying the design of secure systems through the use of Capability-based Security. A full examination of capability-based security is beyond the scope of this devlog. But for the purposes of our discussion we can define a Capability as:

Capability

A communicable, unforgeable, reference to a (possibly remote) object along with an associated set of access rights.

Communicable merely means that it can be transferred (in our case through the message passing system). The Capability Exchange feature of our message passing system makes this possible.

Unforgeable means that you can’t create one without already having access to the object it refers to. This implies that there is no way for a malicous caller to create a capability reference to an object that the server didn’t give them. There are many ways to create unforgeable tokens (e.g. crypto, handle tables, block chains). For our purposes it is not important specifically how the message passing system implements unforgeable tokens, only that they are unforgeable.

The associated set of access rights means you can perform actions on the object by using the reference. In our case each capability is a strongly typed proxy, and the Interface type of that proxy defines a set of methods, and the set of methods are exactly the set of actions that you are authorized to perform when holding the capability. A capability is an example of what is referred to broadly as a bearer token (prior to the term’s popularity in HTTP and OAuth in referencing a specific format). A bearer token has the property that just holding it (or bearing) is all you need to be authorized to use it for whatever it allows you to do. There are no other access checks, and the token itself is not bound to any specific identity. Authorization is controlled by deciding who to give the token to and who not to. A bearer token is like a key that opens a door. If you have the key then you can open the door, and if you don’t you can’t, but the door doesn’t know the identity of the person that opens it.

Lastly, the object referenced by the proxy is a unique instantiation. When the message passing system dispatches a call it implicitly includes a reference to the target object automatically (like the this pointer in an object-oriented programming language). The object pointed to by a capability has 4 interesting characteristics:

Identity:
It has identity. That is, one object instance is distinct from any other object instance (even another of the same type). This implies that each has its own memory allocation and so its own copy of any state variables. Sending a message to one instance is independent of sending a message to another instance.
State:
It has state. The object closes over its own state variables, which can be accessed within its actions when they are dispatched (via the implicit object reference provided by the message passing system).
Lifetime:
It has a well-defined lifetime. Once all references (including remote capabilities) to an object have been discarded, the object is no longer reachable and can itself be discarded.
Actions:
It has a well-defined type whose specification is provided by the Interface that it implements. The object MUST provide an implementation for all actions that might be made on it through the capability (and subsequently dispatched by the message passing system). As we saw in the post RPC Part 1, Promise RPC’s code generator defines an abstract class from which to derive an implementation for each Interface. An instance of this implementation class statisfies the contract of its corresponding capability.

Capabilities and their objects are a flexible modeling tool. Multiple capability instances of the same type can all point to the same object, creating distinct unordered message sequences. A single object can simultaneously be referenced by multiple capabilities of different types providing different sets of authorized actions (essentially different views or projections over the larger set of actions defined by the whole object). Capabilities can be encapsulated where one capability targets another capability as its object. This is a common technique for creating a winnowed capability (one with a refined or reduced set of actions) which can be subsequently transferred downstream to a less authorized party.

If our message passing system supports Capability Exchange, we can define a second interface for our chat server:

[Eventual]
interface IChatRoomManager
{
  ChatProxy CreateRoom(string name);
}

This then provides more insight into the code we saw above:

ChatProxy videoGameReview = crm.CreateRoom("Video Game Review");
ChatProxy cookingRecipes = crm.CreateRoom("Cooking Recipes");

await videoGameReview.Chat("Marymoor Studios [b]rocks[/b]!");
await cookingRecipes.Chat("No, the secret is to use half-and-half instead of milk!");

The CreateRoom method creates a new distinct ChatRoom object and then returns a capability to it. The Chat method dispatches a new activity which includes an implicit reference to the target ChatRoom object. In Promise RPC, Chat is an abtract instance method on the ChatRoom object that is called by the message passing system within the dispatch activity. The this pointer gives easy access to the object’s state.

Capability Arguments

In the above example, we created new chat rooms and then use their capabilities to send independently ordered messages. But how do other members participate? We need a way to give another user access to a chat room’s capability.

To explore how we might do this, let’s define an Interface to represent an individual chat user:

[Eventual]
interface IUser
{
  Promise Invite(ChatProxy room);
}

Unlike the previous examples where a capability was returned from a method, the Invite method takes a capability as an argument. Capabilities can be exchanged in either direction by the message passing system. Now we can invite other users to participate in the chat rooms we created by passing them the chat room capability:

ChatProxy videoGameReview = crm.CreateRoom("Video Game Review");
await alice.Invite(videoGameReview);
await bob.Invite(videoGameReview);
ChatProxy cookingRecipes = crm.CreateRoom("Cooking Recipes");
await bob.Invite(cookingRecipes);
await carol.Invite(cookingRecipes);

await videoGameReview.Chat("Marymoor Studios [b]rocks[/b]!");
await cookingRecipes.Chat("No, the secret is to use half-and-half instead of milk!");

Bob is a member of both rooms, while Alice and Carol are only members of one each.

Object Id Comparison

How is a capability any different than a plain old object id (like the one we saw as the chatRoomId parameter in the example above)? Remember we skipped over a set of possible issues with object ids earlier. Let’s revisit those issues and see if our capability solution addresses any of them.

How does the caller learn the id?
The caller is given a capability through an earlier exchange. In the above example, the CreateRoom method returns a capability. Of course, CreateRoom could just as easily return an object id, so this is not significantly different.
Who allocates the id?
Unlike with an object id, there is no program-visible number in the capability programming model. Ids don’t need to be allocated by the server. The message passing system is responsible for keeping track of capability and object identities.
How do you prevent the same id from being used for multiple rooms at the same time?
Again, with a capability design there is no program-visible id. The server implementation doesn’t have to do anything to ensure that a capability is unique. The constructor for a capability takes a direct reference to a target object of the appropriate type. So, only the server (which has direct access to the target object) can create a new capability. Once created, capabilities must be passed around as part of a capability exchange. The message passing system then properly associates that capability with its target object, automatically.

When using an explicit object id, the server would have to keep a table somewhere that lists all object ids in use so that when allocating a new id it would know not to allocate the same id to two different objects.
When a room is destroyed, how to you reuse the id (so you don’t run out)?
In a capability-based system the capabilities and the objects they reference each have a lifetime. When a capability is discarded its reference to the target object is released. When all of the references to an object are discarded then the object’s lifetime has ended. There is no object id to reuse, and a ChatRoom is automatically destroyed when all its members leave. Additionally, capabilities are attached to connected sessions. When a connected session ends (i.e. because of disconnection, say, when one of the parties terminates), all outstanding capabilities on that session are automatically released. In this way, capabilities cannot be lost or leaked, and their lifetimes are always bounded. Capability and object lifetimes are tracked by the message passing system, automatically.

When using an object id, however, the id itself is just a number. Once passed from the server to the caller, there is no way for the server to track the subsequent lifetime of that number. Did the caller store the object id in a memory variable, or drop it on the floor? Either, the server must rely on the caller to tell them when they are done using an id, or the server must attach its own lifetime to the object (e.g. with a lease or a ticket) and then require the caller to periodically tell the server if the object is still in use (e.g. by renewing the lease, or acquiring a new ticket). All of this adds complexity to the caller, the server’s implementation, and the protocol. All of this complexity must be repeated for each application. And despite this complexity, the lifetime bounds for resource use at the server will NOT be tight. The server’s lease will inevitably be longer than the actual use, leading the server to keep resources around longer than necessary, reducing overall efficiency.
During message dispatch, how do you find the data structure that manages that specific room’s data?
In the capability-based system the implicit reference to the target object provides direct access to the object’s state during message dispatch. No additional bookkeeping is necessary on the part of the server implementation.

When using an object id, the server would have to maintain a mapping between active object ids and their associated state. This table could easily be combined with the table used for id allocation in (1) and id uniqueness in (2) and (3) above. Table access MUST be very cheap, because the table would need to be consulted explictly during each dispatch to convert the object id argument into a reference to the associated state.
How do you prevent malicious users from guessing the id of rooms they aren’t a member of?
In a capability-based system you can only perform actions on capabilities that you hold. If you were never given a capability, then there is no way to perform an action on it. Since capabilities are unforgeable, there is no way for a malicious user to create their own copy of the capability without the server’s involvement. Members of the chat room are simply defined by those who were given the capability, while those who were not are not members. Guaranteed.

When dealing with an object id, however, security is more nuianced. An object id is just a number passed as an argument. A malicious user could attempt to guess the right number (say by trying all of the numbers, or guessing randomly). No guarantees can be made that a malicious user might not get lucky. Some other security mechanisms should also be in place. Perhaps the chat room could keep a list of all its members (i.e. an Access Control List, or ACL), and then check that list at the beginning of every call to the Chat method? This check can be expensive, complex, and requires there be some way to accurately identify who the caller is. An ACL thus converts the problem of authorization into the problem of authentication.

The capabilities-based design doesn’t have any of these security problems. We haven’t even talked yet about authentication or user identity in our system, and yet we still have a secure system without them.

All in all, there are ways to address these issues when using object ids, but capabilities provide elegant, automatic, built-in, and highly efficient solutions.

The Root Capability

In all of our examples above we always acquired a capability from a previous exchange. This is starting to feel like a Ponzi Scheme where future capabilities are paid for only by previous capabilities. The dependency chain can’t go on forever. Somewhere we must have acquired the first capability without an exchange. We call this first capability the root capability. The root capability forms the root of a connected graph whose nodes are capabilities and whose edges are the methods that transfer them. We say that a capability X is reachable in this graph if there is some sequence of method calls from the root capability that eventually transfers X. Once given a root capability, the set of all reachable capabilities defines the set of all allowable actions that a connected party is authorized to perform.

In the message passing systems we are looking at, the first capability is received when successfully initiating a new connected session. The root capability is made available directly as a result of session establishment and doesn’t require any additional methods to be called. This is the ONLY capability that is delivered without a method call. All other reachable capabilities MUST be obtained through some sequence of calls to either the root capability or another capability returned directly or indirectly from the root. Each party exports their own root capability to the opposite party in a bidrectional session. Either party may export the Nothing capability if no actions are authorized in that direction.

I’ll talk more about the MSC Identity model to a future post, but its important to note that authentication happens at session establishment (using a crypto exchange). Each party to a new session has the opportunity to view the validated authentication credentials of the opposite party before deciding which root capability to offer on the session. So it is possible that different sessions to the same endpoint export different root capabilities based on the identity of the connecting party.

Example

Capabilities and capability-based security play a key role in pretty much all of our designs. But to illustrate how capabilities make it easier to write games, I’d like to revisit the game lobby example from the previous post. In that post we were concerned with message ordering as it related to game settings and the settings’ LSN. Here we will see how capabilities play a important role in securing our multiplayer systems. The game lobby defines the following three interfaces (abridged here for illustrative purposes):

[Eventual]
interface IGameLauncher
{
  JoinedPlayerProxy TryJoin(uint slotId, ClientLauncherProxy client);
  
  // We talked about this method in the previous post.
  Promise TryLaunch(ulong settingsLsn);
}
[Eventual]
interface IJoinedPlayer
{
  Promise Leave();
}
[Eventual]
interface IClientLauncher
{
  Promise Launch(PlayerProxy player);
}

The first two interfaces are implemented by the game server while the last is implemented by the game client.

When connecting to a game server the client is given a root capability that provides methods for enumerating the available unlaunched games or creating a new unlaunched game. When a game with open slots is selected, a capability to an object implementing the IGameLauncher interface is returned. A client can attempt to join an open game slot by calling the TryJoin method and passing: (1) a slotId which identifies the desired slot to claim, and (2) a capability to an object which implements the IClientLauncher interface. If the join attempt is successful, the server returns a valid capability to an object implementing the IJoinedPlayer interface. Only clients that hold a valid JoinedPlayerProxy have actually successfully claimed a slot. In this way, race conditions to join the same slot are resolved.

The JoinedPlayerProxy capability itself only authorizes the player to leave the game slot (by calling the Leave method, or by discarding the capability). But, if they hold the JoinedPlayerProxy capability until the game is launched then the server calls the Launch method on the ClientLauncherProxy capability that the client provided during the join attempt. The server passes the PlayerProxy capability to the Launch method which authorizes the player to play that specific game as that specific player. All subsequent method calls on the PlayerProxy imply that those are the actions of that player for that game. No other security checks are needed on subsequent method dispatches because only those players that successfully joined a slot were given a PlayerProxy, and each PlayerProxy uniquely refers to a particular game slot.

Conclusion

In this post we talked about capabilities, capability exchange, and how both can be used to build secure, efficient, and easy to use systems. We gave some examples of how Promise RPC library’s implementation of capabilities can be used to implement common game constructs.

This is Part 3 of our look at the MSC RPC system. In Part 4 and beyond we’ll look at streaming with sequences and bytes, and finally we’ll see how session lifetime relates to aborts and cancellation. Until next time, code on!

Read the previous post in this series.

Read the next post in this series.

Feedback

Write us with feedback.