Cluster Formation
In addition to the Arx nodes comprising a Cluster, several other properties are defined during the Cluster creation process. The maximum possible computational load of a Cluster (whether for a single very large Computation or spread across many smaller Computations) is implicitly defined by the hardware specification of the weakest Arx node (resource-wise) in the Cluster. See the Self-Claimed Hardware Specification section for more on how this hardware specification is defined. However, the maximum computational load that the Cluster's Computations will require must also be explicitly defined when the Cluster is created. Clusters that use TEEs must also specify a quota: the exact number of TEE-enabled Arx nodes required.
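The relationship between the implicit limit (set by the weakest node) and the explicitly declared maximum load can be sketched as a creation-time check. This is a minimal illustration, not Arcium's implementation: the `cpu_cores`/`ram_gb` fields, the `validate_cluster` helper, and the reading of the TEE quota as a minimum count are all hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HardwareSpec:
    # Hypothetical resource dimensions; the actual Self-Claimed Hardware
    # Specification fields are defined elsewhere and may differ.
    cpu_cores: int
    ram_gb: int


@dataclass(frozen=True)
class ArxNode:
    pubkey: str
    spec: HardwareSpec
    has_tee: bool


def implicit_capacity(nodes):
    """The weakest node bounds each resource dimension of the Cluster."""
    return HardwareSpec(
        cpu_cores=min(n.spec.cpu_cores for n in nodes),
        ram_gb=min(n.spec.ram_gb for n in nodes),
    )


def validate_cluster(nodes, declared_max_load, tee_quota=0):
    """Check the explicit max load and TEE quota; return the implicit capacity."""
    cap = implicit_capacity(nodes)
    if (declared_max_load.cpu_cores > cap.cpu_cores
            or declared_max_load.ram_gb > cap.ram_gb):
        raise ValueError("declared max load exceeds the weakest node's capacity")
    # Assumption: the quota is treated as a minimum count of TEE-enabled nodes.
    tee_count = sum(1 for n in nodes if n.has_tee)
    if tee_count < tee_quota:
        raise ValueError(f"TEE quota {tee_quota} not met ({tee_count} TEE nodes)")
    return cap
```

Note that the weakest node is computed per resource dimension, so the implicit capacity can combine the CPU of one node with the memory of another.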
Most properties of a Cluster cannot be modified after creation, including its Node Priority List (see below), because Clusters are publicly available and other MXEs may rely on their specific properties (see the Reuse of Existing Clusters section for more). However, the Cluster creator may increase the maximum computational load, provided that every Arx node in the Cluster can support the increase.
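The single permitted post-creation change can be modeled as an operation on an otherwise immutable record. The sketch below is hypothetical: it reduces each node's capacity to one scalar and invents the `increase_max_load` helper purely for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)  # frozen: other Cluster properties stay immutable
class Cluster:
    creator: str
    node_capacities: tuple  # hypothetical scalar capacity per Arx node
    max_load: int           # explicitly declared maximum computational load


def increase_max_load(cluster, caller, new_max_load):
    """Return a copy of the Cluster with a higher max load, if permitted."""
    if caller != cluster.creator:
        raise PermissionError("only the Cluster creator may raise the limit")
    if new_max_load <= cluster.max_load:
        raise ValueError("the limit can only be increased")
    if any(cap < new_max_load for cap in cluster.node_capacities):
        raise ValueError("not every Arx node can support the new limit")
    return replace(cluster, max_load=new_max_load)
```

Returning a new record rather than mutating in place mirrors the idea that everything except the load limit is fixed at creation.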
At the time of Cluster creation, the user who creates the Cluster pays rent. This rent is nonrefundable, but the user can apply it toward future Computations on the Cluster. Requiring rent significantly raises the cost of executing a spam attack.
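One way to picture nonrefundable rent that still covers future Computations is as a credit balance that can be spent but never withdrawn. The `ClusterRent` ledger below is a hypothetical sketch; in particular, restricting the credit to the creator's own Computations is an assumption.

```python
class ClusterRent:
    """Hypothetical ledger: rent paid at creation acts as non-refundable credit."""

    def __init__(self, creator, rent_paid):
        self.creator = creator
        self.credit = rent_paid  # no withdraw method: credit can only be spent

    def pay_for_computation(self, payer, fee):
        """Spend rent credit first; return the remainder the payer still owes."""
        if payer != self.creator:
            return fee  # assumption: credit only covers the creator's Computations
        covered = min(self.credit, fee)
        self.credit -= covered
        return fee - covered
```

Under this model, an attacker who creates many throwaway Clusters locks up one rent payment per Cluster as credit it never intends to use, which is what makes spam expensive.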
Cluster Assignment and Acceptance
During Cluster creation, each Arx node included in the new Cluster must independently accept its admission to the Cluster. Any number of subjective factors could lead an Arx node to reject admission, for example the computational requirements of the Cluster or the reputation or jurisdiction of the other nodes in the Cluster. In practice, Arx nodes will likely maintain their own (off-chain) blacklists of Arx node operators they refuse to work alongside, along with the Cluster computational-requirement parameters they are willing to accept, so that they can evaluate Cluster admission requests on the fly.
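A node-side admission policy of the kind described above might look like the following sketch. The request fields, the `AdmissionPolicy` class, and the rejection reasons are illustrative assumptions; the actual criteria are left to each Arx node operator.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AdmissionRequest:
    cluster_id: str
    max_load: int                  # hypothetical scalar computational requirement
    requires_tee: bool
    co_member_operators: frozenset  # operators of the other nodes in the Cluster


@dataclass
class AdmissionPolicy:
    own_capacity: int
    has_tee: bool
    max_acceptable_load: int       # operator's own ceiling on accepted workloads
    blocked_operators: set = field(default_factory=set)  # off-chain blacklist

    def evaluate(self, req):
        """Return (accept, reason) for an incoming Cluster admission request."""
        if req.requires_tee and not self.has_tee:
            return False, "TEE required"
        if req.max_load > min(self.own_capacity, self.max_acceptable_load):
            return False, "computational requirement too high"
        blocked = req.co_member_operators & self.blocked_operators
        if blocked:
            return False, f"blacklisted operators: {sorted(blocked)}"
        return True, "accepted"
```

Keeping the policy entirely local means the evaluation is a few set and comparison operations, which is what allows on-the-fly decisions.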
For non-Permissioned Clusters, one random Arx node from the Network is added to the Cluster's node-set before the Cluster is activated. This random node is not explicitly chosen by the Computation Customer; instead, it is independently selected via the Automatic Alternative Node Selection mechanism (see the Sybil Resistance section for more on this process). The Cluster becomes active for use as soon as all of the Arx nodes assigned to it (including the randomly selected node) approve the assignment. If a single assigned Arx node rejects the assignment, Cluster creation fails. If not all approvals arrive, the Computation Customer may cancel the Cluster formation process once at least one complete epoch has passed.
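The formation rules above amount to a small state machine: all approvals activate the Cluster, any single rejection fails it, and a stalled formation becomes cancellable after a full epoch. The sketch below is hypothetical; in particular, it assumes creation happens partway through epoch `created_epoch`, so one complete epoch has passed only once epoch `created_epoch + 1` has ended.

```python
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    ACTIVE = "active"
    FAILED = "failed"
    CANCELLED = "cancelled"


class ClusterFormation:
    def __init__(self, assigned_nodes, created_epoch):
        # For non-Permissioned Clusters this includes the randomly selected node.
        self.pending = set(assigned_nodes)
        self.created_epoch = created_epoch
        self.status = Status.PENDING

    def respond(self, node, accept):
        if self.status is not Status.PENDING or node not in self.pending:
            raise ValueError("no outstanding assignment for this node")
        if not accept:
            self.status = Status.FAILED  # a single rejection fails creation
            return
        self.pending.discard(node)
        if not self.pending:
            self.status = Status.ACTIVE  # every assigned node approved

    def cancel(self, current_epoch):
        """Cancel a stalled formation; return True if the cancellation took effect."""
        if self.status is not Status.PENDING:
            return False
        # Assumption: a full epoch has elapsed only from created_epoch + 2 onward.
        if current_epoch < self.created_epoch + 2:
            return False
        self.status = Status.CANCELLED
        return True
```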
The same mechanisms of Cluster assignment and rejection also apply when Arx nodes are added to an existing Cluster after creation because a migration is needed. See the Node Priority List and Automatic Alternative Node Selection sections below for more details on these processes.
Node Priority List
During Cluster creation, the Arx nodes comprising the Cluster are defined as a priority-ordered list that may extend beyond the size of the Cluster (the number of active Arx nodes in the Cluster). This priority ordering is used to select the best backup Arx nodes if a migration becomes necessary (e.g. if one of the higher-priority nodes becomes unavailable).
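Selecting a backup during a migration can be sketched as a walk down the priority-ordered list. The `pick_replacement` helper and the `is_available` callback are hypothetical; how availability is actually determined is not specified here.

```python
def pick_replacement(priority_list, active, is_available):
    """Return the highest-priority available node not already active, or None.

    priority_list: node identifiers, highest priority first.
    active: set of nodes currently serving in the Cluster.
    is_available: hypothetical callback reporting whether a node is reachable.
    """
    for node in priority_list:
        if node not in active and is_available(node):
            return node
    return None  # list exhausted: fall back to Automatic Alternative Node Selection
```

Returning `None` rather than raising makes the hand-off to Automatic Alternative Node Selection explicit at the call site.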
Automatic Alternative Node Selection
The Automatic Alternative Node Selection mechanism selects acceptable alternative Arx nodes during a migration when the Cluster's Node Priority List has no available node (either because the list has been exhausted or because an Arx node in the list has shut down or gone offline for any reason). This recovery mechanism is essential if a large portion of the Network goes down (e.g., a major cloud computing provider that many Arx nodes in the Network rely on suffers an outage). To select an acceptable alternative Arx node, the mechanism first filters all of the Network's nodes by the Cluster's TEE status (i.e., whether a TEE-enabled node is required). It then filters the remaining nodes by their Self-Claimed Hardware Specification (see that section within the Arx Nodes parent section), removing any node whose Self-Claimed Hardware Specification falls below the Cluster's maximum computational requirement. Finally, the remaining nodes are dynamically ranked by a composite "trust score" built from the following trustworthiness criteria:
Operation Duration: How long the Arx node has been actively operating.
Slashing Record: How often the Arx node has been slashed, as a proportion of the total operating duration.
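The three steps (TEE filter, hardware filter, trust ranking) can be sketched as follows. The way the two criteria combine into a trust score is not specified in this section, so the weights, the one-year `horizon_days` normalization, and the `Candidate` fields are all hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    pubkey: str
    has_tee: bool
    claimed_capacity: int   # hypothetical scalar view of the self-claimed spec
    operating_days: float
    slash_count: int


def trust_score(c, w_duration=0.5, w_slashing=0.5, horizon_days=365.0):
    """Hypothetical composite score in [0, 1]; the real weighting is unspecified."""
    duration = min(c.operating_days / horizon_days, 1.0)      # longer is better
    slash_rate = c.slash_count / max(c.operating_days, 1.0)   # slashes per day
    clean = 1.0 / (1.0 + slash_rate * horizon_days)           # fewer is better
    return w_duration * duration + w_slashing * clean


def rank_alternatives(candidates, requires_tee, max_load):
    # Step 1: TEE filter, applied only when the Cluster requires a TEE.
    pool = [c for c in candidates if c.has_tee or not requires_tee]
    # Step 2: drop nodes whose claimed spec is below the Cluster's requirement.
    pool = [c for c in pool if c.claimed_capacity >= max_load]
    # Step 3: rank by composite trust score, highest first.
    return sorted(pool, key=trust_score, reverse=True)
```

Normalizing slashes by operating time matches the "proportion of the total operating duration" criterion, so a long-lived node is not penalized merely for having existed longer.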
Furthermore, Clusters that enable Automatic Alternative Node Selection may also configure the mechanism to avoid nodes that fit certain criteria:
Node Blacklist: A list of raw Arx node Pubkeys that should be avoided when selecting an alternative node.
Jurisdictional Blacklist: A list of jurisdictions (where nodes operate from) that should be avoided when selecting an alternative node. Note that an Arx node's jurisdiction is self-declared by the node, since actual hardware location cannot be verified. However, location claims will be independently evaluated by the Arcium Network's community (along with a Node Operator's team, security approach, transparency, cross-network reputation, etc.).
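Both blacklists act as an extra filter on the candidate pool before ranking. In this hypothetical sketch, candidates are `(pubkey, declared_jurisdiction)` pairs and jurisdictions are compared as case-insensitive codes; neither representation is specified by the source.

```python
def apply_blacklists(candidates, node_blacklist, jurisdiction_blacklist):
    """Drop candidates whose pubkey or self-declared jurisdiction is blacklisted.

    candidates: iterable of (pubkey, declared_jurisdiction) pairs (hypothetical shape).
    """
    banned_keys = set(node_blacklist)
    banned_places = {j.upper() for j in jurisdiction_blacklist}
    return [
        (key, place)
        for key, place in candidates
        if key not in banned_keys and place.upper() not in banned_places
    ]
```

Because jurisdiction is self-declared, this filter can only act on what a node claims; it does not verify location.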
In the future, further customization of this alternative selection mechanism may be added, including the ability to configure a custom balance of trust versus availability. Instead of optimizing solely for trust when selecting an alternative Arx node, the mechanism would also consider the amount of readily available computational capacity, since highly trustworthy nodes may be busy (leading to either higher priority fees or longer Computation processing delays).
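Since this feature is not yet defined, the following is only a hypothetical illustration of what a configurable trust/availability balance could look like: a linear blend controlled by an `availability_weight` parameter, where a weight of 0.0 reproduces trust-only ranking.

```python
def blended_score(trust, free_capacity, total_capacity, availability_weight):
    """Hypothetical trust/availability blend; weight 0.0 means trust-only ranking."""
    if not 0.0 <= availability_weight <= 1.0:
        raise ValueError("availability_weight must be in [0, 1]")
    # Fraction of the node's capacity that is currently idle.
    availability = free_capacity / total_capacity if total_capacity else 0.0
    return (1.0 - availability_weight) * trust + availability_weight * availability
```

With a nonzero weight, a slightly less trusted but mostly idle node can outrank a highly trusted node that is nearly saturated.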