Akka (13): Distributed Operations: Cluster-Sharding, Cluster Sharding of Computations

Posted by feckless on Fri, 14 Jun 2019 02:59:14 +0200

Through the introduction of Cluster-Singleton in the previous post, we saw Akka's support for distributed programs: the message-driven computing model is particularly well suited to distributed programming, and we need no special effort beyond following the normal Actor programming model to build clustered, distributed programs. Cluster-Singleton guarantees that no matter what problems individual cluster nodes run into, as long as at least one node in the cluster remains online, the singleton actor's sole instance can keep running safely and steadily. A different situation arises when many resource-hungry actors must run at the same time, and together they demand more resources than a single server can supply. We must then distribute these actors across multiple servers, i.e. a cluster, and that is the problem Cluster-Sharding helps us solve.

Let me first describe the goals we hope to achieve by using Cluster-Sharding, and then analyze how they involve distributing actors across cluster nodes.

First, I have an Actor whose name is a self-constructed identifier code (an entity-id), built by Cluster-Sharding on some node in the cluster. In a cluster environment I do not know which node this Actor lives on or what its address is; I only need this code to communicate with it. If I have many such resource-consuming Actors, I can specify a shard number inside each code so that these Actors are built in different shards. Akka-Cluster can also rebalance shards across nodes as nodes join or leave the cluster, including moving all the actors of a departed node onto the remaining online nodes. The Actor's identifier code is therefore the core element of any Cluster-Sharding application. As usual, we demonstrate the use of Cluster-Sharding with an example. The actor we want to shard is the Calculator from previous discussions:

package clustersharding.entity

import akka.actor._
import akka.cluster._
import akka.persistence._
import scala.concurrent.duration._
import akka.cluster.sharding._

object Calculator {
  sealed trait Command
  case class Num(d: Double) extends Command
  case class Add(d: Double) extends Command
  case class Sub(d: Double) extends Command
  case class Mul(d: Double) extends Command
  case class Div(d: Double) extends Command
  case object ShowResult extends Command



  sealed trait Event
  case class SetNum(d: Double) extends Event
  case class Added(x: Double, y: Double) extends Event
  case class Subtracted(x: Double, y: Double) extends Event
  case class Multiplied(x: Double, y: Double) extends Event
  case class Divided(x: Double, y: Double) extends Event

  case class State(result: Double) {

    def updateState(evt: Event): State = evt match {
      case SetNum(n) => copy(result = n)
      case Added(x, y) => copy(result = x + y)
      case Subtracted(x, y) => copy(result = x - y)
      case Multiplied(x, y) => copy(result = x * y)
      case Divided(x, y) => copy(result = {
        val _ = x.toInt / y.toInt // integer division throws ArithmeticException when y is 0; Double division alone would yield Infinity
        x / y
      })
    }
  }

  case object Disconnect extends Command    //exit cluster

  def props = Props(new Calculator)

}

class Calculator extends PersistentActor with ActorLogging {
  import Calculator._
  val cluster = Cluster(context.system)

  var state: State = State(0)

  override def persistenceId: String = self.path.parent.name+"-"+self.path.name

  override def receiveRecover: Receive = {
    case evt: Event => state = state.updateState(evt)
    case SnapshotOffer(_,st: State) => state = state.copy(result =  st.result)
  }

  override def receiveCommand: Receive = {
    case Num(n) => persist(SetNum(n))(evt => state = state.updateState(evt))
    case Add(n) => persist(Added(state.result,n))(evt => state = state.updateState(evt))
    case Sub(n) => persist(Subtracted(state.result,n))(evt => state = state.updateState(evt))
    case Mul(n) => persist(Multiplied(state.result,n))(evt => state = state.updateState(evt))
    case Div(n) => persist(Divided(state.result,n))(evt => state = state.updateState(evt))
    case ShowResult => log.info(s"Result on ${cluster.selfAddress.hostPort} is: ${state.result}")
    case Disconnect =>
      log.info(s"${cluster.selfAddress} is leaving cluster!!!")
      cluster.leave (cluster.selfAddress)

  }

  override def preRestart(reason: Throwable, message: Option[Any]): Unit = {
    log.info(s"Restarting calculator: ${reason.getMessage}")
    super.preRestart(reason, message)
  }
}

class CalcSupervisor extends Actor {
  def decider: PartialFunction[Throwable,SupervisorStrategy.Directive] = {
    case _: ArithmeticException => SupervisorStrategy.Resume
  }

  override def supervisorStrategy: SupervisorStrategy =
    OneForOneStrategy(maxNrOfRetries = 5, withinTimeRange = 5 seconds){
      decider.orElse(SupervisorStrategy.defaultDecider)
    }
  val calcActor = context.actorOf(Calculator.props,"calculator")

  override def receive: Receive = {
    case msg => calcActor.forward(msg)
  }

}

We can see that Calculator is an ordinary PersistentActor: its internal state can be persisted, and that state is restored when the Actor restarts. CalcSupervisor is Calculator's supervisor; its purpose is to install a custom SupervisorStrategy that resumes the calculator on ArithmeticException instead of restarting it.
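Note that receiveRecover handles SnapshotOffer, but the Calculator above never calls saveSnapshot, so recovery always replays the full event log. Below is a minimal sketch of how snapshots could be wired in; SnapshottingCalculator and snapshotInterval are illustrative names of my own, not part of the original code:

package clustersharding.entity

import akka.actor._
import akka.persistence._

// Sketch only: a Calculator variant that saves a snapshot every
// snapshotInterval persisted events, so the SnapshotOffer branch in
// receiveRecover actually gets exercised and replay stays short.
class SnapshottingCalculator extends PersistentActor with ActorLogging {
  import Calculator._
  val snapshotInterval = 100            // illustrative value
  var state: State = State(0)

  override def persistenceId: String = self.path.parent.name + "-" + self.path.name

  override def receiveRecover: Receive = {
    case evt: Event => state = state.updateState(evt)
    case SnapshotOffer(_, st: State) => state = st // resume from latest snapshot, then replay newer events
  }

  override def receiveCommand: Receive = {
    case Num(n) =>
      persist(SetNum(n)) { evt =>
        state = state.updateState(evt)
        if (lastSequenceNr % snapshotInterval == 0) saveSnapshot(state)
      }
    // ... Add/Sub/Mul/Div would persist their events the same way
  }
}

Saving a snapshot every N events bounds recovery to loading one snapshot plus replaying at most N trailing events.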

Calculator is our target entity for cluster sharding. Shards of actors are constructed in the cluster through Akka's ClusterSharding(system).start method. We need to run this method on every node that will host shards in order to deploy the sharding:

/**
   * Register a named entity type by defining the [[akka.actor.Props]] of the entity actor and
   * functions to extract entity and shard identifier from messages. The [[ShardRegion]] actor
   * for this type can later be retrieved with the [[#shardRegion]] method.
   *
   * The default shard allocation strategy [[ShardCoordinator.LeastShardAllocationStrategy]]
   * is used. [[akka.actor.PoisonPill]] is used as `handOffStopMessage`.
   *
   * Some settings can be configured as described in the `akka.cluster.sharding` section
   * of the `reference.conf`.
   *
   * @param typeName the name of the entity type
   * @param entityProps the `Props` of the entity actors that will be created by the `ShardRegion`
   * @param settings configuration settings, see [[ClusterShardingSettings]]
   * @param extractEntityId partial function to extract the entity id and the message to send to the
   *   entity from the incoming message, if the partial function does not match the message will
   *   be `unhandled`, i.e. posted as `Unhandled` messages on the event stream
   * @param extractShardId function to determine the shard id for an incoming message, only messages
   *   that passed the `extractEntityId` will be used
   * @return the actor ref of the [[ShardRegion]] that is to be responsible for the shard
   */
  def start(
    typeName:        String,
    entityProps:     Props,
    settings:        ClusterShardingSettings,
    extractEntityId: ShardRegion.ExtractEntityId,
    extractShardId:  ShardRegion.ExtractShardId): ActorRef = {

    val allocationStrategy = new LeastShardAllocationStrategy(
      settings.tuningParameters.leastShardAllocationRebalanceThreshold,
      settings.tuningParameters.leastShardAllocationMaxSimultaneousRebalance)

    start(typeName, entityProps, settings, extractEntityId, extractShardId, allocationStrategy, PoisonPill)
  }

start returns a ShardRegion, which is an ActorRef. ShardRegion is a special Actor that manages the actor instances, called entities, inside the shards hosted on its node. Shards may be distributed over different cluster nodes, and users communicate with their entities through the ShardRegion. From the start parameter entityProps we can see that only one kind of Actor is allowed per entity type; the concrete entity instances are built by another internal Actor, Shard, and one shard can host many entity instances. The multi-shard, multi-entity structure is reflected in the two functions extractShardId and extractEntityId. We said that the Actor's identifier code, the entity-id, is the core element of Cluster-Sharding; the shard-id is also embedded in the entity-id, so users can design the whole sharding layout, including the number of shards and the entities under each ShardRegion, through the entity-id coding rules. When a ShardRegion receives a message carrying an entity-id, it first extracts the shard-id from it. If that shard does not yet exist in the cluster, a new shard is allocated on one of the nodes according to the current load of the cluster nodes. The entity-id is then used to find the entity inside that shard, and if it does not exist a new entity instance is created. This whole shard-and-entity construction process is driven by the user-supplied functions extractShardId and extractEntityId, through which Cluster-Sharding builds and addresses shards and entities according to the user's rules. The codes need not follow any particular order; they only have to be unique. Here is an example of such a coding scheme:

object CalculatorShard {
  import Calculator._

  case class CalcCommands(eid: String, msg: Command)  // envelope users send to the ShardRegion
  val shardName = "calcShard"
  val getEntityId: ShardRegion.ExtractEntityId = {
    case CalcCommands(id,msg) => (id,msg)
  }
  val getShardId: ShardRegion.ExtractShardId = {
    case CalcCommands(id,_) => id.head.toString
  }
  def entityProps = Props(new CalcSupervisor)
}

Users communicate with the ShardRegion using CalcCommands, a special envelope type for talking to the sharding system. Besides the Command messages that Calculator normally supports, the envelope also carries the id, eid, of the target entity instance. The first character of this eid represents the shard-id, so we can either pin the target entity to a specific shard or pick a shard arbitrarily, e.g. with scala.util.Random.nextInt(9).toString. Since each entity type contains only one kind of Actor, distinct entity-ids represent multiple simultaneous instances of the same Actor type, much like the Router discussed earlier: all instances perform the same function on different inputs. Typically, users generate entity-ids with some algorithm designed to spread entities evenly across shards, while Cluster-Sharding automatically adjusts the placement of shards at the node level according to the cluster's load.
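As a minimal sketch (EntityIdGen and its digit ranges are illustrative assumptions of mine, not from the original code), an id generator matching getShardId above could put a random shard digit in front of a unique entity suffix:

import scala.util.Random

// Sketch: eids whose first character is the shard-id (0-8, as in the
// Random.nextInt(9) example above), followed by an arbitrary suffix.
object EntityIdGen {
  def randomEid: String =
    s"${Random.nextInt(9)}${1000 + Random.nextInt(9000)}"   // e.g. "21234": shard "2", entity "21234"
}

With such a generator, calcRegion ! CalcCommands(EntityIdGen.randomEid, Calculator.Num(1.0)) would land on a pseudo-randomly chosen shard.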

The following code demonstrates how to deploy sharding on a cluster node:

package clustersharding.shard
import akka.persistence.journal.leveldb._
import akka.actor._
import akka.cluster.sharding._
import com.typesafe.config.ConfigFactory
import akka.util.Timeout
import scala.concurrent.duration._
import akka.pattern._
import clustersharding.entity.CalculatorShard

object CalcShards {
  def create(port: Int) = {
    val config = ConfigFactory.parseString(s"akka.remote.netty.tcp.port=${port}")
      .withFallback(ConfigFactory.load("sharding"))
    // Create an Akka system
    val system = ActorSystem("ShardingSystem", config)

    startupSharding(port,system)

  }

  def startupSharedJournal(system: ActorSystem, startStore: Boolean, path: ActorPath): Unit = {
    // Start the shared journal on one node (don't crash this SPOF)
    // This will not be needed with a distributed journal
    if (startStore)
      system.actorOf(Props[SharedLeveldbStore], "store")
    // register the shared journal
    import system.dispatcher
    implicit val timeout = Timeout(15.seconds)
    val f = (system.actorSelection(path) ? Identify(None))
    f.onSuccess {
      case ActorIdentity(_, Some(ref)) =>
        SharedLeveldbJournal.setStore(ref, system)
      case _ =>
        system.log.error("Shared journal not started at {}", path)
        system.terminate()
    }
    f.onFailure {
      case _ =>
        system.log.error("Lookup of shared journal at {} timed out", path)
        system.terminate()
    }
  }

  def startupSharding(port: Int, system: ActorSystem) = {

    startupSharedJournal(system, startStore = (port == 2551), path =
      ActorPath.fromString("akka.tcp://ShardingSystem@127.0.0.1:2551/user/store"))

    ClusterSharding(system).start(
      typeName = CalculatorShard.shardName,
      entityProps = CalculatorShard.entityProps,
      settings = ClusterShardingSettings(system),
      extractEntityId = CalculatorShard.getEntityId,
      extractShardId = CalculatorShard.getShardId
    )

  }

}

The specific deployment code is in the startupSharding method. The following code demonstrates how to use the entities in the shards:

package clustersharding.demo
import akka.actor.ActorSystem
import akka.cluster.sharding._
import clustersharding.entity.CalculatorShard.CalcCommands
import clustersharding.entity._
import clustersharding.shard.CalcShards
import com.typesafe.config.ConfigFactory

object ClusterShardingDemo extends App {

  CalcShards.create(2551)
  CalcShards.create(0)
  CalcShards.create(0)
  CalcShards.create(0)

  Thread.sleep(1000)

  val shardingSystem = ActorSystem("ShardingSystem",ConfigFactory.load("sharding"))
  CalcShards.startupSharding(0,shardingSystem)

  Thread.sleep(1000)

  val calcRegion = ClusterSharding(shardingSystem).shardRegion(CalculatorShard.shardName)

  calcRegion ! CalcCommands("1001",Calculator.Num(13.0))   //shard 1, entity 1001
  calcRegion ! CalcCommands("1001",Calculator.Add(12.0))
  calcRegion ! CalcCommands("1001",Calculator.ShowResult)  //shows address too
  calcRegion ! CalcCommands("1001",Calculator.Disconnect)   //disengage cluster

  calcRegion ! CalcCommands("2003",Calculator.Num(10.0))   //shard 2, entity 2003
  calcRegion ! CalcCommands("2003",Calculator.Mul(3.0))
  calcRegion ! CalcCommands("2003",Calculator.Div(2.0))


  Thread.sleep(15000)
  calcRegion ! CalcCommands("1001",Calculator.ShowResult)   //check if restore result on another node
  calcRegion ! CalcCommands("2003",Calculator.ShowResult)

}

In the code above, the shard and entity ids are chosen by hand, and we also deliberately make one node leave the cluster. The results are as follows:

[INFO] [07/14/2017 13:52:05.911] [ShardingSystem-akka.actor.default-dispatcher-28] [akka.tcp://ShardingSystem@127.0.0.1:2551/system/sharding/calcShard/1/1001/calculator] Result on ShardingSystem@127.0.0.1:2551 is: 25.0
[INFO] [07/14/2017 13:52:05.911] [ShardingSystem-akka.actor.default-dispatcher-28] [akka.tcp://ShardingSystem@127.0.0.1:2551/system/sharding/calcShard/1/1001/calculator] akka.tcp://ShardingSystem@127.0.0.1:2551 is leaving cluster!!!
[INFO] [07/14/2017 13:52:15.826] [ShardingSystem-akka.actor.default-dispatcher-34] [akka.tcp://ShardingSystem@127.0.0.1:58287/system/sharding/calcShard/2/2003/calculator] Result on ShardingSystem@127.0.0.1:58287 is: 15.0
[INFO] [07/14/2017 13:52:17.819] [ShardingSystem-akka.actor.default-dispatcher-23] [akka.tcp://ShardingSystem@127.0.0.1:58288/system/sharding/calcShard/1/1001/calculator] Result on ShardingSystem@127.0.0.1:58288 is: 25.0

The results show that after node 2551 leaves the cluster, entity 1001 is migrated to node 58288 and its persisted state is restored along with it.

The following is the source code for this demonstration:

build.sbt

name := "cluster-sharding"

version := "1.0"

scalaVersion := "2.11.9"

resolvers += "Akka Snapshot Repository" at "http://repo.akka.io/snapshots/"

val akkaversion = "2.4.8"

libraryDependencies ++= Seq(
  "com.typesafe.akka" %% "akka-actor" % akkaversion,
  "com.typesafe.akka" %% "akka-remote" % akkaversion,
  "com.typesafe.akka" %% "akka-cluster" % akkaversion,
  "com.typesafe.akka" %% "akka-cluster-tools" % akkaversion,
  "com.typesafe.akka" %% "akka-cluster-sharding" % akkaversion,
  "com.typesafe.akka" %% "akka-persistence" % "2.4.8",
  "com.typesafe.akka" %% "akka-contrib" % akkaversion,
  "org.iq80.leveldb" % "leveldb" % "0.7",
  "org.fusesource.leveldbjni" % "leveldbjni-all" % "1.8")

resources/sharding.conf 

akka.actor.warn-about-java-serializer-usage = off
akka.log-dead-letters-during-shutdown = off
akka.log-dead-letters = off

akka {
  loglevel = INFO
  actor {
    provider = "akka.cluster.ClusterActorRefProvider"
  }

  remote {
    log-remote-lifecycle-events = off
    netty.tcp {
      hostname = "127.0.0.1"
      port = 0
    }
  }

  cluster {
    seed-nodes = [
      "akka.tcp://ShardingSystem@127.0.0.1:2551"]
    log-info = off
  }

  persistence {
    journal.plugin = "akka.persistence.journal.leveldb-shared"
    journal.leveldb-shared.store {
      # DO NOT USE 'native = off' IN PRODUCTION !!!
      native = off
      dir = "target/shared-journal"
    }
    snapshot-store.plugin = "akka.persistence.snapshot-store.local"
    snapshot-store.local.dir = "target/snapshots"
  }
}

Calculator.scala

package clustersharding.entity

import akka.actor._
import akka.cluster._
import akka.persistence._
import scala.concurrent.duration._
import akka.cluster.sharding._

object Calculator {
  sealed trait Command
  case class Num(d: Double) extends Command
  case class Add(d: Double) extends Command
  case class Sub(d: Double) extends Command
  case class Mul(d: Double) extends Command
  case class Div(d: Double) extends Command
  case object ShowResult extends Command



  sealed trait Event
  case class SetNum(d: Double) extends Event
  case class Added(x: Double, y: Double) extends Event
  case class Subtracted(x: Double, y: Double) extends Event
  case class Multiplied(x: Double, y: Double) extends Event
  case class Divided(x: Double, y: Double) extends Event

  case class State(result: Double) {

    def updateState(evt: Event): State = evt match {
      case SetNum(n) => copy(result = n)
      case Added(x, y) => copy(result = x + y)
      case Subtracted(x, y) => copy(result = x - y)
      case Multiplied(x, y) => copy(result = x * y)
      case Divided(x, y) => copy(result = {
        val _ = x.toInt / y.toInt // integer division throws ArithmeticException when y is 0; Double division alone would yield Infinity
        x / y
      })
    }
  }

  case object Disconnect extends Command    //exit cluster

  def props = Props(new Calculator)

}

class Calculator extends PersistentActor with ActorLogging {
  import Calculator._
  val cluster = Cluster(context.system)

  var state: State = State(0)

  override def persistenceId: String = self.path.parent.name+"-"+self.path.name

  override def receiveRecover: Receive = {
    case evt: Event => state = state.updateState(evt)
    case SnapshotOffer(_,st: State) => state = state.copy(result =  st.result)
  }

  override def receiveCommand: Receive = {
    case Num(n) => persist(SetNum(n))(evt => state = state.updateState(evt))
    case Add(n) => persist(Added(state.result,n))(evt => state = state.updateState(evt))
    case Sub(n) => persist(Subtracted(state.result,n))(evt => state = state.updateState(evt))
    case Mul(n) => persist(Multiplied(state.result,n))(evt => state = state.updateState(evt))
    case Div(n) => persist(Divided(state.result,n))(evt => state = state.updateState(evt))
    case ShowResult => log.info(s"Result on ${cluster.selfAddress.hostPort} is: ${state.result}")
    case Disconnect =>
      log.info(s"${cluster.selfAddress} is leaving cluster!!!")
      cluster.leave (cluster.selfAddress)

  }

  override def preRestart(reason: Throwable, message: Option[Any]): Unit = {
    log.info(s"Restarting calculator: ${reason.getMessage}")
    super.preRestart(reason, message)
  }
}

class CalcSupervisor extends Actor {
  def decider: PartialFunction[Throwable,SupervisorStrategy.Directive] = {
    case _: ArithmeticException => SupervisorStrategy.Resume
  }

  override def supervisorStrategy: SupervisorStrategy =
    OneForOneStrategy(maxNrOfRetries = 5, withinTimeRange = 5 seconds){
      decider.orElse(SupervisorStrategy.defaultDecider)
    }
  val calcActor = context.actorOf(Calculator.props,"calculator")

  override def receive: Receive = {
    case msg => calcActor.forward(msg)
  }

}

object CalculatorShard {
  import Calculator._

  case class CalcCommands(eid: String, msg: Command)  // envelope users send to the ShardRegion
  val shardName = "calcShard"
  val getEntityId: ShardRegion.ExtractEntityId = {
    case CalcCommands(id,msg) => (id,msg)
  }
  val getShardId: ShardRegion.ExtractShardId = {
    case CalcCommands(id,_) => id.head.toString
  }
  def entityProps = Props(new CalcSupervisor)
}

CalcShard.scala

package clustersharding.shard
import akka.persistence.journal.leveldb._
import akka.actor._
import akka.cluster.sharding._
import com.typesafe.config.ConfigFactory
import akka.util.Timeout
import scala.concurrent.duration._
import akka.pattern._
import clustersharding.entity.CalculatorShard

object CalcShards {
  def create(port: Int) = {
    val config = ConfigFactory.parseString(s"akka.remote.netty.tcp.port=${port}")
      .withFallback(ConfigFactory.load("sharding"))
    // Create an Akka system
    val system = ActorSystem("ShardingSystem", config)

    startupSharding(port,system)

  }

  def startupSharedJournal(system: ActorSystem, startStore: Boolean, path: ActorPath): Unit = {
    // Start the shared journal on one node (don't crash this SPOF)
    // This will not be needed with a distributed journal
    if (startStore)
      system.actorOf(Props[SharedLeveldbStore], "store")
    // register the shared journal
    import system.dispatcher
    implicit val timeout = Timeout(15.seconds)
    val f = (system.actorSelection(path) ? Identify(None))
    f.onSuccess {
      case ActorIdentity(_, Some(ref)) =>
        SharedLeveldbJournal.setStore(ref, system)
      case _ =>
        system.log.error("Shared journal not started at {}", path)
        system.terminate()
    }
    f.onFailure {
      case _ =>
        system.log.error("Lookup of shared journal at {} timed out", path)
        system.terminate()
    }
  }

  def startupSharding(port: Int, system: ActorSystem) = {

    startupSharedJournal(system, startStore = (port == 2551), path =
      ActorPath.fromString("akka.tcp://ShardingSystem@127.0.0.1:2551/user/store"))

    ClusterSharding(system).start(
      typeName = CalculatorShard.shardName,
      entityProps = CalculatorShard.entityProps,
      settings = ClusterShardingSettings(system),
      extractEntityId = CalculatorShard.getEntityId,
      extractShardId = CalculatorShard.getShardId
    )

  }

}

ClusterShardingDemo.scala

package clustersharding.demo
import akka.actor.ActorSystem
import akka.cluster.sharding._
import clustersharding.entity.CalculatorShard.CalcCommands
import clustersharding.entity._
import clustersharding.shard.CalcShards
import com.typesafe.config.ConfigFactory

object ClusterShardingDemo extends App {

  CalcShards.create(2551)
  CalcShards.create(0)
  CalcShards.create(0)
  CalcShards.create(0)

  Thread.sleep(1000)

  val shardingSystem = ActorSystem("ShardingSystem",ConfigFactory.load("sharding"))
  CalcShards.startupSharding(0,shardingSystem)

  Thread.sleep(1000)

  val calcRegion = ClusterSharding(shardingSystem).shardRegion(CalculatorShard.shardName)

  calcRegion ! CalcCommands("1001",Calculator.Num(13.0))   //shard 1, entity 1001
  calcRegion ! CalcCommands("1001",Calculator.Add(12.0))
  calcRegion ! CalcCommands("1001",Calculator.ShowResult)  //shows address too
  calcRegion ! CalcCommands("1001",Calculator.Disconnect)   //disengage cluster

  calcRegion ! CalcCommands("2003",Calculator.Num(10.0))   //shard 2, entity 2003
  calcRegion ! CalcCommands("2003",Calculator.Mul(3.0))
  calcRegion ! CalcCommands("2003",Calculator.Div(2.0))


  Thread.sleep(15000)
  calcRegion ! CalcCommands("1001",Calculator.ShowResult)   //check if restore result on another node
  calcRegion ! CalcCommands("2003",Calculator.ShowResult)


}
