From: https://blog.csdn.net/liyanan21/article/details/89320872
Contents
Part of the Raft Source Code in Nacos
NamingProxy.getServers() gets the cluster nodes
NamingProxy.refreshSrvIfNeed() gets node information
NamingProxy.refreshServerListFromDisk() gets cluster node information
GlobalExecutor.register(new MasterElection()) registers the election timer task
MasterElection.sendVote() sends vote requests
(1) RaftController.vote() handles /v1/ns/raft/vote requests
(2) PeerSet.decideLeader() elects the leader
GlobalExecutor.register(new HeartBeat()) registers the heartbeat timer task
HeartBeat.sendBeat() sends heartbeat packets
(1) RaftController.beat() handles /v1/ns/raft/beat requests
Instance information persistence
(3) The /raft/datum and /raft/datum/commit interfaces
Publishing entry point: RaftCommands.publish()
6. Raft guarantees content consistency
I. Raft algorithm
Raft reaches consensus through an elected leader. Each server in a Raft cluster is either a leader or a follower, and during an election (when the leader is unavailable) it can also be a candidate. The leader is responsible for replicating the log to the followers, and it regularly tells them it is alive by sending heartbeat messages. Each follower has a timeout (typically between 150 and 300 milliseconds) within which it expects the leader's heartbeat; the timeout is reset whenever a heartbeat arrives. If no heartbeat is received in time, the follower changes its state to candidate and starts a leader election.
See: Raft algorithm
Part of the Raft Source Code in Nacos
At startup, the Nacos server calls RaftCore.init() from RunningConfig.onApplicationEvent().
init()
public static void init() throws Exception {
Loggers.RAFT.info("initializing Raft sub-system");
// Start Notifier, poll Datums, and notify RaftListener
executor.submit(notifier);
// Get the Raft cluster node and update it to PeerSet
peers.add(NamingProxy.getServers());
long start = System.currentTimeMillis();
// Data recovery by loading Datum and term data from disk
RaftStore.load();
Loggers.RAFT.info("cache loaded, peer count: {}, datum count: {}, current term: {}",
peers.size(), datums.size(), peers.getTerm());
while (true) {
if (notifier.tasks.size() <= 0) {
break;
}
Thread.sleep(1000L);
System.out.println(notifier.tasks.size());
}
Loggers.RAFT.info("finish to load data from disk, cost: {} ms.", (System.currentTimeMillis() - start));
GlobalExecutor.register(new MasterElection()); // Leader election
GlobalExecutor.register1(new HeartBeat()); // Raft heartbeat
GlobalExecutor.register(new AddressServerUpdater(), GlobalExecutor.ADDRESS_SERVER_UPDATE_INTERVAL_MS);
if (peers.size() > 0) {
if (lock.tryLock(INIT_LOCK_TIME_SECONDS, TimeUnit.SECONDS)) {
initialized = true;
lock.unlock();
}
} else {
throw new Exception("peers is empty.");
}
Loggers.RAFT.info("timer started: leader timeout ms: {}, heart-beat timeout ms: {}",
GlobalExecutor.LEADER_TIMEOUT_MS, GlobalExecutor.HEARTBEAT_INTERVAL_MS);
}
The init() method mainly does the following:
- 1. Get the Raft cluster nodes: peers.add(NamingProxy.getServers());
- 2. Recover the Raft cluster data: RaftStore.load();
- 3. Raft leader election: GlobalExecutor.register(new MasterElection());
- 4. Raft heartbeat: GlobalExecutor.register(new HeartBeat());
- 5. Raft publishes content
- 6. Raft guarantees content consistency
1. Get the Raft cluster nodes
NamingProxy.getServers() gets the cluster nodes
- Calls NamingProxy.refreshSrvIfNeed() to get the node information
- Returns List<String> servers
NamingProxy.refreshSrvIfNeed() gets node information
- In stand-alone mode, the local host's ip:port is the only Raft node information.
Otherwise, it calls NamingProxy.refreshServerListFromDisk() (below) to get the Raft cluster node information.
- After obtaining the Raft cluster node information (i.e. the ip:port list), it updates NamingProxy's List<String> serverlistFromConfig and List<String> servers attributes.
NamingProxy.refreshServerListFromDisk() gets cluster node information
Reads the Raft cluster node information, i.e. the ip:port list, from disk or from system environment variables.
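To make the node-list logic concrete, here is a minimal Java sketch under stated assumptions: the hypothetical class ServerListSketch receives the configuration content and the local ip:port as plain strings, assumes one ip:port per line with '#' comments, and returns only the local address in stand-alone mode. It does not model NamingProxy's real file path, the environment-variable source, or its refresh scheduling.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating refreshSrvIfNeed()/refreshServerListFromDisk(); not Nacos code.
final class ServerListSketch {
    private ServerListSketch() {
    }

    // Parses one "ip:port" entry per line; blank lines and '#' comments are skipped (assumed format).
    static List<String> parseServerList(String content) {
        List<String> servers = new ArrayList<>();
        for (String raw : content.split("\\r?\\n")) {
            String line = raw;
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash);
            }
            line = line.trim();
            if (!line.isEmpty()) {
                servers.add(line);
            }
        }
        return servers;
    }

    // Stand-alone mode: the local ip:port is the only Raft node; otherwise use the configured list.
    static List<String> resolveServers(boolean standalone, String localIpPort, String configContent) {
        if (standalone) {
            List<String> single = new ArrayList<>();
            single.add(localIpPort);
            return single;
        }
        return parseServerList(configContent);
    }
}
```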
2. Raft cluster data recovery
When Nacos starts or restarts, it loads the Datum and term data from disk to recover its state.
Call chain: Nacos server startup -> RaftCore.init() -> RaftStore.load().
RaftStore.load()
- Get the Datum data from disk:
Put each Datum into RaftCore's ConcurrentMap<String, Datum> datums collection, keyed by the Datum's key;
Wrap the Datum and ApplyAction.CHANGE into a Pair and put it into the Notifier's tasks queue so that the relevant RaftListener is notified;
- Get the term data from disk:
Call RaftPeerSet.setTerm(long term) to update the term of every node in the Raft cluster.
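Term recovery boils down to writing one number to a properties file and reading it back on startup. The sketch below is hypothetical: TermStoreSketch, the file name meta.properties, and the property key "term" are assumptions for illustration, not taken from RaftStore.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical sketch: persist and restore the Raft term in a properties file.
// The "term" key and the file location are assumptions, not Nacos's actual layout.
final class TermStoreSketch {
    private static final String TERM_KEY = "term";
    private final Path metaFile;

    TermStoreSketch(Path metaFile) {
        this.metaFile = metaFile;
    }

    // Overwrites the file with the current term.
    void updateTerm(long term) throws IOException {
        Properties props = new Properties();
        props.setProperty(TERM_KEY, Long.toString(term));
        try (Writer w = Files.newBufferedWriter(metaFile, StandardCharsets.UTF_8)) {
            props.store(w, "raft meta");
        }
    }

    // Returns the persisted term, or 0 when nothing has been stored yet (first start).
    long loadTerm() throws IOException {
        if (!Files.exists(metaFile)) {
            return 0L;
        }
        Properties props = new Properties();
        try (Reader r = Files.newBufferedReader(metaFile, StandardCharsets.UTF_8)) {
            props.load(r);
        }
        return Long.parseLong(props.getProperty(TERM_KEY, "0"));
    }
}
```

On startup the loaded value would then be handed to something like setTerm() so every peer in the local view starts from the recovered term.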
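3. Raft elections
GlobalExecutor.register(new MasterElection()) registers the election timer task
Nacos Raft elections are driven by the MasterElection task. On every tick it subtracts TICK_PERIOD_MS from the local node's leaderDueMs and returns while the result is still positive. Once the election timeout has expired, it:
- Resets the candidate node's election timeout and heartbeat timeout;
- Calls MasterElection.sendVote() to request votes.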
public class MasterElection implements Runnable {
@Override
public void run() {
try {
if (!peers.isReady()) {
return;
}
RaftPeer local = peers.local();
local.leaderDueMs -= GlobalExecutor.TICK_PERIOD_MS;
if (local.leaderDueMs > 0) {
return;
}
// Reset the election timeout; it is also reset whenever a heartbeat or data packet is received
local.resetLeaderDue();
local.resetHeartbeatDue();
// Start an election
sendVote();
} catch (Exception e) {
Loggers.RAFT.warn("[RAFT] error while master election {}", e);
}
}
}
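The tick-based countdown above can be modeled in a few lines of plain Java. This is a minimal, hypothetical sketch (ElectionTimerSketch and its fields are illustrative, not Nacos code): each tick subtracts TICK_PERIOD_MS from leaderDueMs and starts an election only when the budget reaches zero, and resetLeaderDue() adds random jitter so that peers rarely time out at the same moment. The 500 ms tick matches the value quoted in the HeartBeat comment later in this article; the 15 s base timeout and 5 s jitter are assumed values.

```java
import java.util.Random;

// Hypothetical model of MasterElection's countdown; constants are assumed, not read from Nacos.
final class ElectionTimerSketch {
    static final long TICK_PERIOD_MS = 500L;
    static final long LEADER_TIMEOUT_MS = 15_000L;
    static final long RANDOM_MS = 5_000L;

    private final Random random;
    long leaderDueMs;
    int electionsStarted;

    ElectionTimerSketch(Random random) {
        this.random = random;
        resetLeaderDue();
    }

    // Base timeout plus jitter in [0, RANDOM_MS) so peers do not all time out together.
    void resetLeaderDue() {
        leaderDueMs = LEADER_TIMEOUT_MS + (long) (random.nextDouble() * RANDOM_MS);
    }

    // Called once per tick; returns true when this tick starts an election.
    boolean tick() {
        leaderDueMs -= TICK_PERIOD_MS;
        if (leaderDueMs > 0) {
            return false;
        }
        resetLeaderDue();
        electionsStarted++; // stands in for sendVote()
        return true;
    }
}
```

In a server the tick would be driven by a timer, for example ScheduledExecutorService.scheduleAtFixedRate(timer::tick, 0, 500, TimeUnit.MILLISECONDS), and every received heartbeat would call resetLeaderDue(), so a follower with a healthy leader never reaches zero.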
MasterElection.sendVote() sends vote requests
- Reset the Raft cluster data:
The leader is set to null, and the voteFor field of every Raft node is cleared;
- Update the candidate node's data:
Its term is incremented by 1. (receivedVote() only grants a vote to a candidate whose term is strictly greater than the voter's own, so without the increment a candidate whose term equals the voters' terms would be rejected by all of them.)
The candidate's voteFor field is set to itself;
Its state is set to CANDIDATE;
- The candidate sends an HTTP POST request to /v1/ns/raft/vote on every Raft node except itself:
The request parameter is vote: JSON.toJSONString(local)
- The candidate passes each RaftPeer returned by the other nodes to PeerSet.decideLeader().
The RaftPeer that at least a majority of the voteFor fields point to is set as the Leader.
public void sendVote() {
RaftPeer local = peers.get(NetUtils.localServer());
Loggers.RAFT.info("leader timeout, start voting,leader: {}, term: {}",
JSON.toJSONString(getLeader()), local.term);
//Reset Raft Cluster Data
peers.reset();
//Update candidate node data
local.term.incrementAndGet();
local.voteFor = local.ip;
local.state = RaftPeer.State.CANDIDATE;
// The candidate sends an HTTP POST request to /v1/ns/raft/vote on every Raft node except itself
// The request parameter is vote: JSON.toJSONString(local)
Map<String, String> params = new HashMap<String, String>(1);
params.put("vote", JSON.toJSONString(local));
for (final String server : peers.allServersWithoutMySelf()) {
final String url = buildURL(server, API_VOTE);
try {
HttpClient.asyncHttpPost(url, null, params, new AsyncCompletionHandler<Integer>() {
@Override
public Integer onCompleted(Response response) throws Exception {
if (response.getStatusCode() != HttpURLConnection.HTTP_OK) {
Loggers.RAFT.error("NACOS-RAFT vote failed: {}, url: {}", response.getResponseBody(), url);
return 1;
}
RaftPeer peer = JSON.parseObject(response.getResponseBody(), RaftPeer.class);
Loggers.RAFT.info("received approve from peer: {}", JSON.toJSONString(peer));
// Hand the RaftPeer returned by the other node to PeerSet.decideLeader()
peers.decideLeader(peer);
return 0;
}
});
} catch (Exception e) {
Loggers.RAFT.warn("error while sending vote to server: {}", server);
}
}
}
}
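The fan-out in sendVote() can be sketched without an HTTP client by passing the transport in as a function. In this hypothetical sketch, VoteFanOutSketch, its Peer type and the transport parameter are illustrative stand-ins for RaftPeer and HttpClient.asyncHttpPost; the clearing of the other peers' voteFor fields by peers.reset() is not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical sketch of sendVote(): become a candidate, then ask every other peer for a vote.
final class VoteFanOutSketch {
    enum State { FOLLOWER, CANDIDATE, LEADER }

    static final class Peer {
        final String ip;
        long term;
        String voteFor;
        State state = State.FOLLOWER;

        Peer(String ip, long term) {
            this.ip = ip;
            this.term = term;
        }
    }

    // transport: hypothetical stand-in for the async POST to /v1/ns/raft/vote.
    // onReply: receives each voter's returned Peer (decideLeader in Nacos).
    static CompletableFuture<Void> sendVote(Peer local, List<String> others,
            Function<String, CompletableFuture<Peer>> transport, Consumer<Peer> onReply) {
        local.term++;               // outrank voters that are still on the old term
        local.voteFor = local.ip;   // vote for itself
        local.state = State.CANDIDATE;
        List<CompletableFuture<Void>> calls = new ArrayList<>();
        for (String server : others) {
            calls.add(transport.apply(server)
                    .thenAccept(onReply)
                    .exceptionally(ex -> null)); // a failed peer is skipped, as in the original
        }
        return CompletableFuture.allOf(calls.toArray(new CompletableFuture<?>[0]));
    }
}
```

With a real asynchronous client the callback may run on several threads, so whatever it updates needs to be safe for concurrent use.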
(1) RaftController.vote() handles /v1/ns/raft/vote requests
The HTTP interface for election requests:
@RestController
@RequestMapping(UtilsAndCommons.NACOS_NAMING_CONTEXT + "/raft")
public class RaftController {
......
@NeedAuth
@RequestMapping(value = "/vote", method = RequestMethod.POST)
public JSONObject vote(HttpServletRequest request, HttpServletResponse response) throws Exception {
// Processing Election Requests
RaftPeer peer = raftCore.receivedVote(
JSON.parseObject(WebUtils.required(request, "vote"), RaftPeer.class));
return JSON.parseObject(JSON.toJSONString(peer));
}
......
}
The controller calls RaftCore.receivedVote():
- If the received candidate's term is less than or equal to the local node's term:
If the local node has not voted yet, it sets its voteFor to itself (meaning "I am better suited to be the leader, so I vote for myself");
- Otherwise:
This node becomes a Follower and resets its election timeout;
It sets its voteFor to the candidate's ip (meaning "agreed, my vote goes to you");
It updates its term to the candidate's term;
- In both cases, the local node is returned as the HTTP response.
@Component
public class RaftCore {
......
public RaftPeer receivedVote(RaftPeer remote) {
if (!peers.contains(remote)) {
throw new IllegalStateException("can not find peer: " + remote.ip);
}
// If the term of the current node is greater than or equal to the term of the node sending the election request, choose yourself as leader.
RaftPeer local = peers.get(NetUtils.localServer());
if (remote.term.get() <= local.term.get()) {
String msg = "received illegitimate vote" +
", voter-term:" + remote.term + ", votee-term:" + local.term;
Loggers.RAFT.info(msg);
if (StringUtils.isEmpty(local.voteFor)) {
local.voteFor = local.ip;
}
return local;
}
local.resetLeaderDue();
// If the term of the current node is less than the term of the node sending the request, the node sending the request is chosen as leader.
local.state = RaftPeer.State.FOLLOWER;
local.voteFor = remote.ip;
local.term.set(remote.term.get());
Loggers.RAFT.info("vote {} as leader, term: {}", remote.ip, remote.term);
return local;
}
}
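The voting rule can be isolated into a small pure function. This hypothetical sketch (VoteRuleSketch and its Peer type are illustrative; resetting the election timeout is left out) follows the two branches of receivedVote() shown above.

```java
// Hypothetical sketch of RaftCore.receivedVote(): grant the vote only to a strictly higher term.
final class VoteRuleSketch {
    enum State { FOLLOWER, CANDIDATE, LEADER }

    static final class Peer {
        final String ip;
        long term;
        String voteFor;
        State state;

        Peer(String ip, long term, State state) {
            this.ip = ip;
            this.term = term;
            this.state = state;
        }
    }

    // Mutates and returns the local peer, which is what the voter sends back as its reply.
    static Peer receivedVote(Peer local, Peer remote) {
        if (remote.term <= local.term) {
            // Candidate is not newer: keep the current vote, or vote for itself if none yet
            if (local.voteFor == null || local.voteFor.isEmpty()) {
                local.voteFor = local.ip;
            }
            return local;
        }
        // Candidate is newer: follow it, vote for it and adopt its term
        local.state = State.FOLLOWER;
        local.voteFor = remote.ip;
        local.term = remote.term;
        return local;
    }
}
```

Because the voter adopts the candidate's term when it grants a vote, a second candidate with the same term falls into the first branch and is refused, so each node effectively votes once per term. Unlike RequestVote in the Raft paper, this code does not compare log positions.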
(2) PeerSet.decideLeader() elects the leader
@Component
@DependsOn("serverListManager")
public class RaftPeerSet implements ServerChangeListener {
......
public RaftPeer decideLeader(RaftPeer candidate) {
peers.put(candidate.ip, candidate);
SortedBag ips = new TreeBag();
int maxApproveCount = 0;
String maxApprovePeer = null;
// For each node with a non-empty voteFor, add that voteFor to ips and track the most-voted node and its vote count
for (RaftPeer peer : peers.values()) {
if (StringUtils.isEmpty(peer.voteFor)) {
continue;
}
ips.add(peer.voteFor);
if (ips.getCount(peer.voteFor) > maxApproveCount) {
maxApproveCount = ips.getCount(peer.voteFor);
maxApprovePeer = peer.voteFor;
}
}
// Set the elected node to leader
if (maxApproveCount >= majorityCount()) {
RaftPeer peer = peers.get(maxApprovePeer);
peer.state = RaftPeer.State.LEADER;
if (!Objects.equals(leader, peer)) {
leader = peer;
Loggers.RAFT.info("{} has become the LEADER", leader.ip);
}
}
return leader;
}
}
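The tally in decideLeader() can be expressed with a plain HashMap instead of the Commons Collections TreeBag. In this hypothetical sketch, LeaderTallySketch receives the voteFor values of all known peers; majorityCount() is assumed to be clusterSize / 2 + 1, since its body is not shown in this article.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of decideLeader(): count voteFor values and require a majority.
final class LeaderTallySketch {
    private LeaderTallySketch() {
    }

    // Assumed definition: more than half of the cluster.
    static int majorityCount(int clusterSize) {
        return clusterSize / 2 + 1;
    }

    // voteFors: the voteFor field of every known peer (null or empty means "no vote yet").
    // Returns the winning ip, or null when nobody has a majority.
    static String decideLeader(Collection<String> voteFors, int clusterSize) {
        Map<String, Integer> counts = new HashMap<>();
        int maxCount = 0;
        String maxPeer = null;
        for (String voteFor : voteFors) {
            if (voteFor == null || voteFor.isEmpty()) {
                continue;
            }
            int count = counts.merge(voteFor, 1, Integer::sum);
            if (count > maxCount) {
                maxCount = count;
                maxPeer = voteFor;
            }
        }
        return maxCount >= majorityCount(clusterSize) ? maxPeer : null;
    }
}
```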
4. Raft heartbeat
GlobalExecutor.register(new HeartBeat()) registers the heartbeat timer task
- On every tick, it counts down the local node's heartbeat timeout and, once it expires, resets it;
- It calls sendBeat() to send heartbeat packets (only the Leader actually sends them).
public class HeartBeat implements Runnable {
@Override
public void run() {
try {
if (!peers.isReady()) {
return;
}
RaftPeer local = peers.local();
// heartbeatDueMs defaults to 5s and TICK_PERIOD_MS to 500ms: the task checks every 500ms and sends a heartbeat every 5s
local.heartbeatDueMs -= GlobalExecutor.TICK_PERIOD_MS;
if (local.heartbeatDueMs > 0) {
return;
}
// Reset heartbeatDueMs
local.resetHeartbeatDue();
// Send the heartbeat packet
sendBeat();
} catch (Exception e) {
Loggers.RAFT.warn("[RAFT] error while sending beat {}", e);
}
}
}
HeartBeat.sendBeat() sends heartbeat packets
- Only the Leader sends heartbeats; before sending, it resets its own election timeout;
- It sends an HTTP POST request to the /v1/ns/raft/beat path of every node except itself. The request is built as follows:
JSONObject packet = new JSONObject();
packet.put("peer", local); //local is the RaftPeer object corresponding to the Leader node
packet.put("datums", array); //array holds the key and timestamp of every Datum in RaftCore
Map<String, String> params = new HashMap<String, String>(1);
params.put("beat", JSON.toJSONString(packet));
- It reads the HTTP response returned by each node (a RaftPeer object) and uses it to update PeerSet's Map<String, RaftPeer> peers, keeping the cluster node data consistent.
public void sendBeat() throws IOException, InterruptedException {
RaftPeer local = peers.local();
// Only leader sends heartbeat
if (local.state != RaftPeer.State.LEADER && !STANDALONE_MODE) {
return;
}
Loggers.RAFT.info("[RAFT] send beat with {} keys.", datums.size());
// Reset the leader's own election timeout
local.resetLeaderDue();
// Build the heartbeat packet; local is the current Nacos node's information, stored under the key "peer"
JSONObject packet = new JSONObject();
packet.put("peer", local);
JSONArray array = new JSONArray();
// When the switch is on, heartbeat packets are sent without any data keys
if (switchDomain.isSendBeatOnly()) {
Loggers.RAFT.info("[SEND-BEAT-ONLY] {}", String.valueOf(switchDomain.isSendBeatOnly()));
}
// Send the data keys to the followers in the heartbeat packet
if (!switchDomain.isSendBeatOnly()) {
for (Datum datum : datums.values()) {
JSONObject element = new JSONObject();
// Put each key and its version (timestamp) into element, then add element to array
if (KeyBuilder.matchServiceMetaKey(datum.key)) {
element.put("key", KeyBuilder.briefServiceMetaKey(datum.key));
} else if (KeyBuilder.matchInstanceListKey(datum.key)) {
element.put("key", KeyBuilder.briefInstanceListkey(datum.key));
}
element.put("timestamp", datum.timestamp);
array.add(element);
}
} else {
Loggers.RAFT.info("[RAFT] send beat only.");
}
// Put the array of all keys into the packet
packet.put("datums", array);
// Convert the packet to a JSON string and put it into params
Map<String, String> params = new HashMap<String, String>(1);
params.put("beat", JSON.toJSONString(packet));
String content = JSON.toJSONString(params);
// Compress with gzip
ByteArrayOutputStream out = new ByteArrayOutputStream();
GZIPOutputStream gzip = new GZIPOutputStream(out);
gzip.write(content.getBytes("UTF-8"));
gzip.close();
byte[] compressedBytes = out.toByteArray();
String compressedContent = new String(compressedBytes, "UTF-8");
Loggers.RAFT.info("raw beat data size: {}, size of compressed data: {}",
content.length(), compressedContent.length());
// Send the heartbeat packet to all followers
for (final String server : peers.allServersWithoutMySelf()) {
try {
final String url = buildURL(server, API_BEAT);
Loggers.RAFT.info("send beat to server " + server);
HttpClient.asyncHttpPostLarge(url, null, compressedBytes, new AsyncCompletionHandler<Integer>() {
@Override
public Integer onCompleted(Response response) throws Exception {
if (response.getStatusCode() != HttpURLConnection.HTTP_OK) {
Loggers.RAFT.error("NACOS-RAFT beat failed: {}, peer: {}",
response.getResponseBody(), server);
MetricsMonitor.getLeaderSendBeatFailedException().increment();
return 1;
}
peers.update(JSON.parseObject(response.getResponseBody(), RaftPeer.class));
Loggers.RAFT.info("receive beat response from: {}", url);
return 0;
}
@Override
public void onThrowable(Throwable t) {
Loggers.RAFT.error("NACOS-RAFT error while sending heart-beat to peer: {} {}", server, t);
MetricsMonitor.getLeaderSendBeatFailedException().increment();
}
});
} catch (Exception e) {
Loggers.RAFT.error("error while sending heart-beat to peer: {} {}", server, e);
MetricsMonitor.getLeaderSendBeatFailedException().increment();
}
}
}
(1) RaftController.beat() handles /v1/ns/raft/beat requests
The HTTP interface that receives heartbeat packets:
@RestController
@RequestMapping(UtilsAndCommons.NACOS_NAMING_CONTEXT + "/raft")
public class RaftController {
......
@NeedAuth
@RequestMapping(value = "/beat", method = RequestMethod.POST)
public JSONObject beat(HttpServletRequest request, HttpServletResponse response) throws Exception {
String entity = new String(IoUtils.tryDecompress(request.getInputStream()), "UTF-8");
String value = URLDecoder.decode(entity, "UTF-8");
value = URLDecoder.decode(value, "UTF-8");
// Parse the heartbeat packet
JSONObject json = JSON.parseObject(value);
JSONObject beat = JSON.parseObject(json.getString("beat"));
// Processing heartbeat packets and returning information from this node as response
RaftPeer peer = raftCore.receivedBeat(beat);
return JSON.parseObject(JSON.toJSONString(peer));
}
......
}
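The leader gzips the beat and the follower runs IoUtils.tryDecompress() before URL-decoding it. A minimal round-trip sketch follows; BeatCodecSketch is hypothetical, and detecting gzip by its two magic bytes (0x1f, 0x8b) is an assumption about how a "try" decompress could work, not a copy of IoUtils. The sketch measures sizes in bytes: sendBeat() logs the compressed size as new String(compressedBytes, "UTF-8").length(), which counts characters after decoding binary data as UTF-8 and therefore does not equal the number of compressed bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch of the beat transport encoding: gzip on send, optional inflate on receive.
final class BeatCodecSketch {
    private BeatCodecSketch() {
    }

    static byte[] gzip(String content) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(content.getBytes(StandardCharsets.UTF_8));
        } // closing the stream writes the gzip trailer
        return out.toByteArray();
    }

    // Inflates gzip input; any other input is returned unchanged.
    static byte[] tryDecompress(byte[] raw) throws IOException {
        boolean isGzip = raw.length >= 2
                && (raw[0] & 0xff) == 0x1f && (raw[1] & 0xff) == 0x8b;
        if (!isGzip) {
            return raw;
        }
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(raw));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```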
RaftCore.receivedBeat() handles heartbeat packets
- If the sender is not in the LEADER state, or the local term is greater than the sender's term, the packet is rejected with an exception.
- If the node receiving the heartbeat is not a Follower, it switches to the Follower role and sets its voteFor to the Leader's ip.
- It resets the local node's heartbeat timeout and election timeout.
- It calls PeerSet.makeLeader() so that this node updates its Leader (in other words, the Leader uses heartbeats to tell the other nodes who the Leader is).
- It checks the Datums:
It iterates over the datums in the request and collects each datumKey that the Follower does not have, or whose local timestamp is older.
Every time 50 datum keys have been collected (or the last entry has been processed), it sends a request to the Leader's /v1/ns/raft/get path with those keys as the parameter and receives the latest Datum objects for them.
For each returned Datum object, it then does roughly what RaftCore.onPublish() does:
1. Call RaftStore.write() to serialize the Datum to JSON and write it to its cache file;
2. Store the Datum in RaftCore's datums collection under the Datum's key;
3. Reset the local node's election timeout;
4. Update the local node's term;
5. Persist the local node's term to the properties file;
6. Call notifier.addTask(datum.key, ApplyAction.CHANGE) to notify the corresponding RaftListener.
Keys that exist locally but are absent from the heartbeat are removed with RaftCore.deleteDatum(String key), which:
Deletes the Datum for that key from the datums collection;
Calls RaftStore.delete() to delete the Datum file on disk;
Calls notifier.addTask(deleted, Notifier.ApplyAction.DELETE) to notify the corresponding RaftListener of a DELETE event.
- The local node's RaftPeer is returned as the HTTP response.
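The key reconciliation in receivedBeat() can be sketched as a pure planning step that decides what to fetch and what to delete, leaving out the HTTP call, locking and term updates. In this hypothetical sketch, BeatSyncPlanSketch takes two maps of key to timestamp (local and leader, the leader map in beat order) and uses the batch size of 50 from the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: plan which keys to fetch from the leader (in batches) and which to delete.
final class BeatSyncPlanSketch {
    static final int BATCH_SIZE = 50;

    final List<List<String>> fetchBatches = new ArrayList<>();
    final List<String> deadKeys = new ArrayList<>();

    static BeatSyncPlanSketch plan(Map<String, Long> local, Map<String, Long> beat) {
        BeatSyncPlanSketch plan = new BeatSyncPlanSketch();
        List<String> batch = new ArrayList<>();
        for (Map.Entry<String, Long> e : beat.entrySet()) {
            Long mine = local.get(e.getKey());
            // Fetch when the key is missing locally or the local version is older
            if (mine == null || mine < e.getValue()) {
                batch.add(e.getKey());
            }
            if (batch.size() >= BATCH_SIZE) {
                plan.fetchBatches.add(batch);
                batch = new ArrayList<>();
            }
        }
        if (!batch.isEmpty()) {
            plan.fetchBatches.add(batch); // flush the remainder after the last entry
        }
        // Keys the leader no longer advertises are treated as deleted
        for (String key : local.keySet()) {
            if (!beat.containsKey(key)) {
                plan.deadKeys.add(key);
            }
        }
        return plan;
    }
}
```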
@Component
public class RaftCore {
......
public RaftPeer receivedBeat(JSONObject beat) throws Exception {
final RaftPeer local = peers.local();
// Parse the information of the node that sent the heartbeat packet
final RaftPeer remote = new RaftPeer();
remote.ip = beat.getJSONObject("peer").getString("ip");
remote.state = RaftPeer.State.valueOf(beat.getJSONObject("peer").getString("state"));
remote.term.set(beat.getJSONObject("peer").getLongValue("term"));
remote.heartbeatDueMs = beat.getJSONObject("peer").getLongValue("heartbeatDueMs");
remote.leaderDueMs = beat.getJSONObject("peer").getLongValue("leaderDueMs");
remote.voteFor = beat.getJSONObject("peer").getString("voteFor");
// If the heartbeat packet received is not sent by the leader node, an exception is thrown
if (remote.state != RaftPeer.State.LEADER) {
Loggers.RAFT.info("[RAFT] invalid state from master, state: {}, remote peer: {}",
remote.state, JSON.toJSONString(remote));
throw new IllegalArgumentException("invalid state from master, state: " + remote.state);
}
// If the local term is larger than the term of the heartbeat packet, the heartbeat packet is not processed
if (local.term.get() > remote.term.get()) {
Loggers.RAFT.info("[RAFT] out of date beat, beat-from-term: {}, beat-to-term: {}, remote peer: {}, and leaderDueMs: {}"
, remote.term.get(), local.term.get(), JSON.toJSONString(remote), local.leaderDueMs);
throw new IllegalArgumentException("out of date beat, beat-from-term: " + remote.term.get()
+ ", beat-to-term: " + local.term.get());
}
// If the current node is not a follower node, it is updated to a follower node
if (local.state != RaftPeer.State.FOLLOWER) {
Loggers.RAFT.info("[RAFT] make remote as leader, remote peer: {}", JSON.toJSONString(remote));
// mk follower
local.state = RaftPeer.State.FOLLOWER;
local.voteFor = remote.ip;
}
final JSONArray beatDatums = beat.getJSONArray("datums");
// Update the heartbeat packet sending interval and the election interval when the heartbeat packet is not received
local.resetLeaderDue();
local.resetHeartbeatDue();
// Update the leader information, set remote to the new leader, update the node information of the original leader
peers.makeLeader(remote);
// Store all local keys in a map, each with the value 0
Map<String, Integer> receivedKeysMap = new HashMap<String, Integer>(datums.size());
for (Map.Entry<String, Datum> entry : datums.entrySet()) {
receivedKeysMap.put(entry.getKey(), 0);
}
// Check the received datum list
List<String> batch = new ArrayList<String>();
if (!switchDomain.isSendBeatOnly()) {
int processedCount = 0;
Loggers.RAFT.info("[RAFT] received beat with {} keys, RaftCore.datums' size is {}, remote server: {}, term: {}, local term: {}",
beatDatums.size(), datums.size(), remote.ip, remote.term, local.term);
for (Object object : beatDatums) {
processedCount = processedCount + 1;
JSONObject entry = (JSONObject) object;
String key = entry.getString("key");
final String datumKey;
// Rebuild the full datumKey (the prefix was stripped when the key was sent)
if (KeyBuilder.matchServiceMetaKey(key)) {
datumKey = KeyBuilder.detailServiceMetaKey(key);
} else if (KeyBuilder.matchInstanceListKey(key)) {
datumKey = KeyBuilder.detailInstanceListkey(key);
} else {
// ignore corrupted key:
continue;
}
// Get the version (timestamp) of the received key
long timestamp = entry.getLong("timestamp");
// Mark the received key with 1 in the local key map
receivedKeysMap.put(datumKey, 1);
try {
// If the key exists locally with a version no older than the received one, and entries remain to be processed, skip it
if (datums.containsKey(datumKey) && datums.get(datumKey).timestamp.get() >= timestamp && processedCount < beatDatums.size()) {
continue;
}
// If the key does not exist locally, or the local version is older, add it to batch so its data can be fetched
if (!(datums.containsKey(datumKey) && datums.get(datumKey).timestamp.get() >= timestamp)) {
batch.add(datumKey);
}
// Fetch data only when the batch reaches 50 keys or the last entry has been processed
if (batch.size() < 50 && processedCount < beatDatums.size()) {
continue;
}
String keys = StringUtils.join(batch, ",");
if (batch.size() <= 0) {
continue;
}
Loggers.RAFT.info("get datums from leader: {}, batch size is {}, processedCount is {}, datums' size is {}, RaftCore.datums' size is {}"
, getLeader().ip, batch.size(), processedCount, beatDatums.size(), datums.size());
// Fetch the latest data for the batched keys from the leader
String url = buildURL(remote.ip, API_GET) + "?keys=" + URLEncoder.encode(keys, "UTF-8");
HttpClient.asyncHttpGet(url, null, null, new AsyncCompletionHandler<Integer>() {
@Override
public Integer onCompleted(Response response) throws Exception {
if (response.getStatusCode() != HttpURLConnection.HTTP_OK) {
return 1;
}
List<Datum> datumList = JSON.parseObject(response.getResponseBody(), new TypeReference<List<Datum>>() {
});
// Update local data
for (Datum datum : datumList) {
OPERATE_LOCK.lock();
try {
Datum oldDatum = getDatum(datum.key);
if (oldDatum != null && datum.timestamp.get() <= oldDatum.timestamp.get()) {
Loggers.RAFT.info("[NACOS-RAFT] timestamp is smaller than that of mine, key: {}, remote: {}, local: {}",
datum.key, datum.timestamp, oldDatum.timestamp);
continue;
}
raftStore.write(datum);
if (KeyBuilder.matchServiceMetaKey(datum.key)) {
Datum<Service> serviceDatum = new Datum<>();
serviceDatum.key = datum.key;
serviceDatum.timestamp.set(datum.timestamp.get());
serviceDatum.value = JSON.parseObject(JSON.toJSONString(datum.value), Service.class);
datum = serviceDatum;
}
if (KeyBuilder.matchInstanceListKey(datum.key)) {
Datum<Instances> instancesDatum = new Datum<>();
instancesDatum.key = datum.key;
instancesDatum.timestamp.set(datum.timestamp.get());
instancesDatum.value = JSON.parseObject(JSON.toJSONString(datum.value), Instances.class);
datum = instancesDatum;
}
datums.put(datum.key, datum);
notifier.addTask(datum.key, ApplyAction.CHANGE);
local.resetLeaderDue();
if (local.term.get() + 100 > remote.term.get()) {
getLeader().term.set(remote.term.get());
local.term.set(getLeader().term.get());
} else {
local.term.addAndGet(100);
}
raftStore.updateTerm(local.term.get());
Loggers.RAFT.info("data updated, key: {}, timestamp: {}, from {}, local term: {}",
datum.key, datum.timestamp, JSON.toJSONString(remote), local.term);
} catch (Throwable e) {
Loggers.RAFT.error("[RAFT-BEAT] failed to sync datum from leader, key: {} {}", datum.key, e);
} finally {
OPERATE_LOCK.unlock();
}
}
TimeUnit.MILLISECONDS.sleep(200);
return 0;
}
});
batch.clear();
} catch (Exception e) {
Loggers.RAFT.error("[NACOS-RAFT] failed to handle beat entry, key: {}", datumKey);
}
}
// If a key exists locally but is absent from the received key list, the leader has deleted it, so delete it locally too
List<String> deadKeys = new ArrayList<String>();
for (Map.Entry<String, Integer> entry : receivedKeysMap.entrySet()) {
if (entry.getValue() == 0) {
deadKeys.add(entry.getKey());
}
}
for (String deadKey : deadKeys) {
try {
deleteDatum(deadKey);
} catch (Exception e) {
Loggers.RAFT.error("[NACOS-RAFT] failed to remove entry, key={} {}", deadKey, e);
}
}
}
return local;
}
}
5. Raft publishes content
Registration entry point
The HTTP interface for registration:
@RestController
@RequestMapping(UtilsAndCommons.NACOS_NAMING_CONTEXT + "/instance")
public class InstanceController {
......
@CanDistro
@RequestMapping(value = "", method = RequestMethod.POST)
public String register(HttpServletRequest request) throws Exception {
// Get namespace and serviceName
String serviceName = WebUtils.required(request, CommonParams.SERVICE_NAME);
String namespaceId = WebUtils.optional(request, CommonParams.NAMESPACE_ID, Constants.DEFAULT_NAMESPACE_ID);
// Execute registration logic
serviceManager.registerInstance(namespaceId, serviceName, parseInstance(request));
return "ok";
}
}
Instance registration
@Component
@DependsOn("nacosApplicationContext")
public class ServiceManager implements RecordListener<Service> {
......
private Map<String, Map<String, Service>> serviceMap = new ConcurrentHashMap<>();
......
// Register new instances
public void registerInstance(String namespaceId, String serviceName, Instance instance) throws NacosException {
// Create an empty service. All services are stored in serviceMap, of type Map<String, Map<String, Service>>: the first-level key is the namespace and the second-level key is the serviceName;
// each service maintains a clusterMap, and each cluster stores its instances in two sets.
if (ServerMode.AP.name().equals(switchDomain.getServerMode())) {
createEmptyService(namespaceId, serviceName);
}
Service service = getService(namespaceId, serviceName);
if (service == null) {
throw new NacosException(NacosException.INVALID_PARAM,
"service not found, namespace: " + namespaceId + ", service: " + serviceName);
}
// Check whether the instance already exists (compared by ip)
if (service.allIPs().contains(instance)) {
throw new NacosException(NacosException.INVALID_PARAM, "instance already exist: " + instance);
}
// Add a new instance
addInstance(namespaceId, serviceName, instance.isEphemeral(), instance);
}
// Create an empty service
public void createEmptyService(String namespaceId, String serviceName) throws NacosException {
Service service = getService(namespaceId, serviceName);
if (service == null) {
service = new Service();
service.setName(serviceName);
service.setNamespaceId(namespaceId);
service.setGroupName(Constants.DEFAULT_GROUP);
// now validate the service. if failed, exception will be thrown
service.setLastModifiedMillis(System.currentTimeMillis());
service.recalculateChecksum();
service.validate();
putService(service);
service.init();
// Listen on the service's ephemeral and persistent instance-list keys to synchronize data
consistencyService.listen(KeyBuilder.buildInstanceListKey(service.getNamespaceId(), service.getName(), true), service);
consistencyService.listen(KeyBuilder.buildInstanceListKey(service.getNamespaceId(), service.getName(), false), service);
}
}
// Add instance to the cache and persist
public void addInstance(String namespaceId, String serviceName, boolean ephemeral, Instance... ips) throws NacosException {
String key = KeyBuilder.buildInstanceListKey(namespaceId, serviceName, ephemeral);
Service service = getService(namespaceId, serviceName);
// Add instance to local cache
List<Instance> instanceList = addIpAddresses(service, ephemeral, ips);
Instances instances = new Instances();
instances.setInstanceList(instanceList);
// Persist the instance information
consistencyService.put(key, instances);
}
// Add instances to the cache
public List<Instance> addIpAddresses(Service service, boolean ephemeral, Instance... ips) throws NacosException {
return updateIpAddresses(service, UtilsAndCommons.UPDATE_INSTANCE_ACTION_ADD, ephemeral, ips);
}
// The actual logic that adds instances to the cache
public List<Instance> updateIpAddresses(Service service, String action, boolean ephemeral, Instance... ips) throws NacosException {
Datum datum = consistencyService.get(KeyBuilder.buildInstanceListKey(service.getNamespaceId(), service.getName(), ephemeral));
Map<String, Instance> oldInstanceMap = new HashMap<>(16);
List<Instance> currentIPs = service.allIPs(ephemeral);
Map<String, Instance> map = new ConcurrentHashMap<>(currentIPs.size());
for (Instance instance : currentIPs) {
map.put(instance.toIPAddr(), instance);
}
if (datum != null) {
oldInstanceMap = setValid(((Instances) datum.value).getInstanceList(), map);
}
// use HashMap for deep copy:
HashMap<String, Instance> instanceMap = new HashMap<>(oldInstanceMap.size());
instanceMap.putAll(oldInstanceMap);
for (Instance instance : ips) {
if (!service.getClusterMap().containsKey(instance.getClusterName())) {
Cluster cluster = new Cluster(instance.getClusterName());
cluster.setService(service);
service.getClusterMap().put(instance.getClusterName(), cluster);
Loggers.SRV_LOG.warn("cluster: {} not found, ip: {}, will create new cluster with default configuration.",
instance.getClusterName(), instance.toJSON());
}
if (UtilsAndCommons.UPDATE_INSTANCE_ACTION_REMOVE.equals(action)) {
instanceMap.remove(instance.getDatumKey());
} else {
instanceMap.put(instance.getDatumKey(), instance);
}
}
if (instanceMap.size() <= 0 && UtilsAndCommons.UPDATE_INSTANCE_ACTION_ADD.equals(action)) {
throw new IllegalArgumentException("ip list can not be empty, service: " + service.getName() + ", ip list: "
+ JSON.toJSONString(instanceMap.values()));
}
return new ArrayList<>(instanceMap.values());
}
// Copy the health status and last beat time of the in-memory instances onto the persisted instance list
private Map<String, Instance> setValid(List<Instance> oldInstances, Map<String, Instance> map) {
Map<String, Instance> instanceMap = new HashMap<>(oldInstances.size());
for (Instance instance : oldInstances) {
Instance instance1 = map.get(instance.toIPAddr());
if (instance1 != null) {
instance.setHealthy(instance1.isHealthy());
instance.setLastBeat(instance1.getLastBeat());
}
instanceMap.put(instance.getDatumKey(), instance);
}
return instanceMap;
}
......
}
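The core of updateIpAddresses() and setValid() is a keyed merge: start from the persisted list, carry over live health data, then add or remove the incoming instances by key. In this hypothetical sketch, InstanceMergeSketch and its Instance type are illustrative, the "ip:port" key stands in for getDatumKey()/toIPAddr(), and a boolean flag replaces the UPDATE_INSTANCE_ACTION_* strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the instance-list merge; not the real ServiceManager code.
final class InstanceMergeSketch {
    static final class Instance {
        final String ip;
        final int port;
        boolean healthy = true;

        Instance(String ip, int port) {
            this.ip = ip;
            this.port = port;
        }

        // Hypothetical key standing in for getDatumKey()/toIPAddr()
        String key() {
            return ip + ":" + port;
        }
    }

    static List<Instance> update(List<Instance> persisted, Map<String, Instance> live,
                                 boolean remove, Instance... incoming) {
        Map<String, Instance> merged = new LinkedHashMap<>();
        for (Instance old : persisted) {
            Instance current = live.get(old.key());
            if (current != null) {
                old.healthy = current.healthy; // like setValid(): keep the live health state
            }
            merged.put(old.key(), old);
        }
        for (Instance ins : incoming) {
            if (remove) {
                merged.remove(ins.key());
            } else {
                merged.put(ins.key(), ins);
            }
        }
        // Mirrors the original: an empty list is rejected only for the add action
        if (merged.isEmpty() && !remove) {
            throw new IllegalArgumentException("ip list can not be empty");
        }
        return new ArrayList<>(merged.values());
    }
}
```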
Instance information persistence
Instance information is persisted by RaftConsistencyServiceImpl.put(), i.e. the consistencyService.put(key, instances) step mentioned above.
(1) RaftConsistencyServiceImpl.put()
@Service
public class RaftConsistencyServiceImpl implements PersistentConsistencyService {
......
@Override
public void put(String key, Record value) throws NacosException {
try {
raftCore.signalPublish(key, value);
} catch (Exception e) {
Loggers.RAFT.error("Raft put failed.", e);
throw new NacosException(NacosException.SERVER_ERROR, "Raft put failed, key:" + key + ", value:" + value);
}
}
}
RaftConsistencyServiceImpl.put() ultimately delegates to RaftCore.signalPublish():
(2) RaftCore.signalPublish()
@Component
public class RaftCore {
......
public void signalPublish(String key, Record value) throws Exception {
// If this node is not the leader, forward the request directly to the leader
if (!isLeader()) {
JSONObject params = new JSONObject();
params.put("key", key);
params.put("value", value);
Map<String, String> parameters = new HashMap<>(1);
parameters.put("key", key);
// Call the leader's /raft/datum interface
raftProxy.proxyPostLarge(getLeader().ip, API_PUB, params.toJSONString(), parameters);
return;
}
// If this node is the leader, replicate the datum to all followers
try {
OPERATE_LOCK.lock();
long start = System.currentTimeMillis();
final Datum datum = new Datum();
datum.key = key;
datum.value = value;
if (getDatum(key) == null) {
datum.timestamp.set(1L);
} else {
datum.timestamp.set(getDatum(key).timestamp.incrementAndGet());
}
JSONObject json = new JSONObject();
json.put("datum", datum);
json.put("source", peers.local());
// The local onPublish method is used to handle persistence logic
onPublish(datum, peers.local());
final String content = JSON.toJSONString(json);
final CountDownLatch latch = new CountDownLatch(peers.majorityCount());
// Send the datum to every follower through the /raft/datum/commit interface
for (final String server : peers.allServersIncludeMyself()) {
if (isLeader(server)) {
latch.countDown();
continue;
}
final String url = buildURL(server, API_ON_PUB);
HttpClient.asyncHttpPostLarge(url, Arrays.asList("key=" + key), content, new AsyncCompletionHandler<Integer>() {
@Override
public Integer onCompleted(Response response) throws Exception {
if (response.getStatusCode() != HttpURLConnection.HTTP_OK) {
Loggers.RAFT.warn("[RAFT] failed to publish data to peer, datumId={}, peer={}, http code={}",
datum.key, server, response.getStatusCode());
return 1;
}
latch.countDown();
return 0;
}
@Override
public STATE onContentWriteCompleted() {
return STATE.CONTINUE;
}
});
}
if (!latch.await(UtilsAndCommons.RAFT_PUBLISH_TIMEOUT, TimeUnit.MILLISECONDS)) {
// only majority servers return success can we consider this update success
Loggers.RAFT.info("data publish failed, caused failed to notify majority, key={}", key);
throw new IllegalStateException("data publish failed, caused failed to notify majority, key=" + key);
}
long end = System.currentTimeMillis();
Loggers.RAFT.info("signalPublish cost {} ms, key: {}", (end - start), key);
} finally {
OPERATE_LOCK.unlock();
}
}
}
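The leader's wait for a majority in `signalPublish()` can be sketched with the same `CountDownLatch` idea. This is a simplified model, not Nacos code: `followerAck` is a hypothetical stand-in for the asynchronous POST to `/raft/datum/commit`, and the majority is assumed to be `n / 2 + 1`, with the leader counted among the `n` nodes.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

class QuorumPublishSketch {
    // Assumption: majority = n / 2 + 1, where n includes the leader.
    static int majorityCount(int clusterSize) {
        return clusterSize / 2 + 1;
    }

    // Returns true if a majority (leader included) acknowledged within timeoutMs.
    // followerAck is hypothetical: it models "the follower answered HTTP 200".
    static boolean publish(List<String> followers, Predicate<String> followerAck, long timeoutMs)
            throws InterruptedException {
        int clusterSize = followers.size() + 1; // followers plus the leader
        CountDownLatch latch = new CountDownLatch(majorityCount(clusterSize));
        latch.countDown(); // the leader already applied the datum locally, so it counts as one ack
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, followers.size()));
        try {
            for (String follower : followers) {
                pool.submit(() -> {
                    if (followerAck.test(follower)) {
                        latch.countDown(); // only successful replies count toward the quorum
                    }
                });
            }
            return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

In a five-node cluster the latch starts at 3: the leader's own count plus two successful followers releases it, while a single follower acknowledgement makes `await()` time out, which corresponds to `signalPublish()` throwing `IllegalStateException` when the majority is not reached.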
(3) The /raft/datum and /raft/datum/commit interfaces
@RestController
@RequestMapping(UtilsAndCommons.NACOS_NAMING_CONTEXT + "/raft")
public class RaftController {
......
@NeedAuth
@RequestMapping(value = "/datum", method = RequestMethod.POST)
public String publish(HttpServletRequest request, HttpServletResponse response) throws Exception {
response.setHeader("Content-Type", "application/json; charset=" + getAcceptEncoding(request));
response.setHeader("Cache-Control", "no-cache");
response.setHeader("Content-Encode", "gzip");
String entity = IOUtils.toString(request.getInputStream(), "UTF-8");
String value = URLDecoder.decode(entity, "UTF-8");
JSONObject json = JSON.parseObject(value);
// RaftConsistencyServiceImpl.put() is called here as well, so the request re-enters the service-registration path and ends up in signalPublish(), this time on the leader.
String key = json.getString("key");
if (KeyBuilder.matchInstanceListKey(key)) {
raftConsistencyService.put(key, JSON.parseObject(json.getString("value"), Instances.class));
return "ok";
}
if (KeyBuilder.matchSwitchKey(key)) {
raftConsistencyService.put(key, JSON.parseObject(json.getString("value"), SwitchDomain.class));
return "ok";
}
if (KeyBuilder.matchServiceMetaKey(key)) {
raftConsistencyService.put(key, JSON.parseObject(json.getString("value"), Service.class));
return "ok";
}
throw new NacosException(NacosException.INVALID_PARAM, "unknown type publish key: " + key);
}
@NeedAuth
@RequestMapping(value = "/datum/commit", method = RequestMethod.POST)
public String onPublish(HttpServletRequest request, HttpServletResponse response) throws Exception {
response.setHeader("Content-Type", "application/json; charset=" + getAcceptEncoding(request));
response.setHeader("Cache-Control", "no-cache");
response.setHeader("Content-Encode", "gzip");
String entity = IOUtils.toString(request.getInputStream(), "UTF-8");
String value = URLDecoder.decode(entity, "UTF-8");
JSONObject jsonObject = JSON.parseObject(value);
String key = "key";
RaftPeer source = JSON.parseObject(jsonObject.getString("source"), RaftPeer.class);
JSONObject datumJson = jsonObject.getJSONObject("datum");
Datum datum = null;
if (KeyBuilder.matchInstanceListKey(datumJson.getString(key))) {
datum = JSON.parseObject(jsonObject.getString("datum"), new TypeReference<Datum<Instances>>() {});
} else if (KeyBuilder.matchSwitchKey(datumJson.getString(key))) {
datum = JSON.parseObject(jsonObject.getString("datum"), new TypeReference<Datum<SwitchDomain>>() {});
} else if (KeyBuilder.matchServiceMetaKey(datumJson.getString(key))) {
datum = JSON.parseObject(jsonObject.getString("datum"), new TypeReference<Datum<Service>>() {});
}
// This eventually reaches RaftCore.onPublish()
raftConsistencyService.onPut(datum, source);
return "ok";
}
......
}
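`publish()` and `onPublish()` both route a datum by its key type, and both test the switch key before the generic service-meta key. A minimal routing sketch follows. The key prefixes are hypothetical placeholders, and the assumption that the switch key shares the service-meta prefix is inferred from the Notifier's `matchServiceMetaKey(datumKey) && !matchSwitchKey(datumKey)` check rather than taken from `KeyBuilder`.

```java
class DatumRouter {
    enum RecordType { INSTANCES, SWITCH_DOMAIN, SERVICE_META }

    // Hypothetical key formats, loosely modeled on KeyBuilder; the real constants differ.
    static final String INSTANCE_LIST_PREFIX = "naming.iplist.";
    static final String SERVICE_META_PREFIX = "naming.domains.meta.";
    // Assumption: the switch key reuses the service-meta prefix.
    static final String SWITCH_KEY = SERVICE_META_PREFIX + "switch";

    // Decide which type the JSON value should be decoded into, in the same order as the controller.
    static RecordType classify(String key) {
        if (key.startsWith(INSTANCE_LIST_PREFIX)) {
            return RecordType.INSTANCES;
        }
        // Check the switch key before the generic service-meta prefix, or it would be misrouted.
        if (key.equals(SWITCH_KEY)) {
            return RecordType.SWITCH_DOMAIN;
        }
        if (key.startsWith(SERVICE_META_PREFIX)) {
            return RecordType.SERVICE_META;
        }
        // Mirrors the "unknown type publish key" rejection in publish().
        throw new IllegalArgumentException("unknown type publish key: " + key);
    }
}
```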
Publishing an entry: RaftCore.onPublish()
@Component
public class RaftCore {
......
public void onPublish(Datum datum, RaftPeer source) throws Exception {
RaftPeer local = peers.local();
if (datum.value == null) {
Loggers.RAFT.warn("received empty datum");
throw new IllegalStateException("received empty datum");
}
// Reject data that was not published by the leader
if (!peers.isLeader(source.ip)) {
Loggers.RAFT.warn("peer {} tried to publish data but wasn't leader, leader: {}",
JSON.toJSONString(source), JSON.toJSONString(getLeader()));
throw new IllegalStateException("peer(" + source.ip + ") tried to publish " +
"data but wasn't leader");
}
// Reject the publish if the source's term is older than the local term
if (source.term.get() < local.term.get()) {
Loggers.RAFT.warn("out of date publish, pub-term: {}, cur-term: {}",
JSON.toJSONString(source), JSON.toJSONString(local));
throw new IllegalStateException("out of date publish, pub-term:"
+ source.term.get() + ", cur-term: " + local.term.get());
}
// Reset the election timeout
local.resetLeaderDue();
// Persist the datum to disk
// if data should be persistent, usually this is always true:
if (KeyBuilder.matchPersistentKey(datum.key)) {
raftStore.write(datum);
}
// Add to the in-memory cache
datums.put(datum.key, datum);
// Update term information
if (isLeader()) {
local.term.addAndGet(PUBLISH_TERM_INCREASE_COUNT);
} else {
if (local.term.get() + PUBLISH_TERM_INCREASE_COUNT > source.term.get()) {
//set leader term:
getLeader().term.set(source.term.get());
local.term.set(getLeader().term.get());
} else {
local.term.addAndGet(PUBLISH_TERM_INCREASE_COUNT);
}
}
raftStore.updateTerm(local.term.get());
// Queue a CHANGE notification so registered listeners learn that the datum changed
notifier.addTask(datum.key, ApplyAction.CHANGE);
Loggers.RAFT.info("data added/updated, key={}, term={}", datum.key, local.term);
}
}
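The validation and term bookkeeping in `onPublish()` can be isolated into one pure function. This is a sketch under assumptions: the leader check and the terms are passed in as plain values, and `PUBLISH_TERM_INCREASE_COUNT` is assumed to be 100 here (check the constant in your Nacos version).

```java
class OnPublishSketch {
    // Assumed value; the real constant lives in RaftCore.
    static final long PUBLISH_TERM_INCREASE_COUNT = 100L;

    // Returns the new local term after accepting a publish, or throws as onPublish() does.
    static long acceptPublish(boolean sourceIsLeader, long sourceTerm, long localTerm, boolean localIsLeader) {
        if (!sourceIsLeader) {
            throw new IllegalStateException("publisher is not the leader");
        }
        if (sourceTerm < localTerm) {
            throw new IllegalStateException("out of date publish, pub-term: " + sourceTerm + ", cur-term: " + localTerm);
        }
        if (localIsLeader) {
            // The leader bumps its own term on every publish.
            return localTerm + PUBLISH_TERM_INCREASE_COUNT;
        }
        if (localTerm + PUBLISH_TERM_INCREASE_COUNT > sourceTerm) {
            // Follower adopts the leader's term.
            return sourceTerm;
        }
        // Follower is far behind: advance by one increment.
        return localTerm + PUBLISH_TERM_INCREASE_COUNT;
    }
}
```

Note that `signalPublish()` puts `peers.local()` into the JSON object but serializes it only after the leader's own `onPublish()` call, so followers receive the leader's already-incremented term as `source.term`.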
6. Raft guarantees content consistency
Nacos publishes content through Raft: writes are accepted only on the leader node, and the Raft heartbeat mechanism keeps the other nodes consistent with it.
During registration, addInstance() adds the instance to the local cache. When Raft replicates data from the leader to a follower, however, the follower receives the request and persists it through onPublish() without writing it into its local cache there; the cache is updated by a listener instead.
The last line of onPublish() is notifier.addTask(datum.key, ApplyAction.CHANGE), which adds the change to the notifier's task queue. The notification tasks are processed as follows:
@Component
public class RaftCore {
......
public class Notifier implements Runnable {
private ConcurrentHashMap<String, String> services = new ConcurrentHashMap<>(10 * 1024);
private BlockingQueue<Pair> tasks = new LinkedBlockingQueue<Pair>(1024 * 1024);
// Add change tasks to task queue
public void addTask(String datumKey, ApplyAction action) {
if (services.containsKey(datumKey) && action == ApplyAction.CHANGE) {
return;
}
if (action == ApplyAction.CHANGE) {
services.put(datumKey, StringUtils.EMPTY);
}
tasks.add(Pair.with(datumKey, action));
}
public int getTaskSize() {
return tasks.size();
}
// Task-processing loop, run on the notifier thread
@Override
public void run() {
Loggers.RAFT.info("raft notifier started");
while (true) {
try {
Pair pair = tasks.take();
if (pair == null) {
continue;
}
String datumKey = (String) pair.getValue0();
ApplyAction action = (ApplyAction) pair.getValue1();
// Clear the pending mark so a later CHANGE for this key can be queued again
services.remove(datumKey);
int count = 0;
if (listeners.containsKey(KeyBuilder.SERVICE_META_KEY_PREFIX)) {
if (KeyBuilder.matchServiceMetaKey(datumKey) && !KeyBuilder.matchSwitchKey(datumKey)) {
for (RecordListener listener : listeners.get(KeyBuilder.SERVICE_META_KEY_PREFIX)) {
try {
// Depending on the type of change, different callback methods are invoked to update the cache
if (action == ApplyAction.CHANGE) {
listener.onChange(datumKey, getDatum(datumKey).value);
}
if (action == ApplyAction.DELETE) {
listener.onDelete(datumKey);
}
} catch (Throwable e) {
Loggers.RAFT.error("[NACOS-RAFT] error while notifying listener of key: {} {}", datumKey, e);
}
}
}
}
if (!listeners.containsKey(datumKey)) {
continue;
}
for (RecordListener listener : listeners.get(datumKey)) {
count++;
try {
if (action == ApplyAction.CHANGE) {
listener.onChange(datumKey, getDatum(datumKey).value);
continue;
}
if (action == ApplyAction.DELETE) {
listener.onDelete(datumKey);
continue;
}
} catch (Throwable e) {
Loggers.RAFT.error("[NACOS-RAFT] error while notifying listener of key: {} {}", datumKey, e);
}
}
if (Loggers.RAFT.isDebugEnabled()) {
Loggers.RAFT.debug("[NACOS-RAFT] datum change notified, key: {}, listener count: {}", datumKey, count);
}
} catch (Throwable e) {
Loggers.RAFT.error("[NACOS-RAFT] Error while handling notifying task", e);
}
}
}
}
}
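The coalescing queue behind `Notifier` can be sketched on its own. The `Listener` interface and `NotifierSketch` class are hypothetical simplifications of `RecordListener` and `RaftCore.Notifier`; unlike the original `containsKey`-then-`put` sequence, this version uses `putIfAbsent` so the duplicate check is atomic, and it polls instead of blocking on `take()` so it can be driven inline.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

class NotifierSketch {
    enum Action { CHANGE, DELETE }

    // Hypothetical minimal listener, modeled on RecordListener's onChange/onDelete callbacks.
    interface Listener {
        void onChange(String key);
        void onDelete(String key);
    }

    private static final class Task {
        final String key;
        final Action action;
        Task(String key, Action action) { this.key = key; this.action = action; }
    }

    // Keys with a CHANGE task already queued (the role of the `services` map in Notifier).
    private final ConcurrentHashMap<String, Boolean> pending = new ConcurrentHashMap<>();
    private final BlockingQueue<Task> tasks = new LinkedBlockingQueue<>();
    private final ConcurrentHashMap<String, List<Listener>> listeners = new ConcurrentHashMap<>();

    void listen(String key, Listener listener) {
        listeners.computeIfAbsent(key, k -> new CopyOnWriteArrayList<>()).add(listener);
    }

    // Like Notifier.addTask(): a second CHANGE for a key that is still queued is dropped.
    void addTask(String key, Action action) {
        if (action == Action.CHANGE && pending.putIfAbsent(key, Boolean.TRUE) != null) {
            return;
        }
        tasks.add(new Task(key, action));
    }

    int queued() {
        return tasks.size();
    }

    // Like the body of Notifier.run(), processing whatever is currently queued.
    void drainAvailable() {
        Task task;
        while ((task = tasks.poll()) != null) {
            pending.remove(task.key); // later CHANGE events for this key may be queued again
            for (Listener listener : listeners.getOrDefault(task.key, Collections.emptyList())) {
                if (task.action == Action.CHANGE) {
                    listener.onChange(task.key);
                } else {
                    listener.onDelete(task.key);
                }
            }
        }
    }
}
```

Dropping repeated CHANGE tasks is safe in the original because the listener reads the current value with `getDatum(datumKey).value` when the task is processed, so one notification is enough to deliver the latest data.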