Simple usage notes on RocksDB's MergeOperator

Posted by vichiq on Wed, 05 Jan 2022 21:37:45 +0100

This article is just a record of how MergeOperator is used.
RocksDB provides MergeOperator to replace the read-then-overwrite pattern in update scenarios. Without it, a user's update operation has to call RocksDB's Get and then Put to complete.
That pattern introduces extra read and write amplification, which is not cost-effective for workloads with frequent updates, such as SQL engines. This is why the Merge operation exists. Users only need to implement their own merge operator and pass it in through the options; when an update arrives, they only need to call Merge once. The pending updates for the key are actually merged later, either in the background during compaction or when the user calls Get or reads through an iterator. Merge has a drawback of its own: if the kTypeMerge entries are not compacted in time, the read load may become heavy, because all the unmerged operands must be merged before a value can be returned.
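
To make the difference concrete, here is a minimal sketch that does not use RocksDB at all. `ToyStore`, `ToyEntry`, and `AddWithGetPut` are hypothetical names for a small in-memory model. It only illustrates the idea that a merge write records an operand without reading first, and that the pending operands are folded together when the key is read.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory model for illustration; this is not RocksDB code.
struct ToyEntry {
  bool has_base = false;           // true once a full value has been written
  uint64_t base = 0;               // last fully written value
  std::vector<uint64_t> operands;  // pending merge operands, oldest first
};

class ToyStore {
 public:
  // Overwrite the key; any pending operands are superseded.
  void Put(const std::string& key, uint64_t value) {
    ToyEntry& e = data_[key];
    e.has_base = true;
    e.base = value;
    e.operands.clear();
  }

  // Blind write: record the operand without reading the current value.
  void Merge(const std::string& key, uint64_t delta) {
    data_[key].operands.push_back(delta);
  }

  // Read: fold the base value and every pending operand into one result.
  // Returns false and leaves *out untouched when the key is absent.
  bool Get(const std::string& key, uint64_t* out) const {
    assert(out != nullptr);
    auto it = data_.find(key);
    if (it == data_.end()) return false;
    uint64_t result = it->second.has_base ? it->second.base : 0;
    for (uint64_t d : it->second.operands) result += d;
    *out = result;
    return true;
  }

  // Number of operands a read of this key would still have to fold.
  std::size_t PendingOperands(const std::string& key) const {
    auto it = data_.find(key);
    return it == data_.end() ? 0 : it->second.operands.size();
  }

 private:
  std::map<std::string, ToyEntry> data_;
};

// The read-modify-write pattern: one Get plus one Put per update.
void AddWithGetPut(ToyStore& store, const std::string& key, uint64_t delta) {
  uint64_t current = 0;
  store.Get(key, &current);  // current stays 0 if the key does not exist
  store.Put(key, current + delta);
}
```

In this model, the more operands pile up under a key, the more work each `Get` has to do, which mirrors the read-side cost described above.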

The MergeOperator virtual base class has quite a few functions, because it distinguishes between full merge and partial merge. For many users, however, the operation is just a counter accumulation or a string append, which is not that complex. RocksDB therefore provides a simpler virtual base class, AssociativeMergeOperator, which hides the full merge and partial merge details. To inherit from this class, you only need to implement the Merge and Name functions.
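
The sketch below illustrates why one user-supplied function can be enough when the operation is associative. It is a self-contained stand-in, not the RocksDB class: `ToyAssociativeOperator`, `ToyAppend`, and their method signatures are hypothetical and simplified (plain `std::string` instead of `Slice`, no key or logger arguments). The base class derives both a full merge and a partial merge from the single `Merge` function.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the idea behind an associative merge operator.
class ToyAssociativeOperator {
 public:
  virtual ~ToyAssociativeOperator() = default;

  // The only function a subclass writes. `existing` is null when there is
  // no earlier value to combine with.
  virtual bool Merge(const std::string* existing, const std::string& operand,
                     std::string* result) const = 0;

  // Full merge: fold an optional base value and all operands, oldest first.
  bool FullMerge(const std::string* base,
                 const std::vector<std::string>& operands,
                 std::string* result) const {
    assert(result != nullptr);
    std::string current;
    bool has_current = false;
    if (base != nullptr) {
      current = *base;
      has_current = true;
    }
    for (const std::string& op : operands) {
      std::string next;
      if (!Merge(has_current ? &current : nullptr, op, &next)) return false;
      current = std::move(next);
      has_current = true;
    }
    if (!has_current) return false;  // nothing to merge
    *result = current;
    return true;
  }

  // Partial merge: combine two operands when no base value is available.
  bool PartialMerge(const std::string& left, const std::string& right,
                    std::string* result) const {
    return Merge(&left, right, result);
  }
};

// Comma-separated append, written as a single associative Merge function.
class ToyAppend : public ToyAssociativeOperator {
 public:
  bool Merge(const std::string* existing, const std::string& operand,
             std::string* result) const override {
    assert(result != nullptr);
    std::string merged;
    if (existing != nullptr) {
      merged = *existing;
      merged.push_back(',');
    }
    merged.append(operand);
    *result = std::move(merged);
    return true;
  }
};
```

Because the append is associative, combining two operands first with `PartialMerge` and then applying the result to a base value gives the same string as folding all operands one by one in `FullMerge`.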

The following code uses two merge operators that ship with RocksDB: StringAppendOperator and UInt64AddOperator. A Merge call itself writes a key/value entry; the entry's type is kTypeMerge, and its value holds the merge operand.

StringAppendOperator

The simple test code for StringAppend is as follows. We perform two Merge operations on the same key, which writes two kTypeMerge entries to the DB, and then call Flush once, which generates an SST file containing the merge entries. A subsequent Get on the key then returns a value that has been merged according to the operator's behavior.

#include <cstdlib>
#include <iostream>
#include <string>
#include <vector>

#include <rocksdb/db.h>
#include <rocksdb/table.h>
#include <rocksdb/options.h>
#include <rocksdb/merge_operator.h>
#include <rocksdb/filter_policy.h>
#include <rocksdb/perf_context.h>
#include <rocksdb/iostats_context.h>
#include <rocksdb/trace_reader_writer.h>
#include "utilities/merge_operators.h"

using namespace rocksdb;

using namespace std;
rocksdb::DB* db;
rocksdb::Options option;

void OpenDB() {
  option.create_if_missing = true;
  option.compression = rocksdb::CompressionType::kNoCompression;

  rocksdb::BlockBasedTableOptions table_options;
  table_options.no_block_cache = true;
  table_options.cache_index_and_filter_blocks = false;
  option.table_factory.reset(NewBlockBasedTableFactory(table_options));
  // By default, the merged operands are separated by commas
  option.merge_operator = MergeOperators::CreateStringAppendOperator();

  auto s = rocksdb::DB::Open(option, "./db", &db);
  if (!s.ok()) {
    cout << "open failed: " << s.ToString() << endl;
    exit(-1);
  }
  cout << "Finish open !"<< endl;
}

void DoWrite() {
  int j = 0;
  string key = std::to_string(j);
  std::string value;

  rocksdb::Status s;
  // Merge two operands into the same key, checking every status
  for (const char* operand : {"2", "3"}) {
    s = db->Merge(rocksdb::WriteOptions(), key, operand);
    if (!s.ok()) {
      cout << "Merge value failed: " << s.ToString() << endl;
      exit(-1);
    }
  }
  // Flush the memtable so the merge entries land in an SST file
  s = db->Flush(rocksdb::FlushOptions());
  if (!s.ok()) {
    cout << "Flush failed: " << s.ToString() << endl;
    exit(-1);
  }
  cout << "Finish merge !" << endl;

  s = db->Get(rocksdb::ReadOptions(), key, &value);
  if (!s.ok()) {
    cout << "Get after merge failed: " << s.ToString() << endl;
    exit(-1);
  }
  cout << "Get merge value len: " << value.size() << " data: " << value << endl;
}

int main() {
  OpenDB();
  DoWrite();
  return 0;
}

The output is as follows:

Finish open !
Finish merge !
Get merge value len: 3 data: 2,3

You can see that the value obtained by Get has been merged.

UInt64AddOperator

This is an accumulating (counter) Merge example.
The key point is that if the MergeOperator encodes and decodes values internally, the caller must write its operands in the same encoding and decode what it reads back in the same way.
The implementation that ships with RocksDB decodes the values it receives and encodes the result:

// A 'model' merge operator with uint64 addition semantics
// Implemented as an AssociativeMergeOperator for simplicity and example.
class UInt64AddOperator : public AssociativeMergeOperator {
 public:
  bool Merge(const Slice& /*key*/, const Slice* existing_value,
             const Slice& value, std::string* new_value,
             Logger* logger) const override {
    uint64_t orig_value = 0;
    if (existing_value){
      // Decode the existing value; callers must therefore pass
      // Fixed64-encoded operands when they call Merge
      orig_value = DecodeInteger(*existing_value, logger);
    }
    uint64_t operand = DecodeInteger(value, logger);

    assert(new_value);
    new_value->clear();
    PutFixed64(new_value, orig_value + operand);

    return true;  // Return true always since corruption will be treated as 0
  }

  const char* Name() const override { return "UInt64AddOperator"; }

 private:
  // Takes the string and decodes it into a uint64_t
  // On error, prints a message and returns 0
  uint64_t DecodeInteger(const Slice& value, Logger* logger) const {
    uint64_t result = 0;

    if (value.size() == sizeof(uint64_t)) {
      result = DecodeFixed64(value.data());
    } else if (logger != nullptr) {
      // If value is corrupted, treat it as 0
      ROCKS_LOG_ERROR(logger, "uint64 value corruption, size: %" ROCKSDB_PRIszt
                              " > %" ROCKSDB_PRIszt,
                      value.size(), sizeof(uint64_t));
    }

    return result;
  }

};
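
As a small illustration of why the operand format matters, the sketch below reproduces the same size check outside RocksDB. `EncodeU64`, `DecodeU64OrZero`, and `AddU64` are hypothetical helper names; they assume an 8-byte little-endian layout and treat any operand of the wrong size as 0, as the operator above does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helpers for illustration; these are not RocksDB functions.

// Encode a value as 8 bytes, least significant byte first.
std::string EncodeU64(uint64_t v) {
  std::string out(sizeof(uint64_t), '\0');
  for (std::size_t i = 0; i < sizeof(uint64_t); ++i) {
    out[i] = static_cast<char>((v >> (8 * i)) & 0xff);
  }
  return out;
}

// Decode 8 little-endian bytes; any other size is treated as 0.
uint64_t DecodeU64OrZero(const std::string& s) {
  if (s.size() != sizeof(uint64_t)) return 0;
  uint64_t v = 0;
  for (std::size_t i = 0; i < sizeof(uint64_t); ++i) {
    v |= static_cast<uint64_t>(static_cast<unsigned char>(s[i])) << (8 * i);
  }
  return v;
}

// Add an operand to an optional existing value and re-encode the sum.
void AddU64(const std::string* existing, const std::string& operand,
            std::string* result) {
  assert(result != nullptr);
  uint64_t base = (existing != nullptr) ? DecodeU64OrZero(*existing) : 0;
  uint64_t total = base + DecodeU64OrZero(operand);
  *result = EncodeU64(total);
}
```

Under these assumptions, applying `AddU64` twice with `EncodeU64(2)` gives a value that decodes to 4, while the one-byte text operand `"2"` fails the size check and contributes 0.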

Test code:

#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <string>
#include <vector>

#include <rocksdb/db.h>
#include <rocksdb/table.h>
#include <rocksdb/options.h>
#include <rocksdb/merge_operator.h>
#include <rocksdb/filter_policy.h>
#include <rocksdb/perf_context.h>
#include <rocksdb/iostats_context.h>
#include <rocksdb/trace_reader_writer.h>
#include "utilities/merge_operators.h"

using namespace rocksdb;

using namespace std;
rocksdb::DB* db;
rocksdb::Options option;

static bool LittleEndian() {
  int i = 1;
  return *((char*)(&i));
}

inline uint32_t DecodeFixed32(const char* ptr) {
  if (LittleEndian()) {
    // Load the raw bytes
    uint32_t result;
    memcpy(&result, ptr, sizeof(result));  // gcc optimizes this to a plain load
    return result;
  } else {
    return ((static_cast<uint32_t>(static_cast<unsigned char>(ptr[0])))
        | (static_cast<uint32_t>(static_cast<unsigned char>(ptr[1])) << 8)
        | (static_cast<uint32_t>(static_cast<unsigned char>(ptr[2])) << 16)
        | (static_cast<uint32_t>(static_cast<unsigned char>(ptr[3])) << 24));
  }
}

inline uint64_t DecodeFixed64(const char* ptr) {
  if (LittleEndian()) {
    // Load the raw bytes
    uint64_t result;
    memcpy(&result, ptr, sizeof(result));  // gcc optimizes this to a plain load
    return result;
  } else {
    uint64_t lo = DecodeFixed32(ptr);
    uint64_t hi = DecodeFixed32(ptr + 4);
    return (hi << 32) | lo;
  }
}

inline void EncodeFixed64(char* buf, uint64_t value) {
  if (LittleEndian()) {
    memcpy(buf, &value, sizeof(value));
  } else {
    buf[0] = value & 0xff;
    buf[1] = (value >> 8) & 0xff;
    buf[2] = (value >> 16) & 0xff;
    buf[3] = (value >> 24) & 0xff;
    buf[4] = (value >> 32) & 0xff;
    buf[5] = (value >> 40) & 0xff;
    buf[6] = (value >> 48) & 0xff;
    buf[7] = (value >> 56) & 0xff;
  }
}

void OpenDB() {
  option.create_if_missing = true;
  option.compression = rocksdb::CompressionType::kNoCompression;

  rocksdb::BlockBasedTableOptions table_options;
  table_options.no_block_cache = true;
  table_options.cache_index_and_filter_blocks = false;
  option.table_factory.reset(NewBlockBasedTableFactory(table_options));
  option.merge_operator = MergeOperators::CreateUInt64AddOperator();

  auto s = rocksdb::DB::Open(option, "./db", &db);
  if (!s.ok()) {
    cout << "open failed: " << s.ToString() << endl;
    exit(-1);
  }
  cout << "Finish open !"<< endl;
}

void DoWrite() {
  int j = 0;
  string key = std::to_string(j);
  std::string value;

  char buf[8];
  rocksdb::Status s;
  // UInt64AddOperator decodes its operands as Fixed64, so the operand
  // must be written as 8 Fixed64-encoded bytes
  EncodeFixed64(buf, 2);
  // Merging 2 twice into the same key makes the final Get return 4
  for (int i = 0; i < 2; i++) {
    s = db->Merge(rocksdb::WriteOptions(), key, std::string(buf, 8));
    if (!s.ok()) {
      cout << "Merge value failed: " << s.ToString() << endl;
      exit(-1);
    }
  }
  s = db->Flush(rocksdb::FlushOptions());
  if (!s.ok()) {
    cout << "Flush failed: " << s.ToString() << endl;
    exit(-1);
  }
  cout << "Finish merge !" << endl;

  s = db->Get(rocksdb::ReadOptions(), key, &value);
  if (!s.ok() || value.size() != sizeof(uint64_t)) {
    cout << "Get after merge failed: " << s.ToString() << endl;
    exit(-1);
  }
  cout << "Get merge value " << value.size() << " " << DecodeFixed64(value.data()) << endl;
}

int main() {
  OpenDB();
  DoWrite();

  return 0;
}

The output is as follows:

Finish open !
Finish merge !
Get merge value 8 4