MIT 6.824 Lab 1: MapReduce implementation

Posted by Fergal Andrews on Sat, 05 Mar 2022 10:52:47 +0100

Paper address (via the course schedule): http://nil.csail.mit.edu/6.824/2021/schedule.html

MapReduce principle

  1. The MapReduce library splits the input into pieces of 16-64 MB each, then starts copies of the user program on a group of machines.
  2. One copy becomes the master and the rest become workers. The master has M map tasks and R reduce tasks to assign, and it gives a map or reduce task to each idle worker it selects.
  3. A map worker reads its input split, executes the Map function on it, and buffers the resulting intermediate <K, V> pairs in memory.
  4. The buffered pairs are periodically written to local disk, partitioned into R parts (one per reducer). The locations of these R files are sent to the master, which forwards them to the reduce workers.
  5. A reduce worker receives the locations of the intermediate files and reads them through RPC. After reading, it sorts the intermediate <K, V> pairs by key so that all values for the same key are grouped together.
  6. The reduce worker iterates over the sorted data and passes each key with its values to the Reduce function. The final result is written to the output file (partition) for that reduce task.
  7. After all map and reduce tasks are completed, the master wakes up the user program.
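
The steps above can be condensed into a single-process sketch that maps, sorts, groups, and reduces in memory, with no master, workers, or files. The names mapWords, reduceCount, and runSequential are illustrative and do not come from the lab code; the example assumes a word-count job.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
	"unicode"
)

// KeyValue is the intermediate pair emitted by a map function.
type KeyValue struct {
	Key   string
	Value string
}

// mapWords emits ("word", "1") for every word in contents.
func mapWords(contents string) []KeyValue {
	words := strings.FieldsFunc(contents, func(r rune) bool { return !unicode.IsLetter(r) })
	kvs := make([]KeyValue, 0, len(words))
	for _, w := range words {
		kvs = append(kvs, KeyValue{Key: w, Value: "1"})
	}
	return kvs
}

// reduceCount returns the number of values collected for one key.
func reduceCount(values []string) string {
	return strconv.Itoa(len(values))
}

// runSequential performs map, sort, group, and reduce in one process.
func runSequential(inputs []string) map[string]string {
	var intermediate []KeyValue
	for _, in := range inputs {
		intermediate = append(intermediate, mapWords(in)...)
	}
	// Sorting by key places all pairs with the same key next to each other.
	sort.Slice(intermediate, func(i, j int) bool { return intermediate[i].Key < intermediate[j].Key })

	out := make(map[string]string)
	for i := 0; i < len(intermediate); {
		j := i + 1
		for j < len(intermediate) && intermediate[j].Key == intermediate[i].Key {
			j++
		}
		values := make([]string, 0, j-i)
		for k := i; k < j; k++ {
			values = append(values, intermediate[k].Value)
		}
		out[intermediate[i].Key] = reduceCount(values)
		i = j
	}
	return out
}

func main() {
	result := runSequential([]string{"a b a", "b c"})
	fmt.Println(result["a"], result["b"], result["c"]) // prints: 2 2 1
}
```

The distributed version keeps the same map, sort, group, and reduce stages, but spreads them across workers and exchanges the intermediate pairs through files.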

MapReduce implementation process

Master

The paper states that each task (map or reduce) is in one of three states: idle, in-progress, or completed.

// MasterTaskStatus is the execution stage of a task. Following the paper, a task is idle, in progress, or completed
type MasterTaskStatus int

const (
	Idle MasterTaskStatus = iota
	InProgress
	Completed
)

The Master keeps a bookkeeping record for each Task.

// MasterTask is the Master's record of one Task: its execution status, its start time, and a pointer to the Task itself
type MasterTask struct {
	TaskStatus    MasterTaskStatus // Execution status of the task
	StartTime     time.Time        // When the task was handed to a worker
	TaskReference *Task            // The task this record describes
}

The Master also stores the locations of the R intermediate files generated by each Map task.

// Master node object
type Master struct {
	TaskQueue     chan *Task          // Queue of pending tasks, implemented with a buffered channel
	TaskMeta      map[int]*MasterTask // Bookkeeping record for every task, keyed by task ID
	MasterPhase   State               // Current phase of the Master
	NReduce       int                 // Number of reduce tasks (R)
	InputFiles    []string            // Input file names
	Intermediates [][]string          // R rows, one per reduce task; each row collects that reducer's intermediate file paths, one from each Map task
}

Map and Reduce tasks share a single Task structure, which carries the fields needed by both phases.

// Task object
type Task struct {
	Input         string   // Name of the input file this task processes
	TaskState     State    // Phase of the task
	NReducer      int      // Number of reduce tasks (R)
	TaskNumber    int      // Task ID
	Intermediates []string // Intermediate file paths: the R files written by a Map task, or the files a Reduce task reads
	Output        string   // Output file name
}

The phase of a task and the phase of the Master are represented by a single State type.

// State is the phase of the Master or of a Task
type State int

const (
	Map State = iota // Enumeration starts at 0
	Reduce
	Exit
	Wait
)

Implementing the Map and Reduce phases

1. Start the master

// create a Master.
// main/mrmaster.go calls this function.
// nReduce is the number of reduce tasks to use.
// The Master registers itself as an RPC service and schedules tasks for the workers.
func MakeMaster(files []string, nReduce int) *Master {
	// Create the Master node
	m := Master{
		// Task queue: a buffered channel gives first-in, first-out order and is large enough for every task of either phase.
		// max is a small helper function (or the built-in available from Go 1.21)
		TaskQueue: make(chan *Task, max(nReduce, len(files))),
		// Look up a task's bookkeeping record by task ID
		TaskMeta: make(map[int]*MasterTask),
		// The Master starts in the Map phase
		MasterPhase: Map,
		NReduce:     nReduce,
		InputFiles:  files,
		// One row per reduce task; each row collects the intermediate file paths produced for it during the Map phase
		Intermediates: make([][]string, nReduce),
	}
	// TODO: split the input files into 16-64 MB pieces

	// Create the Map tasks
	m.createMapTask()
	// Start the RPC server so that workers can call the Master's methods
	m.server()
	// Crash handling: start a goroutine that periodically checks for timed-out tasks
	go m.catchTimeOut()
	return &m
}

Create Map task

// Create Map task
func (m *Master) createMapTask() {
	// Iterate over the input files; each file is processed by one Map task
	for idx, fileName := range m.InputFiles {
		// Create a Map Task object
		taskMeta := Task{
			Input:      fileName,
			TaskState:  Map,
			NReducer:   m.NReduce,
			TaskNumber: idx,
		}
		// Put the Task object into the queue
		m.TaskQueue <- &taskMeta
		// Record the task in the Master's bookkeeping: the key is the task ID and the value holds the task's status
		m.TaskMeta[idx] = &MasterTask{
			TaskStatus:    Idle,
			TaskReference: &taskMeta,
		}
	}
}

Periodically check for timed-out tasks so that a crashed or slow worker does not stall the job

// Crash handling: runs in its own goroutine and periodically checks for timed-out tasks
func (m *Master) catchTimeOut() {
	for {
		time.Sleep(5 * time.Second)
		// Hold the package-level mutex mu while reading and updating the Master's shared state
		mu.Lock()
		// Stop checking once the Master has reached the Exit phase
		if m.MasterPhase == Exit {
			mu.Unlock()
			return
		}
		// Check all tasks
		for _, masterTask := range m.TaskMeta {
			// An in-progress task that has been running for more than 10 seconds is put back in the queue so that another worker can execute it
			if masterTask.TaskStatus == InProgress && time.Since(masterTask.StartTime) > 10*time.Second {
				m.TaskQueue <- masterTask.TaskReference
				masterTask.TaskStatus = Idle
			}
		}
		mu.Unlock()
	}
}

2. The master listens to worker RPC calls and assigns tasks

// AssignTask is the RPC handler through which a worker requests a task from the Master
func (m *Master) AssignTask(args *ExampleArgs, reply *Task) error {
	// Lock the Master node
	mu.Lock()
	defer mu.Unlock()
	// The queue still holds pending tasks
	if len(m.TaskQueue) > 0 {
		// Dequeue a task and copy it into the reply that is sent back to the worker
		*reply = *<-m.TaskQueue
		// Mark the task as in progress and record when it started
		m.TaskMeta[reply.TaskNumber].TaskStatus = InProgress
		m.TaskMeta[reply.TaskNumber].StartTime = time.Now()
	} else if m.MasterPhase == Exit {
		// The queue is empty and the Master is in the Exit phase
		// Reply with an Exit task to tell the worker that the Master has finished the job
		*reply = Task{
			TaskState: Exit,
		}
	} else {
		// If there is no task in the queue, let the requested worker wait
		*reply = Task{
			TaskState: Wait,
		}
	}
	return nil
}

3. Start the worker

// main/mrworker.go calls this function.
// Start Worker
func Worker(mapf func(string, string) []KeyValue, reducef func(string, []string) string) {
	for {
		// Request a task from the Master via RPC
		task := getTask()
		// Dispatch on the phase of the task that was returned
		switch task.TaskState {
		case Map:
			mapper(&task, mapf)
		case Reduce:
			reducer(&task, reducef)
		case Wait:
			time.Sleep(5 * time.Second)
		case Exit:
			return
		}
	}
}

4. The worker sends an RPC request to the master

// Get idle tasks via RPC
func getTask() Task {
	args := ExampleArgs{}
	reply := Task{}
	// The RPC request calls the service of the Master to get the Task
	call("Master.AssignTask", &args, &reply)
	return reply
}
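
The call helper used above is supplied by the lab skeleton and is not shown in this post. To illustrate what such a request and reply look like with Go's net/rpc package, here is a self-contained sketch that serves one call over an in-memory pipe instead of a real socket. Coordinator, Args, Reply, and requestTask are hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"net"
	"net/rpc"
)

// Args and Reply are hypothetical request and response types.
type Args struct{ WorkerID int }
type Reply struct{ TaskNumber int }

// Coordinator exposes one RPC method with the signature net/rpc requires:
// an exported method taking (args, *reply) and returning error.
type Coordinator struct{}

func (c *Coordinator) AssignTask(args *Args, reply *Reply) error {
	reply.TaskNumber = 42
	return nil
}

// requestTask registers the service, connects a client to it through
// net.Pipe, and performs a single synchronous call.
func requestTask() (int, error) {
	server := rpc.NewServer()
	if err := server.Register(&Coordinator{}); err != nil {
		return 0, err
	}
	clientConn, serverConn := net.Pipe()
	go server.ServeConn(serverConn)

	client := rpc.NewClient(clientConn)
	defer client.Close()

	var reply Reply
	if err := client.Call("Coordinator.AssignTask", &Args{WorkerID: 1}, &reply); err != nil {
		return 0, err
	}
	return reply.TaskNumber, nil
}

func main() {
	n, err := requestTask()
	fmt.Println(n, err)
}
```

The method name passed to Call has the form "Type.Method", the same form as "Master.AssignTask" in getTask.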

5. The worker obtains the MapTask and submits it to the mapper for processing

// Execute Map task
func mapper(task *Task, mapf func(string, string) []KeyValue) {
	// Read the input file assigned to this task
	content, err := ioutil.ReadFile(task.Input)
	if err != nil {
		log.Fatal("Failed to read file: "+task.Input, err)
	}
	// Call the application's map function (for example, mapf from wc.go) to produce the intermediate key/value pairs
	intermediates := mapf(task.Input, string(content))
	// One in-memory bucket of key/value pairs per reduce task
	buffer := make([][]KeyValue, task.NReducer)
	// Partition the results into the buckets
	for _, intermediate := range intermediates {
		// Hash the key to choose one of the NReducer buckets
		slot := ihash(intermediate.Key) % task.NReducer
		buffer[slot] = append(buffer[slot], intermediate)
	}
	// Write each bucket from memory to disk
	mapOutput := make([]string, 0)
	for i := 0; i < task.NReducer; i++ {
		// Write the intermediate results to NReducer files and collect their paths
		mapOutput = append(mapOutput, writeToLocalFile(task.TaskNumber, i, &buffer[i]))
	}
	// Record the NReducer file paths on the task so that the Master can collect them
	task.Intermediates = mapOutput
	// Report the completed task to the Master
	TaskCompleted(task)
}

6. Notify the master after the worker completes the task

func TaskCompleted(task *Task) {
	reply := ExampleReply{}
	call("Master.TaskCompleted", task, &reply)
}

7. The master receives the completed Task

// TaskCompleted is the RPC handler a worker calls when it finishes a task; it validates the report and marks the task as completed
func (m *Master) TaskCompleted(task *Task, reply *ExampleReply) error {
	mu.Lock()
	defer mu.Unlock()
	// Fault tolerance: ignore reports that belong to a different phase and reports for a task that is already completed
	if task.TaskState != m.MasterPhase || m.TaskMeta[task.TaskNumber].TaskStatus == Completed {
		// Stale or duplicate result; discard it
		return nil
	}
	m.TaskMeta[task.TaskNumber].TaskStatus = Completed
	go m.processTaskResult(task)
	return nil
}
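
The master then processes the result in a goroutine. If all map tasks have been completed, it creates the reduce tasks and enters the Reduce phase; if all reduce tasks have been completed, it enters the Exit phase.

// processTaskResult runs in its own goroutine and records the result of a completed task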
func (m *Master) processTaskResult(task *Task) {
	mu.Lock()
	defer mu.Unlock()
	switch task.TaskState {
	case Map:
		// Collect the intermediate file paths reported by the Map task in the Master's memory
		// The index into task.Intermediates is the reduce task ID; each Map task reports NReducer file paths
		for reduceTaskId, filePath := range task.Intermediates {
			m.Intermediates[reduceTaskId] = append(m.Intermediates[reduceTaskId], filePath)
		}
		// If all Map tasks have been completed, create the Reduce tasks and enter the Reduce phase
		if m.allTaskDone() {
			m.createReduceTask()
			m.MasterPhase = Reduce
		}
	case Reduce:
		// If all Reduce tasks have been completed, enter the Exit phase
		if m.allTaskDone() {
			m.MasterPhase = Exit
		}
	}
}

8. When all map tasks have been completed, the master creates the ReduceTasks and moves to the Reduce phase; workers then run the reducer

// Execute Reduce task
func reducer(task *Task, reducef func(string, []string) string) {
	// Read intermediate files from disk
	intermediate := *readFromLocalFile(task.Intermediates)
	// Sort by key in lexicographic order
	sort.Sort(ByKey(intermediate))

	dir, _ := os.Getwd()
	tempFile, err := ioutil.TempFile(dir, "mr-2021-tmp-*")
	if err != nil {
		log.Fatal("Failed to create temp file", err)
	}
	i := 0
	// Traverse every key
	for i < len(intermediate) {
		j := i + 1
		// Advance j past all entries that share the same key
		for j < len(intermediate) && intermediate[i].Key == intermediate[j].Key {
			j++
		}
		// Collect every value recorded for this key
		values := []string{}
		for k := i; k < j; k++ {
			values = append(values, intermediate[k].Value)
		}
		// Pass the key and its values to reducef
		output := reducef(intermediate[i].Key, values)
		// Append the result as a "key value" line to the temporary file
		fmt.Fprintf(tempFile, "%v %v\n", intermediate[i].Key, output)
		i = j
	}
	tempFile.Close()
	// Move the temporary file to its final output name
	oname := fmt.Sprintf("mr-2021-out-%d", task.TaskNumber)
	if err := os.Rename(tempFile.Name(), oname); err != nil {
		log.Fatal("Failed to rename output file: ", err)
	}
	task.Output = oname
	TaskCompleted(task)
}

9. The master confirms that all reduce tasks have been completed, enters the Exit phase, and terminates all master and worker goroutines

//
// main/mrmaster.go calls Done() periodically to find out
// if the entire job has finished.
//
func (m *Master) Done() bool {
	mu.Lock()
	defer mu.Unlock()
	ret := m.MasterPhase == Exit
	return ret
}

10. Concurrency

The master holds all of the Task bookkeeping, and that state is modified while multiple workers execute tasks concurrently. Because the master communicates with many workers and its data is shared among them, access to that data must be protected by a lock.

// Master node object
type Master struct {
	TaskQueue     chan *Task          // Queue of pending tasks, implemented with a buffered channel
	TaskMeta      map[int]*MasterTask // Bookkeeping record for every task, keyed by task ID
	MasterPhase   State               // Current phase of the Master
	NReduce       int                 // Number of reduce tasks (R)
	InputFiles    []string            // Input file names
	Intermediates [][]string          // R rows, one per reduce task; each row collects that reducer's intermediate file paths, one from each Map task
}

Of these fields, TaskMeta, MasterPhase, Intermediates, and TaskQueue are both read and written after startup. TaskQueue is a channel, which is already safe for concurrent use, so only operations on Intermediates, TaskMeta, and MasterPhase need to hold the lock. InputFiles and NReduce are written once when the Master is created, so they are never written concurrently.
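
The code in this post uses a package-level mu. A common alternative is to keep the mutex inside the struct whose fields it protects. The sketch below shows that pattern on a reduced example, with many concurrent completion reports of which only the first per task is accepted. coordinator, markCompleted, and countAccepted are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// coordinator keeps its mutex next to the data the mutex protects.
type coordinator struct {
	mu        sync.Mutex
	completed map[int]bool
}

// markCompleted records a completion and reports whether it was the first
// report for that task; duplicate reports return false.
func (c *coordinator) markCompleted(id int) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.completed[id] {
		return false
	}
	c.completed[id] = true
	return true
}

// countAccepted sends `reports` concurrent completion reports spread over
// `tasks` distinct task IDs and returns how many of them were accepted.
func countAccepted(reports, tasks int) int {
	c := &coordinator{completed: make(map[int]bool)}
	results := make(chan bool, reports)
	var wg sync.WaitGroup
	for i := 0; i < reports; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			results <- c.markCompleted(id)
		}(i % tasks)
	}
	wg.Wait()
	close(results)

	accepted := 0
	for ok := range results {
		if ok {
			accepted++
		}
	}
	return accepted
}

func main() {
	fmt.Println(countAccepted(100, 10)) // prints: 10
}
```

Without the lock, the concurrent map writes would be a data race; with it, exactly one report per task ID is accepted regardless of scheduling.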

11. Fault tolerance

  1. In the paper, the master periodically sends a heartbeat to each worker
    • If a worker does not respond for a period of time, the master marks the worker as failed
    • After a worker fails, its completed map tasks are reset to idle, because their output is stored on the failed machine's local disk; completed reduce tasks do not need to be re-executed
  2. In this implementation, in-progress tasks that have timed out are put back in the queue so that other workers can execute them

// Crash handling: runs in its own goroutine and periodically checks for timed-out tasks
func (m *Master) catchTimeOut() {
	for {
		time.Sleep(5 * time.Second)
		// Hold the package-level mutex mu while reading and updating the Master's shared state
		mu.Lock()
		// Stop checking once the Master has reached the Exit phase
		if m.MasterPhase == Exit {
			mu.Unlock()
			return
		}
		// Check all tasks
		for _, masterTask := range m.TaskMeta {
			// An in-progress task that has been running for more than 10 seconds is put back in the queue so that another worker can execute it
			if masterTask.TaskStatus == InProgress && time.Since(masterTask.StartTime) > 10*time.Second {
				m.TaskQueue <- masterTask.TaskReference
				masterTask.TaskStatus = Idle
			}
		}
		mu.Unlock()
	}
}