Walking into a Task: what is a Task

Posted by larrygingras on Wed, 19 Jan 2022 19:45:40 +0100

Preface

This series will be divided into the following articles and described in stages:

  1. What is a Task (this article)
  2. Callback execution of Task and await (TODO)
  3. What did async do (TODO)
  4. Summary and common misunderstandings (TODO)

Article 2 also covers deadlock-related issues. Articles 2 and 3 touch on custom awaitables along the way.

This series builds directly on my previous post, a summary of the .NET 6 ThreadPool implementation. If you have not read it yet, please read it first.

All examples in this article exist for the purpose of explanation and are not meant as practical code. The difference between a Task with a return value and a Task without one is small, so most of the examples below do not distinguish between them. Don't get hung up on the details of API usage; the focus is the overall design of Task.

The screenshots of running code were taken on .NET 6. The design has not changed significantly in other versions, so this does not affect learning.

My interpretation is not authoritative, but I hope it gives you one way to understand Task.

From the outside

Where do tasks come from

The following are only typical examples, not an exhaustive list.

  • new Task
new Task(_ =>
{
    Console.WriteLine("Hello World!");
}, null).Start();
  • TaskFactory.StartNew
new TaskFactory().StartNew(() =>
{
    Console.WriteLine("Hello World!");
});
  • Task.Run
Task.Run(() =>
{
    Console.WriteLine("Hello World!");
});
  • Task.FromResult and similar members, which directly create a completed Task
Task.FromResult("Hello World!");
var task = Task.CompletedTask;
  • An async method whose internal implementation is unknown to us
// Declaration only: from the caller's side we see just the signature
Task<Bar> FooAsync();

Common usage of Task

  • Register a callback that receives the result and runs once the Task completes
var task = Task.Run<string>(() => "Hello World!");
task.ContinueWith(t => Console.WriteLine(t.Result));
  • await a Task and get the result
var task = Task.Run<string>(() => "Hello World!");
var result = await task;
Console.WriteLine(result);
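  • Call GetResult directly (this blocks the calling thread until the Task completes)
var task = Task.Run<string>(() => "Hello World!");
// Similar to task.Result: both block until completion, but they surface exceptions differently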
var result = task.GetAwaiter().GetResult();
Console.WriteLine(result);
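
The two blocking styles are not fully interchangeable when the Task fails. The following minimal sketch (the variable names are made up for illustration) uses Task.FromException to create an already-faulted Task and records which exception type each style throws:

```csharp
using System;
using System.Threading.Tasks;

// An already-faulted Task<string>; no delegate is ever executed.
var failing = Task.FromException<string>(new InvalidOperationException("boom"));

string viaResult = "";
try
{
    _ = failing.Result;
}
catch (Exception ex)
{
    // .Result wraps the failure in an AggregateException.
    viaResult = ex.GetType().Name;
}

string viaGetResult = "";
try
{
    _ = failing.GetAwaiter().GetResult();
}
catch (Exception ex)
{
    // GetAwaiter().GetResult() rethrows the original exception.
    viaGetResult = ex.GetType().Name;
}

Console.WriteLine($".Result threw: {viaResult}");
Console.WriteLine($"GetResult() threw: {viaGetResult}");
```

The first line reports AggregateException and the second InvalidOperationException. For a Task that completes successfully, both styles return the same value.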

Classification of tasks

Tasks can be classified by whether they carry a Result, that is, whether the Task is generic:

  • Task
  • Task<T>

Tasks can also be divided by whether we know where they come from:

  • We know where the Task comes from. In this case we take part in creating the Task and know what it is doing. For example:
Task task = Task.Run<int>(() => 1 + 2);

It calculates 1 + 2 and uses the result as the result of the Task.

  • We don't know where the Task comes from. For example:
Task task = new HttpClient().GetStringAsync("http://localhost:5000/api/values");

These two ways of obtaining a Task correspond to two quite different concerns:

  1. The Task is a white box: we care about what is done inside the Task and where that code executes.
  2. The Task is a black box: we care about what the Task can give us and what we should do after it completes.

Break down the Task

By function, a Task can be divided into three parts:

  • Task execution: run a piece of our own logic through Task.Run and similar methods.
  • Callback notification and callback execution: register a callback that is executed when the Task completes.
  • Await syntax support: without await, the two functions above still work in full, but the code loses its simplicity.

Where does the Task execute?

Thread pool

A Task can be scheduled and executed by the ThreadPool as a basic unit of the ThreadPool queue system.

The following common ways of creating a Task all schedule it onto the ThreadPool by default. They are essentially the same and differ only in how they are used and which custom options they accept.

  • new Task
new Task(_ =>
{
    Console.WriteLine("Hello World!");
}, null).Start();
  • TaskFactory.StartNew
new TaskFactory().StartNew(() =>
{
    Console.WriteLine("Hello World!");
});
  • Task.Run
// It can be seen as a simplified version of TaskFactory.StartNew
Task.Run(() =>
{
    Console.WriteLine("Hello World!");
});

Take Task.Run as an example to see what happens inside.
If you set breakpoints in PortableThreadPool.TryCreateWorkerThread and in the lambda expression that is actually executed, you can clearly see the whole execution path.

That is the main flow. To simplify understanding, the call details inside ThreadPool have been omitted.

Task key code excerpt:

class Task
{
    // The subject of the task, the actual logic we want to execute
    // There may be a return value, but there may be no return value
    internal Delegate m_action;

    // Status of the task
    internal volatile int m_stateFlags;

    // Entry point called by the ThreadPool. Because of JIT inlining, only ExecuteEntryUnsafe shows up in the call stack; this method does not
    internal virtual void ExecuteFromThreadPool(Thread threadPoolThread) => ExecuteEntryUnsafe(threadPoolThread);

    internal void ExecuteEntryUnsafe(Thread? threadPoolThread)
    {
        // Mark the Task's delegate as invoked
        m_stateFlags |= (int)TaskStateFlags.DelegateInvoked;

        if (!IsCancellationRequested & !IsCanceled)
        {
            ExecuteWithThreadLocal(ref t_currentTask, threadPoolThread);
        }
        else
        {
            ExecuteEntryCancellationRequestedOrCanceled();
        }
    }

    // State object that can be passed in when creating a Task and is handed to the delegate at execution time
    // new Task(state => Console.WriteLine(state), "Hello World").Start();
    internal object? m_stateObject;

    private void ExecuteWithThreadLocal(ref Task currentTaskSlot, Thread threadPoolThread = null)
    {
        // ExecutionContext carries data that belongs to the logical flow of execution, such as AsyncLocal values
        // See my AsyncLocal blog for details https://www.cnblogs.com/eventhorizon/p/12240767.html
        ExecutionContext? ec = CapturedContext;
        if (ec == null)
        {
            // No execution context, execute directly
            InnerInvoke();
        }
        else
        {
            // Are we being executed on a ThreadPool thread?
            if (threadPoolThread is null)
            {
                ExecutionContext.RunInternal(ec, s_ecCallback, this);
            }
            else
            {
                ExecutionContext.RunFromThreadPoolDispatchLoop(threadPoolThread, ec, s_ecCallback, this);
            }
        }
    }

    // Whichever branch ExecuteWithThreadLocal takes, execution eventually reaches InnerInvoke
    internal virtual void InnerInvoke()
    {
        if (m_action is Action action)
        {
            action();
            return;
        }

        if (m_action is Action<object?> actionWithState)
        {
            actionWithState(m_stateObject);
        }
    }
}

You can see that the Task enters the ThreadPool through ThreadPoolTaskScheduler. The ThreadPool calls Task.ExecuteFromThreadPool, which ultimately triggers the execution of the action wrapped by the Task.
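
The ExecutionContext branch in ExecuteWithThreadLocal is what lets AsyncLocal values reach the delegate. As a minimal sketch (requestId is a made-up name, and this only observes the public behavior rather than the internal code path), a value set before Task.Run is visible inside the Task, while a change made inside the Task does not flow back to the caller:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical ambient value, for illustration only.
var requestId = new AsyncLocal<string>();
requestId.Value = "request-42";

// The ExecutionContext is captured when the Task is created
// and restored around the delegate when it runs.
var seenInsideTask = await Task.Run(() => requestId.Value);

// A change made inside the Task stays inside that Task's context.
await Task.Run(() => { requestId.Value = "changed inside the task"; });
var seenAfterwards = requestId.Value;

Console.WriteLine(seenInsideTask);
Console.WriteLine(seenAfterwards);
```

Both lines print request-42.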

Like IThreadPoolWorkItem, the other basic unit in the ThreadPool, a Task can enter the ThreadPoolWorkQueue in one of two places: the global queue or a local queue.

To understand this, we need to look at what ThreadPoolTaskScheduler.QueueTask does.

internal sealed class ThreadPoolTaskScheduler : TaskScheduler
{
    protected internal override void QueueTask(Task task)
    {
        TaskCreationOptions options = task.Options;
        if (Thread.IsThreadStartSupported && (options & TaskCreationOptions.LongRunning) != 0)
        {
            // Create an independent thread, independent of the thread pool
            new Thread(s_longRunningThreadWork)
            {
                IsBackground = true,
                Name = ".NET Long Running Task"
            }.UnsafeStart(task);
        }
        else
        {
            // The second parameter is preferLocal
            // For how options & TaskCreationOptions.PreferFairness uses the enum as bit flags, see the official documentation
            // https://docs.microsoft.com/zh-cn/dotnet/csharp/language-reference/builtin-types/enum#enumeration-types-as-bit-flags
            ThreadPool.UnsafeQueueUserWorkItemInternal(task, (options & TaskCreationOptions.PreferFairness) == 0);
        }
    }
}

The TaskCreationOptions in the code above is an option we can specify when creating a Task. The default is None.

Task.Run does not accept this option. You can use the TaskFactory.StartNew overload that does:

new TaskFactory().StartNew(() =>
{
    Console.WriteLine("Hello World!");
}, TaskCreationOptions.PreferFairness);

Depending on the TaskCreationOptions, there are three branches:

  • LongRunning: a dedicated thread, independent of the thread pool
  • PreferFairness set: preferLocal = false, the Task enters the global queue
  • PreferFairness not set: preferLocal = true, the Task enters the local queue of the current thread pool worker thread (a thread that is not a pool worker has no local queue, so the Task still goes to the global queue)

Tasks in the global queue can be picked up and executed fairly by any thread in the thread pool, which is the literal meaning of "prefer fairness".
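
The branch order can be restated as a small pure function. This is only a sketch that mirrors the flag checks in the QueueTask excerpt above: DescribeRoute is a hypothetical helper, it ignores the Thread.IsThreadStartSupported check, and it does not touch the real scheduler.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper that mirrors the order of the checks in QueueTask.
static string DescribeRoute(TaskCreationOptions options)
{
    // LongRunning is checked first, so it wins over PreferFairness.
    if ((options & TaskCreationOptions.LongRunning) != 0)
    {
        return "dedicated thread";
    }

    // Same bit-flag test that produces the preferLocal argument.
    bool preferLocal = (options & TaskCreationOptions.PreferFairness) == 0;
    return preferLocal ? "local queue preferred" : "global queue";
}

Console.WriteLine(DescribeRoute(TaskCreationOptions.None));
Console.WriteLine(DescribeRoute(TaskCreationOptions.PreferFairness));
Console.WriteLine(DescribeRoute(TaskCreationOptions.LongRunning));
Console.WriteLine(DescribeRoute(TaskCreationOptions.LongRunning | TaskCreationOptions.PreferFairness));
```

The four lines print "local queue preferred", "global queue", "dedicated thread" and "dedicated thread". The last case shows that combining both flags still takes the LongRunning branch, because that check comes first.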

In the figure, Task 666 first enters the global queue and is then taken by Thread1. Thread3 steals Task 2 from Thread2 through the work-stealing mechanism.

On a separate background thread

This is the TaskCreationOptions.LongRunning option mentioned above. If you need to run a long-lived Task, such as time-consuming synchronous code, you can use it. It is not recommended for asynchronous code (await xxx); the reasons will be explained later.

new TaskFactory().StartNew(() =>
{
    // Time-consuming synchronous code
}, TaskCreationOptions.LongRunning);

The threads managed by the ThreadPool are designed to be reused: they keep taking tasks from the queue system and executing them. If a worker thread is blocked on a time-consuming task, it cannot handle other tasks, and the throughput of the ThreadPool suffers.

Of course, this does not mean that the ThreadPool cannot handle such tasks. As an extreme example, if all current worker threads of the thread pool are busy with long-running Tasks, the ThreadPool cannot run new tasks until the Starvation Avoidance mechanism creates a new worker thread (every 500ms).

The life cycle of a LongRunning Task does not match the design purpose of the ThreadPool, so it is given a thread of its own.
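
The separation can be observed from inside the delegate. The sketch below assumes a runtime where thread creation is supported (the normal desktop and server case) and asks each Task whether it is running on a thread pool thread:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Task.Run queues the work to the ThreadPool.
bool runOnPool = await Task.Run(() => Thread.CurrentThread.IsThreadPoolThread);

// LongRunning on the default scheduler gets a dedicated thread instead.
bool longRunningOnPool = await Task.Factory.StartNew(
    () => Thread.CurrentThread.IsThreadPoolThread,
    CancellationToken.None,
    TaskCreationOptions.LongRunning,
    TaskScheduler.Default);

Console.WriteLine($"Task.Run on a pool thread: {runOnPool}");
Console.WriteLine($"LongRunning on a pool thread: {longRunningOnPool}");
```

The first line reports True and the second False. TaskScheduler.Default is passed explicitly so that the result does not depend on the ambient scheduler.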

Custom TaskScheduler

In addition to ThreadPoolTaskScheduler, we can also define our own TaskScheduler.

First, we inherit from the abstract class TaskScheduler, which has three abstract methods that we need to implement.

public abstract class TaskScheduler
{
    // The Task to be scheduled for execution will be passed in through this method
    protected internal abstract void QueueTask(Task task);

    // This method will be executed only when the Task callback is executed, which will be described later
    protected abstract bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued);

    // Get all tasks scheduled to the TaskScheduler
    protected abstract IEnumerable<Task>? GetScheduledTasks();
}

In our custom TaskScheduler we receive the Task when QueueTask is called, but how do we make the Task run the action inside it?

For the ThreadPool, Task exposes the internal method ExecuteFromThreadPool. For other callers it provides an ExecuteEntry method, but that method is internal too. It can only be called indirectly, through the protected TryExecuteTask method of TaskScheduler.

public abstract class TaskScheduler
{
    protected bool TryExecuteTask(Task task)
    {
        if (task.ExecutingTaskScheduler != this)
        {
            throw new InvalidOperationException(SR.TaskScheduler_ExecuteTask_WrongTaskScheduler);
        }

        return task.ExecuteEntry();
    }
}


Here is a custom TaskScheduler that executes Tasks sequentially on one fixed thread.

class CustomTaskScheduler : TaskScheduler
{
    private readonly BlockingCollection<Task> _queue = new();

    public CustomTaskScheduler()
    {
        new Thread(() =>
        {
            while (true)
            {
                var task = _queue.Take();
                Console.WriteLine($"task {task.Id} is going to be executed");
                TryExecuteTask(task);
                Console.WriteLine($"task {task.Id} has been executed");
            }
        })
        {
            IsBackground = true
        }.Start();
    }

    protected override IEnumerable<Task> GetScheduledTasks()
    {
        return _queue.ToArray();
    }

    protected override void QueueTask(Task task)
    {
        _queue.Add(task);
    }

    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued)
    {
        return false;
    }
}

In the constructor of TaskFactory, we can pass in our customized TaskScheduler

var taskFactory = new TaskFactory(new CustomTaskScheduler());
taskFactory.StartNew(() =>
    Console.WriteLine($"task {Task.CurrentId}" +
                      $" threadId: {Thread.CurrentThread.ManagedThreadId}"));
taskFactory.StartNew(() =>
    Console.WriteLine($"task {Task.CurrentId}" +
                      $" threadId: {Thread.CurrentThread.ManagedThreadId}"));
Console.ReadLine();

The output results are as follows:

task 1 is going to be executed
task 1 threadId: 10
task 1 has been executed
task 2 is going to be executed
task 2 threadId: 10
task 2 has been executed

All Tasks are scheduled onto the same thread and executed one after another.

A Task can wrap any other kind of work

In the two cases above, the Task has a clear execution entity, but sometimes it does not. Look at the following example.

var task = FooAsync();
var action = typeof(Task).GetField("m_action", BindingFlags.NonPublic | BindingFlags.Instance).GetValue(task);
Console.WriteLine($"Task action is null: {action == null}");
task.ContinueWith(t => Console.WriteLine(t.Result));
// Multiple callbacks can be registered on the same Task
task.ContinueWith(t => Console.WriteLine(t.Result));


Task<string> FooAsync()
{
    var tsc = new TaskCompletionSource<string>();
    new Thread(() =>
    {
        Thread.Sleep(1000);
        tsc.SetResult("Hello World");
    })
    {
        IsBackground = true
    }.Start();
    return tsc.Task;
}

Output:

Task action is null: True
Hello World
Hello World

Look at this from both outside and inside FooAsync:

  • Outside FooAsync: we get a Task and register callbacks on it.
  • Inside FooAsync: we effectively hold those callbacks indirectly, and calling tsc.SetResult indirectly invokes them.

The following is an excerpt of the key code

class Task<TResult>
{
    // Save a callback or set of callbacks
    private volatile object? m_continuationObject;

    internal bool TrySetResult(TResult result)
    {
        // ...
        this.m_result = result;
        FinishContinuations();
        // ...
    }

    internal void FinishContinuations()
    {
        // Handles the execution of callbacks
    }
}

public class TaskCompletionSource<TResult>
{
    public TaskCompletionSource() => _task = new Task<TResult>();

    public Task<TResult> Task => _task;

    public void SetResult(TResult result)
    {
        TrySetResult(result);
    }

    public bool TrySetResult(TResult result)
    {
        _task.TrySetResult(result);
        // ...
    }
}

Sometimes the call to Task.TrySetResult() is triggered by an I/O completion event, which is what we usually call asynchronous I/O. The hardware has its own processing chip. Until the I/O completes and notifies the CPU (through a hardware interrupt), the CPU does not need to take part, and that is the value of asynchronous I/O.
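
The same pattern applies to any completion signal that arrives through a callback. As a rough sketch, a System.Threading.Timer stands in here for an I/O completion event: DelayedValueAsync is a hypothetical helper, and real asynchronous I/O in the runtime is wired up differently, but the shape (no delegate inside the Task, completion driven from outside) is the same.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: the returned Task has no delegate of its own.
// It completes when the timer callback calls TrySetResult.
static Task<string> DelayedValueAsync(string value, int milliseconds)
{
    var tcs = new TaskCompletionSource<string>(
        TaskCreationOptions.RunContinuationsAsynchronously);

    // This constructor passes the Timer itself as the callback state.
    var timer = new Timer(state =>
    {
        tcs.TrySetResult(value);
        if (state is Timer self)
        {
            self.Dispose();
        }
    });

    // Fire once after the given delay, with no periodic repeat.
    timer.Change(milliseconds, Timeout.Infinite);
    return tcs.Task;
}

string result = await DelayedValueAsync("Hello World", 100);
Console.WriteLine(result);
```

This prints Hello World after roughly 100 ms. RunContinuationsAsynchronously keeps the registered callbacks from running inline on the thread that calls TrySetResult, which is a design choice of this sketch rather than something the original example does.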

Summary

A Task represents a piece of work that has already completed or will complete at some point in the future. You can register a callback with it, and the callback is executed when the Task completes.

Topics: C# .NET Task