Using C# to explore different machine learning tasks in ML.NET

Posted by Jenling on Wed, 16 Feb 2022 12:11:26 +0100

Contents

What is ML.NET?

Install ML.NET

Tasks that support AutoML

Binary classification

Multiclass classification

Regression

Recommendation

Ranking

Conclusion

What is ML.NET?

ML.NET is Microsoft's open-source machine learning library for .NET applications, which lets you perform machine learning tasks in C#, F#, or any other .NET language. In addition, ML.NET supports models built in other machine learning frameworks, such as TensorFlow, ONNX, and PyTorch. It is also highly performant and can be used for a variety of machine learning tasks.

For those who do not have deep data science skills or knowledge of the various machine learning algorithms, ML.NET also provides AutoML. AutoML is a subset of ML.NET that abstracts away the process of selecting machine learning algorithms, tuning hyperparameters for those algorithms, and comparing the algorithms to determine which performs best. This helps people who are new to data science find a model that performs well without needing extensive data science skills.

The combination of all these factors makes ML.NET a very effective way to handle machine learning tasks using the applications you already have and the skills you already know.

Install ML.NET

Any project that supports .NET Standard can install ML.NET through the NuGet Package Manager in Visual Studio (almost all .NET projects qualify). To add ML.NET to your project, go to NuGet Package Manager and install the latest versions of Microsoft.ML and Microsoft.ML.AutoML. I recommend installing the AutoML package as well because AutoML is a good way to start using ML.NET. For more details on using NuGet Package Manager, refer to Microsoft's NuGet Package Manager documentation.

Tasks that support AutoML

First, I will focus on the five machine learning tasks in ML.NET that AutoML supports. Because they support AutoML, these tasks are easier to get started with, so I'll provide some code for each type of task. For more details, I suggest checking Microsoft's documentation on ML.NET or their ML.NET samples on GitHub.

Binary classification

The binary classification task involves predicting a categorical label that should be assigned to something, given a set of related features. For example, given some characteristics of a loan applicant, a binary classification model would predict whether the loan should be approved or rejected.

A binary classification task can only predict a single label with two possible categories. If there are more than two possible values, it is a multiclass classification task, which we will discuss below.

The code for running a binary classification experiment using AutoML might look like this:

// Requires: using Microsoft.ML; using Microsoft.ML.AutoML; using Microsoft.ML.Data;
public ITransformer PerformBinaryClassification(IDataView trainingData, IDataView validationData)
{
    // Set up the experiment, giving AutoML up to 10 seconds to try models
    MLContext context = new MLContext();
    uint maxSeconds = 10;
    BinaryClassificationExperiment experiment = context.Auto().CreateBinaryClassificationExperiment(maxSeconds);

    // Run the experiment and wait synchronously for it to complete
    ExperimentResult<BinaryClassificationMetrics> result =
        experiment.Execute(trainingData, validationData, labelColumnName: "ShouldApproveLoan");

    // result.BestRun.ValidationMetrics has properties helpful for evaluating model performance
    double accuracy = result.BestRun.ValidationMetrics.Accuracy;
    double f1Score = result.BestRun.ValidationMetrics.F1Score;
    string confusionTable = result.BestRun.ValidationMetrics.ConfusionMatrix.GetFormattedConfusionTable();

    // Return the best performing trained model
    ITransformer bestModel = result.BestRun.Model;
    return bestModel;
}

You can then use the trained model to make predictions using the following code:

public LoanPrediction PredictBinaryClassification(ITransformer bestModel, IDataView trainingData, LoanData loan)
{
    MLContext context = new MLContext();

    // Create an engine capable of evaluating one or more loans in the future
    PredictionEngine<LoanData, LoanPrediction> engine =
        context.Model.CreatePredictionEngine<LoanData, LoanPrediction>(bestModel, trainingData.Schema);

    // Actually make the prediction and return the findings
    LoanPrediction prediction = engine.Predict(loan);
    return prediction;
}

Here, LoanData and LoanPrediction represent a row in the dataset and the algorithm's final predicted class, respectively.
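The article does not show these two classes, so here is a minimal sketch of what they might look like. The feature names (Income, CreditScore, LoanAmount), the column order, the LoanDataLoader helper, and the idea of loading from a CSV file are all hypothetical; only the ShouldApproveLoan label comes from the code above. The prediction class exposes only PredictedLabel and Score, on the assumption that not every trainer AutoML might pick produces a calibrated Probability column.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

// One row of the (hypothetical) loan dataset. The feature columns are
// illustrative assumptions; only ShouldApproveLoan appears in the article.
public class LoanData
{
    [LoadColumn(0)] public float Income { get; set; }
    [LoadColumn(1)] public float CreditScore { get; set; }
    [LoadColumn(2)] public float LoanAmount { get; set; }

    // The label AutoML is asked to predict
    [LoadColumn(3)] public bool ShouldApproveLoan { get; set; }
}

// The output of a binary classification model for one row
public class LoanPrediction
{
    // The predicted class, read from the model's PredictedLabel column
    [ColumnName("PredictedLabel")] public bool ShouldApproveLoan { get; set; }

    // The raw model score behind that prediction
    public float Score { get; set; }
}

// Hypothetical helper, not part of ML.NET
public static class LoanDataLoader
{
    // Loads a comma-separated file with a header row and holds back 20% for validation
    public static (IDataView Training, IDataView Validation) Load(MLContext context, string path)
    {
        IDataView allData = context.Data.LoadFromTextFile<LoanData>(path, hasHeader: true, separatorChar: ',');
        var split = context.Data.TrainTestSplit(allData, testFraction: 0.2);
        return (split.TrainSet, split.TestSet);
    }
}
```

The two data views returned by Load could then be passed to PerformBinaryClassification as trainingData and validationData.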

Multiclass classification

The multiclass classification task is very similar to binary classification in that you are trying to predict a categorical value for a single label column given a set of features. The main difference is that a binary classification problem has only two possible values, while in a multiclass classification problem something may belong to one of three or more possible categories.

The code to train a multiclass classification experiment using AutoML might look like this:

public ITransformer PerformMultiClassification(IDataView trainingData, IDataView validationData)
{
    // Set up the experiment
    MLContext context = new MLContext();
    uint maxSeconds = 10;
    MulticlassClassificationExperiment experiment = context.Auto().CreateMulticlassClassificationExperiment(maxSeconds);

    // Run the experiment and wait synchronously for it to complete
    ExperimentResult<MulticlassClassificationMetrics> result =
        experiment.Execute(trainingData, validationData, labelColumnName: "RiskCategory");

    // result.BestRun.ValidationMetrics has properties helpful for evaluating model performance
    string confusionTable = result.BestRun.ValidationMetrics.ConfusionMatrix.GetFormattedConfusionTable();

    // Return the best performing trained model
    ITransformer bestModel = result.BestRun.Model;
    return bestModel;
}

The code for using a trained multiclass classification model is also very similar to the code for using a binary classification model. As with binary classification, multiclass classification models can also be trained without AutoML.

Regression

The regression task involves predicting a numeric value given a set of features. For example, you can use a regression model to predict gasoline prices given a set of other known factors, or to predict how long you may need to defrost your car in the morning given the overnight weather. Any time you need to compute a numeric value, you may be dealing with a regression problem.

The code for training a model in a regression experiment is similar to that of the classification experiments:

public ITransformer PerformRegression(IDataView trainingData, IDataView validationData)
{
    // Set up the experiment
    MLContext context = new MLContext();
    uint maxSeconds = 10;
    RegressionExperiment experiment = context.Auto().CreateRegressionExperiment(maxSeconds);

    // Run the experiment and wait synchronously for it to complete
    ExperimentResult<RegressionMetrics> result =
        experiment.Execute(trainingData, validationData, labelColumnName: "Temperature");

    // result.BestRun.ValidationMetrics has properties helpful for evaluating model performance
    double error = result.BestRun.ValidationMetrics.MeanAbsoluteError;

    // Return the best performing trained model
    ITransformer bestModel = result.BestRun.Model;
    return bestModel;
}

Please note that the validation metrics for regression experiments are completely different from those for classification experiments. Classification metrics deal with how often the correct category is predicted, while regression metrics deal with the distance between the predicted value and the actual value in known historical data.
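To make that difference concrete, here is a small sketch that computes accuracy and F1 score for a classifier and mean absolute error for a regressor by hand. It uses only the .NET base library and the textbook definitions of these metrics; the MetricSketches class and its method names are made up for illustration and are not part of ML.NET, which computes these values for you.

```csharp
using System;
using System.Linq;

// Hypothetical helper class for illustration only
public static class MetricSketches
{
    // Classification: fraction of predictions that match the actual label
    public static double Accuracy(bool[] actual, bool[] predicted)
    {
        int correct = actual.Zip(predicted, (a, p) => a == p).Count(match => match);
        return (double)correct / actual.Length;
    }

    // Classification: harmonic mean of precision and recall for the positive class,
    // written as 2*TP / (2*TP + FP + FN)
    public static double F1Score(bool[] actual, bool[] predicted)
    {
        int truePositives = actual.Zip(predicted, (a, p) => a && p).Count(x => x);
        int falsePositives = actual.Zip(predicted, (a, p) => !a && p).Count(x => x);
        int falseNegatives = actual.Zip(predicted, (a, p) => a && !p).Count(x => x);
        int denominator = 2 * truePositives + falsePositives + falseNegatives;
        return denominator == 0 ? 0.0 : 2.0 * truePositives / denominator;
    }

    // Regression: average absolute distance between predicted and actual values
    public static double MeanAbsoluteError(double[] actual, double[] predicted)
    {
        return actual.Zip(predicted, (a, p) => Math.Abs(a - p)).Average();
    }
}
```

The classification metrics count right and wrong answers, while the regression metric measures how far off each prediction was, which is why the two sets of metrics cannot be compared with each other.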

As with the two classification model types, you do not have to use AutoML when training regression models, but it can be helpful if your understanding of the individual algorithms is limited.

Recommendation

The recommendation algorithm is a variant of the regression algorithm. With a recommendation algorithm, you feed in data about different users and the ratings they have given to products in the past. Given such a dataset, the recommendation model can predict how a user would rate something they have never interacted with before, based on how similar that user's tastes are to those of other known users. Recommendation models are popular in movie, music, and product recommendation systems, where repeat users are common and everyone benefits when users find content they like.

AutoML supports recommendation, and the recommendation code is very similar to the regression code:

public ITransformer PerformRecommendation(IDataView trainingData, IDataView validationData)
{
    // Set up the experiment
    MLContext context = new MLContext();
    uint maxSeconds = 10;
    RecommendationExperiment experiment = context.Auto().CreateRecommendationExperiment(maxSeconds);

    // Run the experiment and wait synchronously for it to complete
    ExperimentResult<RegressionMetrics> result =
        experiment.Execute(trainingData, validationData, labelColumnName: "Rating");

    // result.BestRun.ValidationMetrics has properties helpful for evaluating model performance
    double error = result.BestRun.ValidationMetrics.MeanAbsoluteError;

    // Return the best performing trained model
    ITransformer bestModel = result.BestRun.Model;
    return bestModel;
}

The recommendation algorithm uses matrix factorization, which is a more complex topic. For more details on recommendation systems that do not use AutoML, see Microsoft's matrix factorization tutorial. There is also a great article from Rubik's Code that explores this topic further.
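As a rough intuition for matrix factorization, each user and each item is represented by a short vector of learned "latent factors", and a predicted rating is the dot product of the two vectors. The sketch below shows only that final prediction step with hand-picked numbers; the class and method names are hypothetical, and a real trainer learns the factors from the ratings data and typically adds further terms.

```csharp
using System;

// Hypothetical illustration of the prediction step in matrix factorization
public static class MatrixFactorizationSketch
{
    // Predicts a rating as the dot product of a user's and an item's latent factors
    public static double PredictRating(double[] userFactors, double[] itemFactors)
    {
        if (userFactors.Length != itemFactors.Length)
        {
            throw new ArgumentException("Factor vectors must have the same length.");
        }

        double rating = 0.0;
        for (int i = 0; i < userFactors.Length; i++)
        {
            rating += userFactors[i] * itemFactors[i];
        }
        return rating;
    }
}
```

A user whose factors point in the same direction as an item's factors gets a high predicted rating for that item, which is how similar tastes across users end up producing similar recommendations.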

Ranking

Ranking is similar to recommendation, but it is used to put items into a forced rank order, suitable for displaying search results. Ranking systems are well suited to displaying an ordered list of suggestions for a specific user or group of users.

The code is similar to what we saw before, although the validation metrics are very different:

// Requires: using System.Collections.Generic; using Microsoft.ML; using Microsoft.ML.AutoML; using Microsoft.ML.Data;
public ITransformer PerformRanking(IDataView trainingData, IDataView validationData)
{
    // Set up the experiment
    MLContext context = new MLContext();
    uint maxSeconds = 10;
    RankingExperiment experiment = context.Auto().CreateRankingExperiment(maxSeconds);

    // Run the experiment and wait synchronously for it to complete.
    // The column names used below are assumptions; use the relevance label and group columns from your dataset.
    ExperimentResult<RankingMetrics> result =
        experiment.Execute(trainingData, validationData, labelColumnName: "Label");

    // result.BestRun.ValidationMetrics has properties helpful for evaluating model performance
    IEnumerable<double> gains = result.BestRun.ValidationMetrics.DiscountedCumulativeGains;
    IEnumerable<double> normalizedGains = result.BestRun.ValidationMetrics.NormalizedDiscountedCumulativeGains;

    // Return the best performing trained model
    ITransformer bestModel = result.BestRun.Model;

    // Optionally evaluate the model yourself: score the validation data first,
    // because Evaluate needs the Score column that the model produces
    IDataView scoredData = bestModel.Transform(validationData);
    RankingMetrics metrics = context.Ranking.Evaluate(scoredData, labelColumnName: "Label", rowGroupColumnName: "Group", scoreColumnName: "Score");
    return bestModel;
}
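The two ranking metrics above are less familiar than accuracy or error, so here is a hedged sketch of how discounted cumulative gain (DCG) and its normalized form (NDCG) are commonly defined. It uses the widely used formulation with a gain of 2^relevance - 1 and a log2 position discount, computed over the whole list; ML.NET reports these values at several truncation levels and its exact gains and discounts may differ, so treat this as an illustration rather than a reimplementation. The class and method names are hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical helper class for illustration only
public static class RankingMetricSketches
{
    // DCG: each item's gain (2^relevance - 1) is divided by log2(position + 1),
    // where position is 1-based, so items ranked lower contribute less
    public static double Dcg(int[] relevanceInRankedOrder)
    {
        double total = 0.0;
        for (int i = 0; i < relevanceInRankedOrder.Length; i++)
        {
            double gain = Math.Pow(2, relevanceInRankedOrder[i]) - 1;
            double discount = Math.Log(i + 2, 2); // i = 0 is position 1, so log2(2) = 1
            total += gain / discount;
        }
        return total;
    }

    // NDCG: DCG divided by the DCG of the ideal ordering (most relevant first)
    public static double Ndcg(int[] relevanceInRankedOrder)
    {
        int[] ideal = relevanceInRankedOrder.OrderByDescending(r => r).ToArray();
        double idealDcg = Dcg(ideal);
        return idealDcg == 0.0 ? 0.0 : Dcg(relevanceInRankedOrder) / idealDcg;
    }
}
```

Under this definition a perfectly ordered list has an NDCG of 1, and the value drops toward 0 as highly relevant items are pushed further down the list.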

Conclusion

In short, the AutoML feature of ML.NET is an amazing and completely free way to help everyday programmers take advantage of capabilities that would usually require a data scientist. ML.NET allows you and your team to integrate machine learning into your applications in a language you are already familiar with, without having to deeply understand the various machine learning algorithms.

Topics: ASP.NET AI
