What is ML.NET?
ML.NET is Microsoft's open-source machine learning framework for .NET applications. It lets you perform machine learning tasks using C#, F#, or any other .NET language. ML.NET can also work with models built in other machine learning frameworks, such as TensorFlow, ONNX, and PyTorch. It offers high performance and can be used for a wide variety of machine learning tasks.
For those without deep data science skills or knowledge of the various machine learning algorithms, ML.NET also provides AutoML. AutoML is a subset of ML.NET. It abstracts away three steps: selecting machine learning algorithms, tuning the hyperparameters of those algorithms, and comparing the algorithms to find the best-performing one. This helps people who are new to data science find a model that performs well without needing extensive data science expertise.
Together, these factors make ML.NET a very effective way to handle machine learning tasks in the applications you already have, using the skills you already know.
Install ML.NET
Any project that supports .NET Standard can install ML.NET through the NuGet Package Manager in Visual Studio (almost all .NET projects can do this). To add ML.NET to your project, open the NuGet Package Manager and install the latest version of Microsoft.ML. I also recommend installing Microsoft.ML.AutoML, because AutoML is a good way to start using ML.NET. For more details on using the NuGet Package Manager, refer to Microsoft's NuGet Package Manager documentation.
Tasks that support automatic ML
First, I'll focus on the five ML.NET machine learning tasks that AutoML supports. Because AutoML supports them, these tasks are easier to get started with, so I'll provide some code for each one. For more details, check Microsoft's ML.NET documentation or the ML.NET samples on GitHub.
Binary classification
The binary classification task involves predicting which of two categorical labels should be assigned to an item, given a set of related features. For example, given some characteristics of a loan applicant, a binary classification model predicts whether the loan should be approved or rejected.
A binary classification task can predict only one of two possible categories. If there are more than two possible values, it is a multiclass classification task, which we will discuss below.
The code for running a binary classification experiment with AutoML might look like this:
using Microsoft.ML;
using Microsoft.ML.AutoML;
using Microsoft.ML.Data;
public ITransformer PerformBinaryClassification(IDataView trainingData, IDataView validationData)
{
    // Set up the experiment
    MLContext context = new MLContext();
    uint maxSeconds = 10;
    BinaryClassificationExperiment experiment = context.Auto().CreateBinaryClassificationExperiment(maxSeconds);
    // Run the experiment and wait synchronously for it to complete
    ExperimentResult<BinaryClassificationMetrics> result =
        experiment.Execute(trainingData, validationData, labelColumnName: "ShouldApproveLoan");
    // result.BestRun.ValidationMetrics has properties helpful for evaluating model performance
    double accuracy = result.BestRun.ValidationMetrics.Accuracy;
    double f1Score = result.BestRun.ValidationMetrics.F1Score;
    string confusionTable = result.BestRun.ValidationMetrics.ConfusionMatrix.GetFormattedConfusionTable();
    // Return the best performing trained model
    ITransformer bestModel = result.BestRun.Model;
    return bestModel;
}
You can then use the trained model to make predictions with code like this:
public LoanPrediction PredictBinaryClassification(ITransformer bestModel, IDataView trainingData, LoanData loan)
{
    MLContext context = new MLContext();
    // Create an engine capable of evaluating one or more loans in the future
    PredictionEngine<LoanData, LoanPrediction> engine =
        context.Model.CreatePredictionEngine<LoanData, LoanPrediction>(bestModel, trainingData.Schema);
    // Actually make the prediction and return the findings
    LoanPrediction prediction = engine.Predict(loan);
    return prediction;
}
Here, LoanData represents a row in the dataset, and LoanPrediction represents the class the model predicts for it.
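The code above reads Accuracy and F1Score from the validation metrics. To make those two values more concrete, here is a minimal sketch of how they are computed from confusion-matrix counts: true positives, false positives, true negatives, and false negatives. The class and method names are hypothetical and not part of ML.NET, which calculates these values for you in BinaryClassificationMetrics.

```csharp
using System;

// Hypothetical helper, not an ML.NET API: recomputes two of the numbers that
// BinaryClassificationMetrics reports, starting from raw confusion-matrix counts.
public static class BinaryMetricsSketch
{
    // Accuracy: the fraction of all predictions that were correct.
    public static double Accuracy(int truePositives, int falsePositives, int trueNegatives, int falseNegatives)
    {
        int total = truePositives + falsePositives + trueNegatives + falseNegatives;
        if (total == 0) throw new ArgumentException("The confusion matrix is empty.");
        return (double)(truePositives + trueNegatives) / total;
    }

    // F1 score: the harmonic mean of precision and recall for the positive class.
    public static double F1Score(int truePositives, int falsePositives, int falseNegatives)
    {
        // With no true positives, precision and recall are both zero (or undefined), so F1 is 0.
        if (truePositives == 0) return 0;
        double precision = (double)truePositives / (truePositives + falsePositives);
        double recall = (double)truePositives / (truePositives + falseNegatives);
        return 2 * precision * recall / (precision + recall);
    }
}
```

Take a case with 40 true positives, 10 false positives, 45 true negatives, and 5 false negatives. Accuracy is 85/100 = 0.85, and F1 is 80/95 ≈ 0.842. F1 is often more informative than accuracy when one class is much rarer than the other.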
Multiclass classification
The multiclass classification task is very similar to the binary classification task: you are trying to predict the categorical value of a single label column, given a set of features. The main difference lies in how many values are possible. In a binary classification problem there are only two, while in a multiclass classification problem an item may belong to one of three or more categories.
The code to train a multiclass classification experiment with AutoML might look like this:
public ITransformer PerformMultiClassification(IDataView trainingData, IDataView validationData)
{
    // Set up the experiment
    MLContext context = new MLContext();
    uint maxSeconds = 10;
    MulticlassClassificationExperiment experiment = context.Auto().CreateMulticlassClassificationExperiment(maxSeconds);
    // Run the experiment and wait synchronously for it to complete
    ExperimentResult<MulticlassClassificationMetrics> result =
        experiment.Execute(trainingData, validationData, labelColumnName: "RiskCategory");
    // result.BestRun.ValidationMetrics has properties helpful for evaluating model performance
    string confusionTable = result.BestRun.ValidationMetrics.ConfusionMatrix.GetFormattedConfusionTable();
    // Return the best performing trained model
    ITransformer bestModel = result.BestRun.Model;
    return bestModel;
}
The code for using a trained multiclass model is very similar to the code for a binary classification model. As with binary classification, you can also train multiclass models without AutoML.
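ML.NET's MulticlassClassificationMetrics exposes both MicroAccuracy and MacroAccuracy. The hypothetical helper below is a hedged sketch of the difference between them. It works from a confusion matrix indexed as [actual, predicted]. Micro-accuracy pools every prediction. Macro-accuracy averages the per-class accuracy, so rare classes count as much as common ones. Treat it as an illustration rather than a reimplementation of ML.NET's evaluator.

```csharp
using System;

// Hypothetical helper, not an ML.NET API. confusion[actual, predicted] holds row counts.
public static class MulticlassMetricsSketch
{
    // Micro-accuracy: correct predictions divided by all predictions.
    public static double MicroAccuracy(int[,] confusion)
    {
        int classes = confusion.GetLength(0);
        long correct = 0, total = 0;
        for (int a = 0; a < classes; a++)
            for (int p = 0; p < classes; p++)
            {
                total += confusion[a, p];
                if (a == p) correct += confusion[a, p];
            }
        if (total == 0) throw new ArgumentException("The confusion matrix is empty.");
        return (double)correct / total;
    }

    // Macro-accuracy: accuracy within each actual class, averaged over the classes present.
    public static double MacroAccuracy(int[,] confusion)
    {
        int classes = confusion.GetLength(0);
        double sum = 0;
        int presentClasses = 0;
        for (int a = 0; a < classes; a++)
        {
            long rowTotal = 0;
            for (int p = 0; p < classes; p++) rowTotal += confusion[a, p];
            if (rowTotal == 0) continue; // this class never appears as an actual label
            sum += (double)confusion[a, a] / rowTotal;
            presentClasses++;
        }
        return presentClasses == 0 ? 0 : sum / presentClasses;
    }
}
```

Consider three risk categories with the confusion matrix {{8, 2, 0}, {1, 3, 0}, {0, 1, 1}}. Micro-accuracy is 12/16 = 0.75. Macro-accuracy is (0.8 + 0.75 + 0.5) / 3 ≈ 0.683, pulled down by the poorly predicted third class.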
Regression
The regression task involves predicting a numeric value from a set of features. For example, you could use a regression model to predict gasoline prices from a set of other known factors. You could also use one to predict how long it will take to defrost your car in the morning, given the overnight weather. Any time you need to calculate a value, you may be dealing with a regression problem.
The code for training a regression experiment is similar to the code for classification experiments:
public ITransformer PerformRegression(IDataView trainingData, IDataView validationData)
{
    // Set up the experiment
    MLContext context = new MLContext();
    uint maxSeconds = 10;
    RegressionExperiment experiment = context.Auto().CreateRegressionExperiment(maxSeconds);
    // Run the experiment and wait synchronously for it to complete
    ExperimentResult<RegressionMetrics> result =
        experiment.Execute(trainingData, validationData, labelColumnName: "Temperature");
    // result.BestRun.ValidationMetrics has properties helpful for evaluating model performance
    double error = result.BestRun.ValidationMetrics.MeanAbsoluteError;
    // Return the best performing trained model
    ITransformer bestModel = result.BestRun.Model;
    return bestModel;
}
Note that the validation metrics for regression experiments are completely different from those for classification experiments. Classification metrics measure how often the model predicts the correct category. Regression metrics measure the distance between the predicted values and the actual values in known historical data.
As with the two classification model types, you don't have to use AutoML to train regression models. It can still help, though, if your understanding of the individual algorithms is limited.
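The sketch below shows how three common regression metrics are calculated from predicted and actual values. The first is mean absolute error (the MeanAbsoluteError used in the regression code), followed by root mean squared error and R-squared. ML.NET's RegressionMetrics reports these as MeanAbsoluteError, RootMeanSquaredError, and RSquared. The helper class itself is hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not an ML.NET API, computing common regression metrics.
public static class RegressionMetricsSketch
{
    // Average absolute distance between each prediction and the actual value.
    public static double MeanAbsoluteError(double[] actual, double[] predicted)
    {
        Validate(actual, predicted);
        return actual.Zip(predicted, (a, p) => Math.Abs(a - p)).Average();
    }

    // Square root of the average squared distance; penalizes large misses more than MAE does.
    public static double RootMeanSquaredError(double[] actual, double[] predicted)
    {
        Validate(actual, predicted);
        return Math.Sqrt(actual.Zip(predicted, (a, p) => (a - p) * (a - p)).Average());
    }

    // R-squared: 1 minus (residual sum of squares / total sum of squares around the mean).
    public static double RSquared(double[] actual, double[] predicted)
    {
        Validate(actual, predicted);
        double mean = actual.Average();
        double residual = actual.Zip(predicted, (a, p) => (a - p) * (a - p)).Sum();
        double total = actual.Sum(a => (a - mean) * (a - mean));
        return 1 - residual / total;
    }

    private static void Validate(double[] actual, double[] predicted)
    {
        if (actual.Length == 0 || actual.Length != predicted.Length)
            throw new ArgumentException("Inputs must be non-empty and the same length.");
    }
}
```

Take actual temperatures {20, 22, 25, 30} and predictions {21, 22, 23, 31}. MAE is 1.0, RMSE is √1.5 ≈ 1.22, and R² is 1 - 6/56.75 ≈ 0.894. RMSE weighs the single 2-degree miss more heavily than MAE does.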
Recommendation
The recommendation algorithm is a variant of regression. You provide data about different users and the ratings they have given to products in the past. From such a dataset, a recommendation model can predict how a user would rate items they have never interacted with. It bases that prediction on how similar the user's tastes are to those of other known users. Recommendation models are popular in movie, music, and product recommendation systems. In these systems repeat users are common, and everyone benefits when users find content they like.
AutoML supports recommendation, and the recommendation code is very similar to the regression code:
public ITransformer PerformRecommendation(IDataView trainingData, IDataView validationData)
{
    // Set up the experiment
    MLContext context = new MLContext();
    uint maxSeconds = 10;
    RecommendationExperiment experiment = context.Auto().CreateRecommendationExperiment(maxSeconds);
    // Recommendation needs user and item ID columns as well as the label ("UserId" and "MovieId" are example names)
    ColumnInformation columns = new ColumnInformation { LabelColumnName = "Rating", UserIdColumnName = "UserId", ItemIdColumnName = "MovieId" };
    // Run the experiment and wait synchronously for it to complete
    ExperimentResult<RegressionMetrics> result = experiment.Execute(trainingData, validationData, columns);
    // result.BestRun.ValidationMetrics has properties helpful for evaluating model performance
    double error = result.BestRun.ValidationMetrics.MeanAbsoluteError;
    // Return the best performing trained model
    ITransformer bestModel = result.BestRun.Model;
    return bestModel;
}
The recommendation algorithm uses matrix factorization, which is a more complex topic. For more details on building recommendation systems without AutoML, see Microsoft's matrix factorization tutorial. There is also an excellent article from Rubik's Code that explores this topic further.
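The sketch below gives a rough idea of what matrix factorization does. It learns a small vector of latent factors for each user and each item, so that their dot product approximates the observed ratings. Training uses plain stochastic gradient descent. This is a simplified illustration with hypothetical names. It omits regularization and other refinements that production trainers typically use.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not ML.NET's matrix factorization trainer.
public static class MatrixFactorizationSketch
{
    // Predicted rating = dot product of a user's factors and an item's factors.
    public static double Predict(double[] userFactors, double[] itemFactors)
    {
        double sum = 0;
        for (int k = 0; k < userFactors.Length; k++) sum += userFactors[k] * itemFactors[k];
        return sum;
    }

    // Fits factors to observed (user, item, rating) triples by SGD on the squared error.
    public static (double[][] Users, double[][] Items) Train(
        IReadOnlyList<(int User, int Item, double Rating)> ratings,
        int userCount, int itemCount, int factors = 2,
        double learningRate = 0.05, int epochs = 500, int seed = 42)
    {
        var rng = new Random(seed);
        double[][] users = RandomFactors(userCount, factors, rng);
        double[][] items = RandomFactors(itemCount, factors, rng);
        for (int epoch = 0; epoch < epochs; epoch++)
        {
            foreach (var (u, i, r) in ratings)
            {
                double error = r - Predict(users[u], items[i]);
                for (int k = 0; k < factors; k++)
                {
                    // Update both vectors using the values from before this step.
                    double pu = users[u][k], qi = items[i][k];
                    users[u][k] += learningRate * error * qi;
                    items[i][k] += learningRate * error * pu;
                }
            }
        }
        return (users, items);
    }

    // Small random starting values in [0, 0.5).
    private static double[][] RandomFactors(int rows, int factors, Random rng)
    {
        var matrix = new double[rows][];
        for (int r = 0; r < rows; r++)
        {
            matrix[r] = new double[factors];
            for (int k = 0; k < factors; k++) matrix[r][k] = rng.NextDouble() * 0.5;
        }
        return matrix;
    }
}
```

After training on the observed triples, Predict(users[u], items[i]) estimates a rating for a user/item pair that never appeared in the training data. That is exactly what a recommender needs.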
Ranking
Ranking is similar to recommendation, but it puts items into a forced order, which makes it suitable for displaying search results. Ranking systems are also well suited to showing an ordered list of suggestions to specific users or groups of users.
The code is similar to what we've seen before, although the validation metrics are very different:
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.AutoML;
using Microsoft.ML.Data;
public ITransformer PerformRanking(IDataView trainingData, IDataView validationData)
{
    // Set up the experiment
    MLContext context = new MLContext();
    uint maxSeconds = 10;
    RankingExperiment experiment = context.Auto().CreateRankingExperiment(maxSeconds);
    // Ranking needs a relevance label column and a group ID column (e.g. one group per search query); the names are examples
    ColumnInformation columns = new ColumnInformation { LabelColumnName = "Label", GroupIdColumnName = "Group" };
    // Run the experiment and wait synchronously for it to complete
    ExperimentResult<RankingMetrics> result = experiment.Execute(trainingData, validationData, columns);
    // result.BestRun.ValidationMetrics has properties helpful for evaluating model performance
    IEnumerable<double> gains = result.BestRun.ValidationMetrics.DiscountedCumulativeGains;
    IEnumerable<double> normalizedGains = result.BestRun.ValidationMetrics.NormalizedDiscountedCumulativeGains;
    // Return the best performing trained model
    ITransformer bestModel = result.BestRun.Model;
    // Optionally evaluate the model yourself: score the validation data first so it has a "Score" column
    IDataView scored = bestModel.Transform(validationData);
    RankingMetrics metrics = context.Ranking.Evaluate(scored, labelColumnName: "Label", rowGroupColumnName: "Group", scoreColumnName: "Score");
    return bestModel;
}
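DiscountedCumulativeGains and NormalizedDiscountedCumulativeGains hold DCG and NDCG values for the top results at several truncation levels. The sketch below computes DCG@k and NDCG@k for a single query. It uses the common textbook form: gain 2^rel - 1 and discount log2(position + 1). ML.NET's exact gain and discount settings may differ, so treat it as illustrative; the class is hypothetical. NDCG divides by the DCG of the ideal ordering, so a perfect ranking scores 1.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not an ML.NET API, for DCG@k and NDCG@k of one query's results.
public static class RankingMetricsSketch
{
    // relevance[i] is the graded relevance of the result the model placed at position i (0-based).
    public static double Dcg(int[] relevance, int k)
    {
        double dcg = 0;
        int cutoff = Math.Min(k, relevance.Length);
        for (int i = 0; i < cutoff; i++)
        {
            double gain = Math.Pow(2, relevance[i]) - 1;  // exponential gain
            double discount = Math.Log(i + 2, 2);         // log2(position + 1) with 1-based position
            dcg += gain / discount;
        }
        return dcg;
    }

    // NDCG@k: the model's DCG divided by the DCG of the best possible ordering.
    public static double Ndcg(int[] relevance, int k)
    {
        int[] ideal = relevance.OrderByDescending(r => r).ToArray();
        double idealDcg = Dcg(ideal, k);
        return idealDcg == 0 ? 0 : Dcg(relevance, k) / idealDcg;
    }
}
```

Suppose a query's results, in the model's order, have relevance grades {3, 2, 3, 0, 1}. NDCG@5 comes out to about 0.957 rather than 1, because some highly relevant results sit below less relevant ones.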
Other solution types
Next, let's briefly introduce five machine learning tasks that AutoML currently does not support.
Time series forecasting
Forecasting involves predicting future numeric values from historical data. When you forecast, you predict values over a future window, and each predicted value comes with a certain level of confidence.
This is similar to how weather forecasts work. Weather forecasts are most accurate for the near future, where there is a large amount of relevant historical data. They can predict values further into the future, but their accuracy drops significantly as the time horizon extends.
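ML.NET's own forecasting uses singular spectrum analysis (ForecastBySsa in Microsoft.ML.TimeSeries). The sketch below swaps in a much simpler stand-in that still shows confidence widening with the horizon: the naive forecast. Every future value is predicted to equal the last observation. Under a random-walk assumption, the approximate 95% interval grows with the square root of the number of steps ahead. The class name is hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not an ML.NET API: naive forecasting with intervals that
// assume the series behaves like a random walk.
public static class NaiveForecastSketch
{
    public static (double Forecast, double Lower, double Upper)[] Forecast(double[] history, int horizon)
    {
        if (history.Length < 2) throw new ArgumentException("Need at least two observations.", nameof(history));
        if (horizon < 1) throw new ArgumentOutOfRangeException(nameof(horizon));
        // One-step changes play the role of the naive method's residuals.
        double[] steps = new double[history.Length - 1];
        for (int t = 1; t < history.Length; t++) steps[t - 1] = history[t] - history[t - 1];
        // Root-mean-square of those changes estimates the one-step-ahead uncertainty.
        double sigma = Math.Sqrt(steps.Select(s => s * s).Average());
        double last = history[history.Length - 1];
        var forecasts = new (double Forecast, double Lower, double Upper)[horizon];
        for (int h = 1; h <= horizon; h++)
        {
            // Under a random walk, uncertainty grows with the square root of the horizon.
            double halfWidth = 1.96 * sigma * Math.Sqrt(h);
            forecasts[h - 1] = (last, last - halfWidth, last + halfWidth);
        }
        return forecasts;
    }
}
```

For the history {10, 12, 11, 13}, every forecast is 13. The interval four steps ahead is twice as wide as the one-step interval, mirroring how weather forecasts grow less certain further out.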
Clustering
Clustering groups data points according to their similarity to nearby data points. In marketing, this can identify customers who are similar to each other; it can also group suggestions or serve other purposes. With geographic data, it is also a good way to choose the best locations for offices or cell towers.
Cluster analysis usually starts by choosing the number of clusters. A machine learning algorithm such as K-Means then optimizes the center of each cluster to minimize the total squared distance from each data point to its cluster center. Clustering algorithms also tend to keep clusters well separated from each other when possible.
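The K-Means procedure described above is commonly implemented as Lloyd's algorithm. The sketch below alternates between assigning each 2D point to its nearest center and moving each center to the mean of its assigned points. Each pass never increases the total squared distance. For simplicity it uses the first k points as starting centers; production implementations typically use smarter initialization such as k-means++. The names are hypothetical, and this is not ML.NET's KMeansTrainer.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not ML.NET's KMeansTrainer: Lloyd's algorithm on 2D points.
public static class KMeansSketch
{
    public static (int[] Assignments, (double X, double Y)[] Centroids) Cluster(
        (double X, double Y)[] points, int k, int maxIterations = 100)
    {
        if (k < 1 || k > points.Length) throw new ArgumentOutOfRangeException(nameof(k));
        // Simplification: start from the first k points as cluster centers.
        var centroids = points.Take(k).ToArray();
        var assignments = new int[points.Length];
        for (int iteration = 0; iteration < maxIterations; iteration++)
        {
            bool changed = false;
            // Assignment step: attach each point to its nearest center (squared Euclidean distance).
            for (int i = 0; i < points.Length; i++)
            {
                int nearest = 0;
                double nearestDistance = double.MaxValue;
                for (int c = 0; c < k; c++)
                {
                    double dx = points[i].X - centroids[c].X, dy = points[i].Y - centroids[c].Y;
                    double distance = dx * dx + dy * dy;
                    if (distance < nearestDistance) { nearestDistance = distance; nearest = c; }
                }
                if (assignments[i] != nearest) { assignments[i] = nearest; changed = true; }
            }
            // Update step: move each center to the mean of the points assigned to it.
            for (int c = 0; c < k; c++)
            {
                var members = points.Where((p, i) => assignments[i] == c).ToArray();
                if (members.Length > 0) centroids[c] = (members.Average(p => p.X), members.Average(p => p.Y));
            }
            // Stop once no assignment changed (after at least one update step has run).
            if (!changed && iteration > 0) break;
        }
        return (assignments, centroids);
    }
}
```

Run on two well-separated groups of three points each, the algorithm converges after a couple of passes. The result has one center near (1/3, 1/3) and the other near (31/3, 31/3).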
Anomaly detection
Anomaly detection can flag individual transactions as anomalies for further investigation. It is commonly used for virus detection, credit card fraud detection, and identifying unusual network activity. You can think of anomaly detection as an automatic form of binary classification, in which something is either normal or abnormal.
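ML.NET offers dedicated anomaly detection components, such as the randomized PCA trainer and several time-series spike detectors. The sketch below swaps in a far simpler technique, a z-score rule, purely to show the "normal or abnormal" idea. A value is flagged when it lies more than a chosen number of standard deviations from the mean. The class name and the default threshold are assumptions.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not an ML.NET API: flags values whose z-score exceeds a threshold.
public static class ZScoreAnomalySketch
{
    public static bool[] Detect(double[] values, double threshold = 3.0)
    {
        if (values.Length == 0) return Array.Empty<bool>();
        double mean = values.Average();
        // Population standard deviation of the values.
        double std = Math.Sqrt(values.Select(v => (v - mean) * (v - mean)).Average());
        // If every value is identical, nothing stands out.
        if (std == 0) return new bool[values.Length];
        return values.Select(v => Math.Abs(v - mean) / std > threshold).ToArray();
    }
}
```

With only ten values, a single extreme value inflates the standard deviation so much that its z-score cannot exceed √9 = 3 (using the population standard deviation). A threshold below 3 is therefore needed here. For example, Detect(new double[] { 20, 22, 19, 21, 20, 23, 18, 21, 20, 500 }, 2.5) flags only the last value.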
Image classification
Image classification is similar to binary or multiclass classification, but it processes images instead of numeric features to determine what a given image contains. As with other classification problems, you must provide ML.NET with labeled images that contain what you are trying to detect. These images should vary in size, lighting, and arrangement so that the model can classify new images reliably.
Object detection
Object detection is similar to image classification, but it does not simply tell you which class an image belongs to. Instead, it gives you bounding boxes within the image showing where specific objects are located. Object detection can also locate multiple objects in a single image, which goes beyond what image classification can do.
Object detection is part of Azure Cognitive Services. At present, it can only be used in ML.NET through Model Builder.
Conclusion
In short, ML.NET's AutoML feature is an amazing and completely free way to help everyday programmers use capabilities that usually require a data scientist. ML.NET lets you and your team integrate machine learning into your applications in a language you already know, without needing a deep understanding of individual machine learning algorithms.