Detailed description of common input and output of Python transformer tokenizer
Tokenizer introduction and workflow
Transformers and the pre-training + fine-tuning paradigm based on the BERT family have become standard in the NLP field. The tokenizer, as the main means of preprocessing text data, has become an essential tool. This article takes the AutoTokenizer used in transformers as an example to illustrat ...
Posted by Wetzut on Tue, 04 Jan 2022 16:46:24 +0100
Chinese character and number recognition of ID card based on Python and CNN
Background and objectives
Optical character recognition (OCR) is the process of converting handwritten or printed text in an image into machine-encoded text, recovering both the text and the layout information in the image. Its purpose is to recognize the words in the picture for further text processing.
The earliest OCR technology can be traced back to 1914. E ...
Posted by alconebay on Tue, 04 Jan 2022 15:23:07 +0100
Common visualization plots: scatter
Common visualization plots (V): scatter plots
I. Introduction to scatter plots
A scatter chart, also called an X-Y chart, displays every data point on a rectangular coordinate system to show how strongly the variables are related. The position of each point is determined by the values of the variables.
By observing the distribution ...
Posted by ssruprai on Tue, 04 Jan 2022 12:13:15 +0100
[source code analysis] machine learning parameter server PS Lite ----- application node implementation
[source code analysis] machine learning parameter server PS Lite (4) -- application node implementation
0x00 summary
This is the fourth article in the parameter server series. It introduces kvworker and kvserver.
Other articles in this series are:
[Source code analysis] machine learning parameter server PS Lite (1) -- postoffice
[ Source code ...
Posted by M4F on Tue, 04 Jan 2022 09:24:14 +0100
Principal Component Analysis in Machine Learning: PCA Principles and Python Implementation
Reference link: Mathematical Principles of PCA
Reference link: Understanding PCA and SVD with numpy
Preface
It is well known that the complexity of many machine learning algorithms is closely tied to the dimensionality of the data, in some cases growing exponentially with it. It is not uncommon to process tens of thousands or even hundreds of thousands of dime ...
Posted by delldeveloper on Tue, 04 Jan 2022 04:31:07 +0100
Technical / advertising article classifier
Preface
This article builds on the previous post, Technical / advertising article classifier (I). Several optimizations raise the accuracy from 84.5% to 94.4%.
1, Optimization means
1. Add training data
The previous training data set contained only about 500 samples for each of the two classes, and there were ...
Posted by Mateobus on Tue, 04 Jan 2022 04:00:25 +0100
Python machine learning -- clustering algorithm -- K-means algorithm
Types and introduction of K-means algorithm
Clustering is a form of unsupervised learning, and K-means is a clustering algorithm.
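As a rough illustration of how an unsupervised clustering algorithm such as K-means groups unlabeled points, here is a minimal pure-Python sketch. It is not taken from the article: the function names (k_means, squared_distance, mean_point), the sample data, and the choice to seed the centroids with the first k points are all assumptions made for illustration. Library implementations typically use random or k-means++ initialization instead.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iter=100):
    """Cluster points into k groups by alternating assign/update steps.

    Assumption for this sketch: the first k points serve as the
    initial centroids, which keeps the example deterministic.
    """
    centroids = list(points[:k])
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = mean_point(members)
    return centroids, labels


# Hypothetical sample data: two well-separated groups, interleaved.
data = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.8, 1.1), (7.9, 8.3)]
centroids, labels = k_means(data, k=2)
print(labels)     # [0, 1, 0, 1, 0, 1]
print(centroids)
```

No labels are supplied anywhere in the input, which is what makes the procedure unsupervised: the grouping comes only from the distances between points.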
Definition of K-means algorithm
The clustering problem is stated as follows: given a set of elements D, in which each element has n observable attributes, use some algori ...
Posted by skypilot on Tue, 04 Jan 2022 02:32:51 +0100
Eye State Recognition in Deep Learning & Drawing of Confusion Matrix
This experiment uses a CNN that we built ourselves to classify eye state. The original plan was to use transfer learning with the VGG16 network for classification, but the results were very poor and it ran very slowly, which is probably the blogger's own fault. However, the CNN model we built ourselves is very ...
Posted by gabrielkolbe on Mon, 03 Jan 2022 22:35:26 +0100
Learning notes - Notes on data processing by pandas and df (Introduction)
pandas is an essential toolkit for data processing. These notes record the learning process.
The default object pandas works with is the DataFrame. Import pandas after installing it.
Contents
1, Interpretation of the meaning of code examples
(1) pd.read_csv: pitfalls and common parameters when reading CSV files with pandas
(2) The entire dataset cannot be ...
Posted by roy on Mon, 03 Jan 2022 19:08:12 +0100
sklearn Machine Learning
Task07. This study follows the Datawhale open-source learning material (https://github.com/datawhalechina/machine-learning-toy-code/tree/main/ml-with-sklearn). The content is arranged as follows, mainly code implementations along with some of the underlying principles.
7. Ensemble Learning
In the previous chapter, we talked about the decline in the effect of the dimensi ...
Posted by spiceydog on Mon, 03 Jan 2022 18:05:19 +0100