Detailed description of common input and output of Python transformer tokenizer

Tokenizer introduction and workflow: Transformers and the pre-training plus fine-tuning paradigm built on the BERT family have become standard practice in NLP. The tokenizer, as the main means of preprocessing text data, has become an essential tool. This article takes the AutoTokenizer used in transformers as an example to illustrat ...

Posted by Wetzut on Tue, 04 Jan 2022 16:46:24 +0100

Chinese character and number recognition of ID card based on Python and CNN

Background and objectives: Optical character recognition (OCR) is the process of converting handwritten or printed text in an image into machine-encoded text, recovering both the text and the layout information in the image. Its purpose is to recognize the words in a picture for further text processing. The earliest OCR technology can be traced back to 1914. E ...

Posted by alconebay on Tue, 04 Jan 2022 15:23:07 +0100

Common visualization plots: scatter plot

Common visualization plots (V): scatter plots. I. Introduction to scatter plots: A scatter plot, also called an X-Y chart, displays all data as points in a rectangular coordinate system to show the degree of interaction between variables. The position of each point is determined by the values of the variables. By observing the distribution ...

Posted by ssruprai on Tue, 04 Jan 2022 12:13:15 +0100

[source code analysis] machine learning parameter server PS Lite ----- application node implementation

[Source code analysis] machine learning parameter server PS Lite (4) -- application node implementation. 0x00 Summary: This is the fourth article in the parameter server series; it introduces kvworker and kvserver. Other articles in this series are: [Source code analysis] machine learning parameter server PS Lite (1) -- postoffice [Source code ...

Posted by M4F on Tue, 04 Jan 2022 09:24:14 +0100

Principal Component Analysis in Machine Learning: PCA Principle and Python Implementation

Reference link: Mathematical Principles of PCA. Reference link: Understanding PCA and SVD with numpy. Preface: It is well known that the complexity of many machine learning algorithms is closely tied to the dimensionality of the data, in some cases growing exponentially with it. It is not uncommon to process tens of thousands or even hundreds of thousands of dime ...

Posted by delldeveloper on Tue, 04 Jan 2022 04:31:07 +0100

Technical / advertising article classifier

Preface: This article builds on the previous post, Technical / advertising article classifier (I), and applies several optimizations that raise the accuracy from 84.5% to 94.4%. I. Optimization methods 1. Add training data: In the previous training data set, there were only about 500 samples for each of the two classes, and there were ...

Posted by Mateobus on Tue, 04 Jan 2022 04:00:25 +0100

Python machine learning -- clustering algorithm -- K-means algorithm

Type and introduction of the K-means algorithm: K-means is a clustering algorithm, and clustering is a form of unsupervised learning. Definition of the K-means algorithm: The clustering problem is, given an element set D in which each element has n observable attributes, to use some algori ...

Posted by skypilot on Tue, 04 Jan 2022 02:32:51 +0100

Eye State Recognition in Deep Learning & Drawing of Confusion Matrix

This experiment uses a CNN that we built ourselves to classify eye state. The original plan was to apply transfer learning with the VGG16 network, but the results were very poor and it ran very slowly; this is probably the blogger's own mistake. However, the model of the self-built CNN is very ...

Posted by gabrielkolbe on Mon, 03 Jan 2022 22:35:26 +0100

Learning notes - Notes on data processing by pandas and df (Introduction)

pandas is an essential toolkit for data processing; these notes record the learning process. The default object pandas works with is the DataFrame, which can be loaded once pandas is installed. Contents: I. Interpreting the code examples (1) pd.read_csv: pitfalls and common parameters when reading CSV files with pandas (2) The entire dataset cannot be ...

Posted by roy on Mon, 03 Jan 2022 19:08:12 +0100

sklearn Machine Learning

Task07: This study follows the Datawhale open-source course (https://github.com/datawhalechina/machine-learning-toy-code/tree/main/ml-with-sklearn). The content is organized as follows and consists mainly of code implementations with some theory. 7. Ensemble learning: In the previous chapter, we talked about the decline in the effect of the dimensi ...

Posted by spiceydog on Mon, 03 Jan 2022 18:05:19 +0100