Data Mining [Page 9] - Programmer Think - where programmers share thinking

Data Mining

Python topic 9: Advanced application of the commonly used third-party library jieba: common and unique

There are two text documents, extracted from the 2019 and 2018 government work reports. The task is to count the ten most frequent words in each file as subject words, and each word must be at least 2 characters long. Output example: 2019: Reform: 10, enterprise: 9, ..., deepening: 2 Then, d ...

Posted by coderWil on Sun, 05 Dec 2021 04:16:56 +0100
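
The counting task described in the excerpt above can be sketched roughly as follows. This is a minimal sketch, not the post's own code: the function names `top_words` and `common_and_unique` and the toy token lists are hypothetical, and the input is assumed to be already segmented into tokens (for Chinese text, `jieba.lcut(text)` would typically produce such a list).

```python
from collections import Counter


def top_words(tokens, n=10, min_len=2):
    """Return the n most frequent tokens that have at least min_len characters."""
    counts = Counter(t for t in tokens if len(t) >= min_len)
    return counts.most_common(n)


def common_and_unique(top_a, top_b):
    """Split two top-word lists into shared words and words unique to each."""
    words_a = {w for w, _ in top_a}
    words_b = {w for w, _ in top_b}
    return words_a & words_b, words_a - words_b, words_b - words_a


# Hypothetical toy token lists standing in for the two segmented reports.
tokens_2019 = ["reform", "reform", "enterprise", "a", "deepening", "reform"]
tokens_2018 = ["reform", "innovation", "innovation", ",", "enterprise"]

top_2019 = top_words(tokens_2019)
top_2018 = top_words(tokens_2018)
common, only_2019, only_2018 = common_and_unique(top_2019, top_2018)

print("2019:", ", ".join(f"{w}: {c}" for w, c in top_2019))
print("common:", sorted(common))
print("only 2019:", sorted(only_2019))
print("only 2018:", sorted(only_2018))
```

Filtering by length before counting keeps single-character tokens and punctuation out of the ranking, and the set operations give the "common" and "unique" words directly.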

7 Python toolkits for time series prediction: there is always one for you

Welcome to follow me: I work in the IT industry and focus on Python! Time series problems are among the most difficult in data science. Traditional methods such as ARIMA and SARIMA work well, but they struggle to give satisfactory predictions on nonlinear or non-stationary time series. ...

Posted by pelegk2 on Fri, 03 Dec 2021 22:55:19 +0100

[Data analysis and visualization] Key points of data plotting 4 - problems with pie charts

This article looks at the most criticized chart type in history: the pie chart. Bad definition. A pie chart is a circle divided into several parts, each representing a part of the whole. It is usually used to display percentages, where the sectors sum to 100%. The problem i ...

Posted by chomps on Wed, 01 Dec 2021 14:36:56 +0100

Data analysis practice - house price prediction, with a detailed explanation of the code (Kaggle competition)

The data set comes from the following website: House Prices - Advanced Regression Techniques | Kaggle (Predict sales prices and practice feature engineering, RFs, and gradient boosting): https://www.kaggle.com/c/house-prices-advanced-regression-techniques Preface: This article is split into two parts, because a single one would be too long and inconvenient. T ...

Posted by dough boy on Sun, 28 Nov 2021 11:34:00 +0100

In-depth inventory: 15 important scikit-learn skills that beginners need

scikit-learn is a great Python library for implementing machine learning models and statistical modeling. With it, we can implement various regression, classification, and clustering models; it also provides dimensionality reduction, feature selection, feature extraction, ensemble techniques, and built ...

Posted by buluk21 on Sun, 28 Nov 2021 10:07:58 +0100

Plot + pandas + sklearn: firing the first shot at Kaggle

Official account: Special House. Author: Peter. Editor: Peter. Hello, I'm Peter~ Many readers have asked me: are there any good case studies in data analysis and data mining? The answer is yes, of course: they are all on Kaggle. You just have to spend time studying them and even entering competitions. Peter has no competition experience, but he often goes to Kag ...

Posted by buddhika2010 on Fri, 26 Nov 2021 15:59:17 +0100

Data mining - data exploration (EDA)

I. Introduction The first step in a data mining competition is not to decide which model and method to use, but to understand the background of the competition. You need to read the official competition description carefully, and then consult relevant materials to gain an in-depth understanding of the problem scenario. We may have a g ...

Posted by zaki on Fri, 19 Nov 2021 21:23:39 +0100

kmeans algorithm and its optimization

kmeans clustering algorithm. Algorithm principle: The principle of kmeans is actually very simple. I use the simplest two-dimensional scatter plot to explain it. As shown in the figure above, we can see intuitively that the points fall into two groups, represented by red dots and blue dots respectively. Let ...

Posted by kark_1999 on Fri, 12 Nov 2021 23:20:12 +0100
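
As a rough illustration of the idea in the excerpt above (points in a two-dimensional scatter plot falling into two groups), here is a minimal sketch of standard k-means (Lloyd's algorithm). It is not the post's code: the names `sq_dist` and `kmeans`, the toy points, and the choice of the first and last point as initial centers are all assumptions made for illustration. Real implementations usually pick the initial centers at random or with a scheme such as k-means++.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, init_centers, iters=100):
    """Plain k-means on 2-D points, starting from the given centers."""
    centers = list(init_centers)
    labels = []
    for _ in range(iters):
        # Assignment step: each point takes the index of its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its assigned points.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean_x = sum(x for x, _ in members) / len(members)
                mean_y = sum(y for _, y in members) / len(members)
                new_centers.append((mean_x, mean_y))
            else:
                new_centers.append(old)  # an empty cluster keeps its center
        if new_centers == centers:  # no center moved, so the result is stable
            break
        centers = new_centers
    return centers, labels


# Hypothetical toy data: two visibly separate groups of points.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centers, labels = kmeans(points, init_centers=[points[0], points[-1]])
print(labels)
print(centers)
```

The loop alternates between assigning points to the nearest center and recomputing each center as the mean of its points, and stops once the centers no longer change.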

EDA and data mining: analysis of ratings and box office of Marvel and DC films

Which is better, MCU or DC? Which movies have higher ratings? This article analyzes Marvel and DC films based on total box office and ratings. Which is better, the Marvel Cinematic Universe or the DC Universe? It's an endless debate, isn't it? Criticize any of these movies and the fans go crazy. In this article, we will compare Marvel and DC according ...

Posted by Stryker250 on Fri, 12 Nov 2021 09:37:07 +0100

Python Bayesian probability inference for sequence data: visualizing the prior, likelihood, and posterior

Original link: http://tecdat.cn/?p=24191 In this article, I will focus on an example of inferring a probability from a short data sequence. I will first introduce the theory of how to use the Bayesian method for this kind of inference, and then implement the theory in Python so that we can work with these ideas. In order to make the article easier ...

Posted by marowa on Wed, 10 Nov 2021 11:09:21 +0100
