Background
Since 2015, the car-sharing industry has flourished, and many projects have received large rounds of financing. However, because of the asset-heavy business model, high operating costs, and the inability to turn a profit, one car-sharing company after another has shut down when its capital chain broke. According to the 2019 white paper on the innovation of China's shared car platforms released by Analysys, 2019 was a year in which small and medium-sized participants kept dropping out of the shared car industry while the leading platforms drove it back to growth. The growth rate of shared cars reached 2.21% from May to October 2019, surpassing online car hailing and online car rental.
In the past, a car was a big-ticket purchase. Now, because models are updated at a dazzling pace and because of the impact of the epidemic, consumers take longer to reach a decision. They still want to use cars and to upgrade their consumption, but they are becoming more discerning and want to know whether there is a lighter, better way to use a car. The "time-sharing rental" model of shared cars largely addresses this need. However, the model involves too many cost items that have to be kept under control, which makes it very difficult to turn a profit.
Phase I problem
The attachment is a location data set for shared vehicles. It provides location information such as time, longitude and latitude, as well as the number and list of vehicles parked at each parking spot. Please establish a mathematical model to analyze the distribution of shared vehicles in the city, and formulate the shared vehicle scheduling scheme that is most beneficial to the enterprise.
Data processing
The data set in the attachment is very large, with a sample size of 1.04 million records, so it has to be preprocessed first.
1. Screen out all parking spots
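# Assumes `data` (the positioning records) is already loaded in the workspace
# Build one "latitude+longitude" key per record so identical coordinates collapse
chars <- paste(data$latitude, "+", data$longitude, sep = "")
red.chars <- unique(chars)
# Split every key back into its two parts: one row per parking spot
dt <- do.call(rbind, strsplit(red.chars, "[+]"))
location <- data.frame(latitude = as.numeric(dt[, 1]), longitude = as.numeric(dt[, 2]))
setwd("C:/Users/Administrator/Desktop")
# row.names = FALSE keeps the file to the two coordinate columns
write.csv(location, "location.csv", row.names = FALSE)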
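The deduplication idea behind this step can be checked on a few rows before running it on the full file. The sketch below uses made-up coordinates (not values from the attachment) and base R's `unique()` on a data frame, which is one possible alternative to building string keys; the column names `latitude` and `longitude` are assumed to match the data set.

```r
# Toy positioning records (made-up values, not from the attachment)
toy <- data.frame(
  latitude  = c(30.10, 30.10, 30.25, 30.25, 30.40),
  longitude = c(120.10, 120.10, 120.30, 120.30, 120.55)
)

# unique() on a data frame drops repeated rows, leaving one row per parking spot
spots <- unique(toy[, c("latitude", "longitude")])
rownames(spots) <- NULL  # renumber the rows after dropping duplicates

print(spots)
```

Five records collapse to three distinct parking spots here, and the order of first appearance is preserved.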
2. Divide the parking spots into areas
The parking spots are grouped into 6 areas by K-means clustering (the number of clusters is set to 6 directly in order to simplify the model established later).
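location <- read.csv("C:/Users/Administrator/Desktop/location.csv")
# kmeans starts from random centers; the seed value is arbitrary and only makes the labels repeatable
set.seed(1)
# Cluster on the two coordinate columns only
loc.kmeans <- kmeans(location[, c("latitude", "longitude")], 6)
setwd("C:/Users/Administrator/Desktop")
# One cluster label per row of location.csv, in the same order
write.csv(loc.kmeans$cluster, "lable.csv", row.names = FALSE)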
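The number of clusters is fixed at 6 here for simplicity. As a hedged illustration of how that choice could be sanity-checked, the sketch below runs `kmeans` for several values of k on synthetic points and compares the total within-cluster sum of squares. The toy data, the seed, and the use of `nstart = 10` are assumptions made for this example, not part of the original analysis.

```r
set.seed(42)  # arbitrary seed so the toy example is repeatable

# Three made-up groups of parking spots (synthetic, not the attachment data)
centers <- rbind(c(30.1, 120.1), c(30.3, 120.4), c(30.5, 120.2))
toy <- do.call(rbind, lapply(1:3, function(g) {
  cbind(latitude  = rnorm(50, centers[g, 1], 0.01),
        longitude = rnorm(50, centers[g, 2], 0.01))
}))

# Total within-cluster sum of squares for k = 1..6;
# nstart = 10 reruns each fit from 10 random starts and keeps the best one
wss <- sapply(1:6, function(k) kmeans(toy, centers = k, nstart = 10)$tot.withinss)
print(round(wss, 4))

# Fit with the k at which the curve flattens (3 for this toy data)
fit <- kmeans(toy, centers = 3, nstart = 10)
print(table(fit$cluster))
```

On real coordinates the curve rarely has such a sharp bend, so a fixed k chosen for modelling convenience, as in the text, is a defensible simplification.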
3. Label the attachment data according to the divided areas
data <- read.csv("C:/Users/Administrator/Desktop/Share car positioning data.csv")
location <- read.csv("C:/Users/Administrator/Desktop/location.csv")
label <- read.csv("C:/Users/Administrator/Desktop/lable.csv")
# The cluster labels sit in the last column (an earlier column may hold row names)
label.vec <- label[[ncol(label)]]
# Same "latitude+longitude" key for every record and for every parking spot
location.chars <- paste(data$latitude, "+", data$longitude, sep = "")
location.char <- paste(location$latitude, "+", location$longitude, sep = "")
# match() gives, for each record, the row of its parking spot in location.csv
data$label <- label.vec[match(location.chars, location.char)]
save(data, file = "C:/Users/Administrator/Desktop/data.Rdata")
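Labeling the records is a key lookup: every record takes the area label of the parking spot with the same coordinates. A minimal sketch on made-up data, using base R's `match()` so that the lookup is vectorized over all records; the data frames `spots` and `records` are hypothetical stand-ins for the location table and the positioning data.

```r
# Made-up lookup table: one row per parking spot with its area label
spots <- data.frame(latitude  = c(30.1, 30.2, 30.3),
                    longitude = c(120.1, 120.2, 120.3),
                    label     = c(1, 2, 1))

# Made-up positioning records that revisit those spots
records <- data.frame(latitude  = c(30.2, 30.1, 30.2, 30.3),
                      longitude = c(120.2, 120.1, 120.2, 120.3))

# Build the same text key on both sides
spot.key   <- paste(spots$latitude, spots$longitude, sep = "+")
record.key <- paste(records$latitude, records$longitude, sep = "+")

# match() returns the row of `spots` for each record (NA if the spot is unknown)
records$label <- spots$label[match(record.key, spot.key)]
print(records)
```

A single `match()` call avoids scanning the full record table once per parking spot, which matters with about a million records.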
4. Count the number of vehicles lent and returned in each region
First, strip the brackets from the vehicle ID lists in the attachment data, so that the IDs are easier to count later.
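# Assumes `data` (with carsList and total_cars) is already loaded in the workspace
id <- as.character(data$carsList)
n <- length(id)
m <- max(data$total_cars)  # the largest number of cars seen at one spot
dt <- matrix(NA, n, m)     # one row per record, one column per car slot
for (i in seq_len(n)) {
  # Drop the first and last character, i.e. the surrounding brackets
  index <- substring(id[i], 2, nchar(id[i]) - 1)
  tempchar <- as.numeric(strsplit(index, "[,]")[[1]])
  if (length(tempchar) != 0) { dt[i, 1:length(tempchar)] <- tempchar }
}
car.id <- dt
car.id[is.na(car.id)] <- 0  # fill unused slots with 0
save(car.id, file = "C:/Users/Administrator/Desktop/carid.Rdata")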
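Once the brackets are removed, the ID lists can be compared across time. The sketch below shows one possible way to do this, under the assumption that the strings are consecutive snapshots of the same parking spot: a car that disappears between two snapshots is counted as lent, and a car that appears is counted as returned. The strings and the helper `parse_ids` are invented for illustration and are not taken from the attachment.

```r
# Made-up carsList strings for one parking spot at three consecutive times
snapshots <- c("[101,102,103]", "[101,103]", "[101,103,207]")

# Hypothetical helper: strip the brackets, split on commas, convert to numbers
parse_ids <- function(s) {
  inner <- gsub("\\[|\\]", "", s)
  as.numeric(strsplit(inner, ",")[[1]])
}
ids <- lapply(snapshots, parse_ids)

steps <- 2:length(ids)
# Cars present at the previous time but missing now are counted as lent
lent <- sapply(steps, function(t) length(setdiff(ids[[t - 1]], ids[[t]])))
# Cars present now but absent at the previous time are counted as returned
returned <- sapply(steps, function(t) length(setdiff(ids[[t]], ids[[t - 1]])))

print(data.frame(step = steps, lent = lent, returned = returned))
```

Summing such counts over all parking spots that share an area label would give per-region totals, which is the quantity this step is after.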