Algorithmic model mining tags for user portraits

Posted by 90Nz0 on Sun, 30 Jan 2022 20:19:59 +0100

RFM user value model

1 requirements

Suppose I am a marketer. Before running a campaign, I would probably ask myself the following questions:
Who are my most valuable customers?
Who has the potential to become a valuable customer?
Who is about to churn?
Who can be retained?
Who is interested in this campaign?
All of these questions revolve around a single theme: value.
RFM is one of the most common tools for evaluating value and potential value.

2 what is RFM

RFM evaluates a person's value to the company through three measures: the time since the last purchase, the purchase frequency per unit of time, and the average amount spent. RFM can be understood as a combined value: RFM = Recency, Frequency and Monetary.
The RFM model rests on the following assumptions:
The more recent the last purchase, the more interested the user is likely to be in a promotion
The higher the purchase frequency, the more satisfied the user is likely to be with us
The larger the amount spent, the greater the user's spending power
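
To make the three measures concrete, here is a minimal sketch in plain Scala (no Spark) that derives raw R, F and M values from a list of orders. The `Order` and `Rfm` case classes and the `rfm` function are hypothetical names introduced only for this illustration, and M is taken as the average amount per order, following the definition above (many RFM variants use the total amount instead).

```scala
import java.time.LocalDate

object RfmRaw {
  // Hypothetical order record used only for this illustration
  final case class Order(userId: String, date: LocalDate, amount: Double)
  final case class Rfm(recencyDays: Long, frequency: Int, monetary: Double)

  // Compute raw R, F and M per user over the orders that were passed in
  def rfm(orders: Seq[Order], today: LocalDate): Map[String, Rfm] =
    orders.groupBy(_.userId).map { case (user, userOrders) =>
      val lastOrderDay = userOrders.map(_.date.toEpochDay).max
      user -> Rfm(
        recencyDays = today.toEpochDay - lastOrderDay,             // R: days since the last purchase
        frequency   = userOrders.size,                             // F: number of purchases in the window
        monetary    = userOrders.map(_.amount).sum / userOrders.size // M: average amount per purchase
      )
    }
}
```

These raw values still live on very different scales, which is why they are converted to 1-5 scores before clustering.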

3 practical application of RFM

4 high dimensional spatial model

5 unify dimensions through scoring

R: 1-3 days = 5 points, 4-6 days = 4 points, 7-9 days = 3 points, 10-15 days = 2 points, 16 days or more = 1 point
F: ≥ 200 = 5 points, 150-199 = 4 points, 100-149 = 3 points, 50-99 = 2 points, 1-49 = 1 point
M (w = 10,000): ≥ 20w = 5 points, 10-19w = 4 points, 5-9w = 3 points, 1-4w = 2 points, < 1w = 1 point

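import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, when}
import spark.implicits._ // enables the 'r symbol syntax; assumes a SparkSession named spark

// The three scores below use three equivalent ways of writing a range condition.
// Recency score: Symbol columns combined with the `and` method
val rScore: Column = when('r >= 1 and 'r <= 3, 5)
  .when('r >= 4 and 'r <= 6, 4)
  .when('r >= 7 and 'r <= 9, 3)
  .when('r >= 10 and 'r <= 15, 2)
  .when('r >= 16, 1)
  .as("r_score")

// Frequency score: col(...) combined with the && operator
val fScore: Column = when(col("f") >= 200, 5)
  .when((col("f") >= 150) && (col("f") <= 199), 4)
  .when((col("f") >= 100) && (col("f") <= 149), 3)
  .when((col("f") >= 50) && (col("f") <= 99), 2)
  .when((col("f") >= 1) && (col("f") <= 49), 1)
  .as("f_score")

// Monetary score: between(lower, upper) is inclusive on both ends
val mScore: Column = when(col("m") >= 200000, 5)
  .when(col("m").between(100000, 199999), 4)
  .when(col("m").between(50000, 99999), 3)
  .when(col("m").between(10000, 49999), 2)
  .when(col("m") <= 9999, 1)
  .as("m_score")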

6 model training and prediction

RFMTrainModel trains the model and saves it to HDFS. It is scheduled to run once a month.
RFMPredictModel loads the clustering model from HDFS and predicts over the whole data set. It is scheduled to run once a day.

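def process(source: DataFrame): DataFrame = {
  val assembled = assembleDataFrame(source)

  // KMeans is a clustering estimator, so it is named kmeans rather than regressor
  val kmeans = new KMeans()
    .setK(7)
    .setSeed(10)
    .setMaxIter(10)
    .setFeaturesCol("features")
    .setPredictionCol("predict")

  // overwrite() lets the monthly run replace the previous model;
  // a plain save() fails if MODEL_PATH already exists
  kmeans.fit(assembled).write.overwrite().save(MODEL_PATH)

  // The training job only persists the model and returns no DataFrame
  null
}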

// RFMPredictModel: assemble the features and load the trained model
val assembled = RFMModel.assembleDataFrame(source)

val kmeans = KMeansModel.load(RFMModel.MODEL_PATH)
val predicted = kmeans.transform(assembled)

// Map the cluster indices produced by KMeans to the value rules:
// sum the coordinates of each cluster center and sort in descending order
val sortedCenters: IndexedSeq[(Int, Double)] = kmeans.clusterCenters.indices
  .map(i => (i, kmeans.clusterCenters(i).toArray.sum))
  .sortBy(c => c._2).reverse

// toDF on a local collection requires import spark.implicits._
val sortedDF = sortedCenters.toDF("index", "totalScore")
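
KMeans assigns cluster indices arbitrarily, so index 0 is not necessarily the most valuable group. The sorting step above can be sketched in plain Scala as follows. The center values are made up for illustration, and reading the position in the sorted list as a value rank is our interpretation of the step, since the code above stops at sortedDF.

```scala
object ClusterRanking {
  // Rank cluster indices by the sum of their center coordinates, highest first
  def rankCenters(centers: IndexedSeq[Array[Double]]): IndexedSeq[(Int, Double)] =
    centers.indices
      .map(i => (i, centers(i).sum))
      .sortBy { case (_, total) => -total }

  def main(args: Array[String]): Unit = {
    // Hypothetical centers in (r_score, f_score, m_score) space
    val centers = IndexedSeq(
      Array(2.0, 1.0, 1.0), // cluster 0, total 4.0
      Array(5.0, 4.0, 5.0), // cluster 1, total 14.0
      Array(3.0, 3.0, 2.0)  // cluster 2, total 8.0
    )
    // Position in the sorted list is the value rank (0 = highest total score)
    rankCenters(centers).zipWithIndex.foreach { case ((cluster, total), rank) =>
      println(s"cluster $cluster -> rank $rank (total $total)")
    }
  }
}
```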

RFE activity

Similar to RFM, we use RFE to measure user activity.
RFE = R (Recency: time since the last visit) + F (Frequency: number of visits in a given period) + E (Engagements: number of interactions)
R = datediff(date_sub(current_timestamp(),60), max('log_time))
F = count('loc_url)
E = countDistinct('loc_url)
R: 0-15 days = 5 points, 16-30 days = 4 points, 31-45 days = 3 points, 46-60 days = 2 points, 61 days or more = 1 point
F: ≥ 400 = 5 points, 300-399 = 4 points, 200-299 = 3 points, 100-199 = 2 points, ≤ 99 = 1 point
E: ≥ 250 = 5 points, 230-249 = 4 points, 210-229 = 3 points, 200-209 = 2 points, 1-199 = 1 point
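
The RFE scoring rules can be sketched as three plain Scala functions. This is only an illustration under stated assumptions: the function names are hypothetical, and the lowest E tier is assumed to cover every value below 200.

```scala
object RfeScoring {
  // R: days since the last visit; fewer days means a higher score
  def rScore(days: Int): Int =
    if (days <= 15) 5
    else if (days <= 30) 4
    else if (days <= 45) 3
    else if (days <= 60) 2
    else 1

  // F: number of page visits in the period
  def fScore(visits: Long): Int =
    if (visits >= 400) 5
    else if (visits >= 300) 4
    else if (visits >= 200) 3
    else if (visits >= 100) 2
    else 1

  // E: number of distinct URLs visited (assumption: anything below 200 scores 1)
  def eScore(distinctUrls: Long): Int =
    if (distinctUrls >= 250) 5
    else if (distinctUrls >= 230) 4
    else if (distinctUrls >= 210) 3
    else if (distinctUrls >= 200) 2
    else 1

  def main(args: Array[String]): Unit = {
    // Hypothetical user: last visit 20 days ago, 350 visits, 215 distinct URLs
    println((rScore(20), fScore(350), eScore(215))) // prints (4,4,3)
  }
}
```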

PSM price sensitivity model

PSM is used to measure how price-sensitive a user is.
Users with different levels of price sensitivity can then be targeted with different marketing intensity.

1 PSM calculation formula

PSM Score = proportion of preferential orders + (average preferential amount / average receivable per order) + proportion of preferential amount
where a preferential order is an order that used a discount, and:
Proportion of preferential orders = number of preferential orders / total orders
Proportion of non-preferential orders = number of non-preferential orders / total orders
Average preferential amount = total preferential amount / number of preferential orders
Average receivable per order = total receivable / total orders
Proportion of preferential amount = total preferential amount / total receivable amount
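
The formula can be checked by hand on a small sample. The following plain Scala sketch is an illustration only: `Order` and `psmScore` are hypothetical names, the receivable amount is taken as amount paid plus coupon value (as in the Spark code below), and returning 0.0 for a user without any preferential order is our own assumption to avoid dividing by zero.

```scala
object PsmExample {
  // One order: the amount actually paid and the coupon value applied (0 if none)
  final case class Order(orderAmount: Double, couponValue: Double)

  def psmScore(orders: Seq[Order]): Double = {
    val discounted = orders.filter(_.couponValue != 0.0)
    // Assumption: a user with no preferential orders gets a score of 0.0
    if (orders.isEmpty || discounted.isEmpty) 0.0
    else {
      val totalOrders      = orders.size.toDouble
      val discountedOrders = discounted.size.toDouble
      val totalDiscount    = discounted.map(_.couponValue).sum
      // Receivable = amount paid + coupon value
      val totalReceivable  = orders.map(o => o.orderAmount + o.couponValue).sum

      val discountOrderRatio  = discountedOrders / totalOrders
      val avgDiscount         = totalDiscount / discountedOrders
      val avgReceivable       = totalReceivable / totalOrders
      val discountAmountRatio = totalDiscount / totalReceivable

      discountOrderRatio + (avgDiscount / avgReceivable) + discountAmountRatio
    }
  }
}
```

For four orders paid as (90, 100, 80, 100) with coupons of (10, 0, 20, 0), the three terms are 2/4 = 0.5, 15/100 = 0.15 and 30/400 = 0.075, so the score is about 0.725.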

// Amount receivable (amount paid plus coupon value)
val receivableAmount = ('couponCodeValue + 'orderAmount).cast(DoubleType) as "receivableAmount"
// Preferential amount
val discountAmount = 'couponCodeValue.cast(DoubleType) as "discountAmount"
// Amount actually paid
val practicalAmount = 'orderAmount.cast(DoubleType) as "practicalAmount"
// Whether the order is preferential: 1 if a discount was used, 0 otherwise
val state = when(discountAmount =!= 0.0d, 1) // =!= is Column's not-equal operator
  .when(discountAmount === 0.0d, 0)
  .as("state")

// Number of preferential orders
val discountCount = sum('state) as "discountCount"
// Total orders
val totalCount = count('state) as "totalCount"
// Total preferential amount
val totalDiscountAmount = sum('discountAmount) as "totalDiscountAmount"
// Total receivable
val totalReceivableAmount = sum('receivableAmount) as "totalReceivableAmount"

// Average preferential amount
val avgDiscountAmount = ('totalDiscountAmount / 'discountCount) as "avgDiscountAmount"
// Average receivable per order
val avgReceivableAmount = ('totalReceivableAmount / 'totalCount) as "avgReceivableAmount"
// Proportion of preferential orders
val discountPercent = ('discountCount / 'totalCount) as "discountPercent"
// Proportion of average preferential amount
val avgDiscountPercent = (avgDiscountAmount / avgReceivableAmount) as "avgDiscountPercent"
// Proportion of preferential amount
val discountAmountPercent = ('totalDiscountAmount / 'totalReceivableAmount) as "discountAmountPercent"

// PSM = proportion of preferential orders + (average preferential amount / average receivable per order) + proportion of preferential amount
// avgDiscountPercent already equals avgDiscountAmount / avgReceivableAmount, so it must not be divided by avgReceivableAmount again
val psmScore = (discountPercent + avgDiscountPercent + discountAmountPercent) as "psm"

2 principle of clustering algorithm

Select K points as the initial centroids
Compute the distance from every point to each centroid and assign each point to its nearest centroid, which forms K clusters
Distance is measured as Euclidean distance
Recompute the centroid of each cluster as the mean of its points
Repeat the assignment and update steps until the clusters no longer change
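
The steps above can be sketched in a few lines of plain Scala. This is a teaching sketch, not Spark's implementation: it simply takes the first K points as initial centroids (an assumption made to keep the example deterministic) and compares squared Euclidean distances, which yields the same nearest centroid as the plain Euclidean distance.

```scala
object KMeansSketch {
  type Point = Vector[Double]

  // Squared Euclidean distance between two points of equal dimension
  def sqDist(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Index of the centroid closest to p
  def nearest(p: Point, centroids: Vector[Point]): Int =
    centroids.indices.minBy(i => sqDist(p, centroids(i)))

  // Coordinate-wise mean of a non-empty group of points
  def mean(points: Seq[Point]): Point =
    points.transpose.map(_.sum / points.size).toVector

  def kMeans(points: Vector[Point], k: Int, maxIter: Int = 100): Vector[Point] = {
    // Step 1: pick K initial centroids (here simply the first K points)
    var centroids = points.take(k)
    var changed = true
    var iter = 0
    while (changed && iter < maxIter) {
      // Step 2: assign every point to its nearest centroid
      val clusters = points.groupBy(p => nearest(p, centroids))
      // Step 3: recompute each centroid; keep the old one if its cluster is empty
      val updated = centroids.indices.toVector.map { i =>
        clusters.get(i).map(ps => mean(ps)).getOrElse(centroids(i))
      }
      // Step 4: stop as soon as no centroid moved
      changed = updated != centroids
      centroids = updated
      iter += 1
    }
    centroids
  }
}
```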

3 determining K: the elbow rule

For each candidate K, compute the overall loss (the within-cluster sum of squared errors)
Plot the loss against K and look for the inflection point (the "elbow"); the K at that point is a suitable choice

4 model training and iterative calculation

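import org.apache.spark.ml.clustering.{KMeans, KMeansModel}
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.mllib.clustering.{KMeansModel => LibKMeansModel}
import org.apache.spark.mllib.linalg.{Vector => OldVector, Vectors => OldVectors}

val kArray = Array(2, 3, 4, 5, 6, 7, 8)
// Map each candidate K to its WSSSE (within set sum of squared errors)
val wssseMap: Map[Int, Double] = kArray.map { k =>
  val kmeans = new KMeans()
    .setK(k)
    .setMaxIter(10)
    .setPredictionCol("prediction")
    .setFeaturesCol("features")
  val model: KMeansModel = kmeans.fit(vectored)

  // Compute the loss with the RDD-based MLlib model, which needs the old vector type
  val centers: Array[OldVector] = model.clusterCenters.map(v => OldVectors.fromML(v))
  val libModel: LibKMeansModel = new LibKMeansModel(centers)
  val features = vectored.rdd.map { row =>
    OldVectors.fromML(row.getAs[Vector]("features"))
  }

  val wssse: Double = libModel.computeCost(features)
  (k, wssse)
}.toMap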
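
The elbow is usually read off a plot, but a rough numeric shortcut is possible. The sketch below picks the K where the loss curve bends most sharply, measured as the drop in loss before K minus the drop after K. This heuristic is our own simplification and not part of Spark; it assumes evenly spaced K values, and the result should still be checked against the plot.

```scala
object ElbowSketch {
  // For each interior K, bend = (loss drop when reaching K) - (loss drop when leaving K).
  // Returns the K with the largest bend, or None if fewer than three K values are given.
  def elbow(cost: Map[Int, Double]): Option[Int] = {
    val sorted = cost.toVector.sortBy(_._1)
    if (sorted.size < 3) None
    else {
      val bends = sorted.sliding(3).map {
        case Vector((_, prev), (k, cur), (_, next)) => (k, (prev - cur) - (cur - next))
      }
      Some(bends.maxBy(_._2)._1)
    }
  }
}
```

With made-up losses of 100, 60, 30, 25 and 22 for K = 2 to 6, the bends are 10 at K = 3, 25 at K = 4 and 2 at K = 5, so the sketch returns 4.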

Classification model - predicting gender

The shopping gender model can mean two things:
Predicting a user's gender from their shopping behavior
Determining a user's shopping gender preference from their shopping behavior

1 preset label, quantization attribute

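+--------+---------------------+-----------+------+----------+---------------------+------------------------+------+----------------+-----------+-------------+
|memberId|                color|productType|gender|colorIndex|                color|             productType|gender|productTypeIndex|   features|featuresIndex|
+--------+---------------------+-----------+------+----------+---------------------+------------------------+------+----------------+-----------+-------------+
|       4|Cherry Blossom powder|   Smart TV|     1|      14.0|Cherry Blossom powder|                Smart TV|     1|            13.0|[14.0,13.0]|  [14.0,13.0]|
|       4|Cherry Blossom powder|   Smart TV|     1|      14.0|                 blue|Haier/Haier refrigerator|     0|             1.0| [14.0,1.0]|   [14.0,1.0]|
+--------+---------------------+-----------+------+----------+---------------------+------------------------+------+----------------+-----------+-------------+
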
val label = when('ogColor.equalTo("Cherry Blossom powder")
  .or('ogColor.equalTo("white"))
  .or('ogColor.equalTo("Champagne"))
  .or('ogColor.equalTo("Champagne gold"))
  .or('productType.equalTo("food processor"))
  .or('productType.equalTo("Hanging ironing machine"))
  .or('productType.equalTo("Vacuum cleaner/Mite remover")), 1)
  .otherwise(0)
  .alias("gender")

2 decision tree algorithm

A decision tree is a supervised learning algorithm, so the data set must be labeled before training. To simplify the overall process, the required labels are preset here through simple rule matching.

3 algorithm engineering and model evaluation

val featureVectorIndexer = new VectorIndexer()
  .setInputCol("features")
  .setOutputCol("featuresIndex")
  .setMaxCategories(3)

val decisionTreeClassifier = new DecisionTreeClassifier()
  .setFeaturesCol("featuresIndex")
  .setLabelCol("gender")
  .setPredictionCol("predict")
  .setMaxDepth(5)
  .setImpurity("gini")

val pipeline = new Pipeline()
  .setStages(Array(colorIndexer, productTypeIndexer, featureAssembler, featureVectorIndexer, decisionTreeClassifier))

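// Split the data 80/20 into training and test sets
val Array(trainData, testData) = source.randomSplit(Array(0.8, 0.2))

val model: PipelineModel = pipeline.fit(trainData)
val trainPredictions = model.transform(trainData)
val testPredictions = model.transform(testData)

val accEvaluator = new MulticlassClassificationEvaluator()
  .setPredictionCol("predict")
  .setLabelCol("gender")
  .setMetricName("accuracy")

// Compare accuracy on both sets; a large gap suggests overfitting
val trainAccuracy = accEvaluator.evaluate(trainPredictions)
val testAccuracy = accEvaluator.evaluate(testPredictions)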
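
The code above names two quantities without defining them: the "gini" impurity the tree uses to choose splits, and the "accuracy" metric used for evaluation. As a rough illustration in plain Scala (the function names are hypothetical and this is not Spark's implementation), they can be computed like this:

```scala
object TreeMetrics {
  // Gini impurity of a set of labels: 1 - sum over classes of p_c^2.
  // 0.0 means the set is pure; 0.5 is the maximum for two classes.
  def gini(labels: Seq[Int]): Double =
    if (labels.isEmpty) 0.0
    else {
      val n = labels.size.toDouble
      1.0 - labels.groupBy(identity).values.map { group =>
        val p = group.size / n
        p * p
      }.sum
    }

  // Accuracy: share of (label, prediction) pairs where both values agree
  def accuracy(pairs: Seq[(Int, Int)]): Double =
    if (pairs.isEmpty) 0.0
    else pairs.count { case (label, predicted) => label == predicted }.toDouble / pairs.size
}
```

A split is useful when the child nodes have a lower weighted Gini impurity than the parent, which is the criterion selected by setImpurity("gini").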

Topics: Spark Machine Learning
