Algorithmic model mining tags for user portraits

Posted by 90Nz0 on Sun, 30 Jan 2022 20:19:59 +0100

RFM user value model

1 demand

  • Assuming I am a marketer, I might think about the following questions before doing an activity
  • Who are my more valuable customers?
  • Who has the potential to become a valuable customer?
  • Who's losing?
  • Who can stay?
  • Who cares about this event?
  • In fact, all the above thoughts focus on one theme value
  • RFM is one of the most common tools used to evaluate value and potential value

2 what is RFM

  • Evaluate a person's value to the company through the time since the last consumption, the consumption frequency per unit time and the average consumption amount. It can be understood that RFM is an integrated value, as follows: RFM = rencency, frequency and monetary
  • The RFM model can illustrate the following facts:
  • The closer the last purchase, the more interested the user is in the promotion
  • The higher the purchase frequency, the higher the satisfaction with us
  • The larger the amount of consumption, the richer, and the higher the consumption

    3 practical application of RFM

    4 high dimensional spatial model

    5 unify dimensions through scoring
  • R: 1-3 days = 5 points, 4-6 days = 4 points, 7-9 days = 3 points, 10-15 days = 2 points, more than 16 days = 1 point
  • F: ≥ 200 = 5 points, 150-199 = 4 points, 100-149 = 3 points, 50-99 = 2 points, 1-49 = 1 point
  • M: ≥ 20w=5 points, 10-19w=4 points, 5-9w=3 points, 1-4w=2 points, < 1W = 1 point
val rScore: Column = when('r.>=(1).and('r.<=(3)), 5)
  .when('r >= 4 and 'r <= 6, 4)
  .when('r >= 7 and 'r <= 9, 3)
  .when('r >= 10 and 'r <= 15, 2)
  .when('r >= 16, 1)
  .as("r_score")

val fScore: Column = when('f >= 200, 5)
  .when(('f >= 150) && ('f <= 199), 4)
  .when((col("f") >= 100) && (col("f") <= 149), 3)
  .when((col("f") >= 50) && (col("f") <= 99), 2)
  .when((col("f") >= 1) && (col("f") <= 49), 1)
  .as("f_score")

val mScore: Column = when(col("m") >= 200000, 5)
  .when(col("m").between(100000, 199999), 4)
  .when(col("m").between(50000, 99999), 3)
  .when(col("m").between(10000, 49999), 2)
  .when(col("m") <= 9999, 1)
  .as("m_score")

6 model training and prediction

  • RFMTrainModel training model, saving model to HDFS, scheduling cycle, once a month.
  • RFMPredictModel prediction model reads the clustering model from HDFS and predicts the whole data set once a day
 def process(source: DataFrame): DataFrame = {
  val assembled = assembleDataFrame(source)
     val regressor = new KMeans()
     .setK(7)
     .setSeed(10)
     .setMaxIter(10)
     .setFeaturesCol("features")
     .setPredictionCol("predict")

   regressor.fit(assembled).save(MODEL_PATH)

  null
}
val assembled = RFMModel.assembleDataFrame(source)

val kmeans = KMeansModel.load(RFMModel.MODEL_PATH)
val predicted = kmeans.transform(assembled)

// Find the relationship between the group number generated by kmeans and the rule
val sortedCenters: IndexedSeq[(Int, Double)] = kmeans.clusterCenters.indices // IndexedSeq
  .map(i => (i, kmeans.clusterCenters(i).toArray.sum))
  .sortBy(c => c._2).reverse

val sortedDF = sortedCenters.toDF("index", "totalScore")

RFE activity

  • Similar to RFM, we use RFE to calculate user activity
  • RFE = R (last access time) + F (access frequency in a specific time) + E (number of activities)
  • R = datediff(date_sub(current_timestamp(),60), max('log_time))
  • F = count('loc_url)
  • E = countDistinct('loc_url)
  • R:0-15 days = 5 points, 16-30 days = 4 points, 31-45 days = 3 points, 46-60 days = 2 points, more than 61 days = 1 point
  • F: ≥ 400 = 5 points, 300-399 = 4 points, 200-299 = 3 points, 100-199 = 2 points, ≤ 99 = 1 point
  • E: ≥ 250 = 5 points, 230-249 = 4 points, 210-229 = 3 points, 200-209 = 2 points, 1 = 1 point

PSM price sensitivity model

  • PSM is used to calculate the price sensitivity of users
  • For different levels of price sensitive users, different degrees of marketing can be implemented

1 PSM calculation formula

  • PSM Score = proportion of preferential orders + (average preferential amount / average receivable per order) + proportion of preferential amount
  • Proportion of preferential orders
  • Preferential orders / total orders
  • Preferential orders = quantity of preferential orders / total orders
  • Unprivileged orders = unprivileged order quantity / total orders
  • Average preferential amount
  • Total preferential amount / number of preferential orders
  • Average receivable per order
  • Total receivable / total orders
  • Proportion of preferential amount
  • Total preferential amount / total receivable amount
// Amount receivable
val receivableAmount = ('couponCodeValue + 'orderAmount).cast(DoubleType) as "receivableAmount"
// Preferential amount
val discountAmount = 'couponCodeValue.cast(DoubleType) as "discountAmount"
// Paid in amount
val practicalAmount = 'orderAmount.cast(DoubleType) as "practicalAmount"
// Is it preferential
val state = when(discountAmount =!= 0.0d, 1) // =!= Is the method of column
  .when(discountAmount === 0.0d, 0)
  .as("state")

// Number of preferential orders
val discountCount = sum('state) as "discountCount"
// Total orders
val totalCount = count('state) as "totalCount"
// Total preference
val totalDiscountAmount = sum('discountAmount) as "totalDiscountAmount"
// Total receivable
val totalReceivableAmount = sum('receivableAmount) as "totalReceivableAmount"

// Average preferential amount
val avgDiscountAmount = ('totalDiscountAmount / 'discountCount) as "avgDiscountAmount"
// Average receivable per order
val avgReceivableAmount = ('totalReceivableAmount / 'totalCount) as "avgReceivableAmount"
// Proportion of preferential orders
val discountPercent = ('discountCount / 'totalCount) as "discountPercent"
// Proportion of average preferential amount
val avgDiscountPercent = (avgDiscountAmount / avgReceivableAmount) as "avgDiscountPercent"
// Proportion of preferential amount
val discountAmountPercent = ('totalDiscountAmount / 'totalReceivableAmount) as "discountAmountPercent"

// Proportion of preferential orders + (average preferential amount / average receivable per order) + proportion of preferential amount
val psmScore = (discountPercent + (avgDiscountPercent / avgReceivableAmount) + discountAmountPercent) as "psm"

2 principle of clustering algorithm

  • Select K points as the initial midpoint
  • Calculate the distance from each midpoint to similar points, and gather the similar points into a class (cluster)
  • Euclidean distance
  • Recalculate the midpoint of each cluster
  • Repeat the above steps until no changes occur

3 determine the K-elbow rule

  • According to the loss function, calculate the overall loss in the case of each K
  • Draw a graph and find the inflection point, which is the appropriate K

4 model training and iterative calculation

val kArray = Array(2, 3, 4, 5, 6, 7, 8)
val wssseMap = kArray.map(f = k => {
  val kmeans = new KMeans()
    .setK(k)
    .setMaxIter(10)
    .setPredictionCol("prediction")
    .setFeaturesCol("features")
  val model: KMeansModel = kmeans.fit(vectored)

  import spark.implicits._
  // mlLib calculation loss function
  val vestors: Array[OldVector] = model.clusterCenters.map(v => OldVectors.fromML(v))
  val libModel: LibKMeansModel = new LibKMeansModel(vestors)
  val features = vectored.rdd.map(row => {
    val ve = row.getAs[Vector]("features")
    val oldVe: OldVector = OldVectors.fromML(ve)
    oldVe
  })

  val wssse: Double = libModel.computeCost(features)
  (k, wssse)
}).toMap

Classification model - predicting gender

  • There are two meanings of shopping gender model:
  • Predict the gender of users through their shopping behavior
  • Through the user's shopping behavior, determine the user's shopping gender preference

1 preset label, quantization attribute

|memberId| color|productType|gender|colorIndex|  color|    productType|gender|productTypeIndex|   features|featuresIndex|
+--------+------+-----------+------+----------+------------------+---------------+------+----------------+-----------+-------------+
|       4|Cherry Blossom powder|   Smart TV|     1|      14.0|Cherry Blossom powder|       Smart TV|     1|            13.0|[14.0,13.0]|  [14.0,13.0]|
|       4|Cherry Blossom powder|   Smart TV|     1|      14.0|  blue| Haier/Haier refrigerator|     0|             1.0| [14.0,1.0]|   [14.0,1.0]|
val label = when('ogColor.equalTo("Cherry Blossom powder")
  .or('ogColor.equalTo("white"))
  .or('ogColor.equalTo("Champagne"))
  .or('ogColor.equalTo("Champagne gold"))
  .or('productType.equalTo("food processor"))
  .or('productType.equalTo("Hanging ironing machine"))
  .or('productType.equalTo("Vacuum cleaner/Mite remover")), 1)
  .otherwise(0)
  .alias("gender")

2 decision tree algorithm

  • Decision tree is a supervised learning algorithm, which needs to manually label the data set first. Here, the overall process is simplified, and the required labels are preset through simple matching

3 algorithm engineering and model evaluation

val featureVectorIndexer = new VectorIndexer()
  .setInputCol("features")
  .setOutputCol("featuresIndex")
  .setMaxCategories(3)

val decisionTreeClassifier = new DecisionTreeClassifier()
  .setFeaturesCol("featuresIndex")
  .setLabelCol("gender")
  .setPredictionCol("predict")
  .setMaxDepth(5)
  .setImpurity("gini")

val pipeline = new Pipeline()
  .setStages(Array(colorIndexer, productTypeIndexer, featureAssembler, featureVectorIndexer, decisionTreeClassifier))

val Array(trainData, testData) = source.randomSplit(Array(0.8, 0.2))

val model: PipelineModel = pipeline.fit(trainData)
 val pTrain = model.transform(trainData)
 val tTrain = model.transform(testData)

val accEvaluator = new MulticlassClassificationEvaluator()
  .setPredictionCol("predict")
  .setLabelCol("gender")
  .setMetricName("accuracy")//accuracy 

Topics: Spark Machine Learning