Offline Merchandise Recommendation System (Item-based Collaborative Filtering + MapReduce)

Posted by regprick on Tue, 03 Sep 2019 05:09:56 +0200

  1. Data Acquisition Section
  • Record of users browsing items
    Fields: User Id, Commodity Id, Commodity type, Add time
    userId, itemId, itemType, time
  • Record of user collections (favorites)
    Fields: User Id, Collected Commodity Id, Collection time
    userId, collectId, collectTime
  • Record of users adding items to the shopping cart
    Fields: User Id, Commodity Id added to the cart, Add time
    userId, cartId, cartTime
  • Record of users purchasing items
    Fields: User Id, Purchased Commodity Id, Add time
    userId, consumeId, consumeTime

    When a user performs any of the above actions, a corresponding log record is generated. The generated log files are merged into a single log file, which Flume monitors and uploads to HDFS.

  • Note: whether Flume should watch each log's output directory and upload every new log file to HDFS separately, or whether the logs should first be merged into one file and then uploaded, has not been fully settled; the merge-then-upload approach is used for now.
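
    The exact merged format is not shown here; the MapReduce jobs below assume three space-separated columns (userId, itemId, action), with a header line that the Clean step skips. A hypothetical sample of the merged file:

    userId itemId action
    1      100    click
    1      101    collect
    2      100    cart
    2      103    alipay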

2. Flume Configuration

# Name the agent components
score_agent.sources = r2
score_agent.sinks = k2
score_agent.channels = c2
# Source type: exec (tail the monitored file)
score_agent.sources.r2.type = exec
# Command that follows the monitored log file
score_agent.sources.r2.command = tail -F /usr/local/flume1.8/test/recommend/score/score_test.log
score_agent.sources.r2.shell = /bin/bash -c
 
# Sink: write files to HDFS
score_agent.sinks.k2.type = hdfs
# HDFS path, partitioned by year/month/day and hour
score_agent.sinks.k2.hdfs.path = hdfs://ns/flume/recommend/score/%Y%m%d/%H
# Prefix for uploaded files
score_agent.sinks.k2.hdfs.filePrefix = events-
# Whether to round the timestamp down (roll directories by time)
score_agent.sinks.k2.hdfs.round = true
# Interval at which a new directory is created
score_agent.sinks.k2.hdfs.roundValue = 1
# Unit of the rounding interval
score_agent.sinks.k2.hdfs.roundUnit = hour
# Whether to use the local timestamp
score_agent.sinks.k2.hdfs.useLocalTimeStamp = true
# Number of events written per flush to HDFS
score_agent.sinks.k2.hdfs.batchSize = 1000
# File type; DataStream writes plain text (no compression)
score_agent.sinks.k2.hdfs.fileType = DataStream
# Roll to a new file every N seconds (600 s = 10 minutes)
score_agent.sinks.k2.hdfs.rollInterval = 600
# Roll when the file reaches this size in bytes (134217728 B = 128 MB)
score_agent.sinks.k2.hdfs.rollSize = 134217728
# Do not roll based on the number of events
score_agent.sinks.k2.hdfs.rollCount = 0
# Minimum number of HDFS block replicas
score_agent.sinks.k2.hdfs.minBlockReplicas = 1
 
# Channel: buffer events in memory
score_agent.channels.c2.type = memory
score_agent.channels.c2.capacity = 500000
score_agent.channels.c2.transactionCapacity = 600
 
# Bind the source and sink to the channel
score_agent.sources.r2.channels = c2
score_agent.sinks.k2.channel = c2
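
The agent can then be started with the standard flume-ng command (the configuration file name score.conf and the conf directory are assumptions, not taken from the original post):

bin/flume-ng agent --conf conf --conf-file conf/score.conf --name score_agent -Dflume.root.logger=INFO,console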

3. MapReduce

3.1 pom.xml

 <dependencies>
        <!-- hadoop dependencies -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.7.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.7.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.7.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>2.7.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
            <version>2.7.3</version>
        </dependency>
    </dependencies>
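
Because section 5 runs the jar as hadoop jar recommend.jar without naming a main class, the jar's manifest needs a Main-Class entry. A sketch of one way to get it, assuming the standard maven-jar-plugin (the original project may instead configure this through the IDE's artifact settings):

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <configuration>
                    <archive>
                        <manifest>
                            <!-- Entry point used by "hadoop jar recommend.jar" -->
                            <mainClass>com.htkj.recommend.StartRun</mainClass>
                        </manifest>
                    </archive>
                </configuration>
            </plugin>
        </plugins>
    </build>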

3.2 StartRun

package com.htkj.recommend;

import com.htkj.recommend.mapreduce.*;
import org.apache.hadoop.conf.Configuration;

import java.util.HashMap;
import java.util.Map;


public class StartRun {
    public static void main(String[] args){
        Configuration conf=new Configuration();
//         conf.set("mapreduce.app-submission.cross-platform", "true");
//         conf.set("mapreduce.framework.name", "local");
        Map<String, String> paths = new HashMap<String, String>();
        //Step 1: clean the data and remove duplicate records
        paths.put("CleanInput", "/test/recommend/input/");
        paths.put("CleanOutput", "/test/recommend/output/clean");
        //Step 2: group by user and build the user-item preference (score) matrix
        paths.put("UserGroupingInput",paths.get("CleanOutput"));
        paths.put("UserGroupingOutput","/test/recommend/output/user_grouping");
        //Step 3: count item pairs to build the item co-occurrence matrix
        paths.put("ItemCountInput",paths.get("UserGroupingOutput"));
        paths.put("ItemCountOutput","/test/recommend/output/item_count");
        //Step 4: multiply the co-occurrence matrix by the score matrix
        paths.put("ScoreInput1", paths.get("UserGroupingOutput"));
        paths.put("ScoreInput2", paths.get("ItemCountOutput"));
        paths.put("ScoreOutput", "/test/recommend/output/score");
        //Step 5: sum the partial products to obtain the result matrix
        paths.put("AllScoreInput", paths.get("ScoreOutput"));
        paths.put("AllScoreOutput", "/test/recommend/output/all_score");
        //Step 6: sort and keep the top ten items for each user
        paths.put("ResultSortInput", paths.get("AllScoreOutput"));
        paths.put("ResultSortOutput", "/test/recommend/output/result_sort");


       // paths.put("CleanInput", "D:\\test\\user_bought_history.txt");

//        paths.put("CleanInput", "D:\\test\\1.txt");
//        paths.put("CleanOutput", "D:\\test\\test1");
//        paths.put("UserGroupingInput",paths.get("CleanOutput")+"\\part-r-00000");
//        paths.put("UserGroupingOutput","D:\\test\\test2");
//        paths.put("ItemCountInput",paths.get("UserGroupingOutput")+"\\part-r-00000");
//        paths.put("ItemCountOutput","D:\\test\\test3");
//        paths.put("ScoreInput1", paths.get("UserGroupingOutput")+"\\part-r-00000");
//        paths.put("ScoreInput2", paths.get("ItemCountOutput")+"\\part-r-00000");
//        paths.put("ScoreOutput", "D:\\test\\test4");
//        paths.put("AllScoreInput", paths.get("ScoreOutput")+"\\part-r-00000");
//        paths.put("AllScoreOutput", "D:\\test\\test5");
//        paths.put("ResultSortInput", paths.get("AllScoreOutput")+"\\part-r-00000");
//        paths.put("ResultSortOutput", "D:\\test\\test6");
        Clean.run(conf,paths);
        UserGrouping.run(conf,paths);
        ItemCount.run(conf,paths);
        Score.run(conf,paths);
        AllScore.run(conf,paths);
        ResultSort.run(conf,paths);
    }
    public static Map<String,Integer> action=new HashMap<String, Integer>();
    static {
        action.put("click",1);//Click to count 1 point
        action.put("collect",2);//Collection calculates 2 points
        action.put("cart",3);//Add a shopping cart and count three points.
        action.put("alipay",4);//Payment calculates 4 points
    }

}
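
The six run(...) calls above execute unconditionally even if an earlier stage fails. A minimal sketch (not part of the original code) that stops the chain on the first failure, using the boolean each run(...) already returns:

        // Hypothetical variant of the call chain in main(): abort on the first failed stage
        boolean ok = Clean.run(conf, paths)
                && UserGrouping.run(conf, paths)
                && ItemCount.run(conf, paths)
                && Score.run(conf, paths)
                && AllScore.run(conf, paths)
                && ResultSort.run(conf, paths);
        System.exit(ok ? 0 : 1);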

3.3 Clean

package com.htkj.recommend.mapreduce;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;
import java.util.Map;
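
/*
 * Deduplication step
 * Input: the raw log lines uploaded by Flume (userId itemId action)
 * The mapper skips the line at byte offset 0 (intended as a header) and emits each whole
 * line as the key; the reducer writes every distinct key once, which removes duplicates.
 */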


public class Clean {
    public  static boolean run(Configuration config, Map<String,String> paths){
        try {
            FileSystem fs=FileSystem.get(config);
            Job job=Job.getInstance(config);
            job.setJobName("Clean");
            job.setJarByClass(Clean.class);
            job.setMapperClass(CleanMapper.class);
            job.setReducerClass(CleanReduce.class);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(NullWritable.class);
            FileInputFormat.addInputPath(job,new Path(paths.get("CleanInput")));
            Path outpath=new Path(paths.get("CleanOutput"));
            if (fs.exists(outpath)){
                fs.delete(outpath,true);
            }
            FileOutputFormat.setOutputPath(job,outpath);
            boolean f=job.waitForCompletion(true);
            return  f;
        } catch (Exception e) {
            e.printStackTrace();
        }
        return  false;
    }

    static class CleanMapper extends Mapper<LongWritable, Text, Text, NullWritable>{
        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            //Skip the line at byte offset 0, so the first line of the file can serve as a header
            if (key.get()!=0){
                context.write(value,NullWritable.get());
            }
        }

    }
    static class CleanReduce extends Reducer<Text, NullWritable,Text,NullWritable>{
        @Override
        protected  void  reduce(Text key , Iterable<NullWritable> i, Context context) throws IOException, InterruptedException {
            //Identical lines arrive grouped under one key, so writing the key once removes the duplicates
            context.write(key,NullWritable.get());
        }
    }
}

3.4 UserGrouping

package com.htkj.recommend.mapreduce;

import com.htkj.recommend.StartRun;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/*
 * Input data set
 * userId   itemID  action
 *   1       100    click
 *   1       101    collect
 *   1       102    cart
 *   1       103    alipay
 *
 * The output is as follows
 * userId    itemId:score
 *   1       100:1,101:2,102:3,103:4,
 *   2       100:1,101:1,102:1,103:4,
 *   3       100:1,101:2,102:2,103:4,
 *   4       100:4,101:2,102:2,103:4,
 */


public class UserGrouping {
    public static  boolean run(Configuration config , Map<String,String> paths){
        try {
            FileSystem fs=FileSystem.get(config);
            Job job=Job.getInstance(config);
            job.setJobName("UserGrouping");
            job.setJarByClass(StartRun.class);
            job.setMapperClass(UserGroupingMapper.class);
            job.setReducerClass(UserGroupingReduce.class);
            job.setMapOutputValueClass(Text.class);
            job.setMapOutputKeyClass(Text.class);
            FileInputFormat.addInputPath(job,new Path(paths.get("UserGroupingInput")));
            Path outpath=new Path(paths.get("UserGroupingOutput"));
            if (fs.exists(outpath)){
                fs.delete(outpath,true);
            }
            FileOutputFormat.setOutputPath(job,outpath);
            boolean f=job.waitForCompletion(true);
            return  f;
        } catch (Exception e) {
            e.printStackTrace();
        }

        return false;
    }
    static class UserGroupingMapper extends Mapper<LongWritable, Text,Text,Text>{
        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String[] split = value.toString().split(" ");
            String user = split[0];//1
            String item = split[1];//100
            String action = split[2];//click
            Text k=new Text(user);//1
            Integer rv= StartRun.action.get(action);//look up the action's score: click=1, collect=2, cart=3, alipay=4
            Text v=new Text(item+":"+rv.intValue());//100:1
            context.write(k,v);
            /*
            *   userId  itemId:score
            *     1      100:1
            */
        }
    }
    static class UserGroupingReduce extends Reducer<Text,Text,Text,Text>{
        @Override
        protected  void  reduce(Text key, Iterable<Text> i, Context context) throws IOException, InterruptedException {
            Map<String, Integer> map = new HashMap<String, Integer>();
            /*
            *  userId      1
            * itemId:score 100:1
            *             101:2
            *             102:3
            *             103:4
            */
            for (Text value : i) {
                String[] split = value.toString().split(":");
                String item = split[0];//101
                Integer action = Integer.parseInt(split[1]);//2
                //If the item is already in the map, add the current action's score to its accumulated score
                //For example, if item 101 is clicked, collected, added to the cart and purchased, its score is 1 + 2 + 3 + 4 = 10.
                if (map.get(item) != null) {
                    action = ((Integer) map.get(item)).intValue() + action;
                }
                map.put(item,action);//
            }
            StringBuffer stringBuffer = new StringBuffer();
            for (Map.Entry<String,Integer> entry:map.entrySet()){
                stringBuffer.append(entry.getKey()+":"+entry.getValue().intValue()+",");//100:1,
            }
            context.write(key,new Text(stringBuffer.toString()));//1    100:1,101:2,102:3,103:4,
        }
    }
}

3.5 ItemCount

package com.htkj.recommend.mapreduce;

import com.htkj.recommend.StartRun;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;
import java.util.Map;
/*
 * Count item pair combinations to build the item co-occurrence matrix
 * Input data set
 * userId    itemId:score
 *   1       100:1,101:2,102:3,103:4,
 *   2       100:1,101:1,102:1,103:4,
 *   3       100:1,101:2,102:2,103:4,
 *   4       100:4,101:2,102:2,103:4,
 * The output is as follows
 * itemA:itemB     frequency
 * 100:100         7
 * 100:101         6
 * 100:102         6
 * 100:103         6
 * 101:100         6
 * 101:101         7
 * 101:102         6
 *
 * */

public class ItemCount {
    private final static Text KEY=new Text();
    private final static IntWritable VALUE=new IntWritable(1);
    public static boolean run(Configuration config , Map<String,String> paths){
        try {
            FileSystem fs=FileSystem.get(config);
            Job job= Job.getInstance(config);
            job.setJobName("ItemCount");
            job.setJarByClass(StartRun.class);
            job.setMapperClass(ItemCountMapper.class);
            job.setReducerClass(ItemCountReduce.class);
            job.setCombinerClass(ItemCountReduce.class);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job,new Path(paths.get("ItemCountInput")));
            Path outpath=new Path(paths.get("ItemCountOutput"));
            if (fs.exists(outpath)){
                fs.delete(outpath,true);
            }
            FileOutputFormat.setOutputPath(job,outpath);
            boolean f = job.waitForCompletion(true);
            return f;
        } catch (Exception e) {
            e.printStackTrace();
        }
        return false;
    }

    static class ItemCountMapper extends Mapper<LongWritable ,Text ,Text,IntWritable>{
        @Override
        protected void  map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            /*
            * userId    itemId:score
            *    1      100:1,101:2,102:3,103:4,
            * */
            String[] s = value.toString().split("\t");
            String[] items = s[1].split(",");//[100:1 101:2 102:3 103:4]
            for (int i = 0; i < items.length; i++) {
                String itemA = items[i].split(":")[0];//100
                for (int j = 0; j < items.length; j++) {
                    String itemB = items[j].split(":")[0];//The first is 100 and the second is 101.
                    KEY.set(itemA+":"+itemB);//First 100:100 Second 100:101
                    context.write(KEY,VALUE);//The first 100:100 1 and the second 100:101 1
                }
            }
        }
    }

    static class ItemCountReduce extends Reducer<Text,IntWritable,Text,IntWritable>{
        //Use a separate IntWritable for the sum: this class is also set as the combiner, and the
        //combiner runs inside the map task, so mutating the mapper's shared VALUE(1) would corrupt its output
        private final IntWritable result = new IntWritable();
        @Override
        protected void reduce(Text key, Iterable<IntWritable> i, Context context) throws IOException, InterruptedException {
            int sum=0;
            for (IntWritable v : i) {
                sum=sum+v.get();
            }
            result.set(sum);
            context.write(key,result);
        }
    }
}

3.6 Score

package com.htkj.recommend.mapreduce;

import com.htkj.recommend.StartRun;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.regex.Pattern;
/*
 * Multiply the item co-occurrence matrix by the user score matrix
 * Input data set A: item co-occurrence matrix
 * itemA:itemB     frequency
 * 100:100         7
 * 100:101         6
 * 100:102         6
 * 100:103         6
 * 101:100         6
 * 101:101         7
 * 101:102         6
 * Input data set B: user score matrix
 * userId    itemId:score
 *   1       100:1,101:2,102:3,103:4,
 *   2       100:1,101:1,102:1,103:4,
 *   3       100:1,101:2,102:2,103:4,
 *   4       100:4,101:2,102:2,103:4,
 * */

public class Score {

    public static boolean run(Configuration config, Map<String, String> paths) {
        try {
            FileSystem fs = FileSystem.get(config);
            Job job = Job.getInstance(config);
            job.setJobName("Score");
            job.setJarByClass(StartRun.class);
            job.setMapperClass(ScoreMapper.class);
            job.setReducerClass(ScoreReduce.class);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(Text.class);
            Path[] inputPaths = {new Path(paths.get("ScoreInput1")), new Path(paths.get("ScoreInput2"))};
            FileInputFormat.setInputPaths(job,inputPaths);
            Path outpath = new Path(paths.get("ScoreOutput"));
            if (fs.exists(outpath)) {
                fs.delete(outpath, true);
            }
            FileOutputFormat.setOutputPath(job, outpath);
            boolean f = job.waitForCompletion(true);
            System.out.println(f);
            return f;
        } catch (Exception e) {
            e.printStackTrace();
        }
        return false;
    }
    static class ScoreMapper extends Mapper<LongWritable,Text,Text,Text>{
        private String flag;//identifies the input: "item_count" (co-occurrence matrix A) or "user_grouping" (score matrix B)
        @Override
        protected  void  setup(Context context){
            FileSplit split= (FileSplit) context.getInputSplit();
            flag=split.getPath().getParent().getName();//the parent directory name tells which data set this split comes from
            System.out.println(flag+"-------------------");
        }
        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String[] s = Pattern.compile("[\t]").split(value.toString());
            if (flag.equals("item_count")){//Co-occurrence matrix//test3 item_count
                /*
                * itemA:itemB   frequency
                *  100:101       6
                * */
                String[] split = s[0].split(":");
                String item1 = split[0];//100
                String item2 = split[1];//101
                String num = s[1];//6
                Text k = new Text(item1);//100
                Text v = new Text("A:" + item2 + "," + num);//A:101,6
                context.write(k,v);//100    A:101,6
            }else if (flag.equals("user_grouping")){//Score matrix//test2 user_grouping
                /*
                * userId    itemId:score
                *   1       100:1,101:2,102:3,103:4,
                */
                String userId = s[0];//1
                String[] vector = s[1].split(",");//[100:1 101:2 102:3 103:4]
                for (int i = 0; i < vector.length; i++) {
                     String[] split = vector[i].split(":");
                     String itemId=split[0];//i=0:100
                     String score = split[1];//1
                     Text k = new Text(itemId);//100
                     Text v = new Text("B:" + userId + "," + score);//B:1,1
                    context.write(k,v);//100    B:1,1
                }

            }
        }
    }
    static class ScoreReduce extends Reducer<Text,Text,Text,Text>{
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            Map<String,Integer> mapA=new HashMap<String, Integer>();//Item-based co-occurrence matrix A
            Map<String,Integer> mapB=new HashMap<String, Integer>();//Score Matrix B Based on User
            for (Text line : values) {
                String s = line.toString();
                if (s.startsWith("A:")){// A:101,6
                    String[] split = Pattern.compile("[\t,]").split(s.substring(2));
                    try {
                        mapA.put(split[0],Integer.parseInt(split[1]));//101 6(itemId num)
                    }catch (Exception e){
                        e.printStackTrace();
                    }
                }else if (s.startsWith("B:")){//B:1,1
                    String[] split = Pattern.compile("[\t,]").split(s.substring(2));
                    try {
                        mapB.put(split[0],Integer.parseInt(split[1]));//1 1(userId score)
                    }catch (Exception e){
                        e.printStackTrace();
                    }
                }
            }
            double result=0;
            Iterator<String> iter = mapA.keySet().iterator();//Item-based co-occurrence matrix A
            while (iter.hasNext()){
                String mapkey = iter.next();//itemId 101
                int num = mapA.get(mapkey).intValue();//num 6
                Iterator<String> iterB = mapB.keySet().iterator();//Score Matrix B Based on User
                while (iterB.hasNext()){
                    String mapBkey = iterB.next();//userId 1
                    int score = mapB.get(mapBkey).intValue();//score 1
                    result=num*score;//element-wise product of the matrix multiplication: 6*1=6
                    Text k = new Text(mapBkey);//userId 1
                    Text v = new Text(mapkey + "," + result);//101,6
                    context.write(k,v);//1  101,6
                }
            }
        }
    }
}

3.7 AllScore

package com.htkj.recommend.mapreduce;

import com.htkj.recommend.StartRun;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.regex.Pattern;
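
/*
 * Sum the partial products produced by Score to finish the matrix multiplication
 * Input:  userId \t itemId,partialScore   (one line per co-occurrence contribution)
 * Output: userId \t itemId,totalScore     (accumulated recommendation score per user and item)
 */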


public class AllScore {

    public static boolean run(Configuration config, Map<String,String> paths){
        try {
            FileSystem fs=FileSystem.get(config);
            Job job=Job.getInstance(config);
            job.setJobName("AllScore");
            job.setJarByClass(StartRun.class);
            job.setMapperClass(AllScoreMapper.class);
            job.setReducerClass(AllScoreReduce.class);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job,new Path(paths.get("AllScoreInput")));
            Path outpath=new Path(paths.get("AllScoreOutput"));
            if (fs.exists(outpath)){
                fs.delete(outpath,true);
            }
            FileOutputFormat.setOutputPath(job,outpath);
            boolean f=job.waitForCompletion(true);
            return f;
        } catch (Exception e) {
            e.printStackTrace();
        }
        return false;
    }
    static  class AllScoreMapper extends Mapper<LongWritable ,Text,Text,Text>{
        @Override
        protected  void  map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String[] split = Pattern.compile("[\t,]").split(value.toString());
            Text k = new Text(split[0]);
            Text v = new Text(split[1] + "," + split[2]);
            context.write(k,v);
        }
    }
    static class AllScoreReduce extends Reducer<Text,Text,Text,Text>{
        @Override
        protected  void reduce(Text key, Iterable<Text> values , Context context) throws IOException, InterruptedException {
            Map<String,Double> map=new HashMap<String, Double>();
            for (Text line : values) {
                String[] split = line.toString().split(",");
                String itemId = split[0];
                Double score = Double.parseDouble(split[1]);
                if (map.containsKey(itemId)){
                    map.put(itemId,map.get(itemId)+score);//sum the partial products (the addition step of the matrix multiplication)
                }else {
                    map.put(itemId,score);
                }
            }
            Iterator<String> iter=map.keySet().iterator();
            while (iter.hasNext()){
                String itemId = iter.next();
                Double score = map.get(itemId);
                Text v = new Text(itemId + "," + score);
                context.write(key,v);
            }

        }
    }
}
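
Taken together, Score and AllScore implement the item-based collaborative filtering prediction: for each user u and candidate item j they compute, in effect,

    rec(u, j) = Σ over items i of C(i, j) × P(u, i)

where C is the item co-occurrence matrix from ItemCount and P is the user preference (score) matrix from UserGrouping. Score emits the individual products C(i, j) × P(u, i) keyed by user, and AllScore adds them up per user-item pair; ResultSort then only needs to order these totals.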

3.8 ResultSort

package com.htkj.recommend.mapreduce;


import com.htkj.recommend.StartRun;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Map;
import java.util.regex.Pattern;
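
/*
 * Secondary sort: keep the top ten recommended items for each user
 * Input:  userId \t itemId,score  (output of AllScore)
 * SortBean is a composite key (userId, score). SortNum orders keys by userId and then by
 * score in descending order, while UserGroup groups them by userId only, so each reduce call
 * sees one user's items from highest to lowest score and keeps the first ten.
 * Output: userId \t itemId:score,itemId:score,...
 */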


public class ResultSort {
    private final static Text K = new Text();
    private final static Text V = new Text();
    public static boolean run(Configuration config, Map<String,String> paths){
        try {
            FileSystem fs=FileSystem.get(config);
            Job job = Job.getInstance(config);
            job.setJobName("ResultSort");
            job.setJarByClass(StartRun.class);
            job.setMapperClass(ResultSortMapper.class);
            job.setReducerClass(ResultSortReduce.class);
            job.setSortComparatorClass(SortNum.class);
            job.setGroupingComparatorClass(UserGroup.class);
            job.setMapOutputKeyClass(SortBean.class);
            job.setMapOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(paths.get("ResultSortInput")));
            Path outpath = new Path(paths.get("ResultSortOutput"));
            if (fs.exists(outpath)){
                fs.delete(outpath, true);
            }
            FileOutputFormat.setOutputPath(job, outpath);
            boolean f = job.waitForCompletion(true);
            return f;
        } catch (Exception e) {
            e.printStackTrace();
        }
        return false;
    }
    static class  SortBean implements WritableComparable<SortBean>{
        private String userId;
        private double num;
        public int compareTo(SortBean o) {
            int i = this.userId.compareTo(o.getUserId());
            if (i==0){
                return  Double.compare(this.num,o.getNum());
            }
            return i;
        }

        public void write(DataOutput dataOutput) throws IOException {
            dataOutput.writeUTF(userId);
            dataOutput.writeDouble(num);
        }

        public void readFields(DataInput dataInput) throws IOException {
            this.userId=dataInput.readUTF();
            this.num=dataInput.readDouble();
        }
        public String getUserId(){
            return userId;
        }
        public void setUserId(String userId){
            this.userId=userId;
        }
        public double getNum(){
            return num;
        }
        public void setNum(double num){
            this.num=num;
        }
    }
    static class UserGroup extends WritableComparator{
        public  UserGroup(){
            super(SortBean.class,true);
        }
        @Override
        public int compare(WritableComparable a, WritableComparable b ){
            SortBean o1=(SortBean)a;
            SortBean o2=(SortBean)b;
            return o1.getUserId().compareTo(o2.getUserId());
        }
    }
    static class SortNum extends WritableComparator{
        public SortNum(){
            super(SortBean.class,true);
        }
        @Override
        public int compare(WritableComparable a, WritableComparable b){
            SortBean o1=(SortBean)a;
            SortBean o2=(SortBean)b;
            int i = o1.getUserId().compareTo(o2.getUserId());
            if (i==0){
                return -Double.compare(o1.getNum(), o2.getNum());
            }
            return i;
        }
    }
    static class ResultSortMapper extends Mapper<LongWritable,Text, SortBean,Text>{
        @Override
        protected  void  map(LongWritable key, Text value , Context context) throws IOException, InterruptedException {
            String[] split = Pattern.compile("[\t,]").split(value.toString());
            String userId = split[0];
            String item = split[1];
            String num = split[2];
            SortBean k = new SortBean();
            k.setUserId(userId);
            k.setNum(Double.parseDouble(num));
            V.set(item+":"+num);
            context.write(k,V);
        }

    }
    static class ResultSortReduce extends Reducer<SortBean,Text,Text,Text>{
        @Override
        protected void reduce(SortBean key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            int i=0;
            StringBuffer stringBuffer = new StringBuffer();
            for (Text v : values) {
                if (i==10){break;}
                stringBuffer.append(v.toString()+",");
                i++;
            }
            K.set(key.getUserId());
            V.set(stringBuffer.toString());
            context.write(K,V);
        }
    }

}

4. Testing in Local Mode

4.1 Test Data Set 1

4.2 Output of each stage

4.2.1 Clean

4.2.2 UserGrouping

4.2.3 ItemCount

4.2.4 Score

4.2.5 AllScore

4.2.6 ResultSort

5. Running under Linux

5.1 Package the project as a jar

Select the directory under src when building the jar artifact.

After the build completes, locate the generated package, extract recommend.jar from it, and upload it to the Linux server.

Run

Command: hadoop jar recommend.jar

  • This command would normally be followed by arguments passed to main (typically the input path), but since the paths are hard-coded here, none are needed.
  • Later, the input path could be passed in through main's arguments, and the job scheduled as a timed task that runs the analysis once a day, as sketched below.
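
A hypothetical crontab entry for such a daily run (the installation paths are placeholders, not from the original post):

0 2 * * * /usr/local/hadoop/bin/hadoop jar /home/hadoop/recommend.jar >> /home/hadoop/logs/recommend.log 2>&1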

6. About testing

  • How to judge whether the program is correct?

The matrices can be computed by hand on a small data set and the results compared with the program's output.

  • How to gauge the quality of the recommendations

An extreme data set can be constructed for testing: for example, user 1 buys only item 100, while every other user buys item 103 together with item 100.

Then check whether item 103 appears in user 1's final recommendation list; a hypothetical data set of this kind is sketched below.
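
In the three-column log format assumed above (header line included for the Clean step):

userId itemId action
1      100    alipay
2      100    alipay
2      103    alipay
3      100    alipay
3      103    alipay
4      100    alipay
4      103    alipay

Since every other buyer of item 100 also bought item 103, item 103 should rank at or near the top of user 1's recommendation list.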

7. Notes

  • Collaborative filtering relies on a sparse matrix; in other words, every user has items they have not bought, so the matrix always has empty entries.
  • This is easy to understand: if a user had already browsed, bought, or otherwise interacted with every item, there would be nothing left to recommend.
