Mathematical modeling learning (39): dormitory allocation problem

Posted by rasherb on Fri, 17 Dec 2021 16:13:10 +0100

subject

40 students need to be assigned to 10 dormitories. We got these 40 students through the questionnaire
As shown in the figure below, the complete data is in excel:

How to divide dormitories can ensure that the number of common hobbies of students in different dormitories and the same dormitories are distributed as evenly as possible.

thinking

(1) How to understand the distribution of the number of common hobbies of students in the same dormitory between different dormitories? The distribution is as uniform as possible: the variance (standard deviation) is as small as possible!
Take each dormitory as a whole, and first calculate the number of common hobbies of the four students in the dormitory.
For example, the notation (x,y) indicates the number of common hobbies of X and y students. If there are four ABCD students in a dormitory, the number of common hobbies in this dormitory should be equal to: (A,B)+(A,C)+(A,D)+(B,C)+(B,D)+(C,D). Then, we calculate the number of common hobbies of four students in each dormitory in 10 bedrooms, record it in the tmp vector, and then calculate the variance or standard deviation of this vector, Our objective function is to make the variance or standard deviation of tmp as small as possible.
(2) How to preprocess data?
Step 1: delete the other item in the interest because it is too broad.
Step 2: regard each student's interest as a set (the elements in the set are mutually exclusive),
The common interests of any two students can be calculated by using the "intersection ∩" operation between sets
Calculate the number of elements in this set to get the common interests and love of any two students
Good count.
(3) What method is used to find the optimal partition scheme?
We can consider using simulated annealing algorithm here. 40 students were replaced by 1-40.
Define the initial solution: a random arrangement of 1-40
For example: 39, 5, 7, 16, 21, 3, 31, 17,..., 9, 21, 19, 2. Every four locations are divided into a dormitory.
Objective function: the standard deviation of the number of common hobbies in 10 bedrooms should be as small as possible
The way to generate new solutions: similar to the traveling salesman problem (TSP)

(4) Is there anything else that can be improved?
Suppose that in scheme 1, the number of common hobbies in each bedroom is equal to 10; In scheme 2, the number of common hobbies in each bedroom is equal to 12, Then, the objective functions calculated by scheme 1 and scheme 2 are equal to 0 (the same elements in a group mean no fluctuation, and the standard deviation is 0), but it is obvious that we prefer scheme 2. Therefore, while ensuring that the distribution of the number of common hobbies of students in each dormitory is as uniform as possible, the more the number of common hobbies in the dormitory, the better. (multi-objective problem).
Define a new objective function: (std(tmp) + 1) /max(tmp). The smaller the objective function, the better

code

python version:

import numpy as np
import pandas as pd
df = pd.read_csv(r'C:\Users\sxjmqf\Desktop\data.csv')
data = df.iloc[:,1].tolist()
data = [set(i.replace('other','').strip(',').split(',')) for i in data]
result = [len(i & j) for i in data for j in data]
result = np.array(result).reshape(len(data),-1)
np.savetxt(r'C:\Users\sxjmqf\Desktop\mydata.csv', result, delimiter=',')

matlab code is too troublesome. It is not recommended, but let's show it:
calculate_ans.m

function [ans,tmp] = calculate_ans(path,mydata)
    n = length(path);  % Number of students
    nums = n  / 4;   % Number of dormitories
    path = reshape(path,4,nums);  % Every four people are assigned to a bedroom
    tmp = zeros(1,nums);  % tmp Used to keep the number of common hobbies in each bedroom
    for i = 1 : nums
        tmp(i) = get_total_hobby(path(:,i),mydata);  % Call the sub function to calculate the number of common hobbies of four students in each dormitory separately
    end
%     ans = std(tmp);   % Calculate standard deviation
    ans = (std(tmp) + 1)  / max(tmp);
%     ans = (std(tmp) + 1)  / mean(tmp);
end

gen_new_path.m (it can be imagined that matlab processing files is really a little old)

function path1 = gen_new_path(path0)
    % path0: Original path
    n = length(path0);
    % Two methods of generating new paths are randomly selected
    p1 = 0.33;  % The probability of generating a new path using the exchange method
    p2 = 0.33;  % The probability of generating a new path using the shift method
    r = rand(1); % Randomly generate a[0 1]Uniformly distributed random numbers between
    if  r< p1    % Generate new path using exchange method 
        c1 = randi(n);   % Generate 1-n A random integer in
        c2 = randi(n);   % Generate 1-n A random integer in
        path1 = path0;  % take path0 Value assigned to path1
        path1(c1) = path0(c2);  %change path1 The first c1 The element at position is path0 The first c2 Elements in multiple locations
        path1(c2) = path0(c1);  %change path1 The first c2 The element at position is path0 The first c1 Elements in multiple locations
    elseif r < p1+p2 % Generate new path using shift method
        c1 = randi(n);   % Generate 1-n A random integer in
        c2 = randi(n);   % Generate 1-n A random integer in
        c3 = randi(n);   % Generate 1-n A random integer in
        sort_c = sort([c1 c2 c3]);  % yes c1 c2 c3 Sort from small to large
        c1 = sort_c(1);  c2 = sort_c(2);  c3 = sort_c(3);  % c1 < = c2 <=  c3
        tem1 = path0(1:c1-1);
        tem2 = path0(c1:c2);
        tem3 = path0(c2+1:c3);
        tem4 = path0(c3+1:end);
        path1 = [tem1 tem3 tem2 tem4];
    else  % Generate new path using inversion method
        c1 = randi(n);   % Generate 1-n A random integer in
        c2 = randi(n);   % Generate 1-n A random integer in
        if c1>c2  % If c1 than c2 Big, exchange c1 and c2 Value of
            tem = c2;
            c2 = c1;
            c1 = tem;
        end
        tem1 = path0(1:c1-1);
        tem2 = path0(c1:c2);
        tem3 = path0(c2+1:end);
        path1 = [tem1 fliplr(tem2) tem3];   %Left and right symmetric flip of matrix fliplr,Flip up and down symmetrically flipud
    end
end

get_total_hobby.m

function hobby = get_total_hobby(path,mydata)
% path: Number of four students in the same dormitory
% mydata: Number of common hobbies between each two students
    hobby = 0;
    for i = 1 : 4
        for j = 1:i
            if i ~= j
                hobby = hobby+mydata(path(i), path(j));
            end
        end
    end
end

Main function: main m

clear;clc
tic
rng('shuffle')  % Controls the generation of random numbers, otherwise it is opened every time matlab The results are the same
% https://ww2.mathworks.cn/help/matlab/math/why-do-random-numbers-repeat-after-startup.html
% https://ww2.mathworks.cn/help/matlab/ref/rng.html
n = 40;  % How many students are there
mydata = csvread('C:\Users\sxjmqf\Desktop\mydata.csv');

%% Parameter initialization
T0 = 1000;   % initial temperature 
T = T0; % The temperature will change during the iteration. The temperature is zero at the first iteration T0
maxgen = 500;  % Maximum number of iterations
Lk = 500;  % Number of iterations per temperature
alpfa = 0.95;  % Temperature attenuation coefficient

%%  Randomly generate an initial solution
path0 = randperm(n);
result0 = calculate_ans(path0,mydata);  % Calculate the standard deviation corresponding to this set of solutions

%% Define some quantities for saving intermediate processes to facilitate output results and drawing
min_result = result0;    
RESULT = zeros(maxgen,1); % Record the results found at the end of each outer cycle min_result ((convenient for drawing)

%% Simulated annealing process
for iter = 1 : maxgen  % External circulation, What I use here is to specify the maximum number of iterations
    for i = 1 : Lk  %  Internal circulation, starting iteration at each temperature
        path1 = gen_new_path(path0);  
        result1 = calculate_ans(path1,mydata); 
        if result1 < result0    
            path0 = path1; 
            result0 = result1; 
        else
            p = exp(-(result1 - result0)/T); % according to Metropolis The criterion calculates a probability
            if rand(1) < p   % Generate a random number and compare it with this probability. If the random number is less than this probability
                path0 = path1;  
                result0 = result1; 
            end
        end
        % Determine whether to update the best solution found
        if result0 < min_result  % If the current solution is better, update it
            min_result = result0;  
            best_path = path0;  
        end
    end
    RESULT(iter) = min_result; % Save the smallest result found after the end of this external loop
    T = alpfa*T;   % Temperature drop       
end


disp('The best solution is:'); disp(reshape(best_path,4,n/4))
disp('At this time, the optimal value is:'); disp(min_result)


%% Draw the figure found after each iteration
figure
plot(1:maxgen,RESULT,'b-');
xlabel('Number of iterations');


[~,tmp] = calculate_ans(best_path,mydata)  % ahead~Indicates that the first returned result is not required
toc

That's all) matlab is different when we use it together. You can talk to me in private if you need complete documents.

Topics: R Language Algorithm linear algebra