PythonのPuLPでクラス割り当て問題を解いてみた

今回はPythonで数理モデルを解いてみたいと思います。

今まで機械学習やディープラーニングばかりやっていましたが、現実問題を解くのに数理モデルという方法もあるとどこかの企業様のインタビュー記事を読んだ時に学びました。
(再検索してみたが、該当の記事は見つかりませんでした。ただ印象にはすごく残っています。)

数理モデルと機械学習モデルの違いとは
数理モデルとは何なのか
PuLPとは何なのか
クラスの割り当て問題(class assignment problem)を解いてみる
まとめ

数理モデルと機械学習モデルの違いとは

数理モデルと機械学習モデルはそれぞれ異なるアプローチで問題をモデリングおよび解決する手法です。

数理モデルは手動で定義される数学的な式や制約に基づくモデルで、問題の特性や制約を明示的に表現するために自分自身で数学的な式や制約を構築します。

問題を厳密な数学的な形で表現し解析的な手法や最適化アルゴリズムが使用されることが一般的です。

一方で機械学習モデルはデータセットから学習することで未知のデータに対して予測や分類を行います。

統計的手法やアルゴリズムを使用してデータの特徴やパターンを抽出し、そのパターンを用いて未知のデータに対する予測を行います。

機械学習はデータ駆動型のアプローチであり、問題が不確かである場合や複雑なパターンがある場合に特に有効です。
(数理モデルはデータ駆動型ではなく、主に問題の理論的な構造に基づいています)

数理モデルは主に最適化、制約充足、シミュレーションなどの問題に利用されていて、機械学習は主に予測、分類、クラスタリング、異常検知などの問題に活用されています。

数理モデルと機械学習モデルは問題の性質や要件に応じて使い分けられ、共存することもあります。

例えば、損失関数を最小化するようにモデルを訓練する場合など機械学習も最適化問題の一形態と見なすことができます。

数理モデルとは何なのか

数理モデルは、現実世界の問題やシステムを数学的に表現し問題の理解や解決に役立てる手法です。

数理モデルは、最適化問題、シミュレーション、制約充足問題など、様々な問題に適用されます。

問題の抽象化と形式化

数理モデルは、複雑な現実の問題を数学的な概念や式に変換し、問題の本質を捉えるための手法です。

これにより、問題を解析しやすい形に抽象化しています。

変数、制約、目的関数

数理モデルでは、問題に含まれる要素を変数、制約、目的関数などの数学的な構成要素で表現します。

変数は問題における未知の量を表し、制約は問題の条件を表現します。

目的関数は最適化問題において最大化または最小化したい目標を表します。

最適化問題への応用

数理モデルは特に最適化問題に広く使用されます。

最適化問題では、目的関数を最大化または最小化する変数の値を見つけることが目標です。

例えば、生産計画、物流最適化、資源配分などの様々な問題が最適化問題としてモデリングされます。

数理モデルは、問題解決のための強力なツールであり、現実の複雑な状況や意思決定をサポートするために広く活用されています。

数理モデルの代表例

代表的なものとして巡回セールスマン問題による経路最適化や待ち行列理論による渋滞現象のモデル化などがあります。

確かに何気なく知識として持っていたものが数理モデルだったのだと理解することで知識の整理が出来ました。

PuLPとは何なのか

さて、数理モデルを何となく理解が出来たので数理モデルを記述することができるPuLPというPythonのライブラリをインストールして使ってみようと思います。

PuLPの基本的な使用例は、制約条件と目的関数を定義し最適化問題を解くことです。

さまざまな外部の線形計画法ソルバー（CBC、GLPK、CPLEX、Gurobiなど）のいずれかを呼び出してこのモデルを解決し、Pythonコマンドを使用して解決策を操作および表示することができます。

PuLP is a free open source software written in Python. It is used to describe optimisation problems as mathematical models. PuLP can then call any of numerous external LP solvers (CBC, GLPK, CPLEX, Gurobi etc) to solve this model and then use python commands to manipulate and display the solution.
引用:https://coin-or.github.io/pulp/main/installing_pulp_at_home.html#installing-pulp-at-home

線形計画法(Linear Programming)は、要件が線形な関係で表される数学モデルにおいて、最良の結果（最大利益や最小コストなど）を達成するための手法です。

線形計画法ソルバーは、線形計画法の問題を解決し最適な解を見つけるために設計されたツールまたはアルゴリズムです。

与えられた線形制約と目的関数に基づいて実行可能な解の空間を効率的に探索し、最良の解を特定するための計算します。

クラスの割り当て問題(class assignment problem)を解いてみる

学校の先生にとって同じ学年のクラスで学生をどう割り振るかは毎年悩みの種だと思います。

様々な制約がある中で最適解を作るのに数理モデルが役に立ちそうですね。

今回は5人の生徒を3つのクラスに割り当ててみようと思います。

制約は下記です。

それぞれの生徒は1つのクラスにしか割り当てられない
1つのクラスには最大2名までしか割り当てられない
Student1とStudent3は一緒のクラスには割り当てない

satisfaction_valuesはそれぞれの生徒がどのクラス(担任)に合うのか事前に点数をつけたものです。

5点が最も合う、3点が普通、1点が合わないという定義で点数をつけています。

クラス割り当て問題を解く

from pulp import LpMaximize, LpProblem, LpVariable, lpSum

# Create a problem variable
prob = LpProblem("ClassAssignmentProblem", LpMaximize)

# Define sets and parameters
students = ["Student1", "Student2", "Student3", "Student4", "Student5"]
classes = ["ClassA", "ClassB","ClassC"]

# Define decision variables
x = LpVariable.dicts("Assign", (students, classes), cat="Binary")

# Define objective function
satisfaction_values = {
    ("Student1", "ClassA"): 5,
    ("Student1", "ClassB"): 3,
    ("Student1", "ClassC"): 1,
    ("Student2", "ClassA"): 3,
    ("Student2", "ClassB"): 5,
    ("Student2", "ClassC"): 1,
    ("Student3", "ClassA"): 5,
    ("Student3", "ClassB"): 3,
    ("Student3", "ClassC"): 1,
    ("Student4", "ClassA"): 1,
    ("Student4", "ClassB"): 3,
    ("Student4", "ClassC"): 5,
    ("Student5", "ClassA"): 1,
    ("Student5", "ClassB"): 3,
    ("Student5", "ClassC"): 5
}

prob += lpSum(x[student][clazz] * satisfaction_values[(student, clazz)] for student in students for clazz in classes)

# Define constraints: each student must be assigned to exactly one class
for student in students:
    prob += lpSum(x[student][clazz] for clazz in classes) == 1

# Define constraints: each class must have at most two students
for clazz in classes:
    prob += lpSum(x[student][clazz] for student in students) <= 2

# Add constraint: Student1 and Student3 should not be in the same class
for clazz in classes:
    prob += x["Student1"][clazz] + x["Student3"][clazz] <= 1

# Solve the problem
prob.solve()

# Print the results
print("Status:", prob.status)

# Print the optimal assignment
for student in students:
    for clazz in classes:
        if x[student][clazz].value() == 1:
            print(f"{student} is assigned to {clazz} with satisfaction {satisfaction_values[(student, clazz)]}")

Out[0]

Welcome to the CBC MILP Solver 
Version: 2.10.3 
Build Date: Dec 15 2019 

command line - /Users/hinomaruc/Desktop/blog/my-venv/lib/python3.8/site-packages/pulp/solverdir/cbc/osx/64/cbc /var/folders/n0/ft3172td6v70d6klgt216_3r0000gn/T/6e7edef65a5a4a4c9add68f517b50aae-pulp.mps max timeMode elapsed branch printingOptions all solution /var/folders/n0/ft3172td6v70d6klgt216_3r0000gn/T/6e7edef65a5a4a4c9add68f517b50aae-pulp.sol (default strategy 1)
At line 2 NAME          MODEL
At line 3 ROWS
At line 16 COLUMNS
At line 98 RHS
At line 110 BOUNDS
At line 126 ENDATA
Problem MODEL has 11 rows, 15 columns and 36 elements
Coin0008I MODEL read with 0 errors
Option for timeMode changed from cpu to elapsed
Continuous objective value is 23 - 0.00 seconds
Cgl0004I processed model has 11 rows, 15 columns (15 integer (15 of which binary)) and 36 elements
Cutoff increment increased from 1e-05 to 0.9999
Cbc0038I Initial state - 0 integers unsatisfied sum - 0
Cbc0038I Solution found of -23
Cbc0038I Before mini branch and bound, 15 integers at bound fixed and 0 continuous
Cbc0038I Mini branch and bound did not improve solution (0.00 seconds)
Cbc0038I After 0.00 seconds - Feasibility pump exiting with objective of -23 - took 0.00 seconds
Cbc0012I Integer solution of -23 found by feasibility pump after 0 iterations and 0 nodes (0.00 seconds)
Cbc0001I Search completed - best objective -23, took 0 iterations and 0 nodes (0.00 seconds)
Cbc0035I Maximum depth 0, 0 variables fixed on reduced cost
Cuts at root node changed objective from -23 to -23
Probing was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Gomory was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Knapsack was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Clique was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
MixedIntegerRounding2 was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
FlowCover was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
TwoMirCuts was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
ZeroHalf was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)

Result - Optimal solution found

Objective value:                23.00000000
Enumerated nodes:               0
Total iterations:               0
Time (CPU seconds):             0.00
Time (Wallclock seconds):       0.00

Option for printingOptions changed from normal to all
Total time (CPU seconds):       0.00   (Wallclock seconds):       0.01

Status: 1
Student1 is assigned to ClassB with satisfaction 3
Student2 is assigned to ClassB with satisfaction 5
Student3 is assigned to ClassA with satisfaction 5
Student4 is assigned to ClassC with satisfaction 5
Student5 is assigned to ClassC with satisfaction 5

Result - Optimal solution foundという結果と共に、それぞれの生徒がどのクラスにどれほどの合致度で選択されたか出力されます。

Status: 1
Student1 is assigned to ClassB with satisfaction 3
Student2 is assigned to ClassB with satisfaction 5
Student3 is assigned to ClassA with satisfaction 5
Student4 is assigned to ClassC with satisfaction 5
Student5 is assigned to ClassC with satisfaction 5

Student1以外は一番良い結果になりましたね。

Student1とStudent3も違うクラスになったので問題なさそうです。

次はちょっとお試しで、クラス数を2つにすることで制約を満たすことが出来なくなるとどうなるか見てみます。

classesからClassCを除外しただけです。

解がないクラス割り当て問題

# ClassCを除外しておき、再実行
classes = ["ClassA", "ClassB"]

Out[0]

Welcome to the CBC MILP Solver 
Version: 2.10.3 
Build Date: Dec 15 2019 

command line - /Users/hinomaruc/Desktop/blog/my-venv/lib/python3.8/site-packages/pulp/solverdir/cbc/osx/64/cbc /var/folders/n0/ft3172td6v70d6klgt216_3r0000gn/T/f00f24bd6c744f7e81c651b7f9b05595-pulp.mps max timeMode elapsed branch printingOptions all solution /var/folders/n0/ft3172td6v70d6klgt216_3r0000gn/T/f00f24bd6c744f7e81c651b7f9b05595-pulp.sol (default strategy 1)
At line 2 NAME          MODEL
At line 3 ROWS
At line 14 COLUMNS
At line 69 RHS
At line 79 BOUNDS
At line 90 ENDATA
Problem MODEL has 9 rows, 10 columns and 24 elements
Coin0008I MODEL read with 0 errors
Option for timeMode changed from cpu to elapsed
Problem is infeasible - 0.00 seconds
Option for printingOptions changed from normal to all
Total time (CPU seconds):       0.00   (Wallclock seconds):       0.00

Status: -1
Student1 is assigned to ClassA with satisfaction 5
Student2 is assigned to ClassA with satisfaction 3
Student3 is assigned to ClassB with satisfaction 3
Student4 is assigned to ClassA with satisfaction 1
Student5 is assigned to ClassB with satisfaction 3

Problem is infeasible - 0.00 seconds
Status: -1

という結果になってしまいました。

どの制約を満たすことが出来ないのか表示されたらなと思いますが、ぱっと見なさそうですね。