A blog where I analyze all kinds of data with Python while thinking about where my career is headed.

(Part 2-3) Predicting heart disease with AutoML on the Heart Disease (Cleveland) dataset

Data Analytics

Last time, we built the model with a neural network.

This time, I'd like to do the modeling with AutoML.

Once you get used to AutoML, it is almost unsettling how easily it produces an accurate model as long as the input data is in good shape. :)

This time I'll use three AutoML libraries:

・mljar
・AutoGluon
・auto-sklearn

The code that builds the modeling data is in the "Preparing the modeling data" post, so please refer to that.

From here on, the analysis assumes that the four variables X_train, y_train, X_test, and y_test (training and test sets) have already been created.
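
For reference, here is a minimal sketch of what that preparation might look like (the real code is in the linked post; the DataFrame name df, the label column name, and the split ratio are assumptions):

# Minimal sketch, assuming the preprocessed Cleveland data is in a DataFrame `df`
# with the binary label in a column named "target" (see the linked post for the real code)
from sklearn.model_selection import train_test_split
X = df.drop(columns=["target"])
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=100, stratify=y)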


Modeling with mljar

Run mljar in Compete mode
from supervised.automl import AutoML
automl = AutoML(mode="Compete", random_state=100)

# train the model with fit
automl.fit(X_train,y_train)
Out[0]
AutoML directory: AutoML_1
The task is binary_classification with evaluation metric logloss
AutoML will use algorithms: ['Decision Tree', 'Linear', 'Random Forest', 'Extra Trees', 'LightGBM', 'Xgboost', 'CatBoost', 'Neural Network', 'Nearest Neighbors']
AutoML will stack models
AutoML will ensemble available models
AutoML steps: ['adjust_validation', 'simple_algorithms', 'default_algorithms', 'not_so_random', 'golden_features', 'kmeans_features', 'insert_random_feature', 'features_selection', 'hill_climbing_1', 'hill_climbing_2', 'boost_on_errors', 'ensemble', 'stack', 'ensemble_stacked']
* Step adjust_validation will try to check up to 1 model
1_DecisionTree logloss 2.02506 trained in 0.97 seconds
Adjust validation. Remove: 1_DecisionTree
*** Disable stacking for small dataset (nrows < 500)
Validation strategy: 10-fold CV Shuffle,Stratify
* Step simple_algorithms will try to check up to 4 models
1_DecisionTree logloss 1.177056 trained in 3.0 seconds
2_DecisionTree logloss 0.701035 trained in 2.67 seconds
3_DecisionTree logloss 0.53853 trained in 3.58 seconds
4_Linear logloss 0.390309 trained in 7.7 seconds
* Step default_algorithms will try to check up to 7 models
5_Default_LightGBM logloss 0.433876 trained in 20.82 seconds
6_Default_Xgboost logloss 0.451812 trained in 5.31 seconds
7_Default_CatBoost logloss 0.404469 trained in 6.43 seconds
8_Default_NeuralNetwork logloss 0.498901 trained in 6.26 seconds
9_Default_RandomForest logloss 0.424915 trained in 13.36 seconds
10_Default_ExtraTrees logloss 0.426999 trained in 11.49 seconds
11_Default_NearestNeighbors logloss 1.001154 trained in 5.8 seconds
* Step not_so_random will try to check up to 61 models
21_LightGBM logloss 0.407261 trained in 5.82 seconds
12_Xgboost logloss 0.374154 trained in 9.48 seconds
30_CatBoost logloss 0.38979 trained in 7.77 seconds
39_RandomForest logloss 0.42454 trained in 16.44 seconds
48_ExtraTrees logloss 0.422043 trained in 15.21 seconds
57_NeuralNetwork logloss 0.508284 trained in 7.92 seconds
66_NearestNeighbors logloss 0.713205 trained in 6.84 seconds
22_LightGBM logloss 0.378577 trained in 9.72 seconds
13_Xgboost logloss 0.405763 trained in 8.78 seconds
31_CatBoost logloss 0.408574 trained in 12.41 seconds
40_RandomForest logloss 0.397487 trained in 12.54 seconds
49_ExtraTrees logloss 0.399412 trained in 11.95 seconds
58_NeuralNetwork logloss 0.499919 trained in 9.13 seconds
67_NearestNeighbors logloss 0.714694 trained in 7.49 seconds
23_LightGBM logloss 0.469583 trained in 8.28 seconds
14_Xgboost logloss 0.438485 trained in 9.95 seconds
32_CatBoost logloss 0.427753 trained in 21.26 seconds
41_RandomForest logloss 0.424355 trained in 19.32 seconds
50_ExtraTrees logloss 0.43091 trained in 23.07 seconds
59_NeuralNetwork logloss 0.620012 trained in 9.87 seconds
68_NearestNeighbors logloss 1.162394 trained in 8.58 seconds
24_LightGBM logloss 0.420758 trained in 9.82 seconds
15_Xgboost logloss 0.414963 trained in 11.08 seconds
33_CatBoost logloss 0.402595 trained in 11.54 seconds
42_RandomForest logloss 0.422792 trained in 17.52 seconds
51_ExtraTrees logloss 0.438129 trained in 15.44 seconds
60_NeuralNetwork logloss 0.714978 trained in 11.14 seconds
69_NearestNeighbors logloss 1.162394 trained in 10.22 seconds
25_LightGBM logloss 0.407529 trained in 10.94 seconds
16_Xgboost logloss 0.690224 trained in 11.63 seconds
34_CatBoost logloss 0.423424 trained in 29.71 seconds
43_RandomForest logloss 0.411063 trained in 18.97 seconds
52_ExtraTrees logloss 0.446924 trained in 23.67 seconds
61_NeuralNetwork logloss 0.819889 trained in 14.32 seconds
70_NearestNeighbors logloss 0.714694 trained in 15.81 seconds
26_LightGBM logloss 0.378221 trained in 13.95 seconds
17_Xgboost logloss 0.693147 trained in 13.63 seconds
35_CatBoost logloss 0.408431 trained in 18.64 seconds
44_RandomForest logloss 0.428817 trained in 25.81 seconds
53_ExtraTrees logloss 0.468533 trained in 26.32 seconds
62_NeuralNetwork logloss 0.444628 trained in 15.5 seconds
71_NearestNeighbors logloss 1.165557 trained in 13.86 seconds
27_LightGBM logloss 0.402419 trained in 15.58 seconds
18_Xgboost logloss 0.693147 trained in 15.26 seconds
36_CatBoost logloss 0.415805 trained in 18.93 seconds
45_RandomForest logloss 0.430171 trained in 20.91 seconds
54_ExtraTrees logloss 0.459077 trained in 18.89 seconds
63_NeuralNetwork logloss 0.562846 trained in 15.65 seconds
72_NearestNeighbors logloss 1.162394 trained in 14.11 seconds
28_LightGBM logloss 0.411542 trained in 14.82 seconds
19_Xgboost logloss 0.384715 trained in 16.08 seconds
37_CatBoost logloss 0.389346 trained in 15.76 seconds
46_RandomForest logloss 0.424571 trained in 22.12 seconds
55_ExtraTrees logloss 0.469758 trained in 23.9 seconds
64_NeuralNetwork logloss 0.760196 trained in 16.17 seconds
29_LightGBM logloss 0.394738 trained in 15.96 seconds
20_Xgboost logloss 0.693147 trained in 15.5 seconds
38_CatBoost logloss 0.404094 trained in 18.1 seconds
47_RandomForest logloss 0.424042 trained in 22.77 seconds
56_ExtraTrees logloss 0.443431 trained in 22.56 seconds
65_NeuralNetwork logloss 0.506584 trained in 17.8 seconds
* Step golden_features will try to check up to 3 models
None 10
Add Golden Feature: thal_7.0_sum_cp_4.0
Add Golden Feature: cp_4.0_sum_ca
Add Golden Feature: thal_7.0_sum_slope_2.0
Add Golden Feature: slope_2.0_sum_cp_4.0
Add Golden Feature: thal_7.0_ratio_slope_2.0
Add Golden Feature: slope_2.0_ratio_thal_7.0
Add Golden Feature: thal_7.0_multiply_slope_2.0
Add Golden Feature: thal_7.0_sum_thal_6.0
Add Golden Feature: slope_2.0_sum_exang
Add Golden Feature: cp_3.0_diff_slope_2.0
Created 10 Golden Features in 9.24 seconds.
12_Xgboost_GoldenFeatures logloss 0.385574 trained in 27.6 seconds
26_LightGBM_GoldenFeatures logloss 0.3848 trained in 17.88 seconds
22_LightGBM_GoldenFeatures logloss 0.387534 trained in 17.93 seconds
* Step kmeans_features will try to check up to 3 models
12_Xgboost_KMeansFeatures logloss 0.404483 trained in 20.09 seconds
26_LightGBM_KMeansFeatures logloss 0.398293 trained in 19.17 seconds
22_LightGBM_KMeansFeatures logloss 0.39724 trained in 19.56 seconds
* Step insert_random_feature will try to check up to 1 model
12_Xgboost_RandomFeature logloss 0.380384 trained in 21.54 seconds
Drop features ['chol', 'thal_6.0', 'slope_3.0', 'restecg_1.0', 'cp_3.0', 'fbs', 'cp_2.0', 'trestbps', 'random_feature', 'thalach']
* Step features_selection will try to check up to 6 models
12_Xgboost_SelectedFeatures logloss 0.366608 trained in 19.46 seconds
26_LightGBM_SelectedFeatures logloss 0.358819 trained in 18.94 seconds
37_CatBoost_SelectedFeatures logloss 0.371678 trained in 20.21 seconds
40_RandomForest_SelectedFeatures logloss 0.406463 trained in 24.95 seconds
49_ExtraTrees_SelectedFeatures logloss 0.398534 trained in 26.73 seconds
62_NeuralNetwork_SelectedFeatures logloss 0.42059 trained in 20.64 seconds
* Step hill_climbing_1 will try to check up to 25 models
73_LightGBM_SelectedFeatures logloss 0.359481 trained in 20.38 seconds
74_LightGBM_SelectedFeatures logloss 0.357452 trained in 20.12 seconds
75_Xgboost_SelectedFeatures logloss 0.366608 trained in 21.15 seconds
76_CatBoost_SelectedFeatures logloss 0.379173 trained in 22.7 seconds
77_Xgboost logloss 0.374154 trained in 22.12 seconds
78_LightGBM logloss 0.378577 trained in 22.85 seconds
79_LightGBM logloss 0.374824 trained in 22.25 seconds
80_LightGBM logloss 0.378221 trained in 21.11 seconds
81_Xgboost logloss 0.384715 trained in 31.71 seconds
82_CatBoost logloss 0.406751 trained in 25.02 seconds
83_RandomForest logloss 0.409271 trained in 31.82 seconds
84_ExtraTrees_SelectedFeatures logloss 0.397793 trained in 31.39 seconds
85_ExtraTrees logloss 0.419875 trained in 36.41 seconds
86_RandomForest_SelectedFeatures logloss 0.39693 trained in 27.03 seconds
87_RandomForest logloss 0.413053 trained in 27.54 seconds
88_RandomForest logloss 0.412969 trained in 27.45 seconds
89_NeuralNetwork_SelectedFeatures logloss 0.548245 trained in 23.17 seconds
90_ExtraTrees logloss 0.429518 trained in 28.52 seconds
91_NeuralNetwork logloss 0.429078 trained in 23.99 seconds
92_NeuralNetwork logloss 0.482681 trained in 24.22 seconds
93_NeuralNetwork logloss 0.453291 trained in 23.93 seconds
94_DecisionTree logloss 0.575294 trained in 23.18 seconds
95_DecisionTree logloss 0.535709 trained in 23.17 seconds
96_DecisionTree logloss 1.543826 trained in 23.22 seconds
97_DecisionTree logloss 0.587037 trained in 23.41 seconds
* Step hill_climbing_2 will try to check up to 30 models
98_LightGBM_SelectedFeatures logloss 0.357452 trained in 24.23 seconds
99_LightGBM_SelectedFeatures logloss 0.358819 trained in 24.8 seconds
100_LightGBM_SelectedFeatures logloss 0.359481 trained in 25.31 seconds
101_Xgboost_SelectedFeatures logloss 0.381465 trained in 25.55 seconds
102_Xgboost_SelectedFeatures logloss 0.361988 trained in 25.85 seconds
103_Xgboost_SelectedFeatures logloss 0.381465 trained in 26.03 seconds
104_Xgboost_SelectedFeatures logloss 0.361988 trained in 26.39 seconds
105_CatBoost_SelectedFeatures logloss 0.370163 trained in 25.67 seconds
106_Xgboost logloss 0.385467 trained in 26.45 seconds
107_Xgboost logloss 0.379588 trained in 26.93 seconds
108_CatBoost_SelectedFeatures logloss 0.377549 trained in 26.41 seconds
109_CatBoost logloss 0.393326 trained in 26.61 seconds
110_RandomForest_SelectedFeatures logloss 0.390588 trained in 31.92 seconds
111_RandomForest_SelectedFeatures logloss 0.395898 trained in 32.28 seconds
112_RandomForest logloss 0.40556 trained in 31.96 seconds
113_RandomForest logloss 0.39588 trained in 32.23 seconds
114_ExtraTrees_SelectedFeatures logloss 0.396858 trained in 31.19 seconds
115_ExtraTrees_SelectedFeatures logloss 0.398634 trained in 31.75 seconds
116_ExtraTrees_SelectedFeatures logloss 0.397028 trained in 32.86 seconds
117_ExtraTrees_SelectedFeatures logloss 0.401296 trained in 34.35 seconds
118_ExtraTrees logloss 0.394118 trained in 32.23 seconds
119_ExtraTrees logloss 0.405253 trained in 58.66 seconds
120_RandomForest_SelectedFeatures logloss 0.402852 trained in 64.98 seconds
121_RandomForest_SelectedFeatures logloss 0.40528 trained in 47.67 seconds
122_NeuralNetwork_SelectedFeatures logloss 0.461389 trained in 35.11 seconds
123_NeuralNetwork_SelectedFeatures logloss 0.49609 trained in 52.19 seconds
124_NeuralNetwork logloss 0.576635 trained in 52.01 seconds
125_NeuralNetwork logloss 0.600478 trained in 52.65 seconds
126_NeuralNetwork logloss 0.641598 trained in 52.85 seconds
127_NeuralNetwork logloss 0.466681 trained in 53.34 seconds
* Step boost_on_errors will try to check up to 1 model
98_LightGBM_SelectedFeatures_BoostOnErrors logloss 0.366409 trained in 51.7 seconds
* Step ensemble will try to check up to 1 model
Ensemble logloss 0.355239 trained in 201.76 seconds
AutoML fit time: 3618.58 seconds
AutoML best model: Ensemble

It took 3618 seconds.
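
Incidentally, if that feels too long, mljar's total_time_limit parameter (in seconds) should cap the run time; a sketch, not what was run above:

# Sketch: cap mljar's total run time at 30 minutes (not used in this post)
automl = AutoML(mode="Compete", total_time_limit=1800, random_state=100)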

# Check the fit on the training and test data
from sklearn.metrics import accuracy_score
y_train_pred = automl.predict(X_train)
y_test_pred = automl.predict(X_test)
print("train",accuracy_score(y_train, y_train_pred))
print("test",accuracy_score(y_test, y_test_pred))
Out[0]
train 0.871900826446281
test 0.9180327868852459

A "UserWarning: X has feature names, but StandardScaler was fitted without feature names" warning appeared, but I'll ignore it for now. :)
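
One possible workaround (untested here) would be to hand mljar plain NumPy arrays so that column names never enter the picture:

# Sketch of a possible workaround: strip the feature names before fit/predict
automl.fit(X_train.to_numpy(), y_train.to_numpy())
y_test_pred = automl.predict(X_test.to_numpy())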

The accuracy is better than both logistic regression and the neural network.


Modeling with AutoGluon (default settings)

AutoGluon needs the data passed to it to include the target variable, so instead of X_train, y_train, X_test, and y_test I use the train and test variables directly.
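
If you only have the four split variables on hand, equivalent frames can be rebuilt with a simple concat (a sketch; it assumes pandas objects and a label column named target):

import pandas as pd
# Rebuild label-included train/test frames from the split variables (sketch)
train = pd.concat([X_train, y_train.rename("target")], axis=1)
test = pd.concat([X_test, y_test.rename("target")], axis=1)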

For binary problems, AutoGluon's eval_metric apparently defaults to accuracy. mljar used logloss, so I could match the metrics, but I'll keep the default here.

If eval_metric = None, it is automatically chosen based on problem_type
Source: https://auto.gluon.ai/stable/api/autogluon.tabular.TabularPredictor.html
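
If you did want to align the metric with mljar, passing eval_metric="log_loss" should do it; shown only as an illustration, since this post keeps the default:

# Illustration only: evaluate with log loss instead of the default accuracy
from autogluon.tabular import TabularPredictor
predictor_ll = TabularPredictor(label="target", problem_type="binary", eval_metric="log_loss")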

I'll try two patterns: one that runs with the default settings, and one that uses AutoMLPipelineFeatureGenerator and sets the time limit to match mljar's run time.

# https://auto.gluon.ai/stable/api/autogluon.tabular.TabularPredictor.html
# build the AutoGluon model
from autogluon.tabular import TabularPredictor
predictor = TabularPredictor(label="target", problem_type="binary",path="RESULT_AUTOGLUON",eval_metric=None).fit(train)
Out[0]
/Users/hinomaruc/miniforge3/envs/conda-autogluon/lib/python3.8/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
Beginning AutoGluon training ...
AutoGluon will save models to "RESULT_AUTOGLUON/"
AutoGluon Version:  0.8.2
Python Version:     3.8.17
Operating System:   Darwin
Platform Machine:   x86_64
Platform Version:   Darwin Kernel Version 19.6.0: Tue Jun 21 21:18:39 PDT 2022; root:xnu-6153.141.66~1/RELEASE_X86_64
Disk Space Avail:   128.68 GB / 239.85 GB (53.7%)
Train Data Rows:    242
Train Data Columns: 18
Label Column: target
Preprocessing data ...
Selected class <--> label mapping:  class 1 = 1, class 0 = 0
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
    Available Memory:                    11508.48 MB
    Train Data (Original)  Memory Usage: 0.03 MB (0.0% of available memory)
    Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
    Stage 1 Generators:
        Fitting AsTypeFeatureGenerator...
            Note: Converting 12 features to boolean dtype as they only contain 2 unique values.
    Stage 2 Generators:
        Fitting FillNaFeatureGenerator...
    Stage 3 Generators:
        Fitting IdentityFeatureGenerator...
    Stage 4 Generators:
        Fitting DropUniqueFeatureGenerator...
    Stage 5 Generators:
        Fitting DropDuplicatesFeatureGenerator...
    Types of features in original data (raw dtype, special dtypes):
        ('float', []) : 18 | ['age', 'sex', 'trestbps', 'chol', 'fbs', ...]
    Types of features in processed data (raw dtype, special dtypes):
        ('float', [])     :  6 | ['age', 'trestbps', 'chol', 'thalach', 'oldpeak', ...]
        ('int', ['bool']) : 12 | ['sex', 'fbs', 'exang', 'cp_2.0', 'cp_3.0', ...]
    0.2s = Fit runtime
    18 features in original data used to generate 18 features in processed data.
    Train Data (Processed) Memory Usage: 0.01 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.27s ...
AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
    To change this, specify the eval_metric parameter of Predictor()
Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 193, Val Rows: 49
User-specified model hyperparameters to be fit:
{
    'NN_TORCH': {},
    'GBM': [{'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}}, {}, 'GBMLarge'],
    'CAT': {},
    'XGB': {},
    'FASTAI': {},
    'RF': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
    'XT': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
    'KNN': [{'weights': 'uniform', 'ag_args': {'name_suffix': 'Unif'}}, {'weights': 'distance', 'ag_args': {'name_suffix': 'Dist'}}],
}
Fitting 13 L1 models ...
Fitting model: KNeighborsUnif ...
    0.5714   = Validation score   (accuracy)
    4.82s    = Training   runtime
    0.03s    = Validation runtime
Fitting model: KNeighborsDist ...
    0.551    = Validation score   (accuracy)
    0.02s    = Training   runtime
    0.02s    = Validation runtime
Fitting model: LightGBMXT ...
    0.8163   = Validation score   (accuracy)
    1.09s    = Training   runtime
    0.0s     = Validation runtime
Fitting model: LightGBM ...
    0.7755   = Validation score   (accuracy)
    0.18s    = Training   runtime
    0.0s     = Validation runtime
Fitting model: RandomForestGini ...
    0.7143   = Validation score   (accuracy)
    0.81s    = Training   runtime
    0.06s    = Validation runtime
Fitting model: RandomForestEntr ...
    0.7143   = Validation score   (accuracy)
    0.67s    = Training   runtime
    0.06s    = Validation runtime
Fitting model: CatBoost ...
    0.7755   = Validation score   (accuracy)
    0.64s    = Training   runtime
    0.0s     = Validation runtime
Fitting model: ExtraTreesGini ...
    0.7755   = Validation score   (accuracy)
    0.69s    = Training   runtime
    0.06s    = Validation runtime
Fitting model: ExtraTreesEntr ...
    0.7755   = Validation score   (accuracy)
    0.76s    = Training   runtime
    0.06s    = Validation runtime
Fitting model: NeuralNetFastAI ...
No improvement since epoch 1: early stopping
    0.8367   = Validation score   (accuracy)
    2.48s    = Training   runtime
    0.02s    = Validation runtime
Fitting model: XGBoost ...
    0.7143   = Validation score   (accuracy)
    0.22s    = Training   runtime
    0.0s     = Validation runtime
Fitting model: NeuralNetTorch ...
    0.8163   = Validation score   (accuracy)
    1.01s    = Training   runtime
    0.02s    = Validation runtime
Fitting model: LightGBMLarge ...
    0.7143   = Validation score   (accuracy)
    0.33s    = Training   runtime
    0.0s     = Validation runtime
Fitting model: WeightedEnsemble_L2 ...
    0.8776   = Validation score   (accuracy)
    1.16s    = Training   runtime
    0.0s     = Validation runtime
AutoGluon training complete, total runtime = 15.93s ... Best model: "WeightedEnsemble_L2"
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("RESULT_AUTOGLUON/")

It finished in 15.93 seconds, and WeightedEnsemble_L2 was selected as the best model.
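
The chosen model can also be confirmed programmatically; a small sketch:

# Sketch: print the name of the model AutoGluon selected
print(predictor.get_model_best())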

Checking feature importance
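
The table below was presumably produced with TabularPredictor.feature_importance (permutation importance on the held-out data); a sketch of the call:

# Sketch: permutation-based feature importance computed on the test set
predictor.feature_importance(test)
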
importance  stddev  p_value     n   p99_high    p99_low
ca  0.047934    0.016425    0.001424    5   0.081754    0.014114
thal_7.0    0.043802    0.015070    0.001446    5   0.074831    0.012773
cp_4.0  0.038843    0.015626    0.002564    5   0.071017    0.006668
sex     0.025620    0.012188    0.004653    5   0.050716    0.000524
thalach     0.017355    0.023266    0.085317    5   0.065260    -0.030549
exang   0.015702    0.013517    0.030099    5   0.043534    -0.012129
oldpeak     0.008264    0.013390    0.119832    5   0.035835    -0.019306
cp_3.0  0.008264    0.008264    0.044505    5   0.025281    -0.008752
slope_2.0   0.006612    0.010776    0.120991    5   0.028799    -0.015575
fbs     0.004959    0.008958    0.141756    5   0.023404    -0.013487
thal_6.0    0.004959    0.003457    0.016339    5   0.012077    -0.002160
trestbps    0.004132    0.005844    0.094502    5   0.016165    -0.007900
chol    0.003306    0.008958    0.227830    5   0.021751    -0.015140
age     0.000826    0.001848    0.186950    5   0.004631    -0.002979
restecg_2.0     0.000000    0.005061    0.500000    5   0.010421    -0.010421
restecg_1.0     0.000000    0.000000    0.500000    5   0.000000    0.000000
cp_2.0  -0.000826   0.003457    0.689346    5   0.006292    -0.007945
slope_3.0   -0.001653   0.002263    0.911096    5   0.003007    -0.006313

ca, thal_7.0, and cp_4.0 appear to be the important variables.

Checking the leaderboard
predictor.leaderboard(test, silent=True)
Out[0]
model   score_test  score_val   pred_time_test  pred_time_val   fit_time    pred_time_test_marginal     pred_time_val_marginal  fit_time_marginal   stack_level     can_infer   fit_order
0   NeuralNetTorch  0.918033    0.816327    0.013557    0.016752    1.010797    0.013557    0.016752    1.010797    1   True    12
1   NeuralNetFastAI     0.885246    0.836735    0.026323    0.016018    2.475871    0.026323    0.016018    2.475871    1   True    10
2   CatBoost    0.852459    0.775510    0.004657    0.002295    0.640289    0.004657    0.002295    0.640289    1   True    7
3   WeightedEnsemble_L2     0.852459    0.877551    0.033321    0.019821    4.722100    0.003053    0.001439    1.155352    2   True    14
4   RandomForestEntr    0.852459    0.714286    0.094608    0.063713    0.673625    0.094608    0.063713    0.673625    1   True    6
5   ExtraTreesEntr  0.836066    0.775510    0.077791    0.060216    0.757012    0.077791    0.060216    0.757012    1   True    9
6   ExtraTreesGini  0.836066    0.775510    0.077792    0.057055    0.687126    0.077792    0.057055    0.687126    1   True    8
7   RandomForestGini    0.836066    0.714286    0.120471    0.063285    0.809849    0.120471    0.063285    0.809849    1   True    5
8   LightGBMXT  0.819672    0.816327    0.003945    0.002364    1.090877    0.003945    0.002364    1.090877    1   True    3
9   XGBoost     0.819672    0.714286    0.010792    0.004440    0.219609    0.010792    0.004440    0.219609    1   True    11
10  LightGBMLarge   0.803279    0.714286    0.002192    0.003415    0.325471    0.002192    0.003415    0.325471    1   True    13
11  LightGBM    0.786885    0.775510    0.002399    0.002177    0.178292    0.002399    0.002177    0.178292    1   True    4
12  KNeighborsDist  0.622951    0.551020    0.015986    0.020918    0.023371    0.015986    0.020918    0.023371    1   True    2
13  KNeighborsUnif  0.622951    0.571429    0.016880    0.033736    4.824293    0.016880    0.033736    4.824293    1   True    1

This gives a summary of how each algorithm performed.

Checking the accuracy
# Check the fit on the training and test data
from sklearn.metrics import accuracy_score
y_train_pred = predictor.predict(train)
y_test_pred = predictor.predict(test)
print("train",accuracy_score(y_train, y_train_pred))
print("test",accuracy_score(y_test, y_test_pred))
Out[0]
train 0.859504132231405
test 0.8524590163934426

It doesn't look like it is overfitting, but mljar's fit on the test data was clearly better.


Modeling with AutoGluon (with tuning)

Set auto_stack, feature_generator, and time_limit
# https://auto.gluon.ai/stable/api/autogluon.tabular.TabularPredictor.html
# build the AutoGluon model
from autogluon.tabular import TabularPredictor
from autogluon.features.generators import AutoMLPipelineFeatureGenerator
auto_ml_pipeline_feature_generator = AutoMLPipelineFeatureGenerator()
predictor = TabularPredictor(label="target",
                             problem_type="binary",
                             path="RESULT_AUTOGLUON2",
                             eval_metric=None).fit(train,
                                                   auto_stack=True,
                                                   time_limit=3619,
                                                   feature_generator=auto_ml_pipeline_feature_generator)
Out[0]
Stack configuration (auto_stack=True): num_stack_levels=0, num_bag_folds=5, num_bag_sets=20
Beginning AutoGluon training ... Time limit = 3619s
AutoGluon will save models to "RESULT_AUTOGLUON2/"
AutoGluon Version:  0.8.2
Python Version:     3.8.17
Operating System:   Darwin
Platform Machine:   x86_64
Platform Version:   Darwin Kernel Version 19.6.0: Tue Jun 21 21:18:39 PDT 2022; root:xnu-6153.141.66~1/RELEASE_X86_64
Disk Space Avail:   128.37 GB / 239.85 GB (53.5%)
Train Data Rows:    242
Train Data Columns: 18
Label Column: target
Preprocessing data ...
Selected class <--> label mapping:  class 1 = 1, class 0 = 0
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
    Available Memory:                    11084.47 MB
    Train Data (Original)  Memory Usage: 0.03 MB (0.0% of available memory)
    Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
    Stage 1 Generators:
        Fitting AsTypeFeatureGenerator...
            Note: Converting 12 features to boolean dtype as they only contain 2 unique values.
    Stage 2 Generators:
        Fitting FillNaFeatureGenerator...
    Stage 3 Generators:
        Fitting IdentityFeatureGenerator...
    Stage 4 Generators:
        Fitting DropUniqueFeatureGenerator...
    Stage 5 Generators:
        Fitting DropDuplicatesFeatureGenerator...
    Types of features in original data (raw dtype, special dtypes):
        ('float', []) : 18 | ['age', 'sex', 'trestbps', 'chol', 'fbs', ...]
    Types of features in processed data (raw dtype, special dtypes):
        ('float', [])     :  6 | ['age', 'trestbps', 'chol', 'thalach', 'oldpeak', ...]
        ('int', ['bool']) : 12 | ['sex', 'fbs', 'exang', 'cp_2.0', 'cp_3.0', ...]
    0.2s = Fit runtime
    18 features in original data used to generate 18 features in processed data.
    Train Data (Processed) Memory Usage: 0.01 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.25s ...
AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
    To change this, specify the eval_metric parameter of Predictor()
User-specified model hyperparameters to be fit:
{
    'NN_TORCH': {},
    'GBM': [{'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}}, {}, 'GBMLarge'],
    'CAT': {},
    'XGB': {},
    'FASTAI': {},
    'RF': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
    'XT': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
    'KNN': [{'weights': 'uniform', 'ag_args': {'name_suffix': 'Unif'}}, {'weights': 'distance', 'ag_args': {'name_suffix': 'Dist'}}],
}
Fitting 13 L1 models ...
Fitting model: KNeighborsUnif_BAG_L1 ... Training model for up to 3618.75s of the 3618.74s of remaining time.
    0.6488   = Validation score   (accuracy)
    0.02s    = Training   runtime
    0.02s    = Validation runtime
Fitting model: KNeighborsDist_BAG_L1 ... Training model for up to 3618.68s of the 3618.67s of remaining time.
    0.6364   = Validation score   (accuracy)
    0.01s    = Training   runtime
    0.02s    = Validation runtime
Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 3618.6s of the 3618.6s of remaining time.
Will use sequential fold fitting strategy because import of ray failed. Reason: ray is required to train folds in parallel. A quick tip is to install via pip install ray==2.2.0, or use sequential fold fitting by passing sequential_local to ag_args_ensemble when calling tabular.fitFor example: predictor.fit(..., ag_args_ensemble={'fold_fitting_strategy': 'sequential_local'})
    Fitting 5 child models (S1F1 - S1F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8512   = Validation score   (accuracy)
    1.3s     = Training   runtime
    0.01s    = Validation runtime
Fitting model: LightGBM_BAG_L1 ... Training model for up to 3617.2s of the 3617.2s of remaining time.
    Fitting 5 child models (S1F1 - S1F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8512   = Validation score   (accuracy)
    1.05s    = Training   runtime
    0.01s    = Validation runtime
Fitting model: RandomForestGini_BAG_L1 ... Training model for up to 3616.08s of the 3616.07s of remaining time.
    0.7934   = Validation score   (accuracy)
    0.72s    = Training   runtime
    0.12s    = Validation runtime
Fitting model: RandomForestEntr_BAG_L1 ... Training model for up to 3615.19s of the 3615.18s of remaining time.
    0.7769   = Validation score   (accuracy)
    0.65s    = Training   runtime
    0.12s    = Validation runtime
Fitting model: CatBoost_BAG_L1 ... Training model for up to 3614.37s of the 3614.36s of remaining time.
    Fitting 5 child models (S1F1 - S1F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8595   = Validation score   (accuracy)
    2.69s    = Training   runtime
    0.01s    = Validation runtime
Fitting model: ExtraTreesGini_BAG_L1 ... Training model for up to 3611.62s of the 3611.61s of remaining time.
    0.7851   = Validation score   (accuracy)
    0.69s    = Training   runtime
    0.16s    = Validation runtime
Fitting model: ExtraTreesEntr_BAG_L1 ... Training model for up to 3610.71s of the 3610.7s of remaining time.
    0.7975   = Validation score   (accuracy)
    0.72s    = Training   runtime
    0.11s    = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L1 ... Training model for up to 3609.8s of the 3609.79s of remaining time.
    Fitting 5 child models (S1F1 - S1F5) | Fitting with SequentialLocalFoldFittingStrategy
No improvement since epoch 2: early stopping
No improvement since epoch 2: early stopping
No improvement since epoch 1: early stopping
No improvement since epoch 7: early stopping
No improvement since epoch 8: early stopping
    0.8388   = Validation score   (accuracy)
    9.62s    = Training   runtime
    0.07s    = Validation runtime
Fitting model: XGBoost_BAG_L1 ... Training model for up to 3599.99s of the 3599.98s of remaining time.
    Fitting 5 child models (S1F1 - S1F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8017   = Validation score   (accuracy)
    1.07s    = Training   runtime
    0.02s    = Validation runtime
Fitting model: NeuralNetTorch_BAG_L1 ... Training model for up to 3598.79s of the 3598.78s of remaining time.
    Fitting 5 child models (S1F1 - S1F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8595   = Validation score   (accuracy)
    5.61s    = Training   runtime
    0.06s    = Validation runtime
Fitting model: LightGBMLarge_BAG_L1 ... Training model for up to 3593.06s of the 3593.05s of remaining time.
    Fitting 5 child models (S1F1 - S1F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8306   = Validation score   (accuracy)
    1.5s     = Training   runtime
    0.01s    = Validation runtime
Repeating k-fold bagging: 2/20
Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 3591.47s of the 3591.47s of remaining time.
    Fitting 5 child models (S2F1 - S2F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8388   = Validation score   (accuracy)
    2.32s    = Training   runtime
    0.02s    = Validation runtime
Fitting model: LightGBM_BAG_L1 ... Training model for up to 3590.39s of the 3590.39s of remaining time.
    Fitting 5 child models (S2F1 - S2F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8099   = Validation score   (accuracy)
    2.04s    = Training   runtime
    0.02s    = Validation runtime
Fitting model: CatBoost_BAG_L1 ... Training model for up to 3589.33s of the 3589.32s of remaining time.
    Fitting 5 child models (S2F1 - S2F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.843    = Validation score   (accuracy)
    5.46s    = Training   runtime
    0.03s    = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L1 ... Training model for up to 3586.49s of the 3586.48s of remaining time.
    Fitting 5 child models (S2F1 - S2F5) | Fitting with SequentialLocalFoldFittingStrategy
No improvement since epoch 1: early stopping
No improvement since epoch 6: early stopping

No improvement since epoch 1: early stopping
No improvement since epoch 6: early stopping
No improvement since epoch 1: early stopping
    0.843    = Validation score   (accuracy)
    18.69s   = Training   runtime
    0.15s    = Validation runtime
Fitting model: XGBoost_BAG_L1 ... Training model for up to 3577.22s of the 3577.21s of remaining time.
    Fitting 5 child models (S2F1 - S2F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.7893   = Validation score   (accuracy)
    2.01s    = Training   runtime
    0.04s    = Validation runtime
Fitting model: NeuralNetTorch_BAG_L1 ... Training model for up to 3576.16s of the 3576.15s of remaining time.
    Fitting 5 child models (S2F1 - S2F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.843    = Validation score   (accuracy)
    11.02s   = Training   runtime
    0.11s    = Validation runtime
Fitting model: LightGBMLarge_BAG_L1 ... Training model for up to 3570.62s of the 3570.61s of remaining time.
    Fitting 5 child models (S2F1 - S2F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8099   = Validation score   (accuracy)
    2.97s    = Training   runtime
    0.02s    = Validation runtime
Repeating k-fold bagging: 3/20
Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 3569.08s of the 3569.07s of remaining time.
    Fitting 5 child models (S3F1 - S3F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8264   = Validation score   (accuracy)
    3.28s    = Training   runtime
    0.03s    = Validation runtime
Fitting model: LightGBM_BAG_L1 ... Training model for up to 3568.06s of the 3568.05s of remaining time.
    Fitting 5 child models (S3F1 - S3F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.814    = Validation score   (accuracy)
    3.11s    = Training   runtime
    0.03s    = Validation runtime
Fitting model: CatBoost_BAG_L1 ... Training model for up to 3566.92s of the 3566.91s of remaining time.
    Fitting 5 child models (S3F1 - S3F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.843    = Validation score   (accuracy)
    8.58s    = Training   runtime
    0.04s    = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L1 ... Training model for up to 3563.74s of the 3563.73s of remaining time.
    Fitting 5 child models (S3F1 - S3F5) | Fitting with SequentialLocalFoldFittingStrategy
No improvement since epoch 7: early stopping
No improvement since epoch 2: early stopping
No improvement since epoch 1: early stopping
No improvement since epoch 2: early stopping
    0.843    = Validation score   (accuracy)
    28.05s   = Training   runtime
    0.22s    = Validation runtime
Fitting model: XGBoost_BAG_L1 ... Training model for up to 3554.19s of the 3554.18s of remaining time.
    Fitting 5 child models (S3F1 - S3F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.7893   = Validation score   (accuracy)
    2.92s    = Training   runtime
    0.06s    = Validation runtime
Fitting model: NeuralNetTorch_BAG_L1 ... Training model for up to 3553.16s of the 3553.15s of remaining time.
    Fitting 5 child models (S3F1 - S3F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8388   = Validation score   (accuracy)
    16.49s   = Training   runtime
    0.17s    = Validation runtime
Fitting model: LightGBMLarge_BAG_L1 ... Training model for up to 3547.57s of the 3547.57s of remaining time.
    Fitting 5 child models (S3F1 - S3F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.814    = Validation score   (accuracy)
    4.29s    = Training   runtime
    0.03s    = Validation runtime
Repeating k-fold bagging: 4/20
Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 3546.19s of the 3546.18s of remaining time.
    Fitting 5 child models (S4F1 - S4F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8264   = Validation score   (accuracy)
    4.22s    = Training   runtime
    0.04s    = Validation runtime
Fitting model: LightGBM_BAG_L1 ... Training model for up to 3545.18s of the 3545.17s of remaining time.
    Fitting 5 child models (S4F1 - S4F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8182   = Validation score   (accuracy)
    4.12s    = Training   runtime
    0.04s    = Validation runtime
Fitting model: CatBoost_BAG_L1 ... Training model for up to 3544.09s of the 3544.08s of remaining time.
    Fitting 5 child models (S4F1 - S4F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.843    = Validation score   (accuracy)
    11.52s   = Training   runtime
    0.05s    = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L1 ... Training model for up to 3541.09s of the 3541.09s of remaining time.
    Fitting 5 child models (S4F1 - S4F5) | Fitting with SequentialLocalFoldFittingStrategy
No improvement since epoch 2: early stopping
No improvement since epoch 0: early stopping
No improvement since epoch 4: early stopping
No improvement since epoch 1: early stopping
    0.8512   = Validation score   (accuracy)
    36.75s   = Training   runtime
    0.29s    = Validation runtime
Fitting model: XGBoost_BAG_L1 ... Training model for up to 3532.22s of the 3532.21s of remaining time.
    Fitting 5 child models (S4F1 - S4F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8017   = Validation score   (accuracy)
    3.71s    = Training   runtime
    0.08s    = Validation runtime
Fitting model: NeuralNetTorch_BAG_L1 ... Training model for up to 3531.32s of the 3531.31s of remaining time.
    Fitting 5 child models (S4F1 - S4F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8347   = Validation score   (accuracy)
    22.01s   = Training   runtime
    0.23s    = Validation runtime
Fitting model: LightGBMLarge_BAG_L1 ... Training model for up to 3525.69s of the 3525.68s of remaining time.
    Fitting 5 child models (S4F1 - S4F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8058   = Validation score   (accuracy)
    5.68s    = Training   runtime
    0.04s    = Validation runtime
Repeating k-fold bagging: 5/20
Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 3524.23s of the 3524.22s of remaining time.
    Fitting 5 child models (S5F1 - S5F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8264   = Validation score   (accuracy)
    5.18s    = Training   runtime
    0.05s    = Validation runtime
Fitting model: LightGBM_BAG_L1 ... Training model for up to 3523.21s of the 3523.2s of remaining time.
    Fitting 5 child models (S5F1 - S5F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.814    = Validation score   (accuracy)
    5.03s    = Training   runtime
    0.05s    = Validation runtime
Fitting model: CatBoost_BAG_L1 ... Training model for up to 3522.25s of the 3522.24s of remaining time.
    Fitting 5 child models (S5F1 - S5F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8512   = Validation score   (accuracy)
    14.19s   = Training   runtime
    0.06s    = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L1 ... Training model for up to 3519.53s of the 3519.52s of remaining time.
    Fitting 5 child models (S5F1 - S5F5) | Fitting with SequentialLocalFoldFittingStrategy
No improvement since epoch 3: early stopping
No improvement since epoch 2: early stopping
No improvement since epoch 3: early stopping
No improvement since epoch 2: early stopping
    0.8636   = Validation score   (accuracy)
    46.01s   = Training   runtime
    0.36s    = Validation runtime
Fitting model: XGBoost_BAG_L1 ... Training model for up to 3510.09s of the 3510.08s of remaining time.
    Fitting 5 child models (S5F1 - S5F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.7975   = Validation score   (accuracy)
    4.56s    = Training   runtime
    0.11s    = Validation runtime
Fitting model: NeuralNetTorch_BAG_L1 ... Training model for up to 3509.12s of the 3509.11s of remaining time.
    Fitting 5 child models (S5F1 - S5F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8388   = Validation score   (accuracy)
    27.88s   = Training   runtime
    0.29s    = Validation runtime
Fitting model: LightGBMLarge_BAG_L1 ... Training model for up to 3503.12s of the 3503.11s of remaining time.
    Fitting 5 child models (S5F1 - S5F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.814    = Validation score   (accuracy)
    6.99s    = Training   runtime
    0.05s    = Validation runtime
Repeating k-fold bagging: 6/20
Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 3501.74s of the 3501.74s of remaining time.
    Fitting 5 child models (S6F1 - S6F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8306   = Validation score   (accuracy)
    6.12s    = Training   runtime

    0.06s    = Validation runtime
Fitting model: LightGBM_BAG_L1 ... Training model for up to 3500.75s of the 3500.74s of remaining time.
    Fitting 5 child models (S6F1 - S6F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8182   = Validation score   (accuracy)
    5.99s    = Training   runtime
    0.06s    = Validation runtime
Fitting model: CatBoost_BAG_L1 ... Training model for up to 3499.72s of the 3499.71s of remaining time.
    Fitting 5 child models (S6F1 - S6F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8471   = Validation score   (accuracy)
    16.64s   = Training   runtime
    0.08s    = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L1 ... Training model for up to 3497.2s of the 3497.2s of remaining time.
    Fitting 5 child models (S6F1 - S6F5) | Fitting with SequentialLocalFoldFittingStrategy
No improvement since epoch 1: early stopping
No improvement since epoch 2: early stopping
No improvement since epoch 3: early stopping
No improvement since epoch 2: early stopping
    0.8554   = Validation score   (accuracy)
    54.87s   = Training   runtime
    0.43s    = Validation runtime
Fitting model: XGBoost_BAG_L1 ... Training model for up to 3488.16s of the 3488.15s of remaining time.
    Fitting 5 child models (S6F1 - S6F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8058   = Validation score   (accuracy)
    5.4s     = Training   runtime
    0.13s    = Validation runtime
Fitting model: NeuralNetTorch_BAG_L1 ... Training model for up to 3487.22s of the 3487.21s of remaining time.
    Fitting 5 child models (S6F1 - S6F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8306   = Validation score   (accuracy)
    33.74s   = Training   runtime
    0.35s    = Validation runtime
Fitting model: LightGBMLarge_BAG_L1 ... Training model for up to 3481.24s of the 3481.23s of remaining time.
    Fitting 5 child models (S6F1 - S6F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8223   = Validation score   (accuracy)
    8.35s    = Training   runtime
    0.06s    = Validation runtime
Repeating k-fold bagging: 7/20
Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 3479.8s of the 3479.79s of remaining time.
    Fitting 5 child models (S7F1 - S7F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8347   = Validation score   (accuracy)
    7.05s    = Training   runtime
    0.07s    = Validation runtime
Fitting model: LightGBM_BAG_L1 ... Training model for up to 3478.81s of the 3478.8s of remaining time.
    Fitting 5 child models (S7F1 - S7F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8182   = Validation score   (accuracy)
    6.93s    = Training   runtime
    0.07s    = Validation runtime
Fitting model: CatBoost_BAG_L1 ... Training model for up to 3477.81s of the 3477.81s of remaining time.
    Fitting 5 child models (S7F1 - S7F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8471   = Validation score   (accuracy)
    19.04s   = Training   runtime
    0.09s    = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L1 ... Training model for up to 3475.35s of the 3475.34s of remaining time.
    Fitting 5 child models (S7F1 - S7F5) | Fitting with SequentialLocalFoldFittingStrategy
No improvement since epoch 5: early stopping
No improvement since epoch 1: early stopping
No improvement since epoch 3: early stopping
No improvement since epoch 1: early stopping
    0.8554   = Validation score   (accuracy)
    63.9s    = Training   runtime
    0.5s     = Validation runtime
Fitting model: XGBoost_BAG_L1 ... Training model for up to 3466.14s of the 3466.14s of remaining time.
    Fitting 5 child models (S7F1 - S7F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.7934   = Validation score   (accuracy)
    6.2s     = Training   runtime
    0.15s    = Validation runtime
Fitting model: NeuralNetTorch_BAG_L1 ... Training model for up to 3465.22s of the 3465.22s of remaining time.
    Fitting 5 child models (S7F1 - S7F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8306   = Validation score   (accuracy)
    38.59s   = Training   runtime
    0.41s    = Validation runtime
Fitting model: LightGBMLarge_BAG_L1 ... Training model for up to 3460.24s of the 3460.23s of remaining time.
    Fitting 5 child models (S7F1 - S7F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8182   = Validation score   (accuracy)
    9.64s    = Training   runtime
    0.07s    = Validation runtime
Repeating k-fold bagging: 8/20
Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 3458.89s of the 3458.88s of remaining time.
    Fitting 5 child models (S8F1 - S8F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8388   = Validation score   (accuracy)
    8.01s    = Training   runtime
    0.08s    = Validation runtime
Fitting model: LightGBM_BAG_L1 ... Training model for up to 3457.86s of the 3457.85s of remaining time.
    Fitting 5 child models (S8F1 - S8F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8182   = Validation score   (accuracy)
    7.91s    = Training   runtime
    0.08s    = Validation runtime
Fitting model: CatBoost_BAG_L1 ... Training model for up to 3456.8s of the 3456.79s of remaining time.
    Fitting 5 child models (S8F1 - S8F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8554   = Validation score   (accuracy)
    21.43s   = Training   runtime
    0.1s     = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L1 ... Training model for up to 3454.34s of the 3454.33s of remaining time.
    Fitting 5 child models (S8F1 - S8F5) | Fitting with SequentialLocalFoldFittingStrategy
No improvement since epoch 3: early stopping
No improvement since epoch 3: early stopping
No improvement since epoch 1: early stopping
No improvement since epoch 5: early stopping
No improvement since epoch 2: early stopping
    0.8512   = Validation score   (accuracy)
    73.54s   = Training   runtime
    0.57s    = Validation runtime
Fitting model: XGBoost_BAG_L1 ... Training model for up to 3444.51s of the 3444.5s of remaining time.
    Fitting 5 child models (S8F1 - S8F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.781    = Validation score   (accuracy)
    7.05s    = Training   runtime
    0.17s    = Validation runtime
Fitting model: NeuralNetTorch_BAG_L1 ... Training model for up to 3443.55s of the 3443.55s of remaining time.
    Fitting 5 child models (S8F1 - S8F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8388   = Validation score   (accuracy)
    44.64s   = Training   runtime
    0.47s    = Validation runtime
Fitting model: LightGBMLarge_BAG_L1 ... Training model for up to 3437.38s of the 3437.37s of remaining time.
    Fitting 5 child models (S8F1 - S8F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.814    = Validation score   (accuracy)
    11.08s   = Training   runtime
    0.08s    = Validation runtime
Repeating k-fold bagging: 9/20
Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 3435.84s of the 3435.83s of remaining time.
    Fitting 5 child models (S9F1 - S9F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8306   = Validation score   (accuracy)
    9.01s    = Training   runtime
    0.09s    = Validation runtime
Fitting model: LightGBM_BAG_L1 ... Training model for up to 3434.77s of the 3434.76s of remaining time.
    Fitting 5 child models (S9F1 - S9F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8223   = Validation score   (accuracy)
    8.92s    = Training   runtime
    0.09s    = Validation runtime
Fitting model: CatBoost_BAG_L1 ... Training model for up to 3433.69s of the 3433.68s of remaining time.
    Fitting 5 child models (S9F1 - S9F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8471   = Validation score   (accuracy)
    24.02s   = Training   runtime
    0.11s    = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L1 ... Training model for up to 3431.04s of the 3431.03s of remaining time.
    Fitting 5 child models (S9F1 - S9F5) | Fitting with SequentialLocalFoldFittingStrategy
No improvement since epoch 9: early stopping
No improvement since epoch 5: early stopping
No improvement since epoch 2: early stopping
No improvement since epoch 0: early stopping
    0.843    = Validation score   (accuracy)
    83.89s   = Training   runtime
    0.65s    = Validation runtime
Fitting model: XGBoost_BAG_L1 ... Training model for up to 3420.48s of the 3420.47s of remaining time.
    Fitting 5 child models (S9F1 - S9F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8017   = Validation score   (accuracy)
    7.98s    = Training   runtime

    0.19s    = Validation runtime
Fitting model: NeuralNetTorch_BAG_L1 ... Training model for up to 3419.44s of the 3419.43s of remaining time.
    Fitting 5 child models (S9F1 - S9F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8471   = Validation score   (accuracy)
    50.14s   = Training   runtime
    0.52s    = Validation runtime
Fitting model: LightGBMLarge_BAG_L1 ... Training model for up to 3413.81s of the 3413.8s of remaining time.
    Fitting 5 child models (S9F1 - S9F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.814    = Validation score   (accuracy)
    12.82s   = Training   runtime
    0.09s    = Validation runtime
Repeating k-fold bagging: 10/20
Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 3411.98s of the 3411.97s of remaining time.
    Fitting 5 child models (S10F1 - S10F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8347   = Validation score   (accuracy)
    10.15s   = Training   runtime
    0.1s     = Validation runtime
Fitting model: LightGBM_BAG_L1 ... Training model for up to 3410.74s of the 3410.73s of remaining time.
    Fitting 5 child models (S10F1 - S10F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8223   = Validation score   (accuracy)
    9.88s    = Training   runtime
    0.09s    = Validation runtime
Fitting model: CatBoost_BAG_L1 ... Training model for up to 3409.7s of the 3409.69s of remaining time.
    Fitting 5 child models (S10F1 - S10F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8512   = Validation score   (accuracy)
    26.56s   = Training   runtime
    0.13s    = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L1 ... Training model for up to 3407.08s of the 3407.08s of remaining time.
    Fitting 5 child models (S10F1 - S10F5) | Fitting with SequentialLocalFoldFittingStrategy
No improvement since epoch 5: early stopping
No improvement since epoch 1: early stopping
No improvement since epoch 1: early stopping
No improvement since epoch 4: early stopping
    0.8388   = Validation score   (accuracy)
    92.97s   = Training   runtime
    0.72s    = Validation runtime
Fitting model: XGBoost_BAG_L1 ... Training model for up to 3397.82s of the 3397.81s of remaining time.
    Fitting 5 child models (S10F1 - S10F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8017   = Validation score   (accuracy)
    9.09s    = Training   runtime
    0.22s    = Validation runtime
Fitting model: NeuralNetTorch_BAG_L1 ... Training model for up to 3396.57s of the 3396.57s of remaining time.
    Fitting 5 child models (S10F1 - S10F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8471   = Validation score   (accuracy)
    55.21s   = Training   runtime
    0.58s    = Validation runtime
Fitting model: LightGBMLarge_BAG_L1 ... Training model for up to 3391.38s of the 3391.37s of remaining time.
    Fitting 5 child models (S10F1 - S10F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8182   = Validation score   (accuracy)
    14.42s   = Training   runtime
    0.1s     = Validation runtime
Repeating k-fold bagging: 11/20
Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 3389.6s of the 3389.6s of remaining time.
    Fitting 5 child models (S11F1 - S11F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8306   = Validation score   (accuracy)
    11.37s   = Training   runtime
    0.11s    = Validation runtime
Fitting model: LightGBM_BAG_L1 ... Training model for up to 3388.31s of the 3388.3s of remaining time.
    Fitting 5 child models (S11F1 - S11F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8182   = Validation score   (accuracy)
    10.83s   = Training   runtime
    0.1s     = Validation runtime
Fitting model: CatBoost_BAG_L1 ... Training model for up to 3387.29s of the 3387.29s of remaining time.
    Fitting 5 child models (S11F1 - S11F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8388   = Validation score   (accuracy)
    29.02s   = Training   runtime
    0.14s    = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L1 ... Training model for up to 3384.76s of the 3384.75s of remaining time.
    Fitting 5 child models (S11F1 - S11F5) | Fitting with SequentialLocalFoldFittingStrategy
No improvement since epoch 6: early stopping
No improvement since epoch 6: early stopping
No improvement since epoch 1: early stopping
No improvement since epoch 1: early stopping
    0.843    = Validation score   (accuracy)
    103.09s  = Training   runtime
    0.91s    = Validation runtime
Fitting model: XGBoost_BAG_L1 ... Training model for up to 3374.31s of the 3374.31s of remaining time.
    Fitting 5 child models (S11F1 - S11F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.7934   = Validation score   (accuracy)
    9.89s    = Training   runtime
    0.24s    = Validation runtime
Fitting model: NeuralNetTorch_BAG_L1 ... Training model for up to 3373.4s of the 3373.39s of remaining time.
    Fitting 5 child models (S11F1 - S11F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8471   = Validation score   (accuracy)
    60.74s   = Training   runtime
    0.63s    = Validation runtime
Fitting model: LightGBMLarge_BAG_L1 ... Training model for up to 3367.74s of the 3367.74s of remaining time.
    Fitting 5 child models (S11F1 - S11F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8223   = Validation score   (accuracy)
    15.91s   = Training   runtime
    0.11s    = Validation runtime
Repeating k-fold bagging: 12/20
Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 3366.16s of the 3366.15s of remaining time.
    Fitting 5 child models (S12F1 - S12F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8347   = Validation score   (accuracy)
    12.37s   = Training   runtime
    0.12s    = Validation runtime
Fitting model: LightGBM_BAG_L1 ... Training model for up to 3365.08s of the 3365.07s of remaining time.
    Fitting 5 child models (S12F1 - S12F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8182   = Validation score   (accuracy)
    11.76s   = Training   runtime
    0.11s    = Validation runtime
Fitting model: CatBoost_BAG_L1 ... Training model for up to 3364.07s of the 3364.06s of remaining time.
    Fitting 5 child models (S12F1 - S12F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8388   = Validation score   (accuracy)
    31.42s   = Training   runtime
    0.15s    = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L1 ... Training model for up to 3361.6s of the 3361.6s of remaining time.
    Fitting 5 child models (S12F1 - S12F5) | Fitting with SequentialLocalFoldFittingStrategy
No improvement since epoch 1: early stopping
No improvement since epoch 3: early stopping
No improvement since epoch 1: early stopping
No improvement since epoch 2: early stopping
No improvement since epoch 1: early stopping
    0.8471   = Validation score   (accuracy)
    111.39s  = Training   runtime
    0.98s    = Validation runtime
Fitting model: XGBoost_BAG_L1 ... Training model for up to 3353.11s of the 3353.1s of remaining time.
    Fitting 5 child models (S12F1 - S12F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.7934   = Validation score   (accuracy)
    10.72s   = Training   runtime
    0.26s    = Validation runtime
Fitting model: NeuralNetTorch_BAG_L1 ... Training model for up to 3352.15s of the 3352.14s of remaining time.
    Fitting 5 child models (S12F1 - S12F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.843    = Validation score   (accuracy)
    67.07s   = Training   runtime
    0.69s    = Validation runtime
Fitting model: LightGBMLarge_BAG_L1 ... Training model for up to 3345.69s of the 3345.68s of remaining time.
    Fitting 5 child models (S12F1 - S12F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8099   = Validation score   (accuracy)
    17.35s   = Training   runtime
    0.12s    = Validation runtime
Repeating k-fold bagging: 13/20
Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 3344.15s of the 3344.14s of remaining time.
    Fitting 5 child models (S13F1 - S13F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8347   = Validation score   (accuracy)
    13.31s   = Training   runtime
    0.13s    = Validation runtime
Fitting model: LightGBM_BAG_L1 ... Training model for up to 3343.14s of the 3343.13s of remaining time.
    Fitting 5 child models (S13F1 - S13F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8182   = Validation score   (accuracy)
    12.75s   = Training   runtime
    0.12s    = Validation runtime
Fitting model: CatBoost_BAG_L1 ... Training model for up to 3342.06s of the 3342.05s of remaining time.
    Fitting 5 child models (S13F1 - S13F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8347   = Validation score   (accuracy)
    33.92s   = Training   runtime
    0.16s    = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L1 ... Training model for up to 3339.49s of the 3339.48s of remaining time.
    Fitting 5 child models (S13F1 - S13F5) | Fitting with SequentialLocalFoldFittingStrategy
No improvement since epoch 2: early stopping
No improvement since epoch 1: early stopping
No improvement since epoch 1: early stopping
No improvement since epoch 3: early stopping
No improvement since epoch 4: early stopping
    0.8471   = Validation score   (accuracy)
    119.89s  = Training   runtime
    1.05s    = Validation runtime
Fitting model: XGBoost_BAG_L1 ... Training model for up to 3330.81s of the 3330.8s of remaining time.
    Fitting 5 child models (S13F1 - S13F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.7851   = Validation score   (accuracy)
    11.63s   = Training   runtime
    0.28s    = Validation runtime
Fitting model: NeuralNetTorch_BAG_L1 ... Training model for up to 3329.78s of the 3329.77s of remaining time.
    Fitting 5 child models (S13F1 - S13F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.843    = Validation score   (accuracy)
    72.24s   = Training   runtime
    0.74s    = Validation runtime
Fitting model: LightGBMLarge_BAG_L1 ... Training model for up to 3324.47s of the 3324.46s of remaining time.
    Fitting 5 child models (S13F1 - S13F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8182   = Validation score   (accuracy)
    18.65s   = Training   runtime
    0.13s    = Validation runtime
Repeating k-fold bagging: 14/20
Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 3323.09s of the 3323.08s of remaining time.
    Fitting 5 child models (S14F1 - S14F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8347   = Validation score   (accuracy)
    14.25s   = Training   runtime
    0.14s    = Validation runtime
Fitting model: LightGBM_BAG_L1 ... Training model for up to 3322.09s of the 3322.08s of remaining time.
    Fitting 5 child models (S14F1 - S14F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8264   = Validation score   (accuracy)
    13.69s   = Training   runtime
    0.13s    = Validation runtime
Fitting model: CatBoost_BAG_L1 ... Training model for up to 3321.05s of the 3321.04s of remaining time.
    Fitting 5 child models (S14F1 - S14F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8388   = Validation score   (accuracy)
    36.31s   = Training   runtime
    0.18s    = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L1 ... Training model for up to 3318.59s of the 3318.58s of remaining time.
    Fitting 5 child models (S14F1 - S14F5) | Fitting with SequentialLocalFoldFittingStrategy
No improvement since epoch 1: early stopping
No improvement since epoch 1: early stopping
No improvement since epoch 0: early stopping
No improvement since epoch 2: early stopping
No improvement since epoch 2: early stopping
    0.8471   = Validation score   (accuracy)
    128.43s  = Training   runtime
    1.12s    = Validation runtime
Fitting model: XGBoost_BAG_L1 ... Training model for up to 3309.85s of the 3309.85s of remaining time.
    Fitting 5 child models (S14F1 - S14F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.7893   = Validation score   (accuracy)
    12.41s   = Training   runtime
    0.3s     = Validation runtime
Fitting model: NeuralNetTorch_BAG_L1 ... Training model for up to 3308.95s of the 3308.95s of remaining time.
    Fitting 5 child models (S14F1 - S14F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.843    = Validation score   (accuracy)
    77.57s   = Training   runtime
    0.8s     = Validation runtime
Fitting model: LightGBMLarge_BAG_L1 ... Training model for up to 3303.49s of the 3303.49s of remaining time.
    Fitting 5 child models (S14F1 - S14F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8099   = Validation score   (accuracy)
    20.16s   = Training   runtime
    0.14s    = Validation runtime
Repeating k-fold bagging: 15/20
Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 3301.88s of the 3301.87s of remaining time.
    Fitting 5 child models (S15F1 - S15F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8347   = Validation score   (accuracy)
    15.17s   = Training   runtime
    0.15s    = Validation runtime
Fitting model: LightGBM_BAG_L1 ... Training model for up to 3300.89s of the 3300.88s of remaining time.
    Fitting 5 child models (S15F1 - S15F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8223   = Validation score   (accuracy)
    14.65s   = Training   runtime
    0.14s    = Validation runtime
Fitting model: CatBoost_BAG_L1 ... Training model for up to 3299.86s of the 3299.85s of remaining time.
    Fitting 5 child models (S15F1 - S15F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8306   = Validation score   (accuracy)
    38.88s   = Training   runtime
    0.19s    = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L1 ... Training model for up to 3297.22s of the 3297.21s of remaining time.
    Fitting 5 child models (S15F1 - S15F5) | Fitting with SequentialLocalFoldFittingStrategy
No improvement since epoch 2: early stopping
No improvement since epoch 4: early stopping
No improvement since epoch 5: early stopping
No improvement since epoch 5: early stopping
No improvement since epoch 2: early stopping
    0.8471   = Validation score   (accuracy)
    137.46s  = Training   runtime
    1.19s    = Validation runtime
Fitting model: XGBoost_BAG_L1 ... Training model for up to 3288.0s of the 3287.99s of remaining time.
    Fitting 5 child models (S15F1 - S15F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.7934   = Validation score   (accuracy)
    13.32s   = Training   runtime
    0.32s    = Validation runtime
Fitting model: NeuralNetTorch_BAG_L1 ... Training model for up to 3286.97s of the 3286.96s of remaining time.
    Fitting 5 child models (S15F1 - S15F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.843    = Validation score   (accuracy)
    83.78s   = Training   runtime
    0.86s    = Validation runtime
Fitting model: LightGBMLarge_BAG_L1 ... Training model for up to 3280.63s of the 3280.62s of remaining time.
    Fitting 5 child models (S15F1 - S15F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.814    = Validation score   (accuracy)
    21.51s   = Training   runtime
    0.15s    = Validation runtime
Repeating k-fold bagging: 16/20
Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 3279.2s of the 3279.19s of remaining time.
    Fitting 5 child models (S16F1 - S16F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8388   = Validation score   (accuracy)
    16.1s    = Training   runtime
    0.16s    = Validation runtime
Fitting model: LightGBM_BAG_L1 ... Training model for up to 3278.2s of the 3278.19s of remaining time.
    Fitting 5 child models (S16F1 - S16F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8223   = Validation score   (accuracy)
    15.68s   = Training   runtime
    0.15s    = Validation runtime
Fitting model: CatBoost_BAG_L1 ... Training model for up to 3277.08s of the 3277.07s of remaining time.
    Fitting 5 child models (S16F1 - S16F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8347   = Validation score   (accuracy)
    41.33s   = Training   runtime
    0.2s     = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L1 ... Training model for up to 3274.56s of the 3274.55s of remaining time.
    Fitting 5 child models (S16F1 - S16F5) | Fitting with SequentialLocalFoldFittingStrategy
No improvement since epoch 2: early stopping
No improvement since epoch 5: early stopping
No improvement since epoch 1: early stopping
No improvement since epoch 1: early stopping
No improvement since epoch 1: early stopping
    0.8512   = Validation score   (accuracy)
    145.89s  = Training   runtime
    1.26s    = Validation runtime
Fitting model: XGBoost_BAG_L1 ... Training model for up to 3265.93s of the 3265.92s of remaining time.
    Fitting 5 child models (S16F1 - S16F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8017   = Validation score   (accuracy)
    14.16s   = Training   runtime
    0.34s    = Validation runtime
Fitting model: NeuralNetTorch_BAG_L1 ... Training model for up to 3264.97s of the 3264.96s of remaining time.
    Fitting 5 child models (S16F1 - S16F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.843    = Validation score   (accuracy)
    89.41s   = Training   runtime
    0.91s    = Validation runtime
Fitting model: LightGBMLarge_BAG_L1 ... Training model for up to 3259.21s of the 3259.2s of remaining time.
    Fitting 5 child models (S16F1 - S16F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8182   = Validation score   (accuracy)
    22.82s   = Training   runtime
    0.16s    = Validation runtime
Repeating k-fold bagging: 17/20
Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 3257.82s of the 3257.81s of remaining time.
    Fitting 5 child models (S17F1 - S17F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8347   = Validation score   (accuracy)
    17.13s   = Training   runtime
    0.17s    = Validation runtime
Fitting model: LightGBM_BAG_L1 ... Training model for up to 3256.7s of the 3256.69s of remaining time.
    Fitting 5 child models (S17F1 - S17F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8182   = Validation score   (accuracy)
    16.64s   = Training   runtime
    0.16s    = Validation runtime
Fitting model: CatBoost_BAG_L1 ... Training model for up to 3255.64s of the 3255.64s of remaining time.
    Fitting 5 child models (S17F1 - S17F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8306   = Validation score   (accuracy)
    43.9s    = Training   runtime
    0.21s    = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L1 ... Training model for up to 3253.01s of the 3253.0s of remaining time.
    Fitting 5 child models (S17F1 - S17F5) | Fitting with SequentialLocalFoldFittingStrategy
No improvement since epoch 2: early stopping
No improvement since epoch 5: early stopping
No improvement since epoch 8: early stopping
No improvement since epoch 2: early stopping
    0.8554   = Validation score   (accuracy)
    155.31s  = Training   runtime
    1.33s    = Validation runtime
Fitting model: XGBoost_BAG_L1 ... Training model for up to 3243.39s of the 3243.38s of remaining time.
    Fitting 5 child models (S17F1 - S17F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8017   = Validation score   (accuracy)
    15.03s   = Training   runtime
    0.36s    = Validation runtime
Fitting model: NeuralNetTorch_BAG_L1 ... Training model for up to 3242.4s of the 3242.39s of remaining time.
    Fitting 5 child models (S17F1 - S17F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.843    = Validation score   (accuracy)
    94.55s   = Training   runtime
    0.96s    = Validation runtime
Fitting model: LightGBMLarge_BAG_L1 ... Training model for up to 3237.13s of the 3237.12s of remaining time.
    Fitting 5 child models (S17F1 - S17F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8264   = Validation score   (accuracy)
    24.11s   = Training   runtime
    0.17s    = Validation runtime
Repeating k-fold bagging: 18/20
Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 3235.76s of the 3235.75s of remaining time.
    Fitting 5 child models (S18F1 - S18F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8388   = Validation score   (accuracy)
    18.07s   = Training   runtime
    0.18s    = Validation runtime
Fitting model: LightGBM_BAG_L1 ... Training model for up to 3234.74s of the 3234.73s of remaining time.
    Fitting 5 child models (S18F1 - S18F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8182   = Validation score   (accuracy)
    17.57s   = Training   runtime
    0.17s    = Validation runtime
Fitting model: CatBoost_BAG_L1 ... Training model for up to 3233.73s of the 3233.72s of remaining time.
    Fitting 5 child models (S18F1 - S18F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8306   = Validation score   (accuracy)
    46.34s   = Training   runtime
    0.23s    = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L1 ... Training model for up to 3231.21s of the 3231.21s of remaining time.
    Fitting 5 child models (S18F1 - S18F5) | Fitting with SequentialLocalFoldFittingStrategy
No improvement since epoch 7: early stopping
No improvement since epoch 1: early stopping
No improvement since epoch 1: early stopping
No improvement since epoch 5: early stopping
No improvement since epoch 2: early stopping
    0.8512   = Validation score   (accuracy)
    164.31s  = Training   runtime
    1.4s     = Validation runtime
Fitting model: XGBoost_BAG_L1 ... Training model for up to 3222.01s of the 3222.0s of remaining time.
    Fitting 5 child models (S18F1 - S18F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8058   = Validation score   (accuracy)
    15.82s   = Training   runtime
    0.38s    = Validation runtime
Fitting model: NeuralNetTorch_BAG_L1 ... Training model for up to 3221.09s of the 3221.09s of remaining time.
    Fitting 5 child models (S18F1 - S18F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.843    = Validation score   (accuracy)
    100.48s  = Training   runtime
    1.02s    = Validation runtime
Fitting model: LightGBMLarge_BAG_L1 ... Training model for up to 3215.04s of the 3215.03s of remaining time.
    Fitting 5 child models (S18F1 - S18F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8264   = Validation score   (accuracy)
    25.38s   = Training   runtime
    0.18s    = Validation runtime
Repeating k-fold bagging: 19/20
Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 3213.69s of the 3213.68s of remaining time.
    Fitting 5 child models (S19F1 - S19F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8388   = Validation score   (accuracy)
    19.01s   = Training   runtime
    0.18s    = Validation runtime
Fitting model: LightGBM_BAG_L1 ... Training model for up to 3212.68s of the 3212.68s of remaining time.
    Fitting 5 child models (S19F1 - S19F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8223   = Validation score   (accuracy)
    18.55s   = Training   runtime
    0.18s    = Validation runtime
Fitting model: CatBoost_BAG_L1 ... Training model for up to 3211.63s of the 3211.62s of remaining time.
    Fitting 5 child models (S19F1 - S19F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8306   = Validation score   (accuracy)
    48.74s   = Training   runtime
    0.24s    = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L1 ... Training model for up to 3209.16s of the 3209.15s of remaining time.
    Fitting 5 child models (S19F1 - S19F5) | Fitting with SequentialLocalFoldFittingStrategy
No improvement since epoch 3: early stopping
No improvement since epoch 1: early stopping
No improvement since epoch 1: early stopping
    0.8595   = Validation score   (accuracy)
    173.84s  = Training   runtime
    1.46s    = Validation runtime
Fitting model: XGBoost_BAG_L1 ... Training model for up to 3199.44s of the 3199.44s of remaining time.
    Fitting 5 child models (S19F1 - S19F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8058   = Validation score   (accuracy)
    16.66s   = Training   runtime
    0.4s     = Validation runtime
Fitting model: NeuralNetTorch_BAG_L1 ... Training model for up to 3198.48s of the 3198.47s of remaining time.
    Fitting 5 child models (S19F1 - S19F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.843    = Validation score   (accuracy)
    106.37s  = Training   runtime
    1.07s    = Validation runtime
Fitting model: LightGBMLarge_BAG_L1 ... Training model for up to 3192.46s of the 3192.45s of remaining time.
    Fitting 5 child models (S19F1 - S19F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.814    = Validation score   (accuracy)
    26.83s   = Training   runtime
    0.19s    = Validation runtime
Repeating k-fold bagging: 20/20
Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 3190.89s of the 3190.88s of remaining time.
    Fitting 5 child models (S20F1 - S20F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8388   = Validation score   (accuracy)
    19.94s   = Training   runtime
    0.19s    = Validation runtime
Fitting model: LightGBM_BAG_L1 ... Training model for up to 3189.88s of the 3189.88s of remaining time.
    Fitting 5 child models (S20F1 - S20F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8223   = Validation score   (accuracy)
    19.88s   = Training   runtime
    0.19s    = Validation runtime
Fitting model: CatBoost_BAG_L1 ... Training model for up to 3188.45s of the 3188.45s of remaining time.
    Fitting 5 child models (S20F1 - S20F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8264   = Validation score   (accuracy)
    51.37s   = Training   runtime
    0.25s    = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L1 ... Training model for up to 3185.74s of the 3185.73s of remaining time.
    Fitting 5 child models (S20F1 - S20F5) | Fitting with SequentialLocalFoldFittingStrategy
No improvement since epoch 1: early stopping
No improvement since epoch 1: early stopping
No improvement since epoch 1: early stopping
No improvement since epoch 3: early stopping
No improvement since epoch 7: early stopping
    0.8554   = Validation score   (accuracy)
    182.63s  = Training   runtime
    1.54s    = Validation runtime
Fitting model: XGBoost_BAG_L1 ... Training model for up to 3176.75s of the 3176.74s of remaining time.
    Fitting 5 child models (S20F1 - S20F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8058   = Validation score   (accuracy)
    17.5s    = Training   runtime
    0.42s    = Validation runtime
Fitting model: NeuralNetTorch_BAG_L1 ... Training model for up to 3175.78s of the 3175.77s of remaining time.
    Fitting 5 child models (S20F1 - S20F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.8388   = Validation score   (accuracy)
    111.38s  = Training   runtime
    1.13s    = Validation runtime
Fitting model: LightGBMLarge_BAG_L1 ... Training model for up to 3170.64s of the 3170.63s of remaining time.
    Fitting 5 child models (S20F1 - S20F5) | Fitting with SequentialLocalFoldFittingStrategy
    0.814    = Validation score   (accuracy)
    28.13s   = Training   runtime
    0.2s     = Validation runtime
Completed 20/20 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L2 ... Training model for up to 361.87s of the 3169.23s of remaining time.
    0.8678   = Validation score   (accuracy)
    1.01s    = Training   runtime
    0.0s     = Validation runtime
AutoGluon training complete, total runtime = 450.85s ... Best model: "WeightedEnsemble_L2"
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("RESULT_AUTOGLUON2/")

The best model is the same as with the default settings, but the total runtime increased to roughly 450 seconds.
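
The log also notes that the trained predictor was saved to RESULT_AUTOGLUON2/. If you want to reuse it in a later session without retraining, reloading it is a one-liner (a minimal sketch; the directory name is taken straight from the log above):

from autogluon.tabular import TabularPredictor

# Reload the predictor that the training run above saved to disk
predictor = TabularPredictor.load("RESULT_AUTOGLUON2/")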

Checking feature importance
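The code that produced the table below is not shown here, but in AutoGluon this kind of table typically comes from the predictor's feature_importance method (a sketch, assuming the same predictor and test DataFrame used elsewhere in this post):

# Permutation importance computed on the test set; returns a DataFrame with
# importance, stddev, p_value, n, p99_high and p99_low columns
predictor.feature_importance(test)
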
importance  stddev  p_value     n   p99_high    p99_low
ca  0.088430    0.013262    0.000059    5   0.115736    0.061123
thal_7.0    0.066942    0.013517    0.000189    5   0.094774    0.039111
cp_4.0  0.059504    0.011165    0.000142    5   0.082492    0.036516
trestbps    0.044628    0.007949    0.000116    5   0.060994    0.028262
thalach     0.039669    0.013891    0.001543    5   0.068271    0.011068
chol    0.034711    0.004711    0.000040    5   0.044412    0.025010
age     0.030579    0.007507    0.000403    5   0.046035    0.015122
oldpeak     0.021488    0.012534    0.009281    5   0.047295    -0.004319
restecg_2.0     0.017355    0.003457    0.000179    5   0.024474    0.010237
exang   0.016529    0.002922    0.000112    5   0.022545    0.010513
sex     0.013223    0.011466    0.030708    5   0.036833    -0.010386
slope_2.0   0.013223    0.001848    0.000045    5   0.017028    0.009418
fbs     0.009091    0.003457    0.002091    5   0.016209    0.001972
thal_6.0    0.008264    0.004132    0.005528    5   0.016773    -0.000244
cp_2.0  0.006612    0.006267    0.038871    5   0.019515    -0.006292
slope_3.0   0.005785    0.002263    0.002318    5   0.010445    0.001125
cp_3.0  0.003306    0.003457    0.049650    5   0.010424    -0.003813
restecg_1.0     0.000826    0.004527    0.352000    5   0.010147    -0.008494

The top three features are unchanged.

Checking the leaderboard
predictor.leaderboard(test, silent=True)
Out[0]
model   score_test  score_val   pred_time_test  pred_time_val   fit_time    pred_time_test_marginal     pred_time_val_marginal  fit_time_marginal   stack_level     can_infer   fit_order
0   NeuralNetTorch_BAG_L1   0.901639    0.838843    1.313896    1.131090    111.379313  1.313896    1.131090    111.379313  1   True    12
1   NeuralNetFastAI_BAG_L1  0.901639    0.855372    1.844487    1.535011    182.631848  1.844487    1.535011    182.631848  1   True    10
2   RandomForestEntr_BAG_L1     0.885246    0.776860    0.091751    0.120122    0.647892    0.091751    0.120122    0.647892    1   True    6
3   CatBoost_BAG_L1     0.885246    0.826446    0.213271    0.251431    51.373805   0.213271    0.251431    51.373805   1   True    7
4   WeightedEnsemble_L2     0.885246    0.867769    2.078809    1.805338    235.025834  0.003749    0.001581    1.007252    2   True    14
5   RandomForestGini_BAG_L1     0.868852    0.793388    0.088803    0.116070    0.715415    0.088803    0.116070    0.715415    1   True    5
6   ExtraTreesEntr_BAG_L1   0.868852    0.797521    0.097888    0.113942    0.724379    0.097888    0.113942    0.724379    1   True    9
7   LightGBM_BAG_L1     0.868852    0.822314    0.258172    0.193620    19.879937   0.258172    0.193620    19.879937   1   True    4
8   LightGBMXT_BAG_L1   0.868852    0.838843    0.334872    0.192702    19.937207   0.334872    0.192702    19.937207   1   True    3
9   XGBoost_BAG_L1  0.868852    0.805785    1.129513    0.416742    17.504010   1.129513    0.416742    17.504010   1   True    11
10  LightGBMLarge_BAG_L1    0.852459    0.814050    0.316309    0.197132    28.125010   0.316309    0.197132    28.125010   1   True    13
11  ExtraTreesGini_BAG_L1   0.836066    0.785124    0.079650    0.155725    0.693128    0.079650    0.155725    0.693128    1   True    8
12  KNeighborsDist_BAG_L1   0.672131    0.636364    0.017302    0.017315    0.012930    0.017302    0.017315    0.012930    1   True    2
13  KNeighborsUnif_BAG_L1   0.655738    0.648760    0.027467    0.019590    0.016064    0.027467    0.019590    0.016064    1   True    1

Almost all of the models carry the BAG_L1 suffix, meaning they are level-1 bagged models.
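
Because leaderboard() returns a plain pandas DataFrame, you can also slice it by stacking level, for example to separate the bagged level-1 models from the weighted ensemble (a small sketch reusing the call above):

# Keep only the level-1 bagged models and show the key columns
lb = predictor.leaderboard(test, silent=True)
print(lb[lb["stack_level"] == 1][["model", "score_test", "score_val"]])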

Checking accuracy
# Check the fit on the training and test data
from sklearn.metrics import accuracy_score
y_train_pred = predictor.predict(train)
y_test_pred = predictor.predict(test)
print("train",accuracy_score(y_train, y_train_pred))
print("test",accuracy_score(y_test, y_test_pred))
Out[0]
train 0.9462809917355371
test 0.8852459016393442

Accuracy has improved, but judging from the gap between the train and test scores, the model may be overfitting.
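
To look a little deeper than plain accuracy when judging overfitting, scikit-learn's classification report and confusion matrix are handy (a sketch, assuming the same y_test and y_test_pred variables as above):

from sklearn.metrics import classification_report, confusion_matrix

# Per-class precision/recall/F1 and the raw confusion matrix on the test set
print(classification_report(y_test, y_test_pred))
print(confusion_matrix(y_test, y_test_pred))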


Modeling with auto-sklearn

Last up is auto-sklearn.

auto-sklearn depends on a somewhat older version of scikit-learn.
Because of that, the code that prepares the modeling data needs the following changes (a combined sketch follows the list).

  • handle_unknown='ignore' → handle_unknown='error'

OneHotEnc = OneHotEncoder(categories='auto',drop='first',handle_unknown='error')

  • get_feature_names_out() → get_feature_names()
    dummy_cols = OneHotEnc.get_feature_names()
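
Putting the two changes together, the adjusted one-hot encoding step might look roughly like the sketch below. Note that the list of categorical columns (cat_cols) and the surrounding data handling are assumptions based on the dummy feature names in the importance table above, not the exact code from the data-preparation post:

import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Assumed categorical columns, inferred from names like cp_4.0 and thal_7.0 above
cat_cols = ["cp", "restecg", "slope", "thal"]

# handle_unknown changed from 'ignore' to 'error' for the older scikit-learn
# that auto-sklearn depends on
OneHotEnc = OneHotEncoder(categories='auto', drop='first', handle_unknown='error')
dummies = OneHotEnc.fit_transform(X_train[cat_cols]).toarray()

# get_feature_names() instead of get_feature_names_out(); passing cat_cols
# keeps readable names such as "cp_2.0"
dummy_cols = OneHotEnc.get_feature_names(cat_cols)
X_train_dummies = pd.DataFrame(dummies, columns=dummy_cols, index=X_train.index)
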
Model definition and training
import autosklearn.classification

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=3619,  # total time budget for the whole search (seconds)
    per_run_time_limit=600,        # time limit for each individual model fit (seconds)
    tmp_folder="autosklearn_classification_cleveland",  # folder for logs and intermediate output
)
automl.fit(X_train, y_train, dataset_name="cleveland")
Out[0]
AutoSklearnClassifier(ensemble_class=<class 'autosklearn.ensembles.ensemble_selection.EnsembleSelection'>,
                      per_run_time_limit=600, time_left_for_this_task=3619,
                      tmp_folder='autosklearn_classification_cleveland')

Training progress is written out as logs in the folder specified with tmp_folder.
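
Besides reading those log files, auto-sklearn can also summarize the finished run directly. A small sketch (sprint_statistics() has been around for a long time; leaderboard() only exists in newer auto-sklearn releases, so treat its availability as an assumption about the installed version):

# Short text summary: number of models tried, best validation score, timeouts, etc.
print(automl.sprint_statistics())

# Table of the models kept in the final ensemble (newer auto-sklearn versions)
print(automl.leaderboard())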

Checking accuracy
# Check the fit on the training and test data
from sklearn.metrics import accuracy_score
y_train_pred = automl.predict(X_train)
y_test_pred = automl.predict(X_test)
print("train",accuracy_score(y_train, y_train_pred))
print("test",accuracy_score(y_test, y_test_pred))
Out[0]
train 0.8801652892561983
test 0.8688524590163934

The accuracy feels a bit better than what AutoGluon produced with its default settings.


Summary

This time I tried out three AutoML libraries.

mljar's algorithms seemed to suit this dataset best and gave the highest accuracy (though it does take a while).

If interpretability of the results is the priority, I personally think modeling with a decision tree or logistic regression is the better choice, but when accuracy matters most, AutoML may well be the way to go.
