PLSA.utils package

Submodules

PLSA.utils.cutoff module

Module for determining cutoffs with common methods

This module provides functions for determining classification cutoffs with several common methods.

PLSA.utils.cutoff.accuracy(y_true, y_prob)

Find the cutoff that maximizes accuracy.

Parameters:
  • y_true (np.array or pandas.Series) – True label.
  • y_prob (np.array or pandas.Series) – Predicted probability.
Returns:

Optimal cutoff and the maximum accuracy.

Return type:

tuple(float, float)

Examples

>>> accuracy(y_true, y_prob)
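The selection rule can be sketched in plain Python. This is an illustration only, not the PLSA implementation: best_accuracy_cutoff is a hypothetical helper, and it assumes binary 0/1 labels, candidate cutoffs taken from the observed scores, and a positive prediction when the score is at or above the cutoff.

```python
def best_accuracy_cutoff(y_true, y_prob):
    """Return (cutoff, accuracy) for the candidate cutoff with the highest accuracy.

    Hypothetical helper, not part of PLSA. Assumes 0/1 labels and that a
    sample is predicted positive when its score is >= the cutoff.
    """
    best_cutoff, best_acc = None, float("-inf")
    # Every distinct score is tried as a candidate cutoff, in ascending order.
    for cutoff in sorted(set(y_prob)):
        correct = sum((p >= cutoff) == bool(t) for t, p in zip(y_true, y_prob))
        acc = correct / len(y_true)
        # Strict comparison: on ties the smallest cutoff is kept.
        if acc > best_acc:
            best_cutoff, best_acc = cutoff, acc
    return best_cutoff, best_acc


y_true = [0, 0, 1, 0, 1, 1]
y_prob = [0.10, 0.30, 0.35, 0.60, 0.70, 0.90]
print(best_accuracy_cutoff(y_true, y_prob))
```

The returned pair has the same shape as the documented tuple(float, float): the cutoff first, then the metric value reached at that cutoff.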
PLSA.utils.cutoff.youden(target, predicted)

Find the cutoff that maximizes the Youden index.

Parameters:
  • target (np.array or pandas.Series) – True label.
  • predicted (np.array or pandas.Series) – Predicted probability.
Returns:

Optimal cutoff and the maximum Youden index.

Return type:

tuple(float, float)

Examples

>>> youden(y_true, y_prob)
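The Youden index is J = sensitivity + specificity - 1. A minimal sketch of choosing the cutoff that maximizes J follows. It is not the PLSA implementation: best_youden_cutoff is a hypothetical helper, and it assumes 0/1 labels, at least one sample of each class, and a positive prediction when the score is at or above the cutoff.

```python
def best_youden_cutoff(target, predicted):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    Hypothetical helper, not part of PLSA. Assumes 0/1 labels with both
    classes present; a score >= cutoff counts as a positive prediction.
    """
    pos = [p for t, p in zip(target, predicted) if t == 1]
    neg = [p for t, p in zip(target, predicted) if t == 0]
    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(predicted)):
        sensitivity = sum(p >= cutoff for p in pos) / len(pos)
        specificity = sum(p < cutoff for p in neg) / len(neg)
        j = sensitivity + specificity - 1.0
        # Strict comparison: on ties the smallest cutoff is kept.
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


target = [0, 0, 1, 0, 1, 1]
predicted = [0.10, 0.30, 0.35, 0.60, 0.70, 0.90]
print(best_youden_cutoff(target, predicted))
```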

PLSA.utils.metrics module

Module for evaluating models with a variety of metrics

This module provides functions for evaluating models with a variety of metrics.

PLSA.utils.metrics.calibration(y_true, pred_proba, n_bins=10, in_sample=False)

Calibration and calibration test of a predictive model.

Parameters:
  • y_true (np.array or pandas.Series) – True label.
  • pred_proba (np.array or pandas.Series) – Predicted probability.
  • n_bins (int) – Number of groups.
  • in_sample (bool, default False) – Whether the calibration test is run in-sample.
Returns:

Calibration table.

Return type:

pandas.DataFrame

Examples

>>> calibration(y_test, y_pred, n_bins=5)
PLSA.utils.metrics.calibration_table(y_true, y_prob, normalize=False, n_bins=10)

Calibration table of a predictive model.

Parameters:
  • y_true (np.array or pandas.Series) – True label.
  • y_prob (np.array or pandas.Series) – Predicted probability.
  • n_bins (int) – Number of groups.
Returns:

True count, sum, and total count for each group.

Return type:

tuple(numpy.array)

Examples

>>> calibration_table(y_test, y_pred, n_bins=5)
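One way to picture the per-group counts is the sketch below. It rests on assumptions that this page does not confirm: equal-width bins on [0, 1], "sum" read as the sum of predicted probabilities in a group, and empty groups dropped. binned_counts is a hypothetical helper, not the library function.

```python
def binned_counts(y_true, y_prob, n_bins=10):
    """Group predictions into equal-width bins on [0, 1].

    Returns three lists: observed positives, summed probabilities, and
    group size, with empty bins dropped. Hypothetical helper; the
    equal-width binning rule is an assumption, not taken from PLSA.
    """
    bins_true = [0] * n_bins
    bins_sum = [0.0] * n_bins
    bins_tot = [0] * n_bins
    for t, p in zip(y_true, y_prob):
        # A probability of exactly 1.0 is placed in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins_true[idx] += int(t)
        bins_sum[idx] += p
        bins_tot[idx] += 1
    keep = [i for i in range(n_bins) if bins_tot[i] > 0]
    return ([bins_true[i] for i in keep],
            [bins_sum[i] for i in keep],
            [bins_tot[i] for i in keep])


y_true = [0, 0, 1, 0, 1, 1]
y_prob = [0.05, 0.15, 0.45, 0.55, 0.85, 0.95]
print(binned_counts(y_true, y_prob, n_bins=2))
```

In a well-calibrated model the observed positives and the summed probabilities are close in every group.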
PLSA.utils.metrics.discrimination(y_true, y_pred_proba, threshold=None, name='Model X')

Discrimination of a classification model.

Parameters:
  • y_true (np.array or pandas.Series) – True label.
  • y_pred_proba (np.array or pandas.Series) – Predicted probability.
  • threshold (float) – Cutoff value.
  • name (str) – Title for printing.
Returns:

Dict of metrics:

{"points": threshold, "Sen": Re, "Spe": Spe, "Acc": Accuracy, "F1": F1}

Return type:

dict

Examples

>>> discrimination(y_true, y_pred_proba, threshold=0.21)
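For reference, the listed metrics follow from the confusion matrix at the chosen threshold. The sketch below uses the standard definitions; it is not the PLSA code. metrics_at_threshold is a hypothetical helper, and the ">=" comparison and the 0.0 fallback for empty denominators are assumptions.

```python
def metrics_at_threshold(y_true, y_pred_proba, threshold):
    """Sensitivity, specificity, accuracy and F1 at a given cutoff.

    Hypothetical helper, not part of PLSA. Assumes 0/1 labels and that a
    probability >= threshold is a positive prediction.
    """
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred_proba):
        positive = p >= threshold
        if t == 1:
            tp, fn = tp + positive, fn + (not positive)
        else:
            fp, tn = fp + positive, tn + (not positive)

    def ratio(num, den):
        # Assumption: report 0.0 when a denominator is empty.
        return num / den if den else 0.0

    sen = ratio(tp, tp + fn)            # sensitivity (recall)
    spe = ratio(tn, tn + fp)            # specificity
    acc = ratio(tp + tn, tp + fp + tn + fn)
    precision = ratio(tp, tp + fp)
    f1 = ratio(2 * precision * sen, precision + sen)
    return {"points": threshold, "Sen": sen, "Spe": spe, "Acc": acc, "F1": f1}


y_true = [0, 0, 1, 0, 1, 1]
y_pred_proba = [0.10, 0.30, 0.35, 0.60, 0.70, 0.90]
print(metrics_at_threshold(y_true, y_pred_proba, threshold=0.5))
```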
PLSA.utils.metrics.discrimination_ver(y_true, y_pred_proba, threshold=None, name='Model X')

Discrimination of a classification model (version 2).

Parameters:
  • y_true (np.array or pandas.Series) – True label.
  • y_pred_proba (np.array or pandas.Series) – Predicted probability.
  • threshold (float) – Cutoff value.
  • name (str) – Title for printing.
Returns:

Dict of metrics:

{"points": threshold, "Sen": Sen, "Spe": Spe, "PPV": ppv, "NPV": npv}

Return type:

dict

Examples

>>> discrimination_ver(y_true, y_pred_proba, threshold=0.21)
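PPV and NPV condition on the prediction rather than on the true label: PPV = TP / (TP + FP) and NPV = TN / (TN + FN). A short sketch under the same assumptions as before (0/1 labels, positive when the probability is at or above the threshold); predictive_values is a hypothetical helper, not the library function.

```python
def predictive_values(y_true, y_pred_proba, threshold):
    """Return (PPV, NPV) at a given cutoff.

    Hypothetical helper, not part of PLSA. Returns 0.0 for a value whose
    denominator is empty (an assumption made for this sketch).
    """
    predicted_pos = [t for t, p in zip(y_true, y_pred_proba) if p >= threshold]
    predicted_neg = [t for t, p in zip(y_true, y_pred_proba) if p < threshold]
    # PPV: share of predicted positives that are truly positive.
    ppv = sum(t == 1 for t in predicted_pos) / len(predicted_pos) if predicted_pos else 0.0
    # NPV: share of predicted negatives that are truly negative.
    npv = sum(t == 0 for t in predicted_neg) / len(predicted_neg) if predicted_neg else 0.0
    return ppv, npv


y_true = [0, 0, 1, 0, 1, 1]
y_pred_proba = [0.10, 0.30, 0.35, 0.60, 0.70, 0.90]
print(predictive_values(y_true, y_pred_proba, threshold=0.35))
```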

PLSA.utils.test module

Module for statistical tests

This module provides functions for statistical tests.

PLSA.utils.test.Delong_Test(y_true, pred_a, pred_b)

DeLong test for comparing two predictive models.

Parameters:
  • y_true (numpy.array or pandas.Series) – True label.
  • pred_a (numpy.array or pandas.Series) – Prediction of model A.
  • pred_b (numpy.array or pandas.Series) – Prediction of model B.
Returns:

Chi-square value and p-value.

Return type:

tuple

Examples

>>> # pred_proba1 = xgb1.predict_proba(test_X)
>>> # pred_proba2 = xgb2.predict_proba(test_X)
>>> Delong_Test(test_y, pred_proba1[:, 1], pred_proba2[:, 1])
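The statistic behind the DeLong test can be sketched with the standard library alone. This follows the textbook formulation for two correlated AUCs and is not the PLSA code: delong_test is a hypothetical name, labels are assumed to be 0/1, and the chi-square value is assumed to be the squared z statistic with one degree of freedom.

```python
import math


def delong_test(y_true, pred_a, pred_b):
    """DeLong test for two correlated AUCs; returns (chi2, p_value).

    Hypothetical sketch, not part of PLSA. Raises ZeroDivisionError when the
    estimated variance of the AUC difference is zero (e.g. identical models).
    """
    def psi(x, y):
        # Mann-Whitney kernel: 1 if the positive outranks the negative, 0.5 on ties.
        return 1.0 if x > y else 0.5 if x == y else 0.0

    def components(pred):
        pos = [p for t, p in zip(y_true, pred) if t == 1]
        neg = [p for t, p in zip(y_true, pred) if t == 0]
        v10 = [sum(psi(x, y) for y in neg) / len(neg) for x in pos]
        v01 = [sum(psi(x, y) for x in pos) / len(pos) for y in neg]
        return v10, v01

    def cov(u, v):
        mu, mv = sum(u) / len(u), sum(v) / len(v)
        return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

    v10_a, v01_a = components(pred_a)
    v10_b, v01_b = components(pred_b)
    m, n = len(v10_a), len(v01_a)          # number of positives, negatives
    auc_a, auc_b = sum(v10_a) / m, sum(v10_b) / m

    def s(v10_r, v01_r, v10_s, v01_s):
        # Entry of the covariance matrix S = S10 / m + S01 / n.
        return cov(v10_r, v10_s) / m + cov(v01_r, v01_s) / n

    var_diff = (s(v10_a, v01_a, v10_a, v01_a) + s(v10_b, v01_b, v10_b, v01_b)
                - 2.0 * s(v10_a, v01_a, v10_b, v01_b))
    z = (auc_a - auc_b) / math.sqrt(var_diff)
    chi2 = z * z
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square upper tail, 1 d.o.f.
    return chi2, p_value


y = [0, 0, 0, 1, 1, 1]
print(delong_test(y, [0.1, 0.2, 0.3, 0.4, 0.5, 0.6], [0.1, 0.4, 0.35, 0.3, 0.5, 0.6]))
```

The pairwise loops are quadratic in the sample size, which is fine for illustration but slow on large test sets.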
PLSA.utils.test.Hosmer_Lemeshow_Test(bins_true, bins_pred, bins_tot, n_bins=10, in_sample=False)

Hosmer-Lemeshow Test for testing calibration.

Parameters:
  • bins_true (numpy.array) – True number of people in each group.
  • bins_pred (numpy.array) – Predicted number of people in each group.
  • bins_tot (numpy.array) – Total number of people in each group.
  • n_bins (int) – Number of groups.
  • in_sample (bool, default False) – Whether the calibration test is run in-sample.
Returns:

Chi-square value and p-value.

Return type:

tuple

Examples

>>> Hosmer_Lemeshow_Test(bins_true, bins_pred, bins_tot, n_bins=5)
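The Hosmer-Lemeshow statistic sums (O - E)^2 / (E (1 - E / N)) over the groups, where O, E and N are the observed, predicted and total counts of a group. The sketch below uses the textbook form and only the standard library. hosmer_lemeshow and chi2_sf are hypothetical helpers, and the degrees of freedom (n_bins - 2 in-sample, n_bins otherwise) follow a common convention that this page does not confirm for PLSA.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability of a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Start from Q(1/2, y) = erfc(sqrt(y)) or Q(1, y) = exp(-y), then apply
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / gamma(a + 1) up to a = df / 2.
    if df % 2:
        a, q = 0.5, math.erfc(math.sqrt(half))
    else:
        a, q = 1.0, math.exp(-half)
    while a < df / 2.0:
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q


def hosmer_lemeshow(bins_true, bins_pred, bins_tot, n_bins=10, in_sample=False):
    """Return (chi2, p_value). Hypothetical sketch, not part of PLSA."""
    stat = 0.0
    for obs, exp, tot in zip(bins_true, bins_pred, bins_tot):
        stat += (obs - exp) ** 2 / (exp * (1.0 - exp / tot))
    # Assumed convention for the degrees of freedom.
    df = n_bins - 2 if in_sample else n_bins
    return stat, chi2_sf(stat, df)


print(hosmer_lemeshow([1, 2], [0.65, 2.35], [3, 3], n_bins=2))
```

A large p-value means the test finds no evidence of miscalibration; it does not by itself show that the model is well calibrated.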
PLSA.utils.test.VIF_Test(data, cols=None)

Variance Inflation Factors for each variable.

Parameters:
  • data (pandas.DataFrame) – Input data.
  • cols (list(str), default None) – Given columns to calculate VIF.
Returns:

VIF for each variable included in cols.

Return type:

pandas.Series

Examples

>>> VIF_Test(data[x_cols])
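By the textbook definition, the VIF of variable j is 1 / (1 - R_j^2), where R_j^2 comes from regressing variable j on the other variables with an intercept. This equals the j-th diagonal entry of the inverse correlation matrix, which the sketch below computes in plain Python. vif is a hypothetical helper; the library's implementation may differ, for example in how it handles the intercept.

```python
import math


def vif(columns):
    """Map each column name to its variance inflation factor.

    `columns` is a dict of name -> list of numbers. Hypothetical sketch, not
    part of PLSA; constant or perfectly collinear columns are not handled.
    """
    names = list(columns)
    k = len(names)
    # Centre each column and scale it to unit length, so dot products are correlations.
    scaled = []
    for name in names:
        col = columns[name]
        mean = sum(col) / len(col)
        dev = [v - mean for v in col]
        norm = math.sqrt(sum(d * d for d in dev))
        scaled.append([d / norm for d in dev])
    corr = [[sum(a * b for a, b in zip(scaled[i], scaled[j])) for j in range(k)]
            for i in range(k)]
    # Invert the correlation matrix by Gauss-Jordan elimination on [corr | I].
    aug = [row + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(corr)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        pivot_value = aug[c][c]
        aug[c] = [v / pivot_value for v in aug[c]]
        for r in range(k):
            if r != c:
                factor = aug[r][c]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[c])]
    # The right half of the augmented matrix now holds the inverse.
    return {name: aug[i][k + i] for i, name in enumerate(names)}


print(vif({"x1": [1, 2, 3, 4, 5], "x2": [2, 1, 4, 3, 5]}))
```

A VIF of 1 means a variable is uncorrelated with the others; larger values indicate stronger collinearity.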

PLSA.utils.write module

Module for writing results

This module provides functions for writing results to files.

PLSA.utils.write.xgboost_to_pmml(data_X, data_y, par_file, save_model_as)

Save an XGBoost model to a PMML file.

Parameters:
  • data_X (pandas.DataFrame) – Variables of the training data.
  • data_y (pandas.DataFrame) – Labels of the training data.
  • par_file (str) – File path of the model's parameters.
  • save_model_as (str) – File path of the PMML file.
Returns:

Nothing is returned; a PMML file is written locally to the path given by save_model_as.

Return type:

None

Examples

>>> xgboost_to_pmml(data_x, data_y, "par.json", "model.pmml")

Module contents