PLSA.utils package

Submodules

PLSA.utils.cutoff module

Module for determining cutoffs by common methods

This module provides functions that determine a classification cutoff by different common methods.

PLSA.utils.cutoff.accuracy(y_true, y_prob)

Cutoff that maximizes accuracy.

Parameters:

y_true (np.array or pandas.Series) – True value.
y_prob (np.array or pandas.Series) – Predicted value.

Returns:	Optimal cutoff and the corresponding maximum accuracy.
Return type:	tuple(float, float)

Examples

>>> accuracy(y_true, y_prob)

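As a rough illustration of what an accuracy-maximizing cutoff search involves, the sketch below tries every distinct predicted value as a cutoff and keeps the one with the highest accuracy. The helper name accuracy_cutoff, the ">=" decision rule, and the tie-breaking are assumptions made for this example; it is not the PLSA implementation.

```python
def accuracy_cutoff(y_true, y_prob):
    """Return (best_cutoff, best_accuracy) over the observed scores.

    Hypothetical helper for illustration only; not part of PLSA.
    A sample is predicted positive when its score is >= the cutoff.
    """
    y_true = list(y_true)
    y_prob = list(y_prob)
    best_cutoff, best_acc = None, float("-inf")
    # Only the distinct observed scores need to be tried as cutoffs.
    for cutoff in sorted(set(y_prob)):
        correct = sum((p >= cutoff) == bool(t) for t, p in zip(y_true, y_prob))
        acc = correct / len(y_true)
        if acc > best_acc:  # strict '>' keeps the lowest cutoff on ties
            best_cutoff, best_acc = cutoff, acc
    return best_cutoff, best_acc


y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
cutoff, acc = accuracy_cutoff(y_true, y_prob)
```

The sketch accepts any sequence of 0/1 labels and scores, so NumPy arrays and pandas Series work as inputs too.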
PLSA.utils.cutoff.youden(target, predicted)

Cutoff that maximizes the Youden index.

Parameters:

target (np.array or pandas.Series) – True value.
predicted (np.array or pandas.Series) – Predicted value.

Returns:	Optimal cutoff and the corresponding maximum Youden index.
Return type:	tuple(float, float)

Examples

>>> youden(y_true, y_prob)

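The Youden index is J = sensitivity + specificity - 1. A minimal sketch of a cutoff search that maximizes J is shown below. The helper name youden_cutoff and the ">=" decision rule are assumptions for illustration, and both classes must be present in the labels; PLSA may compute this differently.

```python
def youden_cutoff(y_true, y_prob):
    """Return (best_cutoff, best_j) where J = sensitivity + specificity - 1.

    Hypothetical helper for illustration only; not part of PLSA.
    """
    y_true = list(y_true)
    y_prob = list(y_prob)
    n_pos = sum(1 for t in y_true if t)
    n_neg = len(y_true) - n_pos
    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(y_prob)):
        # True positives: positives scored at or above the cutoff.
        tp = sum(1 for t, p in zip(y_true, y_prob) if t and p >= cutoff)
        # True negatives: negatives scored below the cutoff.
        tn = sum(1 for t, p in zip(y_true, y_prob) if not t and p < cutoff)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:  # strict '>' keeps the lowest cutoff on ties
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
cutoff, j = youden_cutoff(y_true, y_prob)
```

Unlike accuracy, J weights sensitivity and specificity equally regardless of class balance, which is why the two criteria can select different cutoffs on imbalanced data.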
PLSA.utils.metrics module

Module for evaluating a model with various metrics

This module provides functions that evaluate a model with various metrics.

PLSA.utils.metrics.calibration(y_true, pred_proba, n_bins=10, in_sample=False)

Calibration and calibration test of a predictive model.

Parameters:

y_true (np.array or pandas.Series) – True label.
pred_proba (np.array or pandas.Series) – Predicted probability.
n_bins (int) – Number of groups.
in_sample (bool, default False) – Whether the calibration test is performed in sample.

Returns:	Table of calibration.
Return type:	pandas.DataFrame

Examples

>>> calibration(y_test, y_pred, n_bins=5)

PLSA.utils.metrics.calibration_table(y_true, y_prob, normalize=False, n_bins=10)

Calibration table of a predictive model.

Parameters:

y_true (np.array or pandas.Series) – True label.
y_prob (np.array or pandas.Series) – Predicted probability.
n_bins (int) – Number of groups.

Returns:	True, sum, and total numbers for each group.
Return type:	tuple(numpy.array)

Examples

>>> calibration_table(y_test, y_pred, n_bins=5)

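To make the three per-group quantities concrete, the sketch below sorts the samples by predicted probability, splits them into n_bins groups of near-equal size, and reports the observed positives, the summed predictions, and the group size. The equal-size grouping and the helper name calibration_bins are assumptions; PLSA may bin differently (for example by equal-width probability intervals).

```python
def calibration_bins(y_true, y_prob, n_bins=10):
    """Return (bins_true, bins_pred, bins_tot) for equal-size groups.

    Hypothetical helper for illustration only; not part of PLSA.
    """
    # Order the samples by predicted probability.
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    bins_true, bins_pred, bins_tot = [], [], []
    for b in range(n_bins):
        # Integer arithmetic spreads the samples as evenly as possible.
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        bins_true.append(sum(t for _, t in chunk))  # observed positives
        bins_pred.append(sum(p for p, _ in chunk))  # expected positives
        bins_tot.append(len(chunk))                 # group size
    return bins_true, bins_pred, bins_tot


y_true = [0, 0, 1, 1, 0, 1]
y_prob = [0.1, 0.2, 0.7, 0.9, 0.4, 0.6]
bins_true, bins_pred, bins_tot = calibration_bins(y_true, y_prob, n_bins=3)
```

In a well-calibrated model the observed and expected counts are close in every group.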
PLSA.utils.metrics.discrimination(y_true, y_pred_proba, threshold=None, name='Model X')

Discrimination of a classification model.

Parameters:

y_true (np.array or pandas.Series) – True label.
y_pred_proba (np.array or pandas.Series) – Predicted probability.
threshold (float) – Cutoff value.
name (str) – Title for printing.

Returns:

Dict of metrics.

{"points": threshold, "Sen": Re, "Spe": Spe, "Acc": Accuracy, "F1": F1}

Return type:

dict

Examples

>>> discrimination(y_true, y_pred_proba, threshold=0.21)

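The metrics in the returned dict all derive from the confusion matrix at the chosen threshold. A minimal sketch, assuming a sample is predicted positive when its probability is >= threshold (the helper name discrimination_sketch is hypothetical and this is not the PLSA code):

```python
def discrimination_sketch(y_true, y_prob, threshold):
    """Sensitivity, specificity, accuracy and F1 at a fixed threshold.

    Hypothetical helper for illustration only; not part of PLSA.
    """
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_prob):
        predicted_positive = p >= threshold
        if predicted_positive and t:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    return {
        "points": threshold,
        "Sen": tp / (tp + fn),                   # true positive rate (recall)
        "Spe": tn / (tn + fp),                   # true negative rate
        "Acc": (tp + tn) / (tp + fp + tn + fn),  # overall accuracy
        "F1": 2 * tp / (2 * tp + fp + fn),       # harmonic mean of precision and recall
    }


result = discrimination_sketch([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8], threshold=0.5)
```

The sketch assumes both classes are present; otherwise the sensitivity or specificity denominator is zero.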
PLSA.utils.metrics.discrimination_ver(y_true, y_pred_proba, threshold=None, name='Model X')

Discrimination of a classification model, version 2.

Parameters:

y_true (np.array or pandas.Series) – True label.
y_pred_proba (np.array or pandas.Series) – Predicted probability.
threshold (float) – Cutoff value.
name (str) – Title for printing.

Returns:

Dict of metrics.

{"points": threshold, "Sen": Sen, "Spe": Spe, "PPV": ppv, "NPV": npv}

Return type:

dict

Examples

>>> discrimination_ver(y_true, y_pred_proba, threshold=0.21)

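The two metrics specific to this version, PPV and NPV, condition on the prediction rather than on the true label. The sketch below shows one way to compute them, under the assumption that a probability >= threshold counts as a positive prediction; predictive_values is a hypothetical name, not a PLSA function.

```python
def predictive_values(y_true, y_prob, threshold):
    """Return (PPV, NPV) at a fixed threshold.

    Hypothetical helper for illustration only; not part of PLSA.
    """
    predicted = [p >= threshold for p in y_prob]
    tp = sum(1 for t, pred in zip(y_true, predicted) if pred and t)
    fp = sum(1 for t, pred in zip(y_true, predicted) if pred and not t)
    tn = sum(1 for t, pred in zip(y_true, predicted) if not pred and not t)
    fn = sum(1 for t, pred in zip(y_true, predicted) if not pred and t)
    # PPV: share of predicted positives that are truly positive.
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    # NPV: share of predicted negatives that are truly negative.
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return ppv, npv


ppv, npv = predictive_values([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8], threshold=0.5)
```

Both values depend on the prevalence of the positive class in the evaluated data, so they do not transfer directly to populations with a different prevalence.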
PLSA.utils.test module

Module for statistical tests

This module provides functions for statistical tests.

PLSA.utils.test.Delong_Test(y_true, pred_a, pred_b)

DeLong test for comparing two predictive models.

Parameters:

y_true (numpy.array or pandas.Series) – True label.
pred_a (numpy.array or pandas.Series) – Prediction of model A.
pred_b (numpy.array or pandas.Series) – Prediction of model B.

Returns:	chi2 value and P-value.
Return type:	tuple

Examples

>>> # pred_proba1 = xgb1.predict_proba(test_X)
>>> # pred_proba2 = xgb2.predict_proba(test_X)
>>> Delong_Test(test_y, pred_proba1[:, 1], pred_proba2[:, 1])

PLSA.utils.test.Hosmer_Lemeshow_Test(bins_true, bins_pred, bins_tot, n_bins=10, in_sample=False)

Hosmer-Lemeshow test of calibration.

Parameters:

bins_true (numpy.array) – True number of people in each group.
bins_pred (numpy.array) – Predicted number of people in each group.
bins_tot (numpy.array) – Total number of people in each group.
n_bins (int) – Number of groups.
in_sample (bool, default False) – Whether the calibration test is performed in sample.

Returns:	chi2 value and P-value.
Return type:	tuple

Examples

>>> Hosmer_Lemeshow_Test(bins_true, bins_pred, bins_tot, n_bins=5)

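The Hosmer-Lemeshow statistic sums, over the groups, the squared gap between observed and expected positives, scaled by the binomial variance of the group. The sketch below computes the statistic and its degrees of freedom. It assumes the common convention of n_bins - 2 degrees of freedom in sample and n_bins out of sample; whether PLSA follows that convention is not stated here. The helper name is hypothetical.

```python
def hosmer_lemeshow_statistic(bins_true, bins_pred, bins_tot, n_bins=10, in_sample=False):
    """Return (chi2, degrees_of_freedom); p-value lookup is left to the caller.

    Hypothetical helper for illustration only; not part of PLSA.
    """
    chi2 = 0.0
    for observed, expected, total in zip(bins_true, bins_pred, bins_tot):
        # (O - E)^2 / (E * (1 - E / N)) for each group.
        chi2 += (observed - expected) ** 2 / (expected * (1 - expected / total))
    # Assumed convention: two degrees of freedom are lost when the test is
    # run on the same data the model was fitted on.
    dof = n_bins - 2 if in_sample else n_bins
    return chi2, dof


chi2, dof = hosmer_lemeshow_statistic(
    [1, 3, 5], [2.0, 2.0, 5.0], [10, 10, 10], n_bins=3
)
```

The P-value is the upper-tail probability of a chi-square distribution with dof degrees of freedom, which can be obtained for example with scipy.stats.chi2.sf(chi2, dof).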
PLSA.utils.test.VIF_Test(data, cols=None)

Variance inflation factor (VIF) for each variable.

Parameters:

data (pandas.DataFrame) – Targeted data.
cols (list(str), default None) – Columns for which to calculate the VIF.

Returns:	VIF for each variable included in cols.
Return type:	pandas.Series

Examples

>>> VIF_Test(data[x_cols])

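The VIF of a variable is 1 / (1 - R^2), where R^2 comes from regressing that variable on all the others. With exactly two predictors, R^2 reduces to the squared Pearson correlation between them, which makes for a small worked sketch. The helper names are hypothetical and the two-column restriction is a simplification for illustration; it does not reproduce VIF_Test for general DataFrames.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def vif_two_columns(x1, x2):
    """VIF shared by both predictors in the two-predictor special case."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


vif = vif_two_columns([1, 2, 3, 4], [1, 3, 2, 4])
```

A VIF of 1 means the variable is uncorrelated with the others; larger values indicate stronger collinearity.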
PLSA.utils.write module

Module for outputting results

This module provides functions for outputting results.

PLSA.utils.write.xgboost_to_pmml(data_X, data_y, par_file, save_model_as)

Save an XGBoost model to a PMML file.

Parameters:

data_X (pandas.DataFrame) – Variables of the training data.
data_y (pandas.DataFrame) – Labels of the training data.
par_file (str) – File path of the model's parameters.
save_model_as (str) – File path of the PMML file.

Returns:	None. The PMML file is written locally to the path given by save_model_as.
Return type:	None

Examples

>>> xgboost_to_pmml(data_X, data_y, "par.json", "model.pmml")
