WhiteBox Part 2: Interpretable Machine Learning
WhiteBox-Part2
The WhiteBox Project introduces a variety of methods for opening up the black box of machine learning. This project is based on Interpretable Machine Learning by Christoph Molnar [1]. I recommend reading the book first and then working through this project. If you are an R user, you can find the R code used in the examples here.
A Korean translation is available here; the translation was done in consultation with the author.
If you find a mistranslation, please email wogur379@gmail.com or open an issue. Thank you.
Purpose
The goal is to analyze various datasets with black-box models and to build a pipeline of analysis reports using interpretable methods.
Requirements
numpy == 1.17.3
scikit-learn == 0.21.2
xgboost == 0.90
tensorflow == 1.14.0
Dataset
- Titanic: Machine Learning from Disaster (Classification) [2]
- Cervical Cancer (Classification) [3]
- House Prices: Advanced Regression Techniques (Regression) [4]
- Bike Sharing (Regression) [5]
- Youtube Spam (Classification & NLP) [6]
Black Box Models
The hyperparameters used to train each model can be found here.
- Random Forest (RF)
- XGboost (XGB)
- LightGBM (LGB)
- Deep Neural Network (DNN)
Interpretable Methods
Model-specific methods [English, Korean]
- Linear Regression [English, Korean]
- Logistic Regression [English, Korean]
- GLM, GAM and more [English, Korean]
- Decision Tree [English, Korean]
- Decision Rules [English, Korean]
- RuleFit [English, Korean]
- Other Interpretable Models [English, Korean]
Model-agnostic methods [English, Korean]
- Partial Dependence Plot (PDP) [English, Korean]
- Individual Conditional Expectation (ICE) [English, Korean]
- Accumulated Local Effects (ALE) Plot [English, Korean]
- Feature Interaction [English, Korean]
- Permutation Feature Importance [English, Korean]
- Global Surrogate [English, Korean]
- Local Surrogate (LIME) [English, Korean]
- Scoped Rules (Anchors) [English, Korean]
- Shapley Values [English, Korean]
- SHAP (SHapley Additive exPlanations) [English, Korean]
Python Implementation
Interpretable Models
Name | Packages |
---|---|
Linear Regression | scikit-learn statsmodels |
Logistic Regression | scikit-learn statsmodels |
Ridge Regression | scikit-learn statsmodels |
Lasso Regression | scikit-learn statsmodels |
Generalized Linear Model (GLM) | statsmodels |
Generalized Additive Model (GAM) | statsmodels pyGAM |
Decision Tree | scikit-learn |
Bayesian Rule Lists | skater |
RuleFit | rulefit |
Skope-rules | skope-rules |
Model-Agnostic Methods
Name | Packages |
---|---|
Partial Dependence Plot (PDP) | skater scikit-learn PDPbox |
Individual Conditional Expectation (ICE) Plot | PyCEbox |
Feature Importance | skater |
Local Surrogate | skater lime |
Global Surrogate | skater |
Scoped Rules (Anchors) | alibi |
SHapley Additive exPlanation (SHAP) | shap |
Example-Based Explanations
Name | Packages |
---|---|
Contrastive Explanations Method (CEM) | alibi |
Counterfactual Instances | alibi |
Prototypes and Criticisms | MMD-critic |
Influential Instances | influence-release |
Install packages
scikit-learn
scikit-learn is a Python module for machine learning built on top of SciPy and is distributed under the 3-Clause BSD license.
The project was started in 2007 by David Cournapeau as a Google Summer of Code project, and since then many volunteers have contributed. See the About us page for a list of core contributors.
It is currently maintained by a team of volunteers.
Scikit-learn is also available through conda, provided by Anaconda.
- Documentation : https://scikit-learn.org/stable/
- Github Repository : https://github.com/scikit-learn/scikit-learn
Installation
# Pip
pip install -U scikit-learn
# Conda
conda install scikit-learn
import sklearn
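As a quick sanity check, here is a minimal sketch (not taken from this repository's notebooks) that fits one of the project's black-box models, a random forest, on a toy dataset bundled with scikit-learn; the dataset and hyperparameters are illustrative only.

```python
# Minimal sketch: fit a Random Forest (one of this project's black-box
# models) on a bundled toy dataset. Illustrative only.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
print(rf.score(X_test, y_test))  # mean accuracy on the held-out split
```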
statsmodels
statsmodels is a Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests and statistical data exploration. An extensive list of result statistics is available for each estimator. The results are tested against existing statistical packages to ensure that they are correct. The package is released under the open source Modified BSD (3-clause) license. The online documentation is hosted at statsmodels.org.
Statsmodels is also available through conda, provided by Anaconda.
- Documentation : https://www.statsmodels.org/stable/index.html
- Github Repository : https://github.com/statsmodels/statsmodels
Installation
# Pip
pip install statsmodels
# Conda
conda install -c conda-forge statsmodels
import statsmodels
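A minimal sketch of the kind of interpretable model statsmodels provides, an OLS linear regression; the synthetic data and coefficients below are made up purely for illustration.

```python
# Minimal sketch: an ordinary least squares fit with statsmodels.
# The synthetic data below are purely illustrative.
import numpy as np
import statsmodels.api as sm

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=100)

X_const = sm.add_constant(X)        # prepend an intercept column
results = sm.OLS(y, X_const).fit()  # fit by ordinary least squares
print(results.summary())            # coefficients, std errors, p-values, R^2
```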
pyGAM
pyGAM is a package for building Generalized Additive Models in Python, with an emphasis on modularity and performance. The API will be immediately familiar to anyone with experience of scikit-learn or scipy.
- Documentation : https://pygam.readthedocs.io/en/latest/notebooks/quick_start.html
- Github Repository : https://github.com/dswah/pyGAM
Installation
# Pip
pip install pygam
# Conda
conda install -c conda-forge pygam
import pygam
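A minimal sketch following pyGAM's own quick start: a GAM with smooth terms for two numeric features and a factor term for a categorical one, fitted on the small wage dataset shipped with the package.

```python
# Minimal sketch (adapted from pyGAM's quick start): a GAM with two smooth
# terms and one factor term on the bundled wage demo dataset.
from pygam import LinearGAM, s, f
from pygam.datasets import wage

X, y = wage()  # small demo dataset bundled with pyGAM
gam = LinearGAM(s(0) + s(1) + f(2)).fit(X, y)
gam.summary()  # per-term effective degrees of freedom and significance
```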
skater
Skater is an open-source unified framework for model interpretation across all types of models, helping you build the interpretable machine learning systems often needed for real-world use cases. Skater supports algorithms that demystify the learned structure of a black-box model both globally (inference on the basis of a complete data set) and locally (inference about an individual prediction).
- Documentation : https://oracle.github.io/Skater/index.html
- Github Repository : https://github.com/oracle/Skater
Installation
# Option 1: without rule lists and without deepinterpreter
pip install -U skater
# Option 2: without rule lists and with deepinterpreter:
pip3 install --upgrade tensorflow
sudo pip install keras
pip install -U skater
# Option 3: For everything included
conda install gxx_linux-64
pip3 install --upgrade tensorflow
sudo pip install keras
sudo pip install -U --no-deps --force-reinstall --install-option="--rl=True" skater==1.1.1b1
# Conda
conda install -c conda-forge Skater
import skater
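A minimal sketch of Skater's model-agnostic feature importance, wrapping a fitted scikit-learn classifier in an InMemoryModel; the dataset and model are illustrative only.

```python
# Minimal sketch: permutation-style feature importance with Skater.
# Dataset and model are illustrative only.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from skater.core.explanations import Interpretation
from skater.model import InMemoryModel

data = load_breast_cancer()
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(data.data, data.target)

interpreter = Interpretation(data.data, feature_names=data.feature_names)
model = InMemoryModel(rf.predict_proba, examples=data.data[:100])
print(interpreter.feature_importance.feature_importance(model))
```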
PDPbox
A Python partial dependence plot toolbox.
This repository is inspired by ICEbox. The goal is to visualize the impact of certain features on the predictions of any supervised learning model using partial dependence plots. PDPbox now supports all scikit-learn algorithms.
- Documentation : https://pdpbox.readthedocs.io/en/latest/index.html#
- Github Repository : https://github.com/SauceCat/PDPbox
Installation
# Pip
pip install pdpbox
import pdpbox
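A minimal sketch using the pdp_isolate/pdp_plot API of the PDPbox releases current at the time of writing (0.2.x); the model, data, and feature name are illustrative only.

```python
# Minimal sketch: a one-feature partial dependence plot with PDPbox 0.2.x.
# Model, data, and feature name are illustrative only.
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from pdpbox import pdp

data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(df, data.target)

pdp_mr = pdp.pdp_isolate(model=rf, dataset=df,
                         model_features=list(df.columns),
                         feature='mean radius')
fig, axes = pdp.pdp_plot(pdp_mr, 'mean radius')  # PDP for a single feature
```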
LIME
This project is about explaining what machine learning classifiers (or models) are doing. At the moment, we support explaining individual predictions for text classifiers or classifiers that act on tables (numpy arrays of numerical or categorical data) or images, with a package called lime (short for local interpretable model-agnostic explanations). Lime is based on the work presented in this paper (bibtex here for citation).
- Documentation : https://lime-ml.readthedocs.io/en/latest/index.html
- Github Repository : https://github.com/marcotcr/lime
Installation
# Pip
pip install lime
import lime
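A minimal sketch of a local explanation for one tabular prediction with LIME; the dataset and model are illustrative only.

```python
# Minimal sketch: explain a single tabular prediction with LIME.
# Dataset and model are illustrative only.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

data = load_breast_cancer()
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(data.data, data.target)

explainer = LimeTabularExplainer(data.data,
                                 feature_names=list(data.feature_names),
                                 class_names=list(data.target_names))
exp = explainer.explain_instance(data.data[0], rf.predict_proba,
                                 num_features=5)
print(exp.as_list())  # top local feature contributions for this instance
```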
PyCEbox
A Python implementation of individual conditional expectation plots, inspired by R's ICEbox. Individual conditional expectation plots were introduced in Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation (arXiv:1309.6392).
- Documentation : http://austinrochford.github.io/PyCEbox/docs/
- Github Repository : https://github.com/AustinRochford/PyCEbox
Installation
# Pip
pip install pycebox
import pycebox
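A minimal sketch of ICE curves with PyCEbox for one feature of a regression model; the dataset, model, and column name are illustrative only.

```python
# Minimal sketch: ICE curves with PyCEbox, one curve per instance as a
# single feature is varied. Dataset, model, and column are illustrative.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_boston
from sklearn.ensemble import RandomForestRegressor
from pycebox.ice import ice, ice_plot

data = load_boston()
df = pd.DataFrame(data.data, columns=data.feature_names)
rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(df.values, data.target)

ice_df = ice(df, 'RM', rf.predict, num_grid_points=50)
ice_plot(ice_df)
plt.show()
```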
rulefit
Implementation of a rule-based prediction algorithm based on the RuleFit algorithm from Friedman and Popescu (PDF: http://statweb.stanford.edu/~jhf/ftp/RuleFit.pdf).
- Github Repository : https://github.com/christophM/rulefit
Installation
# Pip
pip install git+git://github.com/christophM/rulefit.git
import rulefit
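A minimal sketch, close to the rulefit package's own README: fit RuleFit and inspect the rules that the sparse linear model kept. The dataset is illustrative only.

```python
# Minimal sketch: fit RuleFit and list the retained rules. Illustrative only.
from sklearn.datasets import load_boston
from rulefit import RuleFit

data = load_boston()
rf = RuleFit()
rf.fit(data.data, data.target, feature_names=data.feature_names)

rules = rf.get_rules()          # linear terms plus extracted rules
rules = rules[rules.coef != 0]  # keep only rules the Lasso step retained
print(rules.sort_values('support', ascending=False).head())
```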
skope-rules
Skope-rules is a Python machine learning module built on top of scikit-learn and distributed under the 3-Clause BSD license.
Skope-rules aims at learning logical, interpretable rules for “scoping” a target class, i.e. detecting with high precision instances of this class.
Skope-rules is a trade-off between the interpretability of a Decision Tree and the modeling power of a Random Forest.
- Documentation : https://skope-rules.readthedocs.io/en/latest/index.html
- Github Repository : https://github.com/scikit-learn-contrib/skope-rules
Installation
# Pip
pip install skope-rules
import skrules
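A minimal sketch of learning high-precision rules for the positive class with Skope-rules; the dataset and precision/recall thresholds are illustrative, and the feature names are sanitized because rule strings cannot contain spaces.

```python
# Minimal sketch: high-precision rules for one class with Skope-rules.
# Dataset and thresholds are illustrative only.
from sklearn.datasets import load_breast_cancer
from skrules import SkopeRules

data = load_breast_cancer()
feature_names = [f.replace(' ', '_') for f in data.feature_names]

clf = SkopeRules(feature_names=feature_names,
                 precision_min=0.9, recall_min=0.1, random_state=0)
clf.fit(data.data, data.target)

# each entry pairs a rule string with its (precision, recall, n_occurrences)
for rule in clf.rules_[:3]:
    print(rule)
```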
alibi
Alibi is an open source Python library aimed at machine learning model inspection and interpretation. The initial focus of the library is on black-box, instance-based model explanations.
- Documentation : https://docs.seldon.io/projects/alibi/en/latest/#
- Github Repository : https://github.com/SeldonIO/alibi
Installation
# Pip
pip install alibi
import alibi
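A minimal sketch of a scoped rule (anchor) for a single tabular prediction with alibi's AnchorTabular; the model and data are illustrative, and the exact fields on the returned explanation object vary slightly across alibi releases.

```python
# Minimal sketch: an anchor explanation for one prediction with alibi.
# Model and data are illustrative only.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from alibi.explainers import AnchorTabular

data = load_breast_cancer()
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(data.data, data.target)

explainer = AnchorTabular(rf.predict, feature_names=list(data.feature_names))
explainer.fit(data.data)  # builds the discretized perturbation space
explanation = explainer.explain(data.data[0], threshold=0.95)
print(explanation)  # anchor rule plus its precision and coverage
```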
influence-release
This code replicates the experiments from the following paper:
Pang Wei Koh and Percy Liang. “Understanding Black-box Predictions via Influence Functions.” International Conference on Machine Learning (ICML), 2017.
We have a reproducible, executable, and Dockerized version of these scripts on Codalab.
The datasets for the experiments can also be found at the Codalab link.
- Github Repository : https://github.com/kohpangwei/influence-release
shap
SHAP (SHapley Additive exPlanations) is a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions (see papers for details and citations).
- Documentation : https://shap.readthedocs.io/en/latest/#
- Github Repository : https://github.com/slundberg/shap
Installation
# Pip
pip install shap
# Conda
conda install -c conda-forge shap
import shap
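A minimal sketch of SHAP values for a tree ensemble via TreeExplainer; the dataset and model are illustrative, and note that for a binary classifier the shap releases of this era return one array of SHAP values per class.

```python
# Minimal sketch: SHAP values for a tree ensemble with TreeExplainer.
# Dataset and model are illustrative only.
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(data.data, data.target)

explainer = shap.TreeExplainer(rf)             # exact for tree models
shap_values = explainer.shap_values(data.data)

# global view: mean |SHAP value| per feature for the positive class
shap.summary_plot(shap_values[1], data.data,
                  feature_names=data.feature_names)
```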
MMD-critic
This method was proposed in this paper.
Abstract
Example-based explanations are widely used in the effort to improve the interpretability of highly complex distributions. However, prototypes alone are rarely sufficient to represent the gist of the complexity. In order for users to construct better mental models and understand complex data distributions, we also need criticism to explain what are not captured by prototypes. Motivated by the Bayesian model criticism framework, we develop MMD-critic which efficiently learns prototypes and criticism, designed to aid human interpretability. A human subject pilot study shows that the MMD-critic selects prototypes and criticism that are useful to facilitate human understanding and reasoning. We also evaluate the prototypes selected by MMD-critic via a nearest prototype classifier, showing competitive performance compared to baselines.
- Github Repository : https://github.com/BeenKim/MMD-critic
Reference
[1] Molnar, Christoph. “Interpretable machine learning. A Guide for Making Black Box Models Explainable”, 2019. https://christophm.github.io/interpretable-ml-book/.
[2] Kaggle Competition : Titanic: Machine Learning from Disaster
[3] Kelwin Fernandes, Jaime S. Cardoso, and Jessica Fernandes. ‘Transfer Learning with Partial Observability Applied to Cervical Cancer Screening.’ Iberian Conference on Pattern Recognition and Image Analysis. Springer International Publishing, 2017. [Link]
[4] Kaggle Competition : House Prices: Advanced Regression Techniques
[5] Fanaee-T, Hadi, and Gama, Joao, “Event labeling combining ensemble detectors and background knowledge”, Progress in Artificial Intelligence (2013): pp. 1-15, Springer Berlin Heidelberg. [Link]
[6] Alberto, T.C., Lochter J.V., Almeida, T.A. TubeSpam: Comment Spam Filtering on YouTube. Proceedings of the 14th IEEE International Conference on Machine Learning and Applications (ICMLA’15), 1-6, Miami, FL, USA, December, 2015. [Link]
[7] Lundberg, Scott M., and Su-In Lee. “A unified approach to interpreting model predictions.” Advances in Neural Information Processing Systems. 2017. (Korean Version)