WhiteBox Part 2: Interpretable Machine Learning

The White Box Project introduces a variety of methods for opening up the black box of machine learning. It is based on Interpretable Machine Learning by Christoph Molnar [1]. I recommend reading the book first and then working through this project. If you are an R user, you can find the R code used in the examples here.

The Korean translation is available here. The translation was carried out in consultation with the author.

If you find any incorrect interpretations in the translation, please email wogur379@gmail.com or open an issue. Thank you.

Purpose

The goal is to analyze various datasets with black-box models and to build a pipeline of analysis reports using interpretable methods.

Requirements

numpy == 1.17.3
scikit-learn == 0.21.2
xgboost == 0.90
tensorflow == 1.14.0

Dataset

  1. Titanic: Machine Learning from Disaster (Classification) [2]
  2. Cervical Cancer (Classification) [3]
  3. House Prices: Advanced Regression Techniques (Regression) [4]
  4. Bike Sharing (Regression) [5]
  5. Youtube Spam (Classification & NLP) [6]

Black Box Models

The parameters used to train the models can be found here.

  1. Random Forest (RF)
  2. XGboost (XGB)
  3. LightGBM (LGB)
  4. Deep Neural Network (DNN)

Interpretable Methods

Model-specific methods [English, Korean]

Model-agnostic methods [English, Korean]

Python Implementation

Interpretable Models

| Name | Packages |
|------|----------|
| Linear Regression | scikit-learn, statsmodels |
| Logistic Regression | scikit-learn, statsmodels |
| Ridge Regression | scikit-learn, statsmodels |
| Lasso Regression | scikit-learn, statsmodels |
| Generalized Linear Model (GLM) | statsmodels |
| Generalized Additive Model (GAM) | statsmodels, pyGAM |
| Decision Tree | scikit-learn |
| Bayesian Rule Lists | skater |
| RuleFit | rulefit |
| Skope-rules | skope-rules |

Model-Agnostic Methods

| Name | Packages |
|------|----------|
| Partial Dependence Plot (PDP) | skater, scikit-learn |
| Individual Conditional Expectation (ICE) Plot | PyCEbox |
| Feature Importance | skater |
| Local Surrogate | skater, lime |
| Global Surrogate | skater |
| Scoped Rules (Anchors) | alibi |
| SHapley Additive exPlanations (SHAP) | shap |

Example-Based Explanations

| Name | Packages |
|------|----------|
| Contrastive Explanations Method (CEM) | alibi |
| Counterfactual Instances | alibi |
| Prototype Counterfactuals | MMD-critic |
| Influence Instances | influence-release |

Install packages

scikit-learn

scikit-learn is a Python module for machine learning built on top of SciPy and is distributed under the 3-Clause BSD license.

The project was started in 2007 by David Cournapeau as a Google Summer of Code project, and since then many volunteers have contributed. See the About us page for a list of core contributors.

It is currently maintained by a team of volunteers.

Scikit-learn is also available through conda, provided by Anaconda.

Installation

# Pip
pip install -U scikit-learn
# Conda
conda install scikit-learn
import sklearn
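
As a quick check after installing, the snippet below fits one of the interpretable models from the table above (a decision tree) with scikit-learn. This is a minimal sketch; the iris dataset and the hyperparameters are illustrative, not the settings used in this project.

# Minimal sketch: fit an interpretable decision tree and print its rules.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)                   # illustrative dataset
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X, y)
print(export_text(clf))                             # the tree as human-readable rules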

statsmodels

statsmodels is a Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests and statistical data exploration. An extensive list of result statistics is available for each estimator. The results are tested against existing statistical packages to ensure that they are correct. The package is released under the open source Modified BSD (3-clause) license. The online documentation is hosted at statsmodels.org.

Statsmodels is also available through conda, provided by Anaconda.

Installation

# Pip
pip install statsmodels 
# Conda
conda install -c conda-forge statsmodels
import statsmodels
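
As an illustration, linear regression (one of the interpretable models listed above) can be fit and summarized with statsmodels. A minimal sketch on synthetic data; the variables and coefficients are made up for demonstration.

# Minimal sketch: ordinary least squares with a full coefficient summary.
import numpy as np
import statsmodels.api as sm

X = np.random.rand(100, 2)                          # synthetic features
y = 2 * X[:, 0] - X[:, 1] + 0.1 * np.random.randn(100)
model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.summary())                              # coefficients, p-values, intervals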

pyGAM

pyGAM is a package for building Generalized Additive Models in Python, with an emphasis on modularity and performance. The API will be immediately familiar to anyone with experience with scikit-learn or scipy.

Installation

# Pip
pip install pygam
# Conda
conda install -c conda-forge pygam
import pygam
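
A minimal sketch of the scikit-learn-style API: a GAM with one smooth term per feature, fit on synthetic data (the data and terms are illustrative, not from this project).

# Minimal sketch: fit a GAM with one smooth term per feature.
import numpy as np
from pygam import LinearGAM, s

X = np.random.rand(200, 2)                          # synthetic features
y = np.sin(4 * X[:, 0]) + X[:, 1] + 0.1 * np.random.randn(200)
gam = LinearGAM(s(0) + s(1)).fit(X, y)              # scikit-learn-style fit
gam.summary()                                       # per-term statistics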

skater

Skater is an open source unified framework to enable model interpretation for all kinds of models, helping to build the interpretable machine learning systems often needed for real-world use cases. Skater supports algorithms to demystify the learned structures of a black box model both globally (inference on the basis of a complete data set) and locally (inference about an individual prediction).

Installation

# Option 1: without rule lists and without deepinterpreter
pip install -U skater

# Option 2: without rule lists and with deepinterpreter:
pip3 install --upgrade tensorflow 
sudo pip install keras
pip install -U skater

# Option 3: For everything included
conda install gxx_linux-64
pip3 install --upgrade tensorflow 
sudo pip install keras
sudo pip install -U --no-deps --force-reinstall --install-option="--rl=True" skater==1.1.1b1

# Conda
conda install -c conda-forge Skater
import skater
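
A sketch of a typical global, model-agnostic use of skater: feature importance for a black-box classifier. The model and data are illustrative.

# Minimal sketch: model-agnostic feature importance with skater.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from skater.core.explanations import Interpretation
from skater.model import InMemoryModel

iris = load_iris()
clf = RandomForestClassifier(n_estimators=100).fit(iris.data, iris.target)  # black-box model
interpreter = Interpretation(iris.data, feature_names=iris.feature_names)
model = InMemoryModel(clf.predict_proba, examples=iris.data)
print(interpreter.feature_importance.feature_importance(model))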

PDPbox

Python partial dependence plot toolbox.

This repository is inspired by ICEbox. The goal is to visualize the impact of certain features on model predictions for any supervised learning algorithm using partial dependence plots (R1, R2). PDPbox now supports all scikit-learn algorithms.

Installation

# Pip
pip install pdpbox
import pdpbox
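
A minimal sketch of computing and plotting the partial dependence of one feature, assuming the pdp_isolate / pdp_plot interface of the PDPbox releases from this period; the model and data are illustrative.

# Minimal sketch: partial dependence of a single feature with PDPbox.
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from pdpbox import pdp

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
model = RandomForestClassifier(n_estimators=100).fit(df, iris.target)
pdp_iso = pdp.pdp_isolate(model=model, dataset=df,
                          model_features=iris.feature_names,
                          feature=iris.feature_names[0])
fig, axes = pdp.pdp_plot(pdp_iso, iris.feature_names[0])   # PDP curve(s) for the feature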

LIME

This project is about explaining what machine learning classifiers (or models) are doing. At the moment, we support explaining individual predictions for text classifiers or classifiers that act on tables (numpy arrays of numerical or categorical data) or images, with a package called lime (short for local interpretable model-agnostic explanations). Lime is based on the work presented in this paper (bibtex here for citation).

Installation

# Pip
pip install lime
import lime
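
A minimal sketch of explaining a single prediction of a tabular classifier with LIME; the classifier and dataset are illustrative.

# Minimal sketch: local explanation of one prediction with LIME.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

iris = load_iris()
clf = RandomForestClassifier(n_estimators=100).fit(iris.data, iris.target)
explainer = LimeTabularExplainer(iris.data,
                                 feature_names=iris.feature_names,
                                 class_names=iris.target_names,
                                 mode='classification')
exp = explainer.explain_instance(iris.data[0], clf.predict_proba, num_features=4)
print(exp.as_list())                                # local feature contributions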

PyCEbox

A Python implementation of individual conditional expectation plots inspired by R's ICEbox. Individual conditional expectation plots were introduced in Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation (arXiv:1309.6392).

Installation

# Pip 
pip install pycebox
import pycebox
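
A minimal sketch of drawing ICE curves for one feature, assuming pycebox's ice / ice_plot helpers; the model and data are illustrative.

# Minimal sketch: ICE curves for a single feature with pycebox.
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestRegressor
from pycebox.ice import ice, ice_plot

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
model = RandomForestRegressor(n_estimators=100).fit(df, iris.target)
ice_df = ice(df, iris.feature_names[0], model.predict)   # one curve per instance
ice_plot(ice_df)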

rulefit

Implementation of a rule-based prediction algorithm based on the RuleFit algorithm from Friedman and Popescu (PDF: http://statweb.stanford.edu/~jhf/ftp/RuleFit.pdf).

Installation

# Pip
pip install git+git://github.com/christophM/rulefit.git
import rulefit
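
A minimal sketch of fitting RuleFit and inspecting the learned rules; the dataset and feature names are illustrative.

# Minimal sketch: fit RuleFit and list the rules with non-zero coefficients.
from sklearn.datasets import load_diabetes
from rulefit import RuleFit

X, y = load_diabetes(return_X_y=True)
feature_names = ['f%d' % i for i in range(X.shape[1])]
rf = RuleFit()
rf.fit(X, y, feature_names=feature_names)
rules = rf.get_rules()
print(rules[rules.coef != 0].sort_values('support', ascending=False).head())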

skope-rules

Skope-rules is a Python machine learning module built on top of scikit-learn and distributed under the 3-Clause BSD license.

Skope-rules aims at learning logical, interpretable rules for “scoping” a target class, i.e. detecting with high precision instances of this class.

Skope-rules is a trade-off between the interpretability of a Decision Tree and the modeling power of a Random Forest.

Installation

# Pip
pip install skope-rules
import skrules
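
A minimal sketch of learning high-precision rules that scope one class; the binarized iris target, feature names, and thresholds are illustrative.

# Minimal sketch: learn rules that "scope" the positive class with skope-rules.
from sklearn.datasets import load_iris
from skrules import SkopeRules

iris = load_iris()
y = (iris.target == 0).astype(int)                  # binary target: class 0 vs. rest
feature_names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
clf = SkopeRules(feature_names=feature_names,
                 precision_min=0.5, recall_min=0.1, n_estimators=30)
clf.fit(iris.data, y)
print(clf.rules_[:3])                               # top rules with precision/recall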

alibi

Alibi is an open source Python library aimed at machine learning model inspection and interpretation. The initial focus of the library is on black-box, instance-based model explanations.

Installation

# Pip 
pip install alibi
import alibi
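
A minimal sketch of an anchor (scoped rule) explanation for one tabular prediction with alibi's AnchorTabular; the model and data are illustrative, and the exact fields of the returned explanation vary across alibi versions.

# Minimal sketch: anchor explanation for a single prediction.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from alibi.explainers import AnchorTabular

iris = load_iris()
clf = RandomForestClassifier(n_estimators=100).fit(iris.data, iris.target)
explainer = AnchorTabular(clf.predict, feature_names=iris.feature_names)
explainer.fit(iris.data)                            # fit the discretizer on training data
explanation = explainer.explain(iris.data[0], threshold=0.95)
print(explanation)                                  # anchor rule, precision, coverage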

influence-release

This code replicates the experiments from the following paper:

Pang Wei Koh and Percy Liang. Understanding Black-box Predictions via Influence Functions. International Conference on Machine Learning (ICML), 2017.

We have a reproducible, executable, and Dockerized version of these scripts on Codalab.

The datasets for the experiments can also be found at the Codalab link.

shap

SHAP (SHapley Additive exPlanations) is a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions (see papers for details and citations).

Installation

# Pip
pip install shap
# Conda
conda install -c conda-forge shap
import shap
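
A minimal sketch of computing SHAP values for a tree ensemble and summarizing them; the model and data are illustrative.

# Minimal sketch: SHAP values for a tree-based model.
import shap
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
clf = RandomForestClassifier(n_estimators=100).fit(iris.data, iris.target)
explainer = shap.TreeExplainer(clf)
shap_values = explainer.shap_values(iris.data)      # one array per class for classifiers
shap.summary_plot(shap_values, iris.data, feature_names=iris.feature_names)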

MMD-critic

This method was proposed in this paper.

Abstract
Example-based explanations are widely used in the effort to improve the interpretability of highly complex distributions. However, prototypes alone are rarely sufficient to represent the gist of the complexity. In order for users to construct better mental models and understand complex data distributions, we also need criticism to explain what are not captured by prototypes. Motivated by the Bayesian model criticism framework, we develop MMD-critic which efficiently learns prototypes and criticism, designed to aid human interpretability. A human subject pilot study shows that the MMD-critic selects prototypes and criticism that are useful to facilitate human understanding and reasoning. We also evaluate the prototypes selected by MMD-critic via a nearest prototype classifier, showing competitive performance compared to baselines.

Reference

[1] Molnar, Christoph. “Interpretable machine learning. A Guide for Making Black Box Models Explainable”, 2019. https://christophm.github.io/interpretable-ml-book/.

[2] Kaggle Competition : Titanic: Machine Learning from Disaster

[3] Kelwin Fernandes, Jaime S. Cardoso, and Jessica Fernandes. ‘Transfer Learning with Partial Observability Applied to Cervical Cancer Screening.’ Iberian Conference on Pattern Recognition and Image Analysis. Springer International Publishing, 2017. [Link]

[4] Kaggle Competition : House Prices: Advanced Regression Techniques

[5] Fanaee-T, Hadi, and Gama, Joao, “Event labeling combining ensemble detectors and background knowledge”, Progress in Artificial Intelligence (2013): pp. 1-15, Springer Berlin Heidelberg. [Link]

[6] Alberto, T.C., Lochter J.V., Almeida, T.A. TubeSpam: Comment Spam Filtering on YouTube. Proceedings of the 14th IEEE International Conference on Machine Learning and Applications (ICMLA’15), 1-6, Miami, FL, USA, December, 2015. [Link]

[7] Lundberg, Scott M., and Su-In Lee. “A unified approach to interpreting model predictions.” Advances in Neural Information Processing Systems. 2017. (Korean Version)