pygmtools: Python Graph Matching Tools¶
pygmtools provides graph matching solvers in Python and is easily accessible via:
$ pip install pygmtools
Official documentation: https://pygmtools.readthedocs.io
Source code: https://github.com/Thinklab-SJTU/pygmtools
Graph matching is a fundamental yet challenging problem in pattern recognition, data mining, and others. Graph matching aims to find node-to-node correspondence among multiple graphs, by solving an NP-hard combinatorial optimization problem.
Doing graph matching in Python used to be non-trivial, and this library wants to make researchers’ lives easier.
To highlight, pygmtools has the following features:
Support for various solvers, including traditional combinatorial solvers (linear, quadratic, and multi-graph) and novel deep learning-based solvers;
Support for various backends, including numpy, which is universally accessible, and several state-of-the-art deep learning frameworks with GPU support: pytorch, paddle, and jittor;
Deep learning friendly: the operations are designed to best preserve the gradient during computation, and batched operations are supported for the best performance.
Installation¶
You can install the stable release on PyPI:
$ pip install pygmtools
or get the latest version by running:
$ pip install -U https://github.com/Thinklab-SJTU/pygmtools/archive/master.zip # with --user for user install (no root)
pygmtools is now available with the numpy backend.
The following packages are required and will be installed automatically by pip:
Python >= 3.5
requests >= 2.25.1
scipy >= 1.4.1
Pillow >= 7.2.0
numpy >= 1.18.5
easydict >= 1.7
appdirs >= 1.4.4
tqdm >= 4.64.1
Available Graph Matching Solvers¶
This library offers a user-friendly API for the following solvers:
Two-Graph Matching:
Linear assignment solvers, including the differentiable soft Sinkhorn algorithm [1] and the exact Hungarian solver [2].
Soft and differentiable quadratic assignment solvers, including spectral graph matching [3] and random-walk-based graph matching [4].
Discrete (non-differentiable) quadratic assignment solver: the integer projected fixed point method [5].
Multi-Graph Matching:
Composition-based Affinity Optimization (CAO) solver [6], which optimizes the affinity score while gradually infusing consistency.
Multi-graph matching based on the Floyd shortest path algorithm [7].
Graduated-assignment-based multi-graph matching solver [8][9], which gradually anneals Sinkhorn’s temperature.
Neural Graph Matching:
Intra-graph and cross-graph embedding-based neural graph matching solvers PCA-GM and IPCA-GM [10] for matching individual graphs.
Channel-independent embedding (CIE) [11] based neural graph matching solver for matching individual graphs.
Neural graph matching solver (NGM) [12] for the general quadratic assignment formulation.
Available Backends¶
This library is designed to support multiple backends with the same set of APIs. Please follow the official instructions to install your backend.
The following backends are available:
Numpy (default backend, CPU only)
PyTorch (recommended backend, GPU friendly, deep learning friendly)
PaddlePaddle (GPU friendly, deep learning friendly)
Jittor (GPU friendly, deep learning friendly)
For more details, please read the documentation.
The Deep Graph Matching Benchmark¶
pygmtools also features a standard data interface for several graph matching benchmarks. We also maintain a repository containing non-trivial implementations of deep graph matching models; please check out ThinkMatch if you are interested!
Contributing¶
Any contributions, ideas, or suggestions from the community are welcome! Before starting your contribution, please read the Contributing Guide.
Developers and Maintainers¶
pygmtools is currently developed and maintained by members from ThinkLab at Shanghai Jiao Tong University.
References¶
[1] Sinkhorn, Richard, and Paul Knopp. “Concerning nonnegative matrices and doubly stochastic matrices.” Pacific Journal of Mathematics 21.2 (1967): 343-348.
[2] Munkres, James. “Algorithms for the assignment and transportation problems.” Journal of the society for industrial and applied mathematics 5.1 (1957): 32-38.
[3] Leordeanu, Marius, and Martial Hebert. “A spectral technique for correspondence problems using pairwise constraints.” International Conference on Computer Vision (2005).
[4] Cho, Minsu, Jungmin Lee, and Kyoung Mu Lee. “Reweighted random walks for graph matching.” European conference on Computer vision. Springer, Berlin, Heidelberg, 2010.
[5] Leordeanu, Marius, Martial Hebert, and Rahul Sukthankar. “An integer projected fixed point method for graph matching and map inference.” Advances in neural information processing systems 22 (2009).
[6] Yan, Junchi, et al. “Multi-graph matching via affinity optimization with graduated consistency regularization.” IEEE transactions on pattern analysis and machine intelligence 38.6 (2015): 1228-1242.
[7] Jiang, Zetian, Tianzhe Wang, and Junchi Yan. “Unifying offline and online multi-graph matching via finding shortest paths on supergraph.” IEEE transactions on pattern analysis and machine intelligence 43.10 (2020): 3648-3663.
[8] Solé-Ribalta, Albert, and Francesc Serratosa. “Graduated assignment algorithm for multiple graph matching based on a common labeling.” International Journal of Pattern Recognition and Artificial Intelligence 27.01 (2013): 1350001.
[9] Wang, Runzhong, Junchi Yan, and Xiaokang Yang. “Graduated assignment for joint multi-graph matching and clustering with application to unsupervised graph matching network learning.” Advances in Neural Information Processing Systems 33 (2020): 19908-19919.
[10] Wang, Runzhong, Junchi Yan, and Xiaokang Yang. “Combinatorial learning of robust deep graph matching: an embedding based approach.” IEEE Transactions on Pattern Analysis and Machine Intelligence (2020).
[11] Yu, Tianshu, et al. “Learning deep graph matching with channel-independent embedding and hungarian attention.” International conference on learning representations. 2019.
[12] Wang, Runzhong, Junchi Yan, and Xiaokang Yang. “Neural graph matching network: Learning lawler’s quadratic assignment problem with extension to hypergraph and multiple-graph matching.” IEEE Transactions on Pattern Analysis and Machine Intelligence (2021).
Contents of Official Documentation¶
Introduction and Guidelines¶
This page provides a brief introduction to graph matching and some guidelines for using pygmtools.
If you are seeking some background information, this is the right place!
Note
For more technical details, we recommend the following two surveys.
About learning-based deep graph matching: Junchi Yan, Shuang Yang, Edwin Hancock. “Learning Graph Matching and Related Combinatorial Optimization Problems.” IJCAI 2020.
About non-learning two-graph matching and multi-graph matching: Junchi Yan, Xu-Cheng Yin, Weiyao Lin, Cheng Deng, Hongyuan Zha, Xiaokang Yang. “A Short Survey of Recent Advances in Graph Matching.” ICMR 2016.
Why Graph Matching?¶
Graph Matching (GM) is a fundamental yet challenging problem in pattern recognition, data mining, and others. GM aims to find node-to-node correspondence among multiple graphs, by solving an NP-hard combinatorial problem. Recently, there is growing interest in developing deep learning-based graph matching methods.
Compared to other straightforward matching methods such as greedy matching, graph matching methods are more reliable because they are based on an optimization formulation. In addition, graph matching methods exploit both node affinity and edge affinity, so they are usually more robust to noise and outliers. The recent line of deep graph matching methods also enables many graph matching solvers to be integrated into a deep learning pipeline.
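To make the contrast with greedy matching concrete, here is a toy sketch that uses node affinities only (no edge information). The score matrix S and the helper names are hypothetical and chosen for illustration; the point is simply that picking the largest score first can block a better overall assignment, which an optimization-based method would find.

```python
import itertools
import numpy as np

# Hypothetical node-to-node similarity scores (rows: graph 1, columns: graph 2).
S = np.array([[0.90, 0.80],
              [0.85, 0.10]])

def greedy_matching(S):
    """Repeatedly pick the largest remaining score and remove its row and column."""
    S = S.astype(float).copy()
    matches = {}
    for _ in range(min(S.shape)):
        i, a = np.unravel_index(np.argmax(S), S.shape)
        matches[int(i)] = int(a)
        S[i, :] = -np.inf
        S[:, a] = -np.inf
    return matches

def optimal_matching(S):
    """Brute-force search over all permutations (only viable for tiny problems)."""
    n = S.shape[0]
    best = max(itertools.permutations(range(n)),
               key=lambda perm: sum(S[i, perm[i]] for i in range(n)))
    return dict(enumerate(best))

def total_score(S, matches):
    """Sum of the scores of all matched pairs."""
    return float(sum(S[i, a] for i, a in matches.items()))

greedy = greedy_matching(S)    # takes 0.90 first and is then forced to accept 0.10
optimal = optimal_matching(S)  # takes 0.80 and 0.85 instead
```

Comparing total_score(S, greedy) with total_score(S, optimal) shows the gap between the two strategies on this example.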
Graph matching techniques have been applied to the following applications:
-
-
Model ensemble and federated learning
-
and more…
If your task involves matching two or more graphs, you should try the solvers in pygmtools!
What is Graph Matching?¶
The Graph Matching Pipeline¶
Solving a real-world graph-matching problem may involve the following steps:
Extract node/edge features from the graphs you want to match.
Build an affinity matrix from node/edge features.
Solve the graph matching problem with GM solvers.
Step 1 may be done by methods that depend on your application, while Steps 2 and 3 can be handled by pygmtools.
The following plot illustrates a standard deep graph matching pipeline.

The Math Form¶
Let’s involve a little bit of math to better understand the graph matching pipeline. In general, graph matching is of the following form, known as the Quadratic Assignment Problem (QAP):
\[\max_{\mathbf{X}} \ \mathtt{vec}(\mathbf{X})^\top \mathbf{K} \mathtt{vec}(\mathbf{X}) \quad \text{s.t.} \quad \mathbf{X} \in \{0, 1\}^{n_1\times n_2}, \ \mathbf{X}\mathbf{1} = \mathbf{1}, \ \mathbf{X}^\top\mathbf{1} \leq \mathbf{1}\]
The notations are explained as follows:
\(\mathbf{X}\) is known as the permutation matrix which encodes the matching result. It is also the decision variable in the graph matching problem. \(\mathbf{X}_{i,a}=1\) means node \(i\) in graph 1 is matched to node \(a\) in graph 2, and \(\mathbf{X}_{i,a}=0\) means non-matched. Without loss of generality, it is assumed that \(n_1\leq n_2.\) \(\mathbf{X}\) has the following constraints:
The sum of each row must be equal to 1: \(\mathbf{X}\mathbf{1} = \mathbf{1}\);
The sum of each column must be equal to, or smaller than 1: \(\mathbf{X}^\top\mathbf{1} \leq \mathbf{1}\).
\(\mathtt{vec}(\mathbf{X})\) means the column-wise vectorization form of \(\mathbf{X}\).
\(\mathbf{1}\) means a column vector whose elements are all 1s.
\(\mathbf{K}\) is known as the affinity matrix which encodes the information of the input graphs. Both node-wise and edge-wise affinities are encoded in \(\mathbf{K}\):
The diagonal element \(\mathbf{K}_{i + a\times n_1, i + a\times n_1}\) means the node-wise affinity of node \(i\) in graph 1 and node \(a\) in graph 2;
The off-diagonal element \(\mathbf{K}_{i + a\times n_1, j + b\times n_1}\) means the edge-wise affinity of edge \(ij\) in graph 1 and edge \(ab\) in graph 2.
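The indexing convention above can be checked on a tiny problem. The following sketch builds \(\mathbf{K}\) by hand for two isomorphic 3-node graphs, evaluates the QAP score \(\mathtt{vec}(\mathbf{X})^\top \mathbf{K} \mathtt{vec}(\mathbf{X})\) with column-wise vectorization, and finds the best matching by brute force. The edge weights, the Gaussian edge affinity, and the zero node affinity are assumptions made for this illustration, and brute force is only feasible for very small graphs.

```python
import itertools
import numpy as np

n1 = n2 = 3
# Weighted adjacency matrix of graph 1 (distinct weights, so the matching is unique).
A1 = np.array([[0., 1., 2.],
               [1., 0., 4.],
               [2., 4., 0.]])

# Ground-truth matching: node i of graph 1 corresponds to node perm_gt[i] of graph 2.
perm_gt = (2, 0, 1)
X_gt = np.zeros((n1, n2))
X_gt[np.arange(n1), list(perm_gt)] = 1
A2 = X_gt.T @ A1 @ X_gt  # graph 2 is a node-permuted copy of graph 1

# Affinity matrix: entry (i + a*n1, j + b*n1) compares edge ij with edge ab.
# Node-wise (diagonal) affinities are left at zero in this sketch.
K = np.zeros((n1 * n2, n1 * n2))
for i, j in itertools.permutations(range(n1), 2):
    for a, b in itertools.permutations(range(n2), 2):
        K[i + a * n1, j + b * n1] = np.exp(-(A1[i, j] - A2[a, b]) ** 2)

def qap_score(X, K):
    """Evaluate vec(X)^T K vec(X) with column-wise vectorization."""
    x = X.flatten(order="F")  # X[i, a] lands at position i + a*n1
    return float(x @ K @ x)

# Brute-force search over all permutation matrices.
best_perm, best_score = None, -np.inf
for perm in itertools.permutations(range(n2)):
    X = np.zeros((n1, n2))
    X[np.arange(n1), list(perm)] = 1
    score = qap_score(X, K)
    if score > best_score:
        best_perm, best_score = perm, score
```

Here best_perm recovers perm_gt because every matched edge pair contributes the maximum affinity of 1.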
Graph Matching Best Practice¶
We need to understand the advantages and limitations of graph matching solvers. As discussed above, the major advantage of graph matching solvers is that they are more robust to noise and outliers. Graph matching also utilizes edge information, which is usually ignored in linear matching methods. The major drawback of graph matching solvers is their efficiency and scalability since the optimization problem is NP-hard. Therefore, to decide which matching method is most suitable, one needs to balance the required matching accuracy against the affordable time and memory cost of the application.
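The memory side of this trade-off is easy to estimate. Since the affinity matrix \(\mathbf{K}\) has \(n_1 n_2\) rows and columns, a dense copy grows with the fourth power of the graph size when \(n_1 = n_2\). The sketch below assumes dense storage with 8 bytes per entry (float64); the helper name is hypothetical and actual memory use depends on the backend and solver.

```python
def affinity_matrix_bytes(n1, n2, bytes_per_entry=8):
    """Memory needed for a dense (n1*n2) x (n1*n2) affinity matrix."""
    side = n1 * n2
    return side * side * bytes_per_entry

# Print a rough estimate for a few graph sizes.
for n in (10, 50, 100):
    gib = affinity_matrix_bytes(n, n) / 2**30
    print(f"n1 = n2 = {n}: K is {n * n} x {n * n}, about {gib:.3f} GiB")
```

For example, two graphs with 100 nodes each give a 10000 x 10000 matrix, which is 800,000,000 bytes under these assumptions.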
Note
Anyway, it does no harm to try graph matching first!
When to use pygmtools¶
pygmtools is recommended for the following cases, where you can benefit from the friendly API:
If you want to integrate graph matching as a step of your pipeline (either learning or non-learning).
If you want a quick benchmarking and profiling of the graph matching solvers available in pygmtools.
If you do not want to dive too deep into the algorithm details and do not need to modify the algorithm.
We offer the following guidelines for your reference:
If you want to integrate graph matching solvers into your end-to-end supervised deep learning pipeline, try neural_solvers.
If no ground truth label is available for the matching step, try classic_solvers.
If there are multiple graphs to be jointly matched, try multi_graph_solvers.
If the time and memory cost of the above methods is unacceptable for your task, try linear_solvers.
When not to use pygmtools¶
As a highly packaged toolkit, pygmtools lacks some flexibility in the implementation details, especially for experts in graph matching. If you are researching new graph matching algorithms or developing next-generation deep graph matching neural networks, pygmtools may not be suitable. We recommend ThinkMatch as the protocol for academic research.
Get Started¶
Basic Install by pip¶
You can install the stable release on PyPI:
$ pip install pygmtools
or get the latest version by running:
$ pip install -U https://github.com/Thinklab-SJTU/pygmtools/archive/master.zip # with --user for user install (no root)
pygmtools is now available with the numpy backend.

You may jump to Example: Matching Isomorphic Graphs if you do not need other backends.
The following packages are required and will be installed automatically by pip:
Python >= 3.5
requests >= 2.25.1
scipy >= 1.4.1
Pillow >= 7.2.0
numpy >= 1.18.5
easydict >= 1.7
appdirs >= 1.4.4
tqdm >= 4.64.1
Install Other Backends¶
Currently, we also support the deep learning frameworks pytorch, paddle, and jittor, which are GPU-friendly and deep learning-friendly.
Once the backend is ready, you may switch to the backend globally by the following command:
>>> import pygmtools as pygm
>>> pygm.BACKEND = 'pytorch' # replace 'pytorch' by other backend names
PyTorch Backend¶

PyTorch is an open-source machine learning framework developed and maintained by Meta Inc./Linux Foundation.
PyTorch is popular, especially among the deep learning research community.
The PyTorch backend of pygmtools is designed to support GPU devices and facilitate deep learning research.
Please follow the official PyTorch installation guide.
This package is developed with torch==1.6.0 and should work with any PyTorch version >=1.6.0.
How to enable PyTorch backend:
>>> import pygmtools as pygm
>>> import torch
>>> pygm.BACKEND = 'pytorch'
Paddle Backend¶

PaddlePaddle is an open-source deep learning platform that originated from industrial practice and is developed and maintained by Baidu Inc.
The Paddle backend of pygmtools is designed to support GPU devices and deep learning applications.
Please follow the official PaddlePaddle installation guide.
This package is developed with paddlepaddle==2.3.1 and should work with any PaddlePaddle version >=2.3.1.
How to enable Paddle backend:
>>> import pygmtools as pygm
>>> import paddle
>>> pygm.BACKEND = 'paddle'
Jittor Backend¶

Jittor is an open-source deep learning platform based on just-in-time (JIT) compilation for high performance, developed and maintained by the CSCG group at Tsinghua University.
The Jittor backend of pygmtools is designed to support GPU devices and deep learning applications.
Please follow the official Jittor installation guide.
This package is developed with jittor==1.3.4.16 and should work with any Jittor version >=1.3.4.16.
How to enable Jittor backend:
>>> import pygmtools as pygm
>>> import jittor
>>> pygm.BACKEND = 'jittor'
Example: Matching Isomorphic Graphs¶
Here we provide a basic example of matching two isomorphic graphs (i.e. two graphs have the same nodes and edges, but the node permutations are unknown).
Step 0: Import packages and set backend
>>> import numpy as np
>>> import pygmtools as pygm
>>> pygm.BACKEND = 'numpy'
>>> np.random.seed(1)
Step 1: Generate a batch of isomorphic graphs
>>> batch_size = 3
>>> X_gt = np.zeros((batch_size, 4, 4))
>>> X_gt[:, np.arange(0, 4, dtype=np.int64), np.random.permutation(4)] = 1
>>> A1 = np.random.rand(batch_size, 4, 4)
>>> A2 = np.matmul(np.matmul(X_gt.transpose((0, 2, 1)), A1), X_gt)
>>> n1 = n2 = np.repeat([4], batch_size)
Step 2: Build an affinity matrix and select an affinity function
>>> conn1, edge1, ne1 = pygm.utils.dense_to_sparse(A1)
>>> conn2, edge2, ne2 = pygm.utils.dense_to_sparse(A2)
>>> import functools
>>> gaussian_aff = functools.partial(pygm.utils.gaussian_aff_fn, sigma=1.) # set affinity function
>>> K = pygm.utils.build_aff_mat(None, edge1, conn1, None, edge2, conn2, n1, ne1, n2, ne2, edge_aff_fn=gaussian_aff)
Step 3: Solve graph matching by RRWM
>>> X = pygm.rrwm(K, n1, n2, beta=100)
>>> X = pygm.hungarian(X)
>>> X # X is the permutation matrix
[[[0. 0. 0. 1.]
  [0. 0. 1. 0.]
  [1. 0. 0. 0.]
  [0. 1. 0. 0.]]

 [[0. 0. 0. 1.]
  [0. 0. 1. 0.]
  [1. 0. 0. 0.]
  [0. 1. 0. 0.]]

 [[0. 0. 0. 1.]
  [0. 0. 1. 0.]
  [1. 0. 0. 0.]
  [0. 1. 0. 0.]]]
Final Step: Evaluate the accuracy
>>> (X * X_gt).sum() / X_gt.sum()
1.0
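The two ideas this example relies on, generating an isomorphic graph with a permutation matrix and scoring a prediction against the ground truth, can be checked without any solver. The following NumPy-only sketch is an illustration written for this guide; the helper matching_accuracy is a hypothetical name that wraps the same expression used in the final step.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Ground-truth matching: node i of graph 1 corresponds to node perm[i] of graph 2.
perm = rng.permutation(n)
X_gt = np.zeros((n, n))
X_gt[np.arange(n), perm] = 1

# Graph 2 is graph 1 with its nodes reordered, so the two graphs are isomorphic.
A1 = rng.random((n, n))
A2 = X_gt.T @ A1 @ X_gt
edge_weights_preserved = np.allclose(A2[np.ix_(perm, perm)], A1)
recovered_A1 = X_gt @ A2 @ X_gt.T  # undoing the permutation gives back A1

def matching_accuracy(X_pred, X_gt):
    """Fraction of ground-truth matches that also appear in the prediction."""
    return float((X_pred * X_gt).sum() / X_gt.sum())

# A prediction with two rows swapped gets exactly two of the four matches wrong.
X_swapped = X_gt.copy()
X_swapped[[0, 1]] = X_swapped[[1, 0]]
acc_perfect = matching_accuracy(X_gt, X_gt)
acc_swapped = matching_accuracy(X_swapped, X_gt)
```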
Graph Matching Benchmark¶
pygmtools also provides a protocol to fairly compare existing deep graph matching algorithms under different datasets and experiment settings.
The Benchmark module provides a unified data interface and an evaluation platform for different datasets.
If you are interested in the performance and the full deep learning pipeline, please refer to our ThinkMatch project.
Evaluation Metrics and Results¶
Our evaluation metrics include matching_precision (p), matching_recall (r) and f1_score (f1). Also, to measure the reliability of the evaluation result, we define coverage (cvg) for each class in the dataset as (number of evaluated pairs in the class) / (number of all possible pairs in the class). Therefore, larger coverage indicates higher reliability.
An example of an evaluation result (p==r==f1 because this evaluation does not involve partial matching/outliers):
Matching accuracy
Car: p = 0.8395±0.2280, r = 0.8395±0.2280, f1 = 0.8395±0.2280, cvg = 1.0000
Duck: p = 0.7713±0.2255, r = 0.7713±0.2255, f1 = 0.7713±0.2255, cvg = 1.0000
Face: p = 0.9656±0.0913, r = 0.9656±0.0913, f1 = 0.9656±0.0913, cvg = 0.2612
Motorbike: p = 0.8821±0.1821, r = 0.8821±0.1821, f1 = 0.8821±0.1821, cvg = 1.0000
Winebottle: p = 0.8929±0.1569, r = 0.8929±0.1569, f1 = 0.8929±0.1569, cvg = 0.9662
average accuracy: p = 0.8703±0.1767, r = 0.8703±0.1767, f1 = 0.8703±0.1767
Evaluation complete in 1m 55s
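For readers who want to see how such numbers relate to the matching matrices, the sketch below computes precision, recall, F1, and coverage for a single pair. It assumes the usual definitions (correct matches divided by predicted matches for precision, and by ground-truth matches for recall); the function names are hypothetical and this is not the Benchmark module's code.

```python
import numpy as np

def matching_metrics(X_pred, X_gt):
    """Return (precision, recall, f1) for binary matching matrices."""
    correct = float((X_pred * X_gt).sum())
    n_pred, n_gt = float(X_pred.sum()), float(X_gt.sum())
    precision = correct / n_pred if n_pred > 0 else 0.0
    recall = correct / n_gt if n_gt > 0 else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1

def coverage(num_evaluated_pairs, num_possible_pairs):
    """Share of all possible pairs in a class that were actually evaluated."""
    return num_evaluated_pairs / num_possible_pairs

# Ground truth with three matches; the prediction proposes only two, one of them wrong.
X_gt = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 1, 0]])
X_pred = np.array([[1, 0, 0, 0],
                   [0, 0, 1, 0],
                   [0, 0, 0, 0]])
p, r, f1 = matching_metrics(X_pred, X_gt)
```

When the prediction and the ground truth contain the same number of matches, as in a setting without partial matching or outliers, precision and recall share the same denominator, which is why p, r, and f1 coincide in the output above.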
Available Datasets¶
Datasets can be automatically downloaded and unzipped, but you can also download a dataset yourself and make sure it is in the right path.
PascalVOC-Keypoint Dataset¶
Download the VOC2011 dataset and make sure it looks like data/PascalVOC/TrainVal/VOCdevkit/VOC2011
Download the keypoint annotations for VOC2011 from the Berkeley server or Google Drive and make sure they look like data/PascalVOC/annotations
Download the train/test split file and make sure it looks like data/PascalVOC/voc2011_pairs.npz
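If you place the files manually, a quick check like the following can confirm the layout before running a benchmark. It is a minimal sketch using only the standard library; the helper missing_paths is hypothetical, and the paths are the ones listed above, interpreted relative to the current working directory (an assumption).

```python
from pathlib import Path

# Paths taken from the steps above, relative to the working directory.
PASCAL_VOC_PATHS = [
    "data/PascalVOC/TrainVal/VOCdevkit/VOC2011",
    "data/PascalVOC/annotations",
    "data/PascalVOC/voc2011_pairs.npz",
]

def missing_paths(paths, root="."):
    """Return the entries of `paths` that do not exist under `root`."""
    root = Path(root)
    return [p for p in paths if not (root / p).exists()]

# Report anything that still needs to be downloaded or moved.
for path in missing_paths(PASCAL_VOC_PATHS):
    print(f"missing: {path}")
```

The same helper can be reused for the other datasets by passing their expected paths.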
Please cite the following papers if you use PascalVOC-Keypoint dataset:
@article{EveringhamIJCV10,
title={The pascal visual object classes (voc) challenge},
author={Everingham, Mark and Van Gool, Luc and Williams, Christopher KI and Winn, John and Zisserman, Andrew},
journal={International Journal of Computer Vision},
volume={88},
pages={303--338},
year={2010}
}
@inproceedings{BourdevICCV09,
title={Poselets: Body part detectors trained using 3d human pose annotations},
author={Bourdev, L. and Malik, J.},
booktitle={International Conference on Computer Vision},
pages={1365--1372},
year={2009},
organization={IEEE}
}
Willow-Object-Class Dataset¶
Download the Willow-ObjectClass dataset.
Unzip the dataset and make sure it looks like data/WillowObject/WILLOW-ObjectClass
Please cite the following paper if you use Willow-Object-Class dataset:
@inproceedings{ChoICCV13,
author={Cho, Minsu and Alahari, Karteek and Ponce, Jean},
title = {Learning Graphs to Match},
booktitle = {International Conference on Computer Vision},
pages={25--32},
year={2013}
}
CUB2011 Dataset¶
Download the CUB-200-2011 dataset.
Unzip the dataset and make sure it looks like data/CUB_200_2011/CUB_200_2011
Please cite the following report if you use CUB2011 dataset:
@techreport{CUB2011,
Title = {{The Caltech-UCSD Birds-200-2011 Dataset}},
Author = {Wah, C. and Branson, S. and Welinder, P. and Perona, P. and Belongie, S.},
Year = {2011},
Institution = {California Institute of Technology},
Number = {CNS-TR-2011-001}
}
IMC-PT-SparseGM Dataset¶
Download the IMC-PT-SparseGM dataset from Google Drive or Baidu Drive (code: 0576).
Unzip the dataset and make sure it looks like data/IMC_PT_SparseGM/annotations
Please cite the following papers if you use IMC-PT-SparseGM dataset:
@article{JinIJCV21,
title={Image Matching across Wide Baselines: From Paper to Practice},
author={Jin, Yuhe and Mishkin, Dmytro and Mishchuk, Anastasiia and Matas, Jiri and Fua, Pascal and Yi, Kwang Moo and Trulls, Eduard},
journal={International Journal of Computer Vision},
pages={517--547},
year={2021}
}
API Reference¶
See the API doc of Benchmark module and the API doc of datasets for details.
File Organization¶
dataset.py: The file includes 5 dataset classes, used to automatically download the dataset and process it into a json file, and also to save the training set and the testing set.
benchmark.py: The file includes the Benchmark class, which can be used to fetch data from the json file and evaluate prediction results.
dataset_config.py: The default dataset settings, mostly dataset paths and classes.
Example¶
import pygmtools as pygm
from pygmtools.benchmark import Benchmark  # import from the package name, not the alias

# Define Benchmark on PascalVOC.
bm = Benchmark(name='PascalVOC', sets='train',
               obj_resize=(256, 256), problem='2GM',
               filter='intersection')

# Randomly fetch data and ground truth.
data_list, gt_dict, _ = bm.rand_get_data(cls=None, num=2)
API and Modules¶
pygmtools.linear_solvers | Classic (learning-free) linear assignment problem solvers.
pygmtools.classic_solvers | Classic (learning-free) two-graph matching solvers.
pygmtools.multi_graph_solvers | Classic (learning-free) multi-graph matching solvers.
pygmtools.neural_solvers | Neural network-based graph matching solvers.
pygmtools.utils | Utility functions: problem formulating, data processing, and beyond.
pygmtools.benchmark | The Benchmark module with a unified data interface to evaluate graph matching methods.
pygmtools.dataset | The implementations of data loading and data processing.
Warning
By default, the API functions and modules run on the numpy backend. You can change the default backend by setting pygm.BACKEND. If you enable a backend other than numpy, the corresponding package must be installed. See the installation guide for details.
Contributing to pygmtools¶
First, thank you for contributing to pygmtools!
How to contribute¶
The preferred workflow for contributing to pygmtools is to fork the main repository on GitHub, clone, and develop on a branch. Steps:
Fork the project repository by clicking on the ‘Fork’ button near the top right of the page. This creates a copy of the code under your GitHub user account. For more details on how to fork a repository see this guide.
Clone your fork of the repo from your GitHub account to your local disk:
$ git clone git@github.com:YourUserName/pygmtools.git
$ cd pygmtools
Create a feature branch to hold your development changes:
$ git checkout -b my-feature
Always use a feature branch. It is good practice to never work on the master branch!
Develop the feature on your feature branch. Add changed files using git add and then git commit:
$ git add modified_files
$ git commit
to record your changes in Git, then push the changes to your GitHub account with:
$ git push -u origin my-feature
Follow these instructions to create a pull request from your fork. This will email the committers and an automatic check will run.
(If any of the above seems like magic to you, please look up the Git documentation on the web, or ask a friend or another contributor for help.)
Pull Request Checklist¶
We recommend that your contribution complies with the following rules before you submit a pull request:
Follow the PEP8 Guidelines.
If your pull request addresses an issue, please use the pull request title to describe the issue and mention the issue number in the pull request description. This will make sure a link back to the original issue is created.
All public methods should have informative docstrings with sample usage presented as doctests when appropriate.
When adding additional functionality, provide at least one example script in the examples/ folder. Have a look at other examples for reference. Examples should demonstrate why the new functionality is useful in practice and, if possible, compare it to other methods available in pygmtools.
Documentation and high-coverage tests are necessary for enhancements to be accepted. Bug fixes or new features should be provided with non-regression tests. These tests verify the correct behavior of the fix or feature. In this manner, further modifications to the code base are guaranteed to be consistent with the desired behavior. For bug fixes, at the time of the PR, these tests should fail for the code base in master and pass for the PR code.
At least one paragraph of narrative documentation with links to references in the literature and the example.
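As an illustration of the docstring rule in the checklist above, a public function with a doctest-style usage example might look like the following. The function matching_accuracy is a hypothetical example written for this guide and is not part of pygmtools.

```python
def matching_accuracy(pred, gt):
    """Return the fraction of ground-truth matches recovered by ``pred``.

    Both arguments are sequences where entry ``i`` is the index of the node
    matched to node ``i``.

    >>> matching_accuracy([2, 0, 1, 3], [2, 0, 1, 3])
    1.0
    >>> matching_accuracy([2, 1, 0, 3], [2, 0, 1, 3])
    0.5
    """
    correct = sum(p == g for p, g in zip(pred, gt))
    return correct / len(gt)
```

Docstring examples like these can be run with Python's built-in doctest module, for instance with python -m doctest path/to/module.py.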
You can also check for common programming errors with the following tools:
No pyflakes warnings, check with:
$ pip install pyflakes
$ pyflakes path/to/module.py
No PEP8 warnings, check with:
$ pip install pep8
$ pep8 path/to/module.py
AutoPEP8 can help you fix some of the easy redundant errors:
$ pip install autopep8
$ autopep8 path/to/pep8.py
Filing bugs¶
We use GitHub issues to track all bugs and feature requests; feel free to open an issue if you have found a bug or wish to see a feature implemented.
It is recommended to check that your issue complies with the following rules before submitting:
Verify that your issue is not being currently addressed by other issues or pull requests.
Please ensure all code snippets and error messages are formatted in appropriate code blocks. See Creating and highlighting code blocks.
Please include your operating system type and version number, as well as your Python, pygmtools, numpy, and scipy versions. Please also provide the name of your running backend, and the GPU/CUDA versions if you are using GPU. This information can be found by running the following environment report (pygmtools>=0.2.9):
$ python3 -c 'import pygmtools; pygmtools.env_report()'
If you are using GPU, make sure to install pynvml before running the above script: pip install pynvml.
Please be specific about what estimators and/or functions are involved and the shape of the data, as appropriate; please include a reproducible code snippet or link to a gist. If an exception is raised, please provide the traceback.
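If the environment report is not available (for example, on a pygmtools version older than 0.2.9), roughly the same basic information can be gathered by hand. The helper basic_env_report below is a hypothetical stand-in written for this guide, not part of pygmtools; it relies on importlib.metadata (Python 3.8 or newer) and does not report backend or GPU/CUDA details.

```python
import platform
import sys
from importlib import metadata

def basic_env_report(packages=("pygmtools", "numpy", "scipy")):
    """Collect OS, Python, and package versions for a bug report."""
    report = {
        "OS": platform.platform(),
        "Python": sys.version.split()[0],
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report

# Print one line per item, ready to paste into an issue.
for key, value in basic_env_report().items():
    print(f"{key}: {value}")
```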
Documentation¶
We are glad to accept any sort of documentation: function docstrings, reStructuredText documents, tutorials, etc. reStructuredText documents live in the source code repository under the docs/ directory.
You can edit the documentation using any text editor and then generate the HTML output by typing make html from the docs/ directory.
The resulting HTML index is docs/_build/index.html and is viewable in a web browser.
For building the documentation, you will need sphinx, matplotlib, and pillow.
When you are writing documentation, it is important to keep a good compromise between mathematical and algorithmic details, and give intuition to the reader on what the algorithm does. It is best to always start with a small paragraph with a hand-waving explanation of what the method does to the data and a figure (coming from an example) illustrating it.
This contribution guide is strongly inspired by the one from the scikit-learn team.