TPOT vs Auto-sklearn: comparison of two AutoML libraries (~4min)

A comparison of TPOT and Auto-sklearn, two AutoML libraries. TPOT uses genetic algorithms and is fast and beginner-friendly. Auto-sklearn, based on meta-learning, offers advanced customization. Choose TPOT for simplicity, or Auto-sklearn for flexibility and in-depth customization, depending on your specific needs.

Thomas FRAMERY Data Scientist

Introduction

AutoML (automated machine learning) automates the main steps of the machine learning workflow, including feature engineering, model selection, and hyperparameter optimization. This allows developers to build machine learning models in far less time and to focus on tasks with high added value. In this article, we compare two popular AutoML libraries: TPOT and Auto-sklearn.

TPOT Overview

TPOT (Tree-based Pipeline Optimization Tool) is an open source AutoML library. TPOT uses genetic algorithms to optimize machine learning pipelines. A genetic algorithm is an optimization technique inspired by natural selection: successive generations of candidate solutions are created, and only the best individuals of each generation are kept to produce the next.
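
To make the idea concrete, here is a minimal sketch of a genetic algorithm in plain Python. It is not TPOT's implementation: the search space, the `fitness` function (a stand-in for a cross-validated score) and all parameter names are assumptions chosen for illustration.

```python
import random

# Hypothetical search space: each "individual" is one candidate configuration.
SEARCH_SPACE = {
    "max_depth": list(range(1, 21)),
    "n_estimators": list(range(10, 310, 10)),
}

def fitness(ind):
    """Stand-in for a cross-validated score (higher is better).

    A real AutoML tool would train and evaluate a pipeline here; this toy
    function simply peaks at max_depth=8 and n_estimators=150.
    """
    return -abs(ind["max_depth"] - 8) - abs(ind["n_estimators"] - 150) / 10

def random_individual(rng):
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}

def crossover(a, b, rng):
    # Each gene is inherited from one of the two parents.
    return {name: rng.choice((a[name], b[name])) for name in SEARCH_SPACE}

def mutate(ind, rng, rate=0.2):
    # With probability `rate`, resample a gene from the search space.
    return {
        name: rng.choice(SEARCH_SPACE[name]) if rng.random() < rate else value
        for name, value in ind.items()
    }

def evolve(generations=20, population_size=20, n_survivors=5, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(population_size)]
    history = []  # best fitness seen at each generation
    for _ in range(generations):
        # Selection: keep only the best individuals of this generation.
        population.sort(key=fitness, reverse=True)
        survivors = population[:n_survivors]
        history.append(fitness(survivors[0]))
        # Reproduction: refill the population with mutated offspring.
        children = [
            mutate(crossover(rng.choice(survivors), rng.choice(survivors), rng), rng)
            for _ in range(population_size - n_survivors)
        ]
        population = survivors + children
    return max(population, key=fitness), history

best, history = evolve()
print(best, history[-1])
```

Because the survivors are carried over unchanged, the best score never decreases from one generation to the next. The search is random, so two runs with different `seed` values may return different configurations, while two runs with the same seed return the same one.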

 

TPOT supports a wide variety of machine learning models, for example decision trees, neural networks, random forests and SVMs. Once the best pipeline has been found, TPOT can export the Python code needed to recreate and train it.

 

Because genetic algorithms are stochastic, the results may differ each time the search is rerun, unless a random seed is fixed.

Auto-sklearn Overview

Auto-sklearn is another open source AutoML library. It uses Bayesian optimization to select and tune machine learning models. Bayesian optimization is a strategy for finding the extrema of an objective function; it is typically used when the objective function is very expensive to evaluate.
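
The principle can be sketched in a few lines of plain Python. This is a toy illustration, not Auto-sklearn's implementation: the `objective` function, the Gaussian-process surrogate, the lower-confidence-bound rule and every parameter value are assumptions made for the example. The surrogate is cheap to query, so it is used to decide where the expensive objective should be evaluated next.

```python
import math

def objective(x):
    """Stand-in for an expensive evaluation, e.g. the validation error
    obtained after training a model with hyperparameter value x."""
    return (x - 0.3) ** 2

def rbf(a, b, length=0.2):
    # Similarity between two hyperparameter values (RBF kernel).
    return math.exp(-((a - b) ** 2) / (2 * length ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def posterior(xs, ys, x_new, jitter=1e-6):
    """Surrogate prediction (mean, standard deviation) at x_new."""
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    mean = sum(k * w for k, w in zip(k_star, solve(K, ys)))
    var = 1.0 - sum(k * v for k, v in zip(k_star, solve(K, k_star)))
    return mean, math.sqrt(max(var, 0.0))

def bayes_opt(n_iter=10, kappa=2.0):
    candidates = [i / 100 for i in range(101)]
    xs = [0.0, 1.0]                      # two initial evaluations
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        def lower_bound(x):
            mean, std = posterior(xs, ys, x)
            return mean - kappa * std    # low mean or high uncertainty
        # Query the cheap surrogate everywhere, the objective only once.
        x_next = min((c for c in candidates if c not in xs), key=lower_bound)
        xs.append(x_next)
        ys.append(objective(x_next))
    best = min(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best], len(xs)

best_x, best_y, n_evals = bayes_opt()
print(best_x, best_y, n_evals)
```

The objective is called only `n_iter + 2` times in total, which is the point of the method: each evaluation is chosen carefully because it is assumed to be costly.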

 

A second method used by this library is meta-learning. It consists of predicting how a model will perform on a given dataset, which makes it possible to skip models that are unlikely to perform well and thus save computation time.
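
One simple way to picture meta-learning is a lookup of similar past datasets. The sketch below is an assumption-laden illustration, not Auto-sklearn's code: the knowledge base, the meta-features, the configurations and the distance measure are all invented for the example.

```python
import math

# Hypothetical knowledge base: meta-features of previously seen datasets and
# the configuration that worked best on each. All values are invented.
KNOWN_DATASETS = [
    {"meta": {"n_samples": 150, "n_features": 4, "n_classes": 3},
     "best_config": {"model": "svm", "C": 1.0}},
    {"meta": {"n_samples": 70000, "n_features": 784, "n_classes": 10},
     "best_config": {"model": "neural_network", "hidden_units": 256}},
    {"meta": {"n_samples": 48000, "n_features": 14, "n_classes": 2},
     "best_config": {"model": "random_forest", "n_estimators": 200}},
]

def distance(a, b):
    # Euclidean distance on log-scaled meta-features, so that large counts
    # such as n_samples do not dominate the comparison.
    return math.sqrt(sum((math.log(a[k]) - math.log(b[k])) ** 2 for k in a))

def warm_start_configs(new_meta, k=2):
    """Return the best configurations of the k most similar known datasets."""
    ranked = sorted(KNOWN_DATASETS, key=lambda d: distance(new_meta, d["meta"]))
    return [d["best_config"] for d in ranked[:k]]

new_dataset = {"n_samples": 30000, "n_features": 20, "n_classes": 2}
suggestions = warm_start_configs(new_dataset)
print(suggestions)
```

The configurations returned for the most similar datasets are tried first, and those attached to very different datasets are left out, which is how computation time is saved.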

 

Just like TPOT, it supports a wide variety of machine learning models, such as decision trees, neural networks, random forests, and SVMs.

Comparison of TPOT and Auto-sklearn

As seen above, TPOT is based on genetic algorithms, while auto-sklearn combines Bayesian optimization with meta-learning. In terms of performance, both libraries produce comparable results, although TPOT is generally faster than auto-sklearn. When it comes to ease of use, TPOT is the simpler of the two: a complete pipeline search takes only a few lines of Python.

 

Criterion    | TPOT                                                   | Auto-sklearn
Method       | Genetic algorithms                                     | Meta-learning
Performance  | Good performance on noisy and/or non-homogeneous data  | Satisfactory in the majority of cases
Speed        | GPU for XGBoost, otherwise CPU                         | CPU only
Ease of use  | An entire pipeline can be used easily                  | Very similar to sklearn
Supported OS | Linux/Windows                                          | Linux

Conclusion

In summary, TPOT and auto-sklearn are two excellent AutoML libraries. TPOT stands out for its ease of use and speed, while auto-sklearn offers greater flexibility and customization because it is built on top of scikit-learn. Auto-sklearn leaves more freedom in the choice of algorithms, whereas TPOT is more focused on tree-based algorithms (Random Forest, Decision Tree, etc.).

 

The choice between the two depends on your specific needs. For beginners in AutoML, TPOT is highly recommended. However, if you are looking for deeper customization and flexibility, auto-sklearn can be a great option.
