I’m starting to experiment with the Yellowbrick machine learning visualizer tools to learn how to visualize models more effectively. The documentation is good, and getting started with the tools is pretty straightforward. I started to get bored with the default color palette after playing with some of the basic visualization examples in the documentation (more on that in future posts). Also, selecting, manipulating and applying color is really important for creating visualizations that convey the meaning behind the data.
Yellowbrick offers a set of pre-defined sequences that are used to build its color palettes. Yellowbrick wraps Matplotlib, so some of the sequences available to use in a color palette are based on Matplotlib’s colormaps. A colormap is a list of colors that can be mapped to pixel or data values.
Here, I’m displaying all of the pre-defined sequences that are based on Matplotlib’s RdGy colormap, which transitions from red to gray. Each sequence is a list of colors, and the sequences for a colormap are indexed by the length of that list.
At a low level, each color in those sequences is represented as a hex triplet (for example, #ffffff for white).
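To make that structure concrete, here is a minimal sketch that uses only Matplotlib rather than Yellowbrick’s own sequence tables. It samples RdGy at several lengths and stores the resulting hex triplets in a dictionary keyed by list length. The names `sample_hex` and `sequences_by_length` are hypothetical, and the sampled values are a stand-in that will not necessarily match Yellowbrick’s pre-defined colors.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

# Look up Matplotlib's red-to-gray diverging colormap by name.
cmap = plt.get_cmap("RdGy")


def sample_hex(n):
    """Return n hex triplets sampled evenly across the colormap."""
    # Positions run from 0.0 (red end) to 1.0 (gray end).
    return [to_hex(cmap(i / (n - 1))) for i in range(n)]


# Hypothetical stand-in for a sequence table: color lists keyed by length.
sequences_by_length = {n: sample_hex(n) for n in range(3, 12)}

for n, colors in sequences_by_length.items():
    print(n, colors)
```

Each printed row is one sequence: the key is the number of colors, and the value is the list of hex triplets of that length.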
Matplotlib’s pylab can be used to create longer color lists from a sequence of numbers, and an easy way to generate that sequence is with numpy’s linspace function. linspace returns evenly spaced numbers over an interval, and here max_colors sets how many of them are generated. Making max_colors very large produces a smoother color palette, because neighboring colors sit closer together on the colormap. The idea for this code came from Matplotlib’s colormap reference, which also has the full list of Matplotlib’s colormaps. If the default color sequences don’t map well to the data in a Yellowbrick model, this could be a good technique to solve the problem.
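The linspace approach can be sketched roughly as follows. This version assumes the colormap is sampled over the interval 0 to 1 and uses `matplotlib.pyplot` instead of pylab. The value of `max_colors` and the variable names are illustrative choices, not taken from the original code.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

max_colors = 20  # illustrative value; a larger number gives finer color steps

cmap = plt.get_cmap("RdGy")

# max_colors evenly spaced positions between 0 and 1 (both ends included).
positions = np.linspace(0, 1, max_colors)

# Calling a colormap on an array returns one RGBA row per position.
rgba_colors = cmap(positions)

# Convert each RGBA row to a hex triplet, dropping the alpha channel.
hex_colors = [to_hex(color) for color in rgba_colors]

print(len(hex_colors))
print(hex_colors[:3])
```

A list like `hex_colors` can then be handed to any plotting call that accepts a list of colors.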