Customize Yellowbrick color palettes

Published January 16, 2019 in data - 0 Comments

I’m starting to experiment with the Yellowbrick machine learning visualizer tools to learn how to visualize models more effectively. The documentation is good, and getting started with the tools is pretty straightforward. I started to get bored with the default color palette after playing with some of the basic visualization examples in the documentation (more […]

New EC2 launch template version

Published January 8, 2019 in devops - 0 Comments

The AWS web console is not always intuitive to navigate, and the documentation can be opaque. I recently had to edit an EC2 launch template, which is only possible by creating a new version and then making the new version the default used by the Auto Scaling group. Here are the steps to create a new […]

Tags: aws

Python comprehension fun!

Published November 15, 2018 in programming - 0 Comments

List comprehensions are a useful, optimized, and idiomatic Python feature for manipulating and returning data stored in lists (or any iterable type). The Python 3 docs describe how to use basic and nested list comprehensions. Advanced nested list comprehensions: If I wanted to clean some text by removing stop words and generate lists […]
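As a rough illustration of the stop-word cleanup mentioned in the excerpt above, here is a minimal sketch of a nested list comprehension. The stop-word set and sample sentences are made up for this example and are not taken from the post.

```python
# Hypothetical sample data; neither the stop words nor the sentences
# come from the original post.
STOP_WORDS = {"the", "a", "is", "of"}
sentences = ["The cat is a friend of the dog", "A bird sings"]

# Nested comprehension: the outer loop walks the sentences, the inner
# loop walks the lowercased words and filters out the stop words.
cleaned = [
    [word for word in sentence.lower().split() if word not in STOP_WORDS]
    for sentence in sentences
]

# Flattening variant: two "for" clauses in one comprehension, read left
# to right in the same order as the equivalent nested loops.
flat = [word for words in cleaned for word in words]

print(cleaned)
print(flat)
```

The inner comprehension builds one list per sentence, so the result keeps the sentence grouping; the flattening variant is useful when only a single word list is needed.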

Tags: python

Robust automatic retries in Celery tasks

Published October 22, 2018 in programming - 0 Comments

The Celery distributed task queue introduced automatic retries of failed tasks for known exception types in version 4.0, and some very useful exponential backoff settings for retries in version 4.2. Exponential backoff is beneficial because it spaces out retry requests at exponentially increasing intervals, which can give a failing dependency time to recover or restart. I turned off jitter […]
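To make the idea of exponentially increasing intervals concrete, here is a small pure-Python sketch of how such delays can be computed. It is an illustration only, not Celery's own code: the function name `backoff_delay` and its default values are assumptions chosen for this example.

```python
import random


def backoff_delay(retries, factor=1, maximum=600, jitter=False):
    """Return the seconds to wait before retry number `retries` (0-based).

    Hypothetical helper for illustration; the name and defaults are
    assumptions, not part of the Celery API.
    """
    # The delay doubles with every retry and is capped at `maximum`.
    delay = min(maximum, factor * (2 ** retries))
    if jitter:
        # Full jitter: pick a random whole number of seconds in [0, delay]
        # so that many failing tasks do not all retry at the same moment.
        delay = random.randrange(delay + 1)
    return delay


# With jitter turned off the schedule is deterministic.
delays = [backoff_delay(n) for n in range(5)]
print(delays)
```

Turning jitter off, as the excerpt mentions, makes the retry schedule predictable, which is convenient when testing; leaving it on spreads retries out when many tasks fail together.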

Tags: celery, docker, python

Attach to existing SageMaker job

Published July 30, 2018 in data - 0 Comments

The AWS SageMaker ntm_20newsgroups_topic_model example notebook is an easy-to-follow introduction to SageMaker’s pre-packaged Natural Language Processing (NLP) tools. The notebook demonstrates how to use the Neural Topic Model (NTM) algorithm to extract a set of topics from a sample Usenet newsgroups dataset and visualize them as word clouds. It also contains code demonstrating how […]

How to split a Pandas dataframe into training and test sets?

Published June 18, 2018 in data - 0 Comments

This question came up recently on a project where Pandas data needed to be fed to a TensorFlow classifier. In this case, we wanted to divide the dataframe using random sampling. Frameworks like scikit-learn may have utilities to split data sets into training, test and cross-validation sets. For example, sklearn.model_selection.train_test_split splits NumPy arrays or […]
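One way to do a random split with Pandas alone is sketched below. It is a minimal example under stated assumptions, not necessarily the approach the post settles on: the toy dataframe, the 80/20 ratio, and the seed are all made up for illustration.

```python
import pandas as pd

# Hypothetical toy data; the column names are assumptions for this example.
df = pd.DataFrame({"feature": range(10), "label": [0, 1] * 5})

# Randomly sample 80% of the rows for training; a fixed seed keeps the
# split reproducible between runs.
train = df.sample(frac=0.8, random_state=42)

# Everything that was not sampled becomes the test set.
test = df.drop(train.index)

print(len(train), len(test))
```

Dropping by `train.index` relies on the index labels being unique; with duplicate labels, resetting the index first would be the safer choice.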

Docker Compose for integration testing

Published June 17, 2018 in devops - 0 Comments

For integration tests with few external dependencies that don’t require much orchestration beyond networking Docker containers and setting up environment variables, Docker Compose is a simple, easy-to-manage solution for building, running and tearing down tests. This Flask application example is typical. The project’s multi-stage Dockerfile defines both the service and test images: […]

Tags: ci, docker

Why do I use containers?

Published May 2, 2018 in devops - 0 Comments

Here are my best reasons for using container solutions like Docker: 1. Simplify the deployment process! I used to struggle with software deployments. Traditional software package installers work fine for simple deployments on a single platform type, but require maintenance if deployment guidelines change between OS versions. Deployment tools that need to target multiple platforms […]

Tags: docker, python

Pandas DataFrame axis basics (Part 2)

Published April 13, 2018 in data - 0 Comments

Part 1 covered Pandas DataFrame basics. Pandas offers multiple options for accessing DataFrame values by axis labels using the DataFrame.loc indexer, or by integer indexes using the DataFrame.iloc indexer, in one or two dimensions. If the DataFrame has a numerical index, calls to DataFrame.loc and DataFrame.iloc look the same. Otherwise, use the appropriate axis […]
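The difference between label-based and position-based access is easiest to see on a DataFrame with a non-numerical index. The following is a small sketch; the fruit data is invented for this example.

```python
import pandas as pd

# Hypothetical data with string row labels, so that labels and integer
# positions are clearly different things.
df = pd.DataFrame(
    {"price": [1.5, 2.0, 0.5], "qty": [10, 20, 30]},
    index=["apple", "pear", "plum"],
)

# Two-dimensional access: row first, then column.
by_label = df.loc["pear", "qty"]   # label-based lookup
by_position = df.iloc[1, 1]        # position-based lookup of the same cell

# One-dimensional (row) slices: label slices include the end label,
# while positional slices exclude the stop position.
rows_by_label = df.loc["apple":"pear"]
rows_by_position = df.iloc[0:2]

print(by_label, by_position)
print(rows_by_label.equals(rows_by_position))
```

Note the slicing asymmetry: `df.loc["apple":"pear"]` includes the "pear" row, whereas `df.iloc[0:2]` stops before position 2, so both select the same two rows here.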

Tags: pandas

Pandas DataFrame axis basics (Part 1)

Published April 2, 2018 in data - 0 Comments

By default, a Pandas DataFrame is two-dimensional, with two axes initialized as empty Index structures. Under the basic indexing scheme, the first axis is the ‘index’ axis, which by default is a numerical index starting from 0 (using np.arange) generated for each DataFrame row. The second axis is the ‘columns’ axis, which is the […]
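The two default axes can be inspected directly. This is a minimal sketch using made-up values; it only checks the labels Pandas generates when none are supplied.

```python
import pandas as pd

# An empty DataFrame still has two axes, both of them empty.
empty = pd.DataFrame()
print(len(empty.index), len(empty.columns))

# Hypothetical 3x2 data with no labels supplied, so both axes get
# default numerical labels starting from 0.
df = pd.DataFrame([[1, 2], [3, 4], [5, 6]])

row_labels = list(df.index)      # labels on the 'index' axis (axis 0)
col_labels = list(df.columns)    # labels on the 'columns' axis (axis 1)

print(df.ndim, row_labels, col_labels)
```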

Tags: pandas