The Celery distributed task queue added automatic retries of failed tasks for known exception types in version 4.0, and some very useful exponential backoff settings for retries in version 4.2. Exponential backoff is beneficial because it spaces out retries at exponentially increasing intervals, which can give a failing service time to recover or restart. I turned off jitter […]
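For illustration, here is a minimal pure-Python sketch of the delay schedule that exponential backoff produces. It assumes the behaviour described in the Celery documentation for `retry_backoff`: with a factor of 1, retries wait 1, 2, 4, ... seconds, capped by `retry_backoff_max` (600 seconds by default). The helper `backoff_delays` is hypothetical and is not part of Celery. In Celery itself these settings are task options, e.g. `@app.task(autoretry_for=(ConnectionError,), retry_backoff=True, retry_jitter=False)`, where `app` is your Celery application.

```python
import random
from typing import List


def backoff_delays(factor: int, retries: int, maximum: int,
                   full_jitter: bool = False) -> List[int]:
    """Return the countdown (in seconds) before each of `retries` retries.

    Hypothetical helper: mirrors the documented retry_backoff behaviour,
    it is not Celery's own source code.
    """
    delays = []
    for attempt in range(retries):
        # Exponential growth: factor * 2**attempt, capped at `maximum`
        countdown = min(maximum, factor * (2 ** attempt))
        if full_jitter:
            # Full jitter: pick a random delay between 0 and the capped value
            countdown = random.randrange(countdown + 1)
        delays.append(countdown)
    return delays


# Jitter off: a deterministic schedule
print(backoff_delays(factor=1, retries=5, maximum=600))  # [1, 2, 4, 8, 16]
```

Turning jitter on trades this predictable schedule for randomized delays, which helps keep many tasks that failed at the same moment from all retrying in lockstep.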
The AWS SageMaker ntm_20newsgroups_topic_model example notebook is an easy-to-follow introduction to SageMaker’s pre-packaged Natural Language Processing (NLP) tools. The notebook demonstrates how to use the Neural Topic Model (NTM) algorithm to extract a set of topics from a sample Usenet newsgroups dataset and visualize them as word clouds. It also contains code demonstrating how […]
This question came up recently on a project where Pandas data needed to be fed to a TensorFlow classifier. In this case, we wanted to split the dataframe by random sampling. Frameworks like scikit-learn provide utilities to split data sets into training, test and cross-validation sets. For example, sklearn.model_selection.train_test_split splits NumPy arrays or […]
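A minimal sketch of one way to do this with pandas alone, assuming the dataframe has a unique index: `DataFrame.sample` draws a random fraction of the rows, and `DataFrame.drop` on the sampled index leaves the rest. The function name `random_split` and the 60/20/20 proportions are illustrative choices, not taken from the original post.

```python
from typing import Tuple

import pandas as pd


def random_split(df: pd.DataFrame, train_frac: float = 0.6,
                 validation_frac: float = 0.2,
                 seed: int = 0) -> Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]:
    """Randomly split df into training, validation and test sets.

    Fractions are relative to the whole dataframe; the test set gets
    whatever is left over. Assumes df has a unique index.
    """
    train = df.sample(frac=train_frac, random_state=seed)
    remaining = df.drop(train.index)
    # Rescale the validation fraction so it is relative to the remaining rows
    relative_frac = validation_frac / (1.0 - train_frac)
    validation = remaining.sample(frac=relative_frac, random_state=seed)
    test = remaining.drop(validation.index)
    return train, validation, test


frame = pd.DataFrame({"feature": range(100), "label": [i % 2 for i in range(100)]})
train, validation, test = random_split(frame)
print(len(train), len(validation), len(test))  # 60 20 20
```

scikit-learn's `train_test_split` remains the more featureful option (for example, its `stratify` parameter preserves class proportions); the pandas version simply avoids converting to NumPy arrays and keeps the original index.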
Here are my best reasons for using container solutions like Docker: 1. Simplify the deployment process! I used to struggle with software deployments. Traditional software package installers work fine for simple deployments on a single platform type, but require maintenance if deployment guidelines change between OS versions. Deployment tools that need to target multiple platforms […]
All Python code is Python 3.5+. Building GitHub pull requests automatically before merging saves a lot of time and trouble compared with pulling, building and testing each pull request locally. TeamCity makes this easy to set up using branch specifications. The blog post refers to a much older version […]
All Python code is Python 3.5+. PEP 484 goes beyond built-in type annotations. Another feature of the Python type hinting library is the ability to create type aliases. I’ve frequently used type aliasing in C++ (typedef, using) to improve code readability and for its other benefits. I’m happy to see that it’s available in Python too. […]
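A small example of PEP 484 type aliases, roughly the Python counterpart of a C++ `using` alias. The alias names `Vector` and `WordCounts` and both functions are made up for illustration. A type comment is used instead of a variable annotation so the code stays valid on Python 3.5, since variable annotations (PEP 526) arrived in Python 3.6.

```python
from typing import Dict, List

# Type aliases are ordinary assignments of a type to a name
Vector = List[float]
WordCounts = Dict[str, int]


def scale(scalar: float, vector: Vector) -> Vector:
    """Multiply every element of the vector by scalar."""
    return [scalar * x for x in vector]


def count_words(text: str) -> WordCounts:
    """Count whitespace-separated words in text."""
    counts = {}  # type: WordCounts
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return counts


print(scale(2.0, [1.0, 2.5]))  # [2.0, 5.0]
print(count_words("to be or not to be"))
```

Because an alias is just another name for the same type, a type checker treats `Vector` and `List[float]` as interchangeable; `typing.NewType` is the tool to reach for when a genuinely distinct type is wanted.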
All Python code in this post is Python 3.5+. In my previous post, I described how I got usable Pandas dataframes from the Kaggle movies dataset. My next step was to start exploring the data with simple visualizations. The first feature I wanted to explore was the distribution of movies by year in the movies_metadata data […]
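A minimal sketch of how a movies-by-year distribution could be computed with pandas. It assumes the release dates live in a `release_date` column of ISO-formatted date strings; that column name is an assumption about movies_metadata.csv, and the helper `movies_per_year` and the sample rows are hypothetical, not the code or data from the post. Unparseable or missing dates are coerced to `NaT` and dropped.

```python
import pandas as pd


def movies_per_year(df: pd.DataFrame, date_column: str = "release_date") -> pd.Series:
    """Count movies per release year, sorted by year."""
    # errors="coerce" turns unparseable dates into NaT instead of raising
    dates = pd.to_datetime(df[date_column], errors="coerce")
    # Drop the NaT rows, then use plain integer years as the grouping key
    years = dates.dt.year.dropna().astype(int)
    return years.value_counts().sort_index()


movies = pd.DataFrame({
    "title": ["Toy Story", "Heat", "Memento", "Bad Row", "No Date"],
    "release_date": ["1995-11-22", "1995-12-15", "2000-09-05", "not a date", None],
})
counts = movies_per_year(movies)
print(counts.index.tolist(), counts.tolist())  # [1995, 2000] [2, 1]
```

With matplotlib installed, `movies_per_year(df).plot(kind="bar")` draws the resulting counts as a simple bar chart.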
All Python code in this post is Python 3.5+. This post describes how I parsed movies_metadata.csv from the Kaggle movies dataset, a task I started in Part 1 and Part 2. After some digging into the Pandas documentation and Stack Overflow, I found that the best solution to my parsing problems was to explicitly set […]
All Python code is Python 3.5+. In the Pandas data import posts, I’m using Python type hints. Type hinting is a fairly new feature in Python and has been provisionally accepted as a language feature. It is also a thought-provoking design feature for a dynamically typed language. Types are not enforced at runtime. The type […]
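A short demonstration that type hints are not enforced at runtime. `add_years` is a made-up function; the call with strings runs without complaint because annotations are only recorded as metadata on the function. A static checker such as mypy would report the incompatible argument types instead.

```python
from typing import get_type_hints


def add_years(start: int, years: int) -> int:
    """Annotated as int -> int, but nothing checks that at runtime."""
    return start + years


# The hints are recorded on the function object...
print(get_type_hints(add_years))

# ...but the interpreter does not enforce them: str + str just concatenates
print(add_years("19", "95"))  # prints 1995 (a str, not an int)
```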
All Python code in this post is Python 3.5+. Continuing from Part 1, I discovered that movies_metadata.csv contains malformed rows with missing fields, which is what caused the file import to fail. I tried experimenting with some of the more advanced pandas.read_csv parameters to see if I could work around the malformed rows. def main(path: […]
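The original code is truncated, so here is an independent sketch, not the author's solution, of how `read_csv` treats two kinds of malformed rows, using a small made-up CSV. Rows with too few fields are padded with `NaN`, while rows with too many fields raise a `ParserError` by default; `on_bad_lines="skip"` drops those rows instead. Note that `on_bad_lines` requires pandas 1.3 or later (earlier versions used `error_bad_lines=False`), and pandas 1.3 itself requires Python 3.7 or later.

```python
import io

import pandas as pd

# Made-up sample: "Jumanji" has an extra field, "Casino" is missing one
CSV_TEXT = (
    "id,title,year\n"
    "1,Toy Story,1995\n"
    "2,Jumanji,1995,extra\n"
    "3,Heat,1995\n"
    "4,Casino\n"
)

# Default behaviour: the row with too many fields aborts the import
try:
    pd.read_csv(io.StringIO(CSV_TEXT))
except pd.errors.ParserError as exc:
    print("ParserError:", exc)

# Skip over-long rows; short rows are kept and padded with NaN
movies = pd.read_csv(io.StringIO(CSV_TEXT), on_bad_lines="skip")
print(movies)
```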