Tag Archives for "python"

Attach to existing SageMaker job

Published July 30, 2018 in data - 0 Comments

The AWS SageMaker ntm_20newsgroups_topic_model example notebook is a simple-to-follow introduction to SageMaker's pre-packaged Natural Language Processing (NLP) tools. The notebook demonstrates how to use the Neural Topic Model (NTM) algorithm to extract a set of topics from a sample Usenet newsgroups dataset and visualize them as word clouds. It also contains code demonstrating how […]

How to split a Pandas dataframe into training and test sets?

Published June 18, 2018 in data - 0 Comments

This question came up recently on a project where pandas data needed to be fed to a TensorFlow classifier. In this case, we wanted to divide the dataframe using random sampling. Frameworks like scikit-learn provide utilities to split data sets into training, test and cross-validation sets. For example, sklearn.model_selection.train_test_split splits NumPy arrays or […]
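A minimal sketch of the random-sampling split described above, using pandas' own `DataFrame.sample` instead of scikit-learn. The function name `split_dataframe`, the 80/20 ratio and the sample frame are illustrative assumptions, and the approach assumes the DataFrame index has unique labels, because the test set is built by dropping the sampled labels.

```python
import pandas as pd


def split_dataframe(df, train_frac=0.8, seed=None):
    """Randomly split df into (train, test) DataFrames.

    Rows are sampled without replacement for the training set; every row that
    was not sampled goes to the test set. Assumes df.index has unique labels.
    """
    if not 0.0 < train_frac < 1.0:
        raise ValueError("train_frac must be strictly between 0 and 1")
    train = df.sample(frac=train_frac, random_state=seed)
    test = df.drop(train.index)  # everything not picked for training
    return train, test


# Hypothetical example data: ten rows with one feature and one label column.
df = pd.DataFrame({"feature": range(10), "label": [0, 1] * 5})
train, test = split_dataframe(df, train_frac=0.8, seed=0)
print(len(train), len(test))  # 8 2
```

Passing an integer seed makes the split reproducible. For stratified splits, scikit-learn's `train_test_split` with its `stratify` parameter is usually the more convenient tool.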

Why do I use containers?

Published May 2, 2018 in devops - 0 Comments

Here are my best reasons for using container solutions like Docker: 1. Simplify the deployment process! I used to struggle with software deployments. Traditional software package installers work fine for simple deployments on a single platform type, but they require maintenance when deployment guidelines change between OS versions. Deployment tools that need to target multiple platforms […]

Tags: docker, python

GitHub GraphQL in CI

Published March 9, 2018 in devops - 0 Comments

All Python code in this post is Python 3.5+. Having an automatic way to build GitHub pull requests before merging saves a lot of time and trouble compared with pulling, building and testing each pull request locally. TeamCity makes it easy to set this up using branch specifications. The blog post refers to a much older version […]
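The excerpt does not show the post's actual query, so the following is only a sketch of how a CI script might ask GitHub's GraphQL API (`https://api.github.com/graphql`) for a repository's open pull requests. It uses only the standard library and avoids f-strings so it stays Python 3.5 compatible. The query shape, the 50-result limit and the helper names (`build_request`, `open_pull_requests`, `fetch_open_pull_requests`) are assumptions for illustration; a real script also needs a GitHub access token.

```python
import json
import urllib.request
from typing import Any, Dict, List

GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"

# Ask for the number and head branch of up to 50 open pull requests.
OPEN_PRS_QUERY = """
query($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    pullRequests(states: OPEN, first: 50) {
      nodes { number headRefName }
    }
  }
}
"""


def build_request(owner: str, name: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the GraphQL query."""
    payload = {"query": OPEN_PRS_QUERY, "variables": {"owner": owner, "name": name}}
    return urllib.request.Request(
        GITHUB_GRAPHQL_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": "bearer " + token,
                 "Content-Type": "application/json"},
        method="POST",
    )


def open_pull_requests(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Extract the pull request nodes from a decoded GraphQL response."""
    if "errors" in response:
        raise RuntimeError("GraphQL errors: {}".format(response["errors"]))
    return response["data"]["repository"]["pullRequests"]["nodes"]


def fetch_open_pull_requests(owner: str, name: str, token: str) -> List[Dict[str, Any]]:
    """Send the query and return the open pull requests (needs network access)."""
    with urllib.request.urlopen(build_request(owner, name, token)) as resp:
        return open_pull_requests(json.loads(resp.read().decode("utf-8")))
```

Keeping request building and response parsing separate from the network call makes both easy to test offline. GitHub also exposes each pull request head as a ref of the form `refs/pull/<number>/head`, which is the kind of name a CI server's branch specification would match.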

Tags: ci, git, graphql, python

Python type hints: alias those types!

Published February 21, 2018 in programming - 0 Comments

All Python code in this post is Python 3.5+. PEP 484 goes beyond built-in type annotations. Another feature of the Python type hinting library (the typing module) is the ability to create type aliases. I've used type aliasing frequently in C++ (typedef, using) to improve code readability and for its other benefits. I'm happy to see that it's available in Python too. […]
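A short illustration of PEP 484 type aliases: an alias is just an assignment of a type to a name, much like a C++ `using` declaration. The names `Vector`, `Point` and `Polygon` are made up for this example, and the code sticks to the `typing` generics so it runs on Python 3.5.

```python
import math
from typing import List, Tuple

# Type aliases: assigning a type to a name makes the name usable as a hint.
Vector = List[float]
Point = Tuple[float, float]
Polygon = List[Point]  # aliases can be built from other aliases


def scale(scalar: float, vector: Vector) -> Vector:
    """Multiply every component of the vector by scalar."""
    return [scalar * x for x in vector]


def perimeter(polygon: Polygon) -> float:
    """Sum the edge lengths of a closed polygon given as a list of vertices."""
    total = 0.0
    for i, (x1, y1) in enumerate(polygon):
        x2, y2 = polygon[(i + 1) % len(polygon)]  # wrap around to close the shape
        total += math.hypot(x2 - x1, y2 - y1)
    return total


print(scale(2.0, [1.0, -0.5]))                          # [2.0, -1.0]
print(perimeter([(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)]))  # 12.0
```

A type checker such as mypy treats `Vector` and `List[float]` as interchangeable: the alias adds a readable name, not a new type. When a distinct type is wanted, `typing.NewType` is the tool for that.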

Tags: python

Importing Stringified JSON Objects Into Pandas (Part 2)

Published November 30, 2017 in data, programming - 0 Comments

All Python code in this post is Python 3.5+. Continuing from Part 1, I discovered that movies_metadata.csv contains malformed rows with missing fields, which is what caused the file import to fail. I experimented with some of the more advanced pandas.read_csv parameters to see if I could work around the malformed rows.

[…]
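The excerpt stops before saying which `read_csv` parameters were tried, so this is only a hedged sketch of two common ways to cope with malformed rows. It assumes pandas 1.3 or later, where `on_bad_lines="skip"` drops lines with too many fields instead of raising a `ParserError` (older releases used `error_bad_lines=False`). Rows with too few fields are normally padded with NaN rather than rejected, so the hypothetical `find_malformed_rows` helper uses the standard `csv` module to list every line whose field count differs from the header's, letting you inspect those rows before deciding what to drop. The CSV text is made-up sample data.

```python
import csv
import io

import pandas as pd

# Hypothetical sample: line 4 has an extra field, line 5 is missing one.
RAW = (
    "id,title,budget\n"
    "1,Alpha,100\n"
    "2,Beta,200\n"
    "3,Gamma,300,oops\n"
    "4,Delta\n"
    "5,Epsilon,500\n"
)


def find_malformed_rows(text, expected_fields):
    """Return the line numbers of rows whose field count is not expected_fields."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return [reader.line_num for row in reader if len(row) != expected_fields]


print(find_malformed_rows(RAW, 3))  # [4, 5]

# Skip lines with too many fields instead of raising ParserError (pandas >= 1.3).
df = pd.read_csv(io.StringIO(RAW), on_bad_lines="skip")
print(df["id"].tolist())  # [1, 2, 4, 5]; "4,Delta" is kept with a NaN budget
```

Skipping silently loses data, so listing the offending lines first is a reasonable way to judge whether they can be repaired instead.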

Importing Stringified JSON Objects Into Pandas (Part 1)

Published November 24, 2017 in data, programming - 0 Comments

All Python code in this post is Python 3.5+. I'm continuing to work with the same Kaggle movies dataset as in the SQL import experiment. This time, I imported the data into pandas DataFrames. The trickiest file to import was movies_metadata.csv. I first tried to use pandas.read_csv with the default settings.

I was able […]
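The excerpt is cut off before showing how the stringified objects were handled, so this is a sketch under assumptions, not the post's method. It converts a column of stringified JSON into Python lists with `json.loads`, falling back to `ast.literal_eval` for Python-style literals that use single quotes (which `json.loads` rejects). The helper name `parse_stringified`, the column names and the values are hypothetical sample data, not rows from movies_metadata.csv.

```python
import ast
import json

import pandas as pd


def parse_stringified(value):
    """Turn a stringified JSON (or Python-literal) object into Python data."""
    if not isinstance(value, str):
        return value  # leave NaN and already-parsed values untouched
    try:
        return json.loads(value)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        # Fall back for Python-style literals such as single-quoted strings.
        return ast.literal_eval(value)


# Hypothetical sample data: one valid JSON string, one Python-style literal.
df = pd.DataFrame({
    "title": ["Alpha", "Beta"],
    "genres": ['[{"id": 1, "name": "Drama"}]', "[{'id': 2, 'name': 'Comedy'}]"],
})
df["genres"] = df["genres"].apply(parse_stringified)
df["genre_names"] = df["genres"].apply(lambda items: [g["name"] for g in items])
print(df["genre_names"].tolist())  # [['Drama'], ['Comedy']]
```

`ast.literal_eval` only evaluates literals, so unlike `eval` it will not run arbitrary code found in the file; strings that are neither valid JSON nor valid literals still raise `ValueError` or `SyntaxError`.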