Book Review: The Pragmatic Programmer: your journey to mastery, 20th Anniversary Edition, 2nd edition

Published January 31, 2021 in review - 0 Comments

Disclosure: I was not compensated for writing this post nor was my review solicited. All opinions are 100% my own. Quick Take: mostly worth your time and can be a good review for professional programmers. This book may offer the most value for people entering the field. The Pragmatic Programmer book was recommended by a […]

Tags: review

Python text analysis tools: FuzzyWuzzy’s basic string matching

Published March 29, 2020 in data - 0 Comments

The Python FuzzyWuzzy module uses Levenshtein edit distance to implement fuzzy string matching. FuzzyWuzzy’s matching tools return results on a scale from 0 to 100. The simplest matching tool FuzzyWuzzy offers is the ratio(..) function: The basic ratio function works well for simple string matching. However, if you’re trying to fuzzy match a single word […]

Ijson coroutines and generators

Published February 27, 2020 in data - 0 Comments

Reader comments on an old post about the ijson parser prompted me to check out the project’s more recent releases. The latest pre-release (v3.0rc1) added a coroutine interface, which allows users to supply their own file readers and have more control over when the parser is called. It looked like a fun feature to explore, […]

Python text analysis tools: Levenshtein Distance

Published January 31, 2020 in data - 0 Comments

Figuring out how similar two strings are and then making that similarity a quantitative measurement is a basic problem in text analysis, text mining and natural language processing. There are a number of efficient methods to solve this problem. This survey looks at Python implementations of a simple but widely used method: Levenshtein distance as […]
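
To make the measurement concrete, here is a minimal pure-Python sketch of Levenshtein distance using the standard two-row dynamic programming approach. The function name `levenshtein` is my own choice for illustration and is not taken from any of the libraries the post surveys.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits that turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory

    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Keeping only two rows keeps memory proportional to the shorter string, which is usually enough when only the distance is needed rather than the edit script.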

A closer look at Airflow’s KubernetesPodOperator and XCom

Published July 11, 2019 in data - 8 Comments

The KubernetesPodOperator handles communicating XCom values differently than other operators. The basics are described in the operator documentation under the xcom_push parameter. I’ve written up a more detailed example that expands on that documentation. An Airflow task instance described by the KubernetesPodOperator can write a dict to the file /airflow/xcom/return.json (always the same file) that […]
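
As a rough sketch of the container side, the code running inside the pod could write its result like this. The helper name `write_xcom` and its `path` parameter are hypothetical, added here for illustration only. The sole detail taken from the post is the file location /airflow/xcom/return.json.

```python
import json
from pathlib import Path

# Location named in the post; everything else in this sketch is an assumption.
XCOM_RETURN_PATH = Path("/airflow/xcom/return.json")


def write_xcom(values: dict, path: Path = XCOM_RETURN_PATH) -> Path:
    """Serialize a dict as JSON to the XCom return file (hypothetical helper)."""
    path.parent.mkdir(parents=True, exist_ok=True)  # create the directory if missing
    path.write_text(json.dumps(values))
    return path
```

Inside the task's container, a call such as `write_xcom({"rows_loaded": 42})` would be the last step before the process exits. The `path` argument exists only so the helper can be exercised outside a pod.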

Trigger DAG runs with Airflow REST API

Published June 24, 2019 in data - 0 Comments

This article and its code apply to Airflow 1.10.13. Hopefully the REST API will mature as Airflow is developed further, and the authentication methods will become easier to use. The experimental REST API does not use the Airflow role-based users. Instead, it currently requires a SQLAlchemy models.User object whose data is saved in the database. The code […]

Tags: airflow , python

Useful Airflow on Kubernetes Features

Published June 7, 2019 in data , devops - 0 Comments

KubernetesExecutor The KubernetesExecutor sets up Airflow to run on a Kubernetes cluster. This executor runs task instances in pods created from the same Airflow Docker image used by the KubernetesExecutor itself, unless configured otherwise (more on that at the end). Getting Airflow deployed with the KubernetesExecutor to a cluster is not a trivial task. I […]

SageMaker lifecycle configuration time constraints

Published May 17, 2019 in data - 0 Comments

Creating AWS SageMaker Lifecycle configuration scripts to customize notebook instances beats installing packages and making other environment changes in notebook instances. One advantage is that the customization code doesn’t need to be copied from notebook to notebook. Another is that the lifecycle configurations are managed outside of and separately from notebook instances, and can be […]

Tags: aws , python