Tag Archives for "python"

TIL: Python recursion limits

Published September 27, 2022 in today I learned - 0 Comments

Today I learned that you can get (sys.getrecursionlimit()) and set (sys.setrecursionlimit(limit)) the Python interpreter’s maximum stack depth. I can see that on my system, the maximum stack depth or recursion limit is 3000. Exceeding the limit results in a RecursionError. This came up in a Prefect flow using a LocalDaskExecutor that attempted to serialize an […]

Tags: python
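A minimal sketch of the two calls named in the excerpt above. The `depth` helper and the `+ 100` / `+ 1000` margins are illustrative assumptions, not taken from the post, and the printed limit will vary by system.

```python
import sys


def depth(n):
    """Recurse n times and return the depth that was reached."""
    return 0 if n == 0 else 1 + depth(n - 1)


original_limit = sys.getrecursionlimit()
print(f"current recursion limit: {original_limit}")

# Recursing past the current limit raises RecursionError.
try:
    depth(original_limit + 100)
    hit_limit = False
except RecursionError:
    hit_limit = True
print(f"RecursionError raised: {hit_limit}")

# Raise the limit temporarily, then restore it so other code is unaffected.
sys.setrecursionlimit(original_limit + 1000)
try:
    raised_depth = depth(original_limit + 100)
finally:
    sys.setrecursionlimit(original_limit)
print(f"depth reached with the raised limit: {raised_depth}")
```

Restoring the original value in a `finally` block keeps the change local. A very large limit can still crash the interpreter if the underlying C stack runs out, so a modest increase is the safer choice.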

TIL: Python logger extra keyword

Published April 21, 2022 in today I learned - 0 Comments

Today I learned that any Python stdlib logger method that logs a message accepts additional fields as a dict through the extra keyword argument. The extra fields may be used in a formatter or with other logging tools like structlog. Output is: 2022-04-21 00:47:31,691 Critical error message! | Linux #29~20.04.1-Ubuntu SMP Fri […]

Tags: python
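A rough sketch of how the `extra` keyword can feed a formatter. The field name `host_info`, the logger name, and the format string are assumptions chosen for illustration; the post's actual code is not shown in the excerpt.

```python
import io
import logging
import platform

# Capture output in memory so the formatted record can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)

# %(host_info)s is a hypothetical field; it must be supplied via `extra`.
handler.setFormatter(logging.Formatter("%(asctime)s %(message)s | %(host_info)s"))

logger = logging.getLogger("til_extra_demo")  # hypothetical logger name
logger.addHandler(handler)
logger.propagate = False  # keep the record out of the root logger's handlers

# Keys in `extra` become attributes on the LogRecord.
logger.critical("Critical error message!", extra={"host_info": platform.platform()})

output = stream.getvalue()
print(output, end="")
```

Because the formatter references `host_info`, every call that goes through this handler needs to supply that key. A record without it fails to format.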

Python Dependency and Environment Management Deep-Dive

Published April 11, 2022 in devops - 0 Comments

Struggling to make sense of the Python package and virtual environment landscape? Not sure which tools have features that make building CI/CD for your Python project easier? Frustrated by slow dependency resolution? To improve Recursion’s Python developer experience, my colleague Dan Maljovec and I did a deep dive on those very topics and wrote […]

Tags: python

Python text analysis tools: FuzzyWuzzy’s basic string matching

Published March 29, 2020 in data - 0 Comments

The Python FuzzyWuzzy module uses Levenshtein edit distance to implement fuzzy string matching. FuzzyWuzzy’s matching tools return results on a scale from 0 to 100. The simplest matching tool FuzzyWuzzy offers is the ratio(..) function: The basic ratio function works well for simple string matching. However, if you’re trying to fuzzy match a single word […]

Ijson coroutines and generators

Published February 27, 2020 in data - 0 Comments

Reader comments on an old post about the ijson parser prompted me to check out the project’s more recent releases. The latest pre-release (v3.0rc1) added a coroutine interface, which allows users to supply their own file readers and have more control over when the parser is called. It looked like a fun feature to explore, […]

Python text analysis tools: Levenshtein Distance

Published January 31, 2020 in data - 0 Comments

Figuring out how similar two strings are and then making that similarity a quantitative measurement is a basic problem in text analysis, text mining and natural language processing. There are a number of efficient methods to solve this problem. This survey looks at Python implementations of a simple but widely used method: Levenshtein distance as […]
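For reference, a plain-Python sketch of Levenshtein distance using the standard two-row dynamic programming table. It is an illustration of the measure itself, not one of the implementations the post surveys, and the function name is an assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn `a` into `b`."""
    # Keep the shorter string in the inner loop to minimize row size.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] holds the distance between the processed prefix of `a`
    # and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (char_a != char_b)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Only two rows of the table are kept at a time, so memory grows with the shorter string while running time stays proportional to the product of the two lengths.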

A closer look at Airflow’s KubernetesPodOperator and XCom

Published July 11, 2019 in data - 8 Comments

The KubernetesPodOperator handles communicating XCom values differently than other operators. The basics are described in the operator documentation under the xcom_push parameter. I’ve written up a more detailed example that expands on that documentation. An Airflow task instance described by the KubernetesPodOperator can write a dict to the file /airflow/xcom/return.json (always the same file) that […]
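A small sketch of what code inside the pod's container might do to hand a value back, assuming only what the excerpt states: a dict is written as JSON to /airflow/xcom/return.json. The `write_xcom` helper and its `path` parameter are hypothetical names introduced for illustration.

```python
import json
from pathlib import Path

# File path taken from the excerpt above.
XCOM_PATH = Path("/airflow/xcom/return.json")


def write_xcom(value: dict, path: Path = XCOM_PATH) -> None:
    """Serialize `value` as JSON to the XCom return file.

    `path` is overridable (an assumption for testing outside a pod).
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(value))


# Inside the task's container, the last step might be:
# write_xcom({"rows_processed": 42})
```

Keeping the path as a parameter makes the helper easy to exercise locally without a running cluster.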

Trigger DAG runs with Airflow REST API

Published June 24, 2019 in data - 0 Comments

This article and code are applicable to Airflow 1.10.13. Hopefully the REST API will mature as Airflow is developed further, and the authentication methods will be easier. The experimental REST API does not use the Airflow role-based users. Instead, it currently requires a SQLAlchemy models.User object whose data is saved in the database. The code […]

Tags: airflow, python

Useful Airflow on Kubernetes Features

Published June 7, 2019 in data, devops - 0 Comments

KubernetesExecutor The KubernetesExecutor sets up Airflow to run on a Kubernetes cluster. This executor runs task instances in pods created from the same Airflow Docker image used by the KubernetesExecutor itself, unless configured otherwise (more on that at the end). Getting Airflow deployed with the KubernetesExecutor to a cluster is not a trivial task. I […]
