All Python code in this post is Python 3.5+. This post describes how I parsed movies_metadata.csv from the Kaggle movies dataset, a task I started in Part 1 and Part 2. After some digging into the Pandas documentation and Stack Overflow, I found that the best solution to my parsing problems was to explicitly set […]
All Python code is Python 3.5+. In the Pandas data import posts, I’m using Python type hints. Type hinting is a fairly new feature in Python and has been provisionally accepted as a language feature. It is also a thought-provoking design choice for a dynamically typed language, because types are not enforced at runtime. The type […]
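To make the "not enforced at runtime" point concrete, here is a minimal sketch. The function `double` is a hypothetical example, not code from the post; it only illustrates that the interpreter stores annotations as metadata and does not check them.

```python
def double(value: int) -> int:
    """Annotated as int -> int, but the annotation is only metadata."""
    return value * 2


# The interpreter does not check the annotation, so a str is accepted
# and the * operator simply repeats it.
result = double("ab")

# The hints are stored on the function object and can be inspected.
hints = double.__annotations__
```

A static checker such as mypy would flag the `double("ab")` call, but running the script raises no error.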
All Python code in this post is Python 3.5+. Continuing from Part 1, I discovered that movies_metadata.csv contains malformed rows with missing fields, which is what caused the file import to fail. I experimented with some of the more advanced pandas.read_csv parameters to see if I could work around the malformed rows. def main(path: […]
All Python code in this post is Python 3.5+. I’m continuing to work with the same Kaggle movies dataset as in the SQL import experiment. This time, I imported the data into Pandas DataFrames. The trickiest file to import was movies_metadata.csv. I first tried to use pandas.read_csv with the default settings. import argparse import pandas […]
All Python code is Python 3.5+. The PostgreSQL database version is 10. I recently started digging into the Kaggle movies dataset, which is a collection of CSV files. I was curious to see if the data could be inserted into a SQL database (PostgreSQL) for further exploration. The credits.csv file contains two columns (cast, crew) of […]
All Python code is Python 3.5+. A few months ago, I had to quickly extract a small amount of data from a large, deeply nested JSON file and export it to CSV. I was working in C++ and Python on this project, so my first attempts to extract the data were using the Python json […]
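As a rough illustration of that kind of task, here is a minimal sketch using only the standard library `json` and `csv` modules. The document structure and the field names (`project`, `runs`, `metrics`, `score`) are hypothetical, since the excerpt does not show the real file, and `json.loads` reads the whole document into memory, which may not suit a very large file.

```python
import csv
import io
import json

# Hypothetical nested document; the real file's layout is not shown in the post.
raw = (
    '{"project": {"runs": ['
    '{"id": 1, "metrics": {"score": 0.5}}, '
    '{"id": 2, "metrics": {"score": 0.75}}]}}'
)

# Parse the entire document into nested dicts and lists.
data = json.loads(raw)

# Walk down to the nested list and keep only the two fields of interest.
rows = [(run["id"], run["metrics"]["score"]) for run in data["project"]["runs"]]

# Write the rows as CSV into an in-memory buffer (a file object works the same way).
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["id", "score"])
writer.writerows(rows)
csv_text = buffer.getvalue()
```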