
Importing Stringified JSON Objects Into Pandas (Part 2)

Published November 30, 2017 in data, programming

All Python code in this post is Python 3.5+.

Continuing from Part 1, I discovered that movies_metadata.csv contains malformed rows with missing fields, which is what caused the file import to fail. I experimented with some of the more advanced pandas.read_csv parameters to see whether I could work around the malformed rows.
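One way to confirm that rows are short is to count fields with Python's standard csv module before involving pandas at all. This is a minimal sketch, not the method used in this post; find_short_rows and the three-column sample are hypothetical stand-ins for movies_metadata.csv.

```python
import csv
import io


def find_short_rows(lines):
    """Yield (line_number, field_count) for each record with fewer fields than the header.

    line_number is csv.reader's 1-indexed line_num, i.e. the physical line on
    which the record ended (a quoted field may span several lines).
    """
    reader = csv.reader(lines)
    header = next(reader)
    for row in reader:
        if len(row) < len(header):
            yield reader.line_num, len(row)


# Hypothetical sample: the record on line 3 is missing its budget field.
csv_text = (
    "id,title,budget\n"
    "1,Toy Story,30000000\n"
    "2,Jumanji\n"
    "3,Heat,60000000\n"
)

short_rows = list(find_short_rows(io.StringIO(csv_text)))
print(short_rows)  # [(3, 2)]
```

For a real file, pass an object opened with newline="" as the csv documentation recommends, for example open("movies_metadata.csv", newline="", encoding="utf-8"); the encoding there is an assumption.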

Since the malformed rows have too few commas rather than too many, the error_bad_lines option was not helpful: it only concerns rows with too many fields, while short rows are padded with missing values. Setting error_bad_lines with the ‘c’ parsing engine did not help, and that parser still threw a ValueError. The Python fixed-width formatted line engine (‘python-fwf’) was not helpful either.
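The difference between short and long rows can be reproduced on a tiny in-memory sample (hypothetical, not taken from movies_metadata.csv). By default the ‘c’ parser pads a row that has too few fields with NaN, while a row with too many fields raises a ParserError, which is the only case bad-line handling addresses. The sketch leaves error_bad_lines out on purpose: that flag was deprecated in pandas 1.3 in favor of on_bad_lines and removed in pandas 2.0.

```python
import io

import pandas as pd

# Hypothetical sample with a short row: the record for id 2 has no budget.
short_csv = "id,title,budget\n1,Toy Story,30000000\n2,Jumanji\n3,Heat,60000000\n"
# Hypothetical sample with a long row: the record for id 2 has an extra field.
long_csv = "id,title,budget\n1,Toy Story,30000000\n2,Jumanji,65000000,extra\n3,Heat,60000000\n"

# A short row is not treated as a bad line; the missing field becomes NaN.
short_df = pd.read_csv(io.StringIO(short_csv))
print(short_df)

# A long row is a bad line, and by default the tokenizer rejects it.
long_row_rejected = False
try:
    pd.read_csv(io.StringIO(long_csv))
except pd.errors.ParserError as exc:
    long_row_rejected = True
    print("ParserError:", exc)
```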

Switching to the more flexible ‘python’ engine succeeded when types were specified for only the first two columns. Assigning types to more columns failed once again when the parser reached a malformed row.
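A hypothetical sample suggests why typing more columns can fail once a short row is involved. With engine="python", an integer type for id and a string type for title parse cleanly, but also typing budget as int raises a ValueError, because the NaN left by the short row cannot be stored in an integer column. The column names and data are assumptions for illustration only.

```python
import io

import pandas as pd

# Hypothetical sample: the record for id 2 is missing its budget field.
csv_text = "id,title,budget\n1,Toy Story,30000000\n2,Jumanji\n3,Heat,60000000\n"

# Typing only the first two columns works, since neither contains a gap.
two_typed = pd.read_csv(
    io.StringIO(csv_text), engine="python", dtype={"id": int, "title": str}
)
print(two_typed.dtypes)

# Typing budget as well fails: the short row leaves a NaN in that column.
budget_failed = False
try:
    pd.read_csv(
        io.StringIO(csv_text),
        engine="python",
        dtype={"id": int, "title": str, "budget": int},
    )
except ValueError as exc:
    budget_failed = True
    print("ValueError:", exc)
```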

The first malformed row in movies_metadata.csv is on line 129, so it made sense to try skipping it with the skiprows parameter. Since skiprows expects zero-indexed line numbers, I used 128. Unfortunately, skiprows also failed with the same ValueError as before.
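When experimenting with skiprows, note that pandas.read_csv treats a list and an integer differently: a list skips specific zero-indexed file lines (the header is line 0), while an integer skips that many lines from the top of the file. The sample below is hypothetical; it only illustrates the two forms and does not explain why skipping line 128 failed on the real file.

```python
import io

import pandas as pd

# Hypothetical sample: 1-indexed file line 3 (0-indexed line 2) is the short row.
csv_text = "id,title,budget\n1,Toy Story,30000000\n2,Jumanji\n3,Heat,60000000\n"

# A list skips exactly the listed 0-indexed lines, so the header survives.
by_list = pd.read_csv(io.StringIO(csv_text), skiprows=[2])
print(by_list)

# An integer skips the first N lines, so skiprows=1 also drops the header and
# the first data line is promoted to column names.
by_count = pd.read_csv(io.StringIO(csv_text), skiprows=1)
print(list(by_count.columns))  # ['1', 'Toy Story', '30000000']
```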

My final attempt to parse the file used converter functions, which solved my import problems.
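The converters themselves are not shown here, so the following is only a sketch of the idea under stated assumptions: the genres and budget columns, the sample rows, and the helpers to_float and to_json are all hypothetical. pandas.read_csv calls the converter registered for a column on each raw cell string, and whatever the function returns becomes the cell value, so a tolerant converter can absorb malformed or missing values instead of letting the parser raise.

```python
import io
import json
import math

import pandas as pd


def to_float(text):
    """Hypothetical converter: parse a number, or return NaN for empty/garbled cells."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return math.nan


def to_json(text):
    """Hypothetical converter: parse a stringified JSON cell, or return [] on failure."""
    try:
        return json.loads(text)
    except (TypeError, ValueError):  # json.JSONDecodeError subclasses ValueError
        return []


# Hypothetical sample: row 2 is missing its budget, row 3 has a garbled genres cell.
csv_text = (
    "id,genres,budget\n"
    '1,"[{""id"": 16, ""name"": ""Animation""}]",30000000\n'
    '2,"[{""id"": 12, ""name"": ""Adventure""}]"\n'
    "3,not json,60000000\n"
)

df = pd.read_csv(
    io.StringIO(csv_text),
    converters={"genres": to_json, "budget": to_float},
)
print(df)
```

Returning NaN or an empty list keeps each column's contents predictable; a converter that raised instead would simply move the failure from the parser into the converter.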
