
Ijson coroutines and generators

Published February 27, 2020 in data

Reader comments on an old post about the ijson parser prompted me to check out the project’s more recent releases. The latest pre-release (v3.0rc1) added a coroutine interface, which allows users to supply their own file readers and have more control over when the parser is called. It looked like a fun feature to explore, so here is a bit of code to try it out.

Below is some test code for parsing the large San Francisco City Lots JSON dataset using ijson v3.0rc1 coroutines. I’m also using a generator function to lazily read the JSON file line by line. As an alternative, the example in the ijson documentation reads from a file object in chunks. The parser_coroutine function receives the output generated by the low-level parser and prints it.

All code is Python 3.8+.

The dataset contains an array of JSON objects with the city lots property information. The low-level ijson parser iterates over the JSON elements and breaks them down into three-element tuples that describe where the element fits in the JSON structure, its type, and its value. Here is the beginning of the dataset file and the first JSON object in the features array:

This is the last JSON object in the features array and end of the file:

Here is a similar example with the higher-level interface, where we can just parse the properties objects as Python dicts:
