Python comprehension fun!

Published November 15, 2018 in programming

List comprehensions are a concise, idiomatic Python feature for building lists from the items of a list (or any other iterable), and they typically run faster than an equivalent for loop that appends to a list. The Python 3 docs describe how to use basic and nested list comprehensions.

Advanced nested list comprehensions

To clean some text by removing stop words and generating lists of word tokens, I can process each sentence in the text list and check its words against stop_words:

stop_words = ["the", "for", "on", "is", "in", "and"]
text = ["The quick fox jumped over the lazy dog",
         "Bob and Alice went out for lunch on Tuesday",
         "Coding in Python is fun"]
word_lists = [[word for word in line.lower().split() if word not in stop_words]
              for line in text]

The inner list comprehension operates on the output of the split method, called on each lowercased item in the text list. The outer comprehension collects one token list per sentence:

[['quick', 'fox', 'jumped', 'over', 'lazy', 'dog'], ['bob', 'alice', 'went', 'out', 'lunch', 'tuesday'], ['coding', 'python', 'fun']]
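Each "word not in stop_words" test scans the list from the start, which is fine for six words but grows with the size of the stop-word list. A minimal sketch of the same comprehension using a set instead; the name stop_word_set is introduced here for illustration and is not part of the original example:

```python
# Hypothetical variation: stop words stored in a set, so each membership
# test is a hash lookup rather than a scan through a list.
stop_word_set = {"the", "for", "on", "is", "in", "and"}
text = ["The quick fox jumped over the lazy dog",
        "Bob and Alice went out for lunch on Tuesday",
        "Coding in Python is fun"]

word_lists = [[word for word in line.lower().split() if word not in stop_word_set]
              for line in text]
print(word_lists[2])  # ['coding', 'python', 'fun']
```

The output is identical to the list-based version; only the lookup cost changes.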

It’s also easy to flatten the list of lists returned by the nested list comprehension using the Python standard library itertools.chain function:

import itertools

flat_list = list(itertools.chain(*word_lists))

The flattened list is:

['quick', 'fox', 'jumped', 'over', 'lazy', 'dog', 'bob', 'alice', 'went', 'out', 'lunch', 'tuesday', 'coding', 'python', 'fun']
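Two other common ways to flatten one level of nesting are sketched below, using a shortened word_lists as stand-in data: itertools.chain.from_iterable, which avoids unpacking the outer list into arguments, and a single comprehension with two for clauses.

```python
import itertools

# Shortened sample data, assumed here only to keep the example small.
word_lists = [['quick', 'fox'], ['bob', 'alice'], ['coding']]

# chain.from_iterable takes the outer list directly, so no * unpacking is needed.
flat_a = list(itertools.chain.from_iterable(word_lists))

# One comprehension with two for clauses, read left to right:
# the outer loop comes first, the inner loop second.
flat_b = [word for words in word_lists for word in words]

print(flat_a)            # ['quick', 'fox', 'bob', 'alice', 'coding']
print(flat_a == flat_b)  # True
```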

Try to avoid writing for loops for simple list tasks. Save for loops for complicated list operations where using comprehensions and built-in functions is not easy or straightforward.

Set comprehensions

Besides lists, we can use comprehensions to build sets from arbitrary data. Here is a simple contrived example:

test_list = [1, 4, 6, 8, 9, 2, 4, 6, 8]
test_set = {x for x in test_list if x % 2 == 0}

The test_set variable contains the even integers from test_list without repetition:

{8, 2, 4, 6}
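Sets are unordered, so the display order shown above is an implementation detail and should not be relied on. When a predictable order matters, one option is to sort the set into a list, as in this small sketch:

```python
test_list = [1, 4, 6, 8, 9, 2, 4, 6, 8]
test_set = {x for x in test_list if x % 2 == 0}

# sorted() accepts any iterable and always returns a new list.
ordered_evens = sorted(test_set)

print(ordered_evens)  # [2, 4, 6, 8]
print(len(test_set))  # 4, the duplicates were dropped
```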

Dict comprehensions

We can also use comprehensions to build dictionaries from arbitrary data. For example, to build a lookup table that maps each category in a list of ordinal data categories to its index:

category_list = ['category0', 'category1', 'category2']
test_dict = {category: index for index, category in enumerate(category_list)}

The result is:

{'category0': 0, 'category1': 1, 'category2': 2}
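A lookup table like this often needs a reverse mapping as well. A minimal sketch, assuming the values are unique; the names category_to_index and index_to_category are introduced here for illustration:

```python
category_list = ['category0', 'category1', 'category2']
category_to_index = {category: index for index, category in enumerate(category_list)}

# Swap keys and values to map an index back to its category name.
# This only round-trips cleanly because the values here are unique.
index_to_category = {index: category for category, index in category_to_index.items()}

print(index_to_category[1])  # category1
```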

Two lists can also be joined into a dictionary, for example with the zip function:

list1 = ['A', 'B', 'C']
list2 = ['a', 'b', 'c']
test_dict = {item1: item2 for item1, item2 in zip(list1, list2)}

The result is:

{'A': 'a', 'B': 'b', 'C': 'c'}

One thing to watch out for when using zip is that it does not check that its inputs have the same length. It iterates until the shortest input is exhausted, and the remaining items from longer inputs are silently ignored. The itertools.zip_longest function works well when list sizes differ and using a fill value is appropriate.
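The difference is easy to see with inputs of unequal length. In this sketch the keys and values lists are made up for illustration, with values deliberately one item short:

```python
from itertools import zip_longest

keys = ['A', 'B', 'C']
values = ['a', 'b']  # deliberately one item short

# zip stops at the shortest input, so 'C' is silently dropped.
truncated = {k: v for k, v in zip(keys, values)}
print(truncated)  # {'A': 'a', 'B': 'b'}

# zip_longest pads the shorter input with fillvalue instead.
padded = {k: v for k, v in zip_longest(keys, values, fillvalue=None)}
print(padded)  # {'A': 'a', 'B': 'b', 'C': None}
```

On Python 3.10 and later, zip also accepts strict=True, which raises ValueError when the inputs differ in length.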

Dict comprehensions were proposed in PEP 274. Both dict and set comprehensions arrived in Python 3.0 and were backported to Python 2.7.

Generator expressions

Here’s where comprehensions get very interesting. Generator expressions were introduced in PEP 289 and have syntax similar to list, set and dict comprehensions. Instead of building a concrete collection, however, a generator expression creates a lazy iterator that produces items one at a time and can be passed to any function that expects an iterable:

test_list = [5, 17, 12, 14, 0, 12, 14, 10, 15, 9]
max_even = max(x for x in test_list if x % 2 == 0)
print('Maximum even integer: ', max_even)

This code returns the maximum even integer in test_list. Building an intermediate list with a list comprehension is not needed:

Maximum even integer:  14
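Because a generator expression produces its items on demand, it can only be consumed once. A short sketch of that behavior, reusing test_list; the variable names are chosen here for illustration:

```python
test_list = [5, 17, 12, 14, 0, 12, 14, 10, 15, 9]

# Nothing is computed when the generator expression is created.
evens = (x for x in test_list if x % 2 == 0)

largest = max(evens)    # consumes every item the generator yields
leftover = list(evens)  # the generator is now exhausted

print(largest)   # 14
print(leftover)  # []
```

If the filtered values are needed more than once, a list comprehension is the simpler choice.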

Here, we are computing log values from a list of integers. If log is undefined for a given input, NaN is used instead:

from math import log

test_list = [5, 17, 12, 14, 0, 12, 14, 10, 15, 9]
sorted_log_vals = sorted(log(x) if x > 0 else float('nan') for x in test_list)
print('Sorted log values:', sorted_log_vals)

The result is below. Note that the values after nan are out of order:

Sorted log values: [1.6094379124341003, 2.1972245773362196, 2.302585092994046, 2.4849066497880004, 2.4849066497880004, 2.6390573296152584, 2.833213344056216, nan, 2.6390573296152584, 2.70805020110221]
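Comparisons involving NaN are always false, so sorted cannot place NaN reliably among the other values. A minimal sketch of two ways to get a fully ordered result, either dropping NaN or pushing it to the end with a sort key; which one is appropriate depends on the data:

```python
from math import isnan, log

test_list = [5, 17, 12, 14, 0, 12, 14, 10, 15, 9]
log_vals = [log(x) if x > 0 else float('nan') for x in test_list]

# Option 1: drop NaN values before sorting.
finite_sorted = sorted(v for v in log_vals if not isnan(v))

# Option 2: keep NaN but sort on an (is_nan, value) key.
# False sorts before True, so every real number precedes NaN.
nan_last = sorted(log_vals, key=lambda v: (isnan(v), v))

print(len(finite_sorted))  # 9
print(nan_last[-1])        # nan
```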

A generator expression used in conjunction with next is an efficient way to find and return the first match in a collection, because iteration stops as soon as a match is found:

val = next(x for x in range(1, 10) if x > 1)
print('found x > 1:', val)

The first value that matches is 2:

found x > 1: 2

If the generator expression does not have an if clause, then next will simply return the first value generated by the expression.
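A small sketch of that case, with squares chosen arbitrarily as the generated values. It also shows that calling next again on the same generator object continues from where it left off:

```python
# No if clause: next returns the first value the expression produces.
first_square = next(x * x for x in range(3, 10))
print(first_square)  # 9

# Repeated calls on the same generator object keep advancing it.
squares = (x * x for x in range(3, 10))
print(next(squares), next(squares))  # 9 16
```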

Here, no value in the range is less than one, so the generator is exhausted without producing anything and next raises the StopIteration exception:

try:
    val = next(x for x in range(1, 10) if x < 1)
    print('found x < 1:', val)
except StopIteration as exc:
    # exc carries no message here, so only the text before it is visible
    print('x < 1 not found', exc)

The result is:

x < 1 not found

The next function can also return a default value instead of raising. In this case, the generator expression needs its own parentheses because it is no longer the only argument:

val = next((x for x in range(1, 10) if x < 1), 0)
print('x < 1 not found, returning default', val)

The result is:

x < 1 not found, returning default 0
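This pattern is handy enough that it is often wrapped in a small function. The first_match helper below is hypothetical, written here for illustration; it is not part of the standard library:

```python
def first_match(items, predicate, default=None):
    """Return the first item for which predicate(item) is true, else default.

    Hypothetical helper for illustration; not a standard library function.
    """
    return next((item for item in items if predicate(item)), default)


print(first_match(range(1, 10), lambda x: x > 1))             # 2
print(first_match(range(1, 10), lambda x: x < 1, default=0))  # 0
print(first_match([], lambda x: True))                        # None
```

One caveat with this design: a default of None cannot be told apart from a matching item that is itself None, so a dedicated sentinel may be preferable in that situation.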

PEP 289 is worth reading to dive even deeper into generator expressions.

Tags: python
