Disclosure: I was given access to Effective Pandas – Standard to review. I was not compensated for writing this post and the links to the course are not affiliate links. All opinions are 100% my own.
Quick take: Effective Pandas is a high quality course with well thought out content. I felt like it was worth the time investment to watch the videos and that the lessons stick better if you do all the exercises.
Pandas is a super-powered data analysis tool packed with a great deal of functionality. Getting started with the basics of working with data in Pandas is not too difficult, but moving beyond those basics to extract the most value from all that functionality takes time, effort and good learning resources. Pandas has extensive API documentation and getting-started guides, but the documentation doesn't always feature realistic use cases. Books and blog posts can also be a good way to learn Pandas, if you have the time to find high-quality content that goes beyond the basics.
What if you don't have the time to look for good learning resources? One possibility is to check out the standard version of the Effective Pandas course, which contains about 3 hours of video split into lectures and exercises. Code is written in Jupyter notebooks during the lectures and exercises, and the standard course includes the exercises as a downloadable Jupyter notebook. Pandas flashcards and cheatsheets are also available for download. The course content is entirely focused on the Pandas features and functionality typically used in data analytics. It also incorporates recent features from the Pandas 1.x releases, which introduced significant interface and data type changes. The basic, lowest-cost version of the course does not include the exercises or the other downloadable extras.
The course is laid out in a logical progression, from installing the Pandas library and setting up a Jupyter notebook to using advanced Pandas functions. It takes you through small end-to-end projects using public datasets that progress from data loading to visualization. I felt like the exercises did a good job of reinforcing the contents of the lectures. The instructor, Matt Harrison, doesn't assume that anyone taking the course has a lot of Python programming experience, although some basic statistics knowledge may be helpful. Anyone with rudimentary programming skills should be able to work through the exercises, and doing them is a good way to develop those skills further.
The course has good productivity tips for working in Jupyter notebooks, for working with the Pandas documentation, and for data cleaning and wrangling. It also covers some Pandas quirks that can trip up even experienced users, such as how slicing works and how and when to use the loc and iloc attributes.
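To make the slicing quirk concrete, here is a minimal sketch (my own illustration, not code from the course) using a small hypothetical Series with an integer index. loc slices by label and includes the end label, while iloc slices by position and excludes the end position, just like a plain Python list.

```python
import pandas as pd

# Hypothetical Series whose integer labels do not match the positions.
s = pd.Series([10, 20, 30, 40, 50], index=[3, 4, 5, 6, 7])

# .loc slices by label, and the end label is included.
by_label = s.loc[4:6]      # labels 4, 5, 6 -> values 20, 30, 40

# .iloc slices by position, and the end position is excluded.
by_position = s.iloc[1:3]  # positions 1, 2 -> values 20, 30

# With an integer index, a plain scalar key s[...] is treated as a label,
# not a position, which is a common source of confusion.
by_plain_key = s[5]        # label 5 -> 30
```

Because the labels look like positions, s.loc[4:6] and s.iloc[4:6] return different rows here, which is exactly the kind of mistake that being explicit with loc or iloc avoids.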
The course also has solid, representative examples of using both the built-in Pandas plotting functions and the Matplotlib visualization library to produce high-quality data visualizations. The visualization techniques follow the same principles I learned in the scientific visualization course I took during my master's degree.
I learned about some useful Pandas features. For example, I didn't know about the Grouper class until I watched the lecture on grouping. The Grouper class enhances groupby() by adding features like grouping by frequency for datetime or datetime-compatible data types. I've never had the chance to use the stack() and unstack() methods to reshape DataFrames. I also tend to forget the details of how merge() and join() differ; there's a nice lecture about how to use those functions.
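Here is a minimal sketch of those features (my own illustration, not code from the course). The sales and stores DataFrames are hypothetical, and "MS" is the pandas month-start frequency alias. pd.Grouper buckets the datetime column by month inside groupby(), unstack() and stack() move an index level to the columns and back, and merge() and join() produce the same lookup here.

```python
import pandas as pd

# Hypothetical daily sales for two stores.
sales = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-05", "2021-01-20", "2021-02-03",
                            "2021-02-14", "2021-01-11", "2021-02-25"]),
    "store": ["A", "A", "A", "A", "B", "B"],
    "amount": [100, 150, 200, 50, 80, 120],
})

# pd.Grouper bins the "date" column by month start ("MS") while also
# grouping by the ordinary "store" column.
monthly = sales.groupby([pd.Grouper(key="date", freq="MS"), "store"])["amount"].sum()
# monthly is a Series with a two-level (date, store) MultiIndex.

# unstack() pivots the innermost index level (store) into columns ...
wide = monthly.unstack()
# ... and stack() pivots the columns back into an index level.
long_again = wide.stack()

# Hypothetical lookup table of store locations.
stores = pd.DataFrame({"store": ["A", "B"], "city": ["Oslo", "Lima"]})

# merge() matches on columns by default.
merged = sales.merge(stores, on="store", how="left")

# join() matches against the other frame's index, so the lookup table is
# re-indexed by store and the caller's "store" column is passed via on=.
joined = sales.join(stores.set_index("store"), on="store")
```

The practical difference is where the keys live: merge() works column-to-column out of the box, whereas join() expects the right-hand keys in the index, which is why set_index() is needed first.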
The course is served on the Podia platform, which has a decent interface for presenting the course materials and a video player with standard controls. Lectures, exercises and downloadable content are organized in a table of contents.