Over the last couple of days, I completed Google’s Python Class. I have tried so many learning methods, from books to MOOCs and everything in between, and this really hit a sweet spot for me. I did not watch the videos; I found the written material and exercises to be enough. The difficulty level of this class was just right; it didn’t start from zero, as so many Python tutorials do (which is fine! Those kinds of learning materials are so important!). Also, I found the exercises really engaging, particularly the more comprehensive ones towards the end.
And with those more advanced exercises in mind, I thought it might be interesting to take a look at my solution for the final exercise and compare it to their provided solution. Both solutions are at this gist, with mine on top and Google’s on the bottom, should you be interested in checking them out in full. Here are the program’s requirements.
The first difference that jumped out at me was a stylistic one: the solution code is so much more terse than mine. Their variable names are things like match, whereas I prefer to be a little more descriptive with my variables, opting for url_match for my corresponding variables. I’m also a fan of creating intermediate semantic variables rather than chaining methods together. The step where we had to create an index.html file provides a good study in contrast. First, Google’s code:
index = file(os.path.join(dest_dir, 'index.html'), 'w')
I feel this one line is doing a lot of work, possibly too much. The interpreter (Compiler? What’s the deal in Python?) can certainly handle it, but I find it difficult to parse as a human. Here’s my code:
index_path = os.path.join(dest_dir, 'index.html')
index_file = open(index_path, 'w')
I prefer this style because my general rule of thumb (stolen from Kyle Simpson) is to err on the side of optimizing for human readability. The variable names give me an idea of what is stored in them, whereas index doesn’t tell me much.
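A side note on Google's line: the file() builtin only exists in Python 2, so on Python 3 it has to be open(). Below is a minimal sketch of the two-step version wrapped in a with block, so the file is closed automatically. The function name write_index, the image_names parameter, and the HTML it writes are assumptions made up for illustration; they are not taken from either solution.

```python
import os
import tempfile


def write_index(dest_dir, image_names):
    """Write a bare-bones index.html that places the images side by side."""
    index_path = os.path.join(dest_dir, 'index.html')
    # The with block closes the file even if one of the writes fails.
    with open(index_path, 'w') as index_file:
        index_file.write('<html><body>\n')
        for name in image_names:
            index_file.write('<img src="%s">' % name)
        index_file.write('\n</body></html>\n')
    return index_path


if __name__ == '__main__':
    # Hypothetical usage: a throwaway directory and made-up image names.
    with tempfile.TemporaryDirectory() as dest_dir:
        print(write_index(dest_dir, ['img0', 'img1']))
```

The intermediate index_path variable also comes in handy here, since the function can return it to the caller.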
Interestingly, there’s another example where the script (see what I did there) is flipped. When sorting the image URLs, the solution file defines a function called url_sort_key(url) and then passes a reference to it as the key argument to sorted():
def url_sort_key(url):
    """Used to order the urls in increasing order by 2nd word if present."""
    match = re.search(r'-(\w+)-(\w+)\.\w+', url)
    if match:
        return match.group(2)
    else:
        return url

return sorted(url_dict.keys(), key=url_sort_key)
My version used a lambda instead:
sorted_jpg_files = sorted(jpg_files, key=lambda f: re.search(sort_pattern, f).group())
So, it’s interesting to me that the solution author opted for a more verbose code expression here, though I’m guessing they probably didn’t want to delve into lambdas in a fairly introductory course.
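One practical difference between the two is worth spelling out: the named function falls back to the whole URL when the pattern doesn't match, while a lambda that calls .group() directly on the result of re.search() raises AttributeError on any URL that doesn't match. Here is a small sketch comparing a key function with a lambda that keeps the same fallback. The URLs are made up for the example, and the lambda uses an assignment expression, so it assumes Python 3.8 or later.

```python
import re

# Hypothetical URLs, made up for this example.
urls = [
    'http://example.com/p-bbbb-zzzz.jpg',
    'http://example.com/p-cccc-aaaa.jpg',
    'http://example.com/plain.jpg',
]

pattern = re.compile(r'-(\w+)-(\w+)\.\w+')


def url_sort_key(url):
    """Sort by the second word if the URL has one, else by the whole URL."""
    match = pattern.search(url)
    if match:
        return match.group(2)
    return url


by_function = sorted(urls, key=url_sort_key)

# Same logic as a lambda. The "else url" fallback is what prevents
# calling .group() on None when a URL does not match the pattern.
by_lambda = sorted(
    urls,
    key=lambda url: m.group(2) if (m := pattern.search(url)) else url,
)

print(by_function == by_lambda)
```

Once the fallback is included, the lambda is arguably harder to read than the named function, which may be another reason the solution spells it out.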
Let’s look at some other differences. Here’s a great example where my verbosity did not enhance the readability of the code. The goal was to extract a hostname from the filename of a provided log file, and the filenames all followed the same convention: someword_hostname. For the life of me, I could not remember the word “host,” which is how I ended up with the variable name base_url. I opted for a regular expression here, which works, but is ultimately too much of a muchness, I think:
url_match = re.search(r'\w+_(\S+)', filename)
base_url = ''
if url_match:
    base_url = url_match.group(1)
It’s clear that this code is doing too much, particularly when you compare it to the elegance of the provided solution:
underbar = filename.index('_')
host = filename[underbar + 1:]
Looking at my code and then looking at their code gives me the same feeling of tension and relief I experience when using my asthma inhaler. Like, oh, that’s what oxygen feels like.
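To see the approaches side by side, here is a small sketch that runs the regex version and the slice version on the same input, plus a third option, str.partition, which neither solution uses. The file name is a made-up stand-in that follows the someword_hostname convention.

```python
import re

# Hypothetical file name in the someword_hostname shape.
filename = 'someword_example.com'

# Regex version: capture the non-whitespace text after the underscore.
url_match = re.search(r'\w+_(\S+)', filename)
host_from_regex = url_match.group(1) if url_match else ''

# Slice version: find the first underscore and take everything after it.
# Note that str.index raises ValueError if there is no underscore at all.
underbar = filename.index('_')
host_from_slice = filename[underbar + 1:]

# partition splits on the first underscore and never raises; with no
# underscore present, the last element is simply an empty string.
_, _, host_from_partition = filename.partition('_')

print(host_from_regex, host_from_slice, host_from_partition)
```

All three agree on this input. They would handle a file name without an underscore differently, though: the regex and partition versions yield an empty string, while the slice version raises.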
Another interesting difference is that the solution code opted to iterate over the log file line by line and searched each line for the magic sauce, whereas I used .findall() on the whole file string. I suppose I felt comfortable using findall() because I knew that the size of the log files was fairly small. But in a real-world situation, it would probably make more sense to go line by line.
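A rough sketch of the two strategies follows. The log lines and the pattern are assumptions invented for the example, not the real exercise data, and io.StringIO stands in for an opened log file so the snippet runs without one.

```python
import io
import re

# Hypothetical log content; the pattern below is an assumption too.
log_text = (
    '10.0.0.1 - - "GET /images/puzzle/a-baaa.jpg HTTP/1.0" 200\n'
    '10.0.0.2 - - "GET /index.html HTTP/1.0" 200\n'
    '10.0.0.3 - - "GET /images/puzzle/a-baab.jpg HTTP/1.0" 200\n'
)
pattern = re.compile(r'GET (\S+puzzle\S+) HTTP')

# Whole-file approach: the entire text has to sit in memory at once.
all_at_once = pattern.findall(log_text)

# Line-by-line approach: iterating a file object yields one line at a
# time, so only the current line needs to be held in memory.
line_by_line = []
for line in io.StringIO(log_text):
    match = pattern.search(line)
    if match:
        line_by_line.append(match.group(1))

print(all_at_once == line_by_line)
```

One caveat: search() finds only the first match on each line, so the two approaches agree only when a line contains at most one matching path, as in this sample.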
One of the subgoals of this exercise was to remove any possible duplicate URLs from the final list. The provided solution used a dict, and I wish I had gone that route. Instead, I used this rather hacky strategy:
jpg_files = list(dict.fromkeys(jpg_files))
It works, but it’s doing a lot with one line, and it isn’t semantic. Thank u, next.
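For comparison, here are a few ways to drop duplicates, shown on a made-up list. The last one is only a guess at the general shape of collecting URLs as dict keys while scanning; it is not copied from the provided solution.

```python
# Hypothetical file names with duplicates.
jpg_files = ['b.jpg', 'a.jpg', 'b.jpg', 'c.jpg', 'a.jpg']

# dict keys are unique and, since Python 3.7, keep insertion order,
# so this removes duplicates while preserving first-seen order.
deduped_in_order = list(dict.fromkeys(jpg_files))

# A set also removes duplicates but has no guaranteed order,
# so sort afterwards if the order matters.
deduped_sorted = sorted(set(jpg_files))

# Collect-as-you-go: record each name as a dict key while looping.
seen = {}
for name in jpg_files:
    seen[name] = 1
deduped_from_dict = sorted(seen.keys())

print(deduped_in_order)
print(deduped_sorted)
```

Since the list gets sorted in the end anyway, the order-preserving property of dict.fromkeys isn't strictly needed here, and sorted(set(...)) may read as the most direct statement of the intent.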
Some more differences I learned from:
- The Google code provided better feedback to the user.
- They also changed the names of the image files locally, which made them more semantic to the user.
Getting to read/write/create files in this way was so fun, and it’s something I haven’t had an opportunity to do much of in my career up until now. I learned so much from this course, and I look forward to doing much more with Python.