Helpful Unix Tools: ssh, htop, pstree, strace

The Problem

While working on my color generator project (which I have now dubbed ColorPi), I ran into an interesting issue.

I’m working on the part of my project where my Raspberry Pi lights up an RGB LED in whatever color my API has generated. My Pi is headless (meaning it’s not hooked up to a display), so doing dev work on the Pi itself is kind of annoying because I’m restricted to vim or nano, and y’all, they’re just not my thing right now. So instead, I’m coding on my laptop and using git to keep my Pi up-to-date.

Because the Pi is headless, everything I’m doing is via SSH. It’s so cool, I can communicate everything to the Pi directly from my laptop. The convenience factor of only having to deal with one keyboard and one screen instead of hopping back and forth between devices is immense, and it’s also really improved my command line skills.

However, sometimes I’ll lose the SSH connection because of something simple like accidentally closing my terminal window, or restarting my laptop, and when I SSH back into the Pi, I’ll no longer have access to the logs from the script I’m running – even though the process is still going!

Some Inelegant Solutions

The first time this happened, I rebooted the Pi and, after it started back up and I SSHed back into it, restarted my script. This option works, but it’s far from ideal. The next time it happened, I thought I’d figure out how to find the specific process that was associated with my script, and then terminate and restart it. For this, a tool like htop is very useful. There are different ways to see your running processes: ps aux will do it, but htop gives you such a nice interactive interface!

There’s something that htop can’t help me with, though. In the above screenshot, there are 3 processes that are running because of my python deltaListener.py command. This can happen when a process spawns child processes or threads (htop lists each thread as its own row by default). Which one should I kill?

To answer this, I learned about pstree, which shows the parent/child relationship between processes. By running pstree -p to get the names and PIDs of my running processes, I get the following output:

systemd(1)-+-alsactl(340)
           |-avahi-daemon(352)---avahi-daemon(406)
           |-bluealsa(634)-+-{bluealsa}(641)
           |               |-{bluealsa}(642)
           |               `-{bluealsa}(643)
           |-bluetoothd(629)
           |-cron(361)
           |-dbus-daemon(336)
           |-dhcpcd(416)
           |-hciattach(628)
           |-login(497)---bash(669)
           |-polkitd(455)-+-{polkitd}(456)
           |              `-{polkitd}(458)
           |-rngd(380)-+-{rngd}(382)
           |           |-{rngd}(383)
           |           `-{rngd}(384)
           |-rsyslogd(341)-+-{rsyslogd}(402)
           |               |-{rsyslogd}(403)
           |               `-{rsyslogd}(404)
           |-sshd(464)-+-sshd(1173)---sshd(1195)---bash(1198)---start(596)---python(597)-+-{python}(601)
           |           |                                                                 `-{python}(610)
           |           |-sshd(14486)---sshd(14508)---bash(14511)---pstree(2686)
           |           `-sshd(30093)---sshd(30344)---bash(30347)
           |-systemd(653)---(sd-pam)(656)
           |-systemd-journal(107)
           |-systemd-logind(342)
           |-systemd-timesyn(291)---{systemd-timesyn}(335)
           |-systemd-udevd(141)
           |-thd(356)
           |-udisksd(363)-+-{udisksd}(428)
           |              |-{udisksd}(439)
           |              |-{udisksd}(493)
           |              `-{udisksd}(507)
           |-wpa_supplicant(338)
           `-wpa_supplicant(469)

This is very cool! On line 20 of that output, we can see a process that says start(596). This is the bash script I created to kick off my python program so I didn’t have to keep typing the command line arguments. From there, we can see that the python process itself is PID 597; the {python}(601) and {python}(610) entries beneath it are in curly braces, which is how pstree marks threads. Now I can run kill -9 597 to stop everything so I can restart!
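To make the parent/child reasoning concrete, here’s a small Python sketch that walks up a PID-to-parent mapping. The mapping is hand-copied from the sshd branch of the pstree output above (only the processes on python(597)’s path); on a live system you’d build it from something like ps -o pid,ppid,comm instead. The ancestry() helper is hypothetical, just for illustration.

```python
# PID -> (name, parent PID), transcribed from the sshd branch of the pstree output.
PROCESSES = {
    1: ("systemd", 0),
    464: ("sshd", 1),
    1173: ("sshd", 464),
    1195: ("sshd", 1173),
    1198: ("bash", 1195),
    596: ("start", 1198),
    597: ("python", 596),
}


def ancestry(pid, processes):
    """Return the chain of (name, pid) pairs from pid up to the root."""
    chain = []
    # Keep following parent links until we reach a PID we don't know about.
    while pid in processes:
        name, parent = processes[pid]
        chain.append((name, pid))
        pid = parent
    return chain


if __name__ == "__main__":
    for name, pid in ancestry(597, PROCESSES):
        print(f"{name}({pid})")
```

Reading the chain from the top, python(597) sits under start(596), bash(1198), and a per-session sshd, not directly under systemd.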

This is still an inelegant solution to my problem, though. Remember: I don’t really want to stop this process. I would like it to keep running, so I need a way to tap into the output it’s providing.

The Answer, and Many More Questions

At this point, I expected strace to solve my problems for me. This post on Unix Stack Exchange gave me so much hope! To wit: “If all you want to do is spy on the existing process, you can use strace -p1234 -s9999 -e write where 1234 is the process ID.” Sounds perfect, right?

Well, I tested it by first closing my terminal window with the ssh session and then running strace -p597 -s9999 -e write, and got this output:

strace: attach: ptrace(PTRACE_SEIZE, 597): No such process

Imagine my surprise. I returned to htop, and lo and behold, there was indeed no such process. What had happened?

Further inspection of my pstree output shows that python(597) is a child of start(596), which is itself a child of bash and then … sshd. So it looks as though by closing the SSH session, we killed all the processes it was running! The problem I thought I had, the one that sent me down this rabbit hole, turned out to not exist at all. My python script does not keep running if my SSH session dies. Is this a problem? That’s just one of many questions this new bit of information raises.

Stay tuned.

Python!

Screenshot of code that is discussed in this post. The link to the full text is included below.

Over the last couple of days, I completed Google’s Python Class. I have tried so many learning methods, from books to MOOCs and everything in between, and this really hit a sweet spot for me. I did not watch the videos; I found the written info + exercises to be enough. The difficulty setting on this class was just right; it wasn’t starting from 0, as so many Python tutorials do (Which is fine! Those kinds of learning materials are so important!). Also, I found the exercises really engaging, particularly the more comprehensive ones towards the end.

And with those more advanced exercises in mind, I thought it might be interesting to take a look at my solution for the final exercise and compare it to their provided solution. Both solutions are at this gist, with mine on top and Google’s on the bottom, should you be interested in checking them out in full. Here are the program’s requirements.

The first difference that jumped out at me was a stylistic one: the solution code is so much more terse than mine. Their variable names are things like f and match, whereas I prefer to be a little more descriptive with my variables, opting for file and url_match for my corresponding variables. I’m also a fan of creating intermediate semantic variables rather than chaining methods together. The step where we had to create an index.html file provides a good study in contrast. First, Google’s code:

index = file(os.path.join(dest_dir, 'index.html'), 'w')

I feel this one line is doing a lot of work, possibly too much. The interpreter (Compiler? What’s the deal in Python?) can certainly handle it, but I find it difficult to parse as a human. Here’s my code:

index_path = os.path.join(dest_dir, 'index.html')
index_file = open(index_path, 'w')

I prefer this style because my general rule of thumb (stolen from Kyle Simpson) is to err on the side of optimizing for human readability. The variable names give me an idea of what is stored in them, whereas index doesn’t tell me much.
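One Python idiom pairs nicely with the two-step version: opening the file in a with block, which closes it automatically when the block ends (note that the file() built-in in Google’s line only exists in Python 2; in Python 3 it’s open()). Here’s a minimal sketch, where write_index() is a hypothetical helper and a temporary directory stands in for dest_dir:

```python
import os
import tempfile


def write_index(dest_dir, html):
    """Hypothetical helper: write html to dest_dir/index.html and return the path."""
    index_path = os.path.join(dest_dir, "index.html")
    # The with block closes index_file even if write() raises.
    with open(index_path, "w") as index_file:
        index_file.write(html)
    return index_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as dest_dir:
        path = write_index(dest_dir, "<html><body></body></html>")
        with open(path) as f:
            print(f.read())
```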

Interestingly, there’s another example where the script (see what I did there) is flipped. When sorting the image URLs, the solution file defines a function called url_sort_key(url) and then passes a reference to it as an argument to sorted():

def url_sort_key(url):
  """Used to order the urls in increasing order by 2nd word if present."""
  match = re.search(r'-(\w+)-(\w+)\.\w+', url)
  if match:
    return match.group(2)
  else:
    return url
return sorted(url_dict.keys(), key=url_sort_key)

As a JavaScript dev, I am used to seeing anonymous functions in these contexts, so I looked into how Python would achieve such a thing. Turns out, Python has something called lambdas that are basically the same thing, although I can’t find a way to give a lambda a name the way you can with an “anonymous” function expression in JavaScript. Here’s my analogous code:

sorted_jpg_files = sorted(jpg_files, key=lambda f: re.search(sort_pattern, f).group())

So, it’s interesting to me that the solution author opted for a more verbose code expression here, though I’m guessing they didn’t want to delve into lambdas in a fairly introductory course.
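To check that the two styles really are interchangeable, here’s a small sketch that sorts the same list with a named key function and with a lambda. The URLs are made up and only loosely follow the puzzle’s -word-word.jpg naming; the lambda version assumes every URL matches the pattern, as my original code did.

```python
import re

# Illustrative URLs shaped like .../p-<word1>-<word2>.jpg
urls = [
    "http://example.com/img/p-bbbb-cccc.jpg",
    "http://example.com/img/p-aaaa-zzzz.jpg",
    "http://example.com/img/p-dddd-aaaa.jpg",
]


def url_sort_key(url):
    """Sort by the second word if present, else by the whole URL."""
    match = re.search(r"-(\w+)-(\w+)\.\w+", url)
    return match.group(2) if match else url


# Named function passed by reference
by_function = sorted(urls, key=url_sort_key)

# Inline lambda (assumes every URL matches, so .group() never hits None)
by_lambda = sorted(urls, key=lambda u: re.search(r"-\w+-(\w+)\.\w+", u).group(1))

print(by_function == by_lambda)
```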

Let’s look at some other differences. Here’s a great example where my verbosity did not enhance the readability of the code. The goal was to extract a hostname from the filename of a provided log file, and the filenames all followed the same convention: someword_hostname. For the life of me, I could not remember the word “host,” which is how I ended up with the variable name base_url. I opted for a regular expression here, which works, but is ultimately too much of a muchness, I think:

url_match = re.search(r'\w+_(\S+)', filename)
base_url = ''
if url_match:
  base_url = url_match.group(1)

It’s clear that this code is doing too much, particularly when you compare it to the elegance of the provided solution:

underbar = filename.index('_')
host = filename[underbar + 1:]

Looking at my code and then looking at their code gives me the same feeling of tension and relief I experience when using my asthma inhaler. Like, oh, that’s what oxygen feels like.
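Here’s a quick sketch checking that both approaches agree on a filename that follows the someword_hostname convention (the filename is an illustrative example). The third variant, str.partition(), is my own addition rather than something from either solution; like index(), it splits on the first underscore, just without the slicing arithmetic.

```python
import re

filename = "animal_code.google.com"  # illustrative someword_hostname filename

# My regex version
url_match = re.search(r"\w+_(\S+)", filename)
base_url = url_match.group(1) if url_match else ""

# The provided solution's slicing version
underbar = filename.index("_")
host = filename[underbar + 1:]

# Alternative (not from either solution): split once on the first underscore
_, _, host_partition = filename.partition("_")

print(base_url, host, host_partition)
```

One caveat: because \w also matches underscores, the regex backtracks to the last underscore it can use, so for a name like a_b_c.com it returns c.com while index() returns b_c.com. The two only agree when the hostname itself has no underscore.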

Another interesting difference is that the solution code opted to iterate over the log file line by line and search each line for the magic sauce, whereas I used .findall() on the whole file string. I suppose I felt comfortable using findall() because I knew that the log files were fairly small. But in a real-world situation, where a log could be far larger than available memory, it would probably make more sense to go line by line.
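Here’s a rough sketch of the two strategies side by side. The log lines and the GET pattern are made up for illustration, and io.StringIO stands in for an open file. Iterating over a file object yields one line at a time, so memory use stays flat no matter how big the log is, whereas findall() needs the whole file in one string.

```python
import io
import re

# Made-up access-log lines; only the GET paths matter here.
LOG_TEXT = (
    "10.0.0.1 GET /images/puzzle/a-baaa.jpg HTTP/1.0\n"
    "10.0.0.2 GET /favicon.ico HTTP/1.0\n"
    "10.0.0.3 GET /images/puzzle/a-baab.jpg HTTP/1.0\n"
)
PATTERN = r"GET (\S*puzzle\S*) HTTP"


def urls_whole_file(f):
    """Read everything into one string, then findall() over it."""
    return re.findall(PATTERN, f.read())


def urls_line_by_line(f):
    """Iterate the file object, which yields one line at a time."""
    urls = []
    for line in f:
        match = re.search(PATTERN, line)
        if match:
            urls.append(match.group(1))
    return urls


print(urls_whole_file(io.StringIO(LOG_TEXT)))
print(urls_line_by_line(io.StringIO(LOG_TEXT)))
```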

One of the subgoals of this exercise was to remove any possible duplicate URLs from the final list. The provided solution used a dict, and I wish I had gone that route. Instead, I used this rather hacky strategy:

jpg_files = list(dict.fromkeys(jpg_files))

It works, but it’s doing a lot with one line, and it isn’t semantic. Thank u, next.
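For comparison, here are both dedupe strategies on a made-up list of filenames. dict.fromkeys() leans on dicts preserving insertion order (guaranteed since Python 3.7), so the first occurrence of each item keeps its place. The loop version is my own sketch of the dict approach, not a copy of the provided solution.

```python
def dedupe_fromkeys(items):
    """Order-preserving dedupe via dict keys (Python 3.7+ dicts keep insertion order)."""
    return list(dict.fromkeys(items))


def dedupe_with_dict(items):
    """Same result, spelled out: remember each item the first time we see it."""
    seen = {}
    for item in items:
        if item not in seen:
            seen[item] = True
    return list(seen)


if __name__ == "__main__":
    files = ["b.jpg", "a.jpg", "b.jpg", "c.jpg", "a.jpg"]
    print(dedupe_fromkeys(files))
    print(dedupe_with_dict(files))
```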

Some more differences I learned from:

  • The Google code provided better feedback to the user.
  • They also changed the names of the image files locally, which made them more semantic to the user.

Getting to read/write/create files in this way was so fun, and it’s something I haven’t had an opportunity to do much of in my career up until now. I learned so much from this course, and I look forward to doing much more with Python.

Color Generator: Middleware

I had been struggling with the architecture of my project. I knew I wanted a separate web API that was secured with an API key. Where I was sort of lost was in figuring out where to go from there.

I knew that I didn’t want my Vue app to talk directly to my API because I didn’t want to expose my API key. So my next thought was to route my API requests through Express.

I set about creating two Express routes, one for each API endpoint. This was easy enough. Both functions look almost the same, so here is the one for /generate-color:

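const https = require("https"); // Node's built-in HTTPS client

// `app` is the Express app; API_HOST comes from .env via dotenv
app.get("/generate-color", function(req, res, next) {
    var options = {
        host: API_HOST,
        path: "/api/v1/color"
    };

    var request = https.get(options, function(response) {
        var bodyChunks = [];
        response.on("data", function(chunk) {
            // A chunk can be just part of the JSON text, so collect it
            // here and wait until "end" to parse anything.
            bodyChunks.push(chunk);
        }).on("end", function() {
            var body = Buffer.concat(bodyChunks);
            var parsed;
            try {
                parsed = JSON.parse(body);
            } catch (err) {
                return next(err); // the API sent back something that isn't JSON
            }
            res.status(200).send({
                status: "success",
                message: "here is your color!",
                color: parsed.color
            });
            console.log(parsed);
        }).on("error", function(err) {
            console.log(err);
            next(err);
        });
    });

    request.on("error", function(e) {
        console.log(e.message);
        // Hand the error to Express so the client isn't left waiting forever.
        next(e);
    });
});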

The code is fairly self-contained, but there are a few things to note. First, API_HOST appears to come out of nowhere, but is actually stored in an environment variable. The dotenv package makes it nice and easy to use environment variables in a node project; you define your variables in a .env file, and these two magical lines make them available in your app code:

const dotenv = require("dotenv");
dotenv.config();

Currently, my API doesn’t implement authentication, but if it did, I would also have an API_KEY variable in my .env file.
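Conceptually, dotenv reads KEY=VALUE lines from .env and copies them into the process environment (process.env in Node). To show the idea, here’s a deliberately simplified Python sketch; load_env() is hypothetical and is not how the dotenv package actually parses files (it ignores quoting, escapes, and multiline values), and the variable values are placeholders.

```python
import os


def load_env(text, environ=os.environ):
    """Simplified .env loader: KEY=VALUE per line; blank lines and # comments skipped.

    Like dotenv's default behavior, variables that are already set are not overwritten.
    """
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        environ.setdefault(key.strip(), value.strip())
    return environ


if __name__ == "__main__":
    print(load_env("# placeholder values\nAPI_HOST=api.example.com\nPORT=3000\n", {}))
```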

Second, if you’re not familiar with how requests and responses work in node, this code may look a little odd. What is being pushed to the bodyChunks array? Why is there a bodyChunks array in the first place?

As you may know, Node.js is event-based: an event loop keeps running, checking which events have taken place and calling the listeners registered for them. In this example, we can see three events being used: data, end, and error. We’ll zoom in on the data event. The response object that https.get() hands to our callback is a readable stream, and a stream delivers its contents a small chunk at a time, emitting a data event for each chunk. In my code, we are listening for that data event, and when it happens, we grab that chunk of data and push it to our bodyChunks array. We call it that because the array comprises the chunks of our response body from the server.

Once our stream sends its last chunk of information, it emits the end event. Then it’s time to join all of our chunks into a single Buffer and parse it into something we can use: an object! We pull the color off that object and send it back in our response.
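The reason to collect chunks before parsing is that a chunk boundary can land in the middle of the JSON text. Here’s a small Python sketch that imitates that by slicing a made-up JSON payload into fixed-size byte chunks: parsing the first chunk on its own fails, while joining all the chunks first (the role Buffer.concat plays in the Node code) parses cleanly. The 16-byte chunk size is arbitrary.

```python
import json

# Made-up response body, encoded to bytes like data arriving over the network.
payload = json.dumps({"color": {"hex": "1A2B3C", "rgb": {"r": 26, "g": 43, "b": 60}}}).encode("utf-8")

# Pretend the body arrived in 16-byte pieces.
chunks = [payload[i:i + 16] for i in range(0, len(payload), 16)]

# Parsing a single chunk fails here, because it's only part of the JSON text.
try:
    json.loads(chunks[0])
    first_chunk_parsed = True
except json.JSONDecodeError:
    first_chunk_parsed = False

# Joining every chunk first gives the complete document.
body = b"".join(chunks)
parsed = json.loads(body)
print(first_chunk_parsed, parsed["color"]["hex"])
```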

Next, we’ll talk about how I set up my Vue client.

Storing Colors in MongoDB

Photo by Magda Ehlers from Pexels

I wanted to practice my database skills with this project, so I decided to incorporate a feature that logged each random color generated by a user request to a Mongo database. I chose MongoDB because my experience with relational databases is fairly limited, and I didn’t want to introduce too much complexity all at once. I knew I’d be comfortable (or at least something adjacent to comfortable) with Mongo’s documents.

This commit has the code I’ll be talking about in this post.

First, I had to install the correct driver from npm. I chose the official driver because it seemed like the obvious option for someone who’s never done this before. Next, I had to establish a connection between my app and Mongo. This part of the docs was very helpful to me. I instantiated a new MongoClient object as they instructed and then called client.connect():

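const assert = require("assert");

// Callback style, as used with the older Node driver; newer driver
// versions use promises instead.
client.connect(function(err) {
	assert.equal(null, err); // crash early if the connection failed
	console.log("Connected successfully to database server");

	// Start accepting HTTP requests only once the database is reachable.
	app.listen(PORT, () => {
		console.log(`server running on port ${PORT}`);
	});
});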

Initially, I hard-coded my PORT value, but eventually I wised up and installed dotenv.

I need to test and see if app.listen() needs to be inside the client.connect() callback. I put it there because I didn’t want my app to start until after the connection to the database had been made, but maybe that doesn’t matter. I’ll come back to that another day.

I was expecting to have to create a database and a collection in the shell before I could start coding, but it turns out you can run an insert command on a nonexistent collection in a nonexistent database and both the database and the collection will be created as soon as you insert the data. That’s handy.

My app has two get() handlers, one of which is for the endpoint that generates the random color. In addition to calling my generateColor() function, it takes the returned object and inserts it into the colors collection. Here’s what that get() looks like:

app.get("/api/v1/color", function handleRequest(req, res) {
	console.log("getting a request");
	var color = generateColor();
	var db = client.db(dbName);
	var collection = db.collection("colors");
	// Fire-and-forget: the response below doesn't wait for the insert.
	collection.insertOne(color);
	res.status(200).send({
		success: "true",
		message: "color retrieved successfully",
		color
	});
});

Looking at the code now, I’m wishing that I used the result of collection.insertOne() to inform what kind of response I sent to the user. According to the docs, when you don’t pass a callback, the method returns a promise: it rejects if the insert fails, and otherwise resolves to a result object describing the write, including the insertedId of the new document (newer driver versions also include an acknowledged boolean that reflects the write concern). If I had continued with MongoDB, I’d go and change that now, but because I decided to switch to MySQL, it’ll have to wait.
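Here’s roughly what I mean, sketched in Python so it stays self-contained. FakeCollection and handle_color_request() are hypothetical: the fake’s insert_one() raises on failure and otherwise returns a dict with the new id, loosely mirroring how real drivers report a failed write as an error and a successful one as a result object. The response shapes are made up too.

```python
import itertools


class FakeCollection:
    """Hypothetical in-memory stand-in for a database collection."""

    def __init__(self, fail=False):
        self.docs = []
        self.fail = fail
        self._ids = itertools.count(1)

    def insert_one(self, doc):
        # A real driver raises (or rejects a promise) when the write fails.
        if self.fail:
            raise RuntimeError("write failed")
        doc_id = next(self._ids)
        self.docs.append(doc)
        return {"acknowledged": True, "inserted_id": doc_id}


def handle_color_request(collection, color):
    """Only report success once the insert has actually gone through."""
    try:
        result = collection.insert_one(color)
    except RuntimeError:
        return 500, {"success": False, "message": "could not save color"}
    return 200, {
        "success": True,
        "message": "color retrieved successfully",
        "color": color,
        "id": result["inserted_id"],
    }
```

The point of the design is ordering: the status code is decided after the write outcome is known, instead of sending 200 before the insert has finished.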

Next up: retrieving the stored colors and serving them up to the user.

Let’s Get Colorful

colorful sequins in ROYGBIV order
Photo by Sharon McCutcheon from Pexels

The people who say you should be working on a personal side project are right and wrong at the same time. They’re wrong because it’s a bad practice to expect people to work a full-time job and then do more work in their free time in order to be considered viable in the industry. They’re right because it’s what potential employers will always ask you about, particularly if your current full-time role is something they perceive as nontechnical, like teaching or managing.

I’ve always felt paralyzed by the prospect of coming up with an idea for a side project. I don’t have any business ideas. I don’t have any interest in business, period. And the problems in my life that could be improved by technology usually call for a small tool, typically something to make repetitive tasks easier.

I love making tools. I wilt in the oppressive doldrums of repetitive mundanity, and I don’t even mind if building and using the tool doesn’t save me time. Breaking even is a win as long as I’m taking on a challenge.

Another thing that sparks joy for me is asking myself: how can I make this colorful? Life can be so boring. Where are the rainbow sprinkles? This was the inspiration for Roy, a tool that just makes coloring your Node.js stdout output a little bit easier.

function generateColor() {
	// Each channel is a random integer from 0 to 255
	let r = Math.random();
	r = Math.floor(r * 256);
	let g = Math.random();
	g = Math.floor(g * 256);
	let b = Math.random();
	b = Math.floor(b * 256);
	// Two-digit uppercase hex per channel, padding single digits with a leading 0
	let rString = r.toString(16).toUpperCase();
	let rHex = rString.length === 2 ? rString : `0${rString}`;
	let gString = g.toString(16).toUpperCase();
	let gHex = gString.length === 2 ? gString : `0${gString}`;
	let bString = b.toString(16).toUpperCase();
	let bHex = bString.length === 2 ? bString : `0${bString}`;
	let color = {
		rgb: {r, g, b},
		hex: `${rHex}${gHex}${bHex}`,
		timestamp: Date.now()
	};
	// Return the color itself; the route handler decides whether to store it.
	return color;
}

Recently, a human I trust and respect suggested that I Work On A Personal Project. She was right, so I started racking my brain about what I should make. I hate this stage because I JUST WANT TO MAKE THINGS! I don’t want to pull an idea out of thin air. So I decided on the simplest concept I could think of: a web app that just served up a random RGB color. And thus, an app was born.

This is it … this function is the whole idea. I’m going to create a web service and deploy it to increasingly complex environments and learn stuff. I’m also going to build a client because, well, I’m primarily a front-end dev right now. I should probably have a client.
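For a side-by-side with the Python I’ve been learning, here’s the same idea as a Python sketch (generate_color() is my own port, not part of the app). random.randint(0, 255) includes both endpoints, matching Math.floor(Math.random() * 256), and the :02X format spec does the zero-padding and uppercasing that the rString.length checks do by hand. The timestamp is in milliseconds to match Date.now(), and the function returns the color object itself, leaving storage to the caller.

```python
import random
import time


def generate_color():
    """Return a random RGB color with its hex string and a millisecond timestamp."""
    r = random.randint(0, 255)
    g = random.randint(0, 255)
    b = random.randint(0, 255)
    return {
        "rgb": {"r": r, "g": g, "b": b},
        # 02X: two uppercase hex digits, zero-padded (e.g. 10 -> "0A")
        "hex": f"{r:02X}{g:02X}{b:02X}",
        "timestamp": int(time.time() * 1000),
    }


if __name__ == "__main__":
    print(generate_color())
```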

Next step: Heroku.