I was halfway through a master’s in Computer Science when my vision changed. I was working as a data scientist during my summer off from school, and I had friends who said things like “I’m at a local maximum” in normal conversation. Wherever I looked, suboptimality began standing out as though highlighted in the angry red of a programming error.
There was the athletic-looking youth whose slow amble blocked the walking path so no one else could pass—suboptimal. There was the friend I ran errands with who planned stops in an inefficient order, so the driving took three Nicki Minaj songs longer than it needed to—suboptimal. And there was me. I could rarely go an hour without at some point becoming forgetful, distracted, tired, or slow—embarrassingly suboptimal.
One of the first things I’d learned about optimization was that something is optimal if it is equal or preferable to any alternative. To optimize an experience, then, is to shepherd it toward the preferable.
Decision-making is generally hard because you don’t know what each choice will result in. You do have a lifetime of data on how your actions have historically played out, though, and with that you can guess at which option will have the most preferable outcome. This is the basic idea behind reinforcement learning, which underpins the AI that can learn to play video games and Go; other problems in the field sport names like The Restless Multi-Armed Bandit. When a computer agent makes a choice that yields a favorable outcome in reinforcement learning, the memory that the choice was “good” goes on to affect future decision-making, reinforcing the behavior. If a day of being alive is also a series of decisions, could an algorithm successfully optimize my life, too?
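The "reinforcing" step described above boils down to a one-line update: nudge the remembered value of an action toward the reward its latest outcome produced. A minimal sketch, with the action names, the 0-to-1 reward scale, and the learning rate all chosen purely for illustration:

```python
def reinforce(values, action, reward, learning_rate=0.1):
    """Nudge the remembered value of `action` toward the observed reward.

    Favorable outcomes raise the stored value, making the action more
    likely to be picked when the agent next chooses based on experience.
    """
    values[action] += learning_rate * (reward - values[action])
    return values[action]

# A tiny value table: how "good" each choice has seemed so far.
values = {"sleep in": 0.0, "get up": 0.0}
reinforce(values, "sleep in", reward=1.0)  # a pleasant snooze
```

Repeat this update over many mornings and the table comes to encode which choice has historically paid off, which is all the "memory" an agent like this has.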
The question sounded simple, but I couldn’t stop wondering about it. Finally, I decided: I would try to formally optimize my daily life. On one Saturday, I would make decisions using an algorithm I had sketched out for choosing optimal actions. My algorithm was a rough translation of Q-Learning—one of the simpler reinforcement learning algorithms—into steps a human could follow.
Here’s how it worked: When I had a decision to make, I’d first convert it into a set of actions to choose between. I would then decide which one to choose with the help of a random number generator on my phone. The RNG would produce a number between one and 100. If that number was six or higher, I’d go with the option that had historically led to the most preferable results. An actual reinforcement learning algorithm helps score how preferable a given option is based on the computer agent’s past observations. I would crudely approximate this by reflecting on the outcomes of similar decisions I’d made in the past.
If the random number I got was five or lower, however, I would “explore” and take a random option instead. This option would be chosen by generating a second random number. For example, to pick a random option out of a set of five possibilities, I’d split the numbers 1 to 100 into five buckets. The bucket for the first option would have the numbers 1 through 20, the bucket for the second option would have the numbers 21 through 40, and so on. The option I picked would be the one whose bucket contained the new random number I rolled.
With a cutoff of five, I would be choosing a random option for about one in every 20 decisions I made with my algorithm. I picked five as the cutoff because it seemed like a reasonable frequency for occasional randomness. For go-getters, there are further optimization processes for deciding what cutoff to use, or even for changing the cutoff value as learning continues; your best bet is often to try some values and see which is the most effective. Reinforcement learning algorithms take occasional random actions precisely because they otherwise rely on past experience: always selecting the predicted best option could mean missing out on a better choice that has never been tried.
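The two-roll procedure above is a human-paced version of what reinforcement learning calls an epsilon-greedy policy. A minimal sketch in Python, with the option names and value scores purely illustrative:

```python
import random

def choose(options, value_estimates, explore_cutoff=5):
    """Epsilon-greedy choice, mirroring the two-roll procedure above.

    options: list of possible actions
    value_estimates: dict mapping each option to a (subjective) score
    explore_cutoff: a first roll at or below this (out of 100) triggers
        exploration instead of the historically best option
    """
    first_roll = random.randint(1, 100)
    if first_roll <= explore_cutoff:
        # Explore: a second roll of 1-100, split into equal buckets,
        # picks one option uniformly at random.
        second_roll = random.randint(1, 100)
        bucket_size = 100 / len(options)
        index = min(int((second_roll - 1) // bucket_size), len(options) - 1)
        return options[index]
    # Exploit: fall back on whichever option has scored best historically.
    return max(options, key=lambda o: value_estimates[o])
```

With `explore_cutoff=5`, roughly one call in 20 explores; set it to 0 and the function always exploits, set it to 100 and it always gambles.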
I doubted that this algorithm would truly improve my life. But the optimization framework, backed up by mathematical proofs, peer-reviewed papers, and billions in Silicon Valley revenues, made so much sense to me. How, exactly, would it fall apart in practice?
The first decision? Whether to get up at 8:30 like I’d planned. I turned my alarm off, opened the RNG, and held my breath as it spun and spit out … a 9!
Now the big question: In the past, has sleeping in or getting up on time produced more preferable results for me? My intuition screamed that I should skip any reasoning and just sleep in, but for the sake of fairness, I tried to ignore it and tally up my hazy memories of morning snoozes. The joy of staying in bed was greater than that of an unhurried weekend morning, I decided, as long as I didn’t miss anything important.
I had a group project meeting in the morning and some machine learning reading to finish before it started (“Bayesian Deep Learning via Subnetwork Inference,” anyone?), so I couldn’t sleep for long. The RNG instructed me to decide based on previous experience whether to skip the meeting; I opted to attend. To decide whether to do my reading, I rolled again and got a 5, meaning I would choose randomly between doing the reading and skipping it.
It was such a small decision, but I was surprisingly nervous as I prepared to roll another random number on my phone. If I got a 50 or lower, I would skip the reading to honor the “exploration” component of the decision-making algorithm, but I didn’t really want to. Apparently, shirking your reading is only fun when you do it on purpose.
I pressed the GENERATE button.
65. I would read after all.
I wrote out a list of options for how to spend the swath of free time I now faced. I could walk to a distant café I’d been wanting to try, call home, start some schoolwork, look at PhD programs to apply to, go down an irrelevant internet rabbit hole, or take a nap. A high number came out of the RNG—I would need to make a data-driven decision about what to do.
This was the day’s first decision more complicated than yes or no, and the moment I began puzzling over how “preferable” each option was, it became clear that I had no way to make an accurate estimate. When an AI agent following an algorithm like mine makes decisions, computer scientists have already told it what qualifies as “preferable.” They translate what the agent experiences into a reward score, such as time survived in a video game or money earned on the stock market, which the AI then tries to maximize. Reward functions can be tricky to define, though. An intelligent cleaning robot is a classic example: if you instruct the robot simply to maximize pieces of trash thrown away, it could learn to knock over the trash can and put the same trash away again to increase its score.
The longer I thought about which of my options was most preferable, the more uncomfortable I felt. How could I possibly measure the excitement of the new café against the comfort of a nap or the relief of making progress on those nagging applications? It seemed that these outcomes were utterly incomparable. Any estimate of their values would invariably fall short. And yet, the very definitions of “optimal” and “preferable” required that I compare them.
Before I knew it, I’d spent half an hour thinking about my options. Any metric I imagined for preferability was flawed. Decisions made using measurements are doomed to overvalue factors that can be measured: salary over fulfillment in careers, quantity over quality in friendships. Unfortunately, we owe the richest moments of being human to emotions we can’t measure accurately. At least not yet.
What’s more, the options I gave myself for each decision were far more complex than those a computer scientist would offer an agent. These are generally along the lines of “step left,” “turn on this motor,” or “sell this stock,” basic actions that offer a more general set of possibilities for what the agent can achieve. Imagine if instead of giving myself a limited list of ways to spend free time, I repeatedly picked a specific muscle to move—I could theoretically go anywhere or do anything by coming up with a sequence of discrete motions! The tradeoff is that most combinations of very basic actions would be useless, and figuring out which would be useful would be harder. I certainly wouldn’t have known how to make data-driven decisions about muscle movement. Some combinations of basic actions can also lead an agent to harm, which is fine in a computer simulation but not in real life. What if the random number gods assigned me muscle movements for doing the splits?
Overall, AI delivers “exactly what we ask for—for better or for worse,” in the words of Janelle Shane. My algorithm couldn’t pave the way to a perfect life if I didn’t have a clear vision of what that life ought to look like. Articulating what “optimal” means is also difficult when you apply AI to real problems. To encourage intelligent-looking behavior, sometimes “optimal” is defined as “hard to distinguish from human performance.” This has helped produce text-generation models whose writing sounds impressively human, but these models also learn human flaws and human prejudices. We are left wondering what it means to be optimally fair, safe, and helpful when we manage, care for, and interact with other people, concerns that have puzzled humanity since long before the advent of the computer.
Finally, lunchtime came. Once again, I could use the structure of the day to make decisions for me.
A deadline was creeping up on me. Starting my writing assignment and finishing it quickly would be the optimal use of my time. However, no matter what I tried, I remained a slow writer.
In general, I believe that having more of certain things—namely health, time, money, and energy—is always preferable. But we can lose a lot when we optimize for these four goals. Beyond paying in one to obtain another, there are compelling arguments that fixating on optimization can make people less connected to reality and unduly obsessed with control.
Remember, however, that optimization doesn’t necessarily imply blind efficiency. It can also create opportunities for humility and reflection, or reveal preferences we aren’t aware of.
For me, optimizing something at any scale—even scheduling laundry day so no item is dirty or mid-wash right when I want to wear it—is deeply satisfying. But this preference for optimization had gone from a tool for eliminating distractions and boosting productivity to a distraction itself, an end rather than a means of moving in some greater direction. Unfortunately, identifying a direction is the hardest problem of all.
The writing I was working on would eventually become this essay, but I ended up scrapping everything I wrote that afternoon. Working faster would only have sent me farther in the wrong direction.
As I was heading out to meet some friends, I squeezed a final round of decisions out of my optimization algorithm. What do I eat for dinner? What do I wear? How much do I drink? A couple of RNG spins instructed me to pick a random jacket and estimate the most preferable option for everything else.
For much of the day, generating the random numbers had felt reassuring, as though my commitment to the complex and logical RNG ritual meant I deserved optimization participation points. When I caught myself excited about how the restaurant’s menu had many dishes I’d never tried before, I had to acknowledge that the RNG process hadn’t been necessary: I like trying new things even without an algorithm telling me to.
I’m a terrible lightweight, so the drinking decision was the easiest one. I could have 2.5 drinks, max, or I’d suffer awful physical discomfort later.
Half tipsy, I finally asked the two friends I was with what optimizing life meant to them.
Rajath said what you’d expect to hear yelled over the din of a bar: “Do what makes you happy, and be with people who make you happy.”
Yejun’s answer was unexpectedly clear and specific, almost ready for conversion into an algorithm. She must think about this a lot. “Optimal is when you only do things that make you happy. You don’t have to do anything you don’t want to. Any task comes with a reward.”
Happy. That’s a direction, right? Just then, our server came out with twice as much sangria as we’d ordered. He’d made a mistake, he said in a kind voice, and we should enjoy the extra drinks on the house. I hesitated for a second, thinking of my earlier optimized decision, then accepted. After all, didn’t optimization mean doing what makes you happy?
I lay in bed for an eternity, sweating and panting and swearing my way through a headache and a too-fast heartbeat and the itchy, angry red flush that had crept over my skin. It was exactly the feeling I always promised myself I would never feel again, suboptimal in every way.