You've heard of the alignment problem in AI?

Aaron

You've heard of the alignment problem in AI? This is where it will really start to be an issue:

LLMs Can Now Self-Evolve At Test Time Using Reinforcement Learning | by Dr. Ashish Bamania | Jul, 2025 | AI Advances https://ai.gopubby.com/llms-can-now-self-evolve-at-test-time-using-reinforcement-learning-e769ee6d3f86

#LLM
#alignment
#AI

Aaron

Reinforcement learning is what inspired the concept in the first place. It's been known for decades now that if your RL "reward function" (or, equivalently, "cost function", "loss function", or "fitness function") isn't carefully designed, the system will often discover loopholes -- unintended shortcuts that lead to high reward via behaviors that violate the original design constraints of the system. This happens because the real world is complicated, and it's *really* hard to design robust reward functions for anything even remotely approximating the complexity of the real world.

Aaron

You may have heard of the boat-racing video game where the system learned to go in tight circles to maximize reward instead of completing the race? For a more familiar and realistic example, look at drug-seeking behavior in drug addiction and how it short-circuits the ability of a person to care for themself.

Aaron

I refuse to call them "general intelligence", but LLMs are of sufficient complexity that they are used to interface with the real world. This opens them up to all sorts of short-circuits in their reward functions. To have them learn dynamically within the environment is a huge risk, and there needs to be a rich framework of safeguards around their behaviors to even consider letting them loose to operate "in the wild".

Aaron

I don't think we'll ever see the so-called "singularity", to be clear. We already have extremely intelligent human beings, and it has been observed that at a certain point, the correlation between IQ and wealth/power breaks down. But that doesn't mean that these LLM-based agents won't be capable of a remarkable level of destructiveness if we aren't careful with them. We have safeguards to keep human beings in check. We need them for artificial agents, as well.

Aaron

In fact, we should probably have a richer set of safeguards around artificial agents. At least human beings mostly have morals, consciences, ethics, and fear of consequences. These artificial agents will have no such bounds unless we build them in.

Aaron

Something you may be surprised to learn, is that intelligence and self-awareness alone are not enough to give a being feelings or emotions. Hollywood got it wrong. There is no spontaneous waking up and developing a will. In an artificial system, the will *is* the reward function; it is whatever we shape it to be. You program in a reward for killing people, and the system will like killing people, period. You program in a reward for making people smile, and the system will like making people smile, period. It will find the shortest path to that, even if it means pulling a Joker-style trick and surgically altering your face.

myrmepropagandist

@hosford42

Even ants have emotions. Or, if that's too anthropomorphic for you, they have persistent identifiable states that influence the way that they make choices in ways that are consistent but different based on individual temperament and experience.

Ants can be in a state of alarm and will see biting or stinging as a more attractive solution. Ants that are calm will investigate more with their antennae.

Chebucto Regional Softball Club

You've heard of the alignment problem in AI?