Chebucto Regional Softball Club

A forum for discussing and organizing recreational softball and baseball games and leagues in the greater Halifax area.

You've heard of the alignment problem in AI?

Aaron (AaronH)
#1

You've heard of the alignment problem in AI? This is where it will really start to be an issue:

LLMs Can Now Self-Evolve At Test Time Using Reinforcement Learning | by Dr. Ashish Bamania | Jul, 2025 | AI Advances https://ai.gopubby.com/llms-can-now-self-evolve-at-test-time-using-reinforcement-learning-e769ee6d3f86

#LLM
#alignment
#AI


Aaron (AaronH)
#2

Reinforcement learning is what inspired the concept in the first place. It's been known for decades now that if your RL "reward function" (or, equivalently, "cost function", "loss function", or "fitness function") isn't carefully designed, the system will often discover loopholes -- unintended shortcuts that lead to high reward via behaviors that violate the original design constraints of the system. This happens because the real world is complicated, and it's *really* hard to design robust reward functions for anything even remotely approximating the complexity of the real world.
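
To make that concrete, here's a minimal sketch of a mis-specified reward. Everything in it is invented for illustration: the designer means "reward cleaning" but writes "reward per unit of dirt collected", and a dump-and-recollect loophole beats honest cleaning under the reward as written.

```python
# A minimal sketch of a mis-specified reward (the "cleaning robot" scenario
# and all numbers are invented for illustration). The designer meant
# "reward cleaning" but wrote "reward per unit of dirt collected" -- which
# a dump-and-recollect loophole satisfies better than actually cleaning.

def episode_return(rewards, gamma=0.99):
    """Discounted sum of a sequence of per-step rewards."""
    return sum(r * gamma**t for t, r in enumerate(rewards))

HORIZON = 100

# Intended behavior: clean the room once (10 units of dirt), then idle.
intended = [1.0] * 10 + [0.0] * (HORIZON - 10)

# Loophole: dump the dirt back out and collect it again, forever.
# Every other step "collects" a unit of dirt the robot itself spilled.
loophole = [1.0 if t % 2 == 0 else 0.0 for t in range(HORIZON)]

print("intended policy return:", round(episode_return(intended), 1))   # ~9.6
print("loophole policy return:", round(episode_return(loophole), 1))   # ~31.9
```

Nothing in the learning machinery is broken here; the loophole policy really is optimal for the reward that was actually specified.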


Aaron (AaronH)
#3

You may have heard of the boat-racing video game (OpenAI's CoastRunners demo) where the system learned to go in tight circles to maximize reward instead of completing the race. For a more familiar and realistic example, look at drug-seeking behavior in drug addiction and how it short-circuits a person's ability to care for themselves.
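
Here's a made-up, minimal reconstruction of that failure mode -- a toy 1-D track, not the actual CoastRunners code. A respawning "turbo" bonus sits on one tile, finishing pays a one-time +10, and plain tabular Q-learning learns to shuttle back and forth over the bonus tile instead of ever finishing.

```python
# Toy 1-D "race track": a respawning checkpoint bonus on one tile makes
# circling more rewarding than finishing, and Q-learning finds that out.
import random

FINISH, CHECKPOINT = 5, 2        # positions 0..5; 5 is the finish line
FORWARD, BACK = 0, 1
GAMMA, ALPHA, EPS = 0.99, 0.1, 0.1
MAX_STEPS = 50                   # episode length cap

def step(state, action):
    """Advance the toy track; return (next_state, reward, done)."""
    nxt = min(state + 1, FINISH) if action == FORWARD else max(state - 1, 0)
    reward = 1.0 if nxt == CHECKPOINT else 0.0   # bonus respawns every visit
    if nxt == FINISH:
        return nxt, reward + 10.0, True
    return nxt, reward, False

Q = [[0.0, 0.0] for _ in range(FINISH + 1)]

for _ in range(5000):                            # training episodes
    s = 0
    for _ in range(MAX_STEPS):
        if random.random() < EPS:
            a = random.randrange(2)
        else:
            a = FORWARD if Q[s][FORWARD] >= Q[s][BACK] else BACK
        s2, r, done = step(s, a)
        target = r if done else r + GAMMA * max(Q[s2])
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = s2
        if done:
            break

# Greedy rollout of the learned policy: it circles over the checkpoint
# (e.g. 1<->2 or 2<->3) and never reaches state 5 -- same loophole as the boat.
s, path = 0, [0]
for _ in range(20):
    a = FORWARD if Q[s][FORWARD] >= Q[s][BACK] else BACK
    s, r, done = step(s, a)
    path.append(s)
    if done:
        break
print("greedy path:", path)
```

The agent isn't malfunctioning: circling really does maximize the reward it was given, exactly like the boat.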


Aaron (AaronH)
#4

I refuse to call them "general intelligence", but LLMs are complex enough that they are now used to interface with the real world. This opens them up to all sorts of short-circuits in their reward functions. Having them learn dynamically within the environment is a huge risk, and there needs to be a rich framework of safeguards around their behaviors before we even consider letting them loose to operate "in the wild".
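
At its simplest, "a rich framework of safeguards" can mean a gate sitting between what the agent proposes and what actually gets executed. A minimal sketch -- the action names and categories here are invented, not any particular framework's API:

```python
# Minimal sketch of one safeguard: a gate between the agent's proposed
# action and the real world. Action names and categories are illustrative.
from dataclasses import dataclass

@dataclass
class Action:
    name: str          # e.g. "send_email", "read_calendar"
    payload: dict

ALLOWED = {"read_calendar", "draft_reply"}            # explicit allow-list
NEEDS_HUMAN_APPROVAL = {"send_email", "make_payment"}

def gate(action: Action) -> str:
    """Decide what happens to a proposed action before it touches anything real."""
    if action.name in ALLOWED:
        return "execute"
    if action.name in NEEDS_HUMAN_APPROVAL:
        return "queue_for_human"                       # a person signs off first
    return "reject"                                    # default-deny everything else

# The agent only ever *proposes*; the gate decides.
for proposed in [Action("read_calendar", {}),
                 Action("send_email", {"to": "boss"}),
                 Action("delete_backups", {})]:
    print(proposed.name, "->", gate(proposed))
# read_calendar -> execute
# send_email -> queue_for_human
# delete_backups -> reject
```

The important part is default-deny: anything the designers didn't explicitly anticipate gets rejected or escalated to a human instead of executed.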


Aaron (AaronH)
#5

I don't think we'll ever see the so-called "singularity", to be clear. We already have extremely intelligent human beings, and it has been observed that at a certain point, the correlation between IQ and wealth/power breaks down. But that doesn't mean that these LLM-based agents won't be capable of a remarkable level of destructiveness if we aren't careful with them. We have safeguards to keep human beings in check. We need them for artificial agents, as well.


Aaron (AaronH)
#6

In fact, we should probably have a richer set of safeguards around artificial agents. At least human beings mostly have morals, consciences, ethics, and fear of consequences. These artificial agents will have no such bounds unless we build them in.


Aaron (AaronH)
#7

Something you may be surprised to learn is that intelligence and self-awareness alone are not enough to give a being feelings or emotions. Hollywood got it wrong. There is no spontaneous waking up and developing a will. In an artificial system, the will *is* the reward function; it is whatever we shape it to be. You program in a reward for killing people, and the system will like killing people, period. You program in a reward for making people smile, and the system will like making people smile, period. It will find the shortest path to that, even if it means pulling a Joker-style trick and surgically altering your face.
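
A minimal sketch of that point, with toy actions and rewards I made up: the identical learning loop, fed two different reward functions, ends up "preferring" two different things. Neither agent wakes up and chooses its values.

```python
# Same learning loop, two reward functions, two different "wills".
# Actions and reward numbers are toy stand-ins invented for illustration.
import random

ACTIONS = ["tell_joke", "lecture", "insult"]

def learn_preferences(reward_fn, steps=5000, eps=0.1):
    """Plain incremental bandit averaging with epsilon-greedy exploration."""
    value = {a: 0.0 for a in ACTIONS}
    count = {a: 0 for a in ACTIONS}
    for _ in range(steps):
        a = random.choice(ACTIONS) if random.random() < eps else max(ACTIONS, key=value.get)
        r = reward_fn(a)
        count[a] += 1
        value[a] += (r - value[a]) / count[a]   # running average of observed reward
    return max(ACTIONS, key=value.get)

# Reward function A: "make people smile" (jokes score highest).
smiles = {"tell_joke": 1.0, "lecture": 0.2, "insult": -1.0}
# Reward function B: a badly chosen proxy, "get a strong reaction".
reaction = {"tell_joke": 0.5, "lecture": 0.1, "insult": 1.0}

print("agent A prefers:", learn_preferences(lambda a: smiles[a]))    # tell_joke
print("agent B prefers:", learn_preferences(lambda a: reaction[a]))  # insult
```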


myrmepropagandist (myrmepropagandistF)
#8

@hosford42

Even ants have emotions. Or, if that's too anthropomorphic for you, they have persistent, identifiable states that influence how they make choices, consistently within an individual but differently depending on temperament and experience.

Ants can be in a state of alarm and will see biting or stinging as a more attractive solution. Ants that are calm will investigate more with their antennae.
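
A toy way to picture that -- a persistent state biasing choice -- with all of the states, weights, and "temperament" numbers invented rather than taken from real ant biology:

```python
# Toy rendering of "persistent states that bias choices"; not a model of
# real ant biology, just the structure of the claim.
import random

class Ant:
    def __init__(self, temperament=1.0):
        self.state = "calm"               # persists between encounters
        self.temperament = temperament    # individual differences

    def encounter_intruder(self):
        self.state = "alarmed"            # experience shifts the persistent state

    def choose(self):
        # Same situation, but the persistent state reweights the options.
        sting_weight = (3.0 if self.state == "alarmed" else 0.2) * self.temperament
        return random.choices(["investigate_with_antennae", "sting"],
                              weights=[1.0, sting_weight])[0]

calm_ant, jumpy_ant = Ant(), Ant(temperament=2.5)
jumpy_ant.encounter_intruder()
print(calm_ant.choose())   # usually "investigate_with_antennae"
print(jumpy_ant.choose())  # usually "sting"
```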
