When discussing AI, many people focus on the impressive range of capabilities the technology brings to the table, such as personal assistants and self-driving cars. However, for these functions to deliver the most utility to human users, the AI behind them needs to learn and execute its tasks in a way that aligns with the end users' success metrics and standards. This is where the concept of AI alignment comes into play. AI alignment is the study and practice of building AI utility functions that are in line with our own values. In practice, this requires the designer to establish a detailed point system that assigns points based on the positive or negative utility that human end users realize from the outcomes of specific actions. If the point system is specified with too little detail, negative outcomes can follow.
A simple example of AI misalignment can be seen in Disney's Fantasia, where Mickey Mouse brings a broom to life and orders it to fill a cauldron. Because the broom was not aligned to the task in sufficient detail, it ends up flooding the room. The utility function in this case can be summarized as "full cauldron = 1 point, empty cauldron = 0 points." If we were to apply AI alignment principles to this situation, the function would include more detail to align the intelligent agent's values with those of the end user, such as "room floods = -10 points, someone dies in the pursuit or as a result of this task = -1,000 points, task completed within 10 minutes = +0.2 points," and so on. By adding this nuance, the AI can complete the task as the end user intended, with far less room for unintended consequences.
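The point system above can be sketched in code. This is a minimal, hypothetical illustration (the outcome names and point values are taken from the Fantasia example, not from any real alignment system): a naive utility function that rewards only the stated goal, next to one that also scores the side effects the end user cares about.

```python
def naive_utility(outcome):
    # Rewards only the literal goal: a full cauldron.
    return 1 if outcome.get("cauldron_full") else 0

def aligned_utility(outcome):
    # Starts from the same goal, then adds the extra terms from the
    # example: penalties for side effects and a small speed bonus.
    score = 1 if outcome.get("cauldron_full") else 0
    if outcome.get("room_flooded"):
        score -= 10
    if outcome.get("someone_dies"):
        score -= 1000
    if outcome.get("minutes_elapsed", float("inf")) <= 10:
        score += 0.2
    return score

# The broom's actual outcome: the cauldron is full, but the room is flooded.
flooding = {"cauldron_full": True, "room_flooded": True, "minutes_elapsed": 5}
print(naive_utility(flooding))    # the naive function still scores this a success
print(aligned_utility(flooding))  # the richer function scores it negatively
```

Under the naive function, flooding the room is a perfect outcome; under the richer one, the flood penalty dominates and the agent is pushed toward the behavior the user actually wanted.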
What are some other examples of proper or improper AI alignment in technology today? How can integrative thinking be applied to AI alignment? How do differing cultures impact deriving end user utility?