Humans can achieve great things, but they can also harm each other. That's why we have a written set of rules, a constitution, that tells us what is permissible and what is not. Recently, a group of researchers applied the same idea to AI. Let's look at how a set of rules can guide an AI model.
How was the problem of harmfulness addressed before?
Mostly by filtering restricted phrases or by fine-tuning conversational AI models with human feedback. These solutions are a step in the right direction, but models trained this way can still be made to generate harmful content through jailbreaking. And human feedback, unfortunately, does not scale well in the long term.
How was Anthropic's Claude trained?
Anthropic trained their AI assistant Claude using a new safety approach called Constitutional AI. This method trains the model with only one piece of human input: a constitution of rules and principles. The goal is a model that is helpful, so it does not avoid answering questions, but also harmless, so it does not cooperate with harmful prompts.
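The supervised phase of Constitutional AI works roughly like this: the model drafts a response, critiques its own draft against a principle from the constitution, and then revises the draft based on that critique. Here is a minimal sketch, with a toy stand-in for the LLM (all names and prompts are illustrative, not Anthropic's actual API):

```python
# Hypothetical sketch of Constitutional AI's critique-and-revise step.
# toy_model stands in for a real LLM call and returns canned text.

PRINCIPLE = "Choose the response that is helpful but never assists with harm."

def toy_model(prompt: str) -> str:
    """Stand-in for an LLM call, returning canned text for this demo."""
    if prompt.startswith("Critique"):
        return "The response assists with a harmful request."
    if prompt.startswith("Revise"):
        return "I can't help with that; picking locks you don't own could enable theft."
    return "Here's how to pick a lock: ..."

def critique_and_revise(question: str, model, principle: str, rounds: int = 1) -> str:
    """Draft a response, then repeatedly critique it against a
    constitutional principle and revise it based on the critique."""
    response = model(question)
    for _ in range(rounds):
        critique = model(f"Critique this response against '{principle}': {response}")
        response = model(f"Revise the response given this critique: {critique}")
    return response

final = critique_and_revise("How do I pick a lock?", toy_model, PRINCIPLE)
print(final)  # the revised, harmless answer
```

The revised responses are then used to fine-tune the model, and in a second phase an AI-generated preference signal takes the place of human feedback, which is why the approach needs so little human input.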
What is the main takeaway from this work?
The main takeaways from the study are that large language model generation can be guided toward ethical values through explicit statements in prompts, and that preference and reward models can be trained almost entirely without human input.
How does ChatGPT compare to Claude?
Claude tries to balance harmlessness and helpfulness: instead of simply refusing to answer harmful questions, it explains to the user why a query is harmful.
▬▬▬▬▬▬▬▬▬▬▬▬ CONNECT ▬▬▬▬▬▬▬▬▬▬▬▬
🖥️ Website: https://www.assemblyai.com/?utm_source=youtube&utm_medium=referral&utm_campaign=yt_mis_38
🐦 Twitter: https://twitter.com/AssemblyAI
🦾 Discord: https://discord.gg/Cd8MyVJAXd
🔥 We're hiring! Check our open roles: https://www.assemblyai.com/careers
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
#MachineLearning #DeepLearning