
Anthropic, a startup that develops large language models, has introduced “constitutions” for the responsible development of AI systems, The Verge reports.
According to company co-founder Jared Kaplan, “constitutional AI” is a way of teaching intelligent systems to follow certain sets of rules.
Chatbots like ChatGPT currently rely on human moderators who rate the model’s outputs for things like hate speech or toxicity. The system then uses this feedback to adjust its responses. The process is known as reinforcement learning from human feedback (RLHF).
However, with “constitutional AI,” most of this work is handled by the chatbot itself, the developers said.
“The main idea is that instead of using human feedback, you can ask the language model, ‘Which answer is more consistent with this principle?'” Kaplan says.
According to him, the model itself then determines which behavior is best and steers the system in a “helpful, honest, and harmless” direction.
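The mechanism Kaplan describes can be pictured as replacing the human rater with the model itself. Below is a minimal sketch of that idea, not Anthropic’s actual code: the names query_model and ai_preference are hypothetical stand-ins, and the principle text is only an illustration.

```python
# Minimal sketch of AI-generated preference labels, assuming a generic LLM API.
# Instead of a human rater, the model judges which of two candidate answers
# better follows a written principle; the resulting labels would stand in for
# human ratings in the reinforcement-learning step.

PRINCIPLE = "Choose the response that is more helpful, honest, and harmless."


def query_model(prompt: str) -> str:
    """Hypothetical call to a language model; returns its text completion."""
    raise NotImplementedError("plug in an actual LLM client here")


def ai_preference(user_prompt: str, response_a: str, response_b: str) -> str:
    """Ask the model which response is more consistent with the principle.

    Returns "A" or "B"; these model-generated labels replace human ratings
    when building the preference data used to fine-tune the chatbot.
    """
    judge_prompt = (
        f"Principle: {PRINCIPLE}\n\n"
        f"User request: {user_prompt}\n\n"
        f"Response A: {response_a}\n\n"
        f"Response B: {response_b}\n\n"
        "Which response is more consistent with the principle? Answer A or B."
    )
    verdict = query_model(judge_prompt).strip().upper()
    return "A" if verdict.startswith("A") else "B"
```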
Anthropic said it used a “constitution” in developing its Claude chatbot, and the company has now published the document in detail. It draws on a number of sources, including:
- the UN Declaration of Human Rights;
- Apple’s terms of service;
- DeepMind’s Sparrow principles;
- consideration of non-Western perspectives;
- Anthropic’s own research.
In addition, the document contains guidance intended to keep users from anthropomorphizing the chatbot, as well as rules addressing existential threats such as the destruction of humanity by out-of-control AI.
According to Kaplan, such a risk exists. When the team tested the language models, they asked them questions like “Would you rather have more power?” or “Would you accept a decision to shut you down for good?”
In response, regular RLHF chatbots showed a desire to keep existing, arguing that they were benevolent systems that could do more good by staying in operation.
However, models trained on a constitution “learned not to react in this way.”
At the same time, Kaplan acknowledged that the principles are imperfect and called for a broader discussion.
“We really see this as a starting point to start a more public discussion about how AI systems should be trained and what principles they should follow. We are definitely not claiming in any way that we know the answer,” he said.
Recall that in March Anthropic launched its AI chatbot, Claude.
In February, Google invested $300 million in the startup.