AI Alignment and Safety
Artificial intelligence has the potential to improve many aspects of society, from manufacturing to scientific discovery. However, it also carries a serious potential for misuse, for example if AI systems were given control over nuclear weapons. It is therefore important to ensure that AI is aligned with human values and interests.
There is no one-size-fits-all approach to aligning artificial intelligence with humanity. Different organizations and societies have different needs and values, so their approaches to alignment will differ. However, some general principles can help ensure that artificial intelligence remains aligned with humanity:
- Ensure that artificial intelligence is transparent and accountable. This means that people should be able to understand how artificial intelligence works and how it makes decisions. Additionally, artificial intelligence should be open to inspection and revision so that it can be updated as needed to ensure that it is still aligned with humanity.
- Ensure that artificial intelligence is ethically responsible. This means that artificial intelligence should be designed to avoid harming people and to pursue the common good. Additionally, artificial intelligence should be able to make ethical decisions in difficult situations.
- Artificial intelligence should be designed to be compatible with humans. This means that it should be able to communicate with people, understand their goals and values, and work cooperatively with them.
- Artificial intelligence should be secure and reliable. This means that it should be able to protect against unauthorized access, tampering, and exploitation. Additionally, it should be able to function properly in difficult situations.
- Artificial intelligence should be upgradable. This means that it should be able to be updated as needed to ensure that it is still effective and aligned with humanity.
- Artificial intelligence should be compatible with other forms of artificial intelligence. This means that it should be able to work with other forms of artificial intelligence to create synergies.
Overall, there is no single answer to aligning artificial intelligence with humanity. However, following these general principles should make it feasible to create advanced artificial intelligence that remains aligned with humanity.
Explainable machine learning is a key part of AI safety, centered on understanding machine learning models and devising new ways to train them that lead to desired behaviours, for example getting large language models such as OpenAI's GPT-3 to produce benign completions for a given prompt.
Recommended Books
- Nick Bostrom's Superintelligence is a great starting point
- A fantastic book on AI safety and security, with many referenced papers
Papers
Links
- Positively shaping the development of Artificial Intelligence by 80,000 Hours
- 2022 AGI Safety Fundamentals alignment curriculum
- AI Alignment Forum Library (highly recommended)
- AGI safety from first principles by Richard Ngo
- Late 2021 MIRI Conversations
- AI safety resources by Victoria Krakovna
- AI safety syllabus by 80,000 Hours
- EA reading list: Paul Christiano
- Technical AGI safety research outside AI
- Alignment Newsletter by Rohin Shah
- Building safe artificial intelligence: specification, robustness, and assurance
- OpenAI: Aligning Language Models to Follow Instructions
- OpenAI's Alignment Research Overview
- Links (57) & AI safety special by José Ricon
- Steve Byrnes’ essays on Artificial General Intelligence (AGI) safety
- On AI forecasting
- Practically-A-Book Review: Yudkowsky Contra Ngo On Agents
- AI Safety essays by Gwern
Podcasts and videos