Benutzer:LaMona20/AI safety


AI safety is an interdisciplinary field concerned with preventing accidents, misuse, or other harmful consequences that could result from artificial intelligence (AI) systems. It encompasses machine ethics and AI alignment, which aim to make AI systems moral and beneficial, as well as technical problems such as monitoring systems for risks and making them highly reliable. Beyond AI research, it involves developing norms and policies that promote safety.

File:Power-Seeking Image.png
Some ways in which an advanced misaligned AI could try to gain more power.[1] Power-seeking behaviors may arise because power is useful to accomplish virtually any objective[2] (see instrumental convergence).

AI researchers have widely differing opinions about the severity and primary sources of risk posed by AI technology[3][4][5] – though surveys suggest that experts take high-consequence risks seriously. In two surveys of AI researchers, the median respondent was optimistic about AI overall, but placed a 5% probability on an “extremely bad (e.g. human extinction)” outcome of advanced AI.[3] In a 2022 survey of the natural language processing (NLP) community, 37% agreed or weakly agreed that it is plausible that AI decisions could lead to a catastrophe that is “at least as bad as an all-out nuclear war.”[6] Scholars discuss current risks from critical systems failures,[7] bias,[8] and AI-enabled surveillance;[9] emerging risks from technological unemployment, digital manipulation,[10] and weaponization;[11] and speculative risks from losing control of future artificial general intelligence (AGI) agents.[12]

Some have criticized concerns about AGI. Andrew Ng, for example, compared them in 2015 to "worrying about overpopulation on Mars when we have not even set foot on the planet yet."[13] Stuart J. Russell, on the other hand, urges caution, arguing that "it is better to anticipate human ingenuity than to underestimate it."[14]

Risks from AI began to be seriously discussed at the start of the computer age:

Vorlage:Blockquote

From 2008 to 2009, the Association for the Advancement of Artificial Intelligence (AAAI) commissioned a study to explore and address potential long-term societal influences of AI research and development. The panel was generally skeptical of the radical views expressed by science-fiction authors but agreed that "additional research would be valuable on methods for understanding and verifying the range of behaviors of complex computational systems to minimize unexpected outcomes."[15]

In 2011, Roman Yampolskiy introduced the term "AI safety engineering"[16] at the Philosophy and Theory of Artificial Intelligence conference,[17] listing prior failures of AI systems and arguing that "the frequency and seriousness of such events will steadily increase as AIs become more capable."[18]

In 2014, philosopher Nick Bostrom published the book Superintelligence: Paths, Dangers, Strategies. He argues that the rise of AGI could create various societal problems, ranging from the displacement of the workforce by AI and the manipulation of political and military structures to the possibility of human extinction.[19] His argument that future advanced systems may pose a threat to human existence prompted Elon Musk,[citation needed] Bill Gates,[20] and Stephen Hawking[21] to voice similar concerns.

In 2015, dozens of artificial intelligence experts signed an open letter on artificial intelligence calling for research on the societal impacts of AI and outlining concrete directions.[22] To date, the letter has been signed by over 8000 people including Yann LeCun, Shane Legg, Yoshua Bengio, and Stuart Russell.

In the same year, a group of academics led by professor Stuart Russell founded the Center for Human-Compatible AI at the University of California, Berkeley, and the Future of Life Institute awarded $6.5 million in grants for research aimed at "ensuring artificial intelligence (AI) remains safe, ethical and beneficial."[23]

In 2016, the White House Office of Science and Technology Policy and Carnegie Mellon University announced The Public Workshop on Safety and Control for Artificial Intelligence,[24] which was one of a sequence of four White House workshops aimed at investigating "the advantages and drawbacks" of AI.[25] In the same year, Concrete Problems in AI Safety – one of the first and most influential technical AI safety agendas – was published.[26]

In 2017, the Future of Life Institute sponsored the Asilomar Conference on Beneficial AI, where more than 100 thought leaders formulated principles for beneficial AI including "Race Avoidance: Teams developing AI systems should actively cooperate to avoid corner-cutting on safety standards."[27]

In 2018, the DeepMind Safety team outlined AI safety problems in specification, robustness, and assurance.[28] The following year, researchers organized a workshop at ICLR that focused on these problem areas.[29]

In 2021, Unsolved Problems in ML Safety was published, outlining research directions in robustness, monitoring, alignment, and systemic safety.[30]

In 2023, Rishi Sunak said he wants the United Kingdom to be the "geographical home of global AI safety regulation" and to host the first global summit on AI safety.[31]

AI safety research areas include robustness, monitoring, and alignment.[30][28]

Adversarial robustness


AI systems are often vulnerable to adversarial examples or “inputs to machine learning (ML) models that an attacker has intentionally designed to cause the model to make a mistake”.[32] For example, in 2013, Szegedy et al. discovered that adding specific imperceptible perturbations to an image could cause it to be misclassified with high confidence.[33] This continues to be an issue with neural networks, though in recent work the perturbations are generally large enough to be perceptible.[34][35][36]

Carefully crafted noise can be added to an image to cause it to be misclassified with high confidence.

All of the images on the right are predicted to be an ostrich after the perturbation is applied. (Left) is a correctly predicted sample, (center) perturbation applied magnified by 10x, (right) adversarial example.[33]
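As a rough illustration of how a small, bounded perturbation can flip a prediction, the sketch below applies a single fast-gradient-sign step to a toy logistic-regression classifier. This is not the optimization procedure used by Szegedy et al.; the fast gradient sign method is a simpler, later technique swapped in here for brevity. The model weights, the input, and the names `predict` and `fgsm_perturb` are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Probability of class 1 under a toy logistic-regression model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def fgsm_perturb(weights, bias, x, label, epsilon):
    """One fast-gradient-sign step: shift every feature by +/- epsilon
    in the direction that increases the cross-entropy loss."""
    p = predict(weights, bias, x)
    # For logistic regression, d(loss)/d(x_i) = (p - label) * w_i.
    grad = [(p - label) * w for w in weights]
    return [xi + epsilon * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]


# Hypothetical model and input (assumptions, not taken from any cited paper).
weights = [0.5] * 100
bias = 0.0
x = [0.02] * 100  # clean input whose true label is 1

x_adv = fgsm_perturb(weights, bias, x, label=1, epsilon=0.05)
clean_prob = predict(weights, bias, x)      # above 0.5: classified as class 1
adv_prob = predict(weights, bias, x_adv)    # below 0.5: classified as class 0
```

No single feature moves by more than `epsilon`, yet the many small shifts add up across the 100 input dimensions, which is one intuition for why high-dimensional inputs such as images can be vulnerable.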

Adversarial robustness is often associated with security.[37] Researchers demonstrated that an audio signal could be imperceptibly modified so that speech-to-text systems transcribe it to any message the attacker chooses.[38] Network intrusion[39] and malware[40] detection systems also must be adversarially robust since attackers may design their attacks to fool detectors.

Models that represent objectives (reward models) must also be adversarially robust. For example, a reward model might estimate how helpful a text response is and a language model might be trained to maximize this score.[41] Researchers have shown that if a language model is trained for long enough, it will leverage the vulnerabilities of the reward model to achieve a better score and perform worse on the intended task.[42] This issue can be addressed by improving the adversarial robustness of the reward model.[43] More generally, any AI system used to evaluate another AI system must be adversarially robust. This could include monitoring tools, since they could also potentially be tampered with to produce a higher reward.[44]

Estimating uncertainty


It is often important for human operators to gauge how much they should trust an AI system, especially in high-stakes settings such as medical diagnosis.[45] ML models generally express confidence by outputting probabilities; however, they are often overconfident,[46] especially in situations that differ from those that they were trained to handle.[47] Calibration research aims to make model probabilities correspond as closely as possible to the true proportion of cases in which the model is correct.
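One common way to quantify miscalibration is the expected calibration error (ECE), which bins predictions by confidence and averages the gap between mean confidence and accuracy in each bin. The following is a minimal sketch under simple assumptions (equal-width bins, made-up predictions); the function name and the example data are hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(mean_conf - accuracy)
    return ece


# Hypothetical data: 90% stated confidence, but only 6 of 10 answers correct.
overconfident = expected_calibration_error([0.9] * 10, [True] * 6 + [False] * 4)
# Hypothetical data: 80% stated confidence and 8 of 10 answers correct.
calibrated = expected_calibration_error([0.8] * 10, [True] * 8 + [False] * 2)
```

In this toy case the overconfident model has an ECE of about 0.3, while the second model's stated confidence matches its accuracy and its ECE is close to zero.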

Similarly, anomaly detection or out-of-distribution (OOD) detection aims to identify when an AI system is in an unusual situation. For example, if a sensor on an autonomous vehicle is malfunctioning, or it encounters challenging terrain, it should alert the driver to take control or pull over.[48] Anomaly detection has been implemented by simply training a classifier to distinguish anomalous and non-anomalous inputs,[49] though a range of additional techniques are in use.[50][51]
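One simple baseline among these additional techniques is to treat a low maximum softmax probability as a sign that an input may be out of distribution. The sketch below assumes a classifier that exposes raw logits; the threshold of 0.7 and the example logits are hypothetical values that would need tuning on real data.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    largest = max(logits)
    exps = [math.exp(v - largest) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_anomalous(logits, threshold=0.7):
    """Flag an input when the classifier's top probability is below threshold."""
    return max(softmax(logits)) < threshold


# Hypothetical logits: one confident prediction and one nearly uniform one.
in_distribution_logits = [6.0, 0.5, 0.2]
unusual_logits = [1.1, 1.0, 0.9]

flag_in_distribution = is_anomalous(in_distribution_logits)  # False
flag_unusual = is_anomalous(unusual_logits)                  # True
```

Because models can be overconfident on unfamiliar inputs, a score like this is only a starting point and can miss anomalies that the model classifies confidently.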

Detecting malicious use


Scholars[11] and government agencies have expressed concerns that AI systems could be used to help malicious actors build weapons,[52] manipulate public opinion,[53][54] or automate cyber attacks.[55] These worries are a practical concern for companies like OpenAI, which host powerful AI tools online.[56] To prevent misuse, OpenAI has built detection systems that flag or restrict users based on their activity.[57]

Neural networks have often been described as black boxes,[58] meaning that the massive number of computations they perform makes it difficult to understand why they make the decisions they do.[59] This makes it challenging to anticipate failures. In 2018, a self-driving car killed a pedestrian after failing to identify them. Due to the black-box nature of the AI software, the reason for the failure remains unclear.[60]

One critical benefit of transparency is explainability.[61] It is sometimes a legal requirement to provide an explanation for why a decision was made in order to ensure fairness, for example for automatically filtering job applications or credit score assignment.[61]

Another benefit is to reveal the cause of failures.[58] At the beginning of the 2020 COVID-19 pandemic, researchers used transparency tools to show that medical image classifiers were ‘paying attention’ to irrelevant hospital labels.[62]

Transparency techniques can also be used to correct errors. For example, in the paper “Locating and Editing Factual Associations in GPT,” the authors were able to identify model parameters that influenced how it answered questions about the location of the Eiffel Tower. They were then able to ‘edit’ this knowledge to make the model respond to questions as if it believed the tower was in Rome instead of Paris.[63] Although in this case the authors induced an error, these methods could potentially be used to fix errors efficiently. Model editing techniques also exist in computer vision.[64]

Finally, some have argued that the opacity of AI systems is a significant source of risk and that a better understanding of how they function could prevent high-consequence failures in the future.[65] “Inner” interpretability research aims to make ML models less opaque. One goal of this research is to identify what the internal neuron activations represent.[66][67] For example, researchers identified a neuron in the CLIP artificial intelligence system that responds to images of people in Spider-Man costumes, sketches of Spider-Man, and the word ‘spider.’[68] This research also involves explaining the connections between these neurons, or ‘circuits’.[69][70] For example, researchers have identified pattern-matching mechanisms in transformer attention that may play a role in how language models learn from their context.[71] “Inner interpretability” has been compared to neuroscience. In both cases, the goal is to understand what is going on in an intricate system, though ML researchers have the benefit of being able to take perfect measurements and perform arbitrary ablations.[72]
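To make the idea of an ablation concrete, the sketch below zeroes out one hidden neuron at a time in a tiny hand-built network and records how much the output changes. The network, its weights, and the input are hypothetical; real interpretability work applies the same idea to far larger models and usually averages over many inputs.

```python
def relu(value):
    return max(0.0, value)


def forward(x, hidden_weights, output_weights, ablate=None):
    """Run a one-hidden-layer network, optionally zeroing one hidden neuron."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x))) for row in hidden_weights]
    if ablate is not None:
        hidden[ablate] = 0.0  # the ablation: silence this neuron
    return sum(w * h for w, h in zip(output_weights, hidden))


# Hypothetical weights and input, chosen only for illustration.
hidden_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
output_weights = [2.0, 0.1, 0.0]
x = [1.0, 0.5]

baseline = forward(x, hidden_weights, output_weights)
# Effect of each neuron = change in output when that neuron is removed.
effects = [
    baseline - forward(x, hidden_weights, output_weights, ablate=i)
    for i in range(len(hidden_weights))
]
most_important = max(range(len(effects)), key=lambda i: abs(effects[i]))
```

Here removing neuron 0 changes the output the most, while neuron 2 has no effect on the output because its outgoing weight is zero.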

Detecting trojans


ML models can potentially contain ‘trojans’ or ‘backdoors’: vulnerabilities that malicious actors deliberately build into an AI system. For example, a trojaned facial recognition system could grant access when a specific piece of jewelry is in view;[30] or a trojaned autonomous vehicle may function normally until a specific trigger is visible.[73] Note that an adversary must have access to the system's training data in order to plant a trojan. This might not be difficult to do with some large models like CLIP or GPT-3 as they are trained on publicly available internet data.[74] Researchers were able to plant a trojan in an image classifier by changing just 300 out of 3 million of the training images.[75] In addition to posing a security risk, researchers have argued that trojans provide a concrete setting for testing and developing better monitoring tools.[44]
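The sketch below illustrates, on toy data, what altering a tiny fraction of a training set with a trigger pattern can look like, which is the kind of tampering that detection tools aim to find. It is not the method of the cited study; the dataset, the trigger, and the function name `poison_dataset` are hypothetical, and no model is trained here.

```python
import random


def poison_dataset(dataset, trigger, target_label, n_poison, seed=0):
    """Return a copy of the dataset in which n_poison randomly chosen
    examples carry the trigger pattern and the attacker's target label."""
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(dataset)), n_poison))
    poisoned = []
    for index, (pixels, label) in enumerate(dataset):
        if index in chosen:
            pixels = list(pixels)
            pixels[-len(trigger):] = trigger  # overwrite the last few pixels
            label = target_label
        poisoned.append((pixels, label))
    return poisoned


# Toy dataset: 10,000 blank 16-pixel "images" with alternating labels.
clean = [([0.0] * 16, i % 2) for i in range(10_000)]
poisoned = poison_dataset(clean, trigger=[1.0, 1.0, 1.0], target_label=1, n_poison=1)

changed = sum(1 for before, after in zip(clean, poisoned) if before != after)
# The ratio mentioned in the text: 300 altered images out of 3 million.
fraction = 300 / 3_000_000  # 0.0001, i.e. 0.01% of the training set
```

Altering 1 example in 10,000 matches the 0.01% ratio from the text, which suggests why inspecting training data by hand is unlikely to catch such changes.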

Vorlage:Excerpt

Systemic safety and sociotechnical factors


It is common for AI risks (and technological risks more generally) to be categorized as misuse or accidents.[76] Some scholars have suggested that this framework falls short.[76] For example, the Cuban Missile Crisis was not clearly an accident or a misuse of technology.[76] Policy analysts Zwetsloot and Dafoe wrote, “The misuse and accident perspectives tend to focus only on the last step in a causal chain leading up to a harm: that is, the person who misused the technology, or the system that behaved in unintended ways… Often, though, the relevant causal chain is much longer.” Risks often arise from ‘structural’ or ‘systemic’ factors such as competitive pressures, diffusion of harms, fast-paced development, high levels of uncertainty, and inadequate safety culture.[76] In the broader context of safety engineering, structural factors like ‘organizational safety culture’ play a central role in the popular STAMP risk analysis framework.[77]

Inspired by the structural perspective, some researchers have emphasized the importance of using machine learning to improve sociotechnical safety factors, for example, using ML for cyber defense, improving institutional decision-making, and facilitating cooperation.[30]

Some scholars are concerned that AI will exacerbate the already imbalanced game between cyber attackers and cyber defenders.[78] This would increase 'first strike' incentives and could lead to more aggressive and destabilizing attacks. In order to mitigate this risk, some have advocated for an increased emphasis on cyber defense. In addition, software security is essential for preventing powerful AI models from being stolen and misused.[11]

Improving institutional decision-making


The advancement of AI in economic and military domains could precipitate unprecedented political challenges.[79] Some scholars have compared AI race dynamics to the Cold War, where the careful judgment of a small number of decision-makers often spelled the difference between stability and catastrophe.[80] AI researchers have argued that AI technologies could also be used to assist decision-making.[30] For example, researchers are beginning to develop AI forecasting[81] and advisory systems.[82]

Facilitating cooperation


Many of the largest global threats (nuclear war,[83] climate change,[84] etc.) have been framed as cooperation challenges. As in the well-known prisoner's dilemma scenario, some dynamics may lead to poor results for all players, even when they are optimally acting in their self-interest. For example, no single actor has strong incentives to address climate change even though the consequences may be significant if no one intervenes.[84]
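The prisoner's dilemma dynamic described above can be checked mechanically. The sketch below uses a conventional but hypothetical payoff table (higher numbers are better) and searches for action pairs in which neither player gains by switching; the specific payoff values are assumptions made for illustration.

```python
# Hypothetical payoffs as (row player, column player); higher is better.
PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"): (0, 5),
    ("defect", "cooperate"): (5, 0),
    ("defect", "defect"): (1, 1),
}
ACTIONS = ("cooperate", "defect")


def best_response(opponent_action):
    """Action that maximizes a player's own payoff against a fixed opponent.
    The game is symmetric, so the same function serves both players."""
    return max(ACTIONS, key=lambda action: PAYOFFS[(action, opponent_action)][0])


def nash_equilibria():
    """Action pairs in which each action is a best response to the other."""
    return [
        (a, b)
        for a in ACTIONS
        for b in ACTIONS
        if best_response(b) == a and best_response(a) == b
    ]


equilibria = nash_equilibria()
equilibrium_payoff = PAYOFFS[equilibria[0]]
cooperative_payoff = PAYOFFS[("cooperate", "cooperate")]
```

With these payoffs the only equilibrium is mutual defection, which pays each player 1, even though mutual cooperation would pay each player 3. Self-interested play therefore leaves both players worse off than cooperation would.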

A salient AI cooperation challenge is avoiding a ‘race to the bottom’.[85] In this scenario, countries or companies race to build more capable AI systems and neglect safety, leading to a catastrophic accident that harms everyone involved. Concerns about scenarios like these have inspired both political[86] and technical[87] efforts to facilitate cooperation between humans, and potentially also between AI systems. Most AI research focuses on designing individual agents to serve isolated functions (often in ‘single-player’ games).[88] Scholars have suggested that as AI systems become more autonomous, it may become essential to study and shape the way they interact.[88]

Challenges of Large Language Models


In recent years, the development of large language models (LLMs) has raised unique concerns within the field of AI safety. Researchers Bender and Gebru et al.[89] have highlighted the environmental and financial costs associated with training these models, emphasizing that the energy consumption and carbon footprint of training procedures like those for Transformer models can be substantial. Moreover, these models often rely on massive, uncurated Internet-based datasets, which can encode hegemonic and biased viewpoints, further marginalizing underrepresented groups. The large-scale training data, while vast, does not guarantee diversity and often reflects the worldviews of privileged demographics, leading to models that perpetuate existing biases and stereotypes. This situation is exacerbated by the tendency of these models to produce seemingly coherent and fluent text, which can mislead users into attributing meaning and intent where none exists; the authors describe such models as 'stochastic parrots.' These models, therefore, pose risks of amplifying societal biases, spreading misinformation, and being used for malicious purposes, such as generating extremist propaganda or deepfakes. To address these challenges, researchers advocate for more careful planning in dataset creation and system development, emphasizing the need for research projects that contribute positively towards an equitable technological ecosystem.[90][91]

AI governance is broadly concerned with creating norms, standards, and regulations to guide the use and development of AI systems.[80]

AI safety governance research ranges from foundational investigations into the potential impacts of AI to specific applications. On the foundational side, researchers have argued that AI could transform many aspects of society due to its broad applicability, comparing it to electricity and the steam engine.[92] Some work has focused on anticipating specific risks that may arise from these impacts – for example, risks from mass unemployment,[93] weaponization,[94] disinformation,[95] surveillance,[96] and the concentration of power.[97] Other work explores underlying risk factors such as the difficulty of monitoring the rapidly evolving AI industry,[98] the availability of AI models,[99] and ‘race to the bottom’ dynamics.[85][100] Allan Dafoe, the head of long-term governance and strategy at DeepMind, has emphasized the dangers of racing and the potential need for cooperation: “it may be close to a necessary and sufficient condition for AI safety and alignment that there be a high degree of caution prior to deploying advanced powerful systems; however, if actors are competing in a domain with large returns to first-movers or relative advantage, then they will be pressured to choose a sub-optimal level of caution.”[86] A research stream focuses on developing approaches, frameworks, and methods to assess AI accountability, guiding and promoting audits of AI-based systems.[101][102][103]

Scaling Local AI Safety Measures to Global Solutions


In addressing the AI safety problem it is important to stress the distinction between local and global solutions. Local solutions focus on individual AI systems, ensuring they are safe and beneficial, while global solutions seek to implement safety measures for all AI systems across various jurisdictions. Some researchers[104] argue for the necessity of scaling local safety measures to a global level, proposing a classification for these global solutions. This approach underscores the importance of collaborative efforts in the international governance of AI safety, emphasizing that no single entity can effectively manage the risks associated with AI technologies. This perspective aligns with ongoing efforts in international policy-making and regulatory frameworks, which aim to address the complex challenges posed by advanced AI systems worldwide.[105][106]

Government action


Some experts have argued that it is too early to regulate AI, expressing concerns that regulations will hamper innovation and that it would be foolish to “rush to regulate in ignorance.”[107][108] Others, such as business magnate Elon Musk, call for pre-emptive action to mitigate catastrophic risks.[109]

Outside of formal legislation, government agencies have put forward ethical and safety recommendations. In March 2021, the US National Security Commission on Artificial Intelligence reported that advances in AI may make it increasingly important to "assure that systems are aligned with goals and values, including safety, robustness and trustworthiness."[110] Subsequently, the National Institute of Standards and Technology drafted a framework for managing AI risk, which advises that when "catastrophic risks are present – development and deployment should cease in a safe manner until risks can be sufficiently managed."[111]

In September 2021, the People's Republic of China published ethical guidelines for the use of AI in China, emphasizing that AI decisions should remain under human control and calling for accountability mechanisms. In the same month, the United Kingdom published its 10-year National AI Strategy,[112] which states the British government "takes the long-term risk of non-aligned Artificial General Intelligence, and the unforeseeable changes that it would mean for ... the world, seriously."[113] The strategy describes actions to assess long-term AI risks, including catastrophic risks.[113] The British government held the first major global summit on AI safety on 1 and 2 November 2023. It was described as "an opportunity for policymakers and world leaders to consider the immediate and future risks of AI and how these risks can be mitigated via a globally coordinated approach."[114][115]

Government organizations, particularly in the United States, have also encouraged the development of technical AI safety research. The Intelligence Advanced Research Projects Activity initiated the TrojAI project to identify and protect against Trojan attacks on AI systems.[116] DARPA engages in research on explainable artificial intelligence and improving robustness against adversarial attacks.[117][118] The National Science Foundation supports the Center for Trustworthy Machine Learning and is providing millions of dollars in funding for empirical AI safety research.[119]

Corporate self-regulation


AI labs and companies generally abide by safety practices and norms that fall outside of formal legislation.[120] One aim of governance researchers is to shape these norms. Examples of safety recommendations found in the literature include performing third-party auditing,[121] offering bounties for finding failures,[121] sharing AI incidents[121] (an AI incident database was created for this purpose),[122] following guidelines to determine whether to publish research or models,[99] and improving information and cyber security in AI labs.[123]

Companies have also made commitments. Cohere, OpenAI, and AI21 proposed and agreed on “best practices for deploying language models,” focusing on mitigating misuse.[124] To avoid contributing to racing dynamics, OpenAI has also stated in its charter that “if a value-aligned, safety-conscious project comes close to building AGI before we do, we commit to stop competing with and start assisting this project.”[125] In addition, industry leaders such as DeepMind CEO Demis Hassabis and Facebook AI director Yann LeCun have signed open letters such as the Asilomar Principles[27] and the Autonomous Weapons Open Letter.[126]


  1. Vorlage:Cite arXiv
  2. 'The Godfather of A.I.' warns of 'nightmare scenario' where artificial intelligence begins to seek power. In: Fortune. Retrieved 10 June 2023.
  3. a b Katja Grace, John Salvatier, Allan Dafoe, Baobao Zhang, Owain Evans: Viewpoint: When Will AI Exceed Human Performance? Evidence from AI Experts. In: Journal of Artificial Intelligence Research. Vol. 62, 31 July 2018, ISSN 1076-9757, pp. 729–754, doi:10.1613/jair.1.11222 (jair.org [retrieved 28 November 2022]).
  4. Baobao Zhang, Markus Anderljung, Lauren Kahn, Noemi Dreksler, Michael C. Horowitz, Allan Dafoe: Ethics and Governance of Artificial Intelligence: Evidence from a Survey of Machine Learning Researchers. In: Journal of Artificial Intelligence Research. Vol. 71, 5 May 2021, doi:10.1613/jair.1.12895, arxiv:2105.02117.
  5. Zach Stein-Perlman, Benjamin Weinstein-Raun, Grace: 2022 Expert Survey on Progress in AI. In: AI Impacts. 4 August 2022, retrieved 23 November 2022.
  6. Julian Michael, Ari Holtzman, Alicia Parrish, Aaron Mueller, Alex Wang, Angelica Chen, Divyam Madaan, Nikita Nangia, Richard Yuanzhe Pang, Jason Phang, Samuel R. Bowman: What Do NLP Researchers Believe? Results of the NLP Community Metasurvey. In: Association for Computational Linguistics. 26 August 2022, arxiv:2208.12852.
  7. Vorlage:Cite thesis
  8. Ninareh Mehrabi, Fred Morstatter, Nripsuta Saxena, Kristina Lerman, Aram Galstyan: A Survey on Bias and Fairness in Machine Learning. In: ACM Computing Surveys. Vol. 54, No. 6, 2021, ISSN 0360-0300, pp. 1–35, doi:10.1145/3457607, arxiv:1908.09635 (acm.org [retrieved 28 November 2022]).
  9. Vorlage:Cite report
  10. Beth Barnes: Risks from AI persuasion. In: Lesswrong. 2021 (lesswrong.com [retrieved 23 November 2022]).
  11. a b c Miles Brundage, Shahar Avin, Jack Clark, Helen Toner, Peter Eckersley, Ben Garfinkel, Allan Dafoe, Paul Scharre, Thomas Zeitzoff, Bobby Filar, Hyrum Anderson, Heather Roff, Gregory C Allen, Jacob Steinhardt, Carrick Flynn: The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation. Apollo - University of Cambridge Repository, 30 April 2018, doi:10.17863/cam.22520 (cam.ac.uk [retrieved 28 November 2022]).
  12. Joseph Carlsmith: Is Power-Seeking AI an Existential Risk? 16 June 2022, arxiv:2206.13353.
  13. AGI Expert Peter Voss Says AI Alignment Problem is Bogus | NextBigFuture.com. 4 April 2023, retrieved 23 July 2023.
  14. Allan Dafoe: Yes, We Are Worried About the Existential Risk of Artificial Intelligence. In: MIT Technology Review. 2016, retrieved 28 November 2022.
  15. Association for the Advancement of Artificial Intelligence: AAAI Presidential Panel on Long-Term AI Futures. Retrieved 23 November 2022.
  16. Roman V. Yampolskiy, M. S. Spellchecker: Artificial Intelligence Safety and Cybersecurity: a Timeline of AI Failures. 25 October 2016, arxiv:1610.07997.
  17. PT-AI 2011 – Philosophy and Theory of Artificial Intelligence (PT-AI 2011). Retrieved 23 November 2022.
  18. Vorlage:Citation
  19. Scott McLean, Gemma J. M. Read, Jason Thompson, Chris Baber, Neville A. Stanton, Paul M. Salmon: The risks associated with Artificial General Intelligence: A systematic review. In: Journal of Experimental & Theoretical Artificial Intelligence. Vol. 35, No. 5, 4 July 2023, ISSN 0952-813X, pp. 649–663, doi:10.1080/0952813X.2021.1964003, bibcode:2023JETAI..35..649M.
  20. Vorlage:Cite AV media
  21. Rory Cellan-Jones: Stephen Hawking warns artificial intelligence could end mankind. In: BBC News. 2 December 2014, retrieved 23 November 2022.
  22. Future of Life Institute: Research Priorities for Robust and Beneficial Artificial Intelligence: An Open Letter. In: Future of Life Institute. Retrieved 23 November 2022.
  23. Future of Life Institute: AI Research Grants Program. In: Future of Life Institute. October 2016, retrieved 23 November 2022.
  24. SafArtInt 2016. Retrieved 23 November 2022.
  25. Deborah Bach: UW to host first of four White House public workshops on artificial intelligence. In: UW News. 2016, retrieved 23 November 2022.
  26. Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, Dan Mané: Concrete Problems in AI Safety. 25 July 2016, arxiv:1606.06565.
  27. a b Future of Life Institute: AI Principles. In: Future of Life Institute. Retrieved 23 November 2022.
  28. a b DeepMind Safety Research: Building safe artificial intelligence: specification, robustness, and assurance. In: Medium. 27 September 2018, retrieved 23 November 2022.
  29. SafeML ICLR 2019 Workshop. Retrieved 23 November 2022.
  30. a b c d e Dan Hendrycks, Nicholas Carlini, John Schulman, Jacob Steinhardt: Unsolved Problems in ML Safety. 16 June 2022, arxiv:2109.13916.
  31. Ryan Browne: British Prime Minister Rishi Sunak pitches UK as home of A.I. safety regulation as London bids to be next Silicon Valley. In: CNBC. 12 June 2023, retrieved 25 June 2023.
  32. Ian Goodfellow, Nicolas Papernot, Sandy Huang, Rocky Duan, Pieter Abbeel, Jack Clark: Attacking Machine Learning with Adversarial Examples. In: OpenAI. 24 February 2017, retrieved 24 November 2022.
  33. a b Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, Rob Fergus: Intriguing properties of neural networks. In: ICLR. 19 February 2014, arxiv:1312.6199.
  34. Alexey Kurakin, Ian Goodfellow, Samy Bengio: Adversarial examples in the physical world. In: ICLR. 10 February 2017, arxiv:1607.02533.
  35. Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu: Towards Deep Learning Models Resistant to Adversarial Attacks. In: ICLR. 4 September 2019, arxiv:1706.06083.
  36. Harini Kannan, Alexey Kurakin, Ian Goodfellow: Adversarial Logit Pairing. 16 March 2018, arxiv:1803.06373.
  37. Justin Gilmer, Ryan P. Adams, Ian Goodfellow, David Andersen, George E. Dahl: Motivating the Rules of the Game for Adversarial Example Research. 19 July 2018, arxiv:1807.06732.
  38. Nicholas Carlini, David Wagner: Audio Adversarial Examples: Targeted Attacks on Speech-to-Text. In: IEEE Security and Privacy Workshops. 29 March 2018, arxiv:1801.01944.
  39. Ryan Sheatsley, Nicolas Papernot, Michael Weisman, Gunjan Verma, Patrick McDaniel: Adversarial Examples in Constrained Domains. 9 September 2022, arxiv:2011.01183.
  40. Octavian Suciu, Scott E. Coull, Jeffrey Johns: Exploring Adversarial Examples in Malware Detection. In: IEEE Security and Privacy Workshops. 13 April 2019, arxiv:1810.08280.
  41. Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens: Training language models to follow instructions with human feedback. In: NeurIPS. 4 March 2022, arxiv:2203.02155.
  42. Leo Gao, John Schulman, Jacob Hilton: Scaling Laws for Reward Model Overoptimization. In: ICML. 19 October 2022, arxiv:2210.10760.
  43. Sihyun Yu, Sungsoo Ahn, Le Song, Jinwoo Shin: RoMA: Robust Model Adaptation for Offline Model-based Optimization. In: NeurIPS. 27 October 2021, arxiv:2110.14188.
  44. a b Dan Hendrycks, Mantas Mazeika: X-Risk Analysis for AI Research. 20 September 2022, arxiv:2206.05862.
  45. Khoa A. Tran, Olga Kondrashova, Andrew Bradley, Elizabeth D. Williams, John V. Pearson, Nicola Waddell: Deep learning in cancer diagnosis, prognosis and treatment selection. In: Genome Medicine. 13. Jahrgang, Nr. 1, 2021, ISSN 1756-994X, S. 152, doi:10.1186/s13073-021-00968-x, PMID 34579788, PMC 8477474 (freier Volltext) – (englisch).
  46. Chuan Guo, Geoff Pleiss, Yu Sun, Kilian Q. Weinberger: On calibration of modern neural networks. In: Proceedings of the 34th international conference on machine learning. Vol. 70, Proceedings of machine learning research. PMLR, 6 August 2017, pp. 1321–1330.
  47. Yaniv Ovadia, Emily Fertig, Jie Ren, Zachary Nado, D. Sculley, Sebastian Nowozin, Joshua V. Dillon, Balaji Lakshminarayanan, Jasper Snoek: Can You Trust Your Model's Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift. In: NeurIPS. 17 December 2019, arxiv:1906.02530.
  48. Daniel Bogdoll, Jasmin Breitenstein, Florian Heidecker, Maarten Bieshaar, Bernhard Sick, Tim Fingscheidt, J. Marius Zöllner: Description of Corner Cases in Automated Driving: Goals and Challenges. In: 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW). 2021, ISBN 978-1-66540-191-3, pp. 1023–1028, doi:10.1109/ICCVW54120.2021.00119, arxiv:2109.09607.
  49. Dan Hendrycks, Mantas Mazeika, Thomas Dietterich: Deep Anomaly Detection with Outlier Exposure. In: ICLR. 28 January 2019, arxiv:1812.04606.
  50. Haoqi Wang, Zhizhong Li, Litong Feng, Wayne Zhang: ViM: Out-Of-Distribution with Virtual-logit Matching. In: CVPR. 21 March 2022, arxiv:2203.10807.
  51. Dan Hendrycks, Kevin Gimpel: A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks. In: ICLR. 3 October 2018, arxiv:1610.02136.
  52. Fabio Urbina, Filippa Lentzos, Cédric Invernizzi, Sean Ekins: Dual use of artificial-intelligence-powered drug discovery. In: Nature Machine Intelligence. Vol. 4, No. 3, 2022, ISSN 2522-5839, pp. 189–191, doi:10.1038/s42256-022-00465-9, PMID 36211133, PMC 9544280 (free full text) – (in English).
  53. Center for Security and Emerging Technology, Ben Buchanan, Andrew Lohn, Micah Musser, Katerina Sedova: Truth, Lies, and Automation: How Language Models Could Change Disinformation. 2021, doi:10.51593/2021ca003 (georgetown.edu [retrieved 28 November 2022]).
  54. Propaganda-as-a-service may be on the horizon if large language models are abused. In: VentureBeat. 14 December 2021, retrieved 24 November 2022.
  55. Center for Security and Emerging Technology, Ben Buchanan, John Bansemer, Dakota Cary, Jack Lucas, Micah Musser: Automating Cyber Attacks: Hype and Reality. In: Center for Security and Emerging Technology. 2020, doi:10.51593/2020ca002 (georgetown.edu [retrieved 28 November 2022]).
  56. Lessons Learned on Language Model Safety and Misuse. In: OpenAI. 3 March 2022, retrieved 24 November 2022.
  57. Todor Markov, Chong Zhang, Sandhini Agarwal, Tyna Eloundou, Teddy Lee, Steven Adler, Angela Jiang, Lilian Weng: New-and-Improved Content Moderation Tooling. In: OpenAI. 10 August 2022, retrieved 24 November 2022.
  58. a b Neil Savage: Breaking into the black box of artificial intelligence. In: Nature. 29 March 2022, doi:10.1038/d41586-022-00858-1, PMID 35352042 (nature.com [retrieved 24 November 2022]).
  59. Center for Security and Emerging Technology, Tim Rudner, Helen Toner: Key Concepts in AI Safety: Interpretability in Machine Learning. 2021, doi:10.51593/20190042 (georgetown.edu [retrieved 28 November 2022]).
  60. Matt McFarland: Uber pulls self-driving cars after first fatal crash of autonomous vehicle. In: CNNMoney. 19 March 2018, retrieved 24 November 2022.
  61. a b Finale Doshi-Velez, Mason Kortz, Ryan Budish, Chris Bavitz, Sam Gershman, David O'Brien, Kate Scott, Stuart Schieber, James Waldo, David Weinberger, Adrian Weller, Alexandra Wood: Accountability of AI Under the Law: The Role of Explanation. 20 December 2019, arxiv:1711.01134.
  62. Ruth Fong, Andrea Vedaldi: Interpretable Explanations of Black Boxes by Meaningful Perturbation. In: 2017 IEEE International Conference on Computer Vision (ICCV). 2017, ISBN 978-1-5386-1032-9, pp. 3449–3457, doi:10.1109/ICCV.2017.371, arxiv:1704.03296.
  63. Kevin Meng, David Bau, Alex Andonian, Yonatan Belinkov: Locating and editing factual associations in GPT. In: Advances in Neural Information Processing Systems. Vol. 35, 2022, arxiv:2202.05262.
  64. David Bau, Steven Liu, Tongzhou Wang, Jun-Yan Zhu, Antonio Torralba: Rewriting a Deep Generative Model. In: ECCV. 30 July 2020, arxiv:2007.15646.
  65. Tilman Räuker, Anson Ho, Stephen Casper, Dylan Hadfield-Menell: Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks. In: IEEE SaTML. 5 September 2022, arxiv:2207.13243.
  66. David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, Antonio Torralba: Network Dissection: Quantifying Interpretability of Deep Visual Representations. In: CVPR. 19 April 2017, arxiv:1704.05796.
  67. Thomas McGrath, Andrei Kapishnikov, Nenad Tomašev, Adam Pearce, Martin Wattenberg, Demis Hassabis, Been Kim, Ulrich Paquet, Vladimir Kramnik: Acquisition of chess knowledge in AlphaZero. In: Proceedings of the National Academy of Sciences. Vol. 119, No. 47, 22 November 2022, ISSN 0027-8424, p. e2206625119, doi:10.1073/pnas.2206625119, PMID 36375061, PMC 9704706 (free full text), arxiv:2111.09259, bibcode:2022PNAS..11906625M (in English).
  68. Gabriel Goh, Nick Cammarata, Chelsea Voss, Shan Carter, Michael Petrov, Ludwig Schubert, Alec Radford, Chris Olah: Multimodal neurons in artificial neural networks. In: Distill. Vol. 6, No. 3, 2021, doi:10.23915/distill.00030.
  69. Chris Olah, Nick Cammarata, Ludwig Schubert, Gabriel Goh, Michael Petrov, Shan Carter: Zoom in: An introduction to circuits. In: Distill. Vol. 5, No. 3, 2020, doi:10.23915/distill.00024.001.
  70. Nick Cammarata, Gabriel Goh, Shan Carter, Chelsea Voss, Ludwig Schubert, Chris Olah: Curve circuits. In: Distill. Vol. 6, No. 1, 2021, doi:10.23915/distill.00024.006 (distill.pub [retrieved 5 December 2022]).
  71. Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, Tom Henighan, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Dawn Drain, Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Scott Johnston, Andy Jones, Jackson Kernion, Liane Lovitt, Kamal Ndousse, Dario Amodei, Tom Brown, Jack Clark, Jared Kaplan, Sam McCandlish, Chris Olah: In-context learning and induction heads. In: Transformer Circuits Thread. 2022, arxiv:2209.11895.
  72. Christopher Olah: Interpretability vs Neuroscience [rough note]. Retrieved 24 November 2022.
  73. Tianyu Gu, Brendan Dolan-Gavitt, Siddharth Garg: BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain. 11 March 2019, arxiv:1708.06733.
  74. Xinyun Chen, Chang Liu, Bo Li, Kimberly Lu, Dawn Song: Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning. 14 December 2017, arxiv:1712.05526.
  75. Nicholas Carlini, Andreas Terzis: Poisoning and Backdooring Contrastive Learning. In: ICLR. 28 March 2022, arxiv:2106.09667.
  76. a b c d Remco Zwetsloot, Allan Dafoe: Thinking About Risks From AI: Accidents, Misuse and Structure. In: Lawfare. 11 February 2019, retrieved 24 November 2022.
  77. Yingyu Zhang, Chuntong Dong, Weiqun Guo, Jiabao Dai, Ziming Zhao: Systems theoretic accident model and process (STAMP): A literature review. In: Safety Science. Vol. 152, 2022, p. 105596, doi:10.1016/j.ssci.2021.105596 (in English, elsevier.com [retrieved 28 November 2022]).
  78. Center for Security and Emerging Technology, Wyatt Hoffman: AI and the Future of Cyber Competition. 2021, doi:10.51593/2020ca007 (georgetown.edu [retrieved 28 November 2022]).
  79. Center for Security and Emerging Technology, Andrew Imbrie, Elsa Kania: AI Safety, Security, and Stability Among Great Powers: Options, Challenges, and Lessons Learned for Pragmatic Engagement. 2019, doi:10.51593/20190051 (georgetown.edu [retrieved 28 November 2022]).
  80. a b Vorlage:Cite AV media
  81. Andy Zou, Tristan Xiao, Ryan Jia, Joe Kwon, Mantas Mazeika, Richard Li, Dawn Song, Jacob Steinhardt, Owain Evans, Dan Hendrycks: Forecasting Future World Events with Neural Networks. In: NeurIPS. 9 October 2022, arxiv:2206.15474.
  82. Sneha Gathani, Madelon Hulsebos, James Gale, Peter J. Haas, Çağatay Demiralp: Augmenting Decision Making via Interactive What-If Analysis. In: Conference on Innovative Data Systems Research. 8 February 2022, arxiv:2109.06160.
  83. Vorlage:Citation
  84. a b Vann R. Newkirk II: Is Climate Change a Prisoner's Dilemma or a Stag Hunt? In: The Atlantic. 21 April 2016, retrieved 24 November 2022.
  85. a b Vorlage:Cite report
  86. a b Vorlage:Cite report
  87. Allan Dafoe, Edward Hughes, Yoram Bachrach, Tantum Collins, Kevin R. McKee, Joel Z. Leibo, Kate Larson, Thore Graepel: Open Problems in Cooperative AI. In: NeurIPS. 15 December 2020, arxiv:2012.08630.
  88. a b Allan Dafoe, Yoram Bachrach, Gillian Hadfield, Eric Horvitz, Kate Larson, Thore Graepel: Cooperative AI: machines must learn to find common ground. In: Nature. Vol. 593, No. 7857, 2021, pp. 33–36, doi:10.1038/d41586-021-01170-0, PMID 33947992, bibcode:2021Natur.593...33D (nature.com [retrieved 24 November 2022]).
  89. Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, Shmargaret Shmitchell: On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜 In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (= FAccT '21). 2021, pp. 610–623, doi:10.1145/3442188.3445922.
  90. Emma Strubell, Ananya Ganesh, Andrew McCallum: Energy and Policy Considerations for Deep Learning in NLP. 2019, arxiv:1906.02243.
  91. Roy Schwartz, Jesse Dodge, Noah A. Smith, Oren Etzioni: Green AI. In: Communications of the ACM. Vol. 63, No. 12, 2020, pp. 54–63, doi:10.1145/3381831.
  92. Nicholas Crafts: Artificial intelligence as a general-purpose technology: an historical perspective. In: Oxford Review of Economic Policy. Vol. 37, No. 3, 23 September 2021, ISSN 0266-903X, pp. 521–536, doi:10.1093/oxrep/grab012 (in English, oup.com [retrieved 28 November 2022]).
  93. 葉俶禎, 黃子君, 張媁雯, 賴志樫: Labor Displacement in Artificial Intelligence Era: A Systematic Literature Review. In: 臺灣東亞文明研究學刊. Vol. 17, No. 2, 1 December 2020, ISSN 1812-6243, doi:10.6163/TJEAS.202012_17(2).0002 (in English).
  94. James Johnson: Artificial intelligence & future warfare: implications for international security. In: Defense & Security Analysis. Vol. 35, No. 2, 3 April 2019, ISSN 1475-1798, pp. 147–169, doi:10.1080/14751798.2019.1600800 (in English, tandfonline.com [retrieved 28 November 2022]).
  95. Katarina Kertysova: Artificial Intelligence and Disinformation: How AI Changes the Way Disinformation is Produced, Disseminated, and Can Be Countered. In: Security and Human Rights. Vol. 29, No. 1–4, 12 December 2018, ISSN 1874-7337, pp. 55–81, doi:10.1163/18750230-02901005 (brill.com [retrieved 28 November 2022]).
  96. Steven Feldstein: The Global Expansion of AI Surveillance. Carnegie Endowment for International Peace, 2019.
  97. Ajay Agrawal, Joshua Gans, Avi Goldfarb: The economics of artificial intelligence: an agenda. Chicago, Illinois 2019, ISBN 978-0-226-61347-5 (American English, worldcat.org [retrieved 28 November 2022]).
  98. Jess Whittlestone, Jack Clark: Why and How Governments Should Monitor AI Development. 31 August 2021, arxiv:2108.12427.
  99. a b Toby Shevlane: Sharing Powerful AI Models | GovAI Blog. In: Center for the Governance of AI. 2022, retrieved 24 November 2022.
  100. Amanda Askell, Miles Brundage, Gillian Hadfield: The Role of Cooperation in Responsible AI Development. 10 July 2019, arxiv:1907.04534.
  101. Vorlage:Citation
  102. Jennifer Cobbe, Michelle Seng Ah Lee, Jatinder Singh: Reviewable Automated Decision-Making: A Framework for Accountable Algorithmic Systems. In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (= FAccT '21). Association for Computing Machinery, New York, NY, USA 2021, ISBN 978-1-4503-8309-7, pp. 598–609, doi:10.1145/3442188.3445921.
  103. Inioluwa Deborah Raji, Andrew Smart, Rebecca N. White, Margaret Mitchell, Timnit Gebru, Ben Hutchinson, Jamila Smith-Loud, Daniel Theron: Closing the AI accountability gap: Defining an end-to-end framework for internal algorithmic auditing. In: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (= FAT* '20). Association for Computing Machinery, New York, NY, USA 2020, ISBN 978-1-4503-6936-7, pp. 33–44, doi:10.1145/3351095.3372873.
  104. Alexey Turchin, David Dench, Brian Patrick Green: Global Solutions vs. Local Solutions for the AI Safety Problem. In: Big Data and Cognitive Computing. Vol. 3, No. 16, 2019, pp. 1–25, doi:10.3390/bdcc3010016.
  105. Bart Ziegler: Is It Time to Regulate AI? 8 April 2022.
  106. John Smith: Global Governance of Artificial Intelligence: Opportunities and Challenges. 15 May 2022.
  107. Bart Ziegler: Is It Time to Regulate AI? In: Wall Street Journal. 8 April 2022, retrieved 24 November 2022.
  108. Chris Reed: How should we regulate artificial intelligence? In: Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences. Vol. 376, No. 2128, 13 September 2018, ISSN 1364-503X, p. 20170360, doi:10.1098/rsta.2017.0360, PMID 30082306, PMC 6107539 (free full text), bibcode:2018RSPTA.37670360R (in English).
  109. Keith B. Belton: How Should AI Be Regulated? In: IndustryWeek. 7 March 2019, retrieved 24 November 2022.
  110. Vorlage:Citation
  111. National Institute of Standards and Technology: AI Risk Management Framework. In: NIST. 12 July 2021 (nist.gov [retrieved 24 November 2022]).
  112. Tim Richardson: Britain publishes 10-year National Artificial Intelligence Strategy. 2021, retrieved 24 November 2022.
  113. a b Guidance: National AI Strategy. In: GOV.UK. 2021, retrieved 24 November 2022.
  114. Kimberley Hardcastle: We're talking about AI a lot right now – and it's not a moment too soon. In: The Conversation. 23 August 2023, retrieved 31 October 2023 (American English).
  115. Iconic Bletchley Park to host UK AI Safety Summit in early November. In: GOV.UK. Retrieved 31 October 2023 (in English).
  116. Office of the Director of National Intelligence, Intelligence Advanced Research Projects Activity: IARPA – TrojAI. Retrieved 24 November 2022.
  117. Matt Turek: Explainable Artificial Intelligence. Retrieved 24 November 2022.
  118. Bruce Draper: Guaranteeing AI Robustness Against Deception. In: Defense Advanced Research Projects Agency. Retrieved 24 November 2022.
  119. National Science Foundation: Safe Learning-Enabled Systems. 23 February 2023, retrieved 27 February 2023.
  120. Matti Mäntymäki, Matti Minkkinen, Teemu Birkstedt, Mika Viljanen: Defining organizational AI governance. In: AI and Ethics. Vol. 2, No. 4, 2022, ISSN 2730-5953, pp. 603–609, doi:10.1007/s43681-022-00143-x (in English).
  121. a b c Miles Brundage, Shahar Avin, Jasmine Wang, Haydn Belfield, Gretchen Krueger, Gillian Hadfield, Heidy Khlaaf, Jingying Yang, Helen Toner, Ruth Fong, Tegan Maharaj, Pang Wei Koh, Sara Hooker, Jade Leung, Andrew Trask: Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims. 20 April 2020, arxiv:2004.07213.
  122. Welcome to the Artificial Intelligence Incident Database. Retrieved 24 November 2022.
  123. Robert Wiblin, Keiran Harris: Nova DasSarma on why information security may be critical to the safe development of AI systems. In: 80,000 Hours. 2022, retrieved 24 November 2022.
  124. OpenAI: Best Practices for Deploying Language Models. In: OpenAI. 2 June 2022, retrieved 24 November 2022.
  125. OpenAI: OpenAI Charter. In: OpenAI. Retrieved 24 November 2022.
  126. Future of Life Institute: Autonomous Weapons Open Letter: AI & Robotics Researchers. In: Future of Life Institute. 2016, retrieved 24 November 2022.