Eliezer Yudkowsky
Eliezer Shlomo Yudkowsky (/ˌɛliˈɛzər jʊdˈkaʊski/ EL-ee-EH-zər yuud-KOW-skee; born September 11, 1979) is an American artificial intelligence researcher and writer on decision theory and ethics, best known for popularizing ideas related to friendly artificial intelligence. He is the founder of and a research fellow at the Machine Intelligence Research Institute (MIRI), a private research nonprofit based in Berkeley, California. His work on the prospect of a runaway intelligence explosion influenced philosopher Nick Bostrom's 2014 book Superintelligence: Paths, Dangers, Strategies.
Yudkowsky's views on the safety challenges posed by future generations of AI systems are discussed in Stuart Russell and Peter Norvig's undergraduate textbook Artificial Intelligence: A Modern Approach. Noting the difficulty of formally specifying general-purpose goals by hand, Russell and Norvig cite Yudkowsky's proposal that autonomous and adaptive systems be designed to learn correct behavior over time:
Yudkowsky (2008) goes into more detail about how to design a Friendly AI. He asserts that friendliness (a desire not to harm humans) should be designed in from the start, but that the designers should recognize both that their own designs may be flawed, and that the robot will learn and evolve over time. Thus the challenge is one of mechanism design—to design a mechanism for evolving AI under a system of checks and balances, and to give the systems utility functions that will remain friendly in the face of such changes.
In response to the instrumental convergence concern, which implies that autonomous decision-making systems with poorly designed goals would have default incentives to mistreat humans, Yudkowsky and other MIRI researchers have recommended work on specifying software agents that converge on safe default behaviors even when their goals are misspecified. In 2004, Yudkowsky also proposed coherent extrapolated volition, a theoretical AI alignment framework in which AIs are designed to pursue what people would desire under ideal epistemic and moral conditions.
In the intelligence explosion scenario hypothesized by I. J. Good, recursively self-improving AI systems quickly transition from subhuman general intelligence to superintelligence. Nick Bostrom's 2014 book Superintelligence: Paths, Dangers, Strategies sketches out Good's argument in detail, while citing Yudkowsky on the risk that anthropomorphizing advanced AI systems will cause people to misunderstand the nature of an intelligence explosion. "AI might make an apparently sharp jump in intelligence purely as the result of anthropomorphism, the human tendency to think of 'village idiot' and 'Einstein' as the extreme ends of the intelligence scale, instead of nearly indistinguishable points on the scale of minds-in-general."
In Artificial Intelligence: A Modern Approach, Russell and Norvig raise the objection that there are known limits to intelligent problem-solving from computational complexity theory; if there are strong limits on how efficiently algorithms can solve various tasks, an intelligence explosion may not be possible.
In a 2023 op-ed for Time magazine, Yudkowsky discussed the risk of artificial intelligence and advocated for international agreements to limit it, including a total halt on the development of AI. He suggested that participating countries should be willing to take military action, such as "destroy[ing] a rogue datacenter by airstrike", to enforce such a moratorium. The article helped introduce the debate about AI alignment to the mainstream, leading a reporter to ask President Joe Biden a question about AI safety at a press briefing.
Together with Nate Soares, Yudkowsky wrote If Anyone Builds It, Everyone Dies, which was published by Little, Brown and Company on September 16, 2025.