ESP game
from Wikipedia

The ESP game (extrasensory perception game) is a human-based computation game developed to address the problem of creating difficult metadata. The idea behind the game is to use the computational power of humans to perform a task that computers cannot (originally, image recognition) by packaging the task as a game. It was originally conceived by Luis von Ahn of Carnegie Mellon University and first posted online in 2003.[1]

At launch, the official website stated that "If the ESP game is played as much as other popular online games, we estimate that all the images on the Web can be labeled in a matter of weeks!"[2] The original paper (2004) reported that a pair of players can produce 3.89 ± 0.69 labels per minute. At this rate, 5,000 people continuously playing the game would provide one label per image indexed by Google (425 million) in 31 days.[1] 36 million labels were collected between the site's launch in October 2003 and May 2008.[3]

In late 2008, the game was rebranded as GWAP ("game with a purpose"), with a new user interface. Some other games also created by Luis von Ahn, such as "Peekaboom" and "Phetch", were discontinued at that point. "Peekaboom" extends the ESP game by asking players to select the region of the image that corresponds to the label. "Squigl" asks players to trace an object's outline in an image. "Matchin" asks players to pick the more beautiful of two images.[4] "Verbosity" collects common-sense facts from players.[5]

Google bought a license to create its own version of the game (Google Image Labeler) in 2006 in order to return better search results for its online images.[6] The license status of the data acquired through Ahn's ESP game, or through the Google version, is not clear.[clarification needed] Google's version was shut down on September 16, 2011, as part of the closure of Google Labs.

Most of the ESP dataset is not publicly available. The ImageNet paper reported that, as of 2008, only 60,000 images and their labels could be accessed.[7]

Concept


Image recognition was historically a task that was difficult for computers to perform independently. Humans are perfectly capable of it, but are not necessarily willing. By making the recognition task a "game", people are more likely to participate. Data collected from players who were asked how much they enjoyed the game was overwhelmingly positive.

The applications of having so many labeled images are significant: for example, more accurate image searching, and accessibility for visually impaired users through software that reads out an image's labels. Partnering two people to label images makes it more likely that entered words will be accurate. Since the only thing the two partners have in common is that they both see the same image, they must enter reasonable labels to have any chance of agreeing on one.

The ESP Game as implemented encourages players to assign "obvious" labels, which are the most likely to lead to an agreement with the partner. But such labels can often be deduced from the labels already present using an appropriate language model, and they therefore add little information to the system. A Microsoft research project assigns probabilities to the next label to be added; this model is then used in a program that plays the ESP game without looking at the image.[8]
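To illustrate why "obvious" labels carry little information, here is a minimal sketch in the spirit of that approach (not the Microsoft system itself) that guesses a plausible next label purely from label co-occurrence statistics, without ever seeing the image. The corpus and all names are hypothetical.

```python
# Sketch: predict the next ESP label for an image from the labels it already
# has, using co-occurrence counts from previously labeled images.
# All data below is hypothetical.
from collections import Counter
from itertools import combinations

labeled_images = [            # hypothetical training corpus: labels per image
    {"dog", "pet", "brown"},
    {"dog", "puppy", "cute"},
    {"cat", "pet", "cute"},
]

cooccur = Counter()
for labels in labeled_images:
    for a, b in combinations(sorted(labels), 2):
        cooccur[(a, b)] += 1  # count both orderings so lookups are symmetric
        cooccur[(b, a)] += 1

def guess_next(existing, vocabulary):
    """Rank candidate labels by how often they co-occur with existing ones."""
    scores = {
        w: sum(cooccur[(e, w)] for e in existing)
        for w in vocabulary if w not in existing
    }
    return max(scores, key=scores.get)

vocab = {w for labels in labeled_images for w in labels}
print(guess_next({"dog"}, vocab))  # e.g. "pet": predicted without the image
```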

The ESP game's authors presented evidence that the labels produced through the game were indeed useful descriptions of the images. Search results for randomly chosen keywords showed that the proportion of appropriate images retrieved using game-generated labels was extremely high. Further evaluation compared the labels generated through the game with labels written by participants who were asked to describe the same images.

Rules of the game


Once logged in, a user is automatically matched with a random partner. The partners do not know each other's identity and they cannot communicate. Once matched, they will both be shown the same image. Their task is to agree on a word that would be an appropriate label for the image. They both enter possible words, and once a word is entered by both partners (not necessarily at the same time), that word is agreed upon, and that word becomes a label for the image. Once they agree on a word, they are shown another image. They have two and a half minutes to label 15 images.
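A minimal sketch of the agreement rule just described: a word becomes the image's label as soon as both partners have entered it, in any order. This is an illustration under that assumption, not the game's original code.

```python
# Sketch of the ESP agreement rule: the first word typed by both players
# (not necessarily simultaneously) becomes the image's label.
def play_round(guesses_a, guesses_b):
    """Return the first word both players enter, or None if they never agree."""
    seen_a, seen_b = set(), set()
    for ga, gb in zip(guesses_a, guesses_b):
        seen_a.add(ga)
        seen_b.add(gb)
        if ga in seen_b:      # A just typed a word B already entered
            return ga
        if gb in seen_a:      # B just typed a word A already entered
            return gb
    return None

print(play_round(["animal", "dog", "brown"], ["puppy", "pet", "dog"]))  # "dog"
```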

Both partners have the option to pass; that is, give up on an image. Once one partner passes, the other partner is shown a message that their partner wishes to pass. Both partners must pass for a new image to be shown.

Some images have "taboo" words: words that cannot be entered as possible labels. These words are usually related to the image and make the game harder, since they prevent common words from being used to label it. Taboo words are obtained from the game itself. The first time an image is used in the game, it has no taboo words. If the image is used again, it has one taboo word: the word agreed upon the previous time. The next time the image is used, it has two taboo words, and so on. Taboo words are assigned automatically by the system: once an image has been labeled enough times with the same word, that word becomes taboo, so that the image will acquire a variety of different labels.
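The accumulation of taboo words can be sketched as follows; the `ImageRecord` class and the threshold of one agreement are illustrative assumptions consistent with the description above.

```python
# Sketch of taboo-word accumulation: each agreed-upon label joins the image's
# taboo list before the image re-enters the pool.
TABOO_THRESHOLD = 1  # agreements needed before a word becomes taboo (assumed)

class ImageRecord:
    def __init__(self, url):
        self.url = url
        self.agreements = {}   # word -> number of pair agreements
        self.taboo = set()     # words players may no longer enter

    def record_agreement(self, word):
        if word in self.taboo:
            raise ValueError("taboo words cannot be entered as labels")
        self.agreements[word] = self.agreements.get(word, 0) + 1
        if self.agreements[word] >= TABOO_THRESHOLD:
            self.taboo.add(word)  # future pairs must find a different word

img = ImageRecord("http://example.com/dog.jpg")
img.record_agreement("dog")
print(img.taboo)  # {'dog'}: the next pair must agree on something new
```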

Occasionally, the game will be played solo, without a human partner, with the ESP Game itself acting as the opponent and delivering a series of pre-determined labels to the single human player (which have been harvested from labels given to the image during the course of earlier games played by real humans). This is necessary if there are an odd number of people playing the game.[9]

This game has been used as an important example of a social machine with a purpose (a teleological social machine), illustrating an intelligent system that emerges from the interaction of human participants, in Nello Cristianini's book The Shortcut,[10] which discusses the intelligence of social media platforms.

Cheating


Von Ahn has described countermeasures that prevent players from "cheating" the game and introducing false data into the system. By giving players occasional test images for which common labels are known, it is possible to check that players are answering honestly, and a player's guesses are stored only if they successfully label the test images.[9]

Furthermore, a label is only stored after a certain number of players (N) have agreed on it. At this point, the taboo words for the image are deleted, and the image is returned to the game pool as if it were a fresh image. If X is the probability of a label being incorrect despite a player having successfully labelled the test images, then after N repetitions the probability of corruption is X^N, assuming the repetitions are independent of each other; for example, with X = 0.1 and N = 2, only one stored label in a hundred would be corrupted.[9]
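A quick numeric illustration of that bound, with an assumed per-pair error probability X:

```python
# If a dishonest-but-test-passing player produces a bad label with
# probability X, requiring N independent pair agreements drives the
# corruption probability X**N toward zero.
X = 0.1   # assumed per-pair probability of an incorrect agreed label
for N in (1, 2, 3):
    print(N, X ** N)   # 0.1, 0.01, 0.001
```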

Image selection


The choice of images used by the ESP game makes a difference in the player's experience. The game would be less entertaining if all the images were chosen from a single site and were all extremely similar.

The first run of the ESP game used a collection of 350,000 images chosen by the developers. Later versions selected images at random from the web, using a small amount of filtering. Such images are reintroduced into the game several times until they are fully labeled.[9] The random images were chosen using "Random Bounce Me", a website that selects a page at random from the Google database. "Random Bounce Me" was queried repeatedly, each time collecting all JPEG and GIF images in the random page, except for images that did not fit the criteria: blank images, images that consist of a single color, images that are smaller than 20 pixels on either dimension, and images with an aspect ratio greater than 4.5 or smaller than 1/4.5. This process was repeated until 350,000 images were collected. The images were then rescaled to fit the game's display. Fifteen different images from the 350,000 are chosen for each session of the game.
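The size and aspect-ratio criteria above translate directly into a small filter predicate. This sketch is an assumption for illustration, since the original crawler's code is not public.

```python
# Sketch of the filtering criteria described above: reject blank or
# single-color images, tiny images, and extreme aspect ratios.
def acceptable(width, height, is_blank, is_single_color):
    if is_blank or is_single_color:
        return False
    if min(width, height) < 20:          # smaller than 20 px on a dimension
        return False
    ratio = width / height
    return 1 / 4.5 <= ratio <= 4.5       # aspect ratio within [1/4.5, 4.5]

print(acceptable(300, 200, False, False))  # True
print(acceptable(900, 100, False, False))  # False: ratio 9.0 > 4.5
```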

References

from Grokipedia
The ESP Game is a two-player online computer game developed by Luis von Ahn and Laura Dabbish at Carnegie Mellon University, first launched in 2003, that crowdsources descriptive labels for web images by turning the task into an engaging matching challenge. Players are randomly paired and shown the same unlabeled image, tasked with typing words they believe describe it without communicating; when both enter the same word, they score points and the label is added to the image's metadata, while previously agreed-upon "taboo" words are hidden from play to encourage diverse descriptors. Each game session lasts approximately 2.5 minutes, with images selected to prioritize those needing labels for applications such as improved search, accessibility for visually impaired users, and content filtering. As the inaugural example of "Games with a Purpose" (GWAP), a framework introduced by von Ahn to solve computational problems through fun, human-powered activities, the ESP Game demonstrated the viability of human computation for large-scale data annotation. In its initial four months, it attracted 13,630 players who generated 1,271,451 labels across 293,760 images, achieving near-100% precision on tested labels; the authors projected that 5,000 active players could label Google's entire 425 million-image corpus in just 31 days. The game's design addressed key challenges in image labeling, such as subjectivity and tedium, by leveraging players' natural agreement on common descriptors while mitigating cheating through real-time pairing and scoring mechanics.

The ESP Game's influence extended beyond academia when Carnegie Mellon licensed it to Google in 2006, leading to the launch of Google Image Labeler, a rebranded version integrated into Google Image Search that similarly paired users to tag images for improved search relevance. Google discontinued Image Labeler in September 2011 as part of a broader service cleanup to focus resources on high-impact offerings. In 2016, Google relaunched image-labeling efforts through Google Crowdsource, a non-gamified platform that includes image-verification tasks for improving AI datasets such as Open Images and remains active as of 2025. Despite its operational end, the ESP Game pioneered human computation techniques that continue to inspire crowdsourcing platforms and AI training datasets, underscoring the power of playful incentives in bridging human perception with machine computation.

Overview

Concept

The ESP Game is an online multiplayer game launched on August 9, 2003, that pairs anonymous players randomly to describe the same image, requiring them to agree on labels without any direct communication between partners. Developed as a form of human computation, it transforms the labor-intensive task of image annotation into an engaging activity by leveraging players' natural perceptual abilities and desire for entertainment. At its core, the game's purpose is to generate alternative-text labels for web images, thereby improving accessibility, enabling more effective image search, and supporting content-based filtering. This addresses the longstanding "image labeling problem" in computer vision, where computers have historically struggled to interpret visual content due to the limitations of recognition technology at the time. By aggregating labels from human players, the ESP Game creates high-quality datasets that can train models to better understand and categorize images. The game's interface, implemented as a browser-based applet, presents players with a shared image for which they type words in real time, anticipating and matching their partner's guesses and thereby fostering rapid consensus on relevant descriptors. This setup exploits the commonality in human descriptions of visuals, allowing the system to collect diverse yet agreed-upon labels efficiently through playful interaction.

History and Development

The ESP Game was created in 2003 by Luis von Ahn, then a PhD student in computer science at Carnegie Mellon University, in collaboration with Laura Dabbish, as part of his research on human computation paradigms that harness collective human effort to address computational challenges beyond the capabilities of machines alone. This work laid the groundwork for the broader concept of Games With A Purpose (GWAP), where entertainment drives useful data generation, later formalized in von Ahn's 2006 publication. Development of the game was supported by National Science Foundation grants CCR-0122581 and CCR-0085982 through the ALADDIN Center, along with a Microsoft Research fellowship for von Ahn and a National Defense Science and Engineering Graduate Fellowship for Dabbish. The game's design was first detailed in the 2004 CHI conference paper "Labeling Images with a Computer Game" by von Ahn and Dabbish, which described its mechanics for image annotation and presented preliminary empirical results demonstrating its efficacy in producing high-quality labels through paired player agreement. The ESP Game launched publicly on August 9, 2003, via the website espgame.org, implemented as an applet to facilitate real-time online play. It achieved rapid adoption, with 13,630 unique players generating 1,271,451 labels for 293,760 images between launch and December 10, 2003, and over 80% of participants returning for multiple sessions. By September 2006, the game had engaged more than 75,000 unique players, underscoring its appeal and scalability in generating image metadata. A key milestone occurred in 2006 when Google licensed the ESP Game's technology to develop Google Image Labeler, integrating player-generated labels to enhance the accuracy of its image search engine. The original espgame.org site was shut down in 2011.

Gameplay Mechanics

Core Rules

The ESP Game pairs players randomly with an anonymous partner online, preventing any direct communication to ensure independent inputs. The interface presents a sequence of images sourced from web crawls, displayed one at a time within a session, with players tasked with describing the visible content so that useful labels are generated as a byproduct of gameplay. Players input single words or short phrases via text fields, aiming to exactly match their partner's description of the shared image. Successful matches are revealed instantly to both players, advancing the labeling for that image; the system displays up to six taboo words (prior agreements on the image) to promote varied and specific labels. The round for an image concludes after a match or a mutual pass, with the game progressing through up to 15 images per session. Players can mutually pass on challenging images to skip them. Scoring incentivizes participation by awarding points to both players for each exact match, fostering quick consensus on descriptive terms, and a substantial bonus is granted for completing agreements on all 15 images in a session. Session totals are shown upon completion, highlighting cumulative achievements. Each game session runs for 2.5 minutes, allowing rapid cycling through images to maximize engagement within a short timeframe. Players may end early or skip via passes but are encouraged to continue for higher scores; multiple sessions can be played daily without formal restrictions. There is no defined win condition or endgame, but the emphasis on accumulating high scores drives sustained play.
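A sketch of the scoring scheme just described, with assumed point values, since the text does not give exact figures:

```python
# Sketch of session scoring: points per agreed label plus a completion bonus.
POINTS_PER_MATCH = 10      # assumed reward for each agreed label
SESSION_BONUS = 100        # assumed bonus for finishing all 15 images

def session_score(images_matched, session_size=15):
    score = images_matched * POINTS_PER_MATCH
    if images_matched == session_size:
        score += SESSION_BONUS   # completing every image earns the bonus
    return score

print(session_score(15))  # 250: full session plus bonus
print(session_score(9))   # 90: partial session, no bonus
```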

Player Interaction and Matching

In the ESP Game, players are anonymously paired by a centralized server that randomly matches the participants online at any given time, with new pairings initiated every 30 seconds to facilitate continuous play. To prevent collusion, the system ensures that players are not matched with the same partner more than once and verifies distinct IP addresses to block self-pairing or coordinated attempts. These one-session matches, typically spanning 15 images, promote independent contributions without prior familiarity, fostering a dynamic environment where thousands of players can contribute simultaneously without direct coordination. Player interaction relies entirely on indirect communication: participants see only their own typed inputs and any resulting matches, with no chat functionality, shared view of partner actions, or other cues available. This design compels players to anticipate common descriptive language for the same image, such as both entering "dog" to label a canine, relying on shared cultural and linguistic conventions to achieve consensus. The absence of direct contact encourages players to "think like each other", building agreement through trial-and-error guesses limited to 13 characters each, submitted in real time. The matching algorithm performs real-time string comparisons on the server side, requiring exact word matches for agreement; partial matches and synonyms are not accepted, to keep the data simple and of high quality. Successful matches are confirmed only when both players independently provide identical labels, advancing them through the shared set of images. Feedback is provided through visual cues, including a thermometer-style progress bar that fills as agreements accumulate toward completing the session's set of images, enhancing player engagement without revealing partner-specific details. This interaction model cultivates play centered on creative yet convergent guessing, where players gravitate toward stereotypical or obvious labels to maximize matches, such as tagging ocean scenes with "beach" because of common associations. Studies analyzing game-generated metadata reveal that these dynamics often amplify linguistic biases, leading to predictable descriptors that reflect societal stereotypes, though this convergence boosts agreement rates and overall label volume: 1,271,451 labels across 293,760 images from 13,630 players in the initial four months.
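The server-side checks described here (exact matching of guesses capped at 13 characters, and distinct-IP pairing) might look roughly like the following sketch; the function names and the normalization step are assumptions.

```python
# Sketch of server-side guess validation and anti-collusion pairing checks.
MAX_GUESS_LEN = 13

def normalize(guess):
    """Trim and lowercase a guess (assumed normalization);
    reject anything over the length cap."""
    guess = guess.strip().lower()
    return guess if len(guess) <= MAX_GUESS_LEN else None

def can_pair(ip_a, ip_b, already_partnered):
    """Refuse self-pairing and repeat partners, per the anti-collusion rules."""
    return ip_a != ip_b and (ip_a, ip_b) not in already_partnered

print(normalize("  Dog "))                       # "dog"
print(can_pair("1.2.3.4", "1.2.3.4", set()))     # False: same IP address
```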

Technical Implementation

Image Selection Process

The ESP Game draws its image pool primarily from Google's web index, accessed via tools like Random Bounce Me that select random pages and collect diverse visuals in formats such as JPEG and GIF. This method ensures a broad selection without manual curation, amassing an initial database of approximately 350,000 images to support ongoing gameplay. Automated filters are applied during selection to exclude unsuitable content, such as blank images, single-color graphics, images smaller than 20 pixels in either dimension, and visuals with aspect ratios greater than 4.5 or less than 1/4.5. To avoid sensitive material such as pornography, additional text-based filters and theme-based segregation are employed, particularly in versions adapted for broader audiences including children. The algorithm further prioritizes images based on metadata scarcity, favoring those with low prior label agreement, such as memes or controversial visuals that have been frequently passed over in previous sessions, as these yield the most valuable new data. Images are presented in randomized order within each session to maintain engagement, with each visual rescaled to a display size suitable for the game's interface, typically without accompanying captions or contextual hints. Diversity is maintained through random web sourcing. This process dynamically refines the selection to emphasize under-labeled assets while discarding those presumed fully annotated or excessively difficult.
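One way the requeuing priority could work is sketched below; the heuristic (fewest existing labels first, random tiebreak) is an assumption consistent with the description, not a documented algorithm.

```python
# Sketch of under-labeled-first image selection using a priority queue.
import heapq
import random

def build_queue(images):
    """Min-heap keyed on label count, with a random tiebreaker."""
    heap = [(len(labels), random.random(), url)
            for url, labels in images.items()]
    heapq.heapify(heap)
    return heap

pool = {"a.jpg": ["dog"], "b.jpg": [], "c.jpg": ["cat", "pet", "grey"]}
queue = build_queue(pool)
print(heapq.heappop(queue)[2])  # "b.jpg": the least-labeled image goes first
```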

Labeling and Data Generation

In the ESP Game, labels are collected whenever a pair of players independently enters the same word for an image, generating a weighted label based on the frequency of such agreements across multiple player pairs. This frequency-based scoring reflects the consensus strength of each word, with more agreements indicating higher reliability. Labels achieving at least one agreement (threshold X = 1) are considered valid, and previously agreed words become taboo for future pairs on the same image to encourage diversity. The validation process ranks proposed labels by their agreement rate, promoting high-consensus terms (typically those with repeated matches from independent pairs) as official descriptors suitable for use as alt-text. For instance, evaluator assessments showed that labels achieving strong agreement were descriptive in 85% of cases, while lower-consensus or ambiguous terms are discarded or the image is requeued for additional playthroughs to build further consensus. The system incorporates filters to exclude non-descriptive spam words (e.g., "image" or "picture") and blacklists for inappropriate content, alongside statistical detection via rater reviews to prioritize relevant nouns and adjectives. Random player pairing and taboo word lists further mitigate repetitive or low-value inputs. The resulting output comprised over 1.2 million labels for approximately 300,000 images within the first four months of the game's 2003 launch, expanding to more than 10 million labels by 2006 through widespread adoption. These datasets were used in research applications such as improved search and accessibility, with player metadata anonymized to preserve privacy. The system's infrastructure supported peak engagement from over 13,000 players, generating thousands of labels daily at launch and demonstrating the capacity for 5,000 active players to label Google's 425 million-image corpus in 31 days.
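A minimal sketch of the frequency-weighted aggregation described above; the data shape is assumed, while the threshold X = 1 comes from the text.

```python
# Sketch of frequency-weighted label aggregation: count pairwise agreements
# per word and keep those meeting the agreement threshold.
from collections import Counter

AGREEMENT_THRESHOLD = 1  # X = 1 agreement makes a label valid, per the text

def aggregate(agreements):
    """Map an image's pairwise agreements to (label, weight), sorted by weight."""
    counts = Counter(agreements)
    return [(word, n) for word, n in counts.most_common()
            if n >= AGREEMENT_THRESHOLD]

print(aggregate(["dog", "dog", "pet", "dog", "brown"]))
# [('dog', 3), ('pet', 1), ('brown', 1)]: 'dog' is the strongest descriptor
```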

Challenges and Limitations

Cheating Detection

In the ESP Game, common cheating tactics included collusion through external communication channels, such as coordinating via phone or chat to share labels in advance, which allowed players to achieve rapid agreements without genuine effort. Other tactics involved repetitive irrelevant entries, like typing a fixed word such as "a" for every image to farm points quickly, or self-pairing, where a player used multiple accounts from the same device to control both sides of the game. Detection techniques relied on anomaly monitoring, such as tracking unnatural match speeds, where a sharp decrease in average agreement time indicated coordinated cheating. IP address tracking was employed to prevent multi-accounting by ensuring partners had distinct IPs, flagging sessions from the same location for review. Additionally, random queuing and pairing from a large pool of players minimized the chance of colluders being matched, while test images with known labels helped identify suspicious behavior through inconsistent responses. Response measures included inserting bot players with pre-recorded actions to disrupt global cheating strategies, rendering coordinated inputs ineffective. Labels from potentially cheated games were weighted lower or excluded by enforcing a "good label" threshold requiring agreement from multiple independent player pairs (e.g., at least two). Temporary session disruptions, such as introducing taboo words that blocked repeated strategies for the duration of a game, further deterred abuse without permanent exclusions. Statistical safeguards involved dynamically adjusting agreement thresholds to demand consensus from diverse player pairs, reducing the impact of any single collusive group. These methods, combined with the game's redundancy-based design, ensured that cheating did not significantly corrupt the collected labels, as even partially invalid entries were filtered out.
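Two of these detection signals, implausibly fast agreements and failures on seeded test images, can be sketched as follows; the threshold values are assumptions.

```python
# Sketch of two cheating-detection signals described above.
def speed_anomaly(agreement_times_s, floor_s=2.0):
    """Flag a session whose average agreement time is implausibly fast."""
    return sum(agreement_times_s) / len(agreement_times_s) < floor_s

def passes_test_images(answers, known):
    """Store a player's labels only if every seeded test image was labeled
    with one of its known-good words."""
    return all(answers[img] in known[img] for img in known)

print(speed_anomaly([0.4, 0.6, 0.5]))                      # True: suspicious
print(passes_test_images({"t1": "dog"}, {"t1": {"dog"}}))  # True: honest play
```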

Ethical and Privacy Concerns

The ESP Game's design involves collecting user data such as IP addresses, to detect and prevent cheating through random player pairing and session monitoring, alongside typed labels for image annotation, which could enable behavioral profiling and potential deanonymization of participants despite no explicit intent for such use. Additionally, the game's image corpus is drawn from public web sources, which may inadvertently include personal photographs of identifiable individuals, raising concerns about the privacy of those depicted, who neither knew about nor consented to inclusion in a crowdsourced labeling system. The game also faced challenges in filtering inappropriate content from web images, potentially exposing players to unsuitable material. Players contribute to the game with limited transparency regarding data repurposing, often unaware that their labels train commercial AI applications, such as enhancements to Google's search functionality, with no mechanisms provided for opting out of downstream proprietary uses. This lack of informed consent undermines user autonomy, as the game's entertaining format masks its role in generating high-value training data for machine learning models. The labeling process amplifies biases inherent to the player base, which consisted primarily of young, English-speaking users from Western demographics, resulting in datasets that overrepresent those cultural norms and underrepresent diverse global traditions. These skewed annotations propagate into trained AI systems, perpetuating inequities in applications like image recognition. Additionally, labels may require periodic re-labeling as linguistic and cultural associations evolve over time. Critics have highlighted the game's reliance on unpaid "human computation" as a form of labor exploitation, in which participants inadvertently provide economically valuable data for AI development, such as the millions of labels collected for Google's systems, without fair compensation or recognition of their contributions as work. To mitigate these issues, the ESP Game incorporates basic anonymization by not storing persistent user identifiers beyond session needs, and its terms of service disclose that data is collected for research and improvement purposes. Creator Luis von Ahn has since advocated for ethical frameworks in games with a purpose (GWAP), emphasizing transparency, enjoyment as intrinsic motivation, and societal benefit in his broader human computation research, to balance utility with participant rights.

Impact and Legacy

Applications in Computer Vision

The ESP Game's labels were integrated into Google Image Search through its licensing and reimplementation as the Google Image Labeler in 2006, enabling the addition of user-generated keywords to improve search relevance and accuracy. This collaboration allowed for the annotation of millions of web images, with labels directly enhancing query results by associating descriptive terms like "car" or "dog" with visuals, achieving near-perfect precision in targeted searches (e.g., 100% for common objects in tested sets). By providing meaningful metadata, these labels addressed key limitations of early image retrieval systems, in which automated methods struggled with semantic understanding.

The game's output contributed to foundational AI datasets in computer vision, supplying labeled images for training machine learning models in object detection and recognition tasks. For instance, the ESP dataset, derived from player annotations, has been used to benchmark algorithms for multi-label image classification, offering a challenging resource of diverse, real-world web images that require predicting multiple keywords per visual. These labels influenced early computer vision tools by providing scalable, human-verified ground-truth data, reducing reliance on computationally expensive automated labeling.

Research leveraging ESP Game data advanced semantic image understanding, with the foundational paper cited over 3,197 times across computer vision and human-computer interaction studies. It enabled explorations into accessibility for the visually impaired, such as generating alt-text equivalents for screen readers to describe content audibly, thereby improving the web for users with disabilities. By 2008, the initiative had generated over 50 million labels across diverse categories, significantly lowering costs compared to manual methods, whose expenses can become prohibitive for large-scale datasets.

One prominent successor to the ESP Game is reCAPTCHA, developed by von Ahn and colleagues in 2007 as a human computation system that leverages users solving distorted text challenges both to prevent automated spam and to contribute to the digitization of books and historical documents. By integrating into websites worldwide, reCAPTCHA harnessed collective human effort to solve over 100 million CAPTCHAs daily as of 2008, scaling to hundreds of millions daily in later years and aiding digitization projects such as the Internet Archive's.

Within the broader Games With a Purpose (GWAP) framework pioneered by the ESP Game, several CMU-developed games extended its model for specialized image annotation tasks. Peekaboom, introduced in 2006, paired players to reveal portions of an image based on descriptive words, generating bounding boxes for object localization to improve detection accuracy. Similarly, Phetch, launched in 2008, involved multiple players collaboratively crafting full descriptive sentences for images, producing accessible captions particularly useful for visually impaired users navigating web content.

Commercial and academic adaptations drew on the ESP Game's gamified labeling approach for large-scale data collection. PhotoCity, a 2010 GWAP, encouraged outdoor play in which participants photographed urban structures to "capture" virtual flags, amassing thousands of images per location to fuel 3D reconstructions. Academic extensions include open-source GWAP variants adapted for emerging technologies, such as mobile image-labeling systems that enable on-the-go tagging.
The ESP Game's legacy profoundly influenced the rise of crowdsourcing platforms, including Figure Eight (acquired by Appen in 2019), which adopted human-AI symbiotic models for scalable data annotation, emphasizing voluntary or incentivized contributions to train machine learning systems. This framework, as outlined in foundational GWAP research, shifted paradigms toward integrating human effort with computation for tasks beyond the reach of traditional programming.