Internet bot

from Wikipedia

An Internet bot (also called a web robot or simply a bot)[1] is a software application that runs automated tasks (scripts) on the Internet, usually with the intent to imitate human activity, such as messaging, on a large scale.[2] An Internet bot plays the client role in a client–server model, whereas the server role is usually played by web servers. Internet bots can perform simple and repetitive tasks much faster than a person ever could. The most extensive use of bots is for web crawling, in which an automated script fetches, analyzes and files information from web servers. More than half of all web traffic is generated by bots.[3]

Efforts by web servers to restrict bots vary. Some servers have a robots.txt file that states the rules governing bot behavior on that server; any bot that does not follow the rules can, in theory, be denied access to or removed from the affected website. Because the posted text file has no associated enforcement mechanism, adhering to its rules is entirely voluntary: there is no way to compel compliance or to ensure that a bot's creator or operator reads or acknowledges the robots.txt file. Some bots are "good", e.g. search engine spiders, while others are used to launch malicious attacks, for example on political campaigns.[3]
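
Compliance with robots.txt happens on the bot side. The sketch below, using Python's standard urllib.robotparser module, shows how a well-behaved bot might check a site's rules before fetching a page; the site URL and user-agent string are hypothetical placeholders.

```python
# Minimal sketch: a bot voluntarily checking robots.txt before fetching a page.
# The site URL and user-agent string are hypothetical examples.
from urllib.robotparser import RobotFileParser

robots = RobotFileParser("https://example.com/robots.txt")
robots.read()  # fetch and parse the site's robots.txt

user_agent = "ExampleBot/1.0"
url = "https://example.com/private/page.html"

if robots.can_fetch(user_agent, url):
    print("Allowed: the bot may fetch", url)
else:
    print("Disallowed: a well-behaved bot skips", url)
```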

IM and IRC

Some bots communicate with users of Internet-based services via instant messaging (IM), Internet Relay Chat (IRC), or other web interfaces such as Facebook bots and Twitter bots. These chatbots accept questions in plain English and formulate responses, and can often handle tasks such as reporting the weather, postal code information, sports scores, and currency or other unit conversions.[4] Others are used for entertainment, such as SmarterChild on AOL Instant Messenger and MSN Messenger.[citation needed]

Additional roles of an IRC bot may be to listen on a conversation channel, and to comment on certain phrases uttered by the participants (based on pattern matching). This is sometimes used as a help service for new users or to censor profanity.[citation needed]
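
A minimal sketch of this pattern-matching behavior follows; the trigger phrases and canned replies are hypothetical, and a real IRC bot would additionally speak the IRC protocol over a socket connection.

```python
# Minimal sketch of the pattern matching an IRC bot might use: watch a message
# stream and respond when known phrases appear. Triggers and replies are
# hypothetical examples.
import re

TRIGGERS = [
    (re.compile(r"\bhelp\b", re.IGNORECASE), "Type !commands for a list of commands."),
    (re.compile(r"\bweather\b", re.IGNORECASE), "Weather lookups: try !weather <city>."),
]

def on_message(nick: str, text: str) -> str | None:
    """Return a reply if any trigger pattern matches, else None."""
    for pattern, reply in TRIGGERS:
        if pattern.search(text):
            return f"{nick}: {reply}"
    return None

print(on_message("alice", "Can someone help me join a channel?"))
```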

Social bots

Social bots are sets of algorithms that carry out repetitive sets of instructions in order to establish a service or connection among social networking users. Among the various designs of networking bots, the most common are chatbots, algorithms designed to converse with a human user, and social bots, algorithms designed to mimic human behaviors and converse in patterns similar to those of a human user. The history of social botting can be traced back to Alan Turing in the 1950s and his vision of designing sets of instructional code that could pass the Turing test. In the 1960s Joseph Weizenbaum created ELIZA, a natural language processing program considered an early indicator of artificial intelligence algorithms. ELIZA inspired programmers to design programs that could match behavior patterns to their sets of instructions, and as a result natural language processing became an influential factor in the development of artificial intelligence and social bots. As information and ideas spread at mass scale across social media websites, technological advances have followed the same pattern.[citation needed]

Twitter bots posting similar messages during the 2016 United States elections

Reports of political interference in recent elections, including the 2016 US presidential election and the 2017 UK general election,[5] have suggested that bots are becoming more prevalent and have raised ethical questions about the relationship between a bot's design and its designer. Emilio Ferrara, a computer scientist at the University of Southern California, reported in Communications of the ACM[6] that the lack of resources available for fact-checking and information verification results in large volumes of false reports and claims about these bots circulating on social media platforms. In the case of Twitter, most of these bots are programmed with search-filter capabilities that target keywords and phrases favoring political agendas and then retweet them. While such bots are programmed to spread unverified information throughout social media platforms,[7] this poses a challenge for programmers in a hostile political climate. Ferrara described the "bot effect" as the socialization of bots and human users that creates a vulnerability to the leaking of personal information and to polarizing influences outside the ethics of the bot's code; this was corroborated by Guillory Kramer, whose study observed the behavior of emotionally volatile users and the impact bots have on them, altering their perception of reality.[citation needed]

Commercial bots

There has been a great deal of controversy about the use of bots in an automated trading function. Auction website eBay took legal action in an attempt to suppress a third-party company from using bots to look for bargains on its site; this approach backfired on eBay and attracted the attention of further bots. The United Kingdom-based bet exchange, Betfair, saw such a large amount of traffic coming from bots that it launched a WebService API aimed at bot programmers, through which it can actively manage bot interactions.[citation needed]

Bot farms are known to be used in online app stores, like the Apple App Store and Google Play, to manipulate positions[8] or increase positive ratings/reviews.[9]

A rapidly growing form of internet bot is the chatbot. From 2016, when Facebook Messenger allowed developers to place chatbots on their platform, there has been an exponential growth of their use on that app alone. 30,000 bots were created for Messenger in the first six months, rising to 100,000 by September 2017.[10] Avi Ben Ezra, CTO of SnatchBot, told Forbes that evidence from the use of their chatbot building platform pointed to a near future saving of millions of hours of human labor as 'live chat' on websites was replaced with bots.[11]

Companies use internet bots to increase online engagement and streamline communication. Bots also cut costs: instead of employing people to communicate with consumers, companies deploy chatbots to answer customers' questions. For example, Domino's developed a chatbot that can take orders via Facebook Messenger. Chatbots let companies allocate their employees' time to other tasks.[12]

Malicious bots

One example of the malicious use of bots is the coordination and operation of automated attacks on networked computers, such as a denial-of-service attack by a botnet. Internet bots can also be used to commit click fraud and, more recently, have appeared in MMORPGs as computer game bots. Another category is spambots, internet bots that attempt to spam large amounts of content on the Internet, usually adding advertising links. More than 94.2% of websites have experienced a bot attack.[3]

There are malicious bots (and botnets) of the following types:

  1. Spambots that harvest email addresses from contact or guestbook pages
  2. Downloader programs that suck bandwidth by downloading entire websites
  3. Website scrapers that grab the content of websites and re-use it without permission on automatically generated doorway pages
  4. Registration bots that sign up a specific email address to numerous services in order to have the confirmation messages flood the email inbox and distract from important messages indicating a security breach.[13]
  5. Viruses and worms
  6. DDoS attacks
  7. Botnets, zombie computers, etc.
  8. Spambots that try to redirect people onto a malicious website, sometimes found in comment sections or forums of various websites
  9. Viewbots create fake views[14][15]
  10. Bots that buy up high-demand seats for concerts, particularly by ticket brokers who resell the tickets.[16] These bots run through the purchase process of entertainment event-ticketing sites and obtain better seats by pulling back as many seats as they can.
  11. Bots that are used in massively multiplayer online role-playing games to farm for resources that would otherwise take significant time or effort to obtain, which can be a concern for online in-game economies.[17]
  12. Bots that increase traffic counts on analytics reporting to extract money from advertisers. A study by Comscore found that over half of ads shown across thousands of campaigns between May 2012 and February 2013 were not served to human users.[18]
  13. Bots used on internet forums to automatically post inflammatory or nonsensical posts to disrupt the forum and anger users.

In 2012, journalist Percy von Lipinski reported that he discovered millions of bot-generated, or "pinged", views at CNN iReport. CNN iReport quietly removed millions of views from the account of iReporter Chris Morrow.[19] It is not known whether the ad revenue CNN received from the fake views was ever returned to the advertisers.[citation needed]

The most widely used anti-bot technique is CAPTCHA. Examples of providers include reCAPTCHA, Minteye, Solve Media, and NuCaptcha. However, CAPTCHAs are not foolproof in preventing bots, as they can often be circumvented by computer character recognition, security holes, and outsourcing CAPTCHA solving to cheap laborers.[citation needed]

Protection against bots

In the case of academic surveys, protection against automated test-taking bots is essential for maintaining accuracy and consistency in survey results. Without proper precautions against these bots, results can become skewed or inaccurate. Researchers indicate that the best way to keep bots out of surveys is not to let them enter in the first place: participants should come from a reliable source, such as an existing department or group at work, so that malicious bots have no opportunity to infiltrate the study.

Another form of protection against bots is the CAPTCHA test mentioned in a previous section, whose name stands for "Completely Automated Public Turing test to tell Computers and Humans Apart". This test is often used to quickly distinguish a real user from a bot by posing a challenge that a human can easily complete but a bot cannot, such as recognizing distorted letters or numbers, or picking out specific parts of an image, like the traffic lights on a busy street. CAPTCHAs are an effective form of protection because they can be completed quickly, demand little effort from users, and are easy to implement.
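
The challenge-response principle behind CAPTCHA can be illustrated with a toy example: generate a question when the form is rendered, then verify the answer on submission. The plain-text arithmetic challenge below is purely illustrative; real CAPTCHAs rely on distorted images or behavioral signals rather than questions a script could trivially parse.

```python
# Toy illustration of the CAPTCHA principle: pose a challenge at form time,
# then verify the answer on submission. A plain-text arithmetic check stops
# only naive scripts; real CAPTCHAs use distorted images or behavioral signals.
import random

def make_challenge() -> tuple[str, int]:
    a, b = random.randint(1, 9), random.randint(1, 9)
    return f"What is {a} + {b}?", a + b

def verify(answer: str, expected: int) -> bool:
    try:
        return int(answer.strip()) == expected
    except ValueError:
        return False

question, expected = make_challenge()
print(question)
print("Correct answer passes:", verify(str(expected), expected))
print("Blank (bot-like) answer fails:", verify("", expected))
```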

There are also dedicated companies that specialize in protection against bots, including DataDome, Akamai, and Imperva. These companies offer defense systems that protect clients against DDoS attacks, infrastructure attacks, and other cybersecurity threats. While their services can be expensive, they can be crucial for large corporations and small businesses alike.

Human interaction with social bots

There are two main concerns with bots: clarity and face-to-face support. The cultural background of human beings affects the way they communicate with social bots.[citation needed] Many users recognize that online bots can "masquerade" as humans and have become highly aware of their presence; as a result, some users are unsure how to behave when interacting with a social bot.

Many people believe that bots are vastly less intelligent than humans, so they are not worthy of our respect.[2]

Min-Sun Kim proposed five concerns or issues that may arise when communicating with a social robot: avoiding harm to people's feelings, minimizing impositions, avoiding disapproval from others, clarity issues, and how effectively messages come across.[2]

People who oppose social robots argue that they also detract from the genuine creation of human relationships.[2] Opponents of social bots further note that they add a new, unnecessary layer to privacy protection, and many users call for stricter legislation to ensure private information remains protected. The discussion of what to do with social bots and how far they should go remains ongoing.

Social bots and political discussions

In recent years, political discussion platforms and politics on social media have become highly unstable and volatile. With the introduction of social bots on the political discussion scene, many users worry about their effect on the discussion and election outcomes. The biggest offender on the social media side is X (previously Twitter), where heated political discussions are raised both by bots and real users. The result is a misuse of political discussion on these platforms and a general mistrust among users for what they see.[citation needed]

from Grokipedia
An Internet bot, commonly shortened to bot, is a software program that automates repetitive tasks across the Internet, often simulating human-like interactions to perform functions ranging from data collection to content generation. These programs operate independently or as part of networks, executing scripts at scales unattainable by manual effort, and encompass both beneficial tools for efficiency and harmful agents for exploitation.

Early Internet bots emerged in the late 1980s with automated responders on platforms like Internet Relay Chat, evolving into web crawlers such as WebCrawler in 1994, which indexed pages to enable search functionality. Legitimate bots, including search engine crawlers like Googlebot, facilitate essential services by scanning sites for indexing, monitoring uptime, or aggregating price data for comparison tools, comprising a significant portion of authorized traffic. In contrast, malicious bots engage in activities such as distributed denial-of-service attacks, credential stuffing for account hijacking, content scraping to evade paywalls or steal proprietary data, and coordinated spam campaigns that inflate engagement metrics or spread disinformation.

By 2023, bots accounted for 49.6% of global Internet traffic, with roughly half classified as malicious, marking a steady rise driven by advancements in automation and AI integration that enhance evasion of detection. This prevalence underscores bots' dual role in digital ecosystems: enabling scalable operations like fraud detection or market analysis while posing risks such as economic losses from ad fraud—estimated in billions annually—and distortion of online discourse, where small bot clusters can amplify niche narratives to influence public perception or regulatory views. Mitigation relies on behavioral analysis, rate limiting, and AI-driven defenses, yet the arms race between bot creators and blockers continues, with sophisticated variants now leveraging machine learning to mimic human variability in timing and patterns.

Definition and Fundamentals

Core Definition and Characteristics

An internet bot, also known as a web bot or simply a bot, is a software application designed to execute automated tasks over the internet, typically performing repetitive actions at speeds unattainable by human operators. These programs operate as autonomous agents, following pre-defined scripts or algorithms to interact with websites, networks, or services without direct human intervention. Unlike manual processes, bots process data in bulk, enabling efficiencies in tasks such as data retrieval or content indexing, though they may also simulate user behaviors to evade detection.

Core characteristics of internet bots include their scalability and persistence, allowing them to run continuously on remote servers or connected devices, generating a significant portion of web traffic—estimated at over 50% in recent analyses. They rely on programmatic logic, often leveraging HTTP requests, APIs, or scripting languages to navigate digital environments, and can adapt to patterns like mouse movements or keystrokes in advanced implementations to mimic organic activity. Bots are inherently rule-based or, in modern variants, incorporate machine learning for decision-making, but their outputs remain deterministic absent real-time human input, distinguishing them from interactive software.

While bots enable legitimate automation, their defining traits—automation, repetition, and impersonation potential—also facilitate misuse, as they operate independently of ethical oversight inherent to human actions. Empirical detection studies highlight linguistic and behavioral markers, such as uniform posting cadences or automated phrasing, that differentiate bots from human-generated content on platforms like social media. This duality underscores bots' foundational role in internet ecosystems, where their efficiency drives both utility and risks, contingent on deployment intent.

Technical Architecture

Internet bots are automated software programs designed to interact with network services, typically comprising modular components that enable autonomous operation. At their core, bots consist of executable code implementing application logic, coupled with mechanisms for data input, processing, output generation, and persistence. This logic is often rule-based, relying on predefined scripts and conditional statements to execute tasks such as data retrieval or content posting, though modern variants integrate machine learning for dynamic decision-making.

The communication layer forms a foundational element, utilizing protocols like HTTP/HTTPS for web interactions or APIs for platform-specific access, such as OAuth-authenticated endpoints on social media services. Bots employ libraries like Python's requests or JavaScript's axios to handle requests, mimicking browser behavior through headers, cookies, and user agents to evade detection where necessary. For real-time operations, WebSockets or polling mechanisms maintain persistent connections, enabling responsive actions like automated replies.

Data processing involves parsers—e.g., BeautifulSoup for HTML or JSON decoders—to extract structured information from responses, often feeding into storage backends like relational databases (e.g., PostgreSQL) or NoSQL systems (e.g., MongoDB) for logging or analysis. Task management relies on queuing systems to orchestrate workflows, particularly in distributed architectures where multiple instances scale horizontally. A URL frontier or task queue, implemented as FIFO structures using tools like Redis or Apache Kafka, prioritizes and deduplicates operations to prevent redundancy and manage load. In web-traversing bots, such as crawlers, seed inputs initiate the process, with extracted links enqueued for subsequent fetches, ensuring systematic coverage while respecting rate limits via delays or token buckets. Advanced bots incorporate feedback loops, where processed data informs iterative refinements, as seen in AI-enhanced variants using natural language processing pipelines for intent recognition and response generation.

Deployment typically occurs on server environments, including virtual private servers, cloud platforms like AWS or Google Cloud, or containerized setups via Docker for portability and orchestration with Kubernetes. Bots run as daemon processes or scheduled via cron jobs for periodic execution, with event-driven models using webhooks or message brokers for triggered responses. Security considerations, such as proxy rotation and CAPTCHA solvers, are embedded in resilient designs to sustain operations against blocking measures, though these raise ethical and legal concerns in non-benign contexts. Scalability is achieved through microservices, distributing components across nodes to handle high volumes, as evidenced in large-scale crawlers processing billions of pages daily.
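
As a rough illustration of the fetch/parse/enqueue cycle described above, the sketch below implements a FIFO URL frontier with deduplication and a crude politeness delay, using the requests and BeautifulSoup libraries mentioned earlier. The seed URL, user-agent string, and page limit are hypothetical, and a production crawler would also honor robots.txt and distribute the frontier across workers.

```python
# Minimal sketch of a crawler loop: a FIFO URL frontier with deduplication
# and a crude rate limit. Requires the third-party requests and
# beautifulsoup4 packages.
import time
from collections import deque
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

def crawl(seed: str, max_pages: int = 10, delay: float = 1.0) -> None:
    frontier = deque([seed])        # FIFO task queue (the "URL frontier")
    seen = {seed}                   # deduplication set
    headers = {"User-Agent": "ExampleCrawler/0.1"}

    while frontier and max_pages > 0:
        url = frontier.popleft()
        try:
            resp = requests.get(url, headers=headers, timeout=10)
        except requests.RequestException:
            continue                # skip unreachable pages
        max_pages -= 1
        soup = BeautifulSoup(resp.text, "html.parser")
        title = (soup.title.string or "").strip() if soup.title else "(no title)"
        print(url, "->", title)
        for link in soup.find_all("a", href=True):
            absolute = urljoin(url, link["href"])
            if absolute.startswith("http") and absolute not in seen:
                seen.add(absolute)  # enqueue each new link exactly once
                frontier.append(absolute)
        time.sleep(delay)           # crude politeness / rate limiting

crawl("https://example.com/")
```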

Historical Development

Early Origins (1980s-1990s)

The earliest internet bots emerged in the late 1980s with the introduction of Internet Relay Chat (IRC), a protocol developed by Jarkko Oikarinen in August 1988 at the University of Oulu in Finland to enable real-time group communication across networked servers. IRC bots were automated programs that operated within these channels, performing repetitive tasks such as logging conversations, moderating user access, and responding to commands, thereby reducing manual oversight in growing online communities. The first recognized IRC bots included Jyrki Alakuijala's "Puppe," Greg Lindahl's "Game Manager" for handling multiplayer games, and Bill Wisner's "Bartender," which managed channel services like user queries and notifications. These bots exemplified early automation on the internet, leveraging simple scripting to simulate user-like behavior without advanced artificial intelligence, primarily serving utility functions in text-based environments.

In the early 1990s, as the internet expanded beyond chat systems to include file-sharing protocols like FTP, bots evolved into indexing tools to catalog distributed resources. Archie, released on September 10, 1990, by Alan Emtage, Bill Heelan, and Peter Deutsch at McGill University, functioned as the first internet search engine by periodically crawling and indexing filenames across anonymous FTP archives worldwide, enabling users to query over 1 million files by 1992. Unlike manual directory maintenance, Archie's automated prowl—running every few weeks—gathered metadata without downloading full files, addressing the challenge of locating resources in a decentralized network lacking centralized oversight. This marked a shift toward bots as data discovery agents, though limited to non-web protocols and reliant on basic pattern matching rather than semantic understanding.

The advent of the World Wide Web in 1991 spurred the development of web-specific bots, with the World Wide Web Wanderer (WWWW) debuting in June 1993 as the first automated web crawler, created by Matthew Gray at the Massachusetts Institute of Technology. Designed to measure web growth, the Perl-based Wanderer systematically followed hyperlinks from seed URLs, indexing servers rather than pages to avoid overload, and reported metrics like active web servers—rising from about 130 in mid-1993 to over 1,500 by early 1994. Early runs revealed rapid expansion but also unintended issues, such as temporary server slowdowns from uncoordinated crawling, prompting Gray to refine it for a lighter footprint by focusing on server counts via HTTP HEAD requests. These precursors laid foundational techniques for scalable web indexing, influencing subsequent crawlers like WebCrawler in 1994, while highlighting early tensions between automation efficiency and network resource demands.

Expansion in the Web Era (2000s)

The proliferation of internet bots in the 2000s was driven by the rapid expansion of the World Wide Web, which necessitated advanced automated indexing and retrieval mechanisms to handle the surge in online content. Web crawlers, evolving from 1990s prototypes like WebCrawler, became essential for search engines such as Google, whose Googlebot systematically indexed billions of pages to support improved query relevance and scale. Incremental crawling techniques, as detailed in research from 2000, enabled bots to efficiently update indexes by prioritizing recently modified pages, addressing the web's dynamic growth from approximately 1 billion pages in 2000 to over 3 billion by 2005. These utility bots facilitated the foundational infrastructure of Web 1.0, automating content discovery without which modern search functionality would have been infeasible.

Parallel to indexing advancements, chatbots emerged as consumer-facing automated agents amid the boom in instant messaging platforms. In 2001, SmarterChild, developed by ActiveBuddy, debuted on AOL Instant Messenger and Microsoft Messenger, simulating human-like conversations through scripted responses and basic natural language processing, attracting millions of users for entertainment and simple queries. This period saw the maturation of underlying technologies like the Artificial Intelligence Markup Language (AIML), finalized around 2000, which used pattern matching to enable more responsive bot interactions on emerging web services. Commercial adoption grew with the internet's commercialization, as bots automated customer support on e-commerce sites, reducing human intervention for routine tasks like order tracking.

Malicious bots also expanded, exploiting the web's vulnerabilities for disruption and exploitation. In February 2000, a 15-year-old hacker known as Mafiaboy orchestrated volumetric denial-of-service attacks using rudimentary bot-like amplification techniques, crippling sites including CNN, Yahoo, and eBay, highlighting early scalable bot-enabled threats. By 2003, spam botnets like Sobig transitioned to proxy-based architectures, enabling mass distribution of malware and unsolicited emails, with Sobig infecting millions of machines and marking a shift toward coordinated zombie networks for phishing and propagation. These adversarial developments underscored bots' dual potential, as their automation capabilities were increasingly weaponized against the growing online ecosystem, prompting initial countermeasures like rate limiting and CAPTCHA systems.

Modern AI-Integrated Era (2010s-Present)

The 2010s witnessed the profound integration of machine learning and deep learning into internet bots, shifting them from deterministic scripts to adaptive systems capable of learning from vast datasets. Breakthroughs in neural networks, fueled by increased computational power from GPUs, enabled bots to excel in natural language understanding, image analysis, and behavioral mimicry, fundamentally enhancing their autonomy and effectiveness across applications. This era's advancements laid the groundwork for bots to handle unstructured data dynamically, marking a departure from earlier rule-based limitations.

Consumer-facing AI bots proliferated with the launch of sophisticated virtual assistants. Apple's Siri, introduced on October 4, 2011, pioneered voice-activated interactions using natural language processing for iOS devices. Amazon's Alexa followed on November 6, 2014, embedding bots into smart home ecosystems for task automation and information retrieval. Google's Assistant debuted on May 18, 2016, further advancing contextual awareness and multi-modal inputs. These developments democratized AI bot interactions, with machine learning enabling personalized responses and continuous improvement via user data.

Malicious bots leveraged these technologies for sophisticated operations, particularly in social media influence campaigns. During the 2016 U.S. presidential election, automated Twitter accounts disproportionately disseminated articles from low-credibility sources, amplifying polarizing content and distorting online discourse. Machine learning facilitated bot evasion of detection through human-like posting patterns and content generation, escalating an arms race with platform algorithms.

In parallel, adversarial machine learning empowered bots to circumvent security measures like CAPTCHAs. Convolutional neural networks and generative adversarial networks have achieved high success rates in solving visual puzzles, rendering traditional defenses less effective against AI-augmented scrapers and intruders.

The 2020s amplified these trends with transformer-based large language models, enabling bots to produce human-like text, code, and media. OpenAI's ChatGPT, released on November 30, 2022, exemplified this shift, powering autonomous agents for customer service, content creation, and research automation. AI web crawlers surged to support model training, with bots comprising about 30% of global web traffic by 2025, outpacing human activity in volume. Meta's crawlers alone accounted for 52% of AI-specific bot traffic, straining server resources and prompting new blocking protocols. This proliferation has heightened concerns over data privacy, intellectual property, and the authenticity of online interactions, as AI bots blur distinctions between automated and genuine engagement.

Classification of Bots

Benign and Utility Bots

Benign bots, also referred to as good bots in cybersecurity classifications, are automated software agents programmed to execute beneficial tasks over the internet while adhering to platform terms of service and avoiding harm to users or systems. Unlike adversarial bots, they prioritize utility and efficiency, such as facilitating data aggregation or monitoring without deceptive intent. This distinction arises from their operational behaviors, where benign bots typically announce their presence via user-agent strings and respect rate-limiting protocols to minimize resource strain.

A primary example includes search engine crawlers, like Googlebot, which systematically scan web pages to build indexes that enable user queries, processing billions of pages daily to maintain up-to-date search results as of 2023 data from major providers. These bots enhance accessibility by prioritizing content discovery without altering or extracting data illicitly. Similarly, site monitoring bots, deployed by services like Pingdom, periodically check website availability and performance metrics, alerting administrators to downtime—for instance, scanning endpoints every 1-5 minutes to ensure 99.9% uptime compliance in enterprise environments.

Utility bots extend this functionality into interactive and assistive roles, often integrating with user-facing services. Chatbots, such as those powering customer support on e-commerce platforms, handle routine inquiries like order tracking, resolving up to 80% of queries without human intervention according to 2022 industry benchmarks from providers like Zendesk. In social media contexts, benign utility bots automate moderation by flagging violations or posting alerts, exemplified by earthquake notification bots on Twitter that disseminate real-time USGS data to subscribers within seconds of seismic events. These implementations demonstrate causal efficiency gains in reducing manual labor while preserving platform integrity, though their effectiveness depends on transparent identification to avoid misclassification as threats.
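
A site-monitoring bot of the kind described above reduces to a small polling loop. The sketch below is a minimal illustration, with the health-check URL, polling interval, and print-based alerting standing in for a real service's configuration.

```python
# Minimal sketch of a site-monitoring bot: poll an endpoint at a fixed
# interval and report downtime. URL, interval, and the print-based "alert"
# are hypothetical placeholders; requires the third-party requests package.
import time
import requests

def monitor(url: str, interval_seconds: int = 60, checks: int = 3) -> None:
    for _ in range(checks):
        start = time.monotonic()
        try:
            resp = requests.get(url, timeout=10)
            latency_ms = (time.monotonic() - start) * 1000
            status = "UP" if resp.ok else f"DEGRADED ({resp.status_code})"
            print(f"{url}: {status}, {latency_ms:.0f} ms")
        except requests.RequestException as exc:
            print(f"{url}: DOWN ({exc.__class__.__name__})")  # alert hook goes here
        time.sleep(interval_seconds)

monitor("https://example.com/health", interval_seconds=60)
```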

Commercial and Service Bots

Commercial and service bots encompass automated software agents deployed by businesses to facilitate customer interactions, streamline operations, and deliver value-added functionalities on the internet. These bots typically operate via web interfaces, APIs, or messaging platforms, leveraging natural language processing and rule-based logic to handle routine tasks without human intervention. Unlike adversarial bots, they are designed for efficiency and user satisfaction, often integrating with enterprise systems to provide scalable services.

A primary application lies in customer service, where chatbots respond to inquiries, resolve issues, and guide users through processes. For instance, over 67% of consumers worldwide have interacted with a chatbot for support in the past year, with 85% of customer interactions expected to involve such automation. Businesses report that chatbots handle up to 80% of simple queries instantly, reducing response times threefold compared to human agents. The global AI chatbot market, heavily driven by service applications, was valued at $15.57 billion in 2024 and is projected to reach $46.64 billion by 2029, reflecting widespread adoption in sectors like retail and finance.

In e-commerce, service bots enhance shopping experiences by offering personalized recommendations, processing orders, and managing post-purchase support. These bots simulate conversational interfaces to assist with product discovery, such as suggesting items based on user queries or browsing history, thereby increasing conversion rates by up to 67% in some implementations. Examples include bots integrated into platforms like Shopify or Amazon, which automate cart abandonment recovery and inventory checks. By enabling direct ordering through chat—eliminating traditional website navigation—e-commerce bots simplify transactions and boost engagement, with 37% of businesses deploying them specifically for support and sales automation.

Other service bots support targeted commercial functions, such as virtual assistants for scheduling or data retrieval in professional services. Citibot, for example, utilizes cloud infrastructure to power municipal and enterprise chatbots that handle citizen or customer complaints efficiently. While these bots prioritize utility, their effectiveness depends on accurate training data and integration, with 58% of customer experience leaders anticipating advancements in chatbot sophistication by 2025. Deployment requires balancing automation with human escalation to maintain trust, as 34% of consumers still prefer human agents for complex issues.

Adversarial and Malicious Bots

Adversarial and malicious bots encompass automated software agents programmed to engage in deceptive, disruptive, or exploitative activities across online platforms, often evading detection mechanisms to achieve unauthorized goals such as fraud, data theft, or influence operations. These bots differ from benign counterparts by prioritizing harm over utility, frequently mimicking human behavior through advanced techniques like IP rotation, user-agent spoofing, and machine learning-driven pattern adaptation to bypass security measures. Cybersecurity analyses classify them as "bad bots," which constituted 37% of global internet traffic in 2024, marking an increase from 32% in 2023 and reflecting their growing sophistication.

Key subtypes include fraud-oriented bots, which automate credential stuffing attacks by testing stolen username-password pairs against login portals; in 2024, such bots accounted for a significant portion of advanced threats, exploiting business logic flaws rather than technical vulnerabilities to perpetrate account takeovers and financial fraud. Scraping bots, deployed for competitive intelligence gathering or content theft, systematically harvest data from websites, often overwhelming servers and violating terms of service; reports indicate these activities surged in sectors like e-commerce and travel, where bots inflated search queries to manipulate pricing algorithms in "look-to-book" fraud schemes. Denial-of-service bots, forming botnets to flood targets with traffic, enable distributed attacks that disrupt services; for instance, IoT-compromised bots have powered large-scale DDoS incidents, with advanced variants comprising 55% of bot attacks in 2024 by emulating legitimate user sessions.

Social media manipulation bots represent another adversarial category, creating fake accounts to amplify narratives, spread misinformation, or astroturf opinions through coordinated posting; these evolved from early Twitter automation in the 2010s to AI-enhanced variants that generate contextually relevant content, complicating detection. In 2024, 49% of detected bots exhibited advanced human-mimicking traits, many tied to influence campaigns on platforms like X (formerly Twitter). Such bots have been implicated in electoral interference, with empirical studies documenting their role in inflating engagement metrics; however, detection challenges persist due to adversarial adaptations that counter behavioral analytics. Overall, these bots exploit internet-scale vulnerabilities, with mitigation relying on behavioral analysis and rate limiting, though their prevalence underscores ongoing arms races between developers and defenders.

Legitimate Applications

Information Retrieval and Indexing

Internet bots facilitate information retrieval and indexing primarily through web crawlers, automated programs that systematically traverse the World Wide Web to discover, fetch, and catalog content for search engines and databases. These bots begin with a set of seed URLs, follow hyperlinks recursively to identify new pages, and extract textual data, metadata, images, and structural elements while adhering to protocols such as robots.txt files to respect site owner directives on crawling permissions. The fetched content is then processed, tokenized, and stored in inverted indexes—data structures that map terms to their locations across documents—enabling efficient querying and relevance ranking during user searches.

Search engines rely on these bots to maintain comprehensive indexes; for instance, Googlebot, the primary crawler for Google Search, operates continuously to explore billions of pages, updating its index with fresh content multiple times per day for high-authority sites and less frequently for others, ensuring search results reflect current web state. Similarly, Bingbot performs analogous functions for Microsoft's Bing engine, indexing pages to support its query processing, which collectively handles a significant portion of non-Google searches. Other legitimate crawlers, such as YandexBot and Applebot, contribute to regional or specialized indexing, with Yandex focusing on Russian-language content and Applebot aiding Spotlight search integration.

Empirical data underscores the scale: as of 2025, Google commands over 90% of the global search market, processing more than 60% of queries on desktop and mobile, a dominance enabled by relentless crawling that has indexed trillions of URLs despite the web's exponential growth. Crawler traffic overall rose 18% from May 2024 to May 2025, with traditional search bots like Googlebot accounting for the bulk, though increases also reflect emerging AI training crawlers adapting similar techniques for data aggregation. This infrastructure underpins causal chains in information ecosystems, where bot-driven indexing directly enhances retrieval accuracy by prioritizing fresh, linked, and semantically rich content over isolated or outdated sources.

Challenges in this domain include managing crawl budgets to avoid overwhelming servers and handling dynamic content via JavaScript rendering, which modern bots like Googlebot address through headless browser emulation. Open initiatives, such as Common Crawl's petabyte-scale archives of web snapshots dating back to 2008, further democratize indexed data for research, providing verifiable datasets for training retrieval models without proprietary dependencies.
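
The inverted-index structure at the heart of this pipeline can be sketched in a few lines: map each term to the set of documents containing it, then answer a conjunctive query by intersecting posting sets. The documents below are hypothetical snippets; real engines add ranking, stemming, and compressed postings.

```python
# Minimal inverted index: map each term to the set of document IDs containing
# it, then answer a multi-term query by intersecting posting sets.
from collections import defaultdict

docs = {
    1: "internet bots automate repetitive tasks",
    2: "web crawlers index pages for search engines",
    3: "search bots fetch and index web pages",
}

index: dict[str, set[int]] = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():   # naive whitespace tokenization
        index[term].add(doc_id)

def search(query: str) -> set[int]:
    """Return IDs of documents containing every query term."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()

print(search("index pages"))   # -> {2, 3}
```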

Customer Interaction and Automation

Internet bots enable automated customer interactions by processing queries, providing responses, and handling routine tasks on digital platforms such as websites, messaging apps, and social media. These systems, often implemented as chatbots or virtual assistants, operate 24/7 to address common inquiries like order status checks, billing questions, or product recommendations, reducing the need for human intervention in high-volume scenarios. Early forms of automated customer service emerged with interactive voice response (IVR) systems in banking, evolving into web-based bots with the rise of e-commerce sites integrating scripted response engines.

By 2025, adoption has accelerated, with 37% of businesses deploying chatbots specifically for customer support interactions, responding to inquiries three times faster than human agents. Conversational AI variants, powered by natural language processing, now manage up to 70% of routine customer requests in sectors like retail and finance, yielding projected global savings of $80 billion in agent labor costs by 2026 through reduced handling times and scaled operations. For instance, AI bots excel in product guidance, where 89% of U.S. customer experience leaders report high value from automated assistance in navigating services or resolving simple issues.

Effectiveness stems from bots' ability to integrate with backend data for personalized automation, such as real-time inventory updates or ticket routing, while maintaining consistent service levels without fatigue. Gartner notes three primary benefits: enhanced insights from interaction data, improved user experiences via rapid resolutions, and streamlined processes that free human agents for complex cases. In banking and healthcare, chatbots are forecasted to handle 75% to 90% of inquiries by 2025, driven by cost efficiencies estimated at 30% per support operation. Despite reliance on predefined scripts or machine learning models trained on historical data, these bots demonstrate reliability for structured tasks, with 62% of consumers preferring them over wait times for agents in non-escalated matters.

Market and Data Analysis

Internet bots facilitate market and data analysis by automating the collection, processing, and interpretation of vast datasets from online sources, enabling real-time insights into economic trends and market conditions. Web scraping bots, for instance, systematically extract publicly available financial data such as stock prices, earnings reports, and market indicators from websites like Yahoo Finance or regulatory filings, allowing analysts to aggregate information that would otherwise require manual effort. These tools are essential for tracking competitor financials, including balance sheets and revenue streams, to inform strategic decisions.

In algorithmic trading, bots analyze historical and live market data to execute trades based on predefined criteria, such as price thresholds or statistical models, operating at speeds unattainable by humans. Approximately 70% of U.S. stock market trading volume in 2021 was driven by such algorithmic systems, which process feeds from exchanges and news sources to identify arbitrage opportunities or momentum patterns. The global algorithmic trading market, encompassing these bot-driven platforms, was valued at USD 17.2 billion in 2024 and is projected to reach USD 42.5 billion by 2033, reflecting their integration into high-frequency and quantitative strategies. Bots also perform sentiment analysis by mining social media, forums, and news for public opinion on assets or sectors, quantifying bullish or bearish signals through natural language processing to forecast price movements.

Data analysis bots support broader market research by conducting automated web crawling for supply chain data or consumer pricing, contributing to the web scraping industry's growth beyond USD 9 billion by the end of 2025. These applications rely on compliant bots that respect robots.txt protocols and rate limits to ensure ethical data harvesting.
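
The rule-based trading logic described above can be illustrated with a toy moving-average crossover signal; the price series below is hypothetical, and real trading bots consume live exchange feeds and layer on risk controls.

```python
# Toy sketch of rule-based trading logic: a moving-average crossover signal.
# Hypothetical price data; illustration only, not an actual trading system.
def moving_average(series: list[float], window: int) -> float:
    return sum(series[-window:]) / window

def crossover_signal(prices: list[float], short: int = 3, long: int = 5) -> str:
    if len(prices) < long + 1:
        return "HOLD"               # not enough history to compare averages
    fast_now = moving_average(prices, short)
    slow_now = moving_average(prices, long)
    fast_prev = moving_average(prices[:-1], short)
    slow_prev = moving_average(prices[:-1], long)
    if fast_prev <= slow_prev and fast_now > slow_now:
        return "BUY"                # short-term momentum just overtook the trend
    if fast_prev >= slow_prev and fast_now < slow_now:
        return "SELL"               # short-term momentum just fell below the trend
    return "HOLD"

prices = [100.0, 101.5, 99.8, 102.2, 103.0, 104.1, 103.7]
print(crossover_signal(prices))
```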

Adverse Effects and Misuses

Spam, Fraud, and Cyberattacks

Internet bots facilitate spam by automating the distribution of unsolicited messages across email, social media, forums, and comment sections, often disseminating advertisements, phishing links, or malware. For instance, spam bots generate and propagate content such as fake reviews or misleading links at scale, evading human moderation through rapid posting and variation in messaging patterns. In 2023, bad bots, which include those used for spamming, accounted for nearly one-third of global internet traffic, contributing to the proliferation of such automated abuse.

In online fraud, bots enable credential stuffing, account takeovers, and click fraud by mimicking legitimate user behavior to exploit stolen credentials or generate artificial traffic. Credential stuffing bots, for example, test compromised username-password pairs across multiple sites, leading to unauthorized access and financial losses estimated in billions annually from such automated attacks. Click fraud bots simulate ad clicks to drain advertiser budgets or inflate metrics, with these malicious agents responsible for a significant portion of fraudulent digital advertising interactions. Additionally, bots create fake accounts to perpetrate advance-fee scams or distribute scam links, as observed in social platforms where automated profiles spam comments tying back to fraudulent schemes.

Bots underpin cyberattacks, particularly through botnets—networks of compromised devices controlled remotely to launch distributed denial-of-service (DDoS) assaults that overwhelm targets with traffic. In the first half of 2025, DDoS-capable botnet nodes numbered over 1.2 million, fueling attacks that disrupted services globally. DDoS incidents surged 358% year-over-year in Q1 2025, with 20.5 million attacks blocked, many originating from known botnets responsible for 71% of HTTP-based DDoS efforts in Q2 2025. These botnet-driven operations exploit vulnerabilities in IoT devices and endpoints, amplifying attack volumes to terabits per second and causing economic damages exceeding hundreds of millions per major incident.

Manipulation of Social Platforms

Twitter bots activity on November 13, 2016

Internet bots manipulate social platforms by automating behaviors that mimic human users, thereby influencing trends, opinions, and information flow through artificial amplification and targeted dissemination. Coordinated bot networks exploit algorithmic recommendations favoring high-engagement content, creating illusory consensus or virality for specific narratives. Empirical data from cross-platform analyses reveal that bots generate about 20% of chatter on global events, systematically differing from human patterns in timing, volume, and content focus.

In electoral contexts, bots distort public discourse by inflating partisan signals. During the 2016 U.S. presidential election, automated accounts produced up to 20% of tweets on candidate-related hashtags, with studies showing they negatively affected discussion quality by prioritizing sensationalism over substantive exchange. Analysis of shared links indicated bots disproportionately disseminated articles from low-credibility sources, amplifying their reach beyond organic human sharing. Such tactics, including rapid retweeting and hashtag hijacking, simulate grassroots momentum, as evidenced by elevated bot activity spikes correlating with peak human engagement periods.

Beyond elections, bots reinforce perceptual biases and agenda-setting. Exposure to bot-generated content leads users to overestimate bot prevalence and influence, exacerbating polarization through selective amplification of aligned viewpoints. In policy arenas, like discussions of China's dual-carbon goals, bots shape issue networks by bridging or dominating legacy media signals, steering public attention toward operator-preferred frames. These operations often involve botnets—clusters of scripted accounts—that evade detection via behavioral mimicry, sustaining long-term narrative control despite platform countermeasures.

Detection challenges persist due to evolving evasion tactics, with recent models highlighting bots' role in disinformation cascades during crises, where they escalated spread at rates exceeding those of human contributors. Manipulation extends to commercial contexts, such as review bombing or trend fabrication, but political applications predominate in documented cases, underscoring bots' capacity for causal influence over perceptions without overt coordination.

Resource Consumption and Denial of Service

Internet bots contribute to resource consumption by generating excessive traffic that depletes server bandwidth, CPU cycles, and memory, often rendering services unavailable to legitimate users. In denial-of-service (DoS) scenarios, coordinated botnets amplify this effect through distributed requests, overwhelming targets without necessarily exploiting vulnerabilities. This mechanism exploits the finite nature of computing resources, where even legitimate-looking HTTP requests can exhaust connection pools or processing queues, leading to degraded performance or complete outages.

Botnets, networks of compromised devices controlled remotely, exemplify this threat by scaling attacks to terabit-per-second volumes. The Mirai botnet, active since 2016, infected unsecured Internet of Things (IoT) devices such as cameras and routers to launch DDoS floods; for instance, it generated 623 Gbps against security researcher Brian Krebs' website in September 2016, saturating upstream providers and causing prolonged outages. Similarly, the October 2016 assault on DNS provider Dyn using Mirai peaked at over 1 Tbps, disrupting access to major sites including Twitter and Netflix by exhausting resolution capacity. These incidents highlight how bots hijack everyday devices—estimated at millions in large botnets—to proxy traffic, evading single-source mitigation while consuming victim infrastructure.

Recent trends underscore escalating scale and frequency, with 71% of HTTP DDoS attacks in Q2 2025 originating from identifiable botnets, enabling rapid but resource-intensive floods. Cloudflare reported blocking 20.5 million DDoS events in Q1 2025 alone, a 358% increase year-over-year, many driven by botnet-orchestrated volumetric assaults that spike bandwidth usage to 5.6 Tbps in record cases. Beyond raw volume, application-layer bots induce exhaustion via slowloris-style techniques, holding connections open to monopolize server sockets without high traffic, as seen in persistent bad bot campaigns consuming up to 32% of site resources in 2023 analyses. Such tactics not only deny service but inflate operational costs, with affected entities facing elevated hosting fees from sustained overload.

Societal Implications

Interactions Between Humans and Bots

Humans often engage with internet bots through conversational interfaces, such as chatbots, where bots handle routine queries to provide rapid responses. In recent surveys, 37% of businesses utilized chatbots for customer support, delivering responses faster than human agents in many cases. Approximately 80% of users reported positive experiences with chatbot interactions, though 62% preferred bots over waiting for agents in non-urgent scenarios. However, preferences shift based on context; for instance, users favor human agents when expressing anger or frustration, while opting for bots in discussions of embarrassing topics to avoid judgment.

In social media environments, interactions frequently involve bots mimicking human users to engage in discussions, retweet content, or form networks, complicating human discernment. Studies indicate that humans struggle to differentiate bots from genuine accounts, with behavioral analyses revealing consistent differences in posting patterns—such as bots producing 20% of global event chatter—yet failing to enable reliable manual detection. Exposure to such bots can amplify perceptual biases, widening gaps in self-perceived immunity to misinformation and elevating threat perceptions among humans. Moreover, interactions with non-cooperative bots spill over into reduced cooperation in subsequent human-human exchanges, as observed in controlled experiments.

Bots influence human decision-making by simulating social cues, leading to persistent irrational herding behaviors even when users know they interact with automated agents. Extensive reliance on AI chatbots has been linked to deepened feelings of loneliness, particularly when user behaviors prioritize AI over human socialization. In online communities, human perceptions of bots—ranging from tools to deceptive entities—shape interaction dynamics, with reciprocity levels dropping compared to human counterparts due to perceived lack of genuine intent. These effects underscore causal pathways where bot-driven amplification of emotional or networked content disrupts typical human social processing.

Impacts on Information Ecosystems

Internet bots profoundly shape information ecosystems by automating content generation, dissemination, and interaction at volumes that dwarf human capabilities, thereby altering the perceived balance and authenticity of online discourse. Over half of global internet traffic originates from bots as of recent analyses, enabling them to inflate engagement metrics, manipulate trending topics, and skew algorithmic feeds toward certain narratives. This scale facilitates the creation of artificial consensus, where bot-driven amplification simulates grassroots support or outrage, distorting users' exposure to diverse viewpoints and fostering echo chambers.

Social bots, in particular, accelerate misinformation propagation by targeting human influencers and injecting low-credibility content into networks, as evidenced in studies of platforms like Twitter during crises such as the COVID-19 pandemic. Research from 2018 demonstrated that bots preferentially boost negative and inflammatory material, increasing users' encounters with polarizing content by up to 27% in experimental settings, which heightens emotional chaos and network instability during public opinion surges. Such dynamics erode trust in digital information, as bots exploit semantic similarities to human posts while evading detection, leading to broader societal skepticism toward online sources. Bots also introduce entropy into discourse predictability, with information-theoretic models showing reduced stability in conversations infiltrated by automated accounts, complicating organic opinion formation.

In polarized environments, even modest bot deployments—comprising less than 1% of participants—can elevate specific stories to millions of views, prioritizing sensationalism over factual accuracy and amplifying ideological silos. While some bots serve constructive roles, such as disseminating verified news alerts or countering falsehoods, empirical evidence indicates these are outnumbered by manipulative instances that degrade ecosystem integrity. The cumulative effect manifests in heightened vulnerability to coordinated campaigns, where bots flood feeds to sway perceptions on issues like elections, as observed in global analyses revealing their role in hashtag hijacking and trend manipulation. Scholarly consensus underscores that without robust detection, these influences perpetuate a feedback loop: distorted inputs yield biased algorithms, which in turn reinforce skewed behaviors, entrenching divisions in the information ecosystem.

Political and Ideological Influences

Internet bots exert political influence by amplifying selected narratives, simulating grassroots support, and distorting online discourse to favor specific ideologies or agendas. Empirical analyses indicate that bots can significantly shape public opinion dynamics, often without direct human-bot interactions, through algorithmic amplification on platforms like Twitter. For instance, during the 2016 U.S. presidential election, automated accounts disseminated a disproportionate volume of content from ideologically aligned sources, including fake news, comprising up to 25% of traffic for certain low-credibility domains. Studies confirm that such bot activity negatively impacted democratic discussion by prioritizing sensationalism over factual exchange.

State actors have systematically deployed bots for propaganda, as evidenced by Russian operations. In July 2024, the U.S. Department of Justice disrupted a Kremlin-backed bot farm employing AI to generate over 900 accounts impersonating Americans, promoting pro-Russia narratives on Ukraine and domestic U.S. issues. This network produced nearly 2 million posts since 2022, illustrating causal mechanisms where bots flood platforms to normalize state-favored views. Similar tactics appear in other regimes, where computational propaganda bots create artificial consensus or suppress dissent.

Non-state actors also leverage bots ideologically, often mirroring partisan divides. An October 2024 investigation revealed an AI-driven network of Republican-aligned accounts on X (formerly Twitter) posing as authentic users to advocate for Trump and conservative causes, generating thousands of posts to sway sentiment. Perceptions of bot influence exhibit ideological bias: individuals are more prone to label counter-ideological content as bot-generated, exacerbating polarization independent of actual automation levels. Neutral bot experiments further reveal platform algorithms favoring certain ideological clusters, indirectly boosting bots aligned with prevailing network effects.

In global contexts, bots intensify ideological tensions, such as anti-vaccine campaigns where automated content sways users toward fringe views. Authoritarian governments employ bots for control, while democracies face domestic campaigns to fabricate grassroots support for candidates or policies. These influences persist due to detection challenges, with bots evolving via AI to mimic humans, underscoring the need for causal analysis over anecdotal claims in assessing true impact.

Countermeasures and Challenges

Detection Technologies

Detection of internet bots relies on a combination of heuristic, behavioral, and machine learning-based approaches to differentiate automated traffic from human activity. Heuristic methods analyze static features such as IP addresses, user-agent strings, and request frequencies, flagging anomalies like high-volume requests from single sources or mismatched browser fingerprints. These techniques provide initial filtering but are increasingly evaded by bots that rotate proxies or emulate legitimate headers.

Behavioral analysis examines dynamic user interactions, including mouse movements, keystroke dynamics, session durations, and navigation patterns, which automated scripts typically reproduce imperfectly due to deterministic programming. Client-side JavaScript challenges, such as canvas fingerprinting or timing-based proofs of human effort, further probe for inconsistencies in rendering or event handling. Honeypots—hidden form fields or links invisible to legitimate users—trap bots that interact with all page elements indiscriminately.

Machine learning has emerged as a dominant paradigm, with supervised models trained on labeled datasets of bot and human traffic to classify based on aggregated features like entropy in request sequences or deviation from normal distributions. Semi-supervised and unsupervised variants adapt to unlabeled data, detecting outliers in real time without exhaustive retraining. For example, Cloudflare deployed a machine learning model in June 2024 specifically targeting bots leveraging residential IP proxies, achieving improved accuracy by incorporating proxy-specific behavioral signals. Recent integrations of deep learning enable per-customer anomaly detection, tailoring models to site-specific baselines for enhanced precision amid rising AI-driven bot evasion.

Despite these advances, detection faces an ongoing arms race, as bots employ machine learning to optimize evasion strategies, mimicking human behavior more effectively. Malicious bots accounted for more than one-third of global internet traffic in 2025, underscoring the scale of the challenge and the need for hybrid systems combining multiple layers to minimize false positives while maintaining detection accuracy. Privacy considerations limit invasive monitoring, prompting shifts toward anonymized and aggregated signals in regulatory-compliant frameworks.
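
The supervised approach can be sketched with a small classifier over per-session features; the tiny synthetic dataset and feature choices below (request rate, inter-request gap, pointer activity) are illustrative assumptions, not a production model, and assume the scikit-learn library.

```python
# Sketch of supervised bot detection: train a classifier on per-session
# features. Synthetic data; features are [requests_per_min, mean_gap_s,
# mouse_events_per_page]. Requires scikit-learn.
from sklearn.ensemble import RandomForestClassifier

X = [
    [300, 0.2, 0],   # bot-like: rapid, uniform, no pointer activity
    [250, 0.3, 1],
    [220, 0.25, 0],
    [12, 5.1, 34],   # human-like: slow, irregular, rich pointer activity
    [8, 7.8, 51],
    [15, 4.2, 27],
]
y = [1, 1, 1, 0, 0, 0]   # labels: 1 = bot, 0 = human

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict([[280, 0.22, 0]]))        # bot-like session
print(clf.predict_proba([[10, 6.0, 40]]))   # human-like session
```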

Mitigation Strategies

Bot mitigation strategies primarily involve layered technical defenses that differentiate automated scripts from human users, often combining rule-based, behavioral, and AI-driven techniques to minimize false positives while maximizing efficacy against evolving threats. These defenses are deployed at the application layer, at the network perimeter, or via specialized services from providers such as Cloudflare and Imperva. Effective implementation requires continuous adaptation, as bots increasingly employ headless browsers, residential proxies, and AI to mimic human behavior, with bad bot traffic comprising up to 32% of internet activity in recent analyses.

Challenge-response mechanisms, such as CAPTCHAs and JavaScript execution tests, compel clients to solve puzzles or render dynamic content that simple bots fail to process, thereby filtering out rudimentary scrapers and credential stuffers. Advanced variants, including invisible reCAPTCHAs and proof-of-work challenges, reduce friction for users while imposing computations that become resource-intensive for bots operating at scale. However, sophisticated bots that use machine learning to solve CAPTCHAs have prompted hybrid systems integrating multiple signals.

Behavioral analysis and anomaly detection scrutinize session patterns, including mouse trajectories, keystroke timing, and request sequencing, to flag activity indicative of automation. Tools from vendors such as Akamai employ heuristics and statistical models to score traffic; uniform request intervals or the absence of natural pauses, for instance, often signal bots. Device and browser fingerprinting complements this by aggregating passive signals like canvas rendering, WebGL support, and plugin inventories to generate persistent identifiers, enabling tracking across sessions without cookies. Machine learning classifiers, trained on vast datasets of labeled traffic, predict bot likelihood by processing features from headers, payloads, and temporal metadata, achieving detection rates exceeding 99% for known patterns in enterprise deployments.

Rate limiting and IP reputation systems throttle or block sources exhibiting excessive volume, such as repeated logins from data centers, while whitelisting verified good bots like search crawlers via robots.txt directives, though the latter offers no enforcement against non-compliant actors. Web application firewalls (WAFs) embed these signals into rule sets, dynamically challenging suspicious traffic from known malicious providers or outdated user agents. Complementary measures include:
  • Proactive blocking: Deny access from proxy services, Tor exits, and hosting IPs associated with abuse, reducing attack surfaces by up to 50% in observed cases.
  • API protections: Enforce token-based authentication, payload validation, and anomaly detection for endpoints vulnerable to scraping or DDoS amplification.
  • Monitoring integration: Analytics dashboards track bot ratios post-mitigation, enabling iterative refinement; for example, Cloudflare's Bot Management reports evasion attempts to inform rule updates.
As AI enhances bot sophistication—evident in a 2025 surge of generative models automating evasion—mitigation evolves toward ensemble methods fusing human oversight with autonomous responses, though over-reliance on any single technique risks obsolescence.
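The rate limiting described above is commonly implemented as a token bucket, which permits short bursts while capping sustained request rates. The following Python sketch is a minimal single-process illustration; the capacity and refill rate are assumed values, and production systems typically keep bucket state in a shared store such as Redis.

```python
# Minimal single-process token-bucket rate limiter, as a sketch of
# the rate-limiting approach described above. Parameters are
# illustrative; real deployments share this state across servers.
import time
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    capacity: float = 10.0        # maximum burst size
    refill_rate: float = 1.0      # tokens added per second
    tokens: float = 10.0          # start full
    last_refill: float = field(default_factory=time.monotonic)

    def allow(self) -> bool:
        """Consume one token if available; refuse the request otherwise."""
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# One bucket per client IP; sustained traffic beyond the refill
# rate is throttled while short bursts pass.
buckets: dict[str, TokenBucket] = {}


def is_allowed(client_ip: str) -> bool:
    bucket = buckets.setdefault(client_ip, TokenBucket())
    return bucket.allow()
```

The design choice here is deliberate: unlike a fixed-window counter, a token bucket tolerates legitimate bursts (a human opening several tabs) while still capping the steady request rates typical of scrapers.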

Regulatory Approaches

In the United States, regulatory efforts targeting internet bots emphasize disclosure to curb deception in online interactions, particularly on social media and consumer-facing platforms. California's Bolstering Online Transparency (BOT) Act (SB 1001), signed into law on September 29, 2018, and operative from July 1, 2019, prohibits using a bot to communicate or interact with another person in the state online with the intent to mislead about the bot's artificial identity, and requires deployers to clearly disclose the automated nature of such accounts. Exceptions apply to bots used for public-interest research or without commercial intent. Federally, the proposed Bot Disclosure and Accountability Act (S. 2125) of 2019 would have required social media providers to enforce policies obliging users to disclose automated software applications, but it failed to advance beyond introduction. The Federal Trade Commission enforces against bot-facilitated deception under Section 5 of the FTC Act, as reinforced by its August 2024 final rule banning fake or AI-generated reviews and endorsements, which explicitly covers social media bots fabricating consumer feedback.

State-level expansions in 2025 have addressed advanced AI-driven bots resembling chat companions. California's SB 243, enacted in 2025, requires operators of companion chatbots to disclose their AI nature to users, implement age-appropriate safeguards against exposing minors to sexually explicit content, and prompt minors to take breaks every three hours. New York followed in May 2025 with legislation mandating safety measures and disclosures for AI companions to prevent harmful interactions. These measures build on earlier precedents but remain fragmented, lacking a comprehensive federal framework, which has prompted calls for uniform standards amid concerns over enforcement against evasive bot operators.

In the European Union, the Digital Services Act (DSA, Regulation (EU) 2022/2065), adopted on October 19, 2022, and fully enforceable for very large online platforms since August 2023, obligates intermediaries to identify and mitigate systemic risks from bot usage, including inauthentic accounts and automated amplification of disinformation. Platforms must conduct annual risk assessments and deploy reasonable measures against manipulative bots, with fines of up to 6% of global turnover for non-compliance; the DSA targets deceptive practices such as bot-orchestrated fake engagement on services exceeding 45 million users. The complementary EU AI Act, effective from August 2024, classifies certain bot systems as high-risk if used for behavioral manipulation or biometric categorization, imposing transparency and conformity requirements. These rules prioritize platform accountability over direct bot bans, reflecting a risk-based approach, though critics note that enforcement relies heavily on self-reporting and may struggle with cross-border bot networks.

Globally, regulatory approaches vary with limited harmonization; while the DSA influences non-EU platforms through its market-access provisions, jurisdictions such as the UK draw on similar transparency principles post-Brexit but lack equivalent mandates. Challenges include bots' rapid evolution through AI integration, jurisdictional gaps, and the balance between regulation and innovation, as undetected bots continue to evade disclosure by mimicking human behavior.
Empirical data from platform reports indicate that disclosure laws reduce overt deception but have limited impact on sophisticated, non-commercial bots used for influence operations.

Surge in Bot Traffic (2020s)

In the early 2020s, automated bot traffic on the internet began surpassing human-generated activity at an accelerating rate, driven by advancements in artificial intelligence and expanded use cases for web scraping. By 2023, bots accounted for 49.6% of global web traffic, a 2% increase over the prior year and the highest level recorded since systematic tracking began in 2013. The upward trend continued into 2024, when total bot traffic exceeded 50% for the first time, with automated activity comprising 51% of all internet traffic according to Imperva's analysis of over 500 billion daily web requests. Bad bots, those engaged in malicious activities such as credential stuffing, content scraping, and denial-of-service attacks, rose to 37% of total traffic in 2024, up from 32% the previous year, marking a sixth consecutive annual increase in harmful automation.

The surge correlates with the proliferation of generative AI models requiring vast datasets for training, which drove a spike in sophisticated crawler bots. Large language model (LLM) scrapers, for instance, quadrupled in volume across monitored networks in 2025, rising from 2.6% to over 10% of verified bot traffic, as AI firms aggressively harvest public web content to fuel model development. Earlier in the decade, bot traffic hovered around 37-40% (for example, 37.2% in 2019), but post-2020 growth accelerated as cheaper computational resources and AI-driven automation tools eased the deployment of bots for both legitimate indexing and illicit purposes such as fraud. Akamai's observations align, placing bots at 42% of web traffic by mid-decade, with nearly two-thirds classified as malicious and often evading detection by mimicking real user behavior.

This escalation has strained web infrastructure, with sectors such as gaming and e-commerce disproportionately affected; bad bot activity in gaming, for example, reached 57.2% of the sector's traffic in 2023, facilitating exploits such as account takeovers. While good bots (e.g., search crawlers) constitute a minority, around 14% of traffic in 2024, the dominance of automated activity underscores vulnerabilities in web security and content integrity, prompting heightened investment in bot management solutions.

Integration with Advanced AI

The integration of advanced artificial intelligence, particularly large language models (LLMs) and generative techniques, into bots has markedly expanded their capabilities since the early 2020s, shifting them from rigid, script-driven programs to adaptive systems capable of natural language generation, contextual reasoning, and behavioral mimicry. This integration leverages models trained on massive datasets to let bots generate coherent, human-like text, images, or actions in real time, facilitating applications ranging from customer service and content generation to web scraping and adversarial operations. LLMs allow bots to respond dynamically to user queries or environmental changes, reducing reliance on predefined rules and improving performance in tasks like data extraction or simulated interactions.

Empirical data from cybersecurity analyses indicate a surge in AI-enhanced bot prevalence; the 2025 Imperva Bad Bot Report documents that AI-driven bots accounted for 51% of global internet traffic in 2024, exceeding human traffic and comprising 55% of attacks classified as moderate or advanced in sophistication. These bots employ AI to evade detection through techniques such as variable browsing patterns, natural language queries, and session persistence, complicating traditional mitigation efforts. In web ecosystems, LLM-powered scrapers, deployed by AI agents collecting data to fine-tune models, have distorted traffic analytics, with platforms reporting inflation of 30-50% from such automated crawls in high-value domains.

On social media platforms, AI-integrated bots amplify engagement metrics by automating replies, shares, and trend amplification; a 2025 study from the University of Notre Dame's Mendoza College of Business found that bots increased post interactions by 20-40% but suppressed meaningful human-to-human discourse by prioritizing superficial volume over depth. Similarly, research published by INFORMS in October 2025 found that AI bots boosted individual post visibility without elevating platform-wide activity, often through coordinated persona-based behaviors. Controlled simulations further highlight the risks: in a 2025 experiment, researchers at the University of Zurich populated a mock social network with 500 LLM-driven bots assigned diverse personas, producing rapid clique formation, echo chamber reinforcement, and emergent toxicity within hours, including polarized rhetoric and misinformation cascades.

This integration extends to agentic AI frameworks, in which bots operate as semi-autonomous agents capable of multi-step planning and tool use, as observed in deployments from December 2022 to June 2025, with regional spikes in AI bot activity correlating with LLM accessibility via APIs from providers such as OpenAI and Anthropic. While enabling scalable automation, for example in e-commerce recommendation engines and fraud detection, these advances lower barriers for malicious actors, as basic AI tools democratize sophisticated attacks that previously required expert coding. Peer-reviewed analyses underscore that without robust behavioral modeling such bots can bypass platform safeguards; a 2024 University of Notre Dame study showed AI bots evading content filters on eight major social networks through iterative prompt engineering.
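To make the mechanics concrete, the sketch below shows how little code a persona-driven LLM reply bot of the kind used in such simulations requires. It uses the OpenAI Python client (version 1.0 or later); the persona text and model choice are illustrative assumptions, and an API key is expected in the OPENAI_API_KEY environment variable.

```python
# Illustrative sketch of a persona-driven LLM reply bot, of the
# general kind studied in the simulation experiments above.
# Persona and model name are assumptions for illustration only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PERSONA = (
    "You are a casual social media user interested in technology. "
    "Reply to posts in one or two informal sentences."
)


def generate_reply(post_text: str) -> str:
    """Produce a human-like reply to a post, conditioned on the persona."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system", "content": PERSONA},
            {"role": "user", "content": post_text},
        ],
    )
    return response.choices[0].message.content


# Example usage:
# print(generate_reply("Anyone else seeing more bot traffic this year?"))
```

The brevity is itself the point: the analyses cited above attribute the surge in AI-driven bots partly to how little expertise such automation now demands.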

Potential Evolutions

Internet bots are projected to constitute up to 90% of internet traffic by the end of the decade, driven by advances in artificial intelligence that enable more autonomous and interactive behaviors, surpassing current levels where bots already exceed 50% of global web traffic as of 2024. This shift aligns with the "dead internet theory," which posits a future in which bot-to-bot interactions predominate, transforming online ecosystems into automated markets or competitive races rather than human-centric spaces. AI integration will likely enhance bot capabilities in natural language processing and decision-making, allowing chatbots and social media agents to simulate human-like nuance and evade traditional detection methods through adaptive learning. Large language model-powered scrapers and crawlers, such as OpenAI's GPTBot, have shown exponential growth (up 305% year over year from May 2024 to May 2025), enabling real-time content retrieval and training-data aggregation that could evolve into predictive, personalized bot networks. These developments may foster swarm-like bot behaviors, in which coordinated agents perform complex tasks such as content generation or influence campaigns with minimal human oversight.

In social media contexts, bots could move from basic automation to AI-driven entities that dominate engagement and content curation, potentially amplifying misinformation or commercial influence through sophisticated interaction patterns. Retrieval-focused AI bots, activated on demand for user queries, may further blur the line between search and generation, complicating verification of content origin as bots increasingly produce derivative material. While regulatory efforts might constrain malicious applications, technological momentum, evidenced by AI crawlers comprising nearly 80% of AI bot traffic in recent analyses, suggests rapid iteration toward more resilient, decentralized forms resistant to centralized controls. Positive evolutions could include collaborative bots for scientific data processing or personalized assistance, though empirical trends indicate a higher risk of adversarial uses outpacing beneficial ones given the low barriers to deployment.
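Sites that wish to opt out of LLM training crawls already publish robots.txt directives against these crawlers; OpenAI documents that GPTBot honors such rules, though, as noted earlier in this article, compliance with robots.txt is voluntary for any crawler. A minimal example:

```
# robots.txt: opt out of OpenAI's training crawler site-wide
# while leaving other crawlers unaffected.
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
```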
