Web query
A web query or web search query is a query that a user enters into a web search engine to satisfy their information needs. Web search queries are distinctive in that they are often plain text and boolean search directives are rarely used. They vary greatly from standard query languages, which are governed by strict syntax rules as command languages with keyword or positional parameters.
Types
There are three broad categories that cover most web search queries: informational, navigational, and transactional.[1] These are also called "do, know, go."[2] Although this model of searching was not theoretically derived, the classification has been empirically validated with actual search engine queries.[3]
- Informational queries – Queries that cover a broad topic (e.g., colorado or trucks) for which there may be thousands of relevant results.
- Navigational queries – Queries that seek a single website or web page of a single entity (e.g., youtube or delta air lines).
- Transactional queries – Queries that reflect the intent of the user to perform a particular action, like purchasing a car or downloading a screen saver.
Search engines often support a fourth type of query that is used far less frequently:
- Connectivity queries – Queries that report on the connectivity of the indexed web graph (e.g., Which links point to this URL?, and How many pages are indexed from this domain name?).[4]
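The three main categories can be illustrated with a toy rule-based classifier. This is only a sketch: the cue words, the `KNOWN_SITES` list, and the function name `classify_query` are assumptions made for illustration and are not taken from the cited studies, which rely on richer features.

```python
import re

# Illustrative cue lists (assumptions, not from the cited literature).
TRANSACTIONAL_CUES = {"buy", "download", "order", "purchase", "book", "rent"}
NAVIGATIONAL_CUES = {"login", "homepage", "website", "www"}
KNOWN_SITES = {"youtube", "delta air lines", "wikipedia"}  # hypothetical list


def classify_query(query: str) -> str:
    """Return 'navigational', 'transactional', or 'informational'."""
    text = query.lower().strip()
    terms = set(re.findall(r"[a-z0-9]+", text))
    # Navigational: the query names a known site or looks like a domain name.
    if (
        text in KNOWN_SITES
        or terms & NAVIGATIONAL_CUES
        or re.search(r"\.(com|org|net)\b", text)
    ):
        return "navigational"
    # Transactional: the query contains an action word.
    if terms & TRANSACTIONAL_CUES:
        return "transactional"
    # Everything else is treated as a broad-topic (informational) query.
    return "informational"


for example in ["colorado", "youtube", "download screen saver"]:
    print(example, "->", classify_query(example))
```

A keyword heuristic like this misclassifies many real queries; it is meant only to make the three categories concrete.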
Characteristics
Most commercial web search engines do not disclose their search logs, so information about what users are searching for on the Web is difficult to come by.[5] Nevertheless, research studies started to appear in 1998.[6][7] A 2001 study,[8] which analyzed the queries from the Excite search engine, showed some interesting characteristics of web searches:
- The average length of a query was 2.4 terms.
- About half of the users entered a single query while a little less than a third of users entered three or more unique queries.
- Close to half of the users examined only the first one or two pages of results (10 results per page).
- Less than 5% of users used advanced search features (e.g., boolean operators like AND, OR, and NOT).
- The top four most frequently used terms were (empty search), and, of, and sex.
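Statistics of this kind are simple aggregates over a query log. The sketch below assumes a hypothetical log that is just a list of query strings; the sample data and the function name `query_log_stats` are invented for illustration and do not reproduce the Excite study's method.

```python
BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}


def query_log_stats(queries):
    """Return (mean terms per query, share of queries using a Boolean operator)."""
    term_counts = [len(q.split()) for q in queries]
    mean_terms = sum(term_counts) / len(queries)
    # Only upper-case AND/OR/NOT are counted as operators in this sketch.
    with_operator = sum(1 for q in queries if BOOLEAN_OPERATORS & set(q.split()))
    return mean_terms, with_operator / len(queries)


# Hypothetical four-query log, for illustration only.
sample_log = ["colorado", "cheap flights denver", "cats AND dogs", "weather"]
mean_terms, boolean_share = query_log_stats(sample_log)
print(mean_terms, boolean_share)
```

Note that this sketch counts operators as terms and treats whitespace as the only separator; a real log analysis would need to decide both points explicitly.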
A study of the same Excite query logs revealed that 19% of the queries contained a geographic term (e.g., place names, zip codes, geographic features, etc.).[9]
Studies also show that, in addition to short queries (queries with few terms), there are predictable patterns of how users change their queries.[10]
A 2005 study of Yahoo's query logs revealed that 33% of the queries from the same users were repeat queries and that in 87% of cases the user would click on the same result.[11] This suggests that many users use repeat queries to revisit or re-find information. This analysis is confirmed by a Bing search engine blog post which stated that about 30% of queries are navigational queries.[12]
In addition, research has shown that query term frequency distributions conform to the power law, or long tail distribution curves. That is, a small portion of the terms observed in a large query log (e.g. > 100 million queries) are used most often, while the remaining terms are used less often individually.[13] This example of the Pareto principle (or 80–20 rule) allows search engines to employ optimization techniques such as index or database partitioning, caching and pre-fetching. In addition, studies have been conducted into linguistically-oriented attributes that can recognize if a web query is navigational, informational or transactional.[14]
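To see why a skewed distribution makes caching attractive, the following sketch computes the share of traffic covered by the most popular terms. It assumes an idealized Zipf law in which the term of rank r has weight 1/r^s with s = 1; real query logs follow such a law only approximately, and the function name `zipf_head_share` is hypothetical.

```python
def zipf_head_share(vocabulary_size: int, head_size: int, exponent: float = 1.0) -> float:
    """Share of total traffic produced by the `head_size` most popular items.

    Assumes item popularity is proportional to 1 / rank**exponent (idealized Zipf).
    """
    weights = [1.0 / rank ** exponent for rank in range(1, vocabulary_size + 1)]
    return sum(weights[:head_size]) / sum(weights)


# Under these assumptions, the top 1% of 10,000 distinct items.
share = zipf_head_share(vocabulary_size=10_000, head_size=100)
print(f"{share:.2f}")  # about 0.53
```

Under these assumptions a static cache holding only the top 1% of items would serve a little over half of all requests, which is the intuition behind the caching and partitioning optimizations mentioned above.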
A 2011 study found that the average length of queries had grown steadily over time and that the average length of non-English queries had increased more than that of English ones.[15] Google implemented the Hummingbird update in August 2013 to handle longer search queries, since more searches are conversational (e.g. "where is the nearest coffee shop?").[16]
Structured queries
With search engines that support Boolean operators and parentheses, a technique traditionally used by librarians can be applied. A user who is looking for documents that cover several topics or facets may want to describe each of them by a disjunction of characteristic words, such as vehicles OR cars OR automobiles. A faceted query is a conjunction of such facets; e.g. a query such as (electronic OR computerized OR DRE) AND (voting OR elections OR election OR balloting OR electoral) is likely to find documents about electronic voting even if they omit one of the words "electronic" or "voting", or even both.[17]
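A faceted query can be assembled mechanically from lists of synonyms. The sketch below builds the query string from the example above and checks documents against it with naive whitespace tokenization; the helper names `build_faceted_query` and `matches` are hypothetical, and a real engine would add stemming and proper tokenization.

```python
def build_faceted_query(facets):
    """Join each facet's words with OR, then join the facets with AND."""
    groups = ["(" + " OR ".join(words) + ")" for words in facets]
    return " AND ".join(groups)


def matches(document: str, facets) -> bool:
    """True if the document contains at least one word from every facet."""
    tokens = set(document.lower().split())
    return all(any(word.lower() in tokens for word in words) for words in facets)


facets = [
    ["electronic", "computerized", "DRE"],
    ["voting", "elections", "election", "balloting", "electoral"],
]
query = build_faceted_query(facets)
print(query)
print(matches("computerized balloting systems", facets))  # True
print(matches("electronic music", facets))                # False
```

The first test document matches even though it contains neither "electronic" nor "voting", which is exactly the behaviour the faceted form is meant to achieve.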
See also
- Information retrieval – Finding information for an information need
- Taxonomy for search engines
- User intent – A user's goal in making a search query
- Web query classification
- Web search engine
References
- ^ Broder, A. (2002). A taxonomy of Web search. SIGIR Forum, 36(2), 3–10.
- ^ Gibbons, Kevin (2013-01-11). "Do, Know, Go: How to Create Content at Each Stage of the Buying Cycle". Search Engine Watch. Retrieved 24 May 2014.
- ^ Jansen, B. J., Booth, D., and Spink, A. (2008) Determining the informational, navigational, and transactional intent of Web queries, Information Processing & Management. 44(3), 1251-1266.
- ^ Moore, Ross. "Connectivity servers". Cambridge University Press. Retrieved 24 May 2014.
- ^ Dawn Kawamoto and Elinor Mills (2006), AOL apologizes for release of user search data
- ^ Jansen, B. J., Spink, A., Bateman, J., and Saracevic, T. 1998. Real life information retrieval: A study of user queries on the web. SIGIR Forum, 32(1), 5 -17.
- ^ Silverstein, C., Henzinger, M., Marais, H., & Moricz, M. (1999). Analysis of a very large Web search engine query log. SIGIR Forum, 33(1), 6–12.
- ^ Amanda Spink; Dietmar Wolfram; Major B. J. Jansen; Tefko Saracevic (2001). "Searching the web: The public and their queries" (PDF). Journal of the American Society for Information Science and Technology. 52 (3): 226–234. CiteSeerX 10.1.1.23.9800. doi:10.1002/1097-4571(2000)9999:9999<::AID-ASI1591>3.3.CO;2-I.
- ^ Mark Sanderson & Janet Kohler (2004). "Analyzing geographic queries". Proceedings of the Workshop on Geographic Information (SIGIR '04).
- ^ Jansen, B. J., Booth, D. L., & Spink, A. (2009). Patterns of query modification during Web searching. Journal of the American Society for Information Science and Technology. 60(3), 557-570. 60(7), 1358-1371.
- ^ Jaime Teevan; Eytan Adar; Rosie Jones; Michael Potts (2005). "History repeats itself: Repeat Queries in Yahoo's query logs" (PDF). Proceedings of the 29th Annual ACM Conference on Research and Development in Information Retrieval (SIGIR '06). pp. 703–704. doi:10.1145/1148170.1148326.
- ^ "Bing Making search yours — Search Blog — Site Blogs — Bing Community". Archived from the original on 2011-03-14. Retrieved 2011-03-01.
- ^ Ricardo Baeza-Yates (2005). "Applications of Web Query Mining". Advances in Information Retrieval. Lecture Notes in Computer Science. Vol. 3408. Springer Berlin / Heidelberg. pp. 7–22. doi:10.1007/978-3-540-31865-1_2. ISBN 978-3-540-25295-5.
- ^ Alejandro Figueroa (2015). "Exploring effective features for recognizing the user intent behind web queries". Computers in Industry. 68. Elsevier: 162–169. doi:10.1016/j.compind.2015.01.005.
- ^ Mona Taghavi; Ahmed Patel; Nikita Schmidt; Christopher Wills; Yiqi Tew (2011). "An analysis of web proxy logs with query distribution pattern approach for search engines". Journal of Computer Standards & Interfaces. 34 (1): 162–170. doi:10.1016/j.csi.2011.07.001.
- ^ Sullivan, Danny (2013-09-26). "FAQ: All About The New Google "Hummingbird" Algorithm". Search Engine Land. Retrieved 24 May 2014.
- ^ Vojkan Mihajlović; Djoerd Hiemstra; Henk Ernst Blok; Peter M.G. Apers (October 2006). "Exploiting Query Structure and Document Structure to Improve Document Retrieval Effectiveness" (PDF).
Fundamentals
Definition and Scope
A web query is a user-submitted string of terms, keywords, or phrases entered into a search engine or web interface to articulate an information need and retrieve relevant content from the internet.[4] In the field of information retrieval, it serves as the primary input mechanism for ad hoc searches across vast, unstructured collections of web documents, and is distinct from the information need itself.[5] Unlike structured database queries, such as those in SQL that target predefined schemas for exact matches on relational data, web queries focus on semi-structured or unstructured text like webpages, prioritizing relevance ranking over precise logical operations.[4]
The scope of web queries encompasses user-facing interactions in search engines, where they drive the retrieval of ranked results from billions of indexed pages, as well as integrations in web APIs for programmatic access to search functionalities and browser address bars that blend navigation with querying.[3][6] This includes both traditional keyword-based inputs, such as entering "python programming tutorial," and modern natural language formulations, like "How do I install Python on Windows?" which leverage advanced processing to interpret intent across diverse media types including text, images, and videos.[3] Web queries are thus bounded to internet-scale, publicly accessible data, excluding internal system calls or proprietary database operations.
Early web queries emerged with tools like Archie in 1990, the first internet search engine, which enabled users to query indexes of FTP file archives for resource location.[7] By the mid-1990s, the scope expanded through milestones such as AltaVista's 1995 implementation of Boolean operators (AND, OR, NOT), allowing users to construct complex expressions for refined retrieval from the growing World Wide Web.[8] These developments established web queries as dynamic, user-centric tools for navigating hyperlinked, ever-evolving online content, setting the foundation for contemporary systems.
Historical Development
The evolution of web queries originated in the early 1990s with foundational tools that predated the World Wide Web's widespread adoption. In 1991, the Wide Area Information Server (WAIS) was developed as one of the first systems for full-text searching across distributed Internet databases, allowing users to query unstructured text resources remotely. Similarly, Gopher, launched the same year by the University of Minnesota, introduced a menu-driven protocol for navigating and searching file archives and documents over the Internet, facilitating early forms of information retrieval. These tools marked the initial shift toward automated querying of networked content, though they relied on limited indexing and lacked hyperlink support.[9][10]
The mid-1990s saw the emergence of dedicated web search engines as the Web proliferated. In 1994, WebCrawler debuted as the first full-text search engine for the Web, crawling and indexing page content to enable keyword-based queries across thousands of sites. Concurrently, Yahoo! launched in 1994 as a human-curated directory, where users queried hierarchical categories rather than raw text, reflecting an initial reliance on manual organization amid the Web's rapid growth from fewer than 3,000 sites at the start of the year. This period's innovations were driven by the need to manage an exploding volume of online content, transitioning queries from protocol-specific tools to Web-oriented systems.[11][12]
A pivotal advancement occurred in 1998 with Google's introduction of the PageRank algorithm, which revolutionized query relevance by ranking results based on hyperlink structures rather than mere keyword matches, significantly improving the accuracy of web searches. This algorithmic approach supplanted directory-based methods, propelling search engines toward scalable, automated relevance determination and establishing PageRank as a cornerstone for subsequent query processing. The growth of web queries paralleled the Internet's expansion, with user numbers rising from approximately 16 million in 1995 to over 5 billion by 2025, escalating query complexity and volume as access democratized globally.[13][14][15]
The 2010s brought the integration of natural language processing (NLP) into web queries, enabling better handling of conversational and contextual inputs. A landmark was Google's 2019 rollout of BERT (Bidirectional Encoder Representations from Transformers), a deep learning model that analyzes the context on both sides of each word to interpret query intent, enabling Search to better understand approximately 10% of searches in English in the U.S. This era's NLP advancements, fueled by deep learning breakthroughs, shifted queries from rigid keywords to more intuitive, human-like expressions.[16]
Post-2020 developments incorporated artificial intelligence for multimodal queries, combining text with images and other inputs. In 2023, Microsoft Bing introduced multimodal visual search in its AI-powered chat, allowing users to query using uploaded images alongside text for enhanced contextual responses, leveraging models like GPT-4 to process diverse data types. In 2024, Google expanded its AI Overviews feature, providing generative AI summaries directly in search results for a broader range of queries. By 2025, Google introduced AI Mode, enhancing search with advanced reasoning, multimodality, and agentic capabilities for deeper interactions and follow-up questions. These innovations, driven by AI progress, expanded web queries beyond text-only paradigms, addressing societal demands for richer, integrated information retrieval amid continued Internet growth.[17][18]
Classification
Unstructured Queries
Unstructured queries consist of free-form text inputs entered by users into search engines without adhering to a predefined schema or formal structure, depending instead on keyword extraction, matching, and relevance-ranking algorithms to identify and retrieve pertinent web content. These queries typically take the form of natural language phrases or sequences of keywords, allowing users to express information needs in everyday language rather than rigid syntax.[19]
Common examples include short phrases such as "best pizza near me," which seek location-based recommendations, or longer, informational long-tail queries like "how to fix iPhone screen 2025 model," targeting specific troubleshooting guidance.[20] Such queries dominate consumer web interactions because they mirror spontaneous human expression, facilitating broad accessibility in tools like Google Search.[21]
Search engines process unstructured queries primarily through inverted indexes, data structures that map individual terms (or tokens) extracted from web documents to lists of documents containing them, enabling rapid lookup and scoring based on factors like term frequency and document relevance.[22] Key challenges in this handling include synonym resolution, where terms like "car" and "automobile" must be linked to expand query scope without predefined mappings, often leading to incomplete results if not addressed via techniques like latent semantic analysis or distributional semantics.[23]
Unstructured queries account for the vast majority of web searches. This prevalence underscores their role as the default mechanism for exploratory and informational retrieval, in contrast to more rigid formats used in specialized databases.
Structured Queries
Structured queries in web search incorporate predefined operators, filters, or schemas to specify retrieval criteria with greater precision than free-form text input. These queries leverage syntactic elements to constrain or refine results, such as domain restrictions or attribute-based filtering, enabling users to target specific subsets of data efficiently. For instance, Google's site: operator limits searches to a designated domain or subdomain, as in site:example.com keyword, which retrieves pages only from that site while incorporating the keyword.[24] Other common operators include filetype:pdf to restrict results to PDF documents or inurl:blog to find pages with "blog" in the URL.[25] In e-commerce, faceted search provides interactive filters based on product attributes like price range, brand, or color, allowing dynamic refinement of result sets without altering the core query.[26]
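Operator-based queries are ordinary strings, so they can be composed programmatically. The sketch below assembles a query from the operators named above and URL-encodes it; the helper name `build_operator_query` and the parameter name `q` are assumptions for illustration, and no particular search endpoint is implied.

```python
from urllib.parse import urlencode


def build_operator_query(keywords, site=None, filetype=None, inurl=None):
    """Prefix the keywords with any requested search operators."""
    parts = []
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    if inurl:
        parts.append(f"inurl:{inurl}")
    parts.append(keywords)
    return " ".join(parts)


query = build_operator_query("keyword", site="example.com", filetype="pdf")
print(query)

# Encode the query for use in a URL query string ("q" is an assumed parameter name).
encoded = urlencode({"q": query})
print(encoded)
```

Encoding matters because the colon and spaces in operator syntax are not safe to place in a URL unescaped.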
Common examples include Boolean queries, which use logical operators to combine terms systematically. The query "Paris AND France NOT Texas" returns documents containing both "Paris" and "France" while excluding those with "Texas", thereby narrowing scope and reducing irrelevant results.[27]
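One common way to evaluate such a query is with set operations over an inverted index, the term-to-documents mapping described under unstructured queries. The sketch below uses a tiny invented corpus and naive lower-cased whitespace tokenization; it illustrates the idea and does not describe how any particular search engine is implemented.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lower-cased term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


# Hypothetical four-document corpus.
documents = {
    1: "Paris is the capital of France",
    2: "Paris Texas is a city in the United States",
    3: "France borders Spain",
    4: "Paris France and Paris Texas share a name",
}
index = build_inverted_index(documents)

# "Paris AND France NOT Texas": intersect the posting sets, then subtract.
result = (index["paris"] & index["france"]) - index["texas"]
print(result)  # {1}
```

AND maps to set intersection, OR to union, and NOT to set difference, so each operator costs roughly as much as the posting lists it touches.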
Implementations of structured queries appear prominently in enterprise search systems and modern web tools. Elasticsearch's Query DSL, introduced alongside Elasticsearch in 2010, offers a JSON-formatted language for building complex queries with filters on structured fields like dates or exact values. Filter clauses match on a yes-or-no basis without computing a relevance score, which improves performance in large-scale indexing. In browser-integrated search, Google's Programmable Search Engine utilizes refinement labels for faceted navigation.[28]
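As a rough illustration of the Query DSL's shape, the sketch below builds a `bool` query as a Python dictionary and serializes it to JSON. The field names `title`, `language`, and `published` are hypothetical and depend entirely on the index mapping; the body is only constructed here, not sent to a cluster.

```python
import json

# A bool query: "must" clauses contribute to the relevance score,
# while "filter" clauses only include or exclude documents.
query_body = {
    "query": {
        "bool": {
            "must": [
                {"match": {"title": "electronic voting"}},  # hypothetical field
            ],
            "filter": [
                {"term": {"language": "en"}},                     # exact value
                {"range": {"published": {"gte": "2020-01-01"}}},  # date range
            ],
        }
    }
}

serialized = json.dumps(query_body, indent=2)
print(serialized)
```

Keeping exact-value and range conditions in `filter` rather than `must` is the usual choice when they should restrict the result set without influencing ranking.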
The primary advantages of structured queries lie in their ability to improve both recall (the share of relevant items that are retrieved) and precision (the share of retrieved items that are relevant) compared to unstructured approaches. Evaluations of academic search systems demonstrate that those supporting Boolean operators and field codes, such as PubMed, achieve superior retrieval quality for systematic reviews, because structured search strategies are reproducible and can be targeted precisely.[29] This control proves particularly valuable in domains like scholarly literature, where precise filtering reduces screening burdens while maintaining comprehensive coverage.
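Precision and recall have standard set-based definitions, sketched below for a single query. The document IDs are invented for illustration, and the function name `precision_recall` is hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = |retrieved ∩ relevant| / |retrieved|
    recall    = |retrieved ∩ relevant| / |relevant|
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Four documents retrieved, three actually relevant, two in common.
precision, recall = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5])
print(precision, recall)
```

Here precision is 2/4 and recall is 2/3: narrowing the query would tend to raise the first figure, while adding synonyms with OR would tend to raise the second.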
