Direct manipulation interface
In computer science, human–computer interaction, and interaction design, direct manipulation is an approach to interfaces which involves continuous representation of objects of interest together with rapid, reversible, and incremental actions and feedback.[1] As opposed to other interaction styles, for example, the command language, the intention of direct manipulation is to allow a user to manipulate objects presented to them, using actions that correspond at least loosely to manipulation of physical objects. An example of direct manipulation is resizing a graphical shape, such as a rectangle, by dragging its corners or edges with a mouse.
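Resizing by dragging reduces to two small steps: hit-testing a corner handle, then applying incremental pointer deltas while redrawing after each one. A minimal Python sketch of those two steps (the `Rect`, `hit_corner`, and `drag_corner` names and the 8-pixel handle size are illustrative, not taken from any particular toolkit):

```python
from dataclasses import dataclass

HANDLE_SIZE = 8  # illustrative hit radius, in pixels, around each corner handle

@dataclass
class Rect:
    x: float
    y: float
    w: float
    h: float

def hit_corner(rect, mx, my):
    """Return which corner handle (if any) the pointer is over."""
    corners = {
        "nw": (rect.x, rect.y),
        "ne": (rect.x + rect.w, rect.y),
        "sw": (rect.x, rect.y + rect.h),
        "se": (rect.x + rect.w, rect.y + rect.h),
    }
    for name, (cx, cy) in corners.items():
        if abs(mx - cx) <= HANDLE_SIZE and abs(my - cy) <= HANDLE_SIZE:
            return name
    return None

def drag_corner(rect, corner, dx, dy):
    """Apply one incremental pointer movement (dx, dy) to the grabbed corner."""
    if corner == "se":
        return Rect(rect.x, rect.y, rect.w + dx, rect.h + dy)
    if corner == "nw":
        return Rect(rect.x + dx, rect.y + dy, rect.w - dx, rect.h - dy)
    if corner == "ne":
        return Rect(rect.x, rect.y + dy, rect.w + dx, rect.h - dy)
    if corner == "sw":
        return Rect(rect.x + dx, rect.y, rect.w - dx, rect.h + dy)
    return rect
```

Because each mouse-move event applies only a small delta and the shape is redrawn immediately, the user sees continuous feedback, and any drag can be reversed simply by dragging back.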
Having real-world metaphors for objects and actions can make it easier for a user to learn and use an interface (some might say that the interface is more natural or intuitive), and rapid, incremental feedback allows a user to make fewer errors and complete tasks in less time, because they can see the results of an action before completing the action, thus evaluating the output and compensating for mistakes.
The term was introduced by Ben Shneiderman in 1982 within the context of office applications and the desktop metaphor.[2][3] Researchers working on future user interfaces often place as much or even more emphasis on tactile or sonic control and feedback as on the visual feedback given by most GUIs, so the term has become more widespread in those communities.
In contrast to WIMP/GUI interfaces
Direct manipulation is closely associated with interfaces that use windows, icons, menus, and a pointing device (WIMP GUI), as these almost always incorporate direct manipulation to at least some degree. However, direct manipulation should not be confused with these terms, as it implies neither the use of windows nor even graphical output. For example, direct manipulation concepts can be applied to interfaces for blind or vision-impaired users, using a combination of tactile and sonic devices and software.
Compromises to the degree to which an interface implements direct manipulation are common. For example, most windowing interfaces allow users to reposition a window by dragging it with the mouse. In early systems, redrawing the window contents while dragging was not feasible due to computational limitations; instead, a rectangular outline of the window was drawn during the drag, and the complete window contents were redrawn once the user released the mouse button.
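That compromise can be modeled as deferring the expensive update until release time: during the drag only a cheap outline proxy tracks the pointer, and the window itself moves once at the end. A hedged sketch, with a hypothetical `Window`/`OutlineDrag` pair standing in for a real windowing system:

```python
class Window:
    """Minimal stand-in for a window; a real system would also hold contents to repaint."""
    def __init__(self, x, y):
        self.x, self.y = x, y

class OutlineDrag:
    """Early-style window drag: only an outline follows the pointer;
    the window itself is moved (and fully redrawn) once, on release."""
    def __init__(self, window, start_x, start_y):
        self.window = window
        self.start = (start_x, start_y)
        self.outline = (window.x, window.y)  # cheap proxy drawn during the drag

    def motion(self, mx, my):
        # Only the outline position updates per mouse-move event.
        dx, dy = mx - self.start[0], my - self.start[1]
        self.outline = (self.window.x + dx, self.window.y + dy)

    def release(self):
        # The expensive operation (moving and repainting the window) happens once.
        self.window.x, self.window.y = self.outline
```

The design trades feedback fidelity for responsiveness: the proxy preserves rapid, incremental response even when the full redraw is too slow to run per event.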
In computer graphics
Because of the difficulty of visualizing and manipulating various aspects of computer graphics, including geometry creation and editing, animation, the layout of objects and cameras, light placement, and other effects, direct manipulation is a significant part of 3D computer graphics. There are standard direct manipulation widgets as well as many unique widgets, developed either as better solutions to old problems or as solutions to new or unusual problems. These widgets attempt to let the user modify an object in any possible direction, while providing guides or constraints that make the most common modifications easy and while keeping each widget's function as intuitive as possible. The three most ubiquitous transformation widgets are largely standardized:
- the translation widget, which usually consists of three arrows aligned with the orthogonal axes centered on the object to be translated. Dragging the center of the widget translates the object directly underneath the mouse pointer in the plane parallel to the camera plane, while dragging any of the three arrows translates the object along the appropriate axis. The axes may be aligned with the world-space axes, the object-space axes, or some other space.
- the rotation widget, which usually consists of three circles aligned with the three orthogonal axes, and one circle aligned with the camera plane. Dragging any of the circles rotates the object around the appropriate axis while dragging elsewhere will freely rotate the object (virtual trackball rotation).
- the scale widget, which usually consists of three short lines aligned with the orthogonal axes terminating in boxes, and one box in the center of the widget. Dragging any of the three axis-aligned boxes effects a non-uniform scale along solely that axis, while dragging the center box effects a uniform scale on all three axes at once.
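The axis-constrained translation above amounts to projecting the 2D pointer delta onto the screen-space direction of the dragged arrow, then moving the object along the corresponding world-space axis. A simplified sketch (assuming, purely for illustration, that one pixel of pointer travel maps to one world-space unit; real widgets scale this by the camera projection):

```python
def axis_drag_translation(axis_screen_dir, axis_world, mouse_delta):
    """Translate along one widget axis.

    axis_screen_dir: 2D direction of the arrow as drawn on screen
    axis_world:      3D world-space axis the arrow represents
    mouse_delta:     2D pointer movement for this event, in pixels
    """
    ax, ay = axis_screen_dir
    length = (ax * ax + ay * ay) ** 0.5
    ux, uy = ax / length, ay / length                   # unit vector along the drawn arrow
    along = mouse_delta[0] * ux + mouse_delta[1] * uy   # pixels moved along the arrow
    return tuple(c * along for c in axis_world)         # world-space translation
```

Projecting onto the arrow's direction is what makes the constraint feel natural: pointer movement perpendicular to the arrow produces no translation at all, so the object moves only along the axis the user grabbed.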
Depending on the specific standard uses of an object, different kinds of widgets may be used. For example, a light in computer graphics is, like any other object, also defined by a transformation (translation and rotation), but it is sometimes positioned and directed simply with its endpoint positions. This is because it may be more intuitive to define the location of the light source and then define the light's target, rather than rotating it around the coordinate axes to point it at a known position.
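Positioning a light by its endpoints can be sketched as deriving the aim direction from the source and target positions, rather than composing rotations about coordinate axes (the `aim_light` helper below is hypothetical, not from any particular package):

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def aim_light(position, target):
    """Aim a light by endpoints: the direction is simply target - position,
    normalized, sparing the user from specifying rotation angles."""
    return normalize(tuple(t - p for p, t in zip(position, target)))
```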
Other widgets may be unique for a particular tool, such as edge controls to change the cone of a spotlight, points and handles to define the position and tangent vector for a spline control point, circles of variable size to define a blur filter width or paintbrush size, IK targets for hands and feet, or color wheels and swatches for quickly choosing colors. Complex widgets may even incorporate some from scientific visualization to efficiently present relevant data (such as vector fields for particle effects or false color images to display vertex maps).
Direct manipulation, and user interface design in general, for 3D computer graphics tasks remains an active area of invention and innovation. Relative to the difficulty of what users want to accomplish, the process of generating CG images is not considered intuitive or easy, especially for complex and less common tasks. By contrast, the user interface for word processing is widely used, easy for new users to learn, and sufficient for most word-processing purposes, so it is a mostly solved and standardized UI. User interfaces for 3D computer graphics, however, are usually either challenging to learn and use or not sufficiently powerful for complex tasks, so direct manipulation and user interfaces vary widely from application to application.
References
[edit]- ^ Kwon, Bum chul; Wagas Javed; Niklas Elmgvist; Ji Soo Yi (May 2011). "Direct manipulation through surrogate objects". Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (PDF). pp. 627–636. CiteSeerX 10.1.1.400.340. doi:10.1145/1978942.1979033. ISBN 9781450302289. Archived from the original (PDF) on 2014-02-01. Retrieved 2013-06-09.
- ^ Shneiderman, Ben (1982). "The future of interactive systems and the emergence of direct manipulation". Behaviour & Information Technology. 1 (3): 237–256. doi:10.1080/01449298208914450.
- ^ Shneiderman, Ben (August 1983). "Direct Manipulation: A Step Beyond Programming Languages". IEEE Computer. 16 (8): 57–69. doi:10.1109/MC.1983.1654471. Archived from the original on 8 February 2012. Retrieved 2010-12-28.
- Frohlich, David M (1993). "The history and future of direct manipulation". Behaviour & Information Technology. 12 (6): 315–329. doi:10.1080/01449299308924396.
- Shneiderman, Ben (1987). Designing the User Interface: Strategies for Effective Human–Computer Interaction. Addison-Wesley.
- Hutchins, Edwin L.; James D. Hollan; Donald A. Norman (1985). "Direct Manipulation Interfaces". Human–Computer Interaction. 1 (4): 311–338.
Definition and History
Definition
A direct manipulation interface is a paradigm in human-computer interaction where users perform operations by directly acting on visible representations of objects, such as dragging icons, resizing windows, or selecting elements, thereby mimicking real-world manipulations in an intuitive manner. This approach emphasizes continuous visibility of the objects of interest, rapid and reversible actions that provide immediate visual feedback, and the elimination of complex syntactic commands in favor of straightforward physical or gestural interactions.[1]

The term "direct manipulation" was coined by computer scientist Ben Shneiderman to describe this interaction style, distinguishing it from earlier paradigms reliant on abstract commands, menus, or programming-like syntax that require users to formulate indirect instructions.[2] In direct manipulation, the interface acts as a transparent medium, allowing users to focus on tasks rather than on learning interface-specific rules, with core elements including the persistent display of manipulable objects, instantaneous response to user inputs, and easy undo mechanisms to support error recovery.[1]

Historical Development
The origins of direct manipulation interfaces trace back to early innovations in computer graphics. In 1963, Ivan Sutherland's Sketchpad system at MIT's Lincoln Laboratory introduced precursor elements through light pen interactions, allowing users to directly select, move, copy, and rotate geometric objects on a vector display, providing immediate visual feedback on a cathode-ray tube.[3] In 1968, Douglas Engelbart's "Mother of All Demos" advanced these concepts by introducing the computer mouse and demonstrating direct manipulation of on-screen elements, such as selecting and editing text within overlapping windows, laying groundwork for modern graphical interfaces.[4] The term "direct manipulation" was formally coined by Ben Shneiderman in 1983, building on emerging graphical paradigms to describe interfaces where users perform actions on visible objects with continuous representation, rapid reversibility, and immediate feedback, as outlined in his influential paper.[1] This conceptualization positioned direct manipulation as an advancement over command-line and programming-based interactions, emphasizing psychological benefits like reduced cognitive load.[3]

Key milestones in adoption began with the Xerox PARC Alto computer in 1973, which implemented direct manipulation via a mouse-driven graphical interface featuring icons, windows, and bit-mapped displays for interacting with on-screen elements like documents and folders.[3] The Apple Macintosh in 1984 popularized these concepts commercially through its intuitive desktop metaphor, enabling users to drag icons and manipulate files directly with a mouse, drawing from Xerox innovations.[3] Microsoft Windows, starting with version 1.0 in 1985, further disseminated direct manipulation to the personal computing mainstream, incorporating tiled windows, icons, and pointer-based operations that became standard.[3]

The evolution continued into the mid-to-late 2000s with the integration of direct manipulation into web browsers, exemplified by drag-and-drop features for rearranging page elements and file uploads, facilitated by advancing JavaScript libraries and the HTML5 Drag and Drop API.[5] A significant shift occurred in 2007 with the iPhone's introduction of multi-touch direct manipulation, using gestures like pinching and swiping on capacitive screens to interact with virtual objects, extending the paradigm to mobile and touch-based systems.[6]

Principles and Characteristics
Core Principles
Direct manipulation interfaces are guided by foundational principles that emphasize intuitive and visible interactions, first articulated by Ben Shneiderman in his seminal 1983 paper.[1] These principles aim to make computer interactions feel natural and accessible by leveraging visual and physical affordances over abstract commands.

The first core principle is the continuous representation of the objects and actions of interest, where the system maintains a visible, persistent display of relevant elements, allowing users to interact directly within the problem domain without needing to recall hidden states.[1] This visibility reduces the cognitive burden of tracking mental models, as users can perceive and manipulate objects in real-time.[1]

The second principle involves physical actions instead of complex syntax, substituting intuitive gestures, such as dragging with a mouse, joystick movements, or touchscreen selections, for verbose command languages that require memorization and precise typing.[1] By employing simple, labeled controls or mimetic inputs, this approach aligns interface operations with everyday physical manipulations, enhancing ease of use.[1]

The third principle requires rapid, incremental, reversible operations with immediate feedback, ensuring that actions produce quick, visible effects on the target objects, which users can undo effortlessly to experiment without fear of irreversible errors.[1] This immediacy fosters a sense of control and predictability, as outcomes are perceptually verifiable rather than inferred from textual confirmations.[1]

Building on these, additional principles include intuitive mapping between actions and outcomes, where interface actions closely mimic their real-world counterparts to achieve articulatory directness, allowing users to specify intentions through natural gestures.[2] Consistency in interaction metaphors further supports this by employing a unified model-world representation, ensuring that manipulations behave reliably like physical objects across the interface.[2] Finally, support for user control empowers individuals to directly engage with and govern the system state, creating a qualitative sense of agency over digital elements.[2]

Theoretically, these principles are rooted in cognitive psychology, drawing from concepts like Jean Piaget's concrete operational stage, where visual and manipulative experiences aid comprehension, and George Pólya's problem-solving heuristics that emphasize tangible exploration.[1] By mimicking real-world interactions through continuous visibility and perceptual feedback, direct manipulation minimizes working memory load, as users rely on external representations rather than internal simulations, thereby bridging the gulf of execution and evaluation in human-computer dialogue.[2]

Key Characteristics
Direct manipulation interfaces are distinguished by several core features that emphasize intuitiveness and user control in interaction design. These characteristics, first articulated by Ben Shneiderman, include visibility of objects and actions, directness in manipulation, immediate feedback, and reversibility of operations.[1]

Visibility ensures that the objects of interest and possible actions are continuously represented on the screen, making the interface state transparent and avoiding hidden or obscured elements. This allows users to maintain a clear mental model of the system's current configuration without needing to recall abstract commands or menus. For instance, in graphical file managers, files appear as draggable icons whose positions and properties are always apparent, reducing cognitive load by aligning the interface with real-world object permanence.[1]

Directness involves users interacting with visual representations of objects through straightforward physical actions, such as pointing, dragging, or resizing, rather than typing indirect commands or syntax. This approach replaces complex language-based inputs with intuitive gestures that mimic real-world handling, fostering a sense of agency. An example is selecting and moving a digital photo in an image editor by directly clicking and dragging it across the canvas, which bypasses the need for sequential instructions.[1]

Feedback provides real-time, perceptible responses to user actions, often through visual animations, sounds, or haptic cues that confirm the operation's effect immediately. This instantaneous confirmation helps users gauge the success of their input and adjust on the fly, enhancing predictability. For example, when resizing a window, the borders snap and the content reflows visibly in response to the drag, signaling the change without delay.[1]

Reversibility incorporates mechanisms like undo and redo functions that allow users to easily reverse actions, promoting experimentation and reducing the fear of irreversible errors. These features enable incremental trial-and-error without permanent consequences, supporting a forgiving interaction flow. In practice, this might manifest as a multi-level undo stack in drawing software, where a misplaced stroke can be retracted instantly to restore the previous state.[1]

Comparison to Other Interfaces
Versus Command-Line Interfaces
Direct manipulation interfaces fundamentally differ from command-line interfaces (CLIs) in their interaction model. In direct manipulation, users perform visual-spatial actions, such as dragging icons or resizing objects on screen, to directly interact with representations of data, providing continuous visibility and immediate feedback without relying on abstract syntax.[1] In contrast, CLIs require sequential text input of commands, which the system parses to execute operations, often demanding precise syntax and lacking real-time visual cues during input.[7] This visual-spatial approach in direct manipulation mimics physical object handling, reducing the cognitive distance between user intent and system response, whereas CLIs impose a layer of translation through textual commands.[8]

Regarding user requirements, direct manipulation significantly lowers the learning curve for novices by eliminating the need to memorize commands and syntax, allowing intuitive exploration through demonstration and immediate reversibility of actions.[1] A 1987 study comparing file manipulation on the Apple Macintosh (direct manipulation) and IBM PC with MS-DOS (CLI) found that novices completed tasks faster (4.80 minutes vs. 5.77 minutes on average) and made fewer errors (0.80 vs. 2.03) with direct manipulation, attributing this to its reduced memory demands and visual feedback.[9] For experts, however, CLIs offer superior precision through granular command options and switches, enabling exact control over operations that might be cumbersome in visual interfaces.[10] Additionally, CLIs excel in scripting and automation, allowing complex, repetitive tasks to be encoded in reusable text files, which direct manipulation interfaces typically do not support as efficiently.[10]

Historically, CLIs dominated in the 1970s with systems like UNIX, where text-based terminals and tools such as ls and grep formed the core interaction paradigm, reflecting the era's hardware limitations and focus on efficient, programmatic control.[11] The rise of direct manipulation in the 1980s marked a shift toward graphical user interfaces (GUIs), exemplified by innovations like the Xerox Star and Apple Macintosh, which prioritized visual metaphors to broaden accessibility beyond expert users.[1] This transition addressed CLI's barriers for non-technical users while preserving CLI's role in backend and expert workflows.[11]
In terms of suitability, direct manipulation is ideal for exploratory tasks where users need to iteratively manipulate and visualize objects, such as browsing files or editing documents, due to its supportive feedback and low error risk.[7] Conversely, CLIs are better suited for batch operations and automation, where scripting enables rapid execution of precise, high-volume actions without the overhead of visual rendering.[7]