Videotelephony
from Wikipedia

A telepresence system in 2007

Videotelephony (also known as videoconferencing, video calling, or telepresence) is the use of audio and video for simultaneous two-way communication.[1] Today, videotelephony is widespread, and several related terms are in use. Videophones are standalone devices for video calling (compare Telephone). In the present day, devices like smartphones and computers are capable of video calling, reducing the demand for separate videophones. Videoconferencing implies group communication.[2] Videoconferencing is used in telepresence, whose goal is to create the illusion that remote participants are in the same room.

The concept of videotelephony was conceived in the late 19th century, and versions were demonstrated to the public starting in the 1930s. In April 1930, reporters gathered at AT&T corporate headquarters on Broadway in New York City for the first public demonstration of two-way video telephony. The event linked the headquarters building with a Bell Laboratories building on West Street.[3] Early demonstrations were installed at booths in post offices and shown at various world expositions. AT&T demonstrated Picturephone at the 1964 World's Fair in New York City. In 1970, AT&T launched Picturephone as the first commercial personal videotelephone system. In addition to videophones, there existed image phones, which exchanged still images between units every few seconds over conventional telephone lines. The development of advanced video codecs, more powerful CPUs, and high-bandwidth Internet service in the late 1990s allowed digital videophones to provide high-quality, low-cost color service between users almost any place in the world.

Applications of videotelephony include sign language transmission for deaf and speech-impaired people, distance education, telemedicine, and overcoming mobility issues. News media organizations have used videotelephony for broadcasting.[citation needed]

History

Videotelephony predicted to be in use by 2000, as envisioned in 1910 (artist's conception)
Videotelephone booth, 1922

Origin

The concept of videotelephony was first conceived in the late 1870s, both in the United States and in Europe, although the basic sciences to permit its very earliest trials would take nearly a half century to be discovered.[citation needed] The prerequisite knowledge arose from intensive research and experimentation in several telecommunication fields, notably electrical telegraphy, telephony, radio, and television.

Early systems

Simple analog videophone communication could be established as early as the invention of the television. Such an antecedent usually consisted of two closed-circuit television systems connected via coax cable or radio. An example of that was the German Reich Postzentralamt (post office) videotelephone network serving Berlin and several German cities via coaxial cables between 1936 and 1940.[4][5]

Gregorio Y. Zara was a Filipino engineer and physicist best remembered for inventing the first two-way video telephone.

Gregorio Y. Zara, a Filipino scientist, invented the first videophone in 1954, which was patented in 1955 as a "photo phone signal separator network." He is recognized as the Father of Videoconferencing for his pioneering contribution to the development of videotelephony technology.[6]

The development of videotelephony as a subscription service started in the latter half of the 1920s in the United Kingdom and the United States, spurred notably by John Logie Baird and AT&T's Bell Labs. This occurred in part, at least with AT&T, to serve as an adjunct supplementing the use of the telephone. A number of organizations believed that videotelephony would be superior to plain voice communications. Attempts at using normal telephony networks to transmit slow-scan video, such as the first systems developed by AT&T Corporation, first researched in the 1950s, failed mostly due to the poor picture quality and the lack of efficient video compression techniques.

During the first crewed space flights, NASA used two radio-frequency (UHF or VHF) video links, one in each direction. TV channels routinely use this type of videotelephony when reporting from distant locations. The news media were to become regular users of mobile links to satellites using specially equipped trucks, and much later via special satellite videophones in a briefcase. This technique was very expensive, though, and was not adopted for applications such as telemedicine, distance education, and business meetings.

Decades of research and development culminated in the 1970 commercial launch of AT&T's Picturephone service, available in select cities. However, the system was a commercial failure, chiefly due to consumer apathy, high subscription costs, and lack of network effect—with only a few hundred Picturephones in the world, users had extremely few contacts they could actually call, and interoperability with other videophone systems would not exist for decades.

Multiple-user videoconferencing, first demonstrated with Stanford Research Institute's NLS computer technology (1968)
An AT&T Picturephone Model 2 from 1969

Digital

In the 1980s, digital telephony transmission networks became possible, such as ISDN networks. During this time, there was also research into other forms of digital video and audio communication. Many of these technologies, such as the Media space, are not as widely used today as videoconferencing but were still an important area of research.[7][8] The first dedicated systems started to appear as ISDN networks were expanding throughout the world. One of the first commercial videoconferencing systems sold to companies came from PictureTel Corp., which had an initial public offering in November 1984.

In 1984, Concept Communication in the United States created a circuit board for standard personal computers that doubled the video frame rate of typical digital videotelephone systems from 15 to 30 frames per second, and reduced the cost from $100,000 to $12,000.[9] The company also secured a patent for a codec for full-motion videoconferencing, first demonstrated at AT&T Bell Labs in 1986.[9][10]

Global Schoolhouse students communicating via CU-SeeMe, shown here with a video frame rate between 0.9 and 3 frames per second (1993)

Very expensive videoconferencing systems continued to rapidly evolve throughout the 1980s and 1990s. Proprietary equipment, software, and network requirements gave way to standards-based technologies that were available for anyone to purchase at a reasonable cost.

While videoconferencing technology was initially used primarily within internal corporate communication networks, one of the first community service uses of the technology started in 1992 through a partnership with PictureTel and IBM, which at the time were promoting a jointly developed desktop-based videoconferencing product known as the PCS/1. Over the next 15 years, Project DIANE (Diversified Information and Assistance Network) grew to use a variety of videoconferencing platforms to create a multi-state cooperative public service and distance education network consisting of several hundred schools, libraries, science museums, zoos and parks, and many other community-oriented organizations.[citation needed]

Transition to internet and mobile devices

Advances in video compression allowed digital video streams to be transmitted over the Internet, which was previously difficult due to the impractically high bandwidth requirements of uncompressed video. The DCT algorithm was the basis for the first practical video coding standard that was useful for online videoconferencing, H.261, standardised by the ITU-T in 1988, and subsequent H.26x video coding standards.[11]

In 1992 CU-SeeMe was developed at Cornell by Tim Dorcey et al. In 1995 the first public videoconference between North America and Africa took place, linking a technofair in San Francisco with a techno-rave and cyberdeli in Cape Town. At the 1998 Winter Olympics opening ceremony in Nagano, Japan, Seiji Ozawa conducted the Ode to Joy from Beethoven's Ninth Symphony simultaneously across five continents in near-real-time.

The Kyocera VP-210 Visual Phone was the first commercial mobile videophone (1999).

Kyocera conducted a two-year development campaign from 1997 to 1999 that resulted in the release of the VP-210 Visual Phone, the first mobile colour videophone that also doubled as a camera phone for still photos.[12][13] The camera phone was the same size as similar contemporary mobile phones, but sported a large camera lens and a 5 cm (2 inch) colour TFT display capable of displaying 65,000 colors, and was able to process two video frames per second.[13][14]

Videotelephony was popularized in the 2000s via free Internet services such as Skype and iChat, web plugins supporting H.26x video standards, and online telecommunication programs that promoted low cost, albeit lower quality, videoconferencing to virtually every location with an Internet connection.

Videotelephony became even more widespread through the deployment of video-enabled mobile phones such as the 2010 iPhone 4, plus videoconferencing and computer webcams which use Internet telephony. In the upper echelons of government, business, and commerce, telepresence technology, an advanced form of videoconferencing, has helped reduce the need to travel.[citation needed]

Additional history

In May 2005, the first high definition videoconferencing systems, produced by Lifesize, were displayed at the Interop trade show in Las Vegas, Nevada, able to provide video at 30 frames per second with a 1280 by 720 display resolution.[15][16] Polycom introduced its first high definition videoconferencing system to the market in 2006. As of the 2010s, high-definition resolution for videoconferencing became a popular feature, with most major suppliers in the videoconferencing market offering it.

Technological developments by videoconferencing developers in the 2010s have extended the capabilities of videoconferencing systems beyond the boardroom for use with hand-held mobile devices that combine the use of video, audio and on-screen drawing capabilities broadcasting in real time over secure networks, independent of location. Mobile collaboration systems now allow people in previously unreachable locations, such as workers on an offshore oil rig, the ability to view and discuss issues with colleagues thousands of miles away. Traditional videoconferencing system manufacturers have begun providing mobile applications as well, such as those that allow for live and still image streaming.[17]

The highest ever video call (other than those from aircraft and spacecraft) took place on May 19, 2013, when British adventurer Daniel Hughes used a smartphone with a BGAN satellite modem to make a videocall to the BBC from the summit of Mount Everest, at 8,848 metres (29,029 ft) above sea level.[18]

The COVID-19 pandemic resulted in a significant increase in the use of videoconferencing. Bernstein Research found that Zoom added more subscribers during the first two months of 2020 alone than in the entire year 2019. GoToMeeting had a 20 percent increase in usage, according to LogMeIn.[19] UK-based StarLeaf reported a 600 percent increase in national call volumes.[20] Videoconferencing became so widespread during the pandemic that the term Zoom fatigue came to prominence, referring to the taxing nature of spending long periods of time on videocalls.[21] This fatigue refers to the psychological and physiological effects on participants involved in videoconferencing.[22][23][24] One experimental study from 2021 revealed a link between camera use in videoconferencing and the likelihood of fatigue occurring in an individual.[25][26] Furthermore, a 2022 article in the journal "Computers in Human Behavior" highlighted a study linking negative attitudes with the use of "self-view" when videoconferencing.[27][28]

On 21 September 2021, Facebook launched two new versions of its Portal video-calling devices, the Portal Go and Portal Plus. The new video-calling devices include the first portable variety of the hardware and a number of updates.[29]

Major categories

A modern Avaya Nortel 1535 IP model broadband videophone (2008), using VoIP
USB webcam for PC

Videotelephony can be categorized by its functionality and intended purpose, and also by its method of transmission.

Videophones were the earliest form of videotelephony, dating back to initial tests in 1927 by AT&T. During the late 1930s, the post offices of several European governments established public videophone services for person-to-person communications using dual cable circuit telephone transmission technology. In the present day, standalone videophones and UMTS video-enabled mobile phones are usually used on a person-to-person basis.

Videoconferencing saw its earliest use with AT&T's Picturephone service in the early 1970s. Transmissions were analog over short distances, but converted to digital forms for longer calls, again using telephone transmission technology. Popular corporate video-conferencing systems in the present day have migrated almost exclusively to digital ISDN and IP transmission modes due to the need to convey the very large amounts of data generated by their cameras and microphones. These systems are often intended for use in conference mode, that is by many people in several different locations, all of whom can be viewed by every participant at each location.

Telepresence systems are a newer, more advanced subset of videoconferencing systems, meant to allow higher degrees of video and audio fidelity. Such high-end systems are typically deployed in corporate settings.

Mobile collaboration systems are another recent development, combining video, audio, and on-screen drawing capabilities on the newest generation of hand-held electronic devices, which broadcast over secure networks and enable multi-party conferencing in real time, independent of location. Proximity chat is another alternative mode, focused on the flexibility of small group conversations.

A more recent technology encompassing these functions is the TV cam. TV cams enable people to make video calls on their TV using video calling services such as Skype, without a PC connection. TV cams are specially designed video cameras that feed images in real time to another TV camera or to other compatible computing devices such as smartphones, tablets, and computers.

Webcams are popular, relatively low-cost devices that can provide live video and audio streams via personal computers, and can be used with many software clients for both video calls and videoconferencing.[30]

Each of the systems has its own advantages and disadvantages, including video quality, capital cost, degrees of sophistication, transmission capacity requirements, and cost of use.

By cost and quality of service

From the least to the most expensive systems:

  • Web camera videophone and videoconferencing systems, either stand-alone or built-in, that serve as complements to personal computers and are connected to other participants by computer and VoIP networks: lowest direct cost, assuming the users already possess computers at their respective locations. Quality of service can range from low to very high, including high definition video available on the latest model webcams. A related and similar device is a TV camera, which is usually small, sits on top of a TV, and can connect to it via its HDMI port, similar to how a webcam attaches to a computer via a USB port.
  • Videophones: low to midrange cost. The earliest standalone models operated over either plain old telephone service (POTS) lines on the PSTN telephone networks or more expensive ISDN lines, while newer models have largely migrated to Internet Protocol line service for higher image resolutions and sound quality. Quality of service for standalone videophones can vary from low to high.
  • Huddle room or all-in-one systems: low to midrange cost. This newer endpoint category is based on standard videoconferencing systems, but is defined by the camera, microphone(s), speakers, and codec being contained in a single piece of hardware. These systems are typically used in small to medium spaces where beamforming microphone arrays located in the system are sufficient, in lieu of table or ceiling microphones in closer proximity to the in-room participants. Quality of service is comparable to standard videoconferencing systems, varying from moderate to high. Some manufacturers' huddle room systems do not include the codec within the soundbar-shaped unit, only the camera, microphone, and speakers. These systems are usually still classified as huddle room systems but, like webcams, rely on a USB connection to an external device, usually a PC, to handle the video codec responsibilities. Despite the name, videoconferencing systems for huddle rooms remove the need for participants to huddle close together to be seen by the camera. All-in-one systems for these types of rooms range from wide angles such as a 110° horizontal field of view (FOV) to as much as 360° FOV, allowing a full view of the room.
  • Videoconferencing systems: midrange cost, usually using multipoint control units or other bridging services to allow multiple parties on videoconference calls. Quality of service can vary from moderate to high.
  • Telepresence systems: highest capabilities and highest cost. Full high-end systems can involve specially built teleconference rooms to allow expansive views with very high levels of audio and video fidelity, to permit an 'immersive' videoconference. When the proper type and capacity transmission lines are provided between facilities, the quality of service reaches state-of-the-art levels.

Security concerns

Computer security experts have shown that poorly configured or inadequately supervised videoconferencing systems can permit an easy virtual entry by computer hackers and criminals into company premises and corporate boardrooms.[31]

Adoption

For over a century, futurists have envisioned a future where telephone conversations will take place as actual face-to-face encounters with video as well as audio. Sometimes it is simply not possible or practical to have face-to-face meetings with two or more people. Sometimes a telephone conversation or conference call is adequate. Other times, e-mail exchanges are adequate. However, videoconferencing adds another option and can be considered when:

  • A live conversation is needed
  • Non-verbal (visual) information is an important component of the conversation
  • The parties of the conversation cannot physically come to the same location
  • The expense or time of travel is a consideration

Bill Gates said in 2001 that he used videoconferencing "three or four times a year", because digital scheduling was difficult and "if the overhead is super high, then you might as well just have a face-to-face meeting".[32] Some observers argue that several outstanding issues have prevented videoconferencing from becoming a widely adopted form of communication, despite the ubiquity of videoconferencing-capable systems.[33]

  • Eye contact: Eye contact plays a large role in conversational turn-taking, perceived attention and intent, and other aspects of group communication.[34] While traditional telephone conversations give no eye contact cues, many videoconferencing systems are arguably worse in that they provide an incorrect impression that the remote interlocutor is avoiding eye contact. Some telepresence systems have cameras located in the screens that reduce the amount of parallax observed by the users. This issue is also being addressed through research that generates a synthetic image with eye contact using stereo reconstruction.[35]
    Telcordia Technologies, formerly Bell Communications Research, owns a patent for eye-to-eye videoconferencing using rear projection screens with the video camera behind them, evolved from a 1960s U.S. military system that provided videoconferencing services between the White House and various other government and military facilities. This technique eliminates the need for special cameras or image processing.[36]
  • Appearance consciousness: A second psychological problem with videoconferencing is being on camera, with the video stream possibly even being recorded. The burden of presenting an acceptable on-screen appearance is not present in audio-only communication. Early studies by Alphonse Chapanis found that the addition of video actually impaired communication, possibly because of the consciousness of being on camera.[37]
  • Signal latency: Transporting digital signals through many processing steps takes time. In a telecommunicated conversation, a latency (time lag) larger than about 150–300 ms becomes noticeable and is soon perceived as unnatural and distracting. Therefore, in addition to stable, high bandwidth, a small total round-trip time is another major technical requirement for the communication channel for interactive videoconferencing.[38]
  • Bandwidth and quality of service: In some countries, it is difficult or expensive to get a high-quality connection that is fast enough for good-quality videoconferencing. Technologies such as ADSL have limited upload speeds and cannot upload and download simultaneously at full speed. As Internet speeds increase, higher quality and high-definition videoconferencing will become more readily available.
  • Complexity of systems: Most users are not technically experienced and want a simple interface. In hardware systems, an unplugged cord or an unresponsive remote control is seen as a failure, contributing to a perceived unreliability. Successful systems are backed by support teams who can provide fast assistance when required.
  • Perceived lack of interoperability: Not all systems can readily interconnect; for example, ISDN and IP systems require a gateway. Popular software solutions cannot easily connect to hardware systems. Some systems use different standards, features, and qualities which can require additional configuration when connecting to dissimilar systems. Free software systems circumvent this limitation by making it relatively easy for a single user to communicate over multiple incompatible platforms.
  • Expense of commercial systems: Well-designed telepresence systems require specially designed rooms, which can cost hundreds of thousands of dollars to fit out with codecs, integration equipment (such as Multipoint Control Units), high fidelity sound systems, and furniture. Monthly charges may also be required for bridging services and high-capacity broadband service.

These are some of the reasons many organizations only use the systems internally, where there is less risk of loss of customers. An alternative for those lacking dedicated facilities is the rental of videoconferencing-equipped meeting rooms in cities around the world. Clients can book rooms and turn up for the meeting, with all technical aspects being prearranged and support being readily available if needed. The issue of eye contact may be solved with advancing technology, including smartphones which have the screen and camera in essentially the same place. In developed countries, the near-ubiquity of smartphones, tablet computers, and computers with built-in audio and webcams removes the need for expensive dedicated hardware.

Technology

Components and types

Dual display: A mid-2000s Polycom VSX 7000 system and camera used for videoconferencing, with two displays for simultaneous broadcast from separate locations
A videoconference meeting facilitated by Google Hangouts

The core technology used in a videotelephony system is digital compression of audio and video streams in real time. The hardware or software that performs compression is called a codec (coder/decoder). Compression rates of up to 1:500 can be achieved. The resulting digital stream of 1s and 0s is subdivided into labeled packets, which are then transmitted through a digital network of some kind (usually ISDN or IP).
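To give a sense of scale for the compression figure above, the following minimal sketch computes the bit rate of an uncompressed stream and the rate that would remain at a 500:1 ratio. The 1280 by 720 resolution at 30 frames per second is taken from the high-definition systems mentioned earlier in the article; the 24 bits per pixel and the function names are assumptions made for illustration only.

```python
def raw_bitrate_bps(width, height, fps, bits_per_pixel=24):
    """Bit rate of uncompressed video in bits per second.

    bits_per_pixel=24 is an assumed colour depth, not a figure from the text.
    """
    return width * height * bits_per_pixel * fps


def compressed_bitrate_bps(raw_bps, ratio):
    """Bit rate left after compressing at `ratio`:1."""
    return raw_bps / ratio


raw = raw_bitrate_bps(1280, 720, 30)
compressed = compressed_bitrate_bps(raw, 500)
print(f"uncompressed: {raw / 1e6:.1f} Mbit/s")
print(f"at 500:1:     {compressed / 1e6:.2f} Mbit/s")
```

Under these assumptions the uncompressed stream is roughly 664 Mbit/s, and the best-case 500:1 ratio brings it down to about 1.3 Mbit/s. Real codecs trade ratio against quality, so this is an upper bound on the saving rather than a typical result.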

The other components required for a videoconferencing system include:

  • Video input: (PTZ / 360° / Fisheye) video camera, or webcam
  • Video output: computer monitor, television, or projector
  • Audio input: microphones, CD/DVD player, cassette player, or any other source with a preamp audio output
  • Audio output: usually loudspeakers associated with the display device or telephone
  • Data transfer: analog or digital telephone network, LAN, or Internet
  • Computer: a data processing unit that ties together the other components, does the compressing and decompressing, and initiates and maintains the data linkage via the network.

There are basically three kinds of videoconferencing and videophone systems:

  1. Dedicated systems have all required components packaged into a single piece of equipment, usually a console with a high quality remote controlled video camera. These cameras can be controlled at a distance to pan left and right, tilt up and down, and zoom. They became known as PTZ cameras. The console contains all electrical interfaces, the control computer, and the software or hardware-based codec. Omnidirectional microphones are connected to the console, as well as a TV monitor with loudspeakers and/or a video projector. There are several types of dedicated videoconferencing devices:
    1. Large group videoconferencing systems are built-in, large, expensive devices used for large rooms such as conference rooms and auditoriums.
    2. Small group videoconferencing systems are either non-portable or portable, smaller, less expensive devices used for small meeting rooms.
    3. Individual videoconferencing systems are usually portable devices, meant for single users, and have fixed cameras, microphones, and loudspeakers integrated into the console.
  2. Desktop systems are add-ons (hardware boards or software codecs) to normal PCs and laptops, transforming them into videoconferencing devices. A range of different cameras and microphones can be used with the add-on, which contains the necessary codec and transmission interfaces.
  3. WebRTC platforms use a web browser instead of dedicated native application software. Solutions such as Adobe Connect and Cisco WebEx can be accessed using a URL sent by the meeting organizer, and various degrees of security can be attached to the virtual room. With such solutions, the user often must download and install a browser extension to enable access to the local camera and microphone and establish a connection to the meeting. WebRTC itself does not require any special software: a WebRTC-compliant web browser provides the facilities for 1-to-1 and 1-to-many videoconferencing calls. Several enhancements to WebRTC are provided by independent vendors.

Videoconferencing modes

Videoconferencing systems use several methods to determine which video feed or feeds to display.[39]: 11–16 

Continuous Presence simply displays all participants at the same time, usually with the exception that the viewer either does not see their own feed, or sees their own feed in miniature.

Voice-Activated Switch selectively chooses a feed to display at each endpoint, with the goal of showing the person who is currently speaking. This is done by choosing the feed (other than the viewer) which has the loudest audio input (perhaps with some filtering to avoid switching for very short-lived volume spikes). Often, if no remote parties are currently speaking, the feed with the last speaker remains on the screen.
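The voice-activated selection rule described above can be sketched as follows. This is an illustrative model only, not the behaviour of any particular product: the class name, the `hold_frames` debounce count, and the `silence_level` threshold are all hypothetical, and audio levels are assumed to arrive as one number per participant per frame.

```python
class VoiceActivatedSwitch:
    """Pick which remote feed to show to one viewer (illustrative sketch)."""

    def __init__(self, hold_frames=3, silence_level=0.05):
        self.hold_frames = hold_frames      # frames a new speaker must stay loudest
        self.silence_level = silence_level  # levels below this count as silence
        self.current = None                 # feed currently on screen
        self._candidate = None              # feed that may replace it
        self._streak = 0                    # consecutive frames the candidate led

    def update(self, levels, viewer):
        """levels maps participant -> audio level for one frame."""
        # Ignore the viewer's own feed and anyone who is effectively silent.
        remote = {p: v for p, v in levels.items()
                  if p != viewer and v >= self.silence_level}
        if not remote:
            # Nobody remote is speaking: keep showing the last speaker.
            self._candidate, self._streak = None, 0
            return self.current
        loudest = max(remote, key=remote.get)
        if loudest == self.current:
            self._candidate, self._streak = None, 0
            return self.current
        # A different feed is loudest: require it to persist before switching,
        # which filters out very short volume spikes.
        if loudest == self._candidate:
            self._streak += 1
        else:
            self._candidate, self._streak = loudest, 1
        if self._streak >= self.hold_frames:
            self.current = loudest
            self._candidate, self._streak = None, 0
        return self.current


frames = [
    {"alice": 0.0, "bob": 0.6, "carol": 0.1},
    {"alice": 0.0, "bob": 0.6, "carol": 0.1},
    {"alice": 0.0, "bob": 0.2, "carol": 0.9},  # one-frame spike from carol
    {"alice": 0.0, "bob": 0.6, "carol": 0.1},
    {"alice": 0.8, "bob": 0.0, "carol": 0.0},  # only the viewer speaks
    {"alice": 0.0, "bob": 0.0, "carol": 0.7},
    {"alice": 0.0, "bob": 0.0, "carol": 0.7},
]
switch = VoiceActivatedSwitch(hold_frames=2)
shown = [switch.update(levels, viewer="alice") for levels in frames]
print(shown)
```

In this run the display stays on bob through carol's one-frame spike and while only the viewer is talking, and moves to carol only after she has been loudest for two consecutive frames.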

Echo cancellation

Acoustic echo cancellation (AEC) is a processing algorithm that uses the knowledge of audio output to monitor audio input and filter from it noises that echo back after some time delay. If unattended, these echoes can be re-amplified several times, leading to problems including:

  • The remote party hearing their own voice coming back at them (usually significantly delayed)
  • Strong reverberation, which makes the voice channel useless
  • Howling created by feedback

Echo cancellation is a processor-intensive task that usually works over a narrow range of sound delays.
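The idea of using the known audio output to remove its delayed copy from the audio input can be illustrated with a normalized least-mean-squares (NLMS) adaptive filter. NLMS is one common textbook choice, used here as an assumption; the article does not say which algorithm any given system uses. The function name, tap count, and step size are hypothetical, and the test signal is a synthetic echo with no near-end speech.

```python
import random


def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `far_end` from `mic`.

    `taps` bounds the echo delay (in samples) the filter can model, which is
    why a canceller of this kind only covers a limited range of delays.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate                      # what is left after cancelling
        norm = sum(h * h for h in history) + eps   # input power normalization
        weights = [w + mu * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual


# Synthetic check: the microphone hears the loudspeaker signal three samples
# late at 60% amplitude, and nothing else.
rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(3000)]
mic = [0.0] * 3 + [0.6 * s for s in far_end[:-3]]
residual = nlms_echo_cancel(far_end, mic)

echo_energy = sum(v * v for v in mic[-200:])
left_over = sum(v * v for v in residual[-200:])
print(f"echo energy {echo_energy:.3f}, after cancellation {left_over:.3e}")
```

The per-sample cost grows with the number of taps, and an echo arriving later than the filter length cannot be modelled at all, which gives a concrete reading of the remark above about processing load and a narrow range of delays.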

Bandwidth requirements

Deutsche Telekom T-View 100 ISDN-type videophone meant for home offices and small businesses, with a lens cover which can be rotated upward for privacy

Videophones have historically employed a variety of transmission and reception bandwidths, which can be understood as data transmission speeds. The lower the transmission/reception bandwidth, the lower the data transfer rate, resulting in a progressively limited and poorer image quality (i.e. lower resolution and/or frame rate). Data transfer rates and live video image quality are related but are also subject to other factors such as data compression techniques. Some early videophones employed very low data transmission rates with a resulting poor video quality.

Broadband bandwidth is often called high-speed, because it usually has a high rate of data transmission. In general, any connection of 256 kbit/s (0.256 Mbit/s) or greater is considered broadband Internet. The International Telecommunication Union Telecommunication Standardization Sector (ITU-T) recommendation I.113 has defined broadband as a transmission capacity of 1.5 to 2 Mbit/s. The Federal Communications Commission (United States) definition of broadband is 25 Mbit/s.[40]

Currently, adequate video for some purposes becomes possible at data rates lower than the ITU-T broadband definition, with rates of 768 kbit/s and 384 kbit/s used for some videoconferencing applications, and rates as low as 100 kbit/s used for videophones using H.264/MPEG-4 AVC compression protocols. The newer MPEG-4 video and audio compression format can deliver high-quality video at 2 Mbit/s, which is at the low end of cable modem and ADSL broadband performance.[citation needed]
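The rates quoted in this section can be collected into a small lookup that reports which of the cited uses a given link speed would cover. This is only a restatement of the figures above as code; the labels, the function names, and the choice of 1.5 Mbit/s as the lower edge of the ITU-T range are assumptions made for illustration.

```python
# Data rates quoted in the text, in kbit/s, with a descriptive label each.
QUOTED_RATES_KBPS = [
    (100, "videophone using H.264/MPEG-4 AVC"),
    (384, "videoconferencing (lower quoted rate)"),
    (768, "videoconferencing (higher quoted rate)"),
    (2000, "high-quality MPEG-4 video"),
]

ITU_BROADBAND_MIN_KBPS = 1500  # lower edge of the 1.5 to 2 Mbit/s range


def supported_uses(link_kbps):
    """Return the quoted uses whose data rate fits within the link speed."""
    return [label for rate, label in QUOTED_RATES_KBPS if link_kbps >= rate]


def meets_itu_broadband(link_kbps):
    """True if the link reaches the ITU-T broadband range quoted above."""
    return link_kbps >= ITU_BROADBAND_MIN_KBPS


for speed in (128, 512, 2000):
    print(speed, "kbit/s:", supported_uses(speed),
          "| ITU-T broadband:", meets_itu_broadband(speed))
```

A 512 kbit/s link, for example, is below the ITU-T broadband range yet covers the videophone rate and the lower videoconferencing rate, which is the point the paragraph above makes.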

Standards

The Tandberg E20 is an example of a SIP-only device. Such devices need to route calls through a Video Communication Server to be able to reach H.323 systems, a process known as "interworking" (2009).

The International Telecommunication Union (ITU) has three umbrellas of standards for videoconferencing:

  • ITU H.320 is known as the standard for public switched telephone networks (PSTN) or videoconferencing over integrated services digital networks. While still prevalent in Europe, ISDN was never widely adopted in the United States and Canada.[citation needed]
  • ITU H.264 Scalable Video Coding (SVC) is a compression standard that enables videoconferencing systems to achieve highly error-resilient Internet Protocol (IP) video transmissions over the public Internet without quality-of-service enhanced lines.[41] This standard has enabled wide-scale deployment of high definition desktop videoconferencing and made possible new architectures,[42] which reduce latency between the transmitting sources and receivers, resulting in more fluid communication without pauses. In addition, an attractive factor for IP videoconferencing is that it is easier to set up for use along with web conferencing and data collaboration. These combined technologies enable users to have a richer multimedia environment for live meetings, collaboration, and presentations.
  • ITU-T V.80: videoconferencing is generally made compatible with the H.324 standard for point-to-point videotelephony over regular (POTS) phone lines.

The Unified Communications Interoperability Forum (UCIF), a non-profit alliance between communications vendors, launched in May 2010. The organization's vision is to maximize the interoperability of UC based on existing standards. Founding members of UCIF include HP, Microsoft, Polycom, Logitech/Lifesize, and Juniper Networks.[43][44]

Call setup

Videoconferencing in the late 20th century was limited to the H.323 protocol (Cisco's SCCP implementation was a notable exception), but newer videophones often use SIP, which is often easier to set up in home networking environments.[45] SIP is a text-based protocol, incorporating many elements of the Hypertext Transfer Protocol (HTTP) and the Simple Mail Transfer Protocol (SMTP).[46] H.323 is still used, but more commonly for business videoconferencing, while SIP is more commonly used in personal consumer videophones. A number of call-setup methods based on instant messaging protocols, such as that of Skype, also now provide video.
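
To make the "text-based" point concrete, the sketch below builds a minimal SIP INVITE request and splits it back into a start line and headers, much as one would with an HTTP message. The header set follows the general shape of RFC 3261 requests. The addresses, branch and tag values, and both function names are placeholders chosen for illustration, and the message omits the session description body that a real video call would normally carry.

```python
def build_invite(caller: str, callee: str, call_id: str) -> str:
    """Build a minimal SIP INVITE as plain text (no SDP body; placeholder values)."""
    lines = [
        f"INVITE sip:{callee} SIP/2.0",  # request line: method, URI, version
        "Via: SIP/2.0/UDP client.example.com;branch=z9hG4bK776asdhds",
        "Max-Forwards: 70",
        f"To: <sip:{callee}>",
        f"From: <sip:{caller}>;tag=1928301774",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller}>",
        "Content-Length: 0",
    ]
    # As in HTTP, lines end with CRLF and an empty line closes the header block.
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_message(text: str) -> tuple[str, dict[str, str]]:
    """Split a SIP message into its start line and a header dictionary."""
    head, _, _body = text.partition("\r\n\r\n")
    start_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")  # split on the first colon only
        headers[name.strip()] = value.strip()
    return start_line, headers


if __name__ == "__main__":
    message = build_invite("alice@example.com", "bob@example.com", "a84b4c76e66710")
    start, headers = parse_message(message)
    print(start)
    print(headers["CSeq"])
```

Because the whole message is human-readable text, it can be inspected and debugged with ordinary tools, which is part of why SIP setups are often considered easier to troubleshoot.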

Another protocol used by videophones is H.324, which mixes call setup and video compression. Videophones that work on regular phone lines typically use H.324, but the bandwidth is limited by the modem to around 33 kbit/s, limiting the video quality and frame rate. A slightly modified version of H.324 called 3G-324M defined by 3GPP is also used by some cellphones that allow video calls, typically for use only in UMTS networks.[47][48]

There is also the H.320 standard, which specifies technical requirements for narrow-band visual telephone systems and terminal equipment, typically for videoconferencing and videophone services. It applies mostly to dedicated circuit-switched (point-to-point) network connections of moderate or high bandwidth, such as through the medium-bandwidth ISDN digital phone protocol or fractional high-bandwidth T1 lines. Modern products based on the H.320 standard usually also support the H.323 standard.[49]

The IAX2 protocol also supports videophone calls natively, using the protocol's own capabilities to transport alternate media streams. A few hobbyists obtained the Nortel 1535 Color SIP Videophone cheaply in 2010 as surplus after Nortel's bankruptcy and deployed the sets on the Asterisk (PBX) platform. While additional software is required to patch together multiple video feeds for conference calls or convert between dissimilar video standards, SIP calls between two identical handsets within the same PBX were relatively straightforward.[50]

Conferencing layers

The components within a videoconferencing system can be divided up into several different layers: User Interface, Conference Control, Control or Signaling Plane, and Media Plane.

Videoconferencing User Interfaces (VUI) can be either graphical or voice-responsive. Many in the industry have encountered both types of interface, and normally a graphical interface is encountered on a computer. User interfaces for conferencing have a number of different uses; they can be used for scheduling, setup, and making a video call. Through the user interface, the administrator is able to control the other three layers of the system.

Conference Control performs resource allocation, management, and routing. This layer along with the User Interface creates meetings (scheduled or unscheduled) or adds and removes participants from a conference.

The Control (Signaling) Plane contains the stacks that signal different endpoints to create a call or a conference. Signaling protocols include, but are not limited to, H.323 and the Session Initiation Protocol (SIP). These signals control incoming and outgoing connections as well as session parameters.

The Media Plane controls the audio and video mixing and streaming. This layer manages the Real-time Transport Protocol (RTP), the User Datagram Protocol (UDP), and the Real-Time Transport Control Protocol (RTCP). RTP and UDP normally carry information such as the payload type (which identifies the type of codec), frame rate, video size, and many others. RTCP, on the other hand, acts as a quality-control protocol for detecting errors during streaming.[39]
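
As an illustration of the kind of information carried on the media plane, the sketch below decodes the 12-byte fixed header that begins every RTP packet, using the field layout from RFC 3550. The function name and the sample packet are made up for the example. Payload type 96 falls in the dynamic range, whose meaning is agreed during call setup rather than fixed by the header itself.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the 12-byte fixed RTP header (field layout per RFC 3550)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the fixed header")
    # Network byte order: two flag bytes, 16-bit sequence, 32-bit timestamp, 32-bit SSRC.
    b0, b1, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,           # top two bits; 2 for current RTP
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),    # often marks the last packet of a video frame
        "payload_type": b1 & 0x7F,    # identifies the codec mapping
        "sequence": sequence,         # lets the receiver detect loss and reordering
        "timestamp": timestamp,
        "ssrc": ssrc,                 # identifies the sending source
    }


if __name__ == "__main__":
    # Hypothetical packet: version 2, payload type 96, sequence number 1.
    sample = struct.pack("!BBHII", 0x80, 96, 1, 3000, 0x1234ABCD) + b"payload"
    print(parse_rtp_header(sample))
```

The sequence number and timestamp are what allow a receiver to notice missing or late packets, which RTCP reports then summarize for the sender.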

Multipoint control

Simultaneous videoconferencing among three or more remote points is possible in a hardware-based system by means of a Multipoint Control Unit (MCU). This is a bridge that interconnects calls from several sources (in a similar way to the audio conference call). All parties call the MCU, or the MCU can call the participating parties in sequence. There are MCU bridges for IP and ISDN-based videoconferencing. Some MCUs are pure software, while others are a combination of hardware and software. An MCU is characterized by the number of simultaneous calls it can handle, its ability to convert between data rates and protocols, and features such as Continuous Presence, in which multiple parties can be seen on-screen at once. MCUs can be stand-alone hardware devices, or they can be embedded into dedicated videoconferencing units.

The MCU consists of two logical components:

  1. A single multipoint controller (MC), and
  2. Multipoint Processors (MP), sometimes referred to as the mixer.

The MC controls the conference while it is active on the signaling plane, which is simply where the system manages conference creation, endpoint signaling and in-conference controls. This component negotiates parameters with every endpoint in the network and controls conferencing resources. While the MC controls resources and signaling negotiations, the MP operates on the media plane and receives media from each endpoint. The MP generates output streams from each endpoint and redirects the information to other endpoints in the conference.
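
One common way an audio bridge produces a separate output for each participant is a "mix-minus": every endpoint receives the sum of all inputs except its own. The sketch below shows that idea for a single frame of 16-bit PCM samples. It is an assumption about how a mixer might work, not a description of any particular MCU, and the function name and sample values are hypothetical. Video compositing is considerably more involved.

```python
def mix_minus(frames: dict[str, list[int]]) -> dict[str, list[int]]:
    """For each endpoint, sum the samples of every other endpoint.

    `frames` maps an endpoint name to one frame of 16-bit PCM samples;
    all frames are assumed to have the same length.
    """
    if not frames:
        return {}
    length = len(next(iter(frames.values())))
    # Sum all inputs once, then subtract each endpoint's own contribution.
    total = [sum(frame[i] for frame in frames.values()) for i in range(length)]
    mixed = {}
    for name, own in frames.items():
        # Clip to the signed 16-bit range so loud talkers do not overflow.
        mixed[name] = [max(-32768, min(32767, t - s)) for t, s in zip(total, own)]
    return mixed


if __name__ == "__main__":
    frame_in = {"a": [100, 200], "b": [10, 20], "c": [1, 2]}
    print(mix_minus(frame_in))
```

Removing a participant's own audio from their return feed avoids the echo they would otherwise hear of themselves.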

Some systems are capable of multipoint conferencing with no MCU, stand-alone, embedded or otherwise. These use a standards-based H.323 technique known as decentralized multipoint, where each station in a multipoint call exchanges video and audio directly with the other stations with no central manager or other bottleneck. The advantages of this technique are that the video and audio will generally be of higher quality because they do not have to be relayed through a central point. Also, users can make ad hoc multipoint calls without any concern for the availability or control of an MCU. This added convenience and quality comes at the expense of some increased network bandwidth, because every station must transmit to every other station directly.[39]
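
The bandwidth trade-off can be made concrete by counting one-way media streams. The sketch below compares a decentralized call, where every station sends directly to every other station, with an MCU-based call, under the simplifying assumption that each station sends one stream to the bridge and receives one composed stream back. The function name and the dictionary keys are illustrative.

```python
def stream_counts(n: int) -> dict[str, int]:
    """Count one-way media streams for n stations under each topology."""
    if n < 2:
        raise ValueError("a call needs at least two stations")
    return {
        "mesh_sent_per_station": n - 1,  # one copy to every other station
        "mesh_total": n * (n - 1),
        "mcu_sent_per_station": 1,       # a single uplink to the bridge
        "mcu_total": 2 * n,              # n uplinks plus n composed downlinks
    }


if __name__ == "__main__":
    for stations in (3, 5, 10):
        print(stations, stream_counts(stations))
```

With three stations the two approaches carry the same number of streams in total, but the decentralized total grows quadratically with the number of stations, while the MCU total grows linearly.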

Cloud storage

Cloud-based videoconferencing can be used without the hardware generally required by other videoconferencing systems, and can be designed for use by SMEs,[51] or larger international or multinational corporations like Facebook.[52][53] Cloud-based systems can handle either 2D or 3D video broadcasting.[54] Cloud-based systems can also implement mobile calls, VOIP, and other forms of video calling. They can also come with a video recording function to archive past meetings.[55]

Impact

A mobile video call between Sweden and Singapore made on a Sony Ericsson K800 (2007)

High-speed Internet connectivity has become more widely available and affordable, as has good-quality video capture and display hardware. Consequently, personal videoconferencing systems based on webcams, personal computer systems, software compression, and the Internet have become progressively more affordable to the general public. The availability of freeware (often as part of chat programs) has made software-based videoconferencing accessible to many.

The widest deployment of videotelephony now occurs in mobile phones. Nearly all mobile phones supporting UMTS networks can work as videophones using their internal cameras and are able to make video calls wirelessly to other UMTS users anywhere.[citation needed] As of the second quarter of 2007, there were over 131 million UMTS users (and hence potential videophone users), on 134 networks in 59 countries.[citation needed] Mobile phones can also use broadband wireless Internet, whether through the cell phone network or over a local Wi-Fi connection, along with software-based videophone apps to make calls to any video-capable Internet user, whether mobile or fixed.

Deaf, hard-of-hearing, and mute individuals have a particular role in the development of affordable high-quality videotelephony as a means of communicating with each other in sign language. Unlike Video Relay Service, which is intended to support communication between a caller using sign language and another party using spoken language, videoconferencing can be used directly between two deaf signers.

Videophones are increasingly used in the provision of telemedicine to the elderly, disabled, and to those in remote locations, where the ease and convenience of quickly obtaining diagnostic and consultative medical services are readily apparent.[56] In one single instance quoted in 2006: "A nurse-led clinic at Letham has received positive feedback on a trial of a video-link which allowed 60 pensioners to be assessed by medics without traveling to a doctor's office or medical clinic."[56] A further improvement in telemedical services has been the development of new technology incorporated into special videophones to permit remote diagnostic services, such as blood sugar level, blood pressure, and vital signs monitoring. Such units are capable of relaying both regular audio-video plus medical data over either standard (POTS) telephone or newer broadband lines.[57]

A Tandberg T3 high-resolution telepresence room in use (2008)

Videotelephony has also been deployed in corporate teleconferencing, also available through the use of public access videoconferencing rooms. A higher level of videoconferencing that employs advanced telecommunication technologies and high-resolution displays is called telepresence.

Today the principles, if not the precise mechanisms, of a videophone are employed by many users worldwide in the form of webcam videocalls using personal computers, with inexpensive webcams, microphones, and free video calling Web client programs. Thus an activity that was disappointing as a separate service has found a niche as a minor feature in software products intended for other purposes.

A study conducted by Pew Research in 2010 revealed that 7% of Americans had made a mobile video call.[58]

Government and law

In the United States, videoconferencing has allowed testimony to be given by an individual who is unable or prefers not to attend the physical legal setting, or who would be subjected to severe psychological stress in doing so. However, there is controversy over the use of testimony by foreign or unavailable witnesses via video transmission, regarding possible violation of the Confrontation Clause of the Sixth Amendment of the U.S. Constitution.[59] Videoconferencing may also be associated with a number of technical risks.[60]

In a military investigation in North Carolina, Afghan witnesses have testified via videoconferencing.

In Hall County, Georgia, videoconferencing systems are used for initial court appearances. The systems link jails with courtrooms, reducing the expenses and security risks of transporting prisoners to the courtroom.[61]

The U.S. Social Security Administration (SSA), which oversees the world's largest administrative judicial system under its Office of Disability Adjudication and Review (ODAR),[62] has made extensive use of videoconferencing to conduct hearings at remote locations.[63] In Fiscal Year (FY) 2009, the SSA conducted 86,320 videoconferenced hearings, a 55% increase over FY 2008.[64] In August 2010, the SSA opened its fifth and largest videoconferencing-only National Hearing Center (NHC), in St. Louis, Missouri. This continues the SSA's effort to use video hearings as a means to clear its substantial hearing backlog. Since 2007, the SSA has also established NHCs in Albuquerque, New Mexico; Baltimore, Maryland; Falls Church, Virginia; and Chicago.[62]

Education

Indonesian and U.S. students participate in an educational videoconference (2010)

Videoconferencing has gained widespread popularity within education in recent years, particularly following the COVID-19 pandemic of early 2020, when much education provision moved online. It provides students with the chance to learn by participating in two-way communication forums. Because it is live, videotelephony allows teachers to reach remote or otherwise isolated learners. Students from diverse communities and backgrounds can come together to learn about one another through practices known as telecollaboration[65][66] (in foreign language education) and virtual exchange, although language barriers will continue to be present. Such students are able to explore, communicate, analyze, and share information and ideas with one another.

Educational institutions have promoted videoconferencing as a way to reduce costs and increase student numbers, with lectures and seminars now often being provided online through videoconferencing technology. Videoconferencing offers educational institutions the possibility of providing courses and education to greater numbers of students, dispersed over large geographical areas, than can be served from a single bricks-and-mortar location.[67]

Through videoconferencing, students can visit other parts of the world, including museums and other cultural and educational sites. Such virtual field trips can provide enriched learning opportunities to students, especially those who are geographically isolated or economically disadvantaged. Small schools can use these technologies to pool resources and provide courses, such as in foreign languages, which could not otherwise be offered.

Other benefits that videoconferencing can provide to education include:

  • faculty members keeping in touch with classes while attending conferences;
  • faculty members attending conferences 'virtually';[68][69]
  • guest lecturers brought into classes from other institutions;[70]
  • researchers collaborating with colleagues at other institutions on a regular basis without loss of time due to travel;
  • schools with multiple campuses collaborating and sharing professors;[71]
  • schools from two separate nations engaging in cross-cultural exchanges;[72]
  • faculty members participating in thesis defenses at other institutions;
  • administrators on tight schedules collaborating on budget preparation from different parts of campus;
  • faculty committee auditioning scholarship candidates;
  • researchers answering questions about grant proposals from agencies or review committees;
  • alternative enrollment structures to purely in-person attendance;
  • student interviews with employers in other cities; and
  • teleseminars.

Medicine and health

Videoconferencing is a highly useful technology for real time telemedicine and telenursing applications, such as diagnosis, consulting, prevention, treatment, and transmission of medical images.[73] With videoconferencing, patients may contact nurses and physicians in emergency or routine situations; physicians and other paramedical professionals can discuss cases across large distances. Rural areas can use this technology for diagnostic purposes, thus saving lives and making more efficient use of health care money. For example, a rural medical center in Ohio used videoconferencing to successfully cut the number of transfers of sick infants to a hospital 70 miles (110 km) away. This had previously cost nearly $10,000 per transfer.[74]

Special peripherals such as microscopes fitted with digital cameras, videoendoscopes, medical ultrasound imaging devices, otoscopes, etc., can be used in conjunction with videoconferencing equipment to transmit data about a patient. Recent developments in mobile collaboration on hand-held mobile devices have also extended video-conferencing capabilities to locations previously unreachable, such as a remote community, long-term care facility, or a patient's home.[75]

Mayo Clinic uses videoconferencing to enable collaboration among multidisciplinary teams of specialists developing treatment plans for complex cases. The technology links Mayo locations with doctors at hospitals that require Mayo’s expertise and input.[76]

Business

Videoconferencing can enable individuals in distant locations to participate in meetings on short notice, with time and money savings. Technology such as VoIP can be used in conjunction with desktop videoconferencing to enable low-cost face-to-face business meetings without leaving the desk, especially for businesses with widespread offices. The technology is also used for remote work. One research report based on a sampling of 1,800 corporate employees showed that, as of June 2010, 54% of the respondents with access to videoconferencing used it "all of the time" or "frequently".[77][78]

Aside from traditional meetings, videoconferencing enables collaborative group sessions in which people collaborate to produce products and services. Industrial Light & Magic uses videoconferencing as part of a 24-hour global video effects production environment for the film industry.[79]

Intel Corporation has used videoconferencing to reduce both the costs and the environmental impacts of its business operations.[80]

Videoconferencing is also currently being introduced on online networking websites, in order to help businesses form profitable relationships quickly and efficiently without leaving their place of work. This has been leveraged by banks to connect busy banking professionals with customers in various locations using video banking technology.

Videoconferencing on hand-held mobile devices (mobile collaboration technology) is being used in industries such as manufacturing, energy, healthcare, insurance, government, and public safety. Live, visual interaction removes traditional restrictions of distance and time, often in locations previously unreachable, such as a manufacturing plant floor thousands of miles away.[81]

In the increasingly globalized film industry, videoconferencing has become useful as a method by which creative talent in many different locations can collaborate closely on the complex details of film production. For example, for the 2013 award-winning animated film Frozen, Burbank-based Walt Disney Animation Studios hired the New York City-based husband-and-wife songwriting team of Robert Lopez and Kristen Anderson-Lopez to write the songs, which required two-hour-long transcontinental videoconferences nearly every weekday for about 14 months.[82][83][84][85]

With the development of lower-cost endpoints, the integration of video cameras into personal computers and mobile devices, and software applications such as FaceTime, Skype, Teams, BlueJeans and Zoom, videoconferencing has changed from just a business-to-business offering to include business-to-consumer (and consumer-to-consumer) use.

Although videoconferencing has frequently proven its value, research has shown that some non-managerial employees prefer not to use it due to several factors, including anxiety.[86] Some such anxieties can often be avoided if managers use the technology as part of the normal course of business. Remote workers can also adopt certain behaviors and best practices to stay connected with their co-workers and company.[87][better source needed]

Researchers also find that attendees of business and medical videoconferences must work harder to interpret information delivered during a conference than they would if they attended face-to-face.[88] They recommend that those coordinating videoconferences make adjustments to their conferencing procedures and equipment.

Press

The concept of press videoconferencing was developed in October 2007 by the PanAfrican Press Association (APPA), a Paris, France-based non-governmental organization, to allow African journalists to participate in international press conferences on developmental and good governance issues.

Press videoconferencing permits international press conferences via videoconferencing over the Internet. Journalists can participate in an international press conference from any location, without leaving their offices or countries. They need only be seated at a computer connected to the Internet in order to ask their questions.

In 2004, the International Monetary Fund introduced the Online Media Briefing Center, a password-protected site available only to professional journalists. The site enables the IMF to present press briefings globally and facilitates direct questions to briefers from the press. The site has been copied by other international organizations since its inception. More than 4,000 journalists worldwide are currently registered with the IMF.

Sign language

Video Interpreter sign used at VRS/VRI service locations

One of the first demonstrations of the ability for telecommunications to help sign language users communicate with each other occurred when AT&T's videophone (trademarked as the Picturephone) was introduced to the public at the 1964 New York World's Fair—two deaf users were able to communicate freely with each other between the fair and another city.[89] Various universities and other organizations, including British Telecom's Martlesham facility, have also conducted extensive research on signing via video telephony.[90][91][92]

The use of sign language via videotelephony was hampered for many years due to the difficulty of its use over slow analog copper phone lines,[91] coupled with the high cost of better quality ISDN (data) phone lines.[90] Those factors largely disappeared with the introduction of more efficient and powerful video codecs and the advent of lower-cost high-speed ISDN data and IP (Internet) services in the 1990s.

21st-century improvements

Significant improvements in video call quality of service for the deaf occurred in the United States in 2003 when Sorenson Media Inc. (formerly Sorenson Vision Inc.), a video compression software coding company, developed its VP-100 model stand-alone videophone specifically for the deaf community. It was designed to output its video to the user's television in order to lower the cost of acquisition and to offer remote control and a powerful video compression codec for unequaled video quality and ease of use with video relay services. Favorable reviews quickly led to its popular usage at educational facilities for the deaf, and from there to the greater deaf community.[93]

Coupled with similar high-quality videophones introduced by other electronics manufacturers, the availability of high-speed Internet, and sponsored video relay services authorized by the U.S. Federal Communications Commission in 2002, VRS services for the deaf underwent rapid growth in that country.[93]

A deaf or hard-of-hearing person uses a Video Relay Service at his workplace to communicate with a hearing person in London (2007).

Using such video equipment in the present day, the deaf, hard-of-hearing, and speech-impaired can communicate between themselves and with hearing individuals using sign language. The United States and several other countries compensate companies to provide video relay services (VRS). Telecommunication equipment can be used to talk to others via a sign language interpreter, who uses a conventional telephone at the same time to communicate with the deaf person's party. Video equipment is also used to do on-site sign language translation via Video Remote Interpreting (VRI). The relatively low cost and widespread availability of 3G mobile phone technology with video calling capabilities have given deaf and speech-impaired users a greater ability to communicate with the same ease as others. Some wireless operators have even started free sign language gateways.

Sign language interpretation services via VRS or by VRI are useful in the present day where one of the parties is deaf, hard-of-hearing, or speech-impaired (mute). In such cases the interpretation flow is normally within the same principal language, such as French Sign Language (LSF) to spoken French, Spanish Sign Language (LSE) to spoken Spanish, British Sign Language (BSL) to spoken English, and American Sign Language (ASL) also to spoken English (since BSL and ASL are completely distinct from each other), German Sign Language (DGS) to spoken German, and so on.

Multilingual sign language interpreters, who can also translate across principal languages, are also available, albeit less frequently. For example, a multilingual interpreter may interpret a call from a deaf person using ASL to reserve a room at a hotel in the Dominican Republic whose staff speak only Spanish; the interpreter then has to use ASL, spoken Spanish, and spoken English to facilitate the call for the deaf person. Such activities involve considerable mental processing effort on the part of the translator, since sign languages are distinct natural languages with their own construction, semantics and syntax, different from the aural version of the same principal language.

With video interpreting, sign language interpreters work remotely with live video and audio feeds, so that the interpreter can see the deaf or mute party, and converse with the hearing party, and vice versa. Much like telephone interpreting, video interpreting can be used for situations in which no on-site interpreters are available. However, video interpreting cannot be used for situations in which all parties are speaking via telephone alone. VRS and VRI interpretation requires all parties to have the necessary equipment. Some advanced equipment enables interpreters to control the video camera remotely, in order to zoom in and out or to point the camera toward the party that is signing.

Comparison of Sign Language communication tools

| Tool | Owner | Free? | Pure web based?[a] | Works on desktops? | Mobile support? | Uses email? | Required hardware | Installation | Limitations | Specialities | Technologies | Deaf made? | Licensing |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Facebook Messenger | Facebook | Yes | No | Yes | Yes | No | Any mobile | App must be installed, does not require a Facebook account | | | ? | No | 100% proprietary |
| FaceTime | Apple Inc. | Yes | No | Yes | Yes | No | Apple hardware only (desktop or mobile) | App must be installed, requires an Apple ID account | | | ? | No | 100% proprietary |
| Glide (software) | Glide | Yes | No | No | Yes | No | Any mobile | App must be installed | | | ? | No | 100% proprietary |
| Google Hangouts | Google | Yes | No | Yes | Yes | No | Any desktop or mobile | App must be installed, requires a Google account | | | ? | No | 100% proprietary |
| Skype | Microsoft | Yes | No | Yes | Yes | No | Any desktop or mobile | App must be installed, requires a Microsoft account | | | ? | No | 100% proprietary |
| Tikatoy | Tikatoy | Yes | No | Yes | Android only | Yes | Desktop or Android | Requires a web browser with Adobe Flash | Apple blocks Adobe Flash | | C++, JavaScript, Python | Yes | 100% proprietary |
| videomail.io | Binary Kitchen | Yes | Yes | Yes | Android only | Yes | Desktop or Android; iPhone and Safari only for viewing | Web browser | Recording max 3 minutes, does not work on old browsers | Reusable: can be plugged directly into other websites or as a WordPress plugin (ninja-forms-videomail) | JavaScript | Yes | Mixed: proprietary server, open-source client[94] |

  a. Pure web based means that the tool uses only standardized web technologies such as HTML, JavaScript and CSS.

Descriptive names and terminology

The name videophone never became as standardized as its earlier counterpart telephone, resulting in a variety of names and terms being used worldwide, and even within the same region or country. Videophones are also known as video phones, videotelephones (or video telephones) and often by an early trademarked name Picturephone, which was the world's first commercial videophone produced in volume. The compound name videophone slowly entered into general use after 1950,[95] although video telephone likely entered the lexicon earlier after video was coined in 1935.[96]

Videophone calls (also: videocalls, video chat)[97] as well as Skype and Skyping in verb form[98] differ from videoconferencing in that they expect to serve individuals, not groups.[2] However that distinction has become increasingly blurred with technology improvements such as increased bandwidth and sophisticated software clients that can allow for multiple parties on a call. In general everyday usage the term videoconferencing is now frequently used instead of videocall for point-to-point calls between two units. Both videophone calls and videoconferencing are also now commonly referred to as a video link.

Webcams are popular, relatively low-cost devices that can provide live video and audio streams via personal computers, and can be used with many software clients for both video calls and videoconferencing.[30]

A videoconference system is generally higher cost than a videophone and deploys greater capabilities. A videoconference (also known as a videoteleconference) allows two or more locations to communicate via live, simultaneous two-way video and audio transmissions. This is often accomplished by the use of a multipoint control unit (a centralized distribution and call management system) or by a similar non-centralized multipoint capability embedded in each videoconferencing unit. Again, technology improvements have circumvented traditional definitions by allowing multiple-party videoconferencing via web-based applications.[99][100]

A telepresence system is a high-end videoconferencing system and service usually employed by enterprise-level corporate offices. Telepresence conference rooms use state-of-the-art room designs, video cameras, displays, sound systems and processors, coupled with high-to-very-high capacity bandwidth transmissions.

Typical uses of the various technologies described above include calling one-to-one or conferencing one-to-many or many-to-many for personal, business, educational, deaf Video Relay Service and tele-medical, diagnostic and rehabilitative purposes.[101] New uses, such as personal videocalls to inmates incarcerated in penitentiaries and videoconferencing to resolve airline engineering issues at maintenance facilities, are being created or are evolving on an ongoing basis.

Other names for videophone that have been used in English are: Viewphone (the British Telecom equivalent to AT&T's Picturephone),[102] and visiophone, a common French translation that has also crept into limited English usage, as well as over twenty less common names and expressions. Latin-based translations of videophone in other languages include vidéophone (French), Bildtelefon (German), videotelefono (Italian), both videófono and videoteléfono (Spanish), both beeldtelefoon and videofoon (Dutch), and videofonía (Catalan).

A telepresence robot (also telerobotics) is a robotically controlled and motorized videoconferencing display to help give a better sense of remote physical presence for communication and collaboration in an office, home, school, etc. when one cannot be there in person. The robotic avatar device can move about and look around at the command of the remote person it represents.[103]

Popular culture

Dr. Heywood Floyd in the 1968 film 2001: A Space Odyssey calls his daughter on Earth.

In science fiction literature, names commonly associated with videophones include telephonoscope, telephote, viewphone, vidphone, vidfone, and visiphone. The first example was probably the cartoon "Edison's Telephonoscope" by George du Maurier in Punch in 1878.[104] In «In the year 2889», published in 1889, the French author Jules Verne predicts that «The transmission of speech is an old story; the transmission of images by means of sensitive mirrors connected by wires is a thing but of yesterday.»[105] Early examples in Anglophone literature, using the word videotelephone, include The World of Null-A Harl Vincent from late 1920s.[106] In many science fiction movies and TV programs that are set in the future, videophones are used as a primary method of communication. One of the first movies in which a videophone was used was Fritz Lang's Metropolis (1927).[107]

Other notable examples of videophones in popular culture include an iconic scene from the 1968 film 2001: A Space Odyssey set on Space Station V. The movie was released shortly before AT&T began its efforts to commercialize its Picturephone Mod II service in several cities and depicts a video call to Earth using an advanced AT&T videophone—which it predicts will cost $1.70 for a two-minute call in 2001 (a fraction of the company's real rates on Earth in 1968). Film director Stanley Kubrick strove for scientific accuracy, relying on interviews with scientists and engineers at Bell Labs in the United States. Dr. Larry Rabiner of Bell Labs, discussing videophone research in the documentary 2001: The Making of a Myth, stated that in the mid-to late-1960s videophones "... captured the imagination of the public and ... of Mr. Kubrick and the people who reported to him". In one 2001 movie scene a central character, Dr. Heywood Floyd, calls home to contact his family, a social feature noted in the Making of a Myth. Floyd talks with and views his daughter from a space station in orbit above the Earth, discussing what type of present he should bring home for her.[108][unreliable source][109][110]

Earlier examples of videophones in popular culture include the Warner Bros. cartoon Plane Daffy (1944), in which the female spy Hatta Mari uses a videophone to communicate with Adolf Hitler, and the comic strip Dick Tracy, whose title character often used his "2-way wrist TV" to communicate with police headquarters (1964–1977).[111]

By the early 2010s videotelephony and videophones had become commonplace and unremarkable in various forms of media, in part due to their real and ubiquitous presence in common electronic devices and laptop computers. Additionally, TV programming increasingly used videophones to interview subjects of interest and to present live coverage by news correspondents, via the Internet or by satellite links. In the mass market media, the popular U.S. TV talk show hostess Oprah Winfrey incorporated videotelephony into her TV program on a regular basis from May 21, 2009, with an initial episode called Where the Skype Are You?, as part of a marketing agreement with the Internet telecommunication company Skype.[112][113]

from Grokipedia
Videotelephony is a telecommunications technology that enables real-time, two-way communication of synchronized audio and video signals between two or more participants, typically using devices such as videophones, computers, or smartphones connected over telephone lines, integrated services digital network (ISDN), or internet protocol (IP) networks. This system combines the functionalities of traditional telephony with visual elements, allowing users to see and hear each other simultaneously, which distinguishes it from one-way video broadcasting or audio-only calls.

The concept of videotelephony emerged in the late 19th century alongside the telephone, with early experiments in the 1920s and 1930s demonstrating feasibility, including Germany's first public service in 1936 using more than 620 miles of infrastructure, though it was discontinued by 1939 due to World War II. Later efforts, such as AT&T's Picturephone Mod II launched in 1970, offered 30 frames per second video but failed commercially due to high costs ($16 for a three-minute call, plus a $160 monthly service fee) and limited infrastructure, leading to discontinuation in the mid-1970s.

In modern contexts, videotelephony has evolved from specialized hardware to ubiquitous software applications, driven by advancements in broadband internet and video compression standards like H.264 (2003). The COVID-19 pandemic from 2020 dramatically accelerated adoption, with platforms such as Skype (2003), FaceTime (2010), and Zoom (2011) enabling billions of daily video interactions for personal, business, education, healthcare, and accessibility needs. As of 2025, videotelephony is a mainstream communication tool, though challenges like network latency, privacy concerns, and device interoperability persist.

History

Origins and Early Experiments

The concept of videotelephony traces its earliest precursors to inventions aimed at transmitting visual information over distance, predating moving images. In 1888, American inventor Elisha Gray patented the telautograph, a device that electrically reproduced handwriting at a remote location using synchronized mechanical arms connected via telegraph wires. This system allowed users to send hand-drawn messages in real time, preserving individual handwriting styles and laying groundwork for later image transmission technologies, though it was limited to static graphics rather than live video.

Pioneering efforts in moving-image transmission emerged in the 1920s through mechanical television experiments, particularly by Scottish inventor John Logie Baird. Baird achieved the first public demonstration of television in 1925 using a Nipkow disc to scan and transmit simple moving silhouettes, and by 1928 he extended these to transatlantic broadcasts via shortwave radio. His mechanical systems, which scanned images line by line, were adapted for rudimentary trials, foreshadowing interactive links that paired image transmission with telephony.

Germany advanced toward practical videotelephony with the launch of the Fernsehsprechdienst (visual telephone service) by the Deutsche Reichspost on March 1, 1936, connecting Berlin to Leipzig via dedicated lines. This two-way system used mechanical scanning at 25 frames per second to capture and display low-resolution images on 8-inch screens, enabling public calls at post offices for a fee. Timed with the 1936 Berlin Olympics, the service demonstrated live video links, including potential uses for event reporting, though it remained limited to urban hubs and was discontinued in 1939 due to World War II.

Across the Atlantic, AT&T pursued similar innovations, beginning with a landmark demonstration on April 7, 1927, when it transmitted a 50-line video image alongside voice between U.S. Secretary of Commerce Herbert Hoover in Washington, D.C., and AT&T president Walter Gifford in New York. This one-way video adjunct to telephony highlighted the potential for visual calls over existing phone lines. In 1964, AT&T unveiled the Picturephone at the New York World's Fair, featuring a compact unit with a Plumbicon camera, 5-inch cathode-ray tube screen, and 250-line resolution for two-way conversations. Public trials followed, but high costs ($160 per month plus usage fees) limited adoption to niche business use before broader commercialization in 1970.

Analog and Early Digital Systems

One of the earliest operational analog videotelephony systems was AT&T's Picturephone Mod II, commercially launched on June 30, 1970, in Pittsburgh, Pennsylvania, with expansion later that year. This system provided full-motion black-and-white video on a 5 by 5 inch screen with 250 lines of resolution at 30 interlaced frames per second, using a camera mounted above the display for head-and-shoulders shots. The video signal required a 1 MHz bandwidth, necessitating two standard lines for video transmission alongside one for audio, resulting in a total of approximately 6 Mbit/s. Despite initial enthusiasm, the service faced significant technical limitations, including bulky equipment and poor low-light performance, and was rolled out only to a handful of locations with fewer than 500 subscribers at its peak. High costs further hampered adoption, with installation fees around $150 and monthly charges of $160 for just 30 minutes of video calling time, equivalent to over $1,200 in today's dollars. By 1971, the system had limited availability in select settings across a few U.S. cities, but usage declined rapidly due to these expenses and lack of consumer demand, leading AT&T to discontinue it in 1973. The Picturephone Mod II highlighted the challenges of analog videotelephony, where uncompressed video demanded substantial infrastructure, paving the way for compression innovations.

In Japan, Nippon Telegraph and Telephone (NTT) introduced an early analog videophone service in the mid-1970s, focusing on point-to-point connections for business use with basic compression to reduce bandwidth needs over dedicated lines. Launched around 1976, the service connected major cities using analog transmission techniques, offering low-resolution video at frame rates suitable for static headshots, though specific technical details such as exact bandwidth or frame rate remain sparsely documented in public records. This deployment represented one of the first national-scale analog videophone networks outside the U.S., emphasizing reliability for corporate communications despite high setup costs.

The transition to early digital systems began in the 1980s with advancements in video compression, notably from Compression Labs International (CLI), founded in 1978. CLI's systems, such as the 1982 CLI T1, enabled group videotelephony by achieving significant data reduction, supporting broadcast-quality video at bit rates as low as 1.5 Mbit/s using proprietary compression algorithms. These systems were deployed for remote communications where traditional analog methods were impractical due to bandwidth constraints on satellite transponders; for example, CLI technology facilitated live video feeds from remote sites with reduced latency compared to uncompressed signals. CLI's innovations laid groundwork for standardized digital codecs, influencing deployments in over 100 countries.

A key milestone in early digital videotelephony was the H.261 standard, ratified in 1990, which defined video coding for audiovisual services at bit rates of p × 64 kbit/s (where p ranges from 1 to 30, giving a maximum of 1920 kbit/s). Designed for Integrated Services Digital Network (ISDN) lines, H.261 employed discrete cosine transform (DCT) compression with motion compensation to achieve CIF (352 × 288 pixels) or QCIF (176 × 144 pixels) resolutions at up to 30 fps, enabling real-time video over standard phone infrastructure without the excessive bandwidth of analog systems. This standard facilitated the first widespread digital videophone deployments in the early 1990s, such as business terminals, though adoption was constrained by ISDN's limited availability and costs.

ISDN-based videotelephony proliferated in the 1990s, offering digital channels at 64 or 128 kbit/s for reliable two-way video. A notable software example was CU-SeeMe, developed at Cornell University and first released in 1992 for Macintosh computers, which supported packet-switched video conferencing over IP networks, often accessed via ISDN modems for sufficient bandwidth. Initially video-only, it used simple compression to transmit low-resolution streams (up to 160 × 120 pixels) in real-time multipoint calls without dedicated hardware, democratizing access for academic and early internet users; by 1994, Windows versions and audio integration expanded its reach, though quality was limited to jerky motion at under 15 fps on typical connections. These ISDN-era systems underscored the shift from analog's high-bandwidth demands to digital efficiency, but persistent limitations in speed and affordability delayed mass adoption until broadband advancements.
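
As a rough illustration of the p × 64 kbit/s rate family that H.261 defines (described earlier in this section), the following Python sketch tabulates the channel rate for a few values of p. The helper name `h261_bitrate_kbps` is hypothetical and chosen only for this example; it is not part of any codec library.

```python
def h261_bitrate_kbps(p: int) -> int:
    """Return the H.261 channel rate in kbit/s for a multiplier p in 1..30."""
    if not 1 <= p <= 30:
        raise ValueError("p must be between 1 and 30")
    return p * 64


if __name__ == "__main__":
    # p=1 and p=2 correspond to the 64 and 128 kbit/s ISDN channels
    # mentioned in the text; p=30 is the top of the range.
    for p in (1, 2, 6, 30):
        print(f"p={p:2d} -> {h261_bitrate_kbps(p)} kbit/s")
```

With p = 2 the rate matches the 128 kbit/s ISDN figure quoted above, and p = 30 gives the 1920 kbit/s ceiling.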

Transition to Broadband and Internet

The transition from dedicated analog and early digital videotelephony systems to broadband and Internet-based platforms in the late 1990s and early 2000s fundamentally democratized access, leveraging packet-switched IP networks to reduce costs and expand usability beyond specialized hardware and dedicated lines. This shift was underpinned by the development of key standards that enabled communication over non-guaranteed quality-of-service networks like the Internet. In 1996, the ITU Telecommunication Standardization Sector (ITU-T) approved H.323, an umbrella recommendation defining protocols for call signaling, transport, and bandwidth management in IP-based videoconferencing, supporting both point-to-point and multipoint sessions. Three years later, in March 1999, the Internet Engineering Task Force (IETF) published RFC 2543, specifying the Session Initiation Protocol (SIP) as a lightweight, application-layer signaling mechanism for initiating, modifying, and terminating sessions, including video calls, across IP networks. These standards built on prior digital compression techniques, such as H.261, to adapt videotelephony for variable-bandwidth environments.

The proliferation of consumer broadband in the early 2000s provided the infrastructure necessary for practical home-based videotelephony, offering download speeds of 800–1,200 kbit/s via cable modems and comparable or higher rates through digital subscriber line (DSL) services, a marked improvement over ISDN's 128 kbit/s, which had previously confined video to professional or expensive setups. This bandwidth increase, combined with lower per-line costs and reduced setup latency compared to circuit-switched ISDN connections, made real-time video feasible for everyday users without dedicated lines, as packet-based transmission allowed for more efficient data handling despite occasional packet loss.

By 2003, these enablers facilitated the rise of webcam-integrated software, such as Apple's iChat AV, released in June of that year as part of Mac OS X 10.2, which supported audio and video chats using any FireWire-connected camera and microphone, requiring only a broadband connection for plug-and-play operation among compatible users. Concurrently, Skype's public beta launch in August 2003 introduced a peer-to-peer architecture for voice and video, allowing direct user-to-user connections without centralized servers for media streams, which minimized infrastructure costs and bypassed many NAT/firewall barriers. Early Skype video calls demanded approximately 384 kbit/s of bandwidth for acceptable quality, aligning with emerging broadband capabilities and enabling free, global video communication on standard PCs with webcams. This model, which dynamically selected supernodes among users for signaling, rapidly popularized videotelephony by integrating it with instant messaging and voice, achieving millions of users within its first year.

On the mobile front, the introduction of third-generation (3G) networks heralded portable videotelephony, with NTT DoCoMo launching Japan's FOMA service on October 1, 2001, as the world's first commercial 3G rollout using wideband code-division multiple access (W-CDMA) technology, complete with handsets supporting 64 kbit/s video calls over cellular connections of up to 384 kbit/s downlink. This enabled on-the-go video between compatible devices within coverage areas, though initial adoption was limited by handset costs and network availability. Complementing cellular advances, applications like Fring, which debuted in 2007, extended IP-based calling to mobiles, allowing free voice and early video sessions over Wi-Fi connections without relying solely on cellular data plans.

Modern Developments and Widespread Adoption

The COVID-19 pandemic from 2020 to 2022 dramatically accelerated the adoption of videotelephony, transforming it from a niche tool into an essential communication medium for remote work, education, and social interaction worldwide. Platforms like Zoom experienced explosive growth, with daily meeting participants surging from 10 million in December 2019 to 300 million by April 2020, a 30-fold increase driven by global lockdowns and stay-at-home orders. In response to heightened security concerns amid this rapid scaling, major providers implemented end-to-end encryption (E2EE); for instance, Zoom rolled out its E2EE feature in October 2020, enabling optional encryption for meetings to protect against unauthorized access while maintaining compatibility for large-scale use.

Advancements in artificial intelligence (AI) have further enhanced videotelephony's usability and inclusivity since the early 2020s. AI-powered features, such as virtual backgrounds, gained widespread adoption during the pandemic to improve privacy and professionalism by allowing users to replace real environments with custom images or effects, with platforms like Zoom introducing this capability in April 2020 to address home office distractions. More recently, real-time speech translation has emerged as a key innovation; Google Meet launched its AI-driven speech translation feature in May 2025, enabling near-instantaneous voice dubbing between languages such as English and Spanish using authentic-sounding synthetic voices, thereby lowering language barriers in global meetings.

The rollout of 5G networks since 2019, combined with edge computing, has significantly improved videotelephony's performance by supporting higher resolutions and lower latency. 5G's high bandwidth and sub-20 ms air-interface latency enable 4K video streaming with end-to-end delays under 100 ms, even in mobile scenarios, as demonstrated in operational network studies where 5G outperformed 4G in throughput for panoramic video calls. Edge computing complements this by processing video data closer to users (at network edges rather than in distant clouds), reducing round-trip times and buffering, which is critical for interactive applications like live conferencing.

As of 2025, videotelephony continues to evolve toward immersive and interoperable experiences. Hybrid AR/VR systems, exemplified by Meta's Horizon Workrooms, integrate immersive collaboration with traditional video calls, allowing headset users to interact in shared 3D spaces while non-VR participants join via 2D feeds. Regulatory efforts are also advancing accessibility; the FCC's September 2024 rules, effective in 2025, mandate accessibility features for people with disabilities in video conferencing services, while industry initiatives at events like IBC 2025 emphasize standards for cross-platform compatibility.

Technology

Core Components and Hardware

Videotelephony systems rely on several fundamental hardware components to capture, process, and transmit audio and video signals effectively. These include cameras for visual input, microphones for audio capture, displays for output, endpoints that integrate these elements, network interfaces for connectivity, and codecs for data compression. Each component has evolved to support higher quality and efficiency in real-time communication.

Cameras form the primary visual capture mechanism in videotelephony, typically employing complementary metal-oxide-semiconductor (CMOS) sensors to convert light into digital signals. Common types include fixed-focus webcams for personal use and pan-tilt-zoom (PTZ) cameras for group settings, with resolutions ranging up to 4K ultra-high definition for professional applications. For instance, the Logitech MX Brio uses a 4K sensor to deliver sharp imagery at 30 frames per second, while the Elgato Facecam Pro incorporates a Sony STARVIS sensor supporting 4K at 60 fps. Camera quality determines the resolution, sharpness, detail, and smoothness (via frame rate) of a video call: higher-quality cameras produce clearer, more detailed images with smoother motion, whereas lower-quality cameras result in blurry, pixelated, or low-resolution video. External cameras generally outperform built-in ones due to superior sensors, lenses, and features such as better low-light performance and higher frame rates.

Microphones complement cameras by capturing audio, and are often integrated into the same device or used separately. Condenser microphones are prevalent in videotelephony due to their sensitivity for clear voice pickup in conference environments, as seen in external units like the HP Poly Studio A2 Table Microphone. Superior microphones capture undistorted sound with effective noise reduction, delivering natural voice reproduction and minimizing issues like muffled, robotic, or interrupted audio. As with cameras, external microphones typically outperform built-in ones in clarity, noise handling, and distortion.

Device quality is foundational to videotelephony performance. Poor input quality from hardware cannot be fully compensated by network bandwidth, software processing, or optimization techniques, as these downstream improvements rely on high-quality source signals for optimal results.

Displays and endpoints serve as the user-facing interfaces, rendering video feeds while housing integrated hardware. Desktop endpoints, such as Poly (formerly Polycom) devices like the Poly Studio X52, combine high-definition cameras, microphones, and speakers into compact all-in-one units suitable for small to medium rooms, supporting plug-and-play connectivity via USB or Ethernet. Integrated smart displays embed 8-inch or larger touchscreens with built-in cameras (e.g., 13MP in recent 8-inch and larger models as of 2025) and dual speakers, facilitating video calls through voice-activated interfaces without additional peripherals. These endpoints have progressed from the bulky consoles of the 1960s, like AT&T's Picturephone Mod I, to modern slim designs that prioritize portability and ease of integration.

Network interfaces ensure reliable data transmission in videotelephony by connecting devices to IP-based networks, with routers playing a critical role in directing audio-video packets between local and wide-area connections to minimize latency. Hardware codecs, often embedded in processors like the Qualcomm Snapdragon series, handle compression and decompression of video streams; for example, Snapdragon chips incorporate AI-accelerated neural codecs, enabling improved compression and real-time processing on mobile endpoints. By 2025, all-in-one video bars such as the Poly Studio X72 feature AI-enhanced cameras with auto-framing and gesture control, representing a shift toward intelligent, compact hardware that supports hybrid work environments.

Software and Protocols

Videotelephony relies on a suite of standardized protocols for establishing, maintaining, and transporting multimedia sessions over networks. The evolution of these standards began with ITU-T Recommendation H.320, which defined narrowband audiovisual services over integrated services digital network (ISDN) circuits, emphasizing circuit-switched connections for reliable, low-latency communication in early systems. As networks shifted to packet-based Internet Protocol (IP) infrastructures, H.323 emerged in 1996 as an umbrella standard for multimedia communications over IP, incorporating components like H.225.0 for call signaling and H.245 for media control to enable interoperability between diverse endpoints, including gateways to legacy H.320 systems. This transition facilitated broader adoption by supporting non-guaranteed quality of service on IP networks while maintaining compatibility through annexes for features such as enhanced signaling.

Central to media transport in modern videotelephony are the Real-time Transport Protocol (RTP) and its companion RTP Control Protocol (RTCP), defined in RFC 3550. RTP handles the delivery of real-time audio and video streams over the User Datagram Protocol (UDP), incorporating sequence numbers for reordering packets, timestamps for synchronization, and payload type identifiers to denote codecs, ensuring end-to-end transport without inherent quality-of-service guarantees. RTCP complements RTP by providing out-of-band control, sending periodic reports on reception quality (such as packet loss and jitter) along with sender statistics and participant descriptions, which are crucial for adaptive adjustments in video conferencing scenarios. Together, they operate on paired ports (RTP on an even port, RTCP on the next odd port), with RTCP allocated about 5% of session bandwidth to balance feedback without overwhelming the media stream.

Video compression in these protocols is dominated by ITU-T standards H.264 (Advanced Video Coding, AVC) and its successor H.265 (High Efficiency Video Coding, HEVC). H.264, standardized in 2003, achieves efficient compression for high-definition video through techniques like block-based motion compensation and intra-frame prediction, making it a baseline for videotelephony due to its balance of quality and computational demands. H.265, introduced in 2013, builds on this with approximately 50% better compression efficiency at equivalent quality, enabling higher resolutions over constrained bandwidths, though at higher encoding complexity. Subsequent standards include AV1 (AOMedia Video 1, standardized in 2018), a royalty-free codec offering approximately 30% better compression than H.264 and widely integrated into browser-based videotelephony by 2025. Additionally, H.266/VVC (2020) provides 30-50% efficiency gains over H.265 for ultra-high-definition applications, though with increased complexity. Both H.264 and H.265 are carried as RTP payloads, with dynamic negotiation ensuring compatibility across sessions.

Call setup in IP-based videotelephony typically employs the Session Initiation Protocol (SIP), outlined in RFC 3261, which initiates multimedia sessions via an INVITE request containing essential headers (e.g., To, From, Call-ID) and a body for media description. The process unfolds as a three-way handshake: the INVITE is routed through proxies, eliciting provisional responses like 180 Ringing from the user agent server (UAS), followed by a 200 OK upon acceptance, which the user agent client (UAC) acknowledges with an ACK to establish the dialog. Embedded within SIP messages is the Session Description Protocol (SDP) from RFC 4566, which uses an offer-answer model to negotiate media capabilities, specifying streams (e.g., video via "m=video"), transport (e.g., RTP/AVP), and formats (e.g., H.264 payload types via "a=rtpmap"). This negotiation ensures endpoints agree on codecs and parameters before RTP streams commence, supporting secure variants like SIPS over TLS for encrypted signaling.
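
The offer-answer negotiation described above can be sketched in a few lines of Python. This is a simplified illustration, not a complete RFC 4566 parser and not any particular SIP stack's API: the SDP text, addresses, ports, and dynamic payload type numbers are invented for the example, and `parse_media` and `choose_codec` are hypothetical helper names.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical SDP offer as it might appear in the body of a SIP INVITE.
# The address, ports and dynamic payload types (96, 97) are made up.
SDP_OFFER = """\
v=0
o=alice 2890844526 2890844526 IN IP4 192.0.2.10
s=-
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 0
a=rtpmap:0 PCMU/8000
m=video 51372 RTP/AVP 96 97
a=rtpmap:96 H264/90000
a=rtpmap:97 H265/90000
"""


def parse_media(sdp: str) -> List[Dict]:
    """Collect each m= section with its offered payload types and rtpmap entries."""
    media: List[Dict] = []
    for line in sdp.splitlines():
        if line.startswith("m="):
            kind, port, proto, *formats = line[2:].split()
            media.append({
                "kind": kind,
                "port": int(port),
                "proto": proto,
                "formats": [int(f) for f in formats],  # in offerer's preference order
                "codecs": {},
            })
        elif line.startswith("a=rtpmap:") and media:
            payload_type, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            media[-1]["codecs"][int(payload_type)] = encoding
    return media


def choose_codec(section: Dict, supported: Set[str]) -> Optional[Tuple[int, str]]:
    """Answerer side: pick the first offered payload type whose codec is supported."""
    for payload_type in section["formats"]:
        encoding = section["codecs"].get(payload_type, "")
        if encoding.split("/")[0] in supported:
            return payload_type, encoding
    return None


if __name__ == "__main__":
    video = [m for m in parse_media(SDP_OFFER) if m["kind"] == "video"][0]
    print(choose_codec(video, {"H264"}))
```

An answerer that supports only H.264 would select payload type 96 here; if no offered codec is supported, the sketch returns `None`, which in a real exchange would correspond to rejecting that media stream.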
WebRTC, whose standardization at the W3C began in 2011 following Google's open-sourcing of key technologies and which became a W3C Recommendation in 2021, extends these protocols for browser-native videotelephony without plugins. It leverages RTP/RTCP for media, SDP for negotiation, and Interactive Connectivity Establishment (ICE) for NAT traversal, enabling direct audio-video streams between browsers while providing APIs for local media capture and data channels. This framework has driven widespread adoption in web-based applications by simplifying integration and ensuring cross-browser interoperability.

Software platforms implementing these protocols vary from open-source to proprietary models. Jitsi, an Apache-licensed suite, offers fully open-source videotelephony via Jitsi Meet, with self-hosting options, built on WebRTC for browser and mobile access. In contrast, Microsoft Teams provides a proprietary, cloud-centric platform integrated into the Microsoft 365 ecosystem, handling up to 1,000 video participants per meeting with features like live captions and extensions for custom integrations, utilizing SIP under the hood for hybrid work environments. These platforms enhance interoperability through protocol adherence, allowing federation (such as connecting Jitsi to Teams via gateways) while differing in deployment flexibility and ecosystem lock-in.
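
The RTP fields mentioned earlier in this section (sequence number, timestamp, payload type) sit in a 12-byte fixed header defined by RFC 3550. The following Python sketch packs and unpacks that header with the standard `struct` module. It is a minimal illustration under simplifying assumptions: it ignores CSRC lists, header extensions, and the payload itself, and the names `RtpHeader`, `build_rtp_header`, and `parse_rtp_header` are hypothetical.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def build_rtp_header(payload_type: int, sequence_number: int, timestamp: int,
                     ssrc: int, marker: bool = False) -> bytes:
    """Pack a fixed RTP header: version 2, no padding, no extension, no CSRCs."""
    byte0 = 2 << 6
    byte1 = (0x80 if marker else 0) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, sequence_number & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Unpack the first 12 bytes of an RTP packet (network byte order)."""
    if len(packet) < 12:
        raise ValueError("an RTP packet needs at least a 12-byte fixed header")
    byte0, byte1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=byte0 >> 6,
        padding=bool(byte0 & 0x20),
        extension=bool(byte0 & 0x10),
        csrc_count=byte0 & 0x0F,
        marker=bool(byte1 & 0x80),
        payload_type=byte1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


if __name__ == "__main__":
    # Payload type 96 is a dynamic value that a session would assign via SDP.
    header = build_rtp_header(96, 1000, 90000, 0x12345678, marker=True)
    print(parse_rtp_header(header))
```

A receiver would use `sequence_number` to reorder packets and detect loss, and `timestamp` to schedule playout and synchronize audio with video.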

Bandwidth, Quality, and Optimization Techniques

Videotelephony systems require sufficient bandwidth to transmit video and audio streams without degradation, with requirements varying by resolution and compression. For high-definition (HD) video at 720p or 1080p resolutions, typical bandwidth needs range from 1 to 4 Mbps per stream, enabling smooth playback at 30 frames per second (fps) under standard codecs. For 4K ultra-high-definition (UHD) video, bandwidth demands increase to 25 Mbps or more, due to the higher pixel count and data volume, though efficient compression can reduce this to around 15-25 Mbps. For uncompressed video, Bandwidth (Mbps) ≈ (width × height × fps × bit depth × 3) / 1,000,000, where bit depth is per color channel and the factor of 3 accounts for the three RGB channels. For example, 1920 × 1080 video at 30 fps with 8 bits per channel works out to roughly 1,493 Mbps uncompressed, so a 4 Mbps stream implies a compression ratio of roughly 370:1. Codec compression typically reduces the raw figure by ratios of 100:1 to 1000:1, yielding practical bitrates for streaming. This provides a foundational calculation before applying real-world codec optimizations, and shows that raw data needs grow in proportion to pixel count and frame rate.

Quality in videotelephony is evaluated using metrics that assess perceptual and technical performance, ensuring a natural user experience. The Mean Opinion Score (MOS), rated on a 1-5 scale where 4.0-4.5 indicates high quality, incorporates factors like audio-video synchronization, with ideal lip-sync delays under 100 ms to avoid noticeable desynchronization. Network impairments such as jitter (variation in packet arrival times) should remain below 30 ms to prevent artifacts in video playback. Packet loss tolerance is similarly critical, with rates under 1% maintaining acceptable quality; losses above this threshold cause visible freezing or blockiness, degrading MOS scores. These metrics, standardized in ITU-T recommendations, guide system design to prioritize low-latency, reliable transmission for interactive calls.

Optimization techniques enhance efficiency by dynamically adjusting to network conditions and mitigating common issues. Adaptive bitrate streaming, such as MPEG-DASH (Dynamic Adaptive Streaming over HTTP), monitors available bandwidth and switches between multiple encoded versions of the video (e.g., from HD to lower resolutions) to prevent buffering, ensuring consistent quality during fluctuations. For audio challenges, acoustic echo cancellation employs algorithms like the least mean squares (LMS) adaptive filter, which iteratively updates filter coefficients to subtract echoed signals from the microphone input, reducing feedback in real-time calls; the LMS method, based on minimizing mean squared error, has low computational overhead. These techniques, often integrated with codecs like H.265 for superior compression, allow videotelephony to operate effectively over variable connections without extensive hardware upgrades.

However, while bandwidth, compression, and software optimizations (including AI-based enhancements such as noise suppression and image enhancement) can substantially improve transmission and processing, they cannot fully overcome limitations imposed by poor input device quality, as foundational clarity depends on hardware capture. Poor cameras and microphones produce persistent issues like blurry or pixelated video and muffled or robotic audio that downstream optimizations cannot fully remedy, and external devices often outperform built-in ones.

In mobile environments, network generations present trade-offs for videotelephony performance. Fourth-generation (4G) LTE networks support HD calls adequately but struggle with higher resolutions due to latencies around 30-50 ms and bandwidth limits of 10-20 Mbps, leading to quality drops in congested scenarios. Fifth-generation (5G) networks address these by offering ultra-reliable low-latency communication (URLLC), enabling sub-10 ms end-to-end latency for advanced applications like holographic calls, where real-time 3D rendering requires precise synchronization. This shift facilitates immersive experiences, though 5G's benefits depend on edge computing to offload processing and maintain low latency across diverse mobile conditions.
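
To make the LMS echo canceller mentioned in this section more concrete, here is a minimal pure-Python sketch. It assumes an idealized setting that real calls do not offer: the echo path is a short fixed impulse response, there is no near-end speech or noise, and the step size `mu` and tap count are picked by hand. The function names and the `room` coefficients are hypothetical, chosen only for illustration.

```python
import random
from typing import List, Sequence, Tuple


def simulate_echo(far_end: Sequence[float], room: Sequence[float]) -> List[float]:
    """Convolve the far-end signal with a (hypothetical) room impulse response."""
    return [
        sum(room[k] * far_end[n - k] for k in range(len(room)) if n - k >= 0)
        for n in range(len(far_end))
    ]


def lms_echo_cancel(far_end: Sequence[float], mic: Sequence[float],
                    taps: int = 4, mu: float = 0.05) -> Tuple[List[float], List[float]]:
    """Run an LMS adaptive filter; return (echo-reduced signal, final weights)."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate  # what remains after subtracting the estimate
        # LMS update: step each weight along the error-weighted input.
        weights = [w + mu * error * h for w, h in zip(weights, history)]
        residual.append(error)
    return residual, weights


if __name__ == "__main__":
    rng = random.Random(0)
    far = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
    room = [0.6, 0.3, -0.2, 0.1]
    mic = simulate_echo(far, room)  # microphone hears only the echo here
    residual, weights = lms_echo_cancel(far, mic)
    print("learned weights:", [round(w, 3) for w in weights])
```

In this noise-free setup the learned weights approach the simulated echo path, so the residual shrinks toward zero. Production echo cancellers typically add refinements such as step-size normalization and double-talk detection, which this sketch leaves out.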

Conferencing Systems and Multipoint Control

Videotelephony conferencing systems enable multi-party communication by extending point-to-point connections to support three or more participants, typically through centralized or distributed architectures that manage media distribution and coordination. In point-to-point mode, two endpoints exchange streams directly, limiting scalability to small groups due to bandwidth constraints on each device. Multipoint setups address this by introducing intermediary servers, with two primary models: the Multipoint Control Unit (MCU) and the Selective Forwarding Unit (SFU).

An MCU operates in a centralized manner, receiving all incoming audio and video streams from participants, decoding them, mixing or compositing them into a single output stream, and re-encoding it for distribution back to all endpoints. This approach reduces client-side bandwidth usage since each participant receives one unified stream, but it imposes high computational demands on the server, making it suitable for scenarios with limited client resources or uniform layouts, such as continuous presence views. In contrast, an SFU relays streams selectively without decoding or mixing; it forwards individual incoming streams to relevant participants based on policies like active speaker detection, allowing clients to composite multiple streams locally for flexible layouts. SFUs offer better scalability for servers by offloading processing to endpoints and are commonly used in modern WebRTC-based systems for meetings with 5 to 100+ participants.

These architectures operate across distinct layers to ensure reliable multipoint operation. The transport layer primarily relies on UDP for low-latency delivery of real-time media, often incorporating IP multicast to distribute streams to multiple recipients without duplicating transmissions. The control layer handles session management, such as capability negotiation and mode selection, using protocols like H.245 in H.323 frameworks to exchange endpoint parameters and determine the conference master. Synchronization aligns audio, video, and data streams across participants via RTP timestamps and sequence numbers to prevent drift in multipoint scenarios.

Cloud-based conferencing systems often integrate storage and recording capabilities for session archiving, enabling post-meeting review while adhering to regulatory standards. For instance, AWS Chime supports recording of audio and screen shares for up to 12 hours per session, with outputs stored securely in Amazon S3 buckets and retention policies configurable for compliance, such as GDPR data processing requirements under AWS's Data Processing Addendum. A prominent example is Zoom, which has utilized a hybrid MCU-SFU architecture since 2011 to balance processing efficiency and flexibility, allowing scalability to meetings with over 1,000 participants through distributed server clusters and selective stream forwarding for smaller groups.
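
The bandwidth trade-off between the MCU and SFU models described in this section can be illustrated by counting media streams per client. The Python sketch below is a back-of-the-envelope model under stated assumptions: "mesh" is a label introduced here for extending direct point-to-point links to every pair of participants, the SFU figure is an upper bound that ignores selective policies such as active-speaker forwarding, and `streams_per_client` is a hypothetical helper.

```python
from typing import Dict


def streams_per_client(participants: int, topology: str) -> Dict[str, int]:
    """Count video streams each client uploads and downloads in a conference."""
    if participants < 2:
        raise ValueError("a call needs at least two participants")
    others = participants - 1
    if topology == "mesh":
        # Direct connections: send to and receive from every other participant.
        return {"upload": others, "download": others}
    if topology == "sfu":
        # One upload to the server; the SFU forwards the other participants'
        # streams unmixed (upper bound, before any selective forwarding policy).
        return {"upload": 1, "download": others}
    if topology == "mcu":
        # One upload; the MCU mixes everything into a single returned stream.
        return {"upload": 1, "download": 1}
    raise ValueError(f"unknown topology: {topology}")


if __name__ == "__main__":
    for topology in ("mesh", "sfu", "mcu"):
        print(topology, streams_per_client(10, topology))
```

For a ten-person meeting this gives nine streams in each direction for direct connections, one up and up to nine down through an SFU, and one in each direction through an MCU, which mirrors the client-bandwidth versus server-processing trade-off described above.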

Security and Privacy

Common Vulnerabilities and Threats

Videotelephony systems are susceptible to eavesdropping, particularly on unencrypted Real-time Transport Protocol (RTP) streams used for audio and video transmission, which can be intercepted and read by unauthorized parties because RTP itself provides no confidentiality. Man-in-the-middle (MITM) attacks targeting Session Initiation Protocol (SIP) signaling add to the risk by allowing attackers to intercept and potentially alter call setup information between endpoints, compromising the integrity of connections in Voice over IP (VoIP) environments that extend to video.
A prominent example of disruption is "Zoombombing," where uninvited participants hijack video conferences to broadcast offensive content; the FBI reported multiple incidents in early 2020 involving pornographic or hate images and threatening language during the surge in remote meetings. Device-level vulnerabilities, such as weak or default passwords in video-enabled hardware like Ring cameras, have enabled unauthorized access: in late 2019, attackers exploited reused credentials to break into user accounts and view live feeds, affecting thousands of devices across multiple states. Metadata leaks in video calls pose additional risks by inadvertently revealing participant locations through embedded audio cues or network details, as demonstrated in 2025 research showing how conferencing apps can expose geographic information via unintended acoustic signals. A notable historical breach occurred in January 2019 with Apple's Group FaceTime feature, where a bug allowed callers to access audio, and potentially video, from recipients before the call was accepted, prompting Apple to temporarily disable the function.
As of 2025, emerging threats include AI-generated deepfake injections into video feeds, which enable real-time manipulation during calls to impersonate participants, with studies highlighting their potential for socioeconomic harm on communication platforms.

Mitigation Strategies and Best Practices

To mitigate security vulnerabilities in videotelephony, such as the unauthorized intrusions exemplified by Zoombombing incidents, platforms implement encryption protocols that protect media streams during transmission. WebRTC-based systems, widely used in modern videotelephony, employ the Secure Real-time Transport Protocol (SRTP) to encrypt audio and video streams, combined with Datagram Transport Layer Security (DTLS) for key exchange and channel protection, so that keys are negotiated on the media path itself rather than through signaling intermediaries. DTLS-SRTP is the default mechanism, and the WebRTC specifications make media encryption mandatory. Additionally, encryption standards like AES-256 are applied to data both in transit and at rest in platforms such as Zoom, offering robust symmetric-key protection against interception.
Access controls form a critical layer of defense by restricting unauthorized participation in videotelephony sessions. Role-based access control (RBAC) enables administrators to define permissions for users, such as limiting meeting controls to organizers or presenters, and is typically integrated with a directory service for authentication. Zoom complements this with features like waiting rooms, where hosts manually approve entrants, and mandatory passcodes (typically 6-10 digits) to prevent uninvited access, alongside authentication requirements for participants. These mechanisms ensure that only verified users join, reducing risks from shared or guessed meeting identifiers.
Best practices further enhance videotelephony security through proactive maintenance and network safeguards. Organizations should enforce regular firmware and software updates for conferencing devices and applications to patch known vulnerabilities, with automatic updates recommended by cybersecurity agencies. Using virtual private networks (VPNs) on public Wi-Fi protects against man-in-the-middle attacks on the local network by encrypting traffic between the device and the VPN server, a standard recommendation for remote sessions.
Compliance with ISO 27001, an international standard for information security management systems, is achieved by platforms like Zoom through audited controls covering risk assessment, access management, and incident response, applicable to video conferencing products.
Videotelephony platforms must also adhere to privacy regulations to protect user data. In the European Union, the General Data Protection Regulation (GDPR) requires explicit consent for processing personal data in video calls, including video and audio recordings, with fines of up to 4% of global annual turnover for non-compliance. In the United States, the Video Privacy Protection Act (VPPA) prohibits disclosure of video viewing habits without consent, extending to videotelephony services, while the Federal Trade Commission (FTC) enforces safeguards under Section 5 of the FTC Act against unfair or deceptive practices in data security.
As of 2025, enterprise videotelephony increasingly adopts zero-trust models, which assume no implicit trust and continuously verify every access request. Zero-trust principles are applied through access policies and extend to media flows with per-session verification. Zoom has implemented a comprehensive zero-trust architecture, treating all users and devices as untrusted until authenticated, to strengthen platform-wide security. Biometric authentication, such as facial recognition or voice verification, is emerging in these systems for heightened identity assurance, with platforms like Neat and AONMeetings anticipating, as of mid-2025, its adoption as a standard feature to prevent impersonation in enterprise environments.
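To give a sense of what the 6-10 digit passcodes mentioned above provide, the sketch below computes the size of the guess space for a numeric passcode and generates one with Python's standard `secrets` module. The function names are hypothetical, and the calculation assumes passcodes are chosen uniformly at random; it says nothing about how any particular platform generates or rate-limits them.

```python
import math
import secrets


def keyspace(digits: int) -> int:
    """Number of distinct numeric passcodes of the given length."""
    return 10 ** digits


def entropy_bits(digits: int) -> float:
    """Entropy in bits of a uniformly random numeric passcode."""
    return digits * math.log2(10)


def expected_guesses(digits: int) -> float:
    """Average guesses to hit a uniform passcode when no guess is repeated."""
    return (keyspace(digits) + 1) / 2


def generate_passcode(digits: int) -> str:
    """Draw each digit from a cryptographically secure random source."""
    return "".join(secrets.choice("0123456789") for _ in range(digits))


if __name__ == "__main__":
    for d in (6, 10):
        print(d, keyspace(d), round(entropy_bits(d), 1), expected_guesses(d))
    print(generate_passcode(10))
```

Each extra digit multiplies the guess space by ten, which is why a passcode is usually paired with waiting rooms or attempt limits rather than relied on alone.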

Applications and Societal Impact

Business and Professional Use

Videotelephony has become integral to business and professional environments, particularly in enabling remote work since the COVID-19 pandemic shifted organizational models toward hybrid setups. After 2020, a significant portion of companies adopted hybrid work arrangements, with a 2023 McKinsey survey indicating that 58% of U.S. workers can work from home at least part-time and 35% full-time, fundamentally altering workplace dynamics. This transition has been supported by videotelephony platforms that facilitate real-time interaction, reducing the need for physical presence while maintaining cohesion in distributed teams.
Studies on productivity highlight substantial time savings through videotelephony tools. For instance, the Forrester Total Economic Impact study on Webex found that users saved an average of 8 minutes per meeting due to seamless integration and faster meeting startup, translating to millions of dollars in annual productivity gains for large organizations.
Integration with productivity tools enhances videotelephony's utility in professional workflows. Platforms like Slack incorporate video calling with features such as calendar syncing, which automatically updates user status based on calendar events, and screen sharing for collaborative editing during calls. These integrations streamline scheduling and content sharing, minimizing context-switching and boosting efficiency in fast-paced corporate settings.
The economic impact of videotelephony in business is evident in market expansion and cost efficiencies. The global video conferencing market is projected to reach USD 37.29 billion, driven by widespread adoption across sectors seeking remote collaboration solutions. Small and medium-sized enterprises (SMEs) have fueled this growth by leveraging free or low-cost tiers of platforms like Zoom and Webex, which offer core features such as unlimited one-on-one calls and basic group meetings, enabling affordable entry into digital communication without significant upfront investment.
During the COVID-19 pandemic, companies accelerated videotelephony adoption, leading to marked reductions in travel expenses. For example, U.S. business travel spending dropped by about 60% in 2020 as firms pivoted to virtual meetings, with many reporting sustained savings through tools like Webex that replaced in-person summits and training sessions. In one composite case from a study representing large enterprises, Webex deployment yielded USD 3.54 million in travel cost avoidance over three years by virtualizing events and inter-office interactions. Overall, U.S. employers saved an estimated USD 11,000 per half-time remote worker annually, partly attributable to eliminated travel.

Education and Remote Learning

Videotelephony has transformed education by enabling virtual classrooms that replicate traditional learning environments through real-time video interactions. During the 2020-2021 school year, approximately 79% of U.S. teachers reported using remote or hybrid models that relied heavily on video conferencing platforms, integrated with learning management systems, to facilitate synchronous instruction. These tools support features like breakout rooms, which let educators divide students into smaller virtual groups for collaborative discussions and thereby enhance engagement in large classes. A widely adopted virtual learning platform, used weekly by over 80% of K-12 teachers, streamlines assignment distribution, feedback, and live sessions, making it a cornerstone of remote education.
Interactive elements in videotelephony platforms further enrich pedagogical approaches by incorporating tools for real-time participation and immersive experiences. For instance, platforms like Engage VR enable polling for instant feedback, shared whiteboards for collaborative problem-solving, and virtual field trips to historical sites or scientific simulations, with access to over 150 pre-built virtual locations as of 2023. These features promote active learning, where students can manipulate 3D models or conduct virtual experiments in a shared video space, fostering deeper conceptual understanding without physical resources. Such integrations, supporting up to 70 simultaneous users on interactive boards, allow for scalable group activities that mimic in-person dynamics.
While videotelephony expands access to education, particularly for rural students who gain exposure to specialized curricula and expert instructors via video links, it also highlights equity challenges related to device and internet availability. In rural areas, where geographic isolation limits in-person options, video platforms bridge gaps by delivering flexible, location-independent classes, improving attendance and resource sharing for underserved communities.
However, disparities persist: 19% of public schools in 2019-2020 reported not having a computer available for every student, exacerbating the digital divide and hindering participation for low-income or rural learners without reliable internet access. Addressing these issues requires institutional support, such as loaned hotspots, to ensure inclusive implementation.
Post-pandemic, hybrid models combining videotelephony with in-person elements have shown measurable benefits in outcomes, including retention rates of 25-60% for eLearning and hybrid approaches compared to 5-10% for traditional lectures. This improvement stems from the flexibility of video tools, which accommodate diverse learning paces and reduce dropout risks in blended environments. Overall, these applications underscore videotelephony's role in helping education adapt while necessitating ongoing efforts to mitigate inequities.

Healthcare and Telemedicine

Videotelephony has become integral to telemedicine, enabling real-time visual and auditory interactions between healthcare providers and patients for diagnostics, consultations, and monitoring. Platforms like Doxy.me, launched in 2014, offer HIPAA-compliant video conferencing tailored for medical use, supporting secure, browser-based sessions without downloads. Similarly, Teladoc, established in 2002, provides HIPAA-compliant videotelephony services that facilitate virtual visits and specialist interactions, protecting the transmission of protected health information (PHI) through encryption and business associate agreements (BAAs). These platforms emphasize ease of access while maintaining compliance during video-enabled patient encounters.
Key use cases include remote consultations, which allow patients to receive care without traveling, thereby reducing emergency room (ER) visits. A 2022 Cigna study found that virtual care via videotelephony led to 19% fewer ER and urgent care visits compared to traditional in-person care, highlighting its role in managing non-emergent conditions efficiently. Additionally, videotelephony supports specialist referrals by enabling secure video links for consultations, such as connecting primary care providers with cardiologists or dermatologists for visual assessments of symptoms, improving access to expertise without physical transfers.
Regulatory advancements have further integrated videotelephony with remote monitoring devices. In April 2024, the U.S. Food and Drug Administration (FDA) cleared Eko Health's AI-enabled digital stethoscope, which detects low ejection fraction, an indicator of heart failure, in 15 seconds during routine exams and integrates with telemedicine platforms for live-streaming heart sounds over video. This device enhances videotelephony by allowing remote cardiac evaluations, with AI analysis supporting clinical decisions in virtual settings.
Globally, videotelephony has expanded telemedicine in underserved regions.
India's eSanjeevani national telemedicine service, operational since 2019, scaled to over 108,000 access points by the end of 2023 and had delivered more than 160 million consultations by September 2023, with totals exceeding 372 million as of mid-2025, through video-enabled provider-to-patient and provider-to-provider models that particularly benefit rural populations. These implementations underscore videotelephony's role in bridging healthcare gaps, with data protection during sessions governed by standards such as HIPAA in the United States.

Government, Accessibility, and Cultural Roles

Videotelephony has transformed government operations, particularly in judicial proceedings and international diplomacy. Virtual courtrooms emerged prominently during the COVID-19 pandemic, enabling remote hearings to maintain access to justice while minimizing health risks. For instance, U.S. courts at federal and state levels adopted platforms like Zoom for trials, arraignments, and sentencing, allowing participants to appear via video from secure locations. In diplomacy, the United Nations shifted to virtual formats for its 75th General Assembly session in 2020, where leaders delivered pre-recorded speeches and engaged in live video side meetings, reducing travel and fostering broader participation amid global restrictions.
Accessibility for deaf and hard-of-hearing individuals has been significantly enhanced by videotelephony through specialized services like the Video Relay Service (VRS) in the United States. Authorized by the Federal Communications Commission (FCC) in 2000, VRS enables American Sign Language (ASL) users to make phone calls by connecting via video to a communications assistant who interprets between ASL and spoken English in real time, bridging communication gaps at no cost to the user. By 2002, VRS was available nationwide, supporting everyday interactions such as medical appointments and business calls. In the 21st century, AI-driven tools like SignAll aim to automate sign language translation during video calls, improving speed and availability for non-relay scenarios.
Culturally, videotelephony facilitates media events and entertainment by enabling remote, interactive engagement. Virtual press conferences, popularized during the COVID-19 pandemic, allow journalists and officials to participate via video platforms, streamlining global coverage without physical gatherings. In entertainment, platforms like Twitch support virtual concerts where artists perform live via video streams, interacting with audiences through real-time chat and donations, thus expanding access to performances beyond traditional venues.

| Tool Type | Examples | Latency Characteristics | Accuracy for Sign Language Interpretation |
| --- | --- | --- | --- |
| VRS Providers | Sorenson VRS, ZVRS | Optimized for real-time relay with minimal delay to support natural dialogue | High via certified human interpreters, ensuring precise ASL-to-English conveyance |
| General Apps | Zoom | Typically 100-300 ms, variable based on network; suitable for calls but may lag in poor conditions | Moderate; relies on auto-captions (ASR ~80-95% for speech) or VRS integration, lacking native sign recognition |

Terminology and Categorization

Descriptive Names and Evolution of Terms

The term "video telephone" emerged in the early 20th century, with conceptual depictions appearing as early as 1910 in illustrations imagining future communication devices that combined visual and audio transmission, though formal usage of the phrase came later, amid initial experiments in television-based telephony. By the 1930s, early public demonstrations, such as AT&T's 1930 two-way video system, reinforced the term's association with point-to-point visual calls. The related term "videophone" gained traction after 1950, reflecting advancements in dedicated hardware for individual use.
As videotelephony expanded into group settings during the mid-20th century, nomenclature shifted toward "videoconferencing" in the 1960s and 1970s, coinciding with commercial deployments like AT&T's Picturephone service, first demonstrated in 1964 at the New York World's Fair and commercially launched in 1970, which emphasized business meetings over personal calls. This evolution marked a distinction from one-on-one "video telephone" interactions, with "videoconferencing" appearing in technical literature to describe multi-party video links. In the digital era, particularly after the 1990s with internet protocols, "video calling" became the predominant descriptor for consumer-oriented, app-based communication, simplifying the language for everyday mobile and web use. Terms like "video chat" and "video call" emerged for informal, internet-based interactions, as seen in early software like MSN Messenger.
Regional linguistic variations reflect local technological histories. In Germany, "Bildtelefon" (literally "picture telephone") was coined for the world's first videotelephony service, launched by the German postal authority in 1936 using mechanical scanners for Berlin-to-Leipzig calls. Similarly, in France, "visiophone" was coined in the 1970s from "visio-" (vision) combined with "phone," entering usage alongside systems like Matra's 1970 videophone and persisting today for both intercoms and remote video devices.
These terms underscore how early national infrastructures shaped descriptive nomenclature.
After 2000, as high-definition and immersive systems proliferated, the terminology expanded to include "telepresence" for advanced setups aiming to simulate physical colocation. The term, originally proposed by Marvin Minsky in 1980 for remote manipulation, was repurposed in videotelephony by companies like Cisco, which introduced commercial telepresence suites in 2006 featuring life-size displays and spatial audio.
Branding has further influenced term usage, as seen in Apple's FaceTime, launched in June 2010 as a proprietary video calling feature integrated into Apple devices, emphasizing seamless personal connectivity and retaining its branded identity distinct from generic descriptors. In contrast, Zoom Video Communications, founded in 2011, saw its name become a generic stand-in for any video conference by 2020, with phrases like "let's Zoom" mirroring the historical genericide of terms like "Kleenex" for tissues, driven by pandemic-era ubiquity.

Categories by Cost, Quality, and Service Models

Videotelephony systems are broadly classified by cost into free consumer tiers and paid enterprise subscriptions. Free consumer options, such as Zoom's Basic plan, enable basic peer-to-peer or small-group video communication without subscription fees, though they often impose limits such as 40-minute meeting durations or a cap of 100 participants. These are designed for personal or casual use, relying on end-user devices like smartphones without additional infrastructure costs. In contrast, enterprise subscriptions range from approximately $13 to $25 per user per month (annual billing, as of November 2025), providing features that include unlimited call times and integrations with business tools. For instance, Zoom's Pro plan starts at $13.33 per user per month (annual), the Business plan is $18.32, and Enterprise options feature custom pricing for large-scale deployments. This tier supports professional environments, where costs scale with user count and feature depth, and is often bundled with audio conferencing and analytics.
Quality levels in videotelephony span low-end mobile setups to high-end telepresence configurations, differentiated primarily by bandwidth and resolution. Low-end systems, common in consumer mobile applications, operate at under 1 Mbps for standard definition (SD) video at up to 30 frames per second, delivering acceptable clarity for one-on-one calls on limited networks like cellular data. High-end setups, such as 4K telepresence rooms, require 25 Mbps or more per endpoint to achieve immersive, lifelike experiences with ultra-high-definition video and multi-screen layouts, enabling detailed visuals for executive meetings or collaborative design reviews. These tiers allow systems to adapt across network conditions, with codecs like H.265 optimizing compression to balance quality and bandwidth.
Service models for videotelephony divide into on-premise hardware deployments and cloud-based software-as-a-service (SaaS) offerings.
On-premise systems involve dedicated hardware installations, such as purpose-built suites in conference rooms, granting organizations full control over data and customization but demanding significant upfront capital for servers and maintenance. Cloud SaaS models, exemplified by platforms like Zoom hosted on AWS, eliminate hardware needs through subscription access via web browsers, facilitating rapid deployment and automatic updates. This model prioritizes flexibility, though it relies on stable internet connectivity.

| Category | Pros | Cons | Scalability (Small vs. Large Groups) |
| --- | --- | --- | --- |
| Free Consumer (e.g., Zoom Basic) | No cost; easy access on personal devices; sufficient for casual use. | Feature limitations (e.g., time caps); basic security; poor for professional needs. | Excellent for small (1-10 participants); limited for large due to caps. |
| Enterprise Subscription (e.g., Zoom Pro/Business) | Advanced features (e.g., integrations); reliable support. | Recurring fees (approx. $13-25/user/month, annual billing, as of November 2025); potential overkill for individuals. | Strong for small to medium (up to 300); Enterprise scales to 1000+ with add-ons. |
| Low-End Quality (sub-1 Mbps, SD) | Low bandwidth use; mobile-friendly; cost-effective on weak networks. | Reduced clarity; unsuitable for detailed visuals or groups. | Ideal for small mobile groups; struggles with large due to compression artifacts. |
| High-End Quality (25+ Mbps, 4K telepresence) | Immersive realism; high fidelity for collaboration. | High bandwidth demands; expensive hardware. | Limited for small (overkill); excels in large boardroom settings with multi-endpoint support. |
| On-Premise Hardware (e.g., dedicated rooms) | Complete control; customizable; no internet dependency for core operations. | High upfront costs; ongoing maintenance; IT expertise required. | Fixed for small rooms; challenging for large/distributed groups without expansion. |
| Cloud SaaS (e.g., AWS-hosted Zoom) | Scalable pay-as-you-go; easy global access; automatic scaling. | Internet reliance; potential concerns with third-party hosting. | Seamless for small to large (auto-adjusts participants); handles thousands via cloud resources. |
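
As a rough worked example of the cost and bandwidth figures in this section, the sketch below totals annual subscription cost from the per-user monthly prices quoted above and the aggregate bandwidth of a multi-endpoint high-end room. The function names are hypothetical, and taxes, discounts, and add-ons are ignored.

```python
from decimal import Decimal

# Per-user monthly prices (annual billing) as quoted in the text above.
MONTHLY_PRICE_USD = {
    "pro": Decimal("13.33"),
    "business": Decimal("18.32"),
}


def annual_cost_usd(plan: str, users: int) -> Decimal:
    """Annual subscription cost for a plan, before taxes or discounts."""
    return MONTHLY_PRICE_USD[plan] * 12 * users


def room_bandwidth_mbps(endpoints: int, per_endpoint_mbps: int = 25) -> int:
    """Aggregate bandwidth, using the 25 Mbps per-endpoint 4K figure as default."""
    return endpoints * per_endpoint_mbps


if __name__ == "__main__":
    print(annual_cost_usd("pro", 25))       # 25 users on the Pro plan
    print(annual_cost_usd("business", 10))  # 10 users on the Business plan
    print(room_bandwidth_mbps(3))           # three-endpoint 4K room
```

`Decimal` is used instead of floating point so that the currency totals are exact.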
