from Wikipedia
RTP-MIDI
International standard: IETF RFC 6295
Developed by: UC Berkeley
Website: www.midi.org/midi-articles/rtp-midi-or-midi-over-networks

RTP-MIDI (also known as AppleMIDI) is a protocol for transporting MIDI messages within Real-time Transport Protocol (RTP) packets over Ethernet and WiFi networks. It is completely open and royalty-free (no license is needed), and is suitable for both LAN and WAN applications. Compared to MIDI 1.0, RTP-MIDI adds features such as session management, device synchronization and detection of lost packets, with automatic regeneration of lost data. RTP-MIDI is suitable for real-time applications, and supports sample-accurate synchronization for each MIDI message.

History of RTP-MIDI

In 2004, John Lazzaro and John Wawrzynek of UC Berkeley presented a paper to the AES titled "An RTP payload for MIDI".[1] In 2006, the document was submitted to the IETF and became RFC 4695.[2] In parallel, Lazzaro and Wawrzynek released another document detailing the practical implementation of the RTP-MIDI protocol, especially the journalling mechanism.[3]

RFC 4695 was obsoleted by RFC 6295 in 2011. The protocol did not change between the two versions of the RFC; the later document corrects errors found in RFC 4695.[4]

The MIDI Manufacturers Association (MMA) has created a page on its website providing basic information about the RTP-MIDI protocol.[5]

AppleMIDI

Apple Computer introduced RTP-MIDI as part of Mac OS X v10.4 in 2005. The RTP-MIDI driver is accessed via the Network icon in the Audio MIDI Setup utility. Apple's implementation strictly follows RFC 4695 for the RTP payload and journalling system, but uses a dedicated session management protocol rather than the session management proposal of RFC 4695. This protocol is displayed in Wireshark as "AppleMIDI" and was later documented by Apple.

Apple also created a dedicated class in its mDNS/Bonjour implementation. Devices complying with this class automatically appear in Apple's RTP-MIDI configuration panel, in the Participants directory, making the Apple MIDI system fully plug-and-play. However, it is still possible to enter IP addresses and ports manually in this directory, in order to connect to devices that do not support Bonjour.

Apple also introduced RTP-MIDI support in iOS 4, but iOS devices cannot act as session initiators.

The RTP-MIDI driver from Apple creates virtual MIDI ports named "Sessions", which are available to any CoreMIDI software, such as sequencers or software instruments. There they appear as a pair of MIDI IN / MIDI OUT ports, like any other MIDI 1.0 or USB MIDI port.

Implementations

Embedded devices

In 2006, the Dutch company Kiss-Box presented the first embedded implementation of RTP-MIDI, in products such as MIDI and LTC interfaces.[6] These devices comply with the AppleMIDI implementation, using the same session management protocol, in order to be compatible with the other devices and operating systems using this protocol.

A proprietary driver was initially developed by the company for Windows XP, but it was restricted to communication with their devices; it was not possible to connect a PC to a Mac using this driver. Support for this driver was dropped in 2012 in favor of the standard approach, when the rtpMIDI driver for Windows became available.

In 2012, Kiss-Box released a new generation of CPU boards, named "V3", which support session initiator functionality. These models can establish sessions with other RTP-MIDI devices without requiring a computer as a control point.

During NAMM 2013, the Canadian company iConnectivity presented a new interface named iConnectivityMIDI4+, which supports RTP-MIDI and allows direct bridging between USB and RTP-MIDI devices. They have since followed up with several other RTP-MIDI capable interfaces, including the mio4, the mio10, and the PlayAUDIO 12.

Windows

In 2010, Tobias Erichsen released a Windows implementation of Apple's RTP-MIDI driver.[7] This driver works on XP, Vista, Windows 7, Windows 8, and Windows 10, in 32- and 64-bit versions.[8] It uses a configuration panel very similar to Apple's, and is fully compliant with Apple's implementation. It can therefore be used to connect a Windows machine to a Macintosh, as well as to embedded systems. As with Apple's driver, the Windows driver creates virtual MIDI ports, which are visible to any MIDI application running on the PC; access is done through the mmsystem layer, like all other MIDI ports.

Linux

RTP-MIDI support for Linux was reactivated in February 2013 after an idle period. Availability of drivers has been announced on some forums, based on the original work of Nicolas Falquet and Dominique Fober.[9][10]

A specific (but incomplete) implementation for the Raspberry Pi, called raveloxmidi, is also available.[11] See rtpmidid below for a full implementation.

A full implementation of RTP-MIDI (including the journalling system) is available within the Ubuntu distribution, in the Scenic software package.[12]

There is a newer implementation, rtpmidid,[13] that integrates seamlessly with the ALSA sequencer, allowing tools like QjackCtl to control the connections. This implementation is also available for ARM64, which means it works on Raspberry Pi computers.

iOS

Apple added full CoreMIDI support to its iOS devices in 2010, allowing the development of MIDI applications for the iPhone, iPad and iPod touch. MIDI became available from the docking port in the form of a USB controller, allowing connection of USB MIDI devices using the Apple Camera Connection Kit. It was also available in the form of an RTP-MIDI session listener over WiFi.

iOS devices do not support session initiation, so an external session initiator on the network is required to open an RTP-MIDI session with the iPad. This session initiator can be a Mac, a Windows computer with the RTP-MIDI driver activated, or an embedded RTP-MIDI device. The RTP-MIDI session appears under the name "Network MIDI" to all CoreMIDI applications on iOS, and no specific development is required to add RTP-MIDI support to an iOS application. The MIDI port is virtualized by CoreMIDI, so the programmer simply opens a MIDI connection, regardless of whether the port is connected via USB or RTP-MIDI.

Some complaints arose about the use of MIDI over USB with iOS devices,[14] since the iPad/iPhone must supply power to the external device. Some USB MIDI adapters draw more current than the iPad allows; the iPad then limits the current and blocks the startup of the device, which consequently does not appear as available to the application. This problem is avoided by using RTP-MIDI.

JavaScript

Since June 2013, a JavaScript implementation of RTP-MIDI, created by J.Dachtera, has been available as an open-source project.[15] The source code is based on Apple's session management protocol, and can act as a session initiator or session listener.

Java

Cross-platform Java implementations of RTP-MIDI exist, notably the 'nmj' library.[16]

WinRT

The WinRTP-MIDI project[17] is an open-source implementation of the RTP-MIDI protocol stack for Windows RT. The code was initially designed to be portable between the various versions of Windows, but the latest version has been optimized for WinRT in order to simplify the design of applications for the Windows Store.

Arduino

RTP-MIDI became available for the Arduino platform in November 2013, under the name "AppleMIDI library".[18] The software module can run either on Arduino boards with an integrated Ethernet adapter, like the Intel Galileo, or on boards fitted with the "Ethernet shield".

KissBox produces an RTP-MIDI OEM module, an external communication processor board, which connects over an SPI bus link.

MIDIbox

In December 2013, two members of the MIDIbox DIY group started work on a version of MIOS (MIDIbox Operating System) including RTP-MIDI support over a fast SPI link. To simplify integration, it was decided to use an external network processor board handling the whole protocol stack. A first beta version was released in the second week of January 2014,[19] and the first official release followed during the first week of March 2014.

The protocol used on the SPI link between the MIOS processor and the network processor is based on the same format as USB, using 32-bit words containing a complete MIDI message, and has been proposed as an open standard for communication between network processor modules and MIDI application boards.
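As an illustration of the 32-bit word format described above, here is a minimal sketch in Python. The byte layout (cable number and Code Index Number in the first byte, then up to three MIDI bytes, zero-padded) is an assumption based on the USB MIDI class encoding the text refers to; the function name is hypothetical.

```python
def usb_midi_word(cable: int, midi_bytes: bytes) -> int:
    """Pack a complete MIDI message into one 32-bit word, following the
    USB MIDI class style encoding (assumed layout: cable number and
    Code Index Number in the first byte, then up to three MIDI bytes,
    zero-padded)."""
    # For channel voice messages, the Code Index Number equals the
    # high nibble of the status byte (e.g. 0x9 for Note On).
    cin = midi_bytes[0] >> 4
    b = bytes([(cable << 4) | cin]) + midi_bytes.ljust(3, b"\x00")
    return int.from_bytes(b, "big")

# Note On, channel 1, middle C, velocity 100, on cable 0:
word = usb_midi_word(0, bytes([0x90, 0x3C, 0x64]))
print(hex(word))  # -> 0x9903c64
```

The point of this encoding is that every transfer has a fixed 4-byte size, which simplifies SPI exchanges between the network processor and the application board.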

Axoloti

The Axoloti is an open-source hardware synthesizer based on an STM32F427 ARM processor. It is fully programmable using a virtual patching concept, similar to Max/MSP, and includes full MIDI support. A node.js extension has been developed to allow RTP-MIDI connection of an Axoloti to any RTP-MIDI device.[20] The Axoloti hardware can also be equipped with an RTP-MIDI external coprocessor, connected via the SPI bus available on the expansion port of the Axoloti core; the approach is the same as the one described for Arduino and MIDIbox.

MIDIKit Cross-platform library

MIDIKit is an open-source, cross-platform library which provides a unified MIDI API on top of the various MIDI APIs available on the market (Core MIDI, Windows MME, Linux ALSA, etc.). MIDIKit supports the RTP-MIDI protocol, including the journalling system. RTP-MIDI ports are seen within MIDIKit as complementary ports (they do not rely on the rtpMIDI driver), added to the native system MIDI ports.[21]

Driverless use

Since RTP-MIDI is based on UDP/IP, any application can implement the protocol directly, without needing any driver. Drivers are needed only when users want the networked MIDI ports to appear as standard MIDI ports. Some Max/MSP objects and VST plugins, for example, have been developed following this approach.

RTP-MIDI over AVB

AVB is a set of technical standards which define specifications for very-low-latency streaming services over Ethernet networks. AVB networks can provide latencies down to one audio sample across a complete network.

RTP-MIDI is natively compatible with AVB networks, like any other IP protocol, since AVB switches (also known as "IEEE 802.1 switches") automatically manage the priority between real-time audio/video streams and IP traffic. The RTP-MIDI protocol can also use the real-time capabilities of AVB if the device implements the RTCP payload described in the IEEE 1733 document.[22] RTP-MIDI applications can then correlate the "presentation" timestamp, provided by the IEEE 802.1 master clock, with the RTP timestamp, ensuring sample-accurate time distribution of MIDI events.

Protocol

RFC 4695/RFC 6295 split the RTP-MIDI specification into different parts. The only mandatory one, which defines compliance with the RTP-MIDI specification, is the payload format. The journalling part is optional in principle, but RTP-MIDI packets must still indicate that they carry an empty journal, so the journal section is always present in an RTP-MIDI packet, even when empty. The session initiation/management part is purely informational; it was not used by Apple, which created its own session management protocol.

Header format

RTP-MIDI Header Format

RTP section (32-bit words):
  Word 0: V (2 bits) | P | X | CC (4 bits) | M | Payload type (PT, 7 bits) | Sequence number (16 bits)
  Word 1: Timestamp
  Word 2: Synchronization source (SSRC) identifier
  Word 3: Contributing source (CSRC) identifiers (optional)

MIDI command section:
  B | J | Z | P | LEN… | MIDI messages list…

Journal section (optional, present when the J flag is set):
  S | Y | A | H | TOTCHAN | Checkpoint packet seqnum | System journal (optional)… | Channel journals…
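As an illustration of this layout, a minimal packet carrying a single three-byte MIDI command and no journal (J=0) can be sketched as follows. This is a sketch, not a complete implementation: the payload type value 97 is an assumption (RTP payload types for MIDI are dynamic and negotiated per session), and the function name is hypothetical.

```python
import struct

def rtp_midi_packet(seq, timestamp, ssrc, midi_bytes, payload_type=97):
    """Build a minimal RTP-MIDI packet: a standard 12-byte RTP header
    followed by a MIDI command section with flags B=J=Z=P=0."""
    assert len(midi_bytes) <= 15           # with B=0, LEN is a 4-bit field
    rtp = struct.pack(">BBHII",
                      0x80,                # V=2, P=0, X=0, CC=0
                      payload_type,        # M=0 | PT
                      seq, timestamp, ssrc)
    # Command section header: B/J/Z/P flags all zero, LEN in low 4 bits,
    # followed by the MIDI messages list.
    cmd = bytes([len(midi_bytes)]) + midi_bytes
    return rtp + cmd

# Note On, channel 1, middle C, velocity 100:
pkt = rtp_midi_packet(1, 0, 0x12345678, bytes([0x90, 0x3C, 0x64]))
print(pkt.hex())
```

Since such a packet is plain UDP payload, it can be sent with an ordinary datagram socket, which is what the "Driverless use" section above refers to.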

Sessions

RTP-MIDI sessions are in charge of creating a virtual path between two RTP-MIDI devices; a session appears as a MIDI IN / MIDI OUT pair from the application's point of view. RFC 6295 proposes using SIP (Session Initiation Protocol) and SDP (Session Description Protocol), but Apple decided to create its own session management protocol, which links the sessions to the names used on Bonjour and also offers a clock synchronization service.

(Diagram: RTP-MIDI sessions merge and duplicate MIDI streams automatically between controllers sharing the same session.)

A given session is always created between two, and only two, participants; each session is used to detect potential message loss between the two participants. However, a given session controller can open multiple sessions in parallel, which enables capabilities such as splitting, merging, or a distributed patchbay. In the diagram given here, device 1 has two sessions open at the same time, one with device 2 and another with device 3, but the two sessions in device 1 appear as a single virtual MIDI interface to the final user.

Sessions vs. endpoints

A common mistake is to confuse RTP-MIDI endpoints with RTP-MIDI sessions, since both present themselves as a pair of MIDI IN / MIDI OUT ports.

An endpoint is used to exchange MIDI data between the element (software and/or hardware) in charge of decoding the RTP-MIDI transport protocol and the element using the MIDI messages. In other words, only MIDI data are visible at the endpoint level. For devices with MIDI 1.0 DIN connectors, there is one endpoint per connector pair: for example, 2 endpoints for the KissBox MIDI2TR, 4 endpoints for the iConnectivityMIDI4+, etc. Devices using other communication links, like SPI or USB, can offer more endpoints; for example, a device using the 32-bit encoding of the USB MIDI class can represent up to 16 endpoints using the cable identifier field. On the RTP-MIDI side, an endpoint is represented by a pair of UDP ports when the AppleMIDI session protocol is used.

A session defines the connection between two endpoints: the MIDI IN of one endpoint is connected to the MIDI OUT of the remote endpoint, and vice versa. A single endpoint can accept multiple sessions, depending on the software configuration. Each session on a given endpoint appears as a single one to the remote session handler; a remote session handler does not know whether the endpoint it is connected to is being used by other sessions at the same time. If multiple sessions are active on a given endpoint, the different MIDI streams reaching the endpoint are merged before the MIDI data are sent to the application. In the other direction, MIDI data produced by an application are sent to all session handlers connected to the endpoint.

AppleMIDI session participants

The AppleMIDI implementation defines two kinds of session controllers: session initiators and session listeners. Session initiators are in charge of inviting session listeners, and are responsible for the clock synchronization sequence. Session initiators can generally also act as session listeners, but some devices, such as iOS devices, can only be session listeners.

MIDI merging

RTP-MIDI devices can merge different MIDI streams without needing any specific component, in contrast to MIDI 1.0 devices, which require "MIDI merger" boxes. As can be seen in the diagram, when a session controller is connected to two or more remote sessions, it automatically merges the MIDI streams coming from the remote devices, without requiring any specific configuration.

MIDI splitting ("MIDI THRU")

RTP-MIDI devices can duplicate MIDI streams from one session to any number of remote sessions without requiring any "MIDI THRU" box. When an RTP-MIDI session is connected to two or more remote sessions, all the remote sessions receive a copy of the MIDI data sent by the source.

Distributed patchbay concept

RTP-MIDI sessions can also provide a "patchbay" feature, which is possible under MIDI 1.0 only with a separate hardware device. A MIDI 1.0 patchbay is a hardware device which allows dynamic connections between a set of MIDI inputs and a set of MIDI outputs, most often in the form of a matrix. The concept of a "dynamic" connection contrasts with the classical use of MIDI 1.0 lines, where cables were connected "statically" between two devices. Rather than establishing the data path between devices in the form of a cable, the patchbay becomes a central point where all MIDI devices are connected. The software in the MIDI patchbay is configured to define which MIDI input goes to which MIDI output, and the user can change this configuration at any moment, without having to disconnect the MIDI DIN cables.

Hardware "patchbay" modules are no longer needed with RTP-MIDI, thanks to the session concept. Sessions are, by definition, virtual paths established over the network between two MIDI ports. No specific software is needed to perform the patchbay functions, since the configuration process precisely defines the destinations of each MIDI stream produced by a given MIDI device. These virtual paths can then be changed at any time simply by changing the destination IP addresses used by each session initiator. The "patch" configuration formed this way can be stored in non-volatile memory, so that the patches are restored automatically at power-up, but it can also be changed directly in RAM, for example with the RTP-MIDI Manager software tool or with the RTP-MIDI drivers' control panels.

Apple's session protocol

RFC 6295 proposes using the SDP (Session Description Protocol) and SIP (Session Initiation Protocol) protocols to establish and manage sessions between RTP-MIDI partners. These two protocols are, however, quite heavy to implement, especially on small systems, and they do not constrain any of the parameters enumerated in the session descriptor, such as the sampling frequency, which in turn defines all the fields related to timing data in both the RTP header and the RTP-MIDI payload. Moreover, RFC 6295 only suggests using these protocols and allows any other protocol to be used, leading to potential incompatibilities between vendors.

Apple decided to create its own protocol, which fixes all parameters related to synchronization, such as the sampling frequency. This session protocol is called "AppleMIDI" in the Wireshark software. Session management with the AppleMIDI protocol requires two UDP ports: the first is called the "control port", the second the "data port". In a multithreaded implementation, only the data port requires a "real-time" thread; the other port can be handled by a normal-priority thread. The two ports must be located at consecutive numbers (n / n+1); the first one can be any of the 65,536 possible ports.

There is no constraint on the number of sessions that can be opened simultaneously on a given set of UDP ports with the AppleMIDI protocol. It is possible either to create one port group per session manager, or to use a single group for multiple sessions, which limits the memory footprint in the system. In the latter case, the IP stack provides resources to identify partners from their IP addresses and port numbers. This functionality, called "socket reuse", is available in most modern IP implementations.

All AppleMIDI protocol messages use a common structure of four 32-bit words, beginning with a header containing two bytes with value 255, followed by two bytes describing the meaning of the message:

Description                  Wireshark header definition                Field value (hex)  Field value (chars)
Invitation                   APPLEMIDI_COMMAND_INVITATION               0x494e             IN
Invitation accepted          APPLEMIDI_COMMAND_INVITATION_ACCEPTED      0x4f4b             OK
Invitation refused           APPLEMIDI_COMMAND_INVITATION_REJECTED      0x4e4f             NO
Closing session              APPLEMIDI_COMMAND_ENDSESSION               0x4259             BY
Clock synchronization        APPLEMIDI_COMMAND_SYNCHRONIZATION          0x434b             CK
Journalling synchronization  APPLEMIDI_COMMAND_RECEIVER_FEEDBACK        0x5253             RS
Bitrate                      APPLEMIDI_COMMAND_BITRATE_RECEIVE_LIMIT    0x524c             RL

These messages control a state machine related to each session. For example, this state machine forbids any MIDI data exchange until a session reaches the "opened" state.
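As a sketch, an AppleMIDI invitation message can be assembled as below. The field layout after the common header (32-bit protocol version of 2, initiator token, sender SSRC, NUL-terminated session name) is an assumption based on what Wireshark displays for this protocol; the function name is hypothetical.

```python
import struct

def applemidi_invitation(token: int, ssrc: int, name: str) -> bytes:
    """Build an AppleMIDI 'IN' (invitation) packet: the 0xFFFF
    signature, the two-character command, then (assumed layout)
    a 32-bit protocol version, the initiator token, the sender
    SSRC, and a NUL-terminated UTF-8 session name."""
    return (b"\xff\xff" + b"IN" +
            struct.pack(">III", 2, token, ssrc) +
            name.encode("utf-8") + b"\x00")

pkt = applemidi_invitation(0xCAFE0001, 0x12345678, "MySession")
print(pkt.hex())
```

The same framing, with the command bytes changed per the table above, is reused for the OK/NO/BY answers, which keeps the state machine parser very simple.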

Invitation sequence

Opening a session starts with an invitation sequence. The first session partner (the "session initiator") sends an IN message to the control port of the second partner, which answers with an OK message if it agrees to open the session, or with a NO message if it does not accept the invitation. If the invitation is accepted on the control port, the same sequence is repeated on the data port. Once invitations have been accepted on both ports, the state machine moves to the synchronization phase.

Synchronization sequence

The synchronization sequence allows both session participants to share information about their local clocks. This phase makes it possible to compensate for the latency induced by the network, and also to support "future timestamping" (see the "Latency" section below).

The session initiator sends a first message (named CK0) to the remote partner, giving its local time as a 64-bit value (note that this is not an absolute time, but a time relative to a local reference, generally expressed in microseconds since the startup of the operating system kernel). This time is expressed on a 10 kHz clock basis (100 microseconds per increment). The remote partner must answer this message with a CK1 message containing its own local 64-bit time. Both partners then know the difference between their respective clocks and can determine the offset to apply to the timestamp and delta-time fields in the RTP-MIDI protocol.

The session initiator finishes this sequence by sending a last message, called CK2, containing the local time at which it received the CK1 message. This technique makes it possible to compute the average latency of the network, and also to compensate for a potential delay introduced by a slow-starting thread, which can occur with non-realtime operating systems like Linux, Windows or OS X.
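The arithmetic behind this three-message exchange can be sketched as follows. The formulas follow the usual two-way time-transfer reasoning (symmetric network delay assumed); real implementations may smooth these estimates over several exchanges, and the function name is hypothetical.

```python
def sync_estimates(ck0_t1: int, ck1_t2: int, ck2_t3: int):
    """Estimate one-way latency and clock offset from one CK0/CK1/CK2
    exchange. All timestamps are in 100-microsecond ticks (the 10 kHz
    clock described above): ck0_t1 and ck2_t3 are initiator-local
    times, ck1_t2 is the listener-local time."""
    latency = (ck2_t3 - ck0_t1) / 2           # half the round-trip time
    # Remote clock value minus local clock value at the midpoint
    # of the exchange:
    offset = ck1_t2 - (ck0_t1 + ck2_t3) / 2
    return latency, offset

# Initiator sends CK0 at local t=1000, listener replies at its own
# t=5600, initiator receives CK1 at local t=1040:
lat, off = sync_estimates(1000, 5600, 1040)
print(lat, off)  # -> 20.0 4580.0
```

Here a latency of 20 ticks corresponds to 2 ms one way; the offset is what the receiver adds to incoming timestamps to map them onto its own clock.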

Apple recommends repeating this sequence a few times just after opening the session, in order to get better synchronization accuracy, in case one of the exchanges was accidentally delayed by a temporary network overload or a latency peak in a thread activation.

This sequence must then be repeated cyclically, typically between 2 and 6 times per minute, and always by the session initiator, in order to maintain long-term synchronization accuracy by compensating for local clock drift, and also to detect a loss of the communication partner. If a partner does not answer multiple consecutive CK0 messages, it shall be considered disconnected. In most cases, session initiators then switch their state machine to the "Invitation" state in order to re-establish communication automatically as soon as the remote partner reconnects to the network. Some implementations, especially on personal computers, also display an alert message and offer the user the choice between a new connection attempt and closing the session.

Journal update

The journalling mechanism makes it possible to detect the loss of MIDI messages and allows the receiver to regenerate the missing data without any retransmission. The journal keeps in memory "MIDI images" of the different session partners at different moments. However, it is useless to keep in memory journalling data corresponding to events the session partner has already received correctly. Each partner therefore cyclically sends the other the RS message, indicating the last sequence number received correctly (in other words, with no gap between sequence numbers). The sender can then free the memory containing older journalling data if necessary.

Disconnection of a session partner

A session partner can ask to leave a session at any moment, which closes the session. This is done using the BY message. When a session partner receives this message, it immediately closes the session with the remote partner that sent it, and frees all resources allocated to the session. This message can be sent by the session initiator or by the session listener (the "invited" partner).[23]

Latency

The most common concern about RTP-MIDI relates to latency (a general concern with Digital Audio Workstations), mainly because it uses the IP stack. It can however be shown that a correctly programmed RTP-MIDI application or driver does not exhibit more latency than other communication methods.

Moreover, RTP-MIDI as described in RFC 6295 contains a latency-compensation mechanism. A similar mechanism is found in most audio plugins, which can inform the host of the latency they add to the processing path; the host can then send samples to the plugin in advance, so they are ready to be mixed synchronously with the other audio streams. The compensation mechanism described in RFC 6295 uses a relative timestamp system based on the MIDI delta-time, as described in.[24] Each MIDI event transported in the RTP payload has a leading delta-time value, relative to the current payload time origin defined by the timestamp field in the RTP header.

Each MIDI event in the RTP-MIDI payload can thus be strictly synchronized with the global clock. The synchronization accuracy depends directly on the clock source defined when the RTP-MIDI session is opened. RFC 6295 gives some examples based on an audio sampling clock, in order to get sample-accurate timestamping of MIDI events. Apple's RTP-MIDI implementation, like the related implementations such as the rtpMIDI driver for Windows or the KissBox embedded systems, uses a fixed 10 kHz clock rather than an audio sampling rate; the timing accuracy of MIDI events is then 100 microseconds for these implementations.
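The delta-time values mentioned above use a variable-length encoding of the same kind as Standard MIDI Files: 7 bits of payload per byte, with the high bit set on every byte except the last, over one to four bytes. A minimal encoder sketch (the function name is an illustration, not an API from any of the implementations cited):

```python
def encode_delta_time(ticks: int) -> bytes:
    """Encode a delta time as a variable-length quantity: 7 bits per
    byte, continuation bit (0x80) set on all but the last byte,
    one to four bytes total."""
    assert 0 <= ticks < 1 << 28
    out = [ticks & 0x7F]        # last byte: continuation bit clear
    ticks >>= 7
    while ticks:
        out.append(0x80 | (ticks & 0x7F))
        ticks >>= 7
    return bytes(reversed(out))

# 480 ticks at the 10 kHz clock = 48 ms after the packet's time origin:
print(encode_delta_time(480).hex())  # -> 8360
```

With the 10 kHz clock of the implementations described above, each tick therefore represents 100 microseconds of scheduling offset for the event that follows it.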

Sender and receiver clocks are synchronized when the session is initiated, and they are kept synchronized during the whole session by the regular synchronization cycles controlled by the session initiators. This mechanism can compensate for any latency, from a few hundred microseconds, as seen in LAN applications, up to seconds. It can, for example, compensate for the latency introduced by the Internet, allowing real-time execution of music pieces.

This mechanism is however mainly designed for pre-recorded MIDI streams, like those coming from a sequencer track. When RTP-MIDI is used for real-time applications (e.g. controlling devices from an RTP-MIDI compatible keyboard[25]), the delta-time is mostly set to zero, which means that the related MIDI event shall be interpreted as soon as it is received. In such use cases, the latency-compensation mechanism described previously cannot be used.

The latency which can be obtained is then directly related to the different networking components involved in the communication path between the RTP-MIDI devices:

  • MIDI application processing time
  • IP communication stack processing time
  • Network switches/routers packet forwarding time

Application processing time

Application processing time is generally tightly controlled, since MIDI tasks are most often real-time tasks. In most cases, the latency comes directly from the thread latency achievable on a given operating system, typically 1-2 ms at most on Windows and Mac OS systems. Systems with a real-time kernel can achieve much better results, down to 100 microseconds. This time can be considered constant whatever the communication channel (MIDI 1.0, USB, RTP-MIDI, etc.), since the processing threads operate at a different level from the communication-related threads/tasks.

IP stack processing time

IP stack processing time is the most critical, since the communication process is under operating system control. This applies to any communication protocol, IP-related or not, since most operating systems, including Windows, Mac OS and Linux, do not allow direct access to the Ethernet adapter. In particular, a common mistake is to conflate "raw sockets" with "direct access to the network"; sockets are the entry point for sending and receiving data over the network in most operating systems. A "raw socket" is a socket which allows an application to send any packet using any protocol; the application is then responsible for building the telegram following the given protocol's rules, while "direct access" would require system-level access restricted to the operating system kernel. A packet sent through a raw socket can therefore be delayed by the operating system if the network adapter is currently being used by another application, and an IP packet may even reach the network before a packet queued on a raw socket. Technically speaking, access to a given network card is controlled by "semaphores".[26]

IP stacks need to correlate Ethernet (MAC) addresses and IP addresses, using a specific protocol named ARP (Address Resolution Protocol). When an RTP-MIDI application wants to send a packet to a remote device, it must first locate it on the network, since Ethernet does not understand IP-related concepts, in order to create the transmission path through the routers/switches. This is done automatically by the IP stack, which first sends an ARP request. When the destination device recognizes its own IP address in the ARP packet, it sends back an ARP reply with its MAC address. The IP stack can then send the RTP-MIDI packet. Subsequent RTP-MIDI packets do not need the ARP sequence, unless the link becomes inactive for a few minutes, which clears the entry in the sender's ARP cache.

This ARP sequence can take a few seconds, which can in turn introduce noticeable latency, at least for the first RTP-MIDI packet. However, Apple's implementation solved this issue in an elegant manner, using the session control protocol. The session protocol uses the same ports as the RTP-MIDI protocol itself. The ARP sequence then takes place during the session initiation sequence. When the RTP-MIDI application wants to send the first RTP-MIDI packet, the computer's routing tables are already initialized with the correct destination MAC addresses, which avoids any latency for the first packet.

Besides the ARP sequence, the IP stack itself requires computation to prepare the packet headers (IP header, UDP header and RTP header). With modern processors this preparation is extremely fast, taking only a few microseconds, which is negligible compared to the application latency. As described before, once prepared, an RTP-MIDI packet can only be delayed on its way to the network adapter if the adapter is already transmitting another packet, whether the socket is an IP one or a "raw" one. However, the latency introduced at this level is generally extremely low, since the driver threads in charge of the network adapters have very high priority. Moreover, most network adapters have hardware FIFO buffers, so packets can be stored for immediate transmission in the adapter itself without needing the driver thread to execute first. A way to keep the latency related to "adapter access competition" as low as possible is to reserve one network adapter for MIDI communication only, and use a different adapter for other network uses such as file sharing or Internet browsing.

Network components routing time

The different components used to transmit Ethernet packets between computers, whatever the protocols in use, also introduce latency. All modern network switches use the "store and forward" technique, in which a packet is stored in the switch before being sent to the next switch. The switching times are, however, most often negligible: a 64-byte packet on a 100 Mbit/s network takes around 5.1 microseconds to be forwarded by each network switch, so a complex network with 10 switches on a given path introduces a latency of about 51 microseconds.
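The arithmetic behind these figures is straightforward: with store-and-forward switching, the whole packet must be received before it is retransmitted, so each hop adds (packet size / link rate). A small sketch (function name illustrative):

```python
def store_and_forward_us(packet_bytes: int, link_mbps: float, hops: int) -> float:
    """Cumulative store-and-forward delay in microseconds: each hop
    must receive the whole packet before retransmitting it, so each
    hop adds packet_size / link_rate."""
    per_hop_seconds = packet_bytes * 8 / (link_mbps * 1e6)
    return per_hop_seconds * 1e6 * hops   # convert to microseconds

# The 64-byte / 100 Mbit/s / 10-switch example from the text:
print(store_and_forward_us(64, 100, 10))
```

Note that on gigabit links the same packet takes only about 0.5 microseconds per hop, so this term shrinks with faster networks.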

Latency is, however, directly related to the network load itself, since a switch delays each packet until the previous one has been transmitted. Computing or measuring the real latency introduced by network components can be difficult and requires representative use cases; for example, measuring the latency between two networked devices connected to the same network switch will always give excellent results. As noted in the previous section, one solution to limit the latency introduced by network components is to use separate networks. However, this is far less critical for network components than for the network adapters in computers.

Expected latency for real-time applications

[edit]

As can be seen, the exact latency of an RTP-MIDI link depends on many parameters, most of them related to the operating systems themselves. Measurements made by the various RTP-MIDI implementers give latency times ranging from a few hundred microseconds for embedded systems using real-time operating systems, up to 3 milliseconds when computers running general-purpose operating systems are involved.

Latency enhancement (sub millisecond latency)

[edit]

The AES started a working group named SC-02-12H[27] in 2010 in order to demonstrate the capability of using RTP payloads in IP networks for very-low-latency applications. The draft proposal issued by the group in May 2013 demonstrates that it is possible to achieve RTP streaming for live applications with a latency as low as 125 microseconds.

Configuration

[edit]

The other most common concern related to RTP-MIDI is the configuration process, since the physical connection of a device to a network is not enough to ensure communication with another device. Since RTP-MIDI is based on the IP protocol stack, the different layers involved in the communication process must be configured, such as the IP address and UDP ports. In order to simplify this configuration, different solutions have been proposed, the most common being the "Zero Configuration" set of technologies, also known as Zeroconf.

RFC 3927[28] describes a common method to automatically assign IP addresses, which is used by most RTP-MIDI compatible products. Once connected to an IP network, such a device can assign itself an IP address, with automatic resolution of address conflicts. If the device also follows the port assignment recommendations from the RTP specification, it becomes "Plug&Play" from the network point of view: an RTP-MIDI network can then be created without defining any IP addresses or UDP port numbers. However, these methods are generally reserved for small setups. Fully automatic network configuration is usually avoided on large setups, where locating a faulty device can become complex because there is no direct relationship between the IP address selected by the Zeroconf system and the physical location of the device. A minimal configuration is then to assign a name to each device before connecting it to the network, which voids the "true Plug&Play" concept in that case.
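The self-assignment idea of RFC 3927 can be sketched briefly. The following is a simplified illustration only: the helper name and the hash-based seeding are invented here, and a real implementation must also probe the chosen address with ARP and retry on conflict, as the RFC requires:

```python
import hashlib

def linklocal_candidate(mac: str) -> str:
    """Derive a pseudo-random IPv4 link-local candidate address in the
    RFC 3927 usable range 169.254.1.0 - 169.254.254.255, seeded from the
    MAC address so repeated attempts from the same host are stable."""
    h = int.from_bytes(hashlib.sha256(mac.encode()).digest()[:4], "big")
    third = 1 + h % 254           # 169.254.0.x and 169.254.255.x are reserved
    fourth = (h >> 8) % 256
    return f"169.254.{third}.{fourth}"

print(linklocal_candidate("00:11:22:33:44:55"))
```

The deterministic seeding mirrors the RFC's recommendation that a host should tend to pick the same candidate address across reboots, which helps keep addresses stable on small setups.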

Note that the "Zero Configuration" concept is restricted to the network communication layers: it is technically impossible to perform the complete installation of any networked device (MIDI-related or not) just by abstracting the addressing layer. A practical use case illustrating this limitation is an RTP-MIDI sound generator that must be controlled from a MIDI master keyboard connected to an RTP-MIDI interface. Even if both the sound generator and the MIDI interface integrate the "Zero Configuration" services, they cannot know by themselves that they need to establish a session together, because the IP configuration services act at a different level. Any networked MIDI system, whatever the protocol used to exchange MIDI data (IP-based or not), therefore requires a configuration tool to define the exchanges that must take place between the devices once they are connected to the network. This configuration tool can be an external management tool running on a computer, or can be embedded in a device's application software in the form of a configuration menu if the device integrates a human-machine interface.

Compatibility with MIDI 2.0

[edit]

The MIDI Manufacturers Association announced in January 2019 that a major evolution of the MIDI protocol, called MIDI 2.0,[29] was entering its final prototyping phase.

MIDI 2.0 relies heavily on the MIDI-CI extension, used for protocol negotiation (identification of MIDI 1.0 and MIDI 2.0 devices to allow protocol switchover). RTP-MIDI fully supports the MIDI-CI protocol, since MIDI-CI uses MIDI 1.0 System Exclusive messages even on MIDI 2.0 devices.

An evolution of the RTP-MIDI protocol to include MIDI 2.0 has been presented to the MMA and is currently being discussed in the MIDI 2.0 working group. The enhanced protocol supports both MIDI 1.0 and MIDI 2.0 data formats in parallel (MIDI 2.0 uses 32-bit based packets, while MIDI 1.0 uses 8-bit based packets).

Companies/Projects using RTP-MIDI

[edit]
  • Apple Computer (RTP-MIDI driver integrated in Mac OS X and iOS for the whole range of products) - RTP-MIDI over Ethernet and WiFi
  • Yamaha (Motif synthesizers, UD-WL01 adapter[30]) - RTP-MIDI over Ethernet and WiFi
  • Behringer (X-Touch Control Surface)[31]
  • KissBox (RTP-MIDI interfaces with MIDI 1.0, LTC, I/O and ArtNet, VST plugins for hardware synthesizer remote control)
  • Tobias Erichsen Consulting (Free RTP-MIDI driver for Windows / Utilities)
  • GRAME (Linux driver)
  • HRS (MIDI Timecode distribution on Ethernet / Synchronization software)
  • iConnectivity (Audio & MIDI interfaces with USB and RTP-MIDI support)
  • Merging Technologies (Horus, Hapi, Pyramix, Ovation) - RTP-MIDI for LTC/MTC, MIDI DIN, and MicPre control [32]
  • Zivix PUC (Wireless RTP-MIDI interface for iOS devices)[33]
  • Arduino-AppleMIDI-Library[34]
  • MIDIbox[35]
  • Cinara (MIDI interface with USB and RTP-MIDI support)[36]
  • McLaren Labs rtpmidi for Linux[37]
  • BEB (DSP modules for modular synthesizers based on RTP-MIDI backbone)[38]
  • Axoloti (Hardware open-source synthesizer with RTP-MIDI connectivity)[39]

References

[edit]
from Grokipedia
RTP-MIDI is a network protocol that specifies a Real-time Transport Protocol (RTP) payload format for transmitting Musical Instrument Digital Interface (MIDI) 1.0 messages over IP networks, supporting low-latency applications such as collaborative musical performances and MIDI content streaming while incorporating mechanisms for packet loss recovery and timing synchronization. Developed by researchers John Lazzaro and John Wawrzynek at the University of California, Berkeley, RTP-MIDI originated from a 2004 Audio Engineering Society (AES) presentation on RTP payloads for MIDI, leading to its standardization by the Internet Engineering Task Force (IETF) Audio/Video Transport working group in collaboration with the MIDI Manufacturers Association (MMA). The protocol was first published as RFC 4695 in November 2006 and updated as RFC 6295 in June 2011 to refine payload encoding, recovery features, and integration with the Session Description Protocol (SDP) for session management. RFC 4696 provides non-normative implementation guidance, emphasizing its use in both interactive real-time scenarios and non-interactive streaming. Key features of RTP-MIDI include encoding all standard MIDI 1.0 commands into RTP packets with headers for MIDI data and optional recovery journals, sequences of prior commands that enable reconstruction of lost packets without retransmission, using policies like closed-loop or anchor recovery to balance latency and reliability. It supports up to 16 channels per stream, timestamp-based timing and synchronization (including MIDI Time Code), and multiple related streams via SDP parameters such as musicport for port numbering or ordered relationships. The protocol operates over UDP or TCP in unicast or multicast modes, integrates with Secure RTP (SRTP) for security, and is compatible with standards such as Downloadable Sounds Level 2 (DLS2) and MPEG-4 Structured Audio, making it suitable for low-bitrate music coding and real-time audio/video applications.
RTP-MIDI, often referred to as AppleMIDI in Apple's ecosystem, gained widespread adoption through native support in macOS and iOS via the Core MIDI framework's network driver, which adds a proprietary session establishment protocol using Bonjour for peer discovery and UDP ports for control and data transmission. Other implementations include open-source drivers for Windows (e.g., by Tobias Erichsen), hardware devices such as KissBox Ethernet interfaces, and libraries for microcontrollers, Android, and mobile apps, enabling wired and wireless MIDI over Ethernet and WiFi networks. While RTP-MIDI remains focused on MIDI 1.0, the MMA has introduced Network MIDI 2.0 as a UDP-based successor for enhanced bidirectional communication and MIDI 2.0 features.

Overview

Definition and Purpose

RTP-MIDI is a network protocol specification that encapsulates Musical Instrument Digital Interface (MIDI) messages within Real-time Transport Protocol (RTP) packets transmitted over User Datagram Protocol (UDP) or Transmission Control Protocol (TCP) on Internet Protocol (IP) networks, enabling their transport across Ethernet and WiFi networks. This format supports the full range of MIDI 1.0 commands, including those for real-time performance data, synchronization, and control, while integrating with standard IP-based networking infrastructure to facilitate low-latency communication. The primary purpose of RTP-MIDI is to enable real-time, bidirectional transmission of MIDI data between devices without requiring specialized hardware beyond standard network interfaces, thereby supporting collaborative music production, remote instrument control, and live performances over IP networks. By leveraging RTP's timing mechanisms and optional recovery features, the protocol ensures delivery reliable enough for interactive applications, such as synchronized ensemble playing or streaming content, while minimizing the latency critical for musical timing. It addresses the limitations of physical connections, like DIN cables or USB, by allowing virtual "cables" over networks that mimic direct device linking. At its core, RTP-MIDI employs RTP for precise packet sequencing and timestamping to maintain MIDI event timing, RTCP for session control, reception-quality feedback, and stream synchronization, and a session-based model that establishes persistent connections between endpoints. These components collectively provide resilience against network variability, such as jitter or dropped packets, through configurable recovery journals and buffering tools.

Key Features

RTP-MIDI enables low-latency transmission of MIDI data over IP networks by leveraging Real-time Transport Protocol (RTP) timestamps, which synchronize commands with precise timing relative to the RTP clock rate, typically set to ensure accurate playback in musical applications. Delta times encoded in the payload (1-4 octets) represent the interval between MIDI commands relative to the RTP timestamp, allowing faithful reproduction of timing from sources like Standard MIDI Files. This real-time capability supports interactive performances where synchronization across devices is critical, with configurable modes for timestamp semantics such as asynchronous or buffered rendering. The protocol facilitates bidirectional, full-duplex communication through sendrecv sessions that emulate the simultaneous send-and-receive behavior of physical MIDI DIN cables, enabling interactive exchanges between endpoints without directional restrictions. Multiple streams can share a namespace, identified by unique synchronization source (SSRC) identifiers, which supports virtual port mappings for complex routing in applications like networked ensembles. This duplex nature ensures seamless integration with existing MIDI workflows, treating network connections as virtual cables. RTP-MIDI operates over standard IP infrastructures, utilizing UDP/IP (or optionally TCP/IP). Scalability is achieved through support for multiple endpoints in peer-to-peer topologies or client-server configurations, where a central session can route data among numerous participants using unique SSRCs per stream and multicast for group communications. This enables applications ranging from small duets to large-scale networked orchestras, with session descriptions via SDP parameters defining transport details like IP versions and port assignments. The protocol's design accommodates varying network sizes without performance degradation in typical musical contexts.
Error resilience is provided by RTP sequence numbers for detecting and ordering packets, combined with recovery journals that maintain a history of recent MIDI commands to reconstruct lost data without retransmissions that could introduce latency. Journals use checkpoint packets as anchors and include tools like recency bits for SysEx messages, employing closed-loop or anchor policies to balance reliability and real-time flow. This mechanism keeps MIDI streams uninterrupted even under moderate packet loss, preserving the protocol's suitability for time-sensitive audio production.
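The variable-length delta times mentioned above follow the same 7-bits-per-octet scheme as Standard MIDI Files. A minimal sketch of such an encoder (illustrative only; the function name is invented and not taken from the specification):

```python
def encode_delta_time(ticks: int) -> bytes:
    """Encode a delta time as 1-4 octets: 7 payload bits per octet,
    most significant group first, with the continuation bit (0x80)
    set on every octet except the last."""
    if not 0 <= ticks < (1 << 28):
        raise ValueError("delta time must fit in 28 bits (4 octets max)")
    octets = [ticks & 0x7F]       # final octet, continuation bit clear
    ticks >>= 7
    while ticks:
        octets.append(0x80 | (ticks & 0x7F))
        ticks >>= 7
    return bytes(reversed(octets))
```

Small intervals thus cost a single byte on the wire, which keeps the per-command overhead low for dense streams of performance data.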

History

Origins and Development

RTP-MIDI emerged from efforts in the early 2000s to transport Musical Instrument Digital Interface (MIDI) data over IP networks, addressing the constraints of traditional wired connections such as serial cables and USB, which limited mobility in music studios and live performances. Independent developers, notably John Lazzaro and John Wawrzynek at the University of California, Berkeley, initiated the project to encapsulate MIDI messages within Real-time Transport Protocol (RTP) packets, drawing on the RTP/RTCP framework outlined in IETF RFC 3550, published in 2003. This work was conducted in cooperation with the MIDI Manufacturers Association (MMA), aiming to enable low-latency, reliable MIDI transmission for networked musical performances and remote collaboration among musicians. A pivotal milestone occurred in 2004 when Lazzaro and Wawrzynek presented "An RTP Payload for MIDI" at the 117th Audio Engineering Society (AES) Convention, introducing the core concepts of the payload format and its integration with IETF multimedia protocols like the Session Description Protocol (SDP) and the Session Initiation Protocol (SIP). This presentation built on earlier explorations, such as their 2001 paper "A Case for Network Musical Performance," which highlighted the potential of IP networks for interactive musical applications. The motivations centered on creating a robust solution for MIDI transport over Ethernet and WiFi, mitigating packet loss through innovative recovery mechanisms like journals, while supporting both interactive real-time use and streaming content delivery. By 2006, the first draft specifications culminated in the publication of IETF RFC 4695, "RTP Payload Format for MIDI," formalizing the protocol as a proposed standard under the Audio/Video Transport working group. This document detailed the packetization of MIDI commands, synchronization strategies, and error handling tailored for unreliable networks.
Prior to this standardization, open-source aspects were evident in early prototypes shared among developer communities; for instance, in 2004, developer Tobias Erichsen encountered Lazzaro's draft and began experimenting with RTP-MIDI encapsulation, contributing feedback and creating initial implementations discussed on forums and mailing lists. These grassroots efforts fostered innovation before the protocol's broader adoption. The foundational RTP-MIDI specifications paved the way for subsequent commercial integrations, including Apple's implementation in 2005.

AppleMIDI Introduction

Apple introduced support for network-based MIDI transport in Mac OS X 10.4 Tiger, released on April 29, 2005, under the name "Network MIDI." This implementation used Apple's Bonjour protocol for automatic discovery of MIDI sessions on local IP networks, allowing multiple Macintosh computers to share MIDI data without additional hardware or drivers. The feature was built on the emerging RTP-MIDI protocol, which encapsulates MIDI messages within Real-time Transport Protocol (RTP) packets to ensure low-latency transmission suitable for real-time music performance. A key innovation was the deep integration with Apple's Core MIDI framework, which abstracted the networking layer so that network sessions appear as standard virtual MIDI ports within applications. This enabled seamless pairing and session management through intuitive interfaces reminiscent of physical device connections, simplifying setup for musicians. In technical documentation, the protocol became known as AppleMIDI, reflecting its proprietary extensions and the Bonjour service type _apple-midi._udp used for advertisement. Apple's own music applications were among the first to leverage Network MIDI under Mac OS X 10.4 Tiger. In 2010, support extended to iOS devices with the introduction of Core MIDI APIs in iOS 4.2, enabling wireless MIDI connectivity in mobile music apps. This driverless integration on Apple platforms significantly boosted RTP-MIDI adoption among consumer musicians, as it eliminated the need for specialized hardware interfaces and facilitated easy network-based workflows, democratizing access to networked MIDI for home studios and education.

Evolution Toward Modern Standards

Following Apple's introduction of its proprietary session protocol atop the standardized RTP payload format in 2005, open-source initiatives emerged to broaden accessibility beyond the macOS and iOS ecosystems. The rtpmidid project, a daemon for sharing ALSA sequencer devices via RTP-MIDI, marked a key effort, with its initial beta release in April 2020 enabling network import and export of MIDI sessions. These developments facilitated informal interoperability through community-driven implementations, compensating for the lack of cross-platform native support in early adopters. While the IETF formalized the RTP payload for MIDI in RFC 4695 (November 2006), which defined packet structures and recovery mechanisms for real-time transmission, the protocol was further refined in RFC 6295, published in June 2011. Discussions on full protocol integration, including Apple's session management, did not yield additional RFCs because of the proprietary nature of those extensions. This left RTP-MIDI's session establishment reliant on reverse-engineered components in non-Apple environments, contributing to persistent compatibility challenges such as connection failures across operating systems, network instability during OS upgrades, and difficulties in multi-device setups. Recent releases, such as rtpmidid version 24.12 in December 2024, addressed some of these issues by enhancing the MIDI router for improved session routing and stability in diverse network topologies. The recognition of RTP's header overhead and complexity in resource-constrained devices spurred a transition to lighter UDP-based alternatives, prioritizing lower latency and simpler error handling. The MIDI Association advanced this shift with Network MIDI 2.0 (UDP), initially prototyped in 2023 and formally ratified in November 2024, which supports both MIDI 1.0 and MIDI 2.0 via Universal MIDI Packets while incorporating capabilities absent from RTP-MIDI.
At the NAMM Show in January 2025, the MIDI Association unveiled initial implementations of Network MIDI 2.0, positioning RTP-MIDI as a foundational bridge to these MIDI 2.0 network extensions.

Protocol Fundamentals

Packet Header Format

The RTP-MIDI packet format adheres to the Real-time Transport Protocol (RTP) structure defined in RFC 3550, consisting of a fixed 12-byte RTP header followed by a MIDI-specific payload that encapsulates Musical Instrument Digital Interface (MIDI) commands and timing information. This design enables low-latency transmission of MIDI data over IP networks while supporting error recovery and synchronization. The payload type for RTP-MIDI is dynamically assigned from the range 96-127, as registered with the Internet Assigned Numbers Authority (IANA). The RTP header includes essential fields for packet identification, ordering, and timing, formatted in big-endian byte order:
  • Version (V), 2 bits: set to 2.
  • Padding (P), 1 bit: typically 0; indicates padding if 1.
  • Extension (X), 1 bit: typically 0; indicates an RTP header extension if 1.
  • CSRC Count (CC), 4 bits: number of contributing sources (usually 0).
  • Marker (M), 1 bit: set to 1 if the MIDI command section length is greater than 0.
  • Payload Type (PT), 7 bits: dynamic value (96-127) for RTP-MIDI.
  • Sequence Number, 16 bits: monotonically increasing counter (initial value random) used to detect packet loss.
  • Timestamp, 32 bits: reflects the sampling instant of the first octet in the RTP payload; the clock rate is specified in session setup (e.g., 1000 Hz for 1 ms resolution).
  • SSRC, 32 bits: synchronization source identifier, unique per stream to distinguish sources.
  • CSRC List, 0-15 × 32 bits: contributing sources, present if CC > 0 (rarely used in RTP-MIDI).
Following the RTP header, the MIDI payload begins with a 1- or 2-byte header for native streams, which defines the structure of the enclosed data. The first octet contains bit fields: B (header length indicator, 1 bit: 0 for a 1-octet header, 1 for a 2-octet header), J (recovery journal present, 1 bit), Z (delta time present for the first command, 1 bit), P (phantom status byte indicator, 1 bit), and the low 4 bits of LEN (the MIDI command section length in bytes). If B=1, a second octet provides the high 8 bits of LEN, allowing up to 4095 bytes in total, though practical limits apply due to the network MTU. The command section follows and contains a sequence of timestamped MIDI commands, each prefixed by a variable-length delta time (1-4 bytes, encoded in 7-bit units with continuation bits). If J=1, a recovery journal section is appended for error correction. If the LEN field is 0, the payload is empty, forming a timing packet used solely for synchronization via the RTP timestamp. A representative byte-level breakdown of a basic RTP-MIDI packet with a non-empty command section (assuming no extensions, CSRC list, or journal; the first command is a Note On event) might appear as follows (simplified for illustration, with the RTP header in bytes 0-11, the payload header in byte 12, and the command section after it):
  • Bytes 0-1: V=2, P=0, X=0, CC=0, M=1, PT=97 (0x61) → 0x80E1
  • Bytes 2-3: Sequence Number (e.g., 0x0001) → 0x0001
  • Bytes 4-7: Timestamp (e.g., 0x00002710 for 10000 ticks at 1000 Hz) → 0x00002710
  • Bytes 8-11: SSRC (e.g., 0x12345678) → 0x12345678
  • Byte 12: Payload header (e.g., B=0, J=0, Z=0, P=0, LEN=4 → 0x04)
  • Bytes 13-16: Delta Time (e.g., 0 for immediate, encoded as 0x00) + Command (Note On channel 0, note 60, velocity 64 → 0x90 0x3C 0x40)
This structure ensures efficient encapsulation while minimizing latency. RTP-MIDI integrates with RTCP for control functions, such as sender reports that aid in bandwidth estimation, though these are handled outside the data packet format.
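The byte layout above can be reproduced in a few lines of code. The following sketch is illustrative only: the helper name is invented, and delta-time/journal handling is simplified to match the worked example (1-octet payload header, no journal):

```python
import struct

def build_rtp_midi_packet(seq: int, timestamp: int, ssrc: int,
                          command_section: bytes,
                          payload_type: int = 97) -> bytes:
    """Minimal RTP-MIDI packet: 12-byte RTP header (V=2, no padding,
    extension, or CSRC, M=1) + 1-octet MIDI payload header (B=J=Z=P=0)
    + the raw command section."""
    byte0 = 2 << 6                       # V=2, P=0, X=0, CC=0
    byte1 = 0x80 | payload_type          # M=1 in the top bit, then PT
    rtp_header = struct.pack("!BBHII", byte0, byte1, seq, timestamp, ssrc)
    if len(command_section) > 15:
        raise ValueError("a 1-octet payload header only holds 4 LEN bits")
    payload_header = bytes([len(command_section)])   # B=0, J=0, Z=0, P=0, LEN
    return rtp_header + payload_header + command_section

# Delta time 0x00, then Note On: channel 0, note 60 (middle C), velocity 64
pkt = build_rtp_midi_packet(1, 10000, 0x12345678,
                            bytes([0x00, 0x90, 0x3C, 0x40]))
```

Packing byte by byte makes the M-bit placement explicit: with M=1 and PT=97, the second header byte is 0x80 | 0x61 = 0xE1.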

Session Establishment and Management

RTP-MIDI sessions are established and managed using the Session Description Protocol (SDP) to negotiate transport parameters, media encoding, and stream configurations, typically in conjunction with signaling protocols such as SIP Offer/Answer or declarative protocols like RTSP. SDP media lines (e.g., m=audio 5004 RTP/AVP 96) specify the RTP payload type and clock rate (e.g., 1000 Hz for 1 ms resolution, or 44100 Hz), with attributes like a=rtpmap:96 rtp-midi/44100 for native streams. Additional format-specific parameters (fmtp) configure features such as the timestamp mode (tsmode), recovery journal policies (e.g., j_sec=recj to enable journaling), and MIDI command subsets (via cm_used). For related streams sharing a MIDI namespace, SDP grouping attributes (e.g., a=group:FID 1 2) or the musicport parameter define stream identities or ordering. Sessions run as unicast or multicast over UDP (with recovery journals for loss resilience) or TCP (without journals). Multiple concurrent streams per endpoint are possible, enabling complex topologies such as splitting namespaces across streams with synchronized timestamps and shared SSRC values. Sequence numbers in RTP headers ensure ordering and loss detection, while RTCP provides feedback for quality monitoring. Synchronization relies on RTP timestamps aligned across streams and periodic RTCP sender reports to maintain timing accuracy, with configurable parameters like rtp_ptime (packet duration) and guardtime (minimum inter-packet interval, often 0 ms for low latency). For teardown, standard RTP/RTCP mechanisms (e.g., BYE packets) release resources, though application-specific signaling may handle session closure. This SDP-based approach decouples session parameters from physical devices, allowing flexible virtual MIDI port mappings independent of the underlying network transports.
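A session description for a native RTP-MIDI stream might look like the following fragment, modeled on the conventions in RFC 6295 (the originator name, addresses, port, and session name here are placeholders, not values from the specification):

```
v=0
o=example 2098306596 2098306596 IN IP4 192.0.2.10
s=RTP-MIDI example session
t=0 0
c=IN IP4 192.0.2.10
m=audio 5004 RTP/AVP 96
a=rtpmap:96 rtp-midi/44100
a=fmtp:96 j_sec=recj
a=sendrecv
```

The rtpmap line binds the dynamic payload type 96 to the rtp-midi encoding at a 44100 Hz clock, while the fmtp line enables the recovery journal for loss resilience over UDP.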

Endpoint and Participant Roles

In RTP-MIDI, endpoints are any IP-capable devices that function as MIDI sources or sinks, such as controller keyboards, synthesizers, sequencers, or content servers, enabling the transmission and reception of MIDI data over networks. Each endpoint is uniquely identified within an RTP session by a 32-bit Synchronization Source Identifier (SSRC) in the RTP header, which distinguishes multiple streams, and by a Canonical Name (CNAME) in RTCP reports, which provides persistent identification across sessions and detects SSRC collisions. These identifiers ensure that endpoints can participate in UDP-based sessions, where each stream typically encodes a single MIDI namespace comprising 16 voice channels plus system commands, though namespaces may be split across sessions using identical SSRC values for related streams. Participant roles in RTP-MIDI sessions are defined by their involvement in data flow and session dynamics, primarily as senders or receivers: senders are responsible for encapsulating MIDI data into RTP packets, timestamping commands, and maintaining recovery journals to mitigate packet loss, while receivers detect losses, repair artifacts using those journals, and render the MIDI output. In session establishment, participants adopt temporary roles as initiator or acceptor: the initiator (e.g., the SDP offerer) proposes connection parameters, while the acceptor (e.g., the answerer) confirms or modifies them, after which roles become symmetric for bidirectional exchange. This dynamic is specified via SDP attributes such as sendrecv, recvonly, or sendonly. RTP-MIDI supports multiple participants through RTP mixing at a central point or by grouping multiple streams, enabling configurations like ensemble performances where MIDI data is distributed to several receivers via shared namespaces or coordinated sessions. Role flexibility allows endpoints to switch functions (e.g., from sender to receiver) across sessions without a fixed hierarchy.
Compared to physical MIDI cabling, RTP-MIDI extends connectivity over IP networks using standard RTP ports, treating sessions as virtual channels between sources and destinations while leveraging timestamps for synchronization.

Apple's Session Protocol

Invitation and Connection Sequence

The invitation and connection sequence in Apple's RTP-MIDI implementation, known as AppleMIDI, begins with service discovery via Bonjour, where participating devices advertise their availability using the service type _apple-midi._udp. This zero-configuration protocol allows devices on the same local network to discover each other without manual IP configuration, registering a control port (denoted N) and an adjacent MIDI data port (N+1) for UDP communication. AppleMIDI sessions discovered via Bonjour are designed for devices on the same local network; connections across NAT or subnets may require additional network configuration. Once a device identifies a potential peer through Bonjour, the initiator sends an INVITE packet, identified by the 16-bit command 'IN' (ASCII 0x494E), over the control port. This packet includes the protocol version (set to 2, in network byte order), a random 32-bit initiator token generated by the sender, the sender's 32-bit Synchronization Source Identifier (SSRC) for distinguishing RTP streams, and an optional NUL-terminated string carrying the initiator's name. If no response is received, the initiator resends the INVITE every second, up to a maximum of 12 attempts. The responder, upon receiving the INVITE, replies on the same control port either with an acceptance packet (command 'OK', ASCII 0x4F4B), copying the initiator's token and including its own SSRC and name, or with a rejection packet (command 'NO', ASCII 0x4E4F), which omits the name field. Following a successful exchange on the control port, the initiator repeats the INVITE on the MIDI data port to establish the data channel, and the responder mirrors the response with OK or NO on that port, using the same field structure. Upon mutual acceptance, the initiator starts clock synchronization using dedicated sync packets to align timestamps and compensate for network latency.
These packets include the SSRC, a count field (starting at 0 and incrementing to 2 over the three exchanges), and 64-bit timestamps measured in 100-microsecond units from the local system clock. The sequence computes a round-trip offset as ((timestamp1 + timestamp3) / 2) - timestamp2, enabling latency adjustment for subsequent RTP-MIDI data packets; this sync process repeats at least every 60 seconds to maintain the session. Rejection via NO terminates the attempt without further exchanges, and a failure after 12 retries prompts the initiator to restart discovery.
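The invitation packet layout described above can be sketched in a few lines. This is an illustration assembled from the fields listed here plus the two 0xFF signature bytes that AppleMIDI exchange packets are commonly documented (via reverse engineering) to begin with; the function name is invented:

```python
import struct

def build_applemidi_invite(token: int, ssrc: int, name: str) -> bytes:
    """Sketch of an AppleMIDI session INVITE ('IN') packet.

    Layout: 0xFFFF signature, 2-byte command, 32-bit protocol version (2),
    32-bit random initiator token, 32-bit SSRC, NUL-terminated name.
    All multi-byte fields are in network byte order.
    """
    return (b"\xff\xff"                        # AppleMIDI signature bytes
            + b"IN"                            # command: invitation (0x494E)
            + struct.pack("!I", 2)             # protocol version
            + struct.pack("!I", token)         # random initiator token
            + struct.pack("!I", ssrc)          # sender's SSRC
            + name.encode("utf-8") + b"\x00")  # NUL-terminated name

pkt = build_applemidi_invite(0xDEADBEEF, 0x12345678, "Session")
```

The same byte layout is reused by the OK and NO replies (with NO omitting the trailing name), which is why implementations usually share one packer/parser for all exchange packets.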

Synchronization Mechanisms

RTP-MIDI maintains timing alignment between session participants after connection establishment primarily through RTP timestamps embedded in packet headers and RTCP sender reports, which allow receivers to synchronize multiple streams from the same sender by correlating their timing fields. In Apple's implementation, known as AppleMIDI, synchronization is further refined using CK (clock synchronization) command packets, which exchange local clock values alongside RTP timestamps to compute timing offsets. These CK packets carry up to three 64-bit timestamps in 100-microsecond units, enabling participants to estimate clock offsets for ongoing alignment. The basic offset calculation derives from the difference between remote and local timestamps, normalized by the clock rate: offset = (remote_timestamp - local_timestamp) / clock_rate. This provides a straightforward adjustment for drift, with a more advanced NTP-like average, offset_estimate = (timestamp1 + timestamp3) / 2 - timestamp2, used in the initial exchanges and periodically refreshed every 60 seconds. Receivers apply these offsets to RTP timestamps to align incoming MIDI commands accurately. To mitigate network variability, RTP-MIDI employs an adaptive buffer at the receiver, which dynamically adjusts its size based on observed packet arrival times and sender timing consistency, typically ranging from 100 µs to 2 ms on low-jitter LANs. This buffering smooths out jitter without introducing excessive latency, ensuring MIDI commands are played out in the correct sequence and timing. Resynchronization is triggered by detected anomalies such as sequence number gaps in RTP packets or insights from periodic RTCP reports, prompting the receiver to realign its clock and buffer using the latest offset data.
In AppleMIDI, periodic CK timing packets sent every 60 seconds allow for drift corrections during active sessions and maintain tight synchronization even under minor network fluctuations.

Journal Updates and Error Handling

In RTP-MIDI, the recovery journal serves as the key mechanism for mitigating packet loss by maintaining a structured history of recent MIDI events at each endpoint, enabling state reconstruction without relying on retransmission requests. The journal is organized into chapters that categorize MIDI commands, such as channel-specific notes (Chapter N), control changes (Chapter C), and sequencer state (Chapter Q), and references a checkpoint packet via its RTP sequence number, allowing receivers to apply corrective actions like NoteOff commands for indefinite artifacts (e.g., stuck notes). This buffer captures the session state in oldest-first order, supporting active, N-active, and C-active command types to prioritize essential recovery data. Journal updates occur dynamically: the sender appends new events to the recovery journal after transmitting each RTP packet and trims older entries based on RTCP feedback from the receiver, which reports the highest sequence number successfully received. These periodic RTCP sender and receiver reports enable a closed-loop policy (the default), reducing journal overhead while ensuring sufficient history for loss recovery; for instance, checkpoints can be updated every 5 seconds to optimize size without compromising reliability. In AppleMIDI implementations, the journal always includes the recovery section (indicated by the J bit) and covers the chapters P (program change), C (control change), W (pitch wheel), N (note), T (channel aftertouch), A (poly aftertouch), Q (sequencer state), and F (MIDI Time Code), while excluding the chapters M, E, D, V, and X to streamline transmission. Packet loss is handled through gap detection in the 16-bit RTP sequence numbers (extended to 32 bits internally for rollover tracking), prompting the receiver to execute recovery commands from the journal embedded in arriving packets; the S bit in journal headers further aids detection of single-packet losses.
To prevent bandwidth overload, journal size is constrained by RTCP feedback and policy parameters (e.g., j_sec="recj" enables journaling, with limits on history depth), ensuring efficient operation over UDP. This approach delivers reliable transmission comparable to TCP but with lower latency and overhead, suitable for network musical applications.
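The sequence-number handling described above can be sketched as follows. This is a minimal illustration of 16-to-32-bit extension and gap detection, not code from any particular RTP-MIDI implementation; the `SequenceTracker` class is hypothetical:

```python
class SequenceTracker:
    """Track 16-bit RTP sequence numbers, extending them to 32 bits so
    rollovers (65535 -> 0) are not misread as huge gaps."""

    MOD = 1 << 16  # 16-bit sequence number space

    def __init__(self) -> None:
        self.extended = None  # extended highest sequence number seen

    def receive(self, seq16: int) -> int:
        """Return how many packets were lost before this one (0 if in order)."""
        if self.extended is None:
            self.extended = seq16
            return 0
        # Place the 16-bit value into the current 65536-wide cycle.
        cycles = self.extended - (self.extended % self.MOD)
        candidate = cycles + seq16
        # A value far "behind" the highest seen means the counter wrapped.
        if candidate <= self.extended and self.extended - candidate > self.MOD // 2:
            candidate += self.MOD
        lost = candidate - self.extended - 1
        self.extended = max(self.extended, candidate)
        return max(lost, 0)  # duplicates/reordered packets count as 0
```

On detecting a positive gap, a receiver would consult the recovery journal of the arriving packet rather than request retransmission.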

Disconnection Procedures

RTP-MIDI supports both graceful and abrupt disconnection procedures to ensure reliable session termination and resource cleanup. In graceful teardown, a participant sends an RTCP BYE packet to signal its exit from the session, which includes an optional reason code to specify the cause, such as user disconnection or protocol errors. This packet is transmitted unreliably over the control channel, and upon receipt, the receiving peer acknowledges it implicitly by ceasing transmission and closing the RTP and RTCP ports associated with the session. In the AppleMIDI variant, this corresponds to the "End Session" command encoded as the two-byte sequence 0x4259 ('BY'), sent via the control UDP port, mirroring the RTCP BYE structure while integrating with Apple's session management. For abrupt disconnections, such as those caused by network failures or crashes, RTP-MIDI implementations detect inactivity through timeouts on control packets. Receivers monitor for the absence of RTCP packets, including Sender Reports (SR), Receiver Reports (RR), and recovery journals, typically timing out after 5-10 seconds of silence to trigger automatic session closure and prevent indefinite resource holding or stuck MIDI notes. This aligns with the protocol's minimum RTCP transmission interval of 5 seconds for small sessions, allowing prompt detection without excessive delay. In AppleMIDI, the clock synchronization (CK, 0x434B) packets, sent approximately every 60 seconds by the session initiator, provide an additional heartbeat; their prolonged absence reinforces the timeout-based disconnect. Upon disconnection—whether graceful or abrupt—implementations release resources by flushing any buffered MIDI events to avoid artifacts, deregistering the virtual MIDI ports created for the session, and updating the mDNS (Bonjour) service announcement to remove the endpoint from network discovery lists.
AppleMIDI handles multiple concurrent sessions independently, ensuring that a disconnection in one does not propagate to others, with each session maintaining separate port pairs and state. Specific error conditions are conveyed via the BYE packet's reason field, a variable-length text string that may indicate "user disconnected" for manual terminations or "network failure" for connectivity issues, aiding in diagnostics and logging without requiring additional packets.
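A sketch of the End Session packet layout, following the field order commonly used by open-source AppleMIDI implementations (0xFFFF signature, two-byte ASCII command, protocol version, initiator token, SSRC); actual byte layouts should be verified against a packet capture:

```python
import struct

SIGNATURE = 0xFFFF          # all AppleMIDI session packets begin with 0xFFFF
CMD_END_SESSION = 0x4259    # the two ASCII bytes 'BY'
PROTOCOL_VERSION = 2        # version used by Apple's implementation


def build_end_session(initiator_token: int, ssrc: int) -> bytes:
    """Serialize a 16-byte End Session packet for the control UDP port."""
    return struct.pack(">HHIII", SIGNATURE, CMD_END_SESSION,
                       PROTOCOL_VERSION, initiator_token, ssrc)


def parse_command(packet: bytes) -> str:
    """Return the two-letter command code ('IN', 'OK', 'BY', 'CK', ...)."""
    signature, command = struct.unpack(">HH", packet[:4])
    if signature != SIGNATURE:
        raise ValueError("not an AppleMIDI session packet")
    return command.to_bytes(2, "big").decode("ascii")
```

A peer receiving a packet whose command parses to "BY" would then tear down the matching session state for that SSRC.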

Advanced Protocol Features

MIDI Merging

In RTP-MIDI, the merging process occurs at the session receiver, where multiple incoming MIDI streams are combined into a single output stream to maintain compatibility with traditional MIDI 1.0 DIN cable merging. Receivers interleave MIDI commands from these streams based on their RTP timestamps, ensuring proper ordering and timing preservation across the combined output. This timestamp-based approach relies on RTP sequence numbers to detect and reconstruct packet order, preventing out-of-sequence delivery that could disrupt musical performance. The protocol does not include a native command for merging; instead, it is implemented through endpoint logic that processes streams identified by unique Synchronization Source identifiers (SSRCs). A common use case for MIDI merging in RTP-MIDI is in professional studio setups, where multiple controllers—such as keyboards or sequencers—transmit data over a network to feed a single digital audio workstation, enabling collaborative music production without physical cabling. For instance, in network musical performances, participants can share a session where incoming streams from remote devices are seamlessly integrated into the local receiver's MIDI namespace, supporting real-time synchronization via RTCP sender reports. Limitations arise from potential channel conflicts when multiple streams target the same MIDI channels, which can lead to artifacts like stuck notes if not managed properly. Senders mitigate this by partitioning streams—such as assigning distinct channels to separate RTP sessions—while receivers resolve duplicates through filtering based on sequence numbers and SSRCs, though careful configuration is recommended to avoid persistent issues. In AppleMIDI implementations, the Core MIDI framework handles this merging transparently at the system level, presenting the combined stream as a unified virtual MIDI port without requiring application-level intervention.
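Timestamp-ordered merging of per-SSRC streams reduces to an ordered interleave. The sketch below uses hypothetical `(timestamp, ssrc, midi_bytes)` tuples rather than a real Core MIDI API:

```python
import heapq


def merge_streams(*streams):
    """Merge per-SSRC event lists (each already ordered by RTP timestamp)
    into one output stream interleaved by timestamp."""
    return list(heapq.merge(*streams, key=lambda ev: ev[0]))


# Two sources feeding one receiver: a keyboard and a sequencer.
kbd = [(100, 0xAAAA, b"\x90\x3c\x64"), (300, 0xAAAA, b"\x80\x3c\x40")]
seq = [(150, 0xBBBB, b"\x90\x40\x64"), (250, 0xBBBB, b"\x80\x40\x40")]
merged = merge_streams(kbd, seq)  # events ordered 100, 150, 250, 300
```

Since each input list is already timestamp-ordered, `heapq.merge` produces the combined stream in linear time without re-sorting.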

MIDI Splitting and Thru Functionality

RTP-MIDI supports splitting by allowing endpoints to replicate incoming packets across multiple active sessions, ensuring that data from a single source can be distributed to numerous destinations without loss of data. This replication process preserves the original RTP timestamps embedded in the packets, which are crucial for maintaining timing accuracy and order of delivery across the network. The thru functionality in RTP-MIDI emulates the behavior of traditional physical thru ports, where incoming data is forwarded to additional outputs without alteration or processing, enabling seamless passthrough in networked environments. Within a single session, this occurs automatically as messages from one participant are duplicated and broadcast to all other connected devices, functioning as a virtual thru box. Implementation of splitting and thru relies on virtual MIDI ports created by the operating system's RTP-MIDI driver or dedicated router software, which handle the duplication and routing logic. For instance, endpoints can fan out to four or more outputs by participating in multiple concurrent sessions, with each session treated independently to direct replicated streams. This approach allows a single incoming MIDI stream to be fanned out across diverse network destinations, such as multiple hardware interfaces or software applications. At the protocol level, RTP-MIDI provides no dedicated commands for splitting or thru; instead, these features emerge from the underlying multi-session management capabilities of the AppleMIDI session protocol layered atop RTP. Endpoints manage replication by joining multiple sessions simultaneously, using session identifiers and port assignments to segregate traffic without dedicated signaling.
In networked setups, this functionality facilitates daisy-chaining of MIDI devices over Ethernet or Wi-Fi, where intermediate endpoints can filter and replicate streams to downstream participants while preventing feedback loops through selective session participation and port isolation. This contrasts with MIDI merging, which combines inputs from multiple sources into a unified stream.
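A thru/splitter endpoint of this kind boils down to fan-out logic that skips the originating session to avoid immediate feedback. The sketch below is illustrative only; the `ThruRouter` class and its send callbacks are hypothetical, not part of any RTP-MIDI library:

```python
from typing import Callable, Dict


class ThruRouter:
    """Replicate each incoming MIDI packet to every active session except
    the one it arrived from, emulating a hardware thru box."""

    def __init__(self) -> None:
        # Map each session's SSRC to a callable that transmits to that peer.
        self.sessions: Dict[int, Callable[[bytes], None]] = {}

    def add_session(self, ssrc: int, send: Callable[[bytes], None]) -> None:
        self.sessions[ssrc] = send

    def forward(self, from_ssrc: int, packet: bytes) -> int:
        """Send the packet to all other sessions; return copies sent."""
        count = 0
        for ssrc, send in self.sessions.items():
            if ssrc != from_ssrc:  # never echo back to the sender
                send(packet)
                count += 1
        return count
```

Excluding the originating SSRC is the minimal loop-prevention rule; larger topologies also rely on selective session participation, as noted above.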

Distributed Patchbay Concept

The distributed patchbay concept in RTP-MIDI envisions the IP network as a flexible matrix of virtual MIDI cables, where endpoints dynamically connect and route data without physical interconnections. Devices advertise their virtual ports through Bonjour on the _apple-midi._udp service, enabling session invitations that establish bidirectional streams. Each session functions as a virtual cable pair, supporting up to 16 such pairs per endpoint in typical implementations, allowing users to patch sources to destinations across the network as if using a traditional hardware patchbay. This model leverages the protocol's UDP-based control and RTP payload channels to create on-demand connections, transforming scattered devices into an interconnected ecosystem. A key benefit of this approach is its scalability for expansive setups, accommodating dozens to over 100 devices in large installations by distributing routing logic across participants rather than requiring centralized hardware. It significantly reduces cabling complexity, as Ethernet or Wi-Fi infrastructure handles long-distance transmission, supporting runs up to hundreds of meters without signal degradation. In contrast to legacy MIDI 1.0 systems limited by daisy-chaining and single-cable constraints, RTP-MIDI's virtual patching minimizes setup time and physical clutter, making it suitable for mobile or venue-based applications. The protocol enables this distributed patching through multi-session support, where a single endpoint can maintain concurrent connections to multiple peers using unique session and SSRC identifiers for isolation. Automatic discovery and invitation sequences allow ad-hoc reconfiguration, with data automatically merged from incoming sessions at the receiver or split to outgoing ones, building on endpoint-level operations like thru functionality.
This facilitates seamless integration in heterogeneous networks, where devices join or leave without disrupting existing routings, provided the underlying IP topology remains stable. Representative examples include large-scale live performance networks, such as theater productions, where a central control station dynamically routes MIDI from a conductor's control surface to distributed instrument sections across a venue, ensuring synchronized playback via redundant virtual paths. In such configurations, technicians can re-patch streams in real time—e.g., redirecting clock signals to backup devices—leveraging the protocol's timestamping for precise timing. Despite these advantages, the concept has limitations in highly complex topologies, often relying on a dedicated central router or hub device to aggregate and manage connections for stability and manageability. The protocol provides no native arbitration for simultaneous data streams from multiple sources, leaving resolution to endpoint merging logic, which may introduce variability in large-scale scenarios without additional network optimizations.

Implementations

Desktop Operating Systems

RTP-MIDI support on macOS is provided natively through the Audio MIDI Setup utility, which has included network MIDI capabilities since Mac OS X 10.4. Users configure sessions via the MIDI Studio window, enabling driverless connections for up to eight concurrent RTP-MIDI sessions without requiring additional software. This built-in implementation leverages Bonjour for discovery and supports seamless integration with Core MIDI applications, allowing MIDI data to be routed between networked Macs over Ethernet or Wi-Fi. On Windows, the Windows MIDI Services framework, introduced in updates starting from 2023, enables multiclient access and enhanced MIDI 1.0/2.0 functionality, including support for Network MIDI 2.0 (UDP) as of previews in 2025; RTP-MIDI itself, however, requires third-party tools like rtpMIDI for protocol handling and network sessions. For bridging local applications to network sessions, tools like loopMIDI create virtual ports that interconnect with RTP-MIDI drivers, facilitating routing within digital audio workstations. Linux implementations rely on open-source daemons such as rtpmidid, whose version 24.12 (released December 2024) provides robust functionality with integration into the ALSA sequencer for sharing devices over networks. This daemon supports low-latency operation and can connect local audio servers such as JACK to remote peers via its built-in router, without kernel-level drivers. Configuration occurs via command-line interfaces or INI files, emphasizing user-space operation for compatibility across distributions, and rtpmidid coexists with emerging Network MIDI 2.0 support in Linux distributions. Across desktop platforms, RTP-MIDI operates driverlessly in user space, promoting portability, though Windows and Linux typically require applications such as the rtpMIDI driver (Windows) or rtpmidid (Linux) to establish sessions, in contrast to macOS's out-of-the-box availability.

Mobile and Embedded Systems

RTP-MIDI has been natively supported on iOS devices through the Core MIDI framework since iOS 4.2, released in 2010, enabling seamless integration with Wi-Fi networks for MIDI transport without additional hardware. This implementation allows iOS apps to discover and connect to RTP-MIDI sessions, facilitating wireless control between iPhones, iPads, and other compatible devices. For session management and pairing, third-party applications such as MIDI Network provide user-friendly interfaces to view active connections, create sessions, and handle device discovery via Bonjour, ensuring reliable interoperability with macOS or other RTP-MIDI endpoints. Similarly, apps like NetMIDI extend this capability by allowing iOS devices to join existing network sessions initiated from other platforms. On Android, RTP-MIDI support relies primarily on third-party libraries, as there is no built-in native implementation in the Android MIDI API, which has focused on USB and Bluetooth LE transports since Android 6.0. Notable libraries include nmj (Network MIDI for Java), an open-source solution that provides RTP-MIDI compatibility with Apple's protocol for apps requiring network MIDI functionality. Apps such as Midi Connector leverage these libraries to provide RTP-MIDI sessions over Wi-Fi, broadcasting devices via Bonjour and supporting multiple connections, though limited to one free session without licensing. Native network MIDI enhancements remain prospective, with Android 13 introducing MIDI 2.0 support for USB but deferring comprehensive network protocols like RTP-MIDI or the emerging Network MIDI 2.0 (UDP) to future updates as of November 2025. In embedded systems, RTP-MIDI operates driverlessly on Wi-Fi-enabled devices, leveraging lightweight daemons to bridge ALSA sequencers or other MIDI interfaces to the network.
For instance, the open-source rtpmidid daemon on a Raspberry Pi allows sharing local MIDI devices over RTP-MIDI, enabling the Pi to act as a session host or client compatible with iOS and macOS endpoints without proprietary drivers. This setup supports headless operation, where the device auto-discovers peers via Bonjour and maintains sessions over Ethernet or Wi-Fi, making it suitable for compact IoT music controllers or networked synthesizers. Mobile and embedded RTP-MIDI implementations face constraints from limited battery life and processing power, particularly on resource-constrained devices where continuous network polling and error correction can drain resources. To mitigate CPU overhead, optimizations such as reducing the frequency of recovery journal updates—RTP-MIDI's mechanism for loss detection and recovery—help lower computational demands while preserving low-latency performance. On battery-powered systems like smartphones or single-board computers, disabling or minimizing journaling prevents excessive power and bandwidth usage, though this trades some resiliency for efficiency in stable networks. iOS 18, released in 2024, enhances accessory pairing via an API allowing apps to offer AirPods-like setup for third-party Bluetooth LE devices, including potential MIDI controllers, while Core MIDI supports independent use of Bluetooth LE and RTP-MIDI. Hybrid setups may require apps to route between Bluetooth and network MIDI.

Software Libraries and Frameworks

Several software libraries and frameworks enable developers to integrate RTP-MIDI functionality into applications, providing APIs for managing network sessions, transmitting MIDI data packets, and handling events across various programming languages and platforms. These tools abstract the underlying protocol complexities, allowing for custom implementations in desktop, web, and embedded environments. In Java, the rtp-midi library offers a comprehensive implementation of the RTP-MIDI protocol, supporting session initiator and listener roles, and integration with the standard javax.sound.midi package for seamless MIDI I/O operations. This third-party library facilitates creating and joining RTP-MIDI sessions, as well as sending and receiving MIDI packets over UDP, with built-in support for protocol journals to maintain state during network disruptions. Developers can utilize event callbacks to process incoming MIDI messages in real time, making it suitable for audio applications on JVM-based platforms. For JavaScript environments, particularly in browser and Node.js contexts, implementations like WEBrtpMIDI provide RTP-MIDI transmission capabilities, bridging web-based interfaces with network sessions. This library allows web applications to act as RTP-MIDI endpoints, receiving notes from connected sessions and forwarding them via RTP packets for peer-to-peer connectivity in browser scenarios. In Node.js, the rtpmidi package serves as both a session initiator and listener, enabling server-side applications to handle MIDI over networks with asynchronous event handling for efficient packet processing. Recent updates to the rtpmidi package have enhanced compatibility with modern Node.js versions, introducing improved async support for better performance with concurrent streams.
Cross-platform development benefits from libraries such as libRtpMidi, a portable C++ SDK compatible with Windows, macOS, Linux, and embedded systems, which exposes API calls for session creation, joining, and MIDI packet transmission and reception. For Python, the pymidi library implements RTP-MIDI and AppleMIDI protocols, allowing developers to build virtual network MIDI devices with features like event callbacks for real-time message handling. These frameworks typically include methods for error recovery via journals and support disconnection procedures, ensuring robust integration without delving into low-level protocol details. They also integrate with OS-level MIDI systems, extending native support in desktop audio setups.

Hardware Devices and Projects

RTP-MIDI has been implemented in various hardware projects, particularly those leveraging embedded platforms for musical applications. One prominent example is the Arduino ecosystem, where the AppleMIDI library enables devices equipped with Ethernet shields or Wi-Fi modules, such as the ESP32 or ESP8266, to participate in RTP-MIDI sessions over IP networks. This library supports sending and receiving MIDI messages within RTP packets, facilitating the creation of wireless MIDI controllers and networked instruments without proprietary hardware. For instance, Arduino-based sketches can integrate RTP-MIDI with IoT applications, allowing remote control of synthesizers or effects processors via Ethernet or Wi-Fi, as demonstrated in community examples that connect devices to software like the rtpMIDI driver for Windows for cross-platform compatibility. The MIDIbox project, an open-source initiative for DIY modular synthesizers, incorporates RTP-MIDI support through firmware running on MIOS32-based hardware. This implementation utilizes the KissBox RTP-MIDI OEM board to transport MIDI data over Ethernet, enabling high-speed communication and integration with existing MIDIbox modules like sequencers and I/O expanders. Key features include automatic merging and splitting across multiple devices, connectivity between MIDIbox units, and compatibility with up to eight simultaneous sessions and 16 virtual cables, making it suitable for complex modular setups in live performance environments. In professional audio contexts, RTP-MIDI extensions over Audio Video Bridging (AVB) and Time-Sensitive Networking (TSN) provide deterministic low-latency transport, addressing the timing requirements of synchronized systems. AVB, standardized under IEEE 802.1, allows RTP-MIDI payloads to leverage reserved bandwidth and precise time synchronization for pro audio workflows, reducing jitter in networked MIDI streams compared to standard Ethernet.
This approach has been explored in hardware designs that combine RTP-MIDI with AVB switches, enabling reliable MIDI distribution in larger installations while maintaining sub-millisecond latency bounds essential for real-time applications.

Performance and Configuration

Latency Factors

Latency in RTP-MIDI transmissions arises from multiple sources, including processing at the application level, encapsulation within the IP stack, propagation through the network, and buffering to handle jitter. These components contribute to the overall latency, which must be minimized for real-time musical applications such as synchronized performances. Application processing introduces delay during MIDI message parsing and handling, particularly in digital audio workstations (DAWs) where incoming RTP-MIDI packets are decoded and routed to virtual instruments or sequencers. Latencies for this stage vary depending on hardware, software implementation, and MIDI event complexity. The IP stack contributes additional overhead through UDP and RTP encapsulation, where payloads are wrapped with headers, timestamps, and sequence numbers before transmission. This process adds latency that depends on the operating system's networking optimizations. Network routing forms a significant variable component, with each hop adding delay based on router processing and link speeds; in local area networks (LANs), this is often minimal for direct connections but increases with switches or gateways. Wireless environments exacerbate variability due to contention, signal interference, and retransmissions, which can disrupt precise timing in MIDI streams. End-to-end latency in RTP-MIDI over a typical LAN aggregates these factors, often resulting in delays that require adjustments like buffer tuning for applications needing tight synchronization with audio. This total can be conceptually expressed as: total latency = app_time + stack_time + network_time + buffer_jitter, where buffer jitter accounts for playout delays to reorder packets and absorb network variability.
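The decomposition above can be illustrated with a trivial numeric sketch; all component figures are assumed for illustration, not measured values:

```python
def total_latency_ms(app_ms: float, stack_ms: float,
                     network_ms: float, buffer_jitter_ms: float) -> float:
    """Sum the latency components described above, all in milliseconds."""
    return app_ms + stack_ms + network_ms + buffer_jitter_ms


# Hypothetical figures for a direct LAN connection:
latency = total_latency_ms(app_ms=0.5, stack_ms=0.2,
                           network_ms=0.3, buffer_jitter_ms=1.0)
```

With these assumed inputs the budget is dominated by the jitter buffer, which is why buffer tuning (discussed under Optimization Techniques) is usually the first lever for tightening synchronization.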

Network Configuration Options

RTP-MIDI sessions utilize UDP for RTP data transmission and RTCP control packets, with ports negotiated via SDP; examples in RFC 4696 use port 5004 for RTP and 5005 for RTCP. To ensure low-latency performance, network administrators can apply Quality of Service (QoS) policies that prioritize RTP-MIDI's UDP packets using Differentiated Services Code Point (DSCP) markings in the IP header. For real-time applications like RTP-MIDI, a common recommendation is to set the DSCP value to EF (Expedited Forwarding, decimal 46), which enables network devices to provide preferential treatment such as reduced delay and jitter. This marking aligns with broader RTP guidelines for audio and control traffic, helping to mitigate congestion in shared networks. Firewall configurations must permit inbound traffic on the negotiated ports to support RTP-MIDI functionality. For wide-area network (WAN) access across NAT routers, configure port forwarding rules to map external ports to the internal device's IP address and corresponding ports, ensuring bidirectional communication. On Windows systems, add exceptions in the Windows Firewall for the relevant UDP ports to prevent blocking of RTP-MIDI traffic. When deploying RTP-MIDI over wireless networks, prefer 5 GHz bands over 2.4 GHz to reduce interference from other devices and achieve lower latency, as the higher band offers better signal-to-noise ratios in typical environments. Additionally, disable power-saving modes on client devices and access points, such as Wi-Fi Multimedia (WMM) power save or adapter sleep settings, to maintain consistent packet delivery without delays from doze cycles. For Linux-based systems implementing AppleMIDI compatibility, the avahi-daemon service provides mDNS/Bonjour discovery, enabling integration with tools like rtpmidid for bridging ALSA sequencers to network sessions.
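Applying the EF marking from an application can be sketched as follows: the 6-bit DSCP value occupies the upper bits of the legacy TOS byte, so EF (46) becomes 0xB8 on the wire. The peer address in the comment is hypothetical:

```python
import socket

DSCP_EF = 46            # Expedited Forwarding per RFC 3246
TOS_EF = DSCP_EF << 2   # DSCP sits in the top 6 bits of the TOS byte -> 0xB8

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, TOS_EF)
# Outgoing datagrams now carry the EF marking, e.g. (hypothetical peer):
# sock.sendto(rtp_midi_packet, ("192.0.2.10", 5004))
```

Note that routers only honor the marking if the network's QoS policy trusts DSCP values from end hosts; otherwise the marking must be applied at a managed switch.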

Optimization Techniques

Optimization techniques for RTP-MIDI aim to minimize end-to-end latency in scenarios requiring precise timing, such as live performances or synchronized ensemble playing, by addressing network variability and processing delays. Jitter buffer tuning is a primary method to balance latency and reliability in RTP-MIDI receivers. On stable, low-jitter networks like dedicated LANs, buffers can be small to achieve low total latency, while higher-jitter environments may require larger buffers to prevent packet reordering issues. Implementations often include adjustable parameters allowing musicians to trade latency for robustness against network fluctuations. Dynamic adjustment leverages RTCP feedback reports, which provide statistics on inter-arrival jitter and packet loss, enabling the receiver to adapt buffer size in real time—for instance, shrinking it during periods of low variance to reduce delay. For sub-millisecond enhancements, RTP-MIDI employs timestamp interpolation and local clock synchronization to refine event timing beyond raw packet arrival. Senders apply RTP timestamps based on a constant-rate clock, stamping MIDI events with their intended execution times to maintain uniform increments and minimize perceived jitter even if packets arrive out of order. Receivers interpolate timestamps against their local clock, synchronized via NTP or PTP, to predict and correct for minor drifts, achieving timing accuracy within microseconds on low-latency links. Integration with Audio Video Bridging (AVB) further bounds latency by reserving bandwidth and shaping traffic on Ethernet networks, guaranteeing maximum delays of 2 ms for Class A streams across multiple hops, which suits RTP-MIDI's UDP-based transport for deterministic performance. Software tweaks enhance RTP-MIDI processing efficiency, particularly on Linux systems. Employing a real-time kernel, such as one with the PREEMPT_RT patches, reduces scheduling latencies by prioritizing time-sensitive tasks, improving responsiveness for RTP-MIDI daemons like rtpmidid that bridge ALSA sequencers to network streams.
In fallback scenarios using TCP for reliability over lossy networks, disabling Nagle's algorithm via the TCP_NODELAY socket option prevents small-packet buffering, cutting additional delays for interactive MIDI flows. Hardware choices focus on network infrastructure to suppress variability: wired Ethernet outperforms Wi-Fi by delivering consistent low latencies per hop and avoiding wireless interference, and dedicating VLANs to MIDI traffic enables QoS prioritization, ensuring RTP packets bypass non-critical data and maintain bounded delays in shared environments. Measured improvements demonstrate these techniques' impact: on high-performance WANs like CalREN2, optimized RTP-MIDI achieves median latencies of 2.1 ms without buffering, while AVB-enhanced Ethernet setups reduce total delays to under 1 ms across 7 hops in controlled tests.
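Disabling Nagle's algorithm for such a TCP fallback is a one-line socket option; a minimal sketch (the helper function name is illustrative):

```python
import socket

def low_latency_tcp_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled, so each small
    MIDI message is sent immediately instead of being coalesced into
    larger segments."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

Without TCP_NODELAY, a three-byte Note On can sit in the send buffer waiting for an ACK of prior data, adding delay that is easily audible in interactive playing.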

Compatibility and Future Directions

Integration with MIDI 2.0

RTP-MIDI, originally designed for transporting MIDI 1.0 messages, maintains backward compatibility by encapsulating these payloads within Real-time Transport Protocol (RTP) packets, allowing seamless operation with legacy devices and software. While custom implementations or proposed extensions may allow RTP-MIDI to transport Universal MIDI Packets (UMPs)—the standardized container for both MIDI 1.0 and MIDI 2.0 messages—no official standard exists as of November 2025 for carrying enhanced MIDI 2.0 data over RTP-MIDI sessions. RTP-MIDI natively supports MIDI 1.0 system exclusive messages, enabling transport of MIDI 2.0 features like property exchange via MIDI Capability Inquiry (MIDI CI), which allows devices to query and configure capabilities such as relative controls for encoders, ensuring interoperability in mixed 1.0 and 2.0 environments. On Windows, the Windows MIDI Services update in 2025 bridges RTP-MIDI connections to MIDI 2.0 ports through its UMP-centric architecture, translating incoming RTP payloads (MIDI 1.0) to UMP format for native MIDI 2.0 processing and multi-client access. However, RTP-MIDI lacks native support for MIDI 2.0's advanced timing and synchronization features, relying instead on application-layer conversions to approximate those behaviors within its RTP-based clocking system.

Relation to Network MIDI 2.0

Network MIDI 2.0 (NM 2.0) is a UDP-based transport specification developed by the MIDI Association to enable the direct transmission of Universal MIDI Packets (UMPs) over IP networks, supporting both MIDI 1.0 and MIDI 2.0 protocols. Ratified in November 2024 and officially introduced at the 2025 NAMM Show, the standard simplifies network connectivity for MIDI devices via Ethernet and Wi-Fi, with initial compatible hardware becoming available in 2025. Unlike RTP-MIDI, which relies on the Real-time Transport Protocol (RTP) and Real-time Transport Control Protocol (RTCP) for packet delivery and feedback, NM 2.0 uses lightweight UDP packets prefixed with a 4-byte "MIDI" signature, eliminating the additional headers and processing overhead associated with RTP/RTCP. This design prioritizes efficiency for local area networks, avoiding the session establishment and congestion control mechanisms inherent in RTP-MIDI. Key differences between NM 2.0 and RTP-MIDI include port allocation and discovery processes. NM 2.0 employs dynamic UDP ports, with hosts allocating a shared port for incoming connections and clients using unique ports per session, rather than RTP-MIDI's fixed port usage (e.g., 5004). Device discovery in NM 2.0 leverages multicast DNS (mDNS) with DNS Service Discovery (DNS-SD) for efficient peer detection on local networks, contrasting with RTP-MIDI's more involved session invitation and negotiation. Furthermore, NM 2.0 achieves lower latency—typically under 1 ms on Ethernet and under 5 ms on high-quality wireless LAN—due to its direct UDP approach and minimal session management, making it suitable for timing-critical applications without the buffering delays of RTP. Reliability is maintained through optional forward error correction (FEC) and retransmission requests, rather than RTP's continuous feedback loops.
RTP-MIDI functions as a bridge technology, providing backward compatibility for existing implementations and third-party tools, while NM 2.0 emerges as the MIDI Association's recommended standard for new developments. Although RTP-MIDI could theoretically encapsulate UMP payloads from NM 2.0, the added RTP complexity offers no advantages in modern, low-latency local networks, potentially hindering adoption. In terms of operating system support, Microsoft's Windows MIDI Services update, released in preview on February 5, 2025, integrates full MIDI 2.0 support—including UMP handling and network transport capabilities—alongside continued RTP-MIDI compatibility, facilitating dual-protocol environments. NM 2.0 has seen early hardware integration, with synthesizers and infrastructure devices demonstrated at NAMM 2025, signaling its role in upcoming deployments. The transition from RTP-MIDI to NM 2.0 positions the former for legacy systems and cross-platform bridging, while the latter is poised to dominate future IP MIDI implementations due to its simplicity, reduced overhead, and alignment with MIDI 2.0's Universal MIDI Packet framework. This evolution ensures seamless payload compatibility—such as UMPs for enhanced resolution and bidirectional communication—without requiring protocol overhauls in existing MIDI 2.0 integrations.

Transition Challenges and Adoption

One major challenge in transitioning from RTP-MIDI to Network MIDI 2.0 stems from compatibility limitations, as RTP-MIDI primarily supports MIDI 1.0 messages and lacks native handling for MIDI 2.0's Universal MIDI Packet (UMP) format, which enables advanced features like high-resolution continuous controllers and bidirectional communication. This requires gateways or converters to bridge full MIDI 2.0 functionality, potentially introducing latency or complexity in mixed environments. Additionally, developers and users must retrain on Network MIDI 2.0's discovery mechanisms, which differ from RTP-MIDI's session-based approach using the Real-time Transport Protocol, as the newer standard employs simpler UDP-based peer-to-peer connections without built-in encryption in its initial version. Adoption of RTP-MIDI remains stable and dominant in network MIDI applications, particularly within the Apple ecosystem, where it is natively integrated into macOS and iOS, contributing to a slow shift toward Network MIDI 2.0 despite the latter's ratification in November 2024. As of 2025, Network MIDI 2.0 device support has grown from zero compatible products in 2024 to several initial releases showcased at the 2025 NAMM Show, including interfaces from manufacturers like Bome, MusicKraken, and Kissbox. This gradual uptake is hindered by the need for OS-level updates, with third-party drivers still necessary for non-Apple platforms. To address these barriers, solutions such as hybrid drivers have emerged, exemplified by Microsoft's Windows MIDI Services Customer Preview released in early 2025, which provides backward compatibility for MIDI 1.0 devices (including RTP-MIDI via existing APIs) while enabling MIDI 2.0 support over network transports. Open-source converters and general-purpose connector applications further facilitate transitions by allowing RTP-MIDI sessions to interface with Network MIDI 2.0 endpoints, minimizing disruptions for legacy setups.
Looking ahead, Network MIDI 2.0 is projected to see broader implementation by 2027, with anticipated OS integrations such as Linux's ALSA enhancing its viability for low-latency connections. Meanwhile, RTP-MIDI is expected to persist in embedded systems and legacy Apple-centric workflows, ensuring continuity for established users even as Network MIDI 2.0 gains traction for new MIDI 2.0 deployments.

Adoption and Use Cases

Companies and Projects

Apple has been the primary developer and provider of RTP-MIDI, originally known as AppleMIDI, integrating it into the Core MIDI framework since Mac OS X 10.4 and iOS 4.0. This enables seamless MIDI data sharing over Ethernet and WiFi networks through the Audio MIDI Setup application, where users can configure network sessions to connect remote MIDI devices without additional hardware. The protocol's support extends to professional applications that leverage Core MIDI for network MIDI input and output. The open-source project davidmoreno/rtpmidid provides an RTP-MIDI daemon for Linux systems, enabling ALSA sequencer devices to share MIDI over networks and to import remote RTP-MIDI sessions via mDNS discovery. Actively maintained, it saw releases including version 23.12 in December 2024, supporting cross-platform MIDI networking. The MIDI Association references RTP-MIDI extensively in its documentation as the foundational IETF standard (RFC 6295) for transporting MIDI 1.0 over IP networks, and as a precursor to the Network MIDI 2.0 specification, which builds on RTP principles for enhanced reliability and MIDI 2.0 compatibility. In the Arduino community, the AppleMIDI library enables RTP-MIDI implementation on WiFi-capable boards and on boards with Ethernet shields, allowing users to send and receive MIDI packets over WiFi or Ethernet for custom networked controllers and devices. Community sketches using this library range from simple MIDI bridges to full instrument interfaces. Notable deployments of RTP-MIDI include live sound reinforcement at festivals and events, such as a 2024 ILDA Award-winning laser show, where RTP-MIDI was used alongside MIDI Time Code (MTC) to align multiple laser projectors with audio in real-time performances.
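Implementations like those above all produce the same wire format: a standard 12-byte RTP header followed by a MIDI command section. A simplified Python sketch of that framing under RFC 6295's short-form header, assuming an empty recovery journal and a dynamic payload type of 97 (the payload type is negotiated per session, so 97 is an assumption here, as are the function names):

```python
import struct

def rtp_midi_packet(seq, timestamp, ssrc, midi_commands, payload_type=97):
    """Frame raw MIDI commands as a minimal RTP-MIDI packet:
    12-byte RTP header, then a one-byte MIDI command section header
    (flag bits B=J=Z=P all zero, 4-bit length) and the MIDI list."""
    if len(midi_commands) > 15:
        raise ValueError("long-form LEN header not implemented in this sketch")
    rtp_header = struct.pack(">BBHII",
                             0x80,                  # V=2, no padding/ext/CSRC
                             payload_type & 0x7F,   # marker bit clear
                             seq & 0xFFFF,
                             timestamp & 0xFFFFFFFF,
                             ssrc)
    midi_section = bytes([len(midi_commands)]) + midi_commands
    return rtp_header + midi_section

# A single Note On framed for the wire (values hypothetical).
pkt = rtp_midi_packet(seq=1, timestamp=0, ssrc=0x12345678,
                      midi_commands=bytes([0x90, 0x3C, 0x64]))
```

A real implementation would also set the J flag and append the recovery journal described in the article, which is what lets a receiver regenerate lost packets.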

Real-World Applications

RTP-MIDI enables remote collaboration among musicians by transmitting MIDI data over IP networks, allowing digital audio workstations (DAWs) to synchronize in real time across local area networks (LANs) or wide area networks (WANs). This capability supports virtual bands and distributed performances, where participants exchange musical gestures without geographical constraints, fostering creative workflows in home studios or global ensembles. In live performances, RTP-MIDI powers wireless stage setups by routing MIDI signals to in-ear monitoring, lighting controllers, and playback systems, eliminating the need for extensive cabling while maintaining reliable timing. Musicians can trigger sequences and effects from various positions, enhancing mobility and reliability during tours and concerts. Educational environments leverage RTP-MIDI for classroom networks that connect shared MIDI controllers, enabling instructors to stream live demonstrations to students' devices with minimal latency. This setup facilitates interactive lessons in which learners can mirror performances or collaborate on compositions virtually, bridging physical and remote music instruction. In professional audio production, RTP-MIDI integrates with Audio Video Bridging (AVB) systems in theaters, delivering low-latency MIDI routing for coordinated audio playback, lighting cues, and video synchronization. This combination ensures precise timing in complex installations, supporting seamless operation across networked devices in live venues. Emerging uses of RTP-MIDI extend to Internet of Musical Things (IoMusT) applications, such as networked control of digital musical devices, where research as of September 2025 proposes enhancements such as encrypted extensions to address security challenges in UDP transmission for sound generation and remote interaction.
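The timing these applications depend on comes from clock synchronization; in the AppleMIDI variant this is a three-timestamp "CK" exchange (initiator sends its time t1, responder replies with its time t2, initiator notes its receive time t3), from which latency and clock offset can be estimated. A hedged sketch of that arithmetic, with hypothetical timestamp values:

```python
def estimate_clock_sync(t1, t2, t3):
    """Estimate one-way network latency and the remote clock's offset
    relative to the local clock from an AppleMIDI CK exchange.
    t1: local send time, t2: remote reply time, t3: local receive time,
    all expressed in the same time unit."""
    latency = (t3 - t1) / 2          # assumes a symmetric network path
    offset = t2 - (t1 + t3) / 2      # remote clock minus local midpoint
    return latency, offset

# Hypothetical exchange: round trip of 40 units, remote clock 4000 ahead.
latency, offset = estimate_clock_sync(1000, 5020, 1040)
```

Sessions repeat this exchange periodically, so each side can translate the other's RTP timestamps into its own timebase and schedule MIDI events with consistent timing.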
