RTP-MIDI
| International standard | IETF RFC 6295 |
|---|---|
| Developed by | UC Berkeley |
RTP-MIDI (also known as AppleMIDI) is a protocol for transporting MIDI messages within Real-time Transport Protocol (RTP) packets over Ethernet and WiFi networks. It is completely open and royalty-free (no license is needed), and is suited to both LAN and WAN applications. Compared to MIDI 1.0, RTP-MIDI adds features such as session management, device synchronization and detection of lost packets, with automatic regeneration of lost data. RTP-MIDI is suitable for real-time applications, and supports sample-accurate synchronization for each MIDI message.
History of RTP-MIDI
In 2004, John Lazzaro and John Wawrzynek, from UC Berkeley, gave a presentation at an AES conference titled "An RTP payload for MIDI".[1] In 2006, the document was submitted to the IETF and became RFC 4695.[2] In parallel, Lazzaro and Wawrzynek released another document giving details about the practical implementation of the RTP-MIDI protocol, especially the journalling mechanism.[3]
RFC 4695 was obsoleted by RFC 6295 in 2011. The protocol did not change between the two versions of the document; RFC 6295 corrects errors found in RFC 4695.[4]
The MIDI Manufacturers Association (MMA) has created a page on its website providing basic information about the RTP-MIDI protocol.[5]
AppleMIDI
Apple Computer introduced RTP-MIDI as part of their operating system, Mac OS X v10.4, in 2005. The RTP-MIDI driver is reached through the Network icon in the Audio MIDI Setup tool. Apple's implementation strictly follows RFC 4695 for the RTP payload and journalling system, but uses a dedicated session management protocol rather than the one proposed in RFC 4695. This protocol is displayed in Wireshark as "AppleMIDI" and was later documented by Apple.
Apple also created a dedicated class in their mDNS/Bonjour implementation. Devices which comply with this class appear automatically in Apple's RTP-MIDI configuration panel as the Participants directory, making the Apple MIDI system fully 'Plug & Play'. However, it is possible to manually enter IP addresses and ports in this directory to connect to devices which do not support Bonjour.
Apple also introduced RTP-MIDI support in iOS 4, but iOS devices cannot act as session initiators.
The RTP-MIDI driver from Apple creates virtual MIDI ports named "Sessions", which are available through CoreMIDI to any software, such as sequencers or software instruments. They appear as a pair of MIDI IN / MIDI OUT ports, like any other MIDI 1.0 port or USB MIDI port.
Implementations
Embedded devices
In 2006, the Dutch company Kiss-Box presented the first embedded implementation of RTP-MIDI, in products such as MIDI and LTC interfaces.[6] These devices follow the AppleMIDI implementation, using the same session management protocol, in order to be compatible with the other devices and operating systems using this protocol.
A proprietary driver was initially developed by the company for Windows XP, but it was restricted to communication with their own devices; it was not possible to connect a PC to a Mac using this driver. Support for this driver was dropped in 2012 in favor of the standard approach, when the rtpMIDI driver for Windows became available.
In 2012, Kiss-Box released a new generation of CPU boards, named "V3", which support session initiator functionality. These boards are able to establish sessions with other RTP-MIDI devices without requiring a computer as a control point.
During NAMM 2013, the Canadian company iConnectivity presented a new interface named iConnectivityMIDI4+ which supports RTP-MIDI and allows direct bridging between USB and RTP-MIDI devices. They have since followed up with several other RTP-MIDI capable interfaces, including the mio4 and mio10, and the PlayAUDIO 12.
Windows
In 2010, Tobias Erichsen released a Windows implementation of Apple's RTP-MIDI driver.[7] This driver works under XP, Vista, Windows 7, Windows 8 and Windows 10, in 32- and 64-bit versions.[8] The driver uses a configuration panel very similar to Apple's, and is fully compliant with Apple's implementation. It can therefore be used to connect a Windows machine not only to a Macintosh computer, but also to embedded systems. As with Apple's driver, the Windows driver creates virtual MIDI ports, which become visible to any MIDI application running on the PC. Access is done through the mmsystem layer, like all other MIDI ports.
Linux
RTP-MIDI support for Linux was reactivated in February 2013 after an idle period. The availability of drivers has been announced on some forums, based on the original work of Nicolas Falquet and Dominique Fober.[9][10]
A specific (but incomplete) implementation for the Raspberry Pi, called raveloxmidi, is also available.[11] See rtpmidid below for a full implementation.
A full implementation of RTP-MIDI (including the journalling system) is available within the Ubuntu distribution, in the Scenic software package.[12]
There is a newer implementation, rtpmidid,[13] that integrates seamlessly with the ALSA sequencer, allowing the use of tools like QjackCtl to control the connections. This implementation is also available for ARM64, which means it works on Raspberry Pi computers.
iOS
Apple added full CoreMIDI support to their iOS devices in 2010, allowing the development of MIDI applications for the iPhone, iPad and iPod Touch. MIDI became available from the docking port in the form of a USB controller, allowing connection of USB MIDI devices through the Apple Camera Connection Kit. It was also available in the form of an RTP-MIDI session listener over WiFi.
iOS devices do not support session initiation, so an external session initiator on the network must open the RTP-MIDI session with the iOS device. This initiator can be a Mac, a Windows computer with the RTP-MIDI driver activated, or an embedded RTP-MIDI device. The RTP-MIDI session appears under the name "Network MIDI" to all CoreMIDI applications on iOS, and no specific development is required to add RTP-MIDI support to an iOS application. The MIDI port is virtualized by CoreMIDI, so the programmer simply opens a MIDI connection, regardless of whether the port is connected to USB or RTP-MIDI.
Some complaints arose about the use of MIDI over USB with iOS devices,[14] since the iPad/iPhone must supply power to the external device. Some USB MIDI adapters draw more current than the iPad allows; the iPad then limits the current and blocks the startup of the device, which never appears as available to the application. This problem is avoided by using RTP-MIDI.
JavaScript
Since June 2013, a JavaScript implementation of RTP-MIDI, created by J. Dachtera, has been available as an open-source project.[15] The source code is based on Apple's session management protocol, and can act as both session initiator and session listener.
Java
Cross-platform Java implementations of RTP-MIDI exist, notably the 'nmj' library.[16]
WinRT
The WinRTP-MIDI project[17] is an open-source implementation of the RTP-MIDI protocol stack under Windows RT. The code was initially designed to be portable across the various versions of Windows, but the latest version has been optimized for WinRT in order to simplify the design of applications for the Windows Store.
Arduino
RTP-MIDI became available for the Arduino platform in November 2013, under the name "AppleMIDI library".[18] The software module can run either on Arduino modules with an integrated Ethernet adapter, like the Intel Galileo, or on the "Ethernet shield".
KissBox produces an RTP-MIDI OEM module, an external communication processor board, which connects over an SPI bus link.
MIDIbox
In December 2013, two members of the MIDIbox DIY group started work on a version of MIOS (MIDIbox Operating System) including RTP-MIDI support over a fast SPI link. In order to simplify integration, it was decided to use an external network processor board handling the whole protocol stack. A first beta version was released in the second week of January 2014.[19] The first official version was released during the first week of March 2014.
The protocol used on the SPI link between the MIOS processor and the network processor is based on the same format as USB MIDI, using 32-bit words each containing a complete MIDI message, and has been proposed as an open standard for communication between network processor modules and MIDI application boards.
Axoloti
The Axoloti is an open-source hardware synthesizer based on an STM32F427 ARM processor. This synthesizer is fully programmable using a virtual patch concept, similar to Max/MSP, and includes full MIDI support. A node.js extension has been developed to allow RTP-MIDI connection of an Axoloti to any RTP-MIDI device.[20] The Axoloti hardware can also be equipped with an RTP-MIDI external coprocessor, connected via the SPI bus available on the expansion port of the Axoloti core. The approach is the same as the one described for Arduino and MIDIbox.
MIDIKit Cross-platform library
MIDIKit is an open-source, cross-platform library which provides a unified MIDI API on top of the various native MIDI APIs (Core MIDI, Windows MME, Linux ALSA, etc.). MIDIKit supports the RTP-MIDI protocol, including the journalling system. RTP-MIDI ports are seen within MIDIKit as complementary ports (they do not rely on the rtpMIDI driver), added to the native system MIDI ports.[21]
Driverless use
Since RTP-MIDI is based on UDP/IP, any application can implement the protocol directly, without needing any driver. Drivers are needed only when users want the networked MIDI ports to appear as standard MIDI ports. For example, some Max/MSP objects and VST plugins have been developed following this approach.
RTP-MIDI over AVB
AVB (Audio Video Bridging) is a set of technical standards which define specifications for extremely low latency streaming services over Ethernet networks. AVB networks are able to provide latencies down to one audio sample across a complete network.
RTP-MIDI is natively compatible with AVB networks, like any other IP protocol, since AVB switches (also known as "IEEE 802.1 switches") automatically manage priority between real-time audio/video streams and IP traffic. The RTP-MIDI protocol can also use the real-time capabilities of AVB if the device implements the RTCP payload described in IEEE 1733.[22] RTP-MIDI applications can then correlate the "presentation" timestamp, provided by the IEEE 802.1 master clock, with the RTP timestamp, ensuring sample-accurate distribution of MIDI events.
Protocol
RFC 4695/RFC 6295 split the RTP-MIDI specification into separate parts. The only mandatory one, which defines compliance with the RTP-MIDI specification, is the payload format. The journalling part is optional, but RTP-MIDI packets must indicate that they carry an empty journal, so a journal field is always present in the RTP-MIDI packet, even if it is empty. The session initiation/management part is purely informational; it was not used by Apple, which created its own session management protocol.
Header format
| Section | Fields (bit widths) |
|---|---|
| RTP header, word 0 | V (2), P (1), X (1), CC (4), M (1), Payload type PT (7), Sequence number (16) |
| RTP header, word 1 | Timestamp (32) |
| RTP header, word 2 | Synchronization source (SSRC) identifier (32) |
| RTP header, optional | Contributing source (CSRC) identifiers |
| MIDI command section | B, J, Z, P flags (1 each), LEN (4 or 12), then the MIDI messages list |
| Journal (present when the J flag is set) | S, Y, A, H flags, TOTCHAN, Checkpoint Packet Seqnum, system journal (optional), channel journals |
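As a sketch, the flag octet that opens the MIDI command section can be decoded as follows (bit layout per RFC 6295; the function name and returned dictionary are illustrative):

```python
def parse_command_header(data: bytes) -> dict:
    """Decode the header of an RTP-MIDI command section (RFC 6295)."""
    first = data[0]
    flags = {
        "B": bool(first & 0x80),  # B=1: LEN is 12 bits, header uses two octets
        "J": bool(first & 0x40),  # J=1: a journal section follows the MIDI list
        "Z": bool(first & 0x20),  # Z=1: the MIDI list starts with a delta time
        "P": bool(first & 0x10),  # P=1: status octet of the first channel
                                  #      command was elided (running status)
    }
    if flags["B"]:
        # Long form: LEN spans the low nibble of octet 0 plus all of octet 1
        length = ((first & 0x0F) << 8) | data[1]
        header_size = 2
    else:
        length = first & 0x0F
        header_size = 1
    return {**flags, "len": length, "header_size": header_size}
```

A receiver uses `len` to know how many octets of MIDI commands follow, and the `J` flag to decide whether a journal section must be parsed afterwards.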
Sessions
RTP-MIDI sessions create a virtual path between two RTP-MIDI devices, which appears as a MIDI IN / MIDI OUT pair from the application's point of view. RFC 6295 proposes using SIP (Session Initiation Protocol) and SDP (Session Description Protocol), but Apple decided to create its own session management protocol. Apple's protocol links sessions to the names used in Bonjour, and also offers a clock synchronization service.

A given session is always created between two, and only two, participants; each session is used to detect potential message loss between the two participants. However, a given session controller can open multiple sessions in parallel, which enables capabilities such as splitting, merging, or a distributed patchbay. In the diagram given here, device 1 has two sessions open at the same time, one with device 2 and another with device 3, but the two sessions in device 1 appear as the same virtual MIDI interface to the end user.
Sessions vs. endpoints
A common mistake is to confuse RTP-MIDI endpoints with RTP-MIDI sessions, since both represent a pair of MIDI IN / MIDI OUT ports.
An endpoint is used to exchange MIDI data between the element (software and/or hardware) in charge of decoding the RTP-MIDI transport protocol and the element using the MIDI messages. In other words, only MIDI data are visible at the endpoint level. For devices with MIDI 1.0 DIN connectors, there is one endpoint per connector pair; for example, 2 endpoints for the KissBox MIDI2TR, 4 endpoints for the iConnectivityMIDI4+, etc. Devices using other communication links like SPI or USB can offer more endpoints; for example, a device using the 32-bit encoding of the USB MIDI class can expose up to 16 endpoints using the Cable Identifier field. On the RTP-MIDI side, an endpoint is represented by a pair of UDP ports when the AppleMIDI session protocol is used.
A session defines the connection between two endpoints: the MIDI IN of one endpoint is connected to the MIDI OUT of the remote endpoint, and vice versa. A single endpoint can accept multiple sessions, depending on the software configuration. Each session on a given endpoint appears as a single session to its remote session handler; a remote session handler does not know whether the endpoint it is connected to is being used by other sessions at the same time. If multiple sessions are active on a given endpoint, the different MIDI streams reaching the endpoint are merged before the MIDI data are sent to the application. In the other direction, MIDI data produced by the application are sent to all session handlers connected to the endpoint.
AppleMIDI session participants
The AppleMIDI implementation defines two kinds of session controllers: session initiators and session listeners. Session initiators are in charge of inviting the session listeners, and are responsible for the clock synchronization sequence. Session initiators can generally also be session listeners, but some devices, such as iOS devices, can only be session listeners.
MIDI merging
RTP-MIDI devices are able to merge different MIDI streams without any specific component, in contrast to MIDI 1.0 devices, which require "MIDI mergers". As can be seen in the diagram, when a session controller is connected to two or more remote sessions, it automatically merges the MIDI streams coming from the remote devices, without requiring any specific configuration.
MIDI splitting ("MIDI THRU")
RTP-MIDI devices are able to duplicate MIDI streams from one session to any number of remote sessions without requiring any "MIDI THRU" device. When an RTP-MIDI session is connected to two or more remote sessions, all remote sessions receive a copy of the MIDI data sent from the source.
Distributed patchbay concept
RTP-MIDI sessions are also able to provide a "patchbay" feature, which is possible under MIDI 1.0 only with a separate hardware device. A MIDI 1.0 patchbay is a hardware device which allows dynamic connections between a set of MIDI inputs and a set of MIDI outputs, most often in the form of a matrix. The concept of a "dynamic" connection stands in contrast to the classical use of MIDI 1.0 lines, where cables were connected "statically" between two devices. Rather than establishing the data path between devices in the form of a cable, the patchbay becomes a central point to which all MIDI devices are connected. The software in the MIDI patchbay is configured to define which MIDI input goes to which MIDI output, and the user can change this configuration at any moment without disconnecting the MIDI DIN cables.
With RTP-MIDI, hardware "patchbay" modules are no longer needed, thanks to the session concept. Sessions are, by definition, virtual paths established over the network between two MIDI ports. No specific software is needed to perform the patchbay functions, since the configuration process precisely defines the destinations of each MIDI stream produced by a given MIDI device. These virtual paths can then be changed at any time simply by changing the destination IP addresses used by each session initiator. The resulting "patch" configuration can be stored in non-volatile memory, so that the patch reforms automatically when the setup is powered up, but it can also be changed directly at RAM level, for example with the RTP-MIDI Manager software tool or with the RTP-MIDI drivers' control panels.
Apple's session protocol
RFC 6295 proposes using SDP (Session Description Protocol) and SIP (Session Initiation Protocol) to establish and manage sessions between RTP-MIDI partners. These two protocols are however quite heavy to implement, especially on small systems, and they do not constrain any of the parameters enumerated in the session descriptor, such as the sampling frequency, which in turn defines all fields related to timing data both in the RTP headers and in the RTP-MIDI payload. Moreover, RFC 6295 only suggests using these protocols and allows any other protocol to be used instead, leading to potential incompatibilities between vendors.
Apple decided to create its own protocol, which imposes all parameters related to synchronization, such as the sampling frequency. This session protocol is called "AppleMIDI" in Wireshark. Session management with the AppleMIDI protocol requires two UDP ports: the first is called the "control port", the second the "data port". In a multithreaded implementation, only the data port requires a "real-time" thread; the control port can be handled by a normal-priority thread. The two ports must be located at two consecutive positions (n / n+1); the first one can be any of the 65536 possible port numbers.
There is no constraint on the number of sessions that can be opened simultaneously on a given set of UDP ports with the AppleMIDI protocol. It is possible either to create one port pair per session, or to use a single pair for multiple sessions, which limits the memory footprint in the system. In the latter case, the IP stack identifies partners from their IP addresses and port numbers. This functionality, called "socket reuse", is available in most modern IP implementations.
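The consecutive-port constraint can be sketched with two UDP sockets; this is a minimal illustration, assuming the commonly used default base port 5004 (the port number and function name are illustrative, not mandated by the protocol):

```python
import socket

def open_session_ports(base_port: int = 5004, host: str = "127.0.0.1"):
    """Open the AppleMIDI control/data UDP port pair on consecutive ports (n, n+1)."""
    control = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    data = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    control.bind((host, base_port))       # control port: invitations, end-session
    data.bind((host, base_port + 1))      # data port: clock sync, RTP-MIDI payloads
    return control, data
```

A real implementation would retry with another base port if either bind fails, since both ports of the pair must be free.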
All AppleMIDI protocol messages use a common structure of four 32-bit words, with a header containing two bytes of value 255 (0xFF), followed by two bytes describing the meaning of the message:
| Description | Wireshark header definition | Field value (hex) | Field value (chars) |
|---|---|---|---|
| Invitation | APPLEMIDI_COMMAND_INVITATION | 0x494e | IN |
| Invitation accepted | APPLEMIDI_COMMAND_INVITATION_ACCEPTED | 0x4f4b | OK |
| Invitation refused | APPLEMIDI_COMMAND_INVITATION_REJECTED | 0x4e4f | NO |
| Closing session | APPLEMIDI_COMMAND_ENDSESSION | 0x4259 | BY |
| Clock synchronization | APPLEMIDI_COMMAND_SYNCHRONIZATION | 0x434b | CK |
| Journalling synchronization | APPLEMIDI_COMMAND_RECEIVER_FEEDBACK | 0x5253 | RS |
| Bitrate | APPLEMIDI_COMMAND_BITRATE_RECEIVE_LIMIT | 0x524c | RL |
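Since the two-byte command values are simply ASCII pairs, identifying a session message reduces to checking the 0xFFFF signature and looking up the next two bytes; a minimal sketch (names are illustrative):

```python
# Two-byte AppleMIDI command codes, as ASCII pairs
APPLEMIDI_COMMANDS = {
    b"IN": "invitation",
    b"OK": "invitation accepted",
    b"NO": "invitation refused",
    b"BY": "closing session",
    b"CK": "clock synchronization",
    b"RS": "journalling synchronization (receiver feedback)",
    b"RL": "bitrate receive limit",
}

def identify_command(packet: bytes):
    """Return the command name if the packet is an AppleMIDI session message."""
    if len(packet) >= 4 and packet[0] == 0xFF and packet[1] == 0xFF:
        return APPLEMIDI_COMMANDS.get(packet[2:4])
    return None  # not an AppleMIDI session packet (e.g. a plain RTP-MIDI packet)
```

Note that `b"IN"` is exactly 0x494e, `b"OK"` is 0x4f4b, and so on, matching the hex column of the table above.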
These messages control a state machine related to each session. For example, this state machine forbids any MIDI data exchange until a session reaches the "opened" state.
Invitation sequence
Opening a session starts with an invitation sequence. The first session partner (the session initiator) sends an IN message to the control port of the second partner, which answers with an OK message if it agrees to open the session, or with a NO message if it refuses the invitation. If the invitation is accepted on the control port, the same sequence is repeated on the data port. Once invitations have been accepted on both ports, the state machine moves to the synchronization phase.
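As a sketch, an invitation packet can be built as follows, assuming the layout shown by Wireshark's AppleMIDI dissector (0xFFFF signature, the 'IN' command, a 32-bit protocol version of 2, a 32-bit initiator token echoed in the OK/NO reply, the sender's 32-bit SSRC, then the session name as a NUL-terminated UTF-8 string); field names here are illustrative:

```python
import struct

def build_invitation(token: int, ssrc: int, name: str) -> bytes:
    """Build an AppleMIDI invitation (IN) packet (layout per Wireshark dissector)."""
    return (b"\xff\xff" + b"IN"
            + struct.pack("!I", 2)      # protocol version
            + struct.pack("!I", token)  # initiator token, echoed in the reply
            + struct.pack("!I", ssrc)   # sender's SSRC
            + name.encode("utf-8") + b"\x00")
```

The OK and NO replies reuse the same structure with the command bytes changed, which is why a single packet builder/parser can serve the whole session state machine.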
Synchronization sequence
The synchronization sequence allows both session participants to share information about their local clocks. This phase makes it possible to compensate for the latency induced by the network, and also to support "future timestamping" (see the "Latency" section below).
The session initiator sends a first message (named CK0) to the remote partner, giving its local time as a 64-bit value (note that this is not an absolute time, but a time relative to a local reference, generally expressed in microseconds since the startup of the operating system kernel). This time is expressed on a 10 kHz sampling clock basis (100 microseconds per increment). The remote partner must answer with a CK1 message containing its own 64-bit local time. Both partners then know the difference between their respective clocks and can determine the offset to apply to the Timestamp and Deltatime fields of the RTP-MIDI protocol.
The session initiator finishes the sequence by sending a final message, CK2, containing its local time at the moment it received the CK1 message. This technique makes it possible to compute the average network latency, and also to compensate for a potential delay introduced by a slowly starting thread, which can occur on non-realtime operating systems like Linux, Windows or OS X.
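The offset and latency computation from one CK exchange can be sketched as follows (a simplified model; variable names are illustrative, and times are in the 100-microsecond ticks of the 10 kHz clock):

```python
def estimate_sync(t1: int, t2: int, t3: int):
    """Estimate clock offset and one-way latency from one CK0/CK1/CK2 exchange.

    t1: initiator's local time when sending CK0
    t2: listener's local time when replying with CK1
    t3: initiator's local time on receiving CK1 (reported back in CK2)
    """
    one_way = (t3 - t1) / 2        # half the measured round-trip time
    offset = t2 - (t1 + t3) / 2    # listener clock minus initiator clock
    return offset, one_way
```

Repeating the exchange and averaging (or keeping the sample with the smallest round trip) improves the estimate, which matches Apple's recommendation described below.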
Apple recommends repeating this sequence a few times just after opening the session, in order to get better synchronization accuracy, in case one of the exchanges was accidentally delayed by a temporary network overload or a latency peak in thread activation.
This sequence must be repeated cyclically (typically between 2 and 6 times per minute), always by the session initiator, in order to maintain long-term synchronization accuracy by compensating for local clock drift, and also to detect the loss of the communication partner. A partner that does not answer multiple consecutive CK0 messages shall be considered disconnected. In most cases, session initiators then switch their state machine back to the "Invitation" state, in order to re-establish communication automatically as soon as the remote partner reconnects to the network. Some implementations, especially on personal computers, also display an alert message and offer the user a choice between a new connection attempt and closing the session.
Journal update
The journalling mechanism makes it possible to detect the loss of MIDI messages and allows the receiver to regenerate the missing data without any retransmission. The journal keeps in memory the "MIDI images" of the different session partners at different moments. However, it is useless to keep in memory journalling data corresponding to events already received correctly by a session partner. Each partner therefore cyclically sends the other an RS message indicating the last sequence number received correctly, in other words, with no gap in the sequence numbers. The sender can then free the memory holding older journalling data if necessary.
Disconnection of a session partner
A session partner can ask to leave a session at any moment, using the BY message, which closes the session. When a session partner receives this message, it immediately closes the session with the remote partner that sent it and frees all resources allocated to the session. This message can be sent by the session initiator or by the session listener (the "invited" partner).[23]
Latency
The most common concern about RTP-MIDI relates to latency, a general concern with digital audio workstations, mainly because the protocol relies on the IP stack. It can however be shown that a correctly programmed RTP-MIDI application or driver does not exhibit more latency than other communication methods.
Moreover, RTP-MIDI as described in RFC 6295 contains a latency compensation mechanism. A similar mechanism is found in most plugins, which can inform the host of the latency they add to the processing path; the host can then send samples to the plugin in advance, so the processed samples are ready to be sent synchronously with the other audio streams. The compensation mechanism described in RFC 6295 uses a relative timestamp system based on the MIDI deltatime concept.[24] Each MIDI event transported in the RTP payload is preceded by a deltatime value, relative to the payload's time origin as defined by the Timestamp field of the RTP header.
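Delta times in the MIDI list are encoded as a variable-length field of one to four octets, most significant first, with a continuation bit set on every octet except the last (RFC 6295); a sketch of the encoding and decoding:

```python
def encode_delta_time(ticks: int) -> bytes:
    """Encode a delta time as the 1-4 octet variable-length field of RFC 6295."""
    if not 0 <= ticks < (1 << 28):
        raise ValueError("delta time must fit in 28 bits")
    octets = [ticks & 0x7F]
    ticks >>= 7
    while ticks:
        octets.append((ticks & 0x7F) | 0x80)  # continuation bit on leading octets
        ticks >>= 7
    return bytes(reversed(octets))

def decode_delta_time(data: bytes):
    """Decode a delta time; return (ticks, octets_consumed)."""
    value = 0
    for i, b in enumerate(data[:4]):
        value = (value << 7) | (b & 0x7F)
        if not b & 0x80:       # last octet has the continuation bit clear
            return value, i + 1
    raise ValueError("unterminated delta time")
```

With a 10 kHz clock, a delta time of 200 ticks (20 ms) thus occupies two octets in the payload.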
Each MIDI event in the RTP-MIDI payload can then be strictly synchronized with the global clock. The synchronization accuracy depends directly on the clock source defined when the RTP-MIDI session is opened. RFC 6295 gives some examples based on an audio sampling clock, in order to get sample-accurate timestamping of MIDI events. Apple's RTP-MIDI implementation, like all related implementations such as the rtpMIDI driver for Windows or KissBox embedded systems, uses a fixed 10 kHz clock rather than an audio sampling rate. The timing accuracy of MIDI events is then 100 microseconds for these implementations.
Sender and receiver clocks are synchronized when the session is initiated, and are kept synchronized for the whole session period by the regular synchronization cycles controlled by the session initiator. This mechanism can compensate for any latency, from a few hundred microseconds, as seen in LAN applications, up to seconds. It can compensate, for example, for the latency introduced by the Internet, allowing real-time performance of music pieces.
This mechanism is however mainly designed for pre-recorded MIDI streams, such as those coming from a sequencer track. When RTP-MIDI is used for real-time applications (e.g. controlling devices from an RTP-MIDI compatible keyboard[25]), the deltatime is mostly set to 0, meaning that the related MIDI event shall be interpreted as soon as it is received. In such use cases, the latency compensation mechanism described previously cannot be used.
The achievable latency is then directly related to the different networking components involved in the communication path between the RTP-MIDI devices:
- MIDI application processing time
- IP communication stack processing time
- Network switches/routers packet forwarding time
Application processing time
Application processing time is generally tightly controlled, since MIDI tasks are most often real-time tasks. In most cases, the latency comes directly from the thread latency achievable on a given operating system, typically 1-2 ms at most on Windows and Mac OS systems. Systems with a real-time kernel can achieve much better results, down to 100 microseconds. This time can be considered constant whatever the communication channel (MIDI 1.0, USB, RTP-MIDI, etc.), since the processing threads operate at a different level than the communication-related threads/tasks.
IP stack processing time
IP stack processing time is the most critical one, since the communication process runs under operating system control. This applies to any communication protocol, IP-related or not, since most operating systems, including Windows, Mac OS and Linux, do not allow direct access to the Ethernet adapter. In particular, a common mistake is to conflate "raw sockets" with "direct access to the network"; sockets are the entry point for sending and receiving data over the network in most operating systems. A "raw socket" is a socket which allows an application to send packets using any protocol; the application is then responsible for building the telegram following the given protocol's rules, whereas "direct access" would require system-level access restricted to the operating system kernel. A packet sent through a raw socket can therefore still be delayed by the operating system if the network adapter is being used by another application, and an IP packet may reach the network before a packet from a raw socket. Technically speaking, access to a given network card is controlled by semaphores.[26]
IP stacks need to correlate Ethernet (MAC) addresses with IP addresses, using a specific protocol named ARP (Address Resolution Protocol). When an RTP-MIDI application wants to send a packet to a remote device, it must first locate that device on the network, since Ethernet itself does not understand IP-related concepts, in order to create the transmission path through the routers/switches. The IP stack does this automatically by first sending an ARP request. When the destination device recognizes its own IP address in the ARP packet, it sends back an ARP reply containing its MAC address. The IP stack can then send the RTP-MIDI packet. Subsequent RTP-MIDI packets no longer need the ARP sequence, unless the link becomes inactive for a few minutes, which clears the ARP entry in the sender's ARP table.
This ARP sequence can take a few seconds, which can in turn introduce noticeable latency, at least for the first RTP-MIDI packet. Apple's implementation, however, solves this issue elegantly through the session control protocol, which uses the same ports as the RTP-MIDI protocol itself. The ARP sequence thus takes place during the session initiation sequence, so when the RTP-MIDI application wants to send the first RTP-MIDI packet, the computer's ARP tables are already initialized with the correct destination MAC addresses, avoiding any extra latency for the first packet.
Besides the ARP sequence, the IP stack itself requires computation to prepare the packet headers (IP, UDP and RTP). On modern processors, this preparation is extremely fast, taking only a few microseconds, which is negligible compared to the application latency itself. As described before, once prepared, an RTP-MIDI packet can be delayed on its way to the network adapter only if the adapter is already transmitting another packet, whether the socket is an IP socket or a raw one. The latency introduced at this level is however generally extremely low, since the driver threads in charge of network adapters have very high priority. Moreover, most network adapters include hardware FIFO buffers, so packets can be stored for immediate transmission in the adapter itself without requiring the driver thread to run first. One way to keep this "adapter access competition" latency as low as possible is to reserve a network adapter for MIDI communication only, using a different adapter for other network uses such as file sharing or Internet browsing.
Network components routing time
The different components used to transmit Ethernet packets between computers, whatever the protocols in use, also introduce latency. All modern network switches use "store and forward" switching, in which a packet is stored in the switch before being sent on to the next one. The switching times are nevertheless most often negligible: a 64-byte packet on a 100 Mbit/s link takes around 5.1 microseconds to be forwarded by each switch, so a complex network with 10 switches along a given path introduces a latency of around 51 microseconds.
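The forwarding figures above follow directly from the serialization delay of each store-and-forward hop; a quick check (a simplified model that ignores the inter-frame gap, preamble and internal switching logic):

```python
def store_and_forward_delay(frame_bytes: int, link_mbps: float, hops: int) -> float:
    """Serialization delay in microseconds for a frame crossing store-and-forward hops.

    Each switch must receive the full frame before forwarding it, so the
    per-hop delay is the frame size divided by the link speed.
    """
    per_hop_us = frame_bytes * 8 / link_mbps  # bits / (Mbit/s) gives microseconds
    return per_hop_us * hops
```

For a 64-byte frame at 100 Mbit/s this gives 5.12 microseconds per hop, matching the "around 5.1 microseconds" figure above, and 51.2 microseconds across 10 switches.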
The latency is, however, directly related to the network load, since a switch delays a packet until the previous one has been transmitted. Computing or measuring the real latency introduced by the network components can be a hard task and must involve representative use cases; for example, measuring the latency between two devices connected to the same network switch will always give excellent results. As noted in the previous section, one way to limit the latency introduced by network components is to use separate networks, although this is far less critical for network components than for the network adapters in computers.
Expected latency for real-time applications
As can be seen, the exact latency of an RTP-MIDI link depends on many parameters, most of them related to the operating systems themselves. Measurements made by various RTP-MIDI implementers report latencies from a few hundred microseconds for embedded systems running real-time operating systems, up to 3 milliseconds when computers running general-purpose operating systems are involved.
Latency enhancement (sub millisecond latency)
The AES started a working group named SC-02-12H[27] in 2010 to demonstrate the capability of using RTP payloads in IP networks for very-low-latency applications. The draft proposal issued by the group in May 2013 demonstrates that RTP streaming for live applications can achieve latencies as low as 125 microseconds.
Configuration
Another common concern with RTP-MIDI is the configuration process, since physically connecting a device to a network is not enough to ensure communication with another device. Because RTP-MIDI is based on the IP protocol stack, the different layers involved in the communication must be configured, such as the IP address and UDP ports. To simplify this configuration, different solutions have been proposed, the most common being the "Zero Configuration" set of technologies, also known as Zeroconf.
RFC 3927[28] describes a common method to automatically assign IP addresses, which is used by most RTP-MIDI compatible products. Once connected to the IP network, such a device can assign itself an IP address, with automatic resolution of address conflicts. If the device also follows the port assignment recommendations of the RTP specification, it becomes "Plug&Play" from the network point of view: an RTP-MIDI network can be created without defining any IP addresses or UDP port numbers. These methods are, however, generally reserved for small setups. Fully automatic network configuration is usually avoided in large setups, because locating a faulty device becomes difficult when there is no direct relationship between the IP address selected by the Zeroconf system and the physical location of the device. A minimal configuration is then to assign a name to each device before connecting it to the network, which voids the "true Plug&Play" concept in that case.
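The self-assignment step can be sketched as follows: per RFC 3927 the host draws a candidate address from 169.254.1.0-169.254.254.255 and must then probe it with ARP before use, drawing again on conflict (function name is illustrative, and the ARP probing itself is omitted):

```python
import random

def pick_link_local_candidate(rng=random) -> str:
    """Draw a candidate IPv4 link-local address per RFC 3927.

    The first and last /24 of 169.254/16 are reserved, so the third octet
    is drawn from 1-254. The host must still ARP-probe the candidate and
    draw again if another device already claims it.
    """
    return f"169.254.{rng.randint(1, 254)}.{rng.randint(0, 255)}"

candidate = pick_link_local_candidate()
```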
Note that the "Zero Configuration" concept is restricted to the network communication layers; it is technically impossible to perform the complete installation of any networked device (MIDI-related or not) just by abstracting the addressing layer. A practical use case illustrating this limitation is an RTP-MIDI sound generator that has to be controlled from a MIDI master keyboard connected to an RTP-MIDI interface. Even if both the sound generator and the MIDI interface integrate "Zero Configuration" services, they cannot know by themselves that they need to establish a session together, because the IP configuration services act at a different level. Any networked MIDI system, whatever the protocol used to exchange MIDI data (IP-based or not), therefore requires a configuration tool to define the exchanges that must take place between devices once they are connected to the network. This tool can be an external management application running on a computer, or a configuration menu embedded in a device's application software if the device provides a human-machine interface.
Compatibility with MIDI 2.0
The MIDI Manufacturers Association announced in January 2019 that a major evolution of the MIDI protocol, called MIDI 2.0,[29] was entering its final prototyping phase.
MIDI 2.0 relies heavily on the MIDI-CI extension, used for protocol negotiation (identification of MIDI 1.0 and MIDI 2.0 devices to allow protocol switchover). RTP-MIDI fully supports the MIDI-CI protocol, since MIDI-CI uses MIDI 1.0 System Exclusive messages even on MIDI 2.0 devices.
An evolution of the RTP-MIDI protocol to include MIDI 2.0 has been presented to the MMA and is being discussed in the MIDI 2.0 working group. The enhanced protocol supports both the MIDI 1.0 and MIDI 2.0 data formats in parallel (MIDI 2.0 uses 32-bit-based packets, while MIDI 1.0 uses 8-bit-based packets).
Companies/Projects using RTP-MIDI
- Apple Computer (RTP-MIDI driver integrated in Mac OS X and iOS for the whole range of products) - RTP-MIDI over Ethernet and WiFi
- Yamaha (Motif synthesizers, UD-WL01 adapter[30]) - RTP-MIDI over Ethernet and WiFi
- Behringer (X-Touch Control Surface)[31]
- KissBox (RTP-MIDI interfaces with MIDI 1.0, LTC, I/O and ArtNet, VST plugins for hardware synthesizer remote control)
- Tobias Erichsen Consulting (Free RTP-MIDI driver for Windows / Utilities)
- GRAME (Linux driver)
- HRS (MIDI Timecode distribution on Ethernet / Synchronization software)
- iConnectivity (Audio & MIDI interfaces with USB and RTP-MIDI support)
- Merging Technologies (Horus, Hapi, Pyramix, Ovation) - RTP-MIDI for LTC/MTC, MIDI DIN, and MicPre control [32]
- Zivix PUC (Wireless RTP-MIDI interface for iOS devices)[33]
- Arduino-AppleMIDI-Library[34]
- MIDIbox[35]
- Cinara (MIDI interface with USB and RTP-MIDI support)[36]
- McLaren Labs rtpmidi for Linux[37]
- BEB (DSP modules for modular synthesizers based on RTP-MIDI backbone)[38]
- Axoloti (Hardware open-source synthesizer with RTP-MIDI connectivity)[39]
References
- ^ An RTP Payload format for MIDI. The 117th Convention of the Audio Engineering Society, October 28-31, 2004, San Francisco, CA.
- ^ RTP Payload format for MIDI - RFC 4695
- ^ Implementation Guide for RTP MIDI. RFC 4696
- ^ RTP Payload format for MIDI - RFC 6295
- ^ https://www.midi.org/midi-articles/rtp-midi-or-midi-over-networks 'About RTP-MIDI' page on MMA website
- ^ Kiss-Box website (hardware devices using RTP-MIDI protocol)
- ^ RTP-MIDI driver for Windows
- ^ "RtpMIDI | Tobias Erichsen".
- ^ "Implementing a MIDI stream over RTP" (PDF). Archived from the original (PDF) on 2013-01-31. Retrieved 2013-05-11.
- ^ "Recovery journal and evaluation of alternative proposal" (PDF). Archived from the original (PDF) on 2013-01-31. Retrieved 2013-05-11.
- ^ https://github.com/ravelox/pimidi RTP-MIDI implementation dedicated to Raspberry PI platform
- ^ http://manpages.ubuntu.com/manpages/oneiric/man1/midistream.1.html#contenttoc0 Archived 2015-05-18 at the Wayback Machine User's manual of RTP-MIDI object called "midistream" under Linux Ubuntu
- ^ https://github.com/davidmoreno/rtpmidid rtpmidid at github
- ^ "Apple page about USB MIDI connectivity problems". support.apple.com.
- ^ "Node RTP Midi". GitHub. 3 March 2022.
- ^ "nmj". Humatic.de. Retrieved 2022-05-27.
- ^ http://winrtpmidi.codeplex.com Archived 2014-05-21 at the Wayback Machine Website of open-source WinRTP-MIDI project
- ^ RTP-MIDI/AppleMIDI library for Arduino
- ^ MIDIbox forum announcement of RTP-MIDI support in MIOS
- ^ https://gist.github.com/DatanoiseTV/6a59fc66517fbd923ed9 Node.js extension to provide RTP-MIDI connection to Axoloti
- ^ https://github.com/jpommerening/midikit/blob/master/driver/common/rtpmidi.c Cross-platform unified MIDI library with integrated RTP-MIDI support
- ^ IEEE Standard for Layer 3 Transport Protocol for Time-Sensitive Applications in Local Area Networks
- ^ "MIDI Network Driver Protocol". developer.apple.com. Retrieved 2025-02-10.
- ^ MIDI 1.0 Specification - Section 4 - Standard MIDI Files
- ^ "CME - Partner". Archived from the original on 2013-03-16. Retrieved 2013-05-10. RTP-MIDI expansion kit for CME keyboards
- ^ "Operating systems semaphores".[user-generated source]
- ^ AES standard group for audio interoperability over IP networks
- ^ Automatic configuration of IPv4 Link-Local addresses - RFC3927
- ^ "The MIDI Manufacturers Association (MMA) and the Association of Music Electronics Industry (AMEI) announce MIDI 2.0™ Prototyping -". Archived from the original on 2019-02-10. Retrieved 2019-02-07.
- ^ "UD-WL01 - Overview - Yamaha USA".
- ^ "Behringer: X-TOUCH". www.behringer.com. Archived from the original on 2014-01-26.
- ^ "Merging Technologies | Products Overview".
- ^ "The Legacy Wireless WiFi MIDI Product".
- ^ "lathoub/Arduino-AppleMidi-Library". GitHub. Retrieved 2016-05-28.
- ^ MIDIbox homepage
- ^ Cinara homepage
- ^ McLaren Labs
- ^ HorusDSP Homepage
- ^ "Axoloti main page". Archived from the original on 2016-12-31. Retrieved 2016-04-14.
Overview
Definition and Purpose
RTP-MIDI is a network protocol specification that encapsulates Musical Instrument Digital Interface (MIDI) messages within Real-time Transport Protocol (RTP) packets transmitted over User Datagram Protocol (UDP) or Transmission Control Protocol (TCP)/Internet Protocol (IP), enabling their transport across Ethernet and WiFi networks.[8][5] This format supports the full range of MIDI 1.0 commands, including those for real-time performance data, synchronization, and control, while integrating with standard IP-based networking infrastructure to facilitate low-latency communication.[8]
The primary purpose of RTP-MIDI is to enable real-time, bidirectional transmission of MIDI data between devices without requiring specialized hardware beyond standard network interfaces, thereby supporting collaborative music production, remote instrument control, and live performances over IP networks.[8] By leveraging RTP's timing mechanisms and optional recovery features, the protocol ensures reliable delivery suitable for interactive applications, such as synchronized ensemble playing or streaming MIDI content, while minimizing the latency critical for musical timing.[8] It addresses the limitations of physical MIDI connections, like DIN cables or USB, by allowing virtual "cables" over networks that mimic direct device linking.[5]
At its core, RTP-MIDI employs RTP for precise packet sequencing and timestamping to maintain MIDI event timing, RTCP for session control, feedback on packet loss, and stream synchronization, and a session-based architecture that establishes persistent connections between endpoints.[8] These components collectively provide resilience against network variability, such as jitter or dropped packets, through configurable recovery journals and synchronization tools.[8]
Key Features
RTP-MIDI enables low-latency transmission of MIDI data over IP networks by leveraging Real-time Transport Protocol (RTP) timestamps, which synchronize commands with precise timing relative to the RTP clock rate, typically set to ensure accurate playback in musical applications. Delta times encoded in the payload (1-4 octets) represent the interval between MIDI commands and the RTP timestamp, allowing for faithful reproduction of timing from sources like Standard MIDI Files. This real-time capability supports interactive performances where synchronization across devices is critical, with configurable modes for timestamp semantics such as asynchronous or buffered rendering.[8]
The protocol facilitates bidirectional, full-duplex communication through sendrecv sessions that emulate the simultaneous send-and-receive behavior of physical MIDI DIN cables, enabling interactive exchanges between endpoints without directional restrictions. Multiple streams can share a MIDI namespace, identified by unique synchronization source (SSRC) identifiers, which supports virtual port mappings for complex routing in applications like networked music ensembles. This duplex nature ensures seamless integration with existing MIDI workflows, treating network connections as virtual cables.[8]
RTP-MIDI operates with network transparency over standard IP infrastructures, utilizing unicast or multicast UDP/IP (or optionally TCP/IP). Scalability is achieved through support for multiple endpoints in peer-to-peer topologies or client-server configurations, where a central session can route data among numerous participants using unique SSRCs per stream and multicast for group communications. This enables applications ranging from small duets to large-scale networked orchestras, with session descriptions via SDP parameters defining transport details like IP versions and port assignments.
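The 1-4 octet delta times mentioned above use the same variable-length scheme as Standard MIDI Files, which the RTP-MIDI specification references; a minimal encoder sketch:

```python
def encode_delta_time(ticks: int) -> bytes:
    """Encode a delta time as a variable-length quantity (1-4 octets).

    Seven data bits per octet, most significant group first; the high bit
    is set on every octet except the last, as in Standard MIDI Files.
    """
    if not 0 <= ticks < (1 << 28):
        raise ValueError("delta time must fit in 28 bits")
    octets = [ticks & 0x7F]
    ticks >>= 7
    while ticks:
        octets.append(0x80 | (ticks & 0x7F))
        ticks >>= 7
    return bytes(reversed(octets))

# 0 -> b'\x00' (one octet), 128 -> b'\x81\x00' (two octets)
```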
The protocol's design accommodates varying network sizes without performance degradation in typical musical contexts.[8]
Error resilience is provided by RTP sequence numbers for detecting and ordering packets, combined with recovery journals that maintain a history of recent MIDI commands to reconstruct lost data without retransmissions that could introduce latency. Journals use checkpoint packets as anchors and include tools like recency bits for SysEx messages, employing closed-loop or anchor policies to balance reliability and real-time flow. This mechanism ensures uninterrupted MIDI streams even under moderate packet loss, preserving the protocol's suitability for time-sensitive audio production.[8][9]
History
Origins and Development
RTP-MIDI emerged from efforts in the early 2000s to transport Musical Instrument Digital Interface (MIDI) data over IP networks, addressing the constraints of traditional wired connections such as serial cables and USB, which limited mobility in music studios and live performances. Independent developers, notably John Lazzaro and John Wawrzynek at the University of California, Berkeley, initiated the project to encapsulate MIDI messages within Real-time Transport Protocol (RTP) packets, drawing on the RTP/RTCP framework outlined in IETF RFC 3550, published in 2003. This work was conducted in cooperation with the MIDI Manufacturers Association (MMA), aiming to enable low-latency, reliable MIDI transmission for network musical performances and remote collaboration among musicians.[10][2][9]
A pivotal milestone occurred in 2004, when Lazzaro and Wawrzynek presented "An RTP Payload for MIDI" at the 117th Audio Engineering Society (AES) Convention in San Francisco, introducing the core concepts of the payload format and its integration with IETF multimedia protocols like Session Description Protocol (SDP) and Session Initiation Protocol (SIP). This presentation built on earlier explorations, such as their 2001 paper "A Case for Network Musical Performance", which highlighted the potential for IP-based MIDI in interactive applications. The motivations centered on creating a robust solution for wireless MIDI over Ethernet and Wi-Fi, mitigating packet loss through innovative recovery mechanisms like journals, while supporting both interactive real-time use and streaming content delivery.[11]
By 2006, the draft specifications culminated in the publication of IETF RFC 4695, "RTP Payload Format for MIDI", formalizing the protocol as a proposed standard under the Audio/Video Transport Working Group. This document detailed the packetization of MIDI commands, synchronization strategies, and error handling tailored for unreliable networks.
Prior to this standardization, open-source aspects were evident in early prototypes shared among developer communities; for instance, in 2004, developer Tobias Erichsen encountered Lazzaro's draft and began experimenting with RTP-MIDI encapsulation, contributing feedback and creating initial implementations discussed on forums and mailing lists. These grassroots efforts fostered innovation before the protocol's broader adoption.[10][12] The foundational RTP-MIDI specifications paved the way for subsequent commercial integrations, including Apple's implementation in 2005.[10]
AppleMIDI Introduction
Apple introduced support for network-based MIDI transport in Mac OS X 10.4 Tiger, released on April 29, 2005, under the name "Network MIDI". This implementation utilized Apple's Bonjour zero-configuration networking protocol for automatic discovery of MIDI sessions on local IP networks, allowing multiple Macintosh computers to share MIDI data without additional hardware or drivers.[13][4] The feature was built on the emerging RTP-MIDI protocol, which encapsulates MIDI messages within Real-time Transport Protocol (RTP) packets to ensure low-latency transmission suitable for real-time music performance.[10]
A key innovation was the deep integration with Apple's Core MIDI framework, which abstracted the networking layer so that network sessions appear as standard virtual MIDI ports within applications. This enabled seamless pairing and session management through intuitive interfaces reminiscent of iTunes device connections, simplifying setup for musicians.[4] In technical documentation, the protocol became known as AppleMIDI, reflecting its proprietary extensions and the Bonjour service type _apple-midi._udp used for advertisement.[4]
The first public applications to leverage Network MIDI were Apple's GarageBand 2.0 and Logic Pro 7, which supported the feature under Mac OS X 10.4 Tiger.[13] In 2010, support extended to iOS devices with the introduction of Core MIDI APIs in iOS 4.2, enabling wireless MIDI connectivity in mobile music apps.[14]
This driverless integration on Apple platforms significantly boosted RTP-MIDI adoption among consumer musicians, as it eliminated the need for specialized hardware interfaces and facilitated easy network-based workflows in popular software like GarageBand, democratizing access to networked MIDI for home studios and education.[4][12]
Evolution Toward Modern Standards
Following Apple's introduction of its proprietary session protocol atop the standardized RTP payload format in 2005, open-source initiatives emerged to broaden RTP-MIDI accessibility beyond macOS and iOS ecosystems.[10] The rtpmidid project, a Linux daemon for sharing ALSA sequencer devices via RTP-MIDI, marked a key effort, with its initial beta release in April 2020 enabling network import and export of MIDI sessions.[15] These developments facilitated informal standardization through community-driven implementations, compensating for the lack of cross-platform native support in early RTP-MIDI adopters.[16]
While the IETF formalized the RTP payload for MIDI in RFC 4695 (November 2006), which defined packet structures and recovery mechanisms for real-time transmission, the protocol was further refined in RFC 6295, published in November 2011. Discussions on full protocol integration, including Apple's session management, did not yield additional RFCs due to the proprietary nature of those extensions.[10][1] This left RTP-MIDI's session establishment reliant on reverse-engineered components in non-Apple environments, contributing to persistent compatibility challenges such as connection failures across operating systems, network instability during OS upgrades, and difficulties in multi-device setups.[17] Recent advancements, like the rtpmidid version 24.12 release in December 2024, addressed some issues by enhancing the MIDI router for improved session routing and stability in diverse network topologies.[18]
The recognition of RTP's header overhead and complexity in resource-constrained devices spurred a transition to lighter UDP-based alternatives, prioritizing lower latency and simpler error handling.[7] The MIDI Association advanced this shift with Network MIDI 2.0 (UDP), initially prototyped in 2023 and formally ratified in November 2024, which supports both MIDI 1.0 and 2.0 via Universal MIDI Packets while incorporating forward error correction and authentication absent in RTP-MIDI.[19] At the NAMM 2025 Show in January, the Association unveiled initial implementations of Network MIDI 2.0, positioning RTP-MIDI as a foundational bridge to these MIDI 2.0 network extensions through backward compatibility layers.[20]
Protocol Fundamentals
Packet Header Format
The RTP-MIDI packet format adheres to the Real-time Transport Protocol (RTP) structure defined in RFC 3550, consisting of a fixed 12-byte RTP header followed by a MIDI-specific payload that encapsulates Musical Instrument Digital Interface (MIDI) commands and timing information. This design enables low-latency transmission of MIDI data over IP networks while supporting error recovery and synchronization. The payload type for RTP-MIDI is dynamically assigned from the range 96-127, as registered with the Internet Assigned Numbers Authority (IANA).[21]
The RTP header includes essential fields for packet identification, ordering, and timing, formatted in big-endian byte order:
| Field | Size (bits) | Value/Description |
|---|---|---|
| Version (V) | 2 | Set to 2. |
| Padding (P) | 1 | Typically 0; indicates padding if 1. |
| Extension (X) | 1 | Typically 0; indicates RTP header extension if 1. |
| CSRC Count (CC) | 4 | Number of contributing sources (usually 0). |
| Marker (M) | 1 | Set to 1 if the MIDI command section length is greater than 0. |
| Payload Type (PT) | 7 | Dynamic value (96-127) for RTP-MIDI. |
| Sequence Number | 16 | Monotonically increasing counter (initial value random) to detect packet loss. |
| Timestamp | 32 | Reflects the sampling instant of the first octet in the RTP payload; clock rate specified in session setup (e.g., 1000 Hz for 1 ms resolution). |
| SSRC | 32 | Synchronization source identifier, unique per stream to distinguish sources. |
| CSRC List | Variable (0-15 × 32) | Contributing sources, if CC > 0 (rarely used in RTP-MIDI). |
A worked example of a minimal packet carrying a single Note On command (values illustrative):
- Bytes 0-1: V=2, P=0, X=0, CC=0, M=1, PT=97 → 0x80 0xE1 (the marker bit is the top bit of the second byte)
- Bytes 2-3: Sequence Number (e.g., 0x0001) → 0x0001
- Bytes 4-7: Timestamp (e.g., 0x00002710 for 10000 at 1000 Hz) → 0x00002710
- Bytes 8-11: SSRC (e.g., 0x12345678) → 0x12345678
- Byte 12: MIDI command section header (e.g., B=0, J=0, Z=1, P=0, LEN=4 → 0x24; Z=1 because a delta-time octet precedes the first command)
- Bytes 13-16: Delta Time (e.g., 0 for immediate, encoded as 0x00) + MIDI Command (Note On channel 0, note 60, velocity 64 → 0x90 0x3C 0x40)
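The example bytes above can be assembled programmatically. The sketch below assumes the short-form LEN field and a delta-time octet before the first command, with no recovery journal; the function name and all values are illustrative:

```python
import struct

def rtp_midi_packet(seq: int, timestamp: int, ssrc: int,
                    midi_cmd: bytes, payload_type: int = 97) -> bytes:
    """Build a minimal RTP-MIDI packet with one command and no journal."""
    byte0 = 0x80                    # V=2, P=0, X=0, CC=0
    byte1 = 0x80 | payload_type     # M=1: the MIDI command section is non-empty
    rtp_header = struct.pack(">BBHLL", byte0, byte1, seq, timestamp, ssrc)
    midi_list = b"\x00" + midi_cmd  # delta time 0, then the command itself
    # Command section header: B=0, J=0, Z=1 (delta time present), P=0, LEN
    midi_header = bytes([0x20 | len(midi_list)])
    return rtp_header + midi_header + midi_list

note_on = bytes([0x90, 0x3C, 0x40])  # Note On, channel 0, note 60, velocity 64
pkt = rtp_midi_packet(seq=1, timestamp=10000, ssrc=0x12345678, midi_cmd=note_on)
```

A real sender would also maintain the recovery journal and increment the sequence number and timestamp per packet; the short-form LEN field only covers command sections up to 15 bytes.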
Session Establishment and Management
RTP-MIDI sessions are established and managed using the Session Description Protocol (SDP) to negotiate transport parameters, media encoding, and stream configurations, typically in conjunction with signaling protocols such as SIP Offer/Answer or declarative protocols like RTSP. SDP media lines (e.g., m=audio 5004 RTP/AVP 96) specify the RTP payload type, clock rate (e.g., 1000 Hz for 1 ms resolution, or 44100 Hz), and attributes like a=rtpmap:96 rtp-midi/44100 for native streams. Additional format-specific parameters (fmtp) configure features such as the timestamp mode (tsmode), the recovery journal policies (the j_sec and j_update parameters), and MIDI command subsets (cm_used). For related streams sharing a MIDI namespace, SDP grouping attributes (e.g., a=group:FID 1 2) or the musicport parameter define identities or ordering.[27][28]
Sessions support unicast or multicast over UDP (with recovery journals for loss resilience) or TCP (without journals). Multiple concurrent streams per endpoint are possible, enabling complex topologies like splitting namespaces across streams with synchronized timestamps and shared SSRC values. Sequence numbers in RTP headers ensure ordering and loss detection, while RTCP provides feedback for quality monitoring. Synchronization relies on RTP timestamps aligned across streams and periodic RTCP sender reports to maintain timing accuracy, with configurable parameters like rtp_ptime (packet duration) and guardtime (minimum inter-packet interval, often 0 ms for low latency). For teardown, standard RTP/RTCP mechanisms (e.g., BYE packets) release resources, though application-specific signaling may handle session closure.[29][23]
This SDP-based approach decouples session parameters from physical devices, allowing flexible virtual MIDI port mappings independent of underlying network transports.[28]
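Combining the media-line and attribute examples quoted above, a minimal SDP description for a native RTP-MIDI stream might look like the following fragment (the addresses, ports, and session identifiers are illustrative):

```
v=0
o=midihost 2890844526 2890842807 IN IP4 192.0.2.10
s=RTP-MIDI session
c=IN IP4 192.0.2.10
t=0 0
m=audio 5004 RTP/AVP 96
a=rtpmap:96 rtp-midi/44100
a=sendrecv
```

Offer/Answer signaling (e.g., SIP) would carry such a description to the peer, which answers with its own address and port.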
Endpoint and Participant Roles
In RTP-MIDI, endpoints refer to any IP-capable devices that function as MIDI sources or sinks, such as controller keyboards, synthesizers, sequencers, or content servers, enabling the transmission and reception of MIDI data over networks.[21] Each endpoint is uniquely identified within an RTP session by a 32-bit Synchronization Source Identifier (SSRC) in the RTP header, which distinguishes multiple streams, and by a Canonical Name (CNAME) in RTCP reports, which provides persistent identification across sessions and detects SSRC collisions.[21] These identifiers ensure that endpoints can participate in unicast UDP-based sessions, where each stream typically encodes a single MIDI namespace comprising 16 voice channels plus system commands, though namespaces may be split across sessions using identical SSRC values for related streams.[21]
Participant roles in RTP-MIDI sessions are defined by their involvement in data flow and session dynamics, primarily as senders or receivers: senders transcode MIDI data into RTP packets, timestamp commands, and maintain recovery journals to mitigate packet loss, while receivers detect losses, repair artifacts using those journals, and render the MIDI output.[21] In session establishment, participants adopt temporary roles as initiator or acceptor: the initiator (e.g., SDP offerer) proposes connection parameters, while the acceptor (e.g., answerer) confirms or modifies them, after which roles become symmetric for bidirectional exchange. This dynamic is specified via SDP attributes such as sendrecv, recvonly, or sendonly.[21]
RTP-MIDI supports multiple participants through RTP mixing at a central point or by grouping multiple unicast/multicast streams, enabling configurations like ensemble performances where MIDI is distributed to several receivers via shared namespaces or coordinated sessions. Role flexibility allows endpoints to switch functions (e.g., from sender to receiver) across sessions without fixed hierarchy.[21][23]
Compared to physical MIDI, RTP-MIDI extends connectivity over IP networks using standard RTP ports, treating sessions as virtual channels for sources and destinations while leveraging timestamps for synchronization.[21]
Apple's Session Protocol
Invitation and Connection Sequence
The invitation and connection sequence in Apple's RTP-MIDI implementation, known as AppleMIDI, begins with service discovery via Bonjour, where participating devices advertise their availability using the service type _apple-midi._udp. This zero-configuration protocol allows devices on the same local network to discover each other without manual IP configuration, registering a control port (denoted as N) and an adjacent MIDI data port (N+1) for UDP communication. AppleMIDI sessions via Bonjour are designed for devices on the same local network; connections across NAT or subnets may require additional network configuration.[4]
Once a device identifies a potential peer through Bonjour, the initiator sends an INVITE packet, represented by the 16-bit command 'IN' (ASCII 0x494E), over the control port. This packet includes the protocol version (set to 2 in network byte order), a random 32-bit initiator token generated by the sender, the sender's 32-bit Synchronization Source Identifier (SSRC) for distinguishing RTP streams, and an optional NULL-terminated UTF-8 string for the initiator's name. If no response is received, the initiator resends the INVITE every second, up to a maximum of 12 attempts. The responder, upon receiving the INVITE, replies on the same control port with either an OK packet (command 'OK', ASCII 0x4F4B) to accept—copying the initiator's token and including its own SSRC and name—or a rejection via the NAK equivalent, the 'NO' packet (command 'NO', ASCII 0x4E4F), which omits the name field.[4]
Following successful control port negotiation, the initiator repeats the INVITE on the MIDI port to establish the data channel. The responder mirrors the response with OK or NO on the MIDI port, using the same field structure. Upon mutual acceptance, the initiator initiates clock synchronization using dedicated sync packets to align timestamps and compensate for network latency. These packets include the SSRC, a count field (starting at 0 and incrementing to 2 over three exchanges), and 64-bit timestamps measured in 100-microsecond units from the local system clock. The sequence computes a round-trip offset as ((timestamp3 + timestamp1) / 2) - timestamp2, enabling latency adjustment for subsequent RTP-MIDI data packets; this sync process repeats at least every 60 seconds to maintain the session.[4]
Rejection via NO terminates the attempt without further exchanges, and failed retries after 12 attempts prompt the initiator to restart discovery.[4]
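The INVITE field layout described above can be sketched as follows; the two leading 0xFF bytes are the AppleMIDI packet signature found in third-party protocol descriptions (an assumption here, since the text above does not mention it), and the example values are illustrative:

```python
import struct

def applemidi_invite(initiator_token: int, ssrc: int, name: str) -> bytes:
    """Build an AppleMIDI 'IN' (invitation) packet.

    Layout assumed: 0xFFFF signature, two-byte ASCII command, then the
    protocol version (2), the random initiator token and the sender's SSRC
    as 32-bit big-endian fields, followed by a NUL-terminated UTF-8 name.
    """
    return (b"\xff\xff" + b"IN"
            + struct.pack(">III", 2, initiator_token, ssrc)
            + name.encode("utf-8") + b"\x00")

invite = applemidi_invite(0x5A3D09BF, 0x12345678, "studio-mac")
```

The responder's OK and NO packets follow the same layout with the command bytes swapped, the initiator's token echoed back, and (for NO) the name field omitted.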
Synchronization Mechanisms
RTP-MIDI maintains timing alignment between session participants after connection establishment primarily through RTP timestamps embedded in packet headers and RTCP sender reports, which allow receivers to synchronize multiple streams from the same sender by correlating their timing fields.[30] In Apple's implementation of RTP-MIDI, known as AppleMIDI, clock synchronization is further refined using CK (synchronization) command packets, which exchange local clock values alongside RTP timestamps to compute timing offsets.[4] These CK packets include up to three 64-bit timestamps in 100-microsecond units, enabling participants to estimate clock offsets for ongoing alignment.[31]
The basic offset calculation derives from the difference between remote and local timestamps, normalized by the clock rate:
offset = (remote_timestamp - local_timestamp) / clock_rate
This formula provides a straightforward adjustment for drift, with a more advanced NTP-like estimate, offset_estimate = (timestamp3 + timestamp1) / 2 - timestamp2, used in the initial exchanges and periodically refreshed every 60 seconds.[4] Receivers apply these offsets to RTP timestamps to align incoming MIDI commands accurately.
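The three-way estimate can be checked numerically. In this sketch the responder's clock runs 500 units ahead of the initiator's and the network delay is symmetric, so the estimate recovers the offset exactly (all values illustrative):

```python
def offset_estimate(ts1: int, ts2: int, ts3: int) -> float:
    """NTP-style offset from one CK exchange, in shared timestamp units.

    ts1: initiator's clock when the first sync packet is sent
    ts2: responder's clock when it replies
    ts3: initiator's clock when the reply arrives
    """
    return (ts3 + ts1) / 2 - ts2

# Responder clock 500 units ahead, 10-unit one-way delay each direction:
# initiator sends at 1000, responder stamps (1010 + 500) = 1510,
# reply arrives at initiator time 1020.
estimate = offset_estimate(1000, 1510, 1020)  # -> -500.0
```

With asymmetric delays the estimate is biased by half the delay difference, which is why implementations repeat the exchange and average or filter the results.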
To mitigate network variability, RTP-MIDI employs an adaptive jitter buffer at the receiver, which dynamically adjusts its size based on observed packet arrival times and sender timing consistency, typically ranging from 100 µs to 2 ms on low-jitter LANs.[32] This buffering smooths out jitter without introducing excessive latency, ensuring MIDI commands are played out in the correct sequence and timing.[33] Resynchronization is triggered by detected anomalies such as sequence number gaps in RTP packets or insights from periodic RTCP reports, prompting the receiver to realign its clock and buffer using the latest offset data.[34] In AppleMIDI, periodic CK timing packets sent every 60 seconds allow for drift corrections during active sessions and maintain tight synchronization even under minor network fluctuations.[4]
Journal Updates and Error Handling
In RTP-MIDI, the recovery journal serves as a key mechanism for mitigating packet loss by maintaining a structured history of recent MIDI events at each endpoint, enabling state reconstruction without relying on retransmission requests. The journal is organized into chapters that categorize MIDI commands, such as channel-specific notes (Chapter N), control changes (Chapter C), and sequencer state (Chapter Q), and references a checkpoint packet via its RTP sequence number, allowing receivers to apply corrective actions like NoteOff commands for indefinite artifacts (e.g., stuck notes). This buffer captures the session state in an oldest-first order, supporting active, N-active, and C-active command types to prioritize essential recovery data.[8][3]
Journal updates occur dynamically: the sender appends new MIDI events to the recovery journal after transmitting each RTP packet and trims older entries based on RTCP feedback from the receiver, which reports the highest successfully received sequence number. These periodic RTCP sender and receiver reports facilitate a closed-loop policy (the default), reducing journal overhead while ensuring sufficient history for loss recovery; for instance, checkpoints can be updated every 5 seconds to optimize size without compromising reliability. In AppleMIDI implementations, the journal always includes the recovery section (indicated by the J bit) and encompasses specific chapters such as P (program change), C (control change), W (pitch wheel), N (note), T (channel aftertouch), A (poly aftertouch), Q (sequencer state), and F (MIDI time code), while excluding others such as M, E, D, V, and X to streamline transmission.[8][3][4]
Packet loss is handled through gap detection in the 16-bit RTP sequence numbers (extended to 32 bits internally for rollover tracking), prompting the receiver to execute recovery commands from the journal embedded in arriving packets; the S bit in journal headers further aids detection of single-packet losses.
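Gap detection over the 16-bit sequence space has to account for wraparound; a minimal receiver-side sketch (the function name and the half-space reordering heuristic are my own):

```python
def missing_sequences(last_seq: int, new_seq: int) -> list:
    """Sequence numbers skipped between the last received packet and a
    newly arrived one, computed modulo 2**16.

    A forward distance above half the sequence space is treated as a
    duplicate or late/reordered packet rather than a huge loss burst.
    """
    gap = (new_seq - last_seq) & 0xFFFF
    if gap == 0 or gap > 0x8000:
        return []
    return [(last_seq + i) & 0xFFFF for i in range(1, gap)]

# Wraparound example: last packet 65534, then 1 arrives -> 65535 and 0 lost
lost = missing_sequences(0xFFFE, 0x0001)  # -> [65535, 0]
```

On a non-empty result, the receiver would consult the recovery journal in the newly arrived packet instead of requesting retransmission.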
To prevent bandwidth overload, journal size is constrained by RTCP feedback and policy parameters (e.g., j_sec="recj" enables journaling, with limits on history depth), ensuring efficient operation over UDP. This approach delivers reliability comparable to TCP, but with the lower latency and overhead required by networked musical applications.[8][3][4]

Disconnection Procedures
RTP-MIDI supports both graceful and abrupt disconnection procedures to ensure reliable session termination and resource management. In graceful teardown, a participant sends an RTCP BYE packet to signal its exit from the session, which includes an optional reason code to specify the cause, such as user disconnection or protocol errors. This packet is transmitted unreliably over the control channel, and upon receipt, the receiving peer acknowledges it implicitly by ceasing transmission and closing the RTP and RTCP ports associated with the session. In the AppleMIDI variant, this corresponds to the "End Session" command encoded as the two-byte sequence 0x4259 ('BY'), sent via the control UDP port, mirroring the RTCP BYE structure while integrating with Apple's session management.[4][35]

For abrupt disconnections, such as those caused by network failures or crashes, RTP-MIDI implementations detect inactivity through timeouts on control packets. Receivers monitor for the absence of RTCP packets, including Sender Reports (SR), Receiver Reports (RR), and recovery journals, typically timing out after 5–10 seconds of silence to trigger automatic session closure and prevent indefinite resource holding or stuck MIDI notes. This aligns with the protocol's minimum RTCP transmission interval of 5 seconds for small sessions, allowing prompt detection without excessive delay. In AppleMIDI, the Clock Synchronization (CK, 0x434B) packets, sent approximately every 60 seconds by the session initiator, provide an additional heartbeat; their prolonged absence reinforces the timeout-based disconnect.[3][4]

Upon disconnection—whether graceful or abrupt—implementations release resources by flushing any buffered MIDI events to avoid artifacts, deregistering the virtual MIDI ports created for the session, and updating the mDNS (Bonjour) service announcement to remove the endpoint from network discovery lists.
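As a concrete illustration of the graceful teardown, the sketch below builds an AppleMIDI "End Session" packet. The field layout (0xFFFF signature, two-byte ASCII command, four-byte protocol version, initiator token, and SSRC) follows the commonly documented AppleMIDI exchange-packet format; the function name is illustrative, and real implementations should treat this as a sketch rather than a normative encoder.

```python
import struct

def build_bye_packet(initiator_token: int, ssrc: int) -> bytes:
    """Build an AppleMIDI 'End Session' (BY) packet.

    Layout (big-endian), per the commonly documented AppleMIDI
    exchange-packet format: 0xFFFF signature, two-byte command
    ('BY' = 0x4259), protocol version (2), initiator token, SSRC.
    """
    return struct.pack(">HHIII", 0xFFFF, 0x4259, 2,
                       initiator_token, ssrc)
```

The resulting 16-byte datagram would be sent on the session's control UDP port, after which the sender closes its RTP and RTCP sockets for that session.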
AppleMIDI handles multiple concurrent sessions independently, ensuring that a disconnection in one does not propagate to others, with each session maintaining separate port pairs and state.[4][36] Specific error conditions are conveyed via the BYE packet's reason field, a variable-length text string that may indicate "user disconnected" for manual terminations or "network failure" for connectivity issues, aiding diagnostics and logging without requiring additional packets.

Advanced Protocol Features
MIDI Merging
In RTP-MIDI, merging occurs at the session receiver, where multiple incoming MIDI streams are combined into a single output stream, matching the behavior of traditional MIDI 1.0 DIN-cable mergers.[37] Receivers interleave MIDI commands from these streams based on their RTP timestamps, preserving ordering and timing in the combined output.[38] This timestamp-based approach relies on RTP sequence numbers to detect and reconstruct packet order, preventing out-of-sequence delivery that could disrupt a musical performance.[22] The protocol includes no native merge command; merging is implemented in endpoint logic that processes streams identified by unique Synchronization Source identifiers (SSRCs).[37]

A common use case for MIDI merging in RTP-MIDI is the professional studio, where multiple controllers—such as keyboards or sequencers—transmit data over a network to feed a single synthesizer or audio workstation, enabling collaborative music production without physical cabling.[39] For instance, in network musical performances, participants can share a session in which incoming streams from remote devices are seamlessly integrated into the local receiver's MIDI namespace, with real-time synchronization maintained via RTCP sender reports.[22]

Limitations arise from potential channel conflicts when multiple streams target the same MIDI channels, which can lead to artifacts such as stuck notes if not managed properly.[37] Senders mitigate this by partitioning streams—for example, assigning distinct channels to separate RTP sessions—while receivers resolve duplicates by filtering on sequence numbers and SSRCs, though the protocol recommends careful configuration to avoid indefinite artifacts.[37] In AppleMIDI implementations, the Core MIDI framework handles this merging transparently at the system level, presenting the combined stream as a unified virtual MIDI port without requiring application-level intervention.[40]

MIDI Splitting and Thru Functionality
RTP-MIDI supports MIDI splitting by allowing endpoints to replicate incoming MIDI packets across multiple active sessions, so that data from a single source can be distributed to many destinations without loss of synchronization. Replication preserves the original RTP timestamps embedded in the packets, which are crucial for maintaining timing accuracy and delivery order across the network.[41][42]

The thru functionality in RTP-MIDI emulates the behavior of a physical MIDI thru port: incoming data is forwarded to additional outputs without alteration or processing, enabling seamless passthrough in networked environments. Within a single session, this occurs automatically as MIDI messages from one participant are duplicated and broadcast to all other connected devices, functioning as a virtual thru box.[41]

Implementation of splitting and thru relies on virtual MIDI ports created by the operating system's RTP-MIDI driver or dedicated router software, which handle the duplication and routing logic. For instance, endpoints can support fan-out to four or more outputs by participating in multiple concurrent sessions, each treated independently to direct the replicated streams. This allows a single incoming MIDI stream to be fanned out to diverse network destinations, such as multiple hardware interfaces or software applications.[42][41]

At the protocol level, RTP-MIDI provides no dedicated commands for splitting or thru; these features emerge from the multi-session management capabilities of the AppleMIDI session protocol layered atop RTP.
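The fan-out behavior described above amounts to forwarding each packet, unmodified, to every active output session. A minimal sketch (the packet and queue types are hypothetical, not defined by the protocol) might look like this:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MidiPacket:
    timestamp: int   # RTP timestamp, preserved when the packet is replicated
    payload: bytes   # one or more MIDI commands

def fan_out(packet: MidiPacket, outputs: list, origin=None) -> None:
    """Forward one incoming packet to every active output session
    except the one it arrived on (a simple guard against feedback
    loops). The packet object is shared, so the RTP timestamp and
    payload are passed through unmodified, thru-box style."""
    for queue in outputs:
        if queue is not origin:
            queue.append(packet)
```

A router endpoint would call this once per arriving packet, with one queue per outgoing session; because the same object is enqueued everywhere, timing information is never rewritten.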
Endpoints manage replication by joining multiple sessions simultaneously, using session identifiers and port assignments to segregate traffic without dedicated signaling.[41][4] In networked setups, this functionality facilitates daisy-chaining of MIDI devices over Ethernet or Wi-Fi, where intermediate endpoints can filter and replicate streams to downstream participants while preventing feedback loops through selective session participation and port isolation. This contrasts with MIDI merging, which combines inputs from multiple sources into a unified stream.[41]

Distributed Patchbay Concept
The distributed patchbay concept in RTP-MIDI envisions the IP network as a flexible matrix of virtual MIDI cables, where endpoints dynamically connect and route data without physical interconnections. Devices advertise their virtual ports through Bonjour service discovery on the _apple-midi._udp service, enabling peer-to-peer session invitations that establish bidirectional MIDI streams. Each session functions as a virtual cable pair, supporting up to 16 such pairs per endpoint in typical implementations, allowing users to patch MIDI sources to destinations across the network as if using a traditional hardware patchbay. This model leverages the protocol's UDP-based control and RTP payload channels to create on-demand connections, transforming scattered devices into an interconnected MIDI ecosystem.[4]
A key benefit of this approach is its scalability for expansive setups, accommodating dozens to over 100 devices in professional environments by distributing routing logic across participants rather than requiring centralized hardware. It significantly reduces cabling complexity, as Ethernet or Wi-Fi infrastructure handles long-distance transmission, supporting runs up to hundreds of meters without signal degradation. In contrast to legacy MIDI 1.0 systems limited by daisy-chaining and single-cable constraints, RTP-MIDI's virtual patching minimizes setup time and physical clutter, making it suitable for mobile or venue-based applications.[43]
The protocol enables this distributed patching through multi-session support, where a single endpoint can maintain concurrent connections to multiple peers using unique session tokens and SSRC identifiers for isolation. Automatic discovery and invitation sequences allow ad-hoc reconfiguration, with MIDI data automatically merged from incoming sessions at the receiver or split to outgoing ones, building on endpoint-level operations like thru functionality. This facilitates seamless integration in heterogeneous networks, where devices join or leave without disrupting existing routings, provided the underlying IP topology remains stable.[4]
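The routing state behind such a distributed patchbay can be modeled as a simple matrix from source streams to destination sessions. The sketch below is purely illustrative (the class and its methods are not part of the protocol): sources are identified by SSRC, and each "patch" corresponds to a virtual cable into an outgoing session.

```python
class Patchbay:
    """Illustrative endpoint-level routing matrix: source streams
    (keyed by SSRC) are patched to named destination sessions,
    like virtual cables in a hardware patchbay."""

    def __init__(self) -> None:
        self.routes = {}   # source SSRC -> set of destination session names

    def patch(self, src_ssrc: int, dest: str) -> None:
        """Connect a source stream to a destination session."""
        self.routes.setdefault(src_ssrc, set()).add(dest)

    def unpatch(self, src_ssrc: int, dest: str) -> None:
        """Remove one virtual cable; other routes are untouched."""
        self.routes.get(src_ssrc, set()).discard(dest)

    def destinations(self, src_ssrc: int) -> list:
        """List where packets from this source should be replicated."""
        return sorted(self.routes.get(src_ssrc, set()))
```

Re-patching a stream in real time, as in the live-performance scenarios below, then reduces to an unpatch/patch pair, with per-session state keeping the change isolated from other routes.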
Representative examples include large-scale live performance networks, such as theater productions or ensemble setups, where a central control station dynamically routes MIDI from a conductor's surface to distributed instrument sections across a venue, ensuring synchronized playback via redundant virtual paths. In such configurations, technicians can re-patch streams in real-time—e.g., redirecting clock signals to backup devices—leveraging the protocol's timestamping for precise synchronization.[43]
Despite these advantages, the concept has limitations in highly complex topologies, often relying on a dedicated central router or hub device to aggregate and manage connections for stability and conflict avoidance. The protocol provides no native arbitration for simultaneous data streams from multiple sources, leaving resolution to endpoint merging logic, which may introduce variability in large multicast scenarios without additional network optimizations.[43]
