MPEG-G
MPEG-G (ISO/IEC 23092) is an ISO/IEC standard for genomic information representation, developed jointly by ISO/IEC JTC 1/SC 29/WG 9 (MPEG) and ISO TC 276 "Biotechnology" Working Group 5. The goal of the standard is to provide interoperable solutions for the storage, access, and protection of data generated by high-throughput sequencing machines and by their subsequent processing and analysis, across different possible implementations. The standard is composed of several parts, each addressing a specific aspect, such as compression, metadata association, Application Programming Interfaces (APIs), and reference software for data decoding. Alongside the reference decoder software, commercial and open-source implementations became available starting in 2019, progressively covering more of the published parts of the standard.
The advent of high-throughput sequencing (HTS) technologies has revolutionized the field of quantitative biology. The availability of large collections of genomic information has entered everyday practice and has become a cornerstone of a number of disciplines, ranging from biological research to personalized medicine in the clinic. At the moment, genomic information is mostly exchanged through a variety of data formats, such as FASTA/FASTQ for unaligned sequencing reads and SAM/BAM/CRAM for aligned reads. The ISO/IEC 23092 (MPEG-G) standard aims to provide a unified format for the efficient representation and compression of such diverse data, both for file storage and data transport. To that end, the standard is divided into several parts.
The MPEG-G standard uses technology and data representation architectures previously validated in the field of digital media. These make it possible to compress and transport genome sequencing data even in complex scenarios, for instance when access is needed to large amounts of possibly distributed data, or when part of the data needs to be encrypted for privacy reasons. Conceptually, such requirements lead to the definition of a number of mutually interrelated mechanisms, which are summarized in the following list:
In turn, some of these topics have been grouped together to make the standard easier to understand and implement. As a result, the ISO/IEC 23092 standard is physically structured as a series of separate documents, as follows:
ISO/IEC 23092-1 specifies how genomic data is organized within MPEG-G structures for transport (i.e., streaming) and storage. The formats of the genomic record, the reference record, the MPEG-G file, and the transport stream are defined in this part. It introduces the Access Unit as the container of the compressed genomic data and provides a reference conversion process among the different formats.
ISO/IEC 23092-2 specifies the syntax and methods for MPEG-G lossless compression of sequencing data and lossy compression of the associated quality scores. As is typical for MPEG standards, MPEG-G specifies only the decoding process, while the encoding process is left open to algorithmic and implementation-specific innovation. All conforming MPEG-G decoders produce identical outputs from the multiplexed bitstreams included in MPEG-G files and from the data streams used in streaming scenarios.
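To make the difference between lossless sequence coding and lossy quality-score coding concrete, here is a minimal sketch of quality-score binning in general. It is not the quantizer defined in Part 2: the bin edges, the representative values, and the names `BINS`, `phred_scores`, and `quantize` are all assumptions chosen for illustration.

```python
# Illustration only: a generic quality-score binning scheme,
# NOT the quantization method specified by ISO/IEC 23092-2.
# Each bin is (lowest score, highest score, representative value);
# the edges and representatives are hypothetical.
BINS = [(0, 9, 6), (10, 19, 15), (20, 29, 25), (30, 93, 37)]


def phred_scores(quality_string, offset=33):
    """Convert a FASTQ quality string (Phred+33 by default) to integers."""
    return [ord(ch) - offset for ch in quality_string]


def quantize(score):
    """Map a Phred score to the representative value of its bin (lossy)."""
    for low, high, representative in BINS:
        if low <= score <= high:
            return representative
    raise ValueError(f"score out of range: {score}")


bases = "ACGTN"
qualities = "II5+#"

original = phred_scores(qualities)
binned = [quantize(q) for q in original]

# The bases are kept exactly; only the quality scores lose precision.
print(bases, original, binned)
```

Fewer distinct quality values generally compress better, at the cost of no longer being able to recover the original scores exactly.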
The input to the encoder consists of genomic records or metadata, optionally accompanied by reference data, while its output is an MPEG-G file or a transport stream.
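The consequence of standardizing only the decoder can be shown with a toy example. The sketch below is not MPEG-G syntax; it uses a hypothetical run-length "bitstream" (a list of symbol and run-length pairs) and made-up function names. Two different encoders produce different streams, yet the single fixed decoding rule recovers the same read from both.

```python
# Toy illustration only: this is NOT the MPEG-G bitstream syntax.
# A "stream" here is a list of (symbol, run_length) pairs.

def decode(stream):
    """The one fixed decoding rule that every decoder must follow."""
    return "".join(symbol * count for symbol, count in stream)


def encoder_greedy(sequence):
    """Encoder A: merge each maximal run of equal symbols into one pair."""
    stream = []
    for symbol in sequence:
        if stream and stream[-1][0] == symbol:
            stream[-1] = (symbol, stream[-1][1] + 1)
        else:
            stream.append((symbol, 1))
    return stream


def encoder_naive(sequence):
    """Encoder B: emit one pair per symbol (valid, but larger)."""
    return [(symbol, 1) for symbol in sequence]


read = "AAAACCGTTT"
stream_a = encoder_greedy(read)
stream_b = encoder_naive(read)

# Different streams, identical decoded output.
print(len(stream_a), len(stream_b), decode(stream_a) == decode(stream_b))
```

Both streams are valid inputs to the same decoder, so an encoder is free to trade speed against compression ratio without affecting interoperability.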
ISO/IEC 23092-3 specifies a metadata format and provides genomic data representation APIs to support interoperability among existing tools and systems. Part 3 specifies how an MPEG-G compliant bitstream can be integrated with metadata, as well as mechanisms for access control, integrity verification, authentication, and authorization. This part also contains an informative section devoted to the mapping between SAM and MPEG-G data structures, including backward compatibility with existing SAM content. It defines: