NetCDF

| Network Common Data Form | |
|---|---|
| Filename extension | .nc |
| Internet media type | application/netcdf, application/x-netcdf |
| Magic number | CDF\001, \211HDF\r\n\032\n |
| Developed by | University Corporation for Atmospheric Research (UCAR) |
| Type of format | scientific binary data |
| Extended from | Common Data Format (CDF), Hierarchical Data Format (HDF) |
NetCDF (Network Common Data Form) is a set of software libraries and self-describing, machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data. The project homepage[1] is hosted by the Unidata program at the University Corporation for Atmospheric Research (UCAR). They are also the chief source of netCDF software, standards development, updates, etc. The format is an open standard. NetCDF Classic and 64-bit Offset Format are an international standard of the Open Geospatial Consortium.[2]
History
The project started in 1988 and is still actively supported by UCAR. The original netCDF binary format (released in 1990, now known as "netCDF classic format") is still widely used across the world and continues to be fully supported in all netCDF releases. Version 4.0 (released in 2008) allowed the use of the HDF5 data file format. Version 4.1 (2010) added support for C and Fortran client access to specified subsets of remote data via OPeNDAP. Version 4.3.0 (2012) added a CMake build system for Windows builds. Version 4.7.0 (2019) added support for reading Amazon S3 objects. Version 4.8.0 (2021) added further support for Zarr. Version 4.9.0 (2022) added support for Zstandard compression. Further releases are planned to improve performance, add features, and fix bugs.
The format was originally based on the conceptual model of the Common Data Format developed by NASA, but has since diverged and is not compatible with it.[3][4]
Format description
The netCDF libraries support several binary formats for netCDF files:
- The classic format was used in the first netCDF release, and is still the default format for file creation.
- The 64-bit offset format was introduced in version 3.6.0, and it supports larger variable and file sizes.
- The netCDF-4/HDF5 format was introduced in version 4.0; it is the HDF5 data format, with some restrictions.
- The HDF4 SD format is supported for read-only access.
- The CDF5 format is supported, in coordination with the parallel-netcdf project.
All formats are "self-describing". This means that there is a header which describes the layout of the rest of the file, in particular the data arrays, as well as arbitrary file metadata in the form of name/value attributes. The format is platform independent, with issues such as endianness being addressed in the software libraries. The data are stored in a fashion that allows efficient subsetting.
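To illustrate what self-description means in practice, the following minimal C sketch (the file name data.nc is a placeholder, and error handling is abbreviated) uses the Unidata C library to discover a file's layout purely from the information in its header:

```c
/* Minimal sketch: enumerate the dimensions and object counts recorded
 * in a netCDF file's self-describing header. "data.nc" is illustrative. */
#include <stdio.h>
#include <netcdf.h>

int main(void) {
    int ncid, ndims, nvars, natts, unlimdimid;
    char name[NC_MAX_NAME + 1];
    size_t len;

    if (nc_open("data.nc", NC_NOWRITE, &ncid) != NC_NOERR)
        return 1;

    /* Counts of dimensions, variables, and global attributes. */
    nc_inq(ncid, &ndims, &nvars, &natts, &unlimdimid);
    printf("%d dims, %d vars, %d global atts\n", ndims, nvars, natts);

    /* Walk the dimension table stored in the header. */
    for (int dimid = 0; dimid < ndims; dimid++) {
        nc_inq_dim(ncid, dimid, name, &len);
        printf("dim %s = %zu\n", name, len);
    }
    return nc_close(ncid);
}
```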
Starting with version 4.0, the netCDF API[5] allows the use of the HDF5 data format. NetCDF users can create HDF5 files with benefits not available with the netCDF format, such as much larger files and multiple unlimited dimensions.
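As a sketch of one such benefit, the hypothetical example below creates an HDF5-backed file with two unlimited dimensions, something the classic format (limited to a single record dimension) cannot express; the file and dimension names are invented for illustration:

```c
/* Sketch: a netCDF-4/HDF5 file with two unlimited dimensions. */
#include <netcdf.h>

int main(void) {
    int ncid, time_dim, station_dim;

    /* NC_NETCDF4 selects the HDF5-based storage layer. */
    if (nc_create("enhanced.nc", NC_NETCDF4 | NC_CLOBBER, &ncid) != NC_NOERR)
        return 1;

    /* Both dimensions can grow independently as data are appended. */
    nc_def_dim(ncid, "time", NC_UNLIMITED, &time_dim);
    nc_def_dim(ncid, "station", NC_UNLIMITED, &station_dim);

    return nc_close(ncid);
}
```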
Full backward compatibility in accessing old netCDF files and using previous versions of the C and Fortran APIs is supported.
Software
Access libraries
The software libraries supplied by UCAR provide read-write access to netCDF files, encoding and decoding the necessary arrays and metadata. The core library is written in C, and provides an application programming interface (API) for C, C++ and two APIs for Fortran applications, one for Fortran 77, and one for Fortran 90. An independent implementation, also developed and maintained by Unidata, is written in 100% Java; it extends the core data model and adds additional functionality. Interfaces to netCDF based on the C library are also available in other languages including R (ncdf,[6] ncvar and RNetCDF[7] packages), Perl Data Language, Python, Ruby, Haskell,[8] Mathematica, MATLAB, Interactive Data Language (IDL), Julia and Octave. The specification of the API calls is very similar across the different languages, apart from inevitable differences of syntax. The API calls for version 2 were rather different from those in version 3, but are also supported by versions 3 and 4 for backward compatibility. Application programmers using supported languages need not normally be concerned with the file structure itself, even though the formats are openly documented.
Applications
A wide range of application software has been written that makes use of netCDF files, ranging from command line utilities to graphical visualization packages. A number are listed below, and a longer list[9] is on the UCAR website.
- A commonly used set of Unix command line utilities for netCDF files is the NetCDF Operators (NCO) suite, which provides a range of commands for the manipulation and analysis of netCDF files, including basic record concatenation, array slicing and averaging.
- ncBrowse[10] is a generic netCDF file viewer that includes Java graphics, animations and 3D visualizations for a wide range of netCDF file conventions.
- ncview[11] is a visual browser for netCDF format files. This program is a simple, fast, GUI-based tool for visualising fields in a netCDF file. One can browse through the various dimensions of a data array, taking a look at the raw data values. It is also possible to change color maps, invert the data, etc.
- Panoply[12] is a netCDF file viewer developed at the NASA Goddard Institute for Space Studies which focuses on presentation of geo-gridded data. It is written in Java and thus platform independent. Although its feature set overlaps with ncBrowse and ncview, Panoply is distinguished by offering a wide variety of map projections and the ability to work with different scale color tables.
- The NCAR Command Language (NCL) is used to analyze and visualize data in netCDF files (among other formats).
- The Python programming language can access netCDF files with the PyNIO[13] module (which also facilitates access to a variety of other data formats). netCDF files can also be read with the Python module netCDF4-python,[14] and into a pandas-like DataFrame with the xarray module.[15]
- Ferret is an interactive computer visualization and analysis environment designed to meet the needs of oceanographers and meteorologists analyzing large and complex gridded data sets. Ferret offers a Mathematica-like approach to analysis; new variables may be defined interactively as mathematical expressions involving data set variables. Calculations may be applied over arbitrarily shaped regions. Fully documented graphics are produced with a single command.
- GrADS (Grid Analysis and Display System)[16] is an interactive desktop tool that is used for easy access, manipulation, and visualization of earth science data. GrADS has been implemented worldwide on a variety of commonly used operating systems and is freely distributed over the Internet.
- nCDF_Browser[17] is a visual nCDF browser, written in the IDL programming language. Variables, attributes, and dimensions can be immediately downloaded to the IDL command line for further processing. All the Coyote Library[18] files necessary to run nCDF_Browser are available in the zip file.
- ArcGIS versions after 9.2[19] support netCDF files that follow the Climate and Forecast Metadata Conventions and contain rectilinear grids with equally-spaced coordinates. The Multidimensional Tools toolbox can be used to create raster layers, feature layers, and table views from netCDF data in ArcMap, or convert feature, raster, and table data to netCDF.
- OriginPro version 2021b supports[20] the netCDF CF convention. Averaging can be performed during import, allowing large datasets to be handled in GUI software.
- The GDAL (Geospatial Data Abstraction Library) provides support[21] for read and write access to netCDF data.
- netCDF Explorer is a multi-platform graphical browser for netCDF files. netCDF Explorer can browse files locally or remotely, by means of OPeNDAP.
- R supports netCDF through packages such as ncdf4 (including HDF5 support)[22] or RNetCDF (no HDF5 support).[23]
- HDFql enables users to manage netCDF-4/HDF5 files through a high-level language (similar to SQL) in C, C++, Java, Python, C#, Fortran and R.[24]
- Metview workstation and batch system from the European Centre for Medium-Range Weather Forecasts (ECMWF) can handle NetCDF together with GRIB and BUFR.
- OpenChrom ships a converter under the terms of the Eclipse Public License.[25]
Common uses
NetCDF is commonly used in climatology, meteorology and oceanography applications (e.g., weather forecasting, climate change) and in GIS applications.
It is an input/output format for many GIS applications, and for general scientific data exchange. To quote from the Unidata site:[26]
- "NetCDF (network Common Data Form) is a set of interfaces for array-oriented data access and a freely-distributed collection of data access libraries for C, Fortran, C++, Java, and other languages. The netCDF libraries support a machine-independent format for representing scientific data. Together, the interfaces, libraries, and format support the creation, access, and sharing of scientific data."
Conventions
The Climate and Forecast (CF) conventions are metadata conventions for earth science data, intended to promote the processing and sharing of files created with the NetCDF Application Programmer Interface (API). The conventions define metadata that are included in the same file as the data (thus making the file "self-describing"), that provide a definitive description of what the data in each variable represents, and of the spatial and temporal properties of the data (including information about grids, such as grid cell bounds and cell averaging methods). This enables users of data from different sources to decide which data are comparable, and allows building applications with powerful extraction, regridding, and display capabilities.
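As a rough sketch of how such metadata are attached in practice, a CF-style variable might be annotated through the C API as below; the file, dimension, and variable names are invented for illustration, and error handling is omitted:

```c
/* Sketch: attach CF-convention metadata to a variable so that other
 * tools can interpret it. Names here are illustrative, not mandated. */
#include <string.h>
#include <netcdf.h>

int main(void) {
    int ncid, t_dim, t_var;

    nc_create("cf_example.nc", NC_CLOBBER, &ncid);
    nc_def_dim(ncid, "time", NC_UNLIMITED, &t_dim);
    nc_def_var(ncid, "air_temperature", NC_FLOAT, 1, &t_dim, &t_var);

    /* Canonical identity, physical units, and cell statistics. */
    nc_put_att_text(ncid, t_var, "standard_name",
                    strlen("air_temperature"), "air_temperature");
    nc_put_att_text(ncid, t_var, "units", strlen("K"), "K");
    nc_put_att_text(ncid, t_var, "cell_methods",
                    strlen("time: mean"), "time: mean");

    nc_enddef(ncid);
    return nc_close(ncid);
}
```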
Parallel-NetCDF
An extension of netCDF for parallel computing called Parallel-NetCDF (or PnetCDF) has been developed by Argonne National Laboratory and Northwestern University.[27] This is built upon MPI-IO, the I/O extension to MPI communications. Using the high-level netCDF data structures, the Parallel-NetCDF libraries can make use of optimizations to efficiently distribute file reads and writes across multiple processors. The Parallel-NetCDF package can read/write only classic and 64-bit offset formats; it cannot read or write the HDF5-based format available with netCDF-4.0. The package provides C and Fortran APIs that differ from, but closely resemble, the standard netCDF APIs.
Parallel I/O in the Unidata netCDF library has been supported since release 4.0, for HDF5 data files. Since version 4.1.1 the Unidata NetCDF C library supports parallel I/O to classic and 64-bit offset files using the Parallel-NetCDF library, but with the NetCDF API.
Interoperability of C/Fortran/C++ libraries with other formats
The netCDF C library, and the libraries based on it (Fortran 77 and Fortran 90, C++, and all third-party libraries) can, starting with version 4.1.1, read some data in other data formats. Data in the HDF5 format can be read, with some restrictions. Data in the HDF4 format can be read by the netCDF C library if created using the HDF4 Scientific Data (SD) API.
NetCDF-Java common data model
The NetCDF-Java library currently reads the following file formats and remote access protocols:
- BUFR Format Documentation[28] (ongoing development)
- CINRAD level II[29] (Chinese Radar format)
- DMSP[30] (Defense Meteorological Satellite Program)
- DORADE[31] radar file format
- GINI[32] (GOES Ingest and NOAAPORT Interface) image format
- GEMPAK[33] gridded data
- GRIB version 1 and version 2 (ongoing work on tables)
- GTOPO[34] 30-sec elevation dataset (USGS)
- Hierarchical Data Format (HDF4, HDF-EOS2, HDF5, HDF-EOS5)
- NetCDF[35] (classic and large format)
- NetCDF-4[36] (built on HDF5)
- NEXRAD Radar[37] level 2 and level 3.
There are a number of other formats in development. Since each of these is accessed transparently through the NetCDF API, the NetCDF-Java library is said to implement a common data model for scientific datasets.
The Java common data model has three layers, which build on top of each other to add successively richer semantics:
- The data access layer, also known as the syntactic layer, handles data reading.
- The coordinate system layer identifies the coordinates of the data arrays. Coordinates are a completely general concept for scientific data; specialized georeferencing coordinate systems, important to the Earth Science community, are specially annotated.
- The scientific data type layer identifies specific types of data, such as grids, images, and point data, and adds specialized methods for each kind of data.
The data model of the data access layer is a generalization of the NetCDF-3 data model, and substantially the same as the NetCDF-4 data model. The coordinate system layer implements and extends the concepts in the Climate and Forecast Metadata Conventions. The scientific data type layer allows data to be manipulated in coordinate space, analogous to the Open Geospatial Consortium specifications. The identification of coordinate systems and data typing is ongoing, but users can plug in their own classes at runtime for specialized processing.
References

- ^ "NetCDF Home Page". Unidata/UCAR. Archived from the original on 2017-12-06. Retrieved 2017-12-05.
- ^ "OGC standard netCDF Classic and 64-bit Offset". Opengeospatial.org. Archived from the original on 2017-11-30. Retrieved 2017-12-05.
- ^ "Background - The NetCDF Users' Guide". Unidata.ucar.edu. Archived from the original on 2018-11-02. Retrieved 2013-11-27.
- ^ "CDF - Frequently asked questions". NASA. Archived from the original on 2018-06-19. Retrieved 2018-11-02.
- ^ "Version 4.0 of the netCDF API". Unidata.ucar.edu. Archived from the original on 2015-06-17. Retrieved 2013-11-27.
- ^ "ncdf". Cirrus.ucsd.edu. 2013-08-06. Archived from the original on 2013-12-03. Retrieved 2013-11-27.
- ^ "Rnetcdf". Cran.r-project.org. 2012-07-19. Archived from the original on 2013-12-02. Retrieved 2013-11-27.
- ^ "hnetcdf: Haskell NetCDF library". hackage.haskell.org. 2014-07-10. Archived from the original on 2014-07-09. Retrieved 2014-07-10.
- ^ "Software for Manipulating or Displaying NetCDF Data". Unidata.ucar.edu. Retrieved 2020-10-23.
- ^ "ncBrowse". Epic.noaa.gov. Archived from the original on 2013-12-03. Retrieved 2013-11-27.
- ^ "ncview". Meteora.ucsd.edu. Archived from the original on 2014-02-12. Retrieved 2013-11-27.
- ^ "Panoply". Giss.nasa.gov. Goddard Institute for Space Studies. Archived from the original on 2014-06-20. Retrieved 2013-11-27.
- ^ "PyNIO". Pyngl.ucar.edu. 2011-07-28. Archived from the original on 2013-11-25. Retrieved 2013-11-27.
- ^ "netCDF4". Archived from the original on 2017-11-29. Retrieved 2017-12-04.
- ^ "xarray: N-D labeled arrays and datasets in Python". Archived from the original on 2016-09-01. Retrieved 2016-09-07.
- ^ "GrADS Home Page". Archived from the original on 2016-02-13. Retrieved 2018-04-10.
- ^ "Coyote's Guide to IDL Programming". Dfanning.com. 2013-11-23. Archived from the original on 2015-09-23. Retrieved 2013-11-27.
- ^ "Coyote Library". Dfanning.com. 2013-11-23. Archived from the original on 2015-09-23. Retrieved 2013-11-27.
- ^ "ArcGIS version 9.2". Esri.com. Archived from the original on 2013-11-22. Retrieved 2013-11-27.
- ^ "NetCDF Importing and Processing". originlab.com. Retrieved 2021-05-11.
- ^ "NetCDF network Common Data Form". Gdal.org. Archived from the original on 2013-06-06. Retrieved 2013-11-27.
- ^ David Pierce (2014). ncdf4: Interface to Unidata netCDF (version 4 or earlier) format data files. R package version 1.13. https://cran.r-project.org/package=ncdf4
- ^ Pavel Michna and with contributions from Milton Woods (2015). RNetCDF: Interface to NetCDF Datasets. R package version 1.7-3. https://cran.r-project.org/package=RNetCDF
- ^ http://www.hdfql.com
- ^ Wenig, Philip; Odermatt, Juergen (2010). "OpenChrom: a cross-platform open source software for the mass spectrometric analysis of chromatographic data". BMC Bioinformatics. doi:10.1186/1471-2105-11-405.
- ^ "What Is netCDF?". Unidata Program Center. Archived from the original on 2013-03-15. Retrieved 2012-11-26.
- ^ "parallel-netcdf". Mcs.anl.gov. 2013-11-17. Archived from the original on 2008-12-01. Retrieved 2013-11-27.
- ^ "BUFR FORMAT DOCUMENTATION". Archived from the original on October 9, 2007. Retrieved February 2, 2008.
- ^ [1] Archived September 5, 2008, at the Wayback Machine
- ^ [2]
- ^ [3] Archived May 21, 2008, at the Wayback Machine
- ^ "GINI Satellite Format". Weather.unisys.com. Archived from the original on 2013-12-02. Retrieved 2013-11-27.
- ^ "Unidata | GEMPAK". Unidata.ucar.edu. Archived from the original on 2013-11-04. Retrieved 2013-11-27.
- ^ [4] Archived February 12, 2008, at the Wayback Machine
- ^ "NetCDF". Unidata.ucar.edu. Archived from the original on 2013-11-29. Retrieved 2013-11-27.
- ^ "NetCDF-4". Unidata.ucar.edu. Archived from the original on 2015-06-17. Retrieved 2013-11-27.
- ^ Steve Ansari. "NCDC: Radar Resources". Ncdc.noaa.gov. Archived from the original on 2013-12-02. Retrieved 2013-11-27.
External links

- Official website
- NetCDF User's Guide — describes the file format
- "An Introduction to Distributed Visualization"[dead link]; section 4.2 contains a comparison of CDF, HDF, and netCDF.
- Animating NetCDF Data in ArcMap
- List of software utilities using netCDF files
History

Origins and Development
NetCDF originated in the late 1980s as part of the Unidata program, an NSF-funded initiative hosted at the University Corporation for Atmospheric Research (UCAR) to support data access and analysis in the earth sciences, particularly meteorology.[6] The development was driven by the need for a machine-independent, self-describing data format that could facilitate the sharing and reuse of array-oriented scientific data across diverse computing platforms, addressing limitations in existing formats used for real-time meteorological data exchange.[6] Unidata's focus on improving software portability for C and Fortran applications in weather and climate research underscored these motivations, aiming to enable broader interdisciplinary collaboration.[6]

The foundational work began in 1987 with a Unidata workshop in Boulder, Colorado, where participants proposed adapting NASA's Common Data Format (CDF), developed at the Goddard Space Flight Center's National Space Science Data Center, for meteorological applications.[6] In early 1988, Glenn Davis, a key developer at Unidata, created a prototype implementation in C, layering it on Sun Microsystems' External Data Representation (XDR) standard to ensure portability across UNIX and VMS systems.[6] This prototype demonstrated the feasibility of a single-file, machine-independent interface for multidimensional scientific data. Inspired by formats like GRIB, which were efficient for gridded meteorological data but lacked extensibility and self-description, netCDF emphasized array-oriented structures with embedded metadata to promote long-term usability and platform independence.[6] An August 1988 workshop, involving collaborators such as Joe Fahle from SeaSpace and Michael Gough from NASA, finalized the netCDF interface specification, with Davis and Russ Rew implementing the initial software.[6]

Early adoption was swift within the geosciences community, particularly by NOAA for distributing observational and forecast data in meteorology, and by NASA for archiving and sharing earth observation datasets, leveraging netCDF's compatibility with existing workflows in weather and climate research.[6] This institutional backing from NSF through Unidata solidified netCDF as a standard for portable, extensible data formats in the earth sciences from its inception.[1]

Key Milestones and Versions
The initial release of NetCDF version 1.0 occurred in 1990, introducing the classic file format along with Fortran and C programming interfaces for creating, accessing, and sharing array-oriented scientific data.[6] This version established the foundational self-describing, machine-independent format based on XDR encoding, targeting portability across UNIX and VMS systems.[6]

In May 1997, NetCDF 3.3 was released, incorporating shared library support to facilitate easier distribution and integration, while enhancing overall portability and introducing type-safe interfaces in C and Fortran.[7] These updates addressed growing demands for robust, multi-platform deployment in scientific computing environments.[6]

A significant advancement came with the 64-bit offset variant in December 2004 as part of NetCDF 3.6.0, which resolved limitations of the classic format, such as the 2 GB file size cap, enabling handling of much larger datasets without altering the core data model.[7] This extension maintained backward compatibility while supporting modern storage needs.[8]

The transition to NetCDF-4 began in June 2008, integrating the HDF5 library to enable hierarchical organization through groups, user-defined data types, and advanced features like zlib and szip compression, along with chunking and parallel I/O capabilities.[6] This release marked a shift toward more flexible, feature-rich storage while preserving access to legacy classic and 64-bit offset files.[7]

NetCDF 4.5, released in October 2017, focused on performance improvements, including full DAP4 protocol support for remote data access and enhancements to parallel I/O efficiency.[9] The most recent major update, NetCDF 4.9.3 on February 7, 2025, included bug fixes and enhancements such as an extension to the API for programmatic control of the plugin search path, along with notes on a known compatibility issue in parallel I/O with mpich 4.2.0.[7][10] These changes bolster reliability in distributed workflows.[10]

Data Model and Format
Core Data Model
The NetCDF data model provides an abstract, machine-independent framework for representing multidimensional scientific data, enabling self-describing datasets that include both the data values and the necessary metadata for interpretation. At its core, the model organizes data into dimensions, variables, and attributes, which together describe the structure, content, and auxiliary information of a dataset. This design ensures that all essential details, such as data types, array shapes, and semantic descriptors, are embedded within the file itself, eliminating the need for external documentation or proprietary software to understand the contents.[11]

Dimensions define the axes along which data varies, serving as named extents for variables; they can be fixed-length or unlimited (one in the classic model, multiple in the enhanced NetCDF-4 model), allowing datasets to grow dynamically along those axes without altering the file structure. Variables represent the primary data containers as multidimensional arrays associated with one or more dimensions, supporting standard atomic types such as byte, short, int, float, double, and char for character strings; scalar variables (zero-dimensional) and one-dimensional string variables are also permitted. In the enhanced model, variables can leverage user-defined compound types (similar to C structs), enumerations, opaque types, and variable-length arrays, providing greater flexibility for complex data representations like records or nested structures. Attributes, which are optional key-value pairs, attach to variables, dimensions, or the entire dataset to supply metadata; these can be scalar or one-dimensional arrays of numeric, string, or other types, conveying details such as units, validity ranges, or descriptive names.[11]

The enhanced NetCDF-4 model introduces groups to create a hierarchical organization, akin to directories in a file system, where datasets can contain nested subgroups, each with its own dimensions, variables, and attributes; this supports partitioning large or multifaceted datasets while maintaining backward compatibility with the classic model. For instance, a climate dataset might include a three-dimensional variable named "temperature" with dimensions "time" (unlimited), "lat" (fixed at 180), and "lon" (fixed at 360), storing air temperature values as double-precision floats; associated attributes could specify units = "K" for the Kelvin scale and long_name = "surface air temperature" for semantic clarity, ensuring the variable's physical meaning is self-evident. This structure promotes interoperability across disciplines, as the model abstracts away storage details to focus on logical data relationships.[11]
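A minimal C sketch of that climate example, assuming the standard Unidata C API and an illustrative file name, could define the structure as follows:

```c
/* Sketch: define temperature(time, lat, lon) with self-describing
 * attributes, mirroring the example in the text. Error handling omitted. */
#include <string.h>
#include <netcdf.h>

int main(void) {
    int ncid, dimids[3], temp_var;

    nc_create("climate.nc", NC_CLOBBER, &ncid);

    /* "time" is the unlimited (record) dimension; lat/lon are fixed. */
    nc_def_dim(ncid, "time", NC_UNLIMITED, &dimids[0]);
    nc_def_dim(ncid, "lat", 180, &dimids[1]);
    nc_def_dim(ncid, "lon", 360, &dimids[2]);

    nc_def_var(ncid, "temperature", NC_DOUBLE, 3, dimids, &temp_var);
    nc_put_att_text(ncid, temp_var, "units", strlen("K"), "K");
    nc_put_att_text(ncid, temp_var, "long_name",
                    strlen("surface air temperature"),
                    "surface air temperature");

    nc_enddef(ncid);
    return nc_close(ncid);
}
```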
File Format Variants
NetCDF supports three primary file format variants, each designed to balance portability, scalability, and advanced features for storing multidimensional scientific data. The classic format provides a simple, widely compatible structure, while the 64-bit offset variant addresses size limitations, and the NetCDF-4 format leverages HDF5 for enhanced capabilities like compression and hierarchical organization. These variants maintain the core NetCDF data model but differ in their binary encoding and storage mechanisms.[12]

The classic format, also known as NetCDF-3, employs a flat structure using the Common Data Form (CDF) binary encoding. It begins with a fixed header containing a magic number "CDF" followed by version byte \x01, the number of records, and lists of dimensions, global attributes, and variables, with data sections appended afterward. It supports only 32-bit offsets, limiting the file size to approximately 2 GB, and permits just one unlimited dimension per file without support for groups or internal compression. Its simplicity ensures high portability across platforms, making it suitable for legacy systems and applications requiring maximum compatibility.[12][13][4]

The 64-bit offset format extends the classic format to accommodate larger datasets by replacing 32-bit offsets with 64-bit ones in the header and variable sections, using version byte \x02 after the "CDF" magic number. This allows files exceeding 4 GiB while retaining the flat structure, single unlimited dimension, and absence of compression or groups. Variable and record data remain limited to under 4 GiB, but the format enables efficient handling of extensive multidimensional arrays without altering the core encoding. It requires netCDF library version 3.6.0 or later for reading and writing.[12][4][13]

The NetCDF-4 format, introduced in library version 4.0, is built on the HDF5 storage layer, enabling a richer set of features while providing a superset of the classic model's capabilities. It supports hierarchical groups for organizing data, user-defined compound and enumerated types, multiple unlimited dimensions, and variable sizes up to HDF5 limits (far exceeding 4 GiB). Compression is available via the deflate (zlib) algorithm at levels 1 through 9, along with chunking to optimize I/O for partial access to large arrays. Although it subsets HDF5's full feature set, excluding non-hierarchical groups and certain reference types, NetCDF-4 files are fully HDF5-compatible and identifiable by the "HDF5" signature. This format requires HDF5 library version 1.8.9 or later.[12][4]

Format identification relies on the file's magic number: "CDF" with \x01 for classic, "CDF" with \x02 for 64-bit offset, and "HDF5" for NetCDF-4. Tools such as ncdump can inspect and display file contents, revealing the format variant along with metadata and data summaries for verification. NetCDF-4 libraries ensure backward compatibility by transparently reading and writing classic and 64-bit offset files, allowing seamless transitions without modifying existing applications.[12][4]
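Because these signatures sit at the start of the file (netCDF-written HDF5 files place the signature at offset zero), a simple format-sniffing routine needs nothing beyond standard I/O; the following sketch is illustrative rather than a substitute for the library's own detection:

```c
/* Sketch: distinguish the on-disk variants by their magic numbers. */
#include <stdio.h>
#include <string.h>

const char *sniff_format(const char *path) {
    unsigned char sig[8] = {0};
    FILE *f = fopen(path, "rb");
    if (!f) return "unreadable";
    size_t n = fread(sig, 1, sizeof sig, f);
    fclose(f);

    if (n >= 4 && memcmp(sig, "CDF\x01", 4) == 0) return "classic";
    if (n >= 4 && memcmp(sig, "CDF\x02", 4) == 0) return "64-bit offset";
    /* HDF5-based netCDF-4 files begin with \x89 H D F \r \n \x1a \n. */
    if (n >= 8 && memcmp(sig, "\x89HDF\r\n\x1a\n", 8) == 0)
        return "netCDF-4/HDF5";
    return "unknown";
}
```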
Software and Libraries

Core Libraries and APIs
The NetCDF-C library serves as the reference implementation for the NetCDF data format, providing a comprehensive C API for creating, accessing, and manipulating NetCDF files. Developed and maintained by Unidata, it supports both the classic NetCDF format and the enhanced NetCDF-4 format, enabling the handling of multidimensional scientific data in a portable, self-describing manner.[3] The library includes core functions such as nc_create() for creating a new NetCDF dataset (with nc_open() for opening an existing one), nc_def_dim() for defining dimensions, and nc_put_vara() for writing subsets of variable data, alongside inquiry functions like nc_inq_varid() for retrieving variable identifiers. These functions facilitate the construction of complex data structures, including variables, attributes, and groups in NetCDF-4 files.
The API employs a two-phase design to ensure data integrity and efficiency: a define mode, entered upon file creation or opening, where metadata such as dimensions, variables, and attributes are specified using functions prefixed with nc_def_, followed by a transition to data mode via nc_enddef() to enable reading and writing actual data values.[14] This separation prevents inadvertent metadata changes during data operations and supports atomic file updates in the classic format. Error handling is managed through return codes from API calls, with nc_strerror() converting numeric error codes (e.g., NC_EINDEFINE for operations attempted in the wrong mode) into descriptive strings for debugging. The library returns NC_NOERR (0) on success, ensuring robust integration in applications.
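A common idiom, sketched here rather than prescribed by the library, wraps every call in a macro that translates failures through nc_strerror():

```c
/* Sketch of the usual error-handling idiom around the two-phase API. */
#include <stdio.h>
#include <stdlib.h>
#include <netcdf.h>

#define NC_CHECK(call) do {                           \
        int _st = (call);                             \
        if (_st != NC_NOERR) {                        \
            fprintf(stderr, "netCDF error: %s\n",     \
                    nc_strerror(_st));                \
            exit(EXIT_FAILURE);                       \
        }                                             \
    } while (0)

int main(void) {
    int ncid, dimid;
    NC_CHECK(nc_create("demo.nc", NC_CLOBBER, &ncid)); /* define mode */
    NC_CHECK(nc_def_dim(ncid, "x", 10, &dimid));
    NC_CHECK(nc_enddef(ncid));                         /* data mode   */
    NC_CHECK(nc_close(ncid));
    return 0;
}
```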
Key features of the NetCDF-C API include support for remote data access through integration with the OPeNDAP protocol, allowing nc_open() to accept URLs in place of local file paths for seamless retrieval of distributed datasets, provided the library is configured with DAP support using libcurl.[15] Subsetting operations are enabled via hyperslab mechanisms: functions like nc_get_vara() and nc_put_vara() select portions of multidimensional arrays with start and count vectors, while the nc_get_vars()/nc_put_vars() variants add a stride vector and the nc_get_varm()/nc_put_varm() variants add an index map (imap), so data can be extracted or inserted without loading entire datasets into memory.[14] For instance, the start vector defines the corner index per dimension, while stride allows non-contiguous access, such as every nth element.[14]
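For example, a hyperslab read of the hypothetical temperature variable from the data model example might look like the following sketch, pulling a single time step of a 10×20 region:

```c
/* Sketch: read a hyperslab of a 3-D float variable. The start vector
 * picks the corner, count the extent per dimension; file and variable
 * names are illustrative. */
#include <netcdf.h>

int read_slab(float out[1][10][20]) {
    int ncid, varid;

    if (nc_open("climate.nc", NC_NOWRITE, &ncid) != NC_NOERR) return -1;
    if (nc_inq_varid(ncid, "temperature", &varid) != NC_NOERR) return -1;

    /* One time step, rows 0-9, columns 40-59, without reading the rest. */
    size_t start[3] = {5, 0, 40};
    size_t count[3] = {1, 10, 20};
    int st = nc_get_vara_float(ncid, varid, start, count, &out[0][0][0]);

    nc_close(ncid);
    return st == NC_NOERR ? 0 : -1;
}
```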
Performance optimizations in the NetCDF-C library include buffered I/O for the classic format, modeled after the C standard I/O library, which aggregates reads and writes to minimize system calls and enhance sequential access efficiency; nc_sync() can flush buffers explicitly for multi-process coordination.[16] In the NetCDF-4 format, the library delegates low-level I/O to the HDF5 library, leveraging HDF5's chunk caching (enabled in read-only mode) and parallel access capabilities via nc_open_par() for high-performance computing environments.[16] This delegation supports advanced features like compression and unlimited dimensions while maintaining the NetCDF API's simplicity.[3] The C API forms the basis for extensions in other language bindings, which offer additional conveniences for specific ecosystems.
Language Bindings and Tools
NetCDF provides official language bindings that extend the core C library to support common scientific programming languages. The NetCDF-Fortran binding offers both Fortran 77 and Fortran 90 interfaces, mirroring the functionality of the C API with functions prefixed by "nf90_" for modern usage, such as nf90_open for file access and nf90_put_var for writing data.[17] This binding depends on the underlying NetCDF-C library and is widely used in legacy climate modeling codes. The NetCDF-C++ binding, provided as a legacy option, delivers object-oriented wrappers around the C API, including classes like NcFile and NcVar for file and variable manipulation, though it is deprecated in favor of newer C++ standards and the direct use of the C library.[18]

Community-developed bindings enhance NetCDF accessibility in dynamic languages. The netCDF4 Python module serves as a high-level interface to the NetCDF C library, leveraging HDF5 for enhanced features like compression and groups, and supports reading, writing, and creating files via the Dataset class.[19] In R, the ncdf4 package provides a comprehensive interface for opening, reading, and manipulating NetCDF version 4 or earlier files, including support for dimensions, variables, and attributes through functions like nc_open and ncvar_get.[20] For Julia, the NCDatasets.jl package implements dictionary-like access to NetCDF datasets and variables, enabling efficient loading and creation of files while adhering to the Common Data Model.[21]

A suite of command-line tools accompanies the NetCDF libraries for file inspection and manipulation. The ncdump utility converts NetCDF files to human-readable CDL (Network Common Data form Language) text, facilitating debugging and metadata examination.[22] Ncgen generates binary NetCDF files from CDL descriptions or produces C/Fortran code skeletons for data access, while nccopy handles file copying with optional format conversions between classic and enhanced models.[22] The NetCDF Operators (NCO) toolkit extends these capabilities with operators for tasks like averaging, subsetting, and arithmetic on variables, such as ncea for ensemble averaging across multiple files.

NetCDF integrates seamlessly with scientific software ecosystems. MATLAB includes built-in functions like ncread and ncinfo for importing and exploring NetCDF data, supporting both local files and remote OPeNDAP access.[23] IDL provides native NetCDF support through routines like NCDF_OPEN, enabling direct variable extraction in geospace analysis workflows. The Geospatial Data Abstraction Library (GDAL) features a dedicated NetCDF driver for raster data, allowing conversion and processing in GIS applications like reading multidimensional arrays as geospatial layers.[24]

Conventions and Standards
Metadata Conventions
Metadata conventions in NetCDF provide standardized ways to describe datasets, ensuring they are discoverable, interpretable, and interoperable across diverse software tools and scientific communities. These conventions primarily involve attributes attached to global datasets, variables, dimensions, and coordinate variables, which encode essential information such as units, coordinate systems, and data quality indicators. By adhering to these guidelines, NetCDF files become self-describing, allowing users to understand the structure and semantics without external documentation.[25]

The COARDS (Cooperative Ocean/Atmosphere Research Data Service) convention, established in 1995, forms a foundational standard for metadata in NetCDF files, particularly for ocean and atmospheric data. It specifies conventions for representing time coordinates, latitude/longitude axes, and units to facilitate data exchange and visualization in gridded datasets. For instance, time variables must use a units attribute in the format "seconds since YYYY-MM-DD hh:mm:ss" to enable consistent parsing across applications. COARDS emphasizes simplicity and backward compatibility, serving as the basis for subsequent extensions.[26][27]

Integration with the UDUnits library enhances the handling of physical units in NetCDF metadata, allowing tools to parse and convert units automatically. The "units" attribute for variables follows UDUnits syntax, such as "meters/second" for velocity, enabling arithmetic operations and dimension consistency checks. This integration is recommended in NetCDF best practices to ensure quantitative data is meaningfully described and comparable. UDUnits supports a wide range of units, from SI standards to custom expressions, promoting precision in scientific computations.[25][28]

NetCDF attribute guidelines recommend using conventional names to standardize metadata, including "standard_name" for semantic identification from controlled vocabularies, "units" for measurement scales, and "missing_value" or "_FillValue" to denote absent data points. These attributes should be applied at appropriate levels: global attributes for dataset-wide details like title and history, and variable-specific ones for context like long_name for human-readable descriptions. To maintain broad compatibility, especially with classic NetCDF formats, attribute names and values are advised to avoid non-ASCII characters, sticking to alphanumeric and underscore compositions. Examples include:

- units: "degrees_north" for latitude variables.
- missing_value: A scalar value like -9999.0 to flag invalid entries.
- standard_name: "air_temperature" to link to predefined terms.
Specialized Standards like CF
The Climate and Forecast (CF) conventions represent the most prominent specialized extension to the NetCDF metadata standards, tailored for climate, weather, and oceanographic data to ensure self-describing datasets that facilitate interoperability and analysis.[31] Developed by a community of scientists and data managers, the CF conventions build upon foundational NetCDF attributes to specify detailed semantic information, with the latest released version being 1.12 in December 2024 and a 1.13 draft under active development as of 2025.[32] These conventions promote the sharing and processing of gridded data by defining standardized ways to encode physical meanings, spatial structures, and temporal aspects without altering the underlying NetCDF data model.[33]

Central to the CF conventions are mechanisms for describing complex geospatial structures, including grid mappings that link data variables to coordinate reference systems via the grid_mapping attribute, which supports projections such as Lambert conformal or rotated pole grids.[34] Auxiliary coordinates allow multi-dimensional or non-dimension-aligned data, like 2D latitude-longitude fields, to be referenced using the coordinates attribute for enhanced representation of irregular geometries.[35] Cell methods encode statistical summaries over data intervals (such as means, maxima, or point samples) through the cell_methods attribute, while standard names from the CF dictionary provide canonical identifiers for variables, ensuring consistent interpretation across tools (e.g., air_temperature for atmospheric data).[36] Additional key elements include bounds variables for defining irregular cell shapes, such as vertex coordinates for polygonal cells via the bounds attribute, and formula_terms for deriving vertical coordinates from parametric equations, like mapping sigma levels to pressure heights.[37][38]
Compliance with CF conventions is structured in levels, from basic adherence to full implementation, enabling strict validation for tools like the Climate Data Operators (CDO), a suite of over 700 command-line operators for manipulating NetCDF files that relies on CF metadata for accurate processing of climate model outputs.[39] High compliance enhances usability in data portals such as the THREDDS Data Server (TDS), which leverages CF attributes to provide OPeNDAP access, subsetting, and cataloging of datasets, thereby improving discoverability and remote analysis in distributed scientific workflows.[39]
The evolution of CF conventions includes deepening integration with geospatial standards like ISO 19115, particularly through support for Coordinate Reference System (CRS) Well-Known Text (WKT) formats in grid mappings, allowing seamless mapping of CF metadata to broader metadata profiles for enhanced interoperability in Earth observation systems.[40] Ongoing updates, discussed at annual workshops such as the virtual 2025 CF Workshop held in September, continue to address emerging needs like provenance tracking for derived datasets, with community proposals exploring extensions for machine learning workflows to document model training and inference lineages.[41][42]
Advanced Capabilities
Parallel-NetCDF
Parallel-NetCDF (PNetCDF) is a high-performance parallel I/O library designed for accessing NetCDF files in classic formats (CDF-1, CDF-2, and CDF-5) within distributed computing environments, enabling efficient data sharing among multiple processes.[43] Developed independently from Unidata's NetCDF project starting in 2001 by researchers at Northwestern University and Argonne National Laboratory, PNetCDF was first released in 2005 and builds directly on the Message Passing Interface (MPI) to support both collective and independent I/O operations.[44] Unlike NetCDF-4, which relies on Parallel HDF5 for parallel access, PNetCDF avoids dependencies on HDF5, allowing it to handle non-contiguous data access patterns without the overhead of intermediate layers.[43]

The library provides a parallel extension to the NetCDF API, prefixed with ncmpi_ (e.g., ncmpi_create for creating a new parallel NetCDF file using an MPI communicator and info object, which returns a file ID for subsequent operations).[45] Key functions include collective variants like ncmpi_put_vara_all for synchronized writes across processes, which ensure all ranks complete the operation before proceeding and optimize data aggregation.[46] PNetCDF employs a two-phase I/O strategy to aggregate small, non-contiguous requests from multiple processes into larger, contiguous transfers, reducing contention on parallel file systems and improving bandwidth utilization.[47]
This design offers significant advantages in scalability for large-scale simulations, such as those in exascale computing, where it has demonstrated sustained performance on systems with thousands of processes by leveraging MPI-IO optimizations like collective buffering.[48] For instance, in climate modeling applications, PNetCDF enables efficient parallel reads and writes of multi-dimensional arrays, maintaining compatibility with classic and 64-bit offset formats while supporting unsigned data types in CDF-5.[49]
However, PNetCDF has limitations, including no support for NetCDF-4 features such as groups, multiple unlimited dimensions, or compression in parallel mode, restricting its use to simpler classic format structures.[43] For modern high-performance alternatives addressing these gaps, integrations like ADIOS2 provide enhanced flexibility for adaptive I/O in exascale workflows, often used alongside or in place of PNetCDF in applications like the Weather Research and Forecasting (WRF) model.[50] A minimal collective write using the API described above might look like the sketch that follows.
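In this sketch the file, dimension, and variable names are invented; compile with an MPI compiler and link against PnetCDF:

```c
/* Sketch of a collective PnetCDF write: each rank contributes one row
 * of a shared 2-D variable. Error handling omitted for brevity. */
#include <mpi.h>
#include <pnetcdf.h>

int main(int argc, char **argv) {
    int rank, nprocs, ncid, dimids[2], varid;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    ncmpi_create(MPI_COMM_WORLD, "parallel.nc", NC_CLOBBER,
                 MPI_INFO_NULL, &ncid);
    ncmpi_def_dim(ncid, "row", nprocs, &dimids[0]);
    ncmpi_def_dim(ncid, "col", 4, &dimids[1]);
    ncmpi_def_var(ncid, "field", NC_FLOAT, 2, dimids, &varid);
    ncmpi_enddef(ncid);

    /* Each process writes its own row; the _all suffix marks the call
     * as collective, letting the library aggregate the requests. */
    float row[4] = {1, 2, 3, 4};
    MPI_Offset start[2] = {rank, 0}, count[2] = {1, 4};
    ncmpi_put_vara_float_all(ncid, varid, start, count, row);

    ncmpi_close(ncid);
    MPI_Finalize();
    return 0;
}
```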
Interoperability Features
NetCDF-4, introduced in 2008, is built upon the HDF5 file format, enabling seamless interoperability between the two systems. This foundation allows for bidirectional reading and writing: files created with the NetCDF-4 library are valid HDF5 files that can be accessed and modified by any HDF5-compliant application, provided they adhere to NetCDF conventions such as avoiding non-standard data types or complex group structures. Conversely, the NetCDF-4 library can read and edit existing HDF5 files as long as they conform to NetCDF-4 constraints, including the use of dimension scales for shared dimensions. In this mapping, NetCDF dimensions are represented as HDF5 dimension scales (special one-dimensional datasets attached to multidimensional datasets) which facilitate shared dimensions across variables and preserve coordinate information. For instance, a latitude dimension in NetCDF corresponds to an HDF5 dataset with scale attributes, ensuring compatibility without loss of structure.[51][52]

A key interoperability feature is support for OPeNDAP, a protocol for remote data access that has been integrated into the NetCDF C library since version 4.1.1. This enables users to access NetCDF datasets hosted on OPeNDAP servers via simple URL-based queries, allowing subsetting of data along dimensions (e.g., selecting specific time ranges or spatial slices) without downloading entire files. Such remote access promotes efficient web-based data sharing in scientific workflows, as demonstrated by tools like the THREDDS Data Server, which serves NetCDF data over OPeNDAP for direct integration into analysis software. The C, Fortran, and C++ NetCDF libraries handle this transparently by treating OPeNDAP URLs as local file paths, leveraging the library's built-in DAP support when compiled with the --enable-dap option.[53][54]
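A sketch of such remote access, with a hypothetical server URL, shows that only the path argument changes relative to local access:

```c
/* Sketch: when the C library is built with DAP support, a remote
 * dataset URL can stand in for a local path. The URL is hypothetical. */
#include <stdio.h>
#include <netcdf.h>

int main(void) {
    int ncid, ndims, nvars;

    /* nc_open() dispatches to the DAP client for http:// URLs. */
    if (nc_open("http://example.org/opendap/sst.nc", NC_NOWRITE, &ncid)
        != NC_NOERR) {
        fprintf(stderr, "remote open failed (is DAP enabled?)\n");
        return 1;
    }
    nc_inq(ncid, &ndims, &nvars, NULL, NULL);
    printf("remote dataset: %d dims, %d vars\n", ndims, nvars);
    return nc_close(ncid);
}
```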
NetCDF also supports conversions to and from other formats through dedicated tools, enhancing ecosystem integration. For HDF5 inspection and basic export, the h5dump utility from the HDF Group can dump NetCDF-4 (HDF5-based) files into text or XML representations, which can then be reimported into HDF5 or other systems, though for full structural preservation, the NetCDF library's nccopy tool is preferred to convert classic NetCDF-3 files to NetCDF-4/HDF5. GRIB files, common in meteorology, can be converted to NetCDF using wgrib2, which maps GRIB grids (e.g., latitude-longitude) to NetCDF variables following COARDS conventions, supporting common projections like Mercator but requiring preprocessing for rotated or thinned grids.

Additionally, integration with Zarr, a cloud-optimized array storage format, has advanced through Unidata's ncZarr specification, which maps NetCDF-4 structures to Zarr groups for efficient object-store access, enabling subsetting and parallel reads in cloud environments without altering application code. This is particularly useful for large-scale Earth science data, as seen in virtual Zarr datasets derived from NetCDF files via tools like Kerchunk.

In the C, Fortran, and C++ libraries, HDF5 handling is transparent via the underlying HDF5 API, allowing direct manipulation of NetCDF-4 files as HDF5 objects. The Java NetCDF library, however, is more limited in direct HDF5 access: it can read most HDF5 files, but writing NetCDF-4/HDF5 requires the netCDF-C library via JNI, without which output is restricted to the classic NetCDF-3 structure.[55][56][57]
