Hubbry Logo
LAS file formatLAS file formatMain
Open search
LAS file format
Community hub
LAS file format
logo
7 pages, 0 posts
0 subscribers
Be the first to start a discussion here.
Be the first to start a discussion here.
LAS file format
LAS file format
from Wikipedia
LAS
Filename extension
.las
Internet media typeapplication/vnd.las
Magic numberLASF
Developed byAmerican Society for Photogrammetry and Remote Sensing
Initial releaseMay 9, 2003; 22 years ago (2003-05-09)
Latest release
1.4 R15
July 9, 2019; 6 years ago (2019-07-09)
Type of formatPoint cloud data
Open format?Yes

The LAS (LASer) format is a file format designed for the interchange and archiving of Lidar point cloud data. It is an open, binary format specified by the American Society for Photogrammetry and Remote Sensing (ASPRS). The format is widely used[1] and regarded as an industry standard for Lidar data.[2][3]

File structure

[edit]

A LAS file consists of the following overall sections:

Section Description
Public header block Describes format, number of points, extent of the point cloud and other generic data.
Variable length records (VLR) Any number of optional records to provide various data such as the spatial reference system used, metadata, waveform packet information and user application data.[1][4] Each VLR can hold a data payload of up to 65,535 bytes in length.
Point data records Data for each of the individual points in the point cloud, including coordinates, classification (e.g. terrain or building), flight and scan data, etc.
Extended variable length records (EVLR) Introduced with LAS 1.3,[5] EVLRs are similar to VLRs but are located after the point data records and allow a much larger data payload per record due to the use of 8-byte size descriptors.

Point data records

[edit]

A LAS file contains point records in one of the point data record formats defined by the LAS specification; as of LAS 1.4, there are 11 point data record formats (0 through 10) available. All point data records must be of the same format within the file. The various formats differ in the data fields available, such as GPS time, RGB and NIR color and wave packet information.

The 3D point coordinates are represented within the point data records by 32-bit integers, to which a scaling and offset defined in the public header must be applied in order to obtain the actual coordinates.

As the number of bytes used per point data record is explicitly given in the public header block, it is possible to add user-defined fields in "extra bytes" to the fields given by the specification-defined point data record formats. A standardized way of interpreting such extra bytes was introduced in the LAS 1.4 specification, in the form of a specific EVLR.[4]

Derivative formats

[edit]

LAZ

[edit]

LAZ is an open format for lossless compression of LAS files developed by Martin Isenburg, author of its original LASzip reference implementation.[6][7][8] The size of a LAZ file is typically 7 to 25 percent of the corresponding LAS file, and it has become a de facto industry standard for compressed point cloud data.[9]

LAZ files are similar in structure to the corresponding uncompressed LAS files, except the point data records are replaced by chunks of compressed point data.[10]

LASzip was originally published under the GNU LGPL, but relicensed under the Apache Public License 2.0 in late 2021.[11] Alternative LAZ implementations include LAZperf[12] and laz-rs.[13]

COPC

[edit]

The Cloud Optimized Point Cloud (COPC) format specified by Hobu, Inc.[14] is a cloud-optimized variant of LAZ, analogous to the COG format's relationship to GeoTIFF.[15] A COPC file guarantees a restricted structure of the LAZ file to make its data chunks correspond to nodes in an octree, making the file suitable for subset requests.

References

[edit]
[edit]
Revisions and contributorsEdit on WikipediaRead on Wikipedia
from Grokipedia
The LAS (LASer) file format is an open binary standard for the storage, interchange, and archiving of three-dimensional data, particularly from light detection and ranging () surveys. Developed by the American Society for Photogrammetry and (ASPRS) to promote among LiDAR hardware vendors, software developers, and data users, it was first introduced in version 1.0 in 2003 as a means to standardize the exchange of airborne and terrestrial data. The format features a structured layout including a header for global metadata, variable-length records for additional information, and scalable point data records that encode spatial coordinates along with attributes such as intensity, , and return numbers. Over time, LAS has evolved through multiple versions to accommodate growing data complexity: version 1.1 (2005) added support for waveform data, 1.2 (2006) improved , 1.3 (2008) introduced enhanced point formats, and 1.4 (2011, with revisions up to r15 as of 2019) added capabilities for color, synthetic points, and better projection handling while ensuring backward compatibility. Maintained by the ASPRS LAS , the format is widely adopted in geospatial, environmental, and applications, with tools and libraries available for reading and writing in various programming languages. Its derivatives, such as the compressed LAZ format, address storage efficiency for large datasets.

Introduction

Purpose and Development

The LAS (LASer) file format serves as an open binary interchange standard for exchanging 3-dimensional data, originally developed for airborne acquisitions but also supporting terrestrial, bathymetric, and other sensor-derived datasets to facilitate interoperability among hardware vendors, software developers, and data users. This format addresses the need for a vendor-neutral structure to store geospatial coordinates, intensity values, and associated metadata without proprietary constraints, enabling efficient data sharing in applications. Initiated by the American Society for Photogrammetry and Remote Sensing (ASPRS) Lidar Division's LAS Working Group in 2003, the format's first version, LAS 1.0, was released on May 9, 2003, establishing a foundational binary structure for point records and headers. The specification evolved through iterative revisions to accommodate advancing technologies: LAS 1.1, released on May 7, 2005, introduced GPS time stamps for precise temporal referencing of points. LAS 1.2, released on April 29, 2008, added support for RGB color values to integrate photogrammetric enhancements. LAS 1.3, approved in August 2009 with specification issued October 24, 2010, incorporated full waveform data packets and expanded classification options to 256 discrete classes for improved point categorization. LAS 1.4, approved in 2011 with its latest revision (R16) issued in August 2025, introduced 64-bit offset support for larger datasets, up to 15 returns per pulse, and Well-Known Text (WKT) coordinate reference system definitions. The LAS format has achieved widespread adoption as the for topographic and hydrographic data, mandated by the U.S. Geological Survey (USGS) in its 3D Elevation Program (3DEP) deliverables and by the (NOAA) for coastal and marine projects. It has also been endorsed internationally, including as an Open Geospatial Consortium (OGC) Community Standard in March 2018, promoting global consistency in geospatial data exchange. LAS 1.5 was released in August 2025, incorporating features like point density validation methods (e.g., the Voronoi method) and support for synthetic aperture-derived points, following a public comment period from April to May 2025.

Key Features and Versions

The LAS file format is a binary standard designed for the efficient storage and interchange of large-scale 3D data, capable of handling up to billions of points through the use of 64-bit integer extents for coordinates and other spatial parameters in version 1.4 and later. This binary structure, employing little-endian byte order, minimizes file sizes while preserving for lidar-derived and other datasets. Core features of the format include scalable precision for point coordinates, achieved via offset and scale factors that allow representation of sub-millimeter accuracy within fixed-size integer fields, adapting to varying project requirements without altering the overall file schema. It supports multiple returns per laser pulse, enabling up to 15 returns in version 1.4 to capture detailed vertical structures in or . Classifications are expanded to 256 distinct categories in 1.4, facilitating nuanced labeling of points such as ground, , or buildings. Optional attributes enhance data richness, including intensity values for signal strength, scan angle for beam orientation, and GPS time for temporal synchronization. The format maintains across versions from 1.5 down to 1.0, ensuring older software can read newer files by ignoring unrecognized elements. Point data records are categorized into legacy formats 0 through 5, which use simple integer coordinates, and extended formats 6 through 10, which incorporate double-precision coordinates and require Well-Known Text (WKT) representations for Coordinate Reference Systems (CRS) to specify spatial projections. Version 1.4 introduces several enhancements for advanced applications, including an overlap flag to indicate points from overlapping flight lines, a scanner channel identifier for multi-channel systems, and a synthetic point flag that designates class 1 for algorithmically generated points rather than directly measured ones. Additionally, extra bytes Variable Length Records (VLRs) allow for the inclusion of custom attributes, such as color or waveform data, extending the format's flexibility. Structural limits in the format promote while constraining overhead: the public header block is fixed at 375 bytes, Variable Length Records (VLRs) are capped at 65,535 bytes each for metadata like , and Extended VLRs (EVLRs) accommodate larger payloads, such as full waveforms exceeding 65 KB. These constraints balance efficiency with the need to support expansive datasets in geospatial workflows.

File Organization

Version and Well Information Sections

LAS files are organized as plain-text ASCII files divided into sections, each starting with a tilde (~) prefix followed by a section name in uppercase (e.g., ~VERSION). Sections are mandatory or optional, appear in a prescribed order, and contain key-value pairs or data arrays delimited by spaces, commas, or tabs as specified in the ~Version section. The format supports across versions, with LAS 2.0 using fixed sections for one-dimensional log data and LAS 3.0 extending to multidimensional arrays and additional data types like core analyses. The first mandatory section, ~Version, identifies the file's LAS version (e.g., "VERS. 2.0" or "VERS. 3.0 : CWLS LAS FORMAT VERSION 3.0"), wrap status ("WRAP. NO" for non-wrapped data, deprecated in 3.0), and data delimiter ("DLM. SPACE" or alternatives like or tab). It may include a program identifier for the creating software. This section ensures parsers recognize the file structure and data layout. Immediately following is the mandatory ~Well section, which provides essential well identification and logging parameters. It includes mnemonics such as "WELL." for well name, "LOC." for location, "CTRY." for country, "SRVC." for service company, "DATE." for logging date (in DD-MMM-YYYY format), "UWI." for unique well identifier, "API." for API number, latitude ("LATI."), longitude ("LONG."), elevation ("ELEV."), and time zone ("TZ."). Depth-related fields cover start depth ("STRT."), stop depth ("STOP."), step interval ("STEP."), and null value indicator ("NULL."). In LAS 3.0, it requires at least 11 specific parameters for completeness. This section establishes the spatial and temporal context for the log data.

Curve and Parameter Sections

The mandatory ~Curves section (or ~Curve in some versions) lists the log traces included in the file, with one line of mnemonic-unit pairs corresponding to each column in the data array. For example: "~Curves DEPT.FT GR.GAPI NPHI.PU" defines depth (DEPT) in feet, gamma ray (GR) in API units, and neutron porosity (NPHI) in porosity units. Mnemonics are uppercase, up to 10 characters, followed by a period and 3-6 character units (e.g., FT, GAPI, PU). This section defines the structure of the subsequent data, ensuring . In LAS 3.0, for multidimensional data, this is replaced by a ~Definition section within data sets. The optional ~Parameter section (or ~Params) stores supplementary one-dimensional well attributes as key-value pairs, such as "RUN. 1" for run number, "DREF. 500" for depth reference, "BHA. 7 1/2"" for bottom-hole assembly size, or fluid properties like mud weight ("MDWT. 9.2 PPG"). Each has a mnemonic, unit, value, and . Multiple ~Parameter sections can exist for different runs or tools. In LAS 3.0, these are grouped within data section sets for logs, cores, or other types. This enriches the dataset without altering the core log structure. An optional ~Other section allows free-form comments or additional text, often used for tool calibration notes or disclaimers. It is less structured and deprecated in favor of more specific sections in LAS 3.0.
SectionMandatory/OptionalKey ContentsVersion Notes
~VersionMandatoryVersion (e.g., 2.0, 3.0), wrap (NO), (SPACE)3.0 adds program ID
~WellMandatoryWell name, location (LATI, LONG), depths (STRT, STOP, STEP), NULL valueRequires 11 params in 3.0
~CurvesMandatory (2.0)Mnemonic-unit pairs for log traces (e.g., DEPT.FT)~ in 3.0 for arrays
~ParameterOptionalKey-value metadata (e.g., MDWT. 9.2 PPG)Grouped in data sets in 3.0
~OtherOptionalFree-form commentsDeprecated in 3.0

Data Section

The mandatory data section, ~ASCII in LAS 2.0 or ~Data in LAS 3.0, contains the numerical log values as a delimited , with one row per depth step. Each row starts with the index value (typically depth, e.g., "3264.500 DEPT GR NPHI"), followed by values for each , using the null indicator (-999.25 or specified) for . Data supports floating-point (default), , , exponential, date-time, and degree-minute-second formats, with precision controlled by {F x.y} or similar descriptors in 3.0. In LAS 3.0, the data section is part of extensible "data section sets" for multiple runs or types (e.g., ~Log_Data, ~Core_Data), supporting 1D curves, 2D arrays (e.g., for logs), and 3D arrays (e.g., for dipole sonic). Sections are indexed (e.g., ~Log_Data) for multiple instances, with preceding ~ and ~ sections. This allows comprehensive storage of geophysical, geological, and petrophysical data while maintaining the simple ASCII parsing. Files end without a specific terminator, relying on the ~ASCII or ~Data marker to signal the start.

Point Data Records

Legacy Formats

The legacy point data record formats 0 through 5 in the LAS specification form the foundational structures for encoding point cloud data, originating from the earliest versions of the standard and continuing to support in later releases. These formats prioritize efficiency for basic airborne and terrestrial datasets, accommodating essential geometric, radiometric, and metadata attributes while maintaining compact record sizes ranging from 20 to 63 bytes per point. They are particularly prevalent in datasets generated by pre-2011 systems, where advanced features like multi-return expansion or high-precision waveform descriptors were not yet standardized. Limitations include support for up to five returns per emitted pulse, 32 categories (values 0-31), and scan angle ranks encoded in a single signed byte representing -90 to +90 degrees in 0.006-degree increments. Format 0, the simplest legacy variant at 20 bytes per point, includes core spatial coordinates as scaled long integers (X, Y, Z; 4 bytes each), intensity as an unsigned short (2 bytes, ranging 0-65535), a flags byte (1 byte) detailing return number (bits 0-2: 1-5), number of returns (bits 3-5: 1-5), scan direction (bit 6: 0 right-to-left, 1 left-to-right), and edge of flight line (bit 7: 0 interior, 1 edge), classification as an unsigned char (1 byte), scan angle rank as a signed char (1 byte), user data as an unsigned char (1 byte), and point source ID as an unsigned short (2 bytes, identifying the originating flight line or scanner). This format suits unenhanced surveys focused on elevation modeling without temporal or colorimetric data. Format 1 extends Format 0 to 28 bytes by appending GPS time as a double-precision float (8 bytes, in seconds since the GPS of January 6, 1970, 00:00:00 UTC), enabling temporal synchronization for trajectory-aware processing in multi-pulse scans. It remains ideal for legacy datasets requiring pulse timing without color or waveform details. Format 2, at 26 bytes, builds on Format 0 by adding RGB color values (, , ; each an unsigned short of 2 bytes, ranging 0-65535), facilitating visualization integration with imagery from co-registered cameras in early hyperspectral or photogrammetric workflows. Format 3 combines the enhancements of Formats 1 and 2 into a 34-byte record, incorporating both GPS time (8 bytes) and RGB colors (6 bytes total) for temporally and visually enriched point clouds, commonly used in pre-2011 urban mapping projects blending with . Format 4, expanded to 57 bytes, incorporates waveform packet data atop Format 1's structure: wave packet descriptor index (unsigned char, 1 byte, referencing up to 256 descriptors in the file header), byte offset to waveform data packet (unsigned long long, 8 bytes, file-relative position), waveform packet size (unsigned long, 4 bytes), and return point waveform location (float, 4 bytes) with parametric offsets dx, dy, dz (each float, 4 bytes, relative to the point in meters). This format supports full- decomposition for vegetation penetration analysis in legacy full- systems, with coordinate reference system details handled via variable-length records for . Format 5, the most feature-complete legacy option at 63 bytes, merges Format 3's GPS time and RGB with Format 4's attributes, serving complex datasets from early hybrid sensors combining discrete returns, colors, timing, and pulse shapes, such as in applications prior to the 2011 specification update.
FormatSize (bytes)Key Additions Over Format 0Primary Use Case
020None (core fields only)Basic elevation data
128GPS time (8)Time-synchronized scans
226RGB colors (6)Colorized point clouds
334GPS time (8) + RGB (6)Visual-temporal mapping
457 descriptors (29)Full-waveform analysis
563GPS time (8) + RGB (6) + (29)Hybrid sensor data

Extended Formats

The extended point data record formats 6 through 10, introduced in LAS version 1.4, expand upon the capabilities of earlier formats by incorporating higher precision for coordinates (scaled integers with 32 bits each for X, Y, Z), mandatory GPS time stamping, and additional flags for and attributes. These formats support up to 15 returns per pulse (encoded in 4 bits) and 256 classification values (using 1 byte), compared to the limitations in legacy formats 0-5. They also include enhanced bit fields in the return information and classification bytes to flag synthetic points, keypoints, withheld points, and overlaps, enabling more nuanced data management. The return information byte (2 bytes total) allocates bits 0-3 for the return number (1-15) and bits 4-7 for the number of returns (1-15), providing finer for multi-return pulses. The classification byte (1 byte) uses 5 bits (0-4) for the value (0-31, extensible to 256 with higher bits) and includes 4 bits: bit 5 for synthetic points (indicating non-directly measured points, often from ), bit 6 for keypoints (critical features), bit 7 for withheld points (excluded from processing), and an additional overlap (bit 5 in some contexts, but primarily bit 7 in extended use for overlap detection). A deleted is integrated into bit 6 of the classification byte across these formats, allowing points to be marked for removal without altering the file structure. The scanner channel is encoded in 2 bits (0-3, for multi-channel systems), and the scan angle uses 2 bytes for values from -360 to +360 degrees in 0.006-degree increments. Synthetic points are further supported via the synthetic in classification and source ID flags in the point source ID field (2 bytes, values 1-65535 indicating origin). Format 6 serves as the 30-byte base for these extensions, including the enhanced flags, GPS time (8 bytes, double-precision), and point source ID (2 bytes), but without color or waveform data. Format 7 extends this to 36 bytes by adding RGB color values (3 unsigned shorts, 16 bits each, scaled 0-65535). Format 8 increases to 38 bytes, incorporating near-infrared (NIR) data (2 bytes, unsigned short) alongside the RGB from format 7 for multispectral applications. Formats 9 and 10 build on waveform-enabled structures similar to legacy formats 4 and 5 but with the new flags and precision. Format 9 (59 bytes) adds full packet data, including descriptor index (1 byte), byte offset to waveform (8 bytes), waveform packet size (4 bytes), return point location (4 bytes), and parametric offsets dx/dy/dz (3 x 4 bytes). Format 10 (67 bytes) combines this support with RGB and NIR colors from formats 7 and 8. All extended formats require GPS time as a mandatory field for temporal synchronization and Well-Known Text (WKT) Coordinate Reference System (CRS) representation in the file header, replacing for these point types to ensure precise .
FormatBytesKey Additions Over Base
630Enhanced flags, GPS time, scan angle (2 bytes), scanner channel
736RGB (3 × 2 bytes)
838RGB + NIR (2 bytes)
959 packet data (29 bytes total)
1067 + RGB + NIR

Data Attributes

Well Information and Location

The ~WELL section in the LAS file format provides essential identification and location attributes for the well, serving as a mandatory component to contextualize the log data. Key mnemonics include WELL (well name, up to 40 characters), FLD (field name), LOC (location identifier), CTRY (country code), STAT (state or province), DATE (logging date in DD-MMM-YYYY format), SRVC (service company name), COMP (company name), and LIC (license number). Location attributes encompass coordinate data, such as LATI ( in decimal degrees, positive north), LONG ( in , positive east), X and Y coordinates if applicable, and GDAT (, e.g., "WGS84"). Elevation details include KB (kelly bushing elevation in specified units, e.g., FT or M), RT ( elevation), and GL (ground level elevation). The section also defines the log index range with STRT (start depth), STOP (stop depth), STEP (step interval), all in consistent units (e.g., M or FT), and NULL (null value indicator, e.g., -999.25). These attributes ensure traceability and georeferencing of the borehole data, with units and descriptions provided for each mnemonic to promote .

Parameters

The optional ~ (or ~Params in LAS 3.0) section stores supplementary one-dimensional attributes related to the well, tools, or environmental conditions, enhancing the dataset with metadata not covered in ~WELL. Each parameter is recorded as a four-field line: mnemonic (up to 10 characters, e.g., BHT for bottom-hole temperature), value (numeric or string, e.g., 120.5), unit (up to 6 characters, e.g., DEG C), and description (free-text explanation, e.g., "Bottom Hole Temperature"). Common examples include tool-specific details like LITH ( description), BS (bit size in IN), or MD ( at a reference). In LAS 3.0, parameters can be grouped under specialized sections such as ~Core (for core analysis attributes like , permeability) or ~Inclinometry (for deviation survey data like AZIM for , MD for measured depth). Values support various formats (integer, floating-point, date), with nulls handled via the global NULL value. This section facilitates petrophysical interpretation by providing context for log corrections and , maintaining backward compatibility with LAS 2.0.

Curve Data

The mandatory ~CURVE section (or ~Curves/~Log_Definition in LAS 3.0) defines the attributes of the log traces, listing each curve's mnemonic, units, API code (if applicable), and description to enable accurate data interpretation. The first mnemonic is typically the index, such as DEPT (depth in FT or M) or TIME (time in S for time-based logs). Subsequent mnemonics represent measurements, e.g., GR ( in API units), NPHI ( in V/V or PU), or LLSD (laterolog deep resistivity in OHMM). Each line follows the format: mnemonic (up to 10 characters), unit (up to 6 characters, e.g., , OHMM, US/G), (industry code or blank), description (text, e.g., ""). In LAS 3.0, extended support for arrays (e.g., NMR[1:10] for multi-channel data) and frames allows multidimensional attributes, with formats specified (e.g., {F6.2} for floating-point). These attributes directly precede the ~ASCII section, where numerical values are arrayed row-wise, delimited by spaces or commas, ensuring in analysis software. Units must be consistent across the file, and descriptions aid in identifying curve purposes, such as distinguishing corrected vs. raw data.

Georeferencing

Coordinate Systems

The LAS file format for provides georeferencing primarily through the mandatory ~WELL section, which stores well and elevation data using simple mnemonic-based parameters. This section supports geographic coordinates via , or projected coordinates using X and Y values, along with associated datum information. Unlike multidimensional formats, well log LAS files focus on one-dimensional depth-indexed data tied to the well's surface . Key mnemonics for horizontal positioning include:
  • LATI: Latitude of the well location, in decimal degrees or degrees/minutes/seconds (e.g., 45.37° 12' 58" N).
  • LONG: Longitude of the well location, in decimal degrees or degrees/minutes/seconds (e.g., 13.22° 30' 09" W).
  • X and Y: Easting and northing coordinates for projected systems, in meters or feet, as alternatives to LATI/LONG.
The geodetic datum is specified with GDAT, using codes from standards like EPSG (e.g., "WGS84" or EPSG:4326), ensuring compatibility with global reference frames. For projected coordinates, HZCS defines the horizontal coordinate system, such as UTM zones (e.g., "UTM Zone 15N"). Files must use either the geographic (LATI/LONG + GDAT) or projected (X/Y + HZCS + GDAT) set consistently. Vertical georeferencing is handled through elevation mnemonics relative to a datum, typically mean sea level or ellipsoidal height implied by GDAT:
  • KB or RKB: Elevation of the rotary kelly bushing, the common depth reference point.
  • GL: Ground level elevation at the well site.
  • EREF: Elevation of the depth reference point, in the units specified (default meters).
Depth values in the ~ASCII section are measured from the depth reference (e.g., KB), and absolute elevations are computed by subtracting depths from the reference elevation. LAS 3.0 (introduced in ) enhanced support for standardized datum codes, while earlier versions like LAS 2.0 used free-text descriptions. No formal support exists for complex transformations like WKT; datum shifts are handled externally in software.

Projection and Metadata

Projection details in LAS well log files are limited to the specification of the in the ~WELL section via HZCS, which identifies the (e.g., "Lambert Conformal Conic" or UTM parameters). No dedicated records for transformation parameters like scale factors or false origin are included; users rely on external definitions from the datum and projection codes. For UTM, the zone and hemisphere are typically appended (e.g., "UTM 15N"). Additional metadata enriches context:
  • COUN: Country code, using ISO standards for jurisdictional reference.
  • SRVC: Service company, often providing the location survey.
  • Comments in the ~WELL section can include survey details, such as GPS accuracy or datum transformation notes.
For deviated or horizontal wells, basic trajectory data (e.g., inclination, ) may appear in the ~ section, but full 3D paths are not standard in LAS and are often stored separately. The U.S. Geological Survey recommends including precise location metadata for archival compliance. Vertical datums are implicitly tied to GDAT, with elevations referenced to local mean unless specified otherwise. As of 2025, LAS 3.0 remains the reference for enhanced geodetic support, maintained by the Canadian Well Logging Society.

Derivatives

LAZ Compression

The LAZ format is a lossless compression extension for the LAS file format, designed to significantly reduce file sizes while preserving all original data for exact reconstruction. Developed by Martin Isenburg and presented in November 2011 at the European LiDAR Mapping Forum in Salzburg, it builds directly on the LAS 1.4 specification and is implemented through the open-source LASzip library. LAZ employs entropy coding combined with predictive arithmetic techniques to exploit redundancies in LiDAR point data, such as spatial correlations and repeating attribute values, enabling efficient compression without any loss of information. LAZ files maintain binary compatibility with LAS files upon decompression, differing only in the compressed point records section, which is organized into fixed-size chunks (default 50,000 points) for streaming and support. The file header includes a mandatory "laszip encoded" Variable Length Record (VLR) with Record ID 22204, which specifies the compressor version, chunk , and a list of compressed items tailored to the point format. These items define reversible coders for individual fields: for example, , and Z coordinates use differential encoding with second-order (medians of prior differences for X/Y and previous return for Z); intensity values are encoded via differences from the most recent value using context-based (four contexts based on return number); and classification bytes employ 256-context to handle the limited range of values efficiently. LAZ supports all LAS point record formats 0 through 10, with formats 0-5 using pointwise compression and formats 6-10 employing layered chunked compression for extended attributes like color or . LAZ achieves typical size reductions to 7-20% of the original LAS file (equivalent to 5-14x compression). This results in substantial savings for large datasets; for instance, a 776 GB LAS file containing 30 billion points can be compressed to 64 GB in LAZ format. The process is fully reversible, ensuring decompressed output is bit-identical to the source LAS file, which has made LAZ the de facto industry standard for data storage and distribution. Encoding and decoding are facilitated by the LAStools suite, which includes the laszip command-line utility for direct conversion between LAS and LAZ files, and the LASzip library is integrated into broader ecosystems such as PDAL for point cloud processing pipelines and for visualization and analysis.

COPC Format

The Cloud Optimized Point Cloud (COPC) format is an extension of the LAZ 1.4 compressed LAS format that incorporates a clustered structure for efficient spatial indexing and querying of point cloud data. Developed by Howard Butler, Andrew Bell, and Connor Manning at Hobu, Inc., and released in 2021, COPC enables partial and hierarchical access to large datasets without requiring the entire file to be loaded into memory, making it suitable for cloud-based and web delivery applications. COPC files maintain the core LAZ structure but add specific Variable Length Records (VLRs) to define the octree organization. The primary info VLR (User ID: "copc", Record ID: 1) is a 160-byte record located at offset 375 in the file header, containing the unscaled octree center coordinates (center_x, center_y, center_z as doubles), the root node halfsize (double, representing the distance from center to edge along each axis), base spacing (double, which halves at each subsequent level), offset and size of the root hierarchy page (uint64_t each), and the minimum/maximum GPSTime values (doubles), followed by 11 reserved uint64_t fields set to zero. This VLR effectively defines the overall bounding box using six double-precision values derived from the center and halfsize. The hierarchy VLR (User ID: "copc", Record ID: 1000) stores the octree metadata in one or more fixed-size pages, each comprising consecutive 32-byte entries for octree nodes; each entry includes a VoxelKey (level as int32_t, followed by x, y, z coordinates as int32_t, with negative values indicating invalid nodes) and an Entry struct (offset as uint64_t, byte size as int32_t, point count as int32_t: positive for leaf nodes with points, -1 for internal nodes linking to child pages, or 0 for empty nodes). COPC requires LAZ 1.4 with Point Data Record Formats (PDRF) 6, 7, or 8 to support the necessary extended header fields. The indexing mechanism in COPC relies on an hierarchy starting at level 0 () and descending to a maximum level determined by the and spacing, where each level subdivides the into eight child nodes with halved spacing. pages group entries for nodes whose bounding volumes contain points, allowing readers to traverse from the page (whose offset and are specified in the info VLR) to subpages or chunks via the entry offsets and ; leaf nodes point directly to compressed LAZ chunks of points within their bounds. This structure supports spatial filtering by bounds or , as applications can skip irrelevant branches and stream only required pages or chunks over networks, reducing I/O overhead for massive datasets. In practice, COPC facilitates web and cloud delivery of point clouds, with tools like PDAL for reading/writing and Potree 2.0 for visualization providing native support; it is particularly effective for national-scale repositories, such as the USGS 3D Elevation Program (3DEP), where it minimizes transfer for subset queries compared to unindexed LAZ files.

References

  1. Sep 1, 1989 · The Canadian Well Logging Society's Floppy Disk Committee has designed a standard format for log data on floppy disks. It is known as the LAS ...
Add your contribution
Related Hubs
User Avatar
No comments yet.