Cloud Optimized GeoTIFF (COG) Overview | Introduction

Overview#

Cloud Optimized GeoTIFF (COG) relies on two complementary technologies.

The first is GeoTIFF's ability to organize pixels in a structured layout inside the file, rather than simply storing raw pixels one after another.
The second is the HTTP GET range request, which lets a client ask for only the portion of a file it needs.

The first makes the second useful: because the GeoTIFF layout keeps related pixels together in a defined structure, a range request can conveniently fetch just the portion of the data that needs to be processed.

Organization of GeoTIFF#

The two main data organization techniques used by COG are tiles and overviews, and data compression also makes online data transmission more efficient.

Tiling stores the image as a grid of built-in tiles rather than as simple strips of data. With strips, which span the full width of the image, reading a small area means reading far more data than is needed, in the worst case the entire dataset. With tiles, the tiles covering a specified area can be located quickly, so the same request can be fulfilled by reading only that part of the data.
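To make the benefit of tiling concrete, here is a minimal sketch of the arithmetic a reader might use to decide which tiles cover a requested pixel window. The function name, the 512-pixel tile size, and the example window are illustrative assumptions, not values taken from any particular library or file.

```python
def tiles_for_window(col_off, row_off, width, height, tile_size=512):
    """Return (tile_col, tile_row) indices of the tiles that intersect a pixel window.

    tile_size is an assumed value; real files declare their own tile dimensions.
    """
    first_col = col_off // tile_size
    last_col = (col_off + width - 1) // tile_size
    first_row = row_off // tile_size
    last_row = (row_off + height - 1) // tile_size
    return [
        (col, row)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


# Hypothetical request: a 600 x 300 pixel window starting at column 1000, row 2000.
window_tiles = tiles_for_window(col_off=1000, row_off=2000, width=600, height=300)
print(window_tiles)
```

For a 10000 x 10000 pixel image split into 512-pixel tiles (20 x 20 = 400 tiles), this window touches only 6 tiles, so the rest of the file never needs to be read.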

Overviews are multiple downsampled versions of the same image, stored in the same file. Downsampling 'shrinks' the original image: detail is lost (one overview pixel may represent 100 or even 1000 pixels of the original), but the data volume is much smaller. A GeoTIFF typically has several overviews to match different zoom levels. This speeds up rendering, because the software can return precomputed pixel values instead of working out at request time which value should represent those 1000 pixels. The cost is a larger overall file.
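The following sketch illustrates how a pyramid of overviews might be laid out and how a viewer could choose one for a given zoom level. It assumes power-of-two downsampling factors and stops once an overview fits in a single 512-pixel tile; both are common conventions but are assumptions here, and the function names are hypothetical.

```python
import math


def overview_levels(width, height, tile_size=512):
    """List (factor, width, height) for successive power-of-two overviews."""
    levels = []
    factor = 2
    w, h = width, height
    while max(w, h) > tile_size:
        w = math.ceil(width / factor)
        h = math.ceil(height / factor)
        levels.append((factor, w, h))
        factor *= 2
    return levels


def pick_factor(levels, source_pixels_per_screen_pixel):
    """Pick the coarsest overview that still has enough detail (1 = full resolution)."""
    candidates = [f for f, _, _ in levels if f <= source_pixels_per_screen_pixel]
    return max(candidates, default=1)


levels = overview_levels(10000, 10000)
print(levels)
print(pick_factor(levels, 10))
```

With power-of-two factors, each overview holds a quarter of the pixels of the level before it, so before compression the whole pyramid adds roughly one third to the pixel count of the full-resolution image.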

Compression reduces the number of bytes that have to be transferred, so software can access images faster, which usually results in a better user experience. Even so, organizing the file so that HTTP GET range requests work efficiently is still very important.

HTTP GET Range Requests#

HTTP version 1.1 introduced a very powerful feature for clients that retrieve data with a GET request: range requests. If the server's response headers include Accept-Ranges: bytes, the client may request whichever byte ranges of the data it wants. This is often referred to as "byte serving," and Wikipedia has an article that explains how it works in detail. The client asks the server only for the bytes it needs. The technique is widely used on the web, for example by video services, so that the client does not need to download the entire file.
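As an illustration of byte serving, the sketch below builds a Range header and uses it with Python's standard urllib. The helper names are hypothetical, and the 16384-byte figure is only an example size. A server that honors the range replies with status 206 (Partial Content); one that ignores it typically replies 200 with the whole file.

```python
import urllib.request


def range_header(offset, length):
    """Build a Range header value for `length` bytes starting at byte `offset`."""
    # HTTP byte ranges are inclusive on both ends.
    return f"bytes={offset}-{offset + length - 1}"


def fetch_range(url, offset, length):
    """Fetch part of a remote file, failing if the server ignored the range."""
    request = urllib.request.Request(url, headers={"Range": range_header(offset, length)})
    with urllib.request.urlopen(request) as response:
        if response.status != 206:
            raise RuntimeError("server did not return partial content")
        return response.read()


# Header for the first 16384 bytes of a file (an example size only).
header = range_header(0, 16384)
print(header)
```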

Range requests are an optional feature of HTTP, so a server does not have to implement them. However, most cloud service providers (Amazon, Google, Microsoft, OpenStack, etc.) support them in their object storage services. Therefore, most data stored in the cloud can be served through range requests.

Integration#

After introducing these two technologies, it becomes clear how the two parts work together. Tiles and overviews in GeoTIFF are stored in a defined structure in cloud files, allowing range requests to access the relevant parts of the file.

Overviews come into play when the client wants to render a quick view of the entire image without downloading every pixel. The request then targets a much smaller, pre-built overview instead. The structure of the GeoTIFF file lets the client locate and fetch just that part of the file from any server that supports HTTP range requests.

Tiling is useful when only a specific part of the image needs to be processed or visualized, whether from an overview or at full resolution. The important point is that each tile keeps all of its data in one location within the file, so a range request can retrieve it when needed.

If a GeoTIFF has not been 'cloud optimized' with overviews and tiles, some remote operations can still be performed, but they require downloading the entire dataset or far more data than is actually needed.
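To show how tiles and range requests fit together, here is a rough sketch that turns a set of tile indices into byte ranges. A tiled TIFF records where each tile starts and how long it is in its TileOffsets and TileByteCounts tags, in row-major order. The function name and the offset values below are made up for illustration, and merging adjacent ranges is one possible client strategy rather than a description of what any specific reader does.

```python
def byte_ranges(tile_indices, tile_offsets, tile_byte_counts, tiles_across):
    """Map (col, row) tiles to merged (start, end) byte ranges, end inclusive."""
    spans = sorted(
        (tile_offsets[row * tiles_across + col], tile_byte_counts[row * tiles_across + col])
        for col, row in tile_indices
    )
    merged = []
    for start, size in spans:
        end = start + size - 1
        if merged and start <= merged[-1][1] + 1:
            # Contiguous with the previous range: extend it to save a request.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical 4 x 4 tile grid: every tile is 100 bytes, stored back to back from byte 1000.
offsets = [1000 + 100 * i for i in range(16)]
counts = [100] * 16
ranges = byte_ranges([(1, 0), (2, 0), (1, 1), (2, 1)], offsets, counts, tiles_across=4)
print(ranges)
```

Four tiles collapse into two range requests here, because tiles next to each other in a row are also next to each other in the file.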

Advantages#

An increasing amount of geospatial data is being migrated to the cloud ☁️, and most of it is kept in cloud object storage such as S3 or Google Cloud Storage. Traditional GIS file formats can easily be stored in the cloud, but they are not efficient for serving web map tiles or for rapid data processing: the entire dataset often has to be downloaded elsewhere and then converted to a more optimized format or loaded into memory.

Cloud Optimized GeoTIFF makes data flow more efficient through a few small techniques, enabling geospatial data workflows that run entirely on cloud services. Online imagery platforms such as Planet Platform and GBDX use this method to provide image services, which makes image processing very fast. Software that supports COG can reduce execution time by retrieving only the parts of the data it needs.

Many newer geospatial software packages, such as GeoTrellis, Google Earth Engine, and IDAHO, also build the COG concept into their architecture. Each processing node can run image processing at high speed by streaming only the parts of a COG file it needs.

Unlike the introduction of a new file format, COG has little impact on the existing GeoTIFF standard. Current software can read a COG without any modification. Such software does not need to support streaming; it can simply download the entire file and read it.

Providing files in Cloud Optimized GeoTIFF format in the cloud can greatly reduce file copying. Online software can stream the files without maintaining its own copies, which is more efficient and is a common pattern today. Additionally, data providers do not need to offer data in multiple formats, because both legacy and modern software can read a COG. Providers only need to maintain one version of the data, without unnecessary copies and downloads, and multiple online applications can use it simultaneously.

QUICK START#

Introduction#

This tutorial explains how developers can use and produce Cloud Optimized GeoTIFF.

Reading#

The simplest way to read a COG is through GDAL's VSI Curl feature (the /vsicurl/ file system handler). See the section "How to read it with GDAL" on the GDAL wiki. Most geospatial software today uses GDAL as a dependency, so building on GDAL is the fastest way to add COG reading support.
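As a hedged sketch of what reading through VSI Curl can look like from Python, the example below prefixes a URL with /vsicurl/ and reads one pixel window with the GDAL Python bindings. The helper names and the example.com URL are placeholders, and the GDAL bindings (with NumPy) are assumed to be installed; the import is done lazily so the path helper can be used without them.

```python
def vsicurl_path(url):
    """Prefix an HTTP(S) URL so that GDAL reads it through its curl-based handler."""
    return "/vsicurl/" + url


def read_window(url, xoff, yoff, xsize, ysize, band=1):
    """Read a pixel window from a remote GeoTIFF without downloading the whole file."""
    from osgeo import gdal  # lazy import: requires the GDAL Python bindings

    dataset = gdal.Open(vsicurl_path(url))
    if dataset is None:
        raise RuntimeError(f"GDAL could not open {url}")
    # GDAL issues range requests only for the blocks that cover this window.
    return dataset.GetRasterBand(band).ReadAsArray(xoff, yoff, xsize, ysize)


# Placeholder URL, not a real dataset.
path = vsicurl_path("https://example.com/data/scene.tif")
print(path)
```

Calling read_window with the URL of a real COG returns a NumPy array for the requested window; how many requests GDAL makes depends on the file's tiling and on GDAL's configuration.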

On Planet, all data is already in COG format, and there is a short tutorial on downloading part of an image. Most of the tutorial covers how to use the Planet API, but it also explains how GDAL Warp can extract a single area of interest from a large COG file.

Creating#

See also "How to generate it with GDAL" on the GDAL wiki page about COG.

$ gdal_translate in.tif out.tif -co TILED=YES -co COPY_SRC_OVERVIEWS=YES -co COMPRESS=DEFLATE

Or use the rio-cogeo plugin:

$ rio cogeo create in.tif out.tif --cog-profile deflate

Many other geospatial software packages should also be able to add the appropriate overviews and tiles.

Validation#

Using the rio-cogeo plugin:

$ rio cogeo validate test.tif
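To give a feel for what such a validation looks at, here is a simplified sketch that walks the image file directories (IFDs) of a classic TIFF and checks two necessary conditions: every image is tiled (it carries the TileWidth tag, 322) and there is more than one image, which is how overviews are stored. This is not how rio-cogeo is implemented and it is far from a complete check; the function names are hypothetical and BigTIFF files are not handled.

```python
import struct

TILE_WIDTH_TAG = 322  # TIFF TileWidth tag; present only in tiled images


def read_ifd_tags(data):
    """Return one set of tag numbers per IFD in a classic (non-Big) TIFF."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF file")
    magic, offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF uses 43)")
    ifds = []
    while offset:
        (count,) = struct.unpack_from(order + "H", data, offset)
        # Each IFD entry is 12 bytes and starts with its 2-byte tag number.
        tags = {
            struct.unpack_from(order + "H", data, offset + 2 + 12 * i)[0]
            for i in range(count)
        }
        ifds.append(tags)
        (offset,) = struct.unpack_from(order + "I", data, offset + 2 + 12 * count)
    return ifds


def looks_cloud_optimized(data):
    """Rough check only: more than one image, and every image is tiled."""
    ifds = read_ifd_tags(data)
    return len(ifds) > 1 and all(TILE_WIDTH_TAG in tags for tags in ifds)
```

A file that passes this check may still fail rio cogeo validate, which examines further properties of the file.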

References#

https://www.cogeo.org/