Science Data Processing Software

Programmatic Access of EOSDIS DAAC Hosted Services

User Documentation

Updated 09/15/2017

Overview

What Is EGI Programmatic Access?

Programmatic Access is a capability enhancement to the Data Access services at EOSDIS Service Interface (ESI) enabled DAACs. (The ESI enabled DAACs are NSIDC, LP-DAAC and ASDC.)
As seen in the diagram below, it adds an ability for EGI (the ESI Gateway Interface) to access CMR (the Common Metadata Repository), and extends the accessible protocols to include WCS (Web Coverage Service) compatibility.
This improves the scriptability of the EGI component, which is the exposed user program interface into the data access services.

The Programmatic Access interface is used to locate and access DAAC hosted data, optionally performing services on the data. The resulting output data is synchronously streamed back to the user, either as a single file or a multi-file zip. It combines into one interface functions that formerly required multiple interfaces: searching CMR for science granules of interest, and then obtaining the science data files with optional services applied.

The EOSDIS Services Gateway Interface Context

What Does It Look Like?

EGI Programmatic Access is exposed as a synchronous REST interface using the HTTP protocol. A request takes the form of an HTTP URL containing a series of key-value pairs (KVPs) that specify the operands and the operations to be performed.
The URL incorporates these elements:

  • the EGI endpoint
  • KVPs that control a CMR search for granules (collection, time, spatial constraints)
  • KVPs that identify the services to be performed for each granule (subsetting, reformatting, etc.)
  • KVPs to provide administrative control and information (token, paging)

Here is an example:

    https://n5eil01u.ecs.nsidc.org/egi/request?short_name=MOD10A1
        &version=006
        &time=2016-07-01,2016-07-02
        &format=GeoTIFF
        &token=D3B96CDF-8E20-09C3-3FC6-9D3656610F19

Description: This request uses the EGI services at the NSIDC DAAC (https://n5eil01u.ecs.nsidc.org/egi/)
to have CMR find the data of interest, identified as the MODIS product MOD10A1 version 006, acquired on July 1 2016
(using the KVPs short_name=MOD10A1&version=006&time=2016-07-01,2016-07-02),
and to return the resulting data files formatted as GeoTIFF
(using the KVP format=GeoTIFF). The specified token authenticates the requester (token=D3B96CDF-8E20-09C3-3FC6-9D3656610F19).
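
For readers who would rather assemble the request in a program than by hand, the following Python sketch builds the example URL from its KVPs. The helper name build_egi_url is hypothetical (it is not part of EGI or of pa.sh), and the set of characters left unescaped is an assumption chosen so that the output matches the example above.

```python
from urllib.parse import urlencode


def build_egi_url(endpoint, kvps):
    """Join an EGI endpoint and a list of (key, value) pairs into one URL.

    A list of pairs is used instead of a dict so that a key such as
    short_name can be repeated.
    """
    # Assumption: commas, colons, slashes and brackets are left unescaped so
    # that time ranges and dataset paths look like the documented examples.
    query = urlencode(kvps, safe=",:/[]")
    return f"{endpoint}?{query}"


url = build_egi_url(
    "https://n5eil01u.ecs.nsidc.org/egi/request",
    [
        ("short_name", "MOD10A1"),
        ("version", "006"),
        ("time", "2016-07-01,2016-07-02"),
        ("format", "GeoTIFF"),
        ("token", "D3B96CDF-8E20-09C3-3FC6-9D3656610F19"),
    ],
)
print(url)
```

The printed URL is the single-line form of the example request, with the KVPs joined by ampersands.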

This request and its response can be communicated with the DAAC using any HTTP client program, including browsers, command line utilities, and custom programs. A flexible command line utility named “curl” is commonly used for this purpose. The examples that follow show how to use curl to request programmatic access services and receive the resulting output files. Alternatively, a user can write a custom program in any language to implement their desired workflow, using a curl library or the native http communication functions of the language. An example demonstrating programmatic access of CMR over HTTP written in Python can be found here: https://git.earthdata.nasa.gov/projects/HDS/repos/cmr/. This can be readily extended to encompass the EGI Programmatic Access functionality described herein.

The EGI Programmatic Access request incorporates parameters to control the CMR query and parameters to specify data processing services. The EGI component performs the interpretation of the parameters and passes the appropriate query request to CMR. The query response flow can follow one of several paths:

  1. CMR Only
    If there are no parameters specifying data services, then this is a CMR-only request and EGI responds with a redirect url (HTTP response code 303). The HTTP client should follow this url to perform the request directly with CMR.

  2. Single File Result
    If the final result of the CMR query and the requested processing results in a single file, that file is synchronously streamed back to the requester, with the appropriate file name (as determined by the requested services) provided in the HTTP response header. The client should save the file with that name.

  3. Multi-file Result
    If the final result of the CMR query and the requested processing produces multiple files, they are bundled together and returned as a zip file with a unique file name (determined by the ESI job number) passed in the HTTP response header. The client should save the file with that name.
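
A custom client has to recover the file name itself. curl's -J option takes the name from the Content-Disposition response header, so the sketch below assumes that EGI supplies the name in that header (the text above only says "the HTTP response header"). The function name and the sample file names are hypothetical.

```python
import os
from email.message import Message


def filename_from_content_disposition(header_value):
    """Return the file name carried in a Content-Disposition value, or None."""
    msg = Message()
    msg["Content-Disposition"] = header_value
    name = msg.get_filename()
    if name is None:
        return None
    # Keep only the final path component so a server-supplied name cannot
    # place the file outside the current directory.
    return os.path.basename(name)


print(filename_from_content_disposition('attachment; filename="example_output.zip"'))
```

Stripping directory components is a defensive design choice, not something the EGI interface is known to require.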

Each of these flows needs slightly different response logic in the requesting client program.
We will look at examples of handling this flow using curl in a shell script.
Attached is an example shell script named pa.sh that allows easy exercising of programmatic access for ad-hoc queries.

Using curl to Communicate With EGI

The Curl Program

Curl is an extensive program (and library) that supports many protocols and options. Here we are focusing only on the HTTP protocol and options that are useful for programmatic access. For more documentation, see the curl man page on your system.
Some of the program options used below require curl version 7.20.0 (2010) or later. Note that Red Hat Release 6.8 includes curl 7.19.7, which is too old for these examples, so a newer version must be installed there. We recommend the latest stable version. We developed these examples using curl 7.49.1 on Red Hat and curl 7.43.0 on Mac OS X 10.11.6.

Curl usage for programmatic access is:

    curl <options> <endpoint>/request?<parameters separated with &>

The following options have proven useful in developing these examples.

    -v       verbose, prints internal details of performing the curl request, used for debugging
    -s       silent, suppresses all unnecessary information output, used when not debugging
    -L       follow the redirect to a new location, automatically handles the Code 303 redirect
             We do not use -L in the example script because the CMR-only query does
             not provide a file name, causing curl to provide a default file name
             built from the query url. Instead, we handle the redirect in the script logic.
    -O       save the result in a local file
    -J       name the saved file according to the returned header, used with -O
    -w       write specified info to stdout, used to capture return codes, file names and urls
    -i       include the returned headers in the output, useful for debugging, not used in final script
    --dump-header <file>   write the returned HTTP headers to <file>, used to get the returned file name
    --socks5-hostname      send HTTP connection through proxy, used for convenience
                           in our development environment
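
When curl is driven from a program instead of a shell, the options above can be assembled into an argument list. The following Python sketch shows one way to do that and to split the -w output into its fields. The function names, the default header file name and the example.invalid host are hypothetical; only the curl options and the -w variables come from this document.

```python
# Each -w field is written on its own line so that an empty redirect_url
# still occupies a line. The doubled backslash passes a literal \n to curl.
WRITE_OUT = "%{http_code}\\n%{url_effective}\\n%{redirect_url}\\n"


def build_curl_command(url, header_file="response-header.txt"):
    """Return the curl argument list; -L is deliberately left out."""
    return ["curl", "-s", "-O", "-J", "-w", WRITE_OUT,
            "--dump-header", header_file, url]


def parse_write_out(text):
    """Split the -w output into (http_code, url_effective, redirect_url)."""
    lines = text.splitlines()
    lines += [""] * (3 - len(lines))  # pad if trailing fields are missing
    return int(lines[0]), lines[1], lines[2]


cmd = build_curl_command("https://example.invalid/egi/request?short_name=MOD10A1")
print(cmd)
```

The list can be passed to subprocess.run(cmd, capture_output=True, text=True); the returncode attribute of the result is then curl's exit status and stdout holds the -w text for parse_write_out.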

The Example Script

The pa.sh example shell script attached implements the EGI Programmatic Access Request/Response flow. The script must be edited to utilize the desired EGI endpoint. The HOST and MODE variables should be changed to match the target environment. (Note, this is a bash shell script developed for use on a Linux host.)

The example shell script implements this logic sequence:

  1. Perform the HTTP request using curl

    curl -s -O -J -w " %{http_code} %{url_effective} %{redirect_url}" \
    "EGI-endpoint-parameters url" > HTTP-response-file.txt
    
  2. Examine the curl status return code
    Non-zero status indicates curl encountered an error, so we display this information and exit. The most common problems are “connection timeout” and “file already exists” when attempting to write the output file.

  3. Examine the HTTP response status code
    a. if http_code=200 then all is OK and the result is in the correctly named output file (from the -O -J options)
    b. if http_code=303 then we have a redirect case; examine the redirect_url

    • if the redirect url is for CMR, then curl it and flow CMR results to the screen. Note that in this case we also clean up the empty output file created by the initial curl.
    • if the redirect url is not for CMR, then curl it with -O -J to save the result with the correct EGI zip file name

    c. otherwise, any other return code indicates an error and the output file should contain the error information in XML format
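
The decision logic of steps 2 and 3 can be sketched as a small Python function. This is an illustration, not the contents of pa.sh: the function name and the returned labels are hypothetical, and "the redirect url is for CMR" is approximated by checking for a host name that starts with "cmr.", which is an assumption based on the CMR URLs shown in this document.

```python
from urllib.parse import urlparse


def next_action(curl_exit, http_code, redirect_url=""):
    """Map the curl exit status and HTTP code to the next step of the flow."""
    if curl_exit != 0:
        return "report_curl_error"        # step 2: curl itself failed
    if http_code == 200:
        return "done_file_saved"          # step 3a: -O -J saved the file
    if http_code == 303:                  # step 3b: redirect case
        host = urlparse(redirect_url).hostname or ""
        # Assumption: CMR hosts are named cmr.<something>.
        if host.startswith("cmr."):
            return "follow_and_print_cmr_results"
        return "follow_and_save_with_O_J"
    return "report_http_error"            # step 3c: XML error info expected


print(next_action(0, 303, "https://cmr.sit.earthdata.nasa.gov/search/granules?provider=DEV07"))
```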

Programmatic Access Parameters

Programmatic Access parameters can be EGI parameters, CMR parameters, or a mix of both. Any parameter not recognized as EGI is passed to CMR. The parameters are represented as Key-Value Pairs (KVPs), written as KEY=value. Multiple KVPs in a url are separated with an ampersand.

If there are only CMR parameters and no EGI parameters, then this is a CMR-only query and EGI returns a 303 “See Other” code, providing a redirection url pointing to CMR. Following this url will give a result back to the user directly from CMR. (The curl -L option can be used to follow the redirect, or you can follow it using program logic based on the returned 303 code as shown in the example script.)

EGI is currently being enhanced to provide an OGC Web Coverage Service (WCS) compliant service interface that will allow WCS compatible clients access to DAAC hosted ECS data.
The WCS protocol specification defines a core set of requests supporting web based retrieval of coverages: GetCapabilities, DescribeCoverage, and GetCoverage.
The EGI KVP parameter keywords used for Programmatic Access are compatible with the WCS 1.0.0 GetCoverage request parameter keywords. However, a Programmatic Access client does not need to implement the full WCS protocol.
The client only needs to submit an HTTP web url that includes data selection criteria and processing options, using KVPs described in the tables below.

The KVPs that are passed to CMR determine the coverages to be processed, and correspond to a WCS client endpoint definition. Although the Programmatic Access KVPs can appear in any order in the url, by placing the CMR parameters first on the line the url will resemble the endpoint configuration that would be used by a WCS client.

Granules identified by the results of the CMR query are then processed according to the EGI parameters. The results are streamed back to the user and should be saved as a file. When there are multiple files in the processed results, the returned stream is in zip format. (In curl, use -OJ to automatically save this file using the EGI provided name.)

Some WCS keywords are implemented for use by WCS clients and are not necessary for Programmatic Access use. Note also that the full set of CMR parameters can be used for CMR-only queries.


The following table gives some useful CMR parameters; however any CMR supported parameters can be used.

Useful CMR parameters: (WCS Endpoint Parameters)

Short_name=aaaa

Specifies the short name of the collection used to find granules for the coverage requested. Can be used multiple times to return granules from multiple collections.

Version=nnn

Specifies collection version. The version is treated like a string and must match the version field for that collection in CMR. Multiple versions can be specified.
Note: the Version parameter is also used by WCS clients to specify the version of WCS. When version=1.0.0, EGI understands it to be specifying the WCS version. Otherwise Version is passed to CMR as a search parameter.

Discussion: The format of the version parameter depends on the metadata that was provided to CMR when the collection was created. See the FAQ.

Updated_since=<datetime>

Can be used to find granules recently updated in CMR.
Example datetime: 2016-09-01T12:00:00Z

Bounding_box=n,n,n,n

This specifies a search filter to find only granules having a spatial extent that overlaps this bounding box, specified in decimal degrees of latitude and longitude.
Order is lower left long, lower left lat, upper right long, upper right lat.
This order is referred to as WSEN and is the same order as used for the EGI subsetting Bbox.
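
Because the coordinate order is easy to get wrong, a client may want to build the value from named arguments. The Python sketch below does that; the function name is hypothetical and the range checks are an assumption about what counts as a sensible box, not documented CMR validation rules.

```python
def format_bounding_box(west, south, east, north):
    """Return the W,S,E,N value for the Bounding_box (or Bbox) KVP."""
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must be within -180..180 degrees")
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")
    # West may be greater than east; that is left unchecked here because a
    # box could be meant to cross the antimeridian.
    return ",".join(str(value) for value in (west, south, east, north))


print("bounding_box=" + format_bounding_box(west=100, south=-20, east=140, north=20))
```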

Time=<datetime>,<datetime>

Specify data datetime range filter for the CMR query. "Time" is the WCS compatible equivalent of the CMR "Temporal" parameter.
Note, see the Time keyword for EGI described below for temporal subsetting usage.

The start and end time values should be specified as a compound date and time: year-month-dayThours:minutes:secondsZ. Year-month-day is mandatory, the time part is optional. Year-month-day must use four digits for the year, two digits for the month and two digits for the day, separated by hyphens. If the time part is used, it must start with T and it must include three subfields for hours, minutes and seconds, separated with colons. The trailing Z is optional. It conventionally indicates GMT; however, all time parameters used in EGI Programmatic Access are GMT times whether or not the Z is present. If the time part is not used, it is equivalent to T00:00:00.

The "time=" KVP must always contain two datetime values to specify start and end of a time window, separated with a comma. Here are some valid examples:
time=2016-01-01,2016-01-02
time=2016-01-01T12:00:00Z,2016-01-01T18:00:00Z
time=2016-01-01T00:00:00,2016-01-01T23:59:59
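
The format rules above can be checked before a request is sent. The following Python sketch is a minimal validator with a hypothetical function name. It checks only the shape of the value (digit counts, separators, two datetimes), not whether the dates exist on the calendar or whether the start precedes the end.

```python
import re

# One datetime: mandatory YYYY-MM-DD, optional THH:MM:SS, optional trailing Z.
_DATETIME = r"\d{4}-\d{2}-\d{2}(?:T\d{2}:\d{2}:\d{2}Z?)?"
# The KVP value: exactly two datetimes separated by a comma.
_TIME_VALUE = re.compile(_DATETIME + "," + _DATETIME)


def is_valid_time_value(value):
    """Return True if value has the form <datetime>,<datetime>."""
    return _TIME_VALUE.fullmatch(value) is not None


for candidate in ("2016-01-01,2016-01-02", "2016-01-01"):
    print(candidate, is_valid_time_value(candidate))
```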

sort_key[]=<sort-option>

This is a CMR parameter to control the sort order of the returned results. Sort options are described in the CMR search API documentation (see References).
Example: sort granules by data coverage date in reverse order (newest first):
sort_key[]=-start_date



Useful EGI parameters: (WCS GetCoverage Parameters)
Coverage=/group/sub-group/sub-sub-group/dataset

WCS: Used to specify the coverage to be processed. Specifies the subset data layer or group for Parameter Subsetting. Multiple datasets can be specified separated by commas.

The dataset value always starts with a slash and the group-subgroup hierarchy are separated with slashes. If only a group or subgroup is specified, all lower level datasets are included in the processing.

Bbox=<W>,<S>,<E>,<N>

WCS: Bounding Box used for spatial subsetting. Coordinates are in decimal degrees. This is the same order as used for the CMR spatial filter parameter (Bounding_box).

Time=<start_datetime>,<end_datetime>

WCS: Used for temporal subsetting and stitching of applicable data sets.
Note: Time is a shared CMR/EGI keyword, so it also acts as the datetime range filter for the CMR query (see the CMR Time parameter described above).

Format=<format>

WCS: Optional output file format specifier used for re-formatting. Supported values vary by data type.
[GeoTIFF, HDF-EOS5, NetCDF4-CF, NetCDF-3, ASCII, HDF-EOS, KML]
If this parameter is not used, then the output format is the same as the input format (no reformatting).



Administrative and formatting parameters:
page_size=<n>

This is a CMR parameter to control the number of granules in the page of returned results.

Discussion: When there are multiple granules returned from the CMR query, they are returned in sets called pages. The default page size is 10 granules. The page_size KVP allows the user to change the page size. Multi-page results can be accessed one page at a time using the page_num KVP described below.

Note that the system configuration parameter MAX_GRANS_FOR_SYNC_REQUEST limits the size of a request in EGI. Exceeding that number returns an error. Page_size should always be set to less than or equal to this request limit.

See the FAQ for more discussion.

page_num=<n>

This is a CMR parameter to select the page of results to be processed. The page contains the number of granules selected in the page_size KVP.
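
To walk through a multi-page result, a client issues one request per page. The Python sketch below generates the paging KVPs for each request. The function name is hypothetical, the total granule count is assumed to be known already (for example from a prior CMR-only query), and page numbering is assumed to start at 1.

```python
import math


def paging_kvps(total_granules, page_size=10):
    """Return one 'page_size=..&page_num=..' string per page of results.

    The default of 10 matches the default page size described above.
    """
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    pages = math.ceil(total_granules / page_size)
    # Assumption: page_num is 1-based.
    return [f"page_size={page_size}&page_num={n}" for n in range(1, pages + 1)]


for kvps in paging_kvps(total_granules=25, page_size=10):
    print(kvps)
```

Each string is appended to the rest of the request url; page_size should not exceed the request limit configured for the EGI instance.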

Version=1.0.0

WCS Clients Only: Indicates WCS Version 1.0.0 compatibility (optional)

Request=GetCoverage

WCS Clients Only: Identifies the type of WCS request (optional)

token=<token>

Allows the user to provide an Earthdata Login token. This token is used as proof that the user has been authenticated by the Earthdata Login system. It is used for:
  • enabling user access to ACL protected granule results in CMR
  • enabling Programmatic Access delivery of results only to authenticated users
  • metrics collection



EGI Programmatic Access Usage Examples

The following examples address the set of scenarios identified in the NSIDC request for the Programmatic Access capability. Note that Programmatic Access can generally be used to search CMR and access any data sets configured for ESI processing at any of the ECS DAACs that use ESI.

The examples here show the command line that can be used directly with curl, as well as the command line to invoke the sample script. Long lines are broken up showing the line continuation character \.

These examples are dependent on available data in the internal DEV07 test mode. Parameter details may need to be adjusted to work in other modes.

  • Using curl directly
    (Note that the examples below show only the HTTP request part of the curl command.)

    curl -s -O -J -w \
    "%{http_code}\n%{url_effective}\n%{redirect_url}\n%{filename_effective}\n" \
    --dump-header response-header.txt \
    "http://f5eil01v.edn.ecs.nasa.gov/ops/egi/request?KVP&KVP&…" \
    >HTTP-response-code.txt

  • Using the example script

    pa.sh kvp kvp kvp …


Scenario 1 SPL3SMP Spatial and Parameter Subsetting and Reformatting to GeoTIFF
SPL3SMP Characteristics
  • Name: SMAP L3 Radiometer Global Daily 36 km EASE-Grid Soil Moisture V003
  • Format: HDF-5
  • Spatial Extent: Daily Global Composite, Bounding Rectangle: (85.0445°, -180°, -85.0445°, 180°)
  • Organization: 31 data sets in one group
Service Request Description
  • find SPL3SMP granules within a temporal window (2015-03-30 to 2015-04-20 in the requests below)
  • select soil moisture parameter
  • spatially subset over region of interest
  • reformat into GeoTIFF
Request Examples
  • using curl

      https://n5eil01u.ecs.nsidc.org/egi/request?short_name=SPL3SMP\
            &version=003\
            &time=2015-03-30,2015-04-20\
            &Subset_Data_Layers=/Soil_Moisture_Retrieval_Data/soil_moisture\
            &Bbox=100,-20,140,20\
            &format=GeoTIFF
    
  • using sample script

      ./pa.sh short_name=spl3smp \
            version=003 \
            time=2015-03-30,2015-04-20 \
            Subset_Data_Layers=/Soil_Moisture_Retrieval_Data/soil_moisture \
            Bbox=100,-20,140,20 \
            format=GeoTIFF
    

Scenario 2 SPL2SMA Reformatting to GeoTIFF
SPL2SMA Characteristics
  • Name: SMAP L2 Radar Half-Orbit 3 km EASE-Grid Soil Moisture V003
  • Format: HDF-5
  • Spatial Extent: Single Orbit Swath, Bounding Rectangle: (85.0445°, -180°, -85.0445°, 180°)
  • Organization: 80 data sets in four groups
Service Request Description
  • find SPL2SMA granules within a temporal window
  • select the Radar_Data group
  • spatially subset to region of interest
  • reformat to GeoTIFF
Request Examples
  • using curl

      https://n5eil01u.ecs.nsidc.org/egi/request?short_name=SPL2SMA\
            &version=003\
            &time=2015-04-20,2015-04-27\
            &Subset_Data_Layers=/Radar_Data\
            &Bbox=60,10,100,30\
            &format=GeoTIFF
    
  • using sample script

      ./pa.sh short_name=spl2sma \
            version=003 \
            time=2015-04-20,2015-04-27 \
            Subset_Data_Layers=/Radar_Data \
            Bbox=60,10,100,30 \
            format=GeoTIFF
    

Scenario 3 GLAH12 Spatial and Parametric Subsetting
GLAH12 Characteristics

  • Name: GLAS/ICESat L2 Global Antarctic and Greenland Ice Sheet Altimetry Data (HDF5) V034
  • Format: HDF5
  • Spatial Coverage: Global Extent, Bounding Rectangle: (90.0°, -180.0°, -90.0°, 180.0°)
  • Organization: 173 data sets in 5 major groups; each granule contains 14 orbits

Service Request Description

  • find GLAH12 version 034 granules with data date of April 12-13 2007
  • select only the 1HZ data group
  • spatially subset to a lat-lon box

Request Examples
  • using curl

      https://n5eil01u.ecs.nsidc.org/egi/request?short_name=GLAH12\
            &version=034\
            &time=2007-04-12T00:00:00,2007-04-14T00:00:00\
            &Coverage=/Data_1HZ\
            &Bbox=0,-80,100,80
    
  • using sample script

      ./pa.sh short_name=glah12 \
            version=034 \
            time=2007-04-12T00:00:00,2007-04-14T00:00:00 \
            Coverage=/Data_1HZ \
            Bbox=0,-80,100,80
    

Scenario 4 MOD10A1 reformatting to GeoTIFF, plus spatial and parameter subsetting
MOD10A1 Characteristics

  • Name: MODIS/Terra Snow Cover Daily L3 Global 500m SIN Grid V006
  • Format: HDF4
  • Spatial Extent: MODIS Sinusoidal Tile Grid

Service Request Description

  • find MOD10A1 version 006 granules from January 1 2011 between 00:00 and 02:00
  • over Eastern Asia (130,30,140,85)
  • select the NDSI_Snow_Cover dataset
  • spatially subset to the desired area (130,30,140,85)
  • convert to GeoTIFF

Request Examples
  • using curl

      https://n5eil01u.ecs.nsidc.org/egi/request?short_name=MOD10A1\
            &version=006\
            &time=2011-01-01,2011-01-01T02:00:00\
            &bounding_box=130,30,140,85\
            &Subset_Data_Layers=/MOD_Grid_Snow_500m/NDSI_Snow_Cover\
            &Bbox=130,30,140,85\
            &format=GeoTIFF
    
  • using sample script

      ./pa.sh short_name=MOD10A1 \
            version=006 \
            time=2011-01-01,2011-01-01T02:00:00 \
            bounding_box=130,30,140,85 \
            Subset_Data_Layers=/MOD_Grid_Snow_500m/NDSI_Snow_Cover \
            Bbox=130,30,140,85 \
            format=GeoTIFF
    

Scenario 5 ICESat, IceBridge, ICESat-2 (preliminary)

ICESat-2 users would like to subset data over an area and get data from ICESat-2, ICESat-1 and potentially IceBridge.

  • to be provided
Request Examples
  • using curl

  • using sample script


Responses and Errors

Two types of errors can be encountered: errors reported by the curl program itself when it cannot complete the request (for example, it cannot connect or cannot write the output file), and errors reported by the server through the HTTP status code. To detect either kind of error, both the curl exit code and the HTTP return code should always be checked.

  1. curl execution errors
    This type of error is returned to the script from the curl program through the shell exit code. The following are the most likely causes. The complete list of curl error codes is found here: https://curl.haxx.se/libcurl/c/libcurl-errors.html.

    • curl return code 7, failed to connect: could not establish a connection to the remote HTTP server (a connection attempt that times out is reported as code 28)
    • curl return code 23, write error: unable to write the output file, typically because the file already exists

  2. HTTP protocol return codes

    • HTTP Return Code 200 “OK”, indicates success
    • HTTP Return Code 201 “Resource Created”, indicates an XML error response file was provided
      This happens when the service request executed successfully, but no output was generated. See details in the response file. Common causes are:
        • no data found in the requested spatial or temporal subset
        • requested data layer was not found
    • HTTP Return Code 303 “See Other”, indicates a redirect to another url
      This happens in the following cases:
        • a CMR-only query
        • a request that matched no granules in the CMR query
    • HTTP Return Code 400 “Bad Request”
        • collection not configured for that operation
        • request exceeds configured limits
        • parameter not recognized by CMR
    • HTTP Return Code 404 “Not Found”
        • requested processing produced no output data

EGI Programmatic Access Configuration and Hardening

The Programmatic Access capability is configured by the DAAC staff specifically for each execution environment (mode).

The Programmatic Access capability also includes features for hardening the interface to mitigate the risk of user activities affecting DAAC operations.

EGI Programmatic Access System Configuration

Each DAAC operational mode is configured as a distinct data provider to a particular instance of CMR.
The Programmatic Access components are configured so that the query communicates with the correct instance of CMR (identified by the endpoint) and searches the correct provider metadata (identified by the provider= parameter in the url).

This information is seen in the 303 redirect url when a CMR-only query is performed. For example, the following CMR-only query redirect url, taken from the HTTP response header, shows that the CMR endpoint is the SIT instance of CMR and that the search is for metadata from provider DEV07.

    REDIR='https://cmr.sit.earthdata.nasa.gov/search/granules?provider=DEV07&short_name=glah12&version=34'
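
A client that wants to confirm which CMR instance and provider a mode is configured for can pull those values out of the redirect url. The Python sketch below does this for the url shown above; the function name and the keys of the returned dictionary are hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def describe_redirect(redirect_url):
    """Split a CMR redirect url into its host, provider and search KVPs."""
    parts = urlparse(redirect_url)
    query = parse_qs(parts.query)  # each key maps to a list of values
    provider = query.pop("provider", [None])[0]
    return {"cmr_host": parts.hostname, "provider": provider, "search": query}


info = describe_redirect(
    "https://cmr.sit.earthdata.nasa.gov/search/granules"
    "?provider=DEV07&short_name=glah12&version=34"
)
print(info["cmr_host"], info["provider"])
```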

The EGI endpoint to be used for the Programmatic Access request is also configured for the operational mode. This is the endpoint that must be used in the HTTP request url. For example:

    EDF DEV02 endpoint: http://f5eil01v.edn.ecs.nasa.gov/dev02/egi/request?

Hardening Considerations

The EGI components that implement Programmatic Access have been hardened to protect the DAAC systems from excessive load.

Here are items that the Programmatic Access user should be aware of.

  1. Transaction Size Limit
    A single request can result in a large number of granules returned from CMR. A configuration parameter allows DAAC operators to set a limit on how many granules can be processed in a single request. An end user who needs to process more than the limit can submit multiple requests, using the CMR page_size and page_num parameters.

  2. Cancel Request
    If a Programmatic Access request for numerous granules is consuming too many resources (such as CPU or memory), the DAAC operations staff can cancel the request. This takes effect after the current granule finishes processing.

References:

CMR search API
https://cmr.earthdata.nasa.gov/search/site/search_api_docs.html

ESI Service API
https://wiki.earthdata.nasa.gov/pages/viewpage.action?pageId=74515236

curl Documentation
https://curl.haxx.se/