15. GSI Tools
Gridpoint Statistical Interpolation (GSI) diagnostic files are binary files written out from the data assimilation code before the first and after each outer loop. The files contain useful information about how a single observation was used in the analysis by providing details such as the innovation (O-B), observation values, observation error, adjusted observation error, and quality control information.
For more detail on generating GSI diagnostic files and their contents, see the GSI User’s Guide.
When MET reads GSI diagnostic files, the innovation (O-B; generated prior to the first outer loop) or analysis increment (O-A; generated after the final outer loop) is split into separate values for the observation (OBS) and the forecast (FCST), where the forecast value corresponds to the background (O-B) or analysis (O-A).
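The split described above is simple arithmetic. As a minimal sketch, assuming each diagnostic record supplies the observation value and its departure (O-B or O-A), the forecast value can be recovered by subtraction. The function name `split_departure` and the numbers are hypothetical and are not part of MET.

```python
def split_departure(obs, departure):
    """Return (OBS, FCST) given an observation and its O-B or O-A departure.

    Since departure = OBS - FCST, it follows that FCST = OBS - departure.
    """
    fcst = obs - departure
    return obs, fcst


# Illustrative values only: a 285.0 K observation with an O-B of 1.5 K
obs, fcst = split_departure(285.0, 1.5)
print(obs, fcst)
```

For a file written before the first outer loop, the resulting FCST is the background; for a file written after the final outer loop, it is the analysis.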
MET includes two tools for processing GSI diagnostic files. The GSID2MPR tool reformats individual GSI diagnostic files into the MET matched pair (MPR) format, similar to the output of the Point-Stat tool. The GSIDENS2ORANK tool processes an ensemble of GSI diagnostic files and reformats them into the MET observation rank (ORANK) line type, similar to the output of the Ensemble-Stat tool. The output of both tools may be passed to the Stat-Analysis tool to compute a wide variety of continuous, categorical, and ensemble statistics.
15.1. GSID2MPR tool
This section describes how to run the GSID2MPR tool. The GSID2MPR tool reformats one or more GSI diagnostic files into an ASCII matched pair (MPR) format, similar to the MPR output of the Point-Stat tool. The output MPR data may be passed to the Stat-Analysis tool to compute a wide variety of continuous or categorical statistics.
15.1.1. gsid2mpr usage
The usage statement for the GSID2MPR tool is shown below:
Usage: gsid2mpr
       gsi_file_1 [gsi_file_2 ... gsi_file_n]
       [-swap]
       [-no_check_dup]
       [-channel n]
       [-set_hdr col_name value]
       [-suffix string]
       [-outdir path]
       [-log file]
       [-v level]
gsid2mpr has one required argument and accepts several optional ones.
15.1.1.1. Required arguments for gsid2mpr
The gsi_file_1 [gsi_file_2 … gsi_file_n] argument indicates the GSI diagnostic files (conventional or radiance) to be reformatted.
15.1.1.2. Optional arguments for gsid2mpr
The -swap option switches the endianness when reading the input binary files.
The -no_check_dup option disables the check for duplicate matched pairs. This check slows the tool down considerably for large files.
The -channel n option overrides the default processing of all radiance channels, restricting processing to the channels given in a comma-separated list.
The -set_hdr col_name value option specifies what should be written to the output header columns.
The -suffix string option overrides the default output filename suffix (.stat).
The -outdir path option overrides the default output directory (./).
The -log file option outputs log messages to the specified file.
The -v level option overrides the default level of logging (2).
An example of the gsid2mpr calling sequence is shown below:
gsid2mpr diag_conv_ges.mem001 \
    -set_hdr MODEL GSI_MEM001 \
    -outdir out
In this example, the GSID2MPR tool will process a single input file named diag_conv_ges.mem001, set the output MODEL header column to GSI_MEM001, and write output to the out directory. The output file has the same name as the input file, with a .stat suffix added to indicate its format.
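The naming rule can be sketched in a few lines, which may help when scripting downstream steps that need to locate the output. This is an illustration only: the helper name `gsid2mpr_output_path` is hypothetical, and it assumes that only the base name of the input file is kept when building the output path.

```python
import os


def gsid2mpr_output_path(gsi_file, outdir="./", suffix=".stat"):
    """Predict the output path: input base name plus suffix, placed in outdir.

    Assumption: any directory part of the input path is dropped.
    """
    return os.path.join(outdir, os.path.basename(gsi_file) + suffix)


# Mirrors the example call above (-outdir out, default suffix)
print(gsid2mpr_output_path("diag_conv_ges.mem001", outdir="out"))
```

The defaults mirror the documented defaults for -outdir (./) and -suffix (.stat).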
15.1.2. gsid2mpr output
The GSID2MPR tool performs a simple reformatting step and thus requires no configuration file. It can read both conventional and radiance binary GSI diagnostic files. Support for additional GSI diagnostic file types may be added in future releases. Conventional files are determined by the presence of the string conv in the filename. Files that are not conventional are assumed to contain radiance data. Multiple files of either type may be passed in a single call to the GSID2MPR tool. For each input file, an output file will be generated containing the corresponding matched pair data.
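The file-type rule can be illustrated with a short sketch. The function name `gsi_file_type` and the radiance file name are hypothetical, and checking only the base name (rather than the full path) is an assumption rather than a statement about the tool's internals.

```python
import os


def gsi_file_type(path):
    """Classify by file name: 'conv' in the name means conventional, else radiance."""
    name = os.path.basename(path)  # assumption: only the base name is inspected
    return "conventional" if "conv" in name else "radiance"


# The second file name is made up for illustration
for f in ["diag_conv_ges.mem001", "diag_amsua_n18_ges.mem001"]:
    print(f, "->", gsi_file_type(f))
```

Because the decision rests on the file name alone, renaming a diagnostic file so that it gains or loses the string conv would change how it is interpreted.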
The GSID2MPR tool writes the same set of MPR output columns for the conventional and radiance data types. However, it also writes additional columns at the end of the MPR line which depend on the input file type. Those additional columns are described in the following tables.
GSI DIAGNOSTIC CONVENTIONAL MPR OUTPUT FILE

| Column Number | Column Name | Description |
|---|---|---|
| 1-37 | Standard MPR columns described in Table 11.20. | |
| 38 | OBS_PRS | Model pressure value at the observation height (hPa) |
| 39 | OBS_ERR_IN | PrepBUFR inverse observation error |
| 40 | OBS_ERR_ADJ | read_PrepBUFR inverse observation error |
| 41 | OBS_ERR_FIN | Final inverse observation error |
| 42 | PREP_USE | read_PrepBUFR usage |
| 43 | ANLY_USE | Analysis usage (1 for yes, -1 for no) |
| 44 | SETUP_QC | Setup quality control |
| 45 | QC_WGHT | Non-linear quality control relative weight |

GSI DIAGNOSTIC RADIANCE MPR OUTPUT FILE

| Column Number | Column Name | Description |
|---|---|---|
| 1-37 | Standard MPR columns described in Table 11.20. | |
| 38 | CHAN_USE | Channel used (1 for yes, -1 for no) |
| 39 | SCAN_POS | Sensor scan position |
| 40 | SAT_ZNTH | Satellite zenith angle (degrees) |
| 41 | SAT_AZMTH | Satellite azimuth angle (degrees) |
| 42 | SUN_ZNTH | Solar zenith angle (degrees) |
| 43 | SUN_AZMTH | Solar azimuth angle (degrees) |
| 44 | SUN_GLNT | Sun glint angle (degrees) |
| 45 | FRAC_WTR | Fractional coverage by water |
| 46 | FRAC_LND | Fractional coverage by land |
| 47 | FRAC_ICE | Fractional coverage by ice |
| 48 | FRAC_SNW | Fractional coverage by snow |
| 49 | SFC_TWTR | Surface temperature over water (K) |
| 50 | SFC_TLND | Surface temperature over land (K) |
| 51 | SFC_TICE | Surface temperature over ice (K) |
| 52 | SFC_TSNW | Surface temperature over snow (K) |
| 53 | TSOIL | Soil temperature (K) |
| 54 | SOILM | Soil moisture |
| 55 | LAND_TYPE | Surface land type |
| 56 | FRAC_VEG | Vegetation fraction |
| 57 | SNW_DPTH | Snow depth |
| 58 | SFC_WIND | Surface wind speed (m/s) |
| 59 | FRAC_CLD CLD_LWC | Cloud fraction (%) |
| 60 | CTOP_PRS TC_PWAT | Cloud top pressure (hPa) |
| 61 | TFND | Foundation temperature: Tr |
| 62 | TWARM | Diurnal warming: d(Tw) at depth zob |
| 63 | TCOOL | Sub-layer cooling: d(Tc) at depth zob |
| 64 | TZFND | d(Tz)/d(Tr) |
| 65 | OBS_ERR | Inverse observation error |
| 66 | FCST_NOBC | Brightness temperature with no bias correction (K) |
| 67 | SFC_EMIS | Surface emissivity |
| 68 | STABILITY | Stability index |
| 69 | PRS_MAX_WGT | Pressure of the maximum weighting function |

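When post-processing the output outside of MET, the trailing columns can be attached to their names by position. The sketch below does this for the conventional case, using the column names and numbers from the conventional table above. It assumes whitespace-delimited lines; the helper name `conv_extra_columns` and the sample line are hypothetical and do not represent real tool output.

```python
CONV_EXTRA = ["OBS_PRS", "OBS_ERR_IN", "OBS_ERR_ADJ", "OBS_ERR_FIN",
              "PREP_USE", "ANLY_USE", "SETUP_QC", "QC_WGHT"]
N_STANDARD = 37  # columns 1-37 are the standard MPR columns


def conv_extra_columns(line):
    """Map the trailing columns (38-45) of a conventional MPR line to their names."""
    tokens = line.split()
    extra = tokens[N_STANDARD:N_STANDARD + len(CONV_EXTRA)]
    if len(extra) != len(CONV_EXTRA):
        raise ValueError("expected at least %d columns, found %d"
                         % (N_STANDARD + len(CONV_EXTRA), len(tokens)))
    return dict(zip(CONV_EXTRA, extra))


# Synthetic line: 37 placeholder tokens followed by 8 made-up extra values
line = " ".join(["x"] * 37 + ["850.0", "1.0", "0.9", "0.8", "1", "1", "0", "1.0"])
print(conv_extra_columns(line))
```

The values are returned as strings so that the caller can decide how to convert each column.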
The gsid2mpr output may be passed to the Stat-Analysis tool to derive additional statistics. In particular, users should consider running the aggregate_stat job type to read MPR lines and compute partial sums (SL1L2), continuous statistics (CNT), contingency table counts (CTC), or contingency table statistics (CTS). Stat-Analysis has been enhanced to parse any extra columns found at the end of the input lines. Users can filter the values in those extra columns using the -column_thresh, -column_str, and -column_str_exc job command options.
An example of the Stat-Analysis calling sequence is shown below:
stat_analysis -lookin diag_conv_ges.mem001.stat \
    -job aggregate_stat -line_type MPR -out_line_type CNT \
    -fcst_var t -column_thresh ANLY_USE eq1
In this example, the Stat-Analysis tool will read MPR lines from the input file named diag_conv_ges.mem001.stat, retain only those lines where the FCST_VAR column indicates temperature (t) and where the ANLY_USE column has a value of 1.0, and derive continuous statistics.
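To make the effect of the filter concrete, the following sketch applies the same two conditions to a handful of made-up matched pairs and computes the mean error and root mean squared error, taking error as forecast minus observation. It is a conceptual illustration only, not the Stat-Analysis implementation, and it covers only two of the statistics a CNT line would contain. The function name and the dictionary layout are hypothetical.

```python
import math


def filter_and_summarize(pairs, fcst_var="t"):
    """Keep pairs for fcst_var with ANLY_USE == 1, then return (count, ME, RMSE)."""
    kept = [p for p in pairs if p["FCST_VAR"] == fcst_var and p["ANLY_USE"] == 1]
    if not kept:
        return 0, None, None
    errors = [p["FCST"] - p["OBS"] for p in kept]
    me = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return len(kept), me, rmse


# Made-up matched pairs for illustration
pairs = [
    {"FCST_VAR": "t", "FCST": 284.0, "OBS": 285.0, "ANLY_USE": 1},
    {"FCST_VAR": "t", "FCST": 288.0, "OBS": 285.0, "ANLY_USE": 1},
    {"FCST_VAR": "t", "FCST": 290.0, "OBS": 280.0, "ANLY_USE": -1},  # not used in analysis
    {"FCST_VAR": "q", "FCST": 0.01, "OBS": 0.02, "ANLY_USE": 1},     # different variable
]
print(filter_and_summarize(pairs))
```

Only the first two pairs pass both conditions, so the statistics are computed from two errors.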
15.2. GSIDENS2ORANK tool
This section describes how to run the GSIDENS2ORANK tool. The GSIDENS2ORANK tool processes an ensemble of GSI diagnostic files and reformats them into the MET observation rank (ORANK) line type, similar to the output of the Ensemble-Stat tool. The ORANK line type contains ensemble matched pair information and is analogous to the MPR line type for a deterministic model. The output ORANK data may be passed to the Stat-Analysis tool to compute ensemble statistics.
15.2.1. gsidens2orank usage
The usage statement for the GSIDENS2ORANK tool is shown below:
Usage: gsidens2orank
       ens_file_1 ... ens_file_n | ens_file_list
       -out path
       [-ens_mean path]
       [-swap]
       [-rng_name str]
       [-rng_seed str]
       [-set_hdr col_name value]
       [-log file]
       [-v level]
gsidens2orank has three required arguments and accepts several optional ones.
15.2.1.1. Required arguments for gsidens2orank
The ens_file_1 … ens_file_n argument is a list of ensemble binary GSI diagnostic files to be reformatted.
The ens_file_list argument is an ASCII file containing a list of ensemble GSI diagnostic files.
The -out path argument specifies the name of the output .stat file.
15.2.1.2. Optional arguments for gsidens2orank
The -ens_mean path option is the ensemble mean binary GSI diagnostic file.
The -swap option switches the endianness when reading the input binary files.
The -channel n option overrides the default processing of all radiance channels with a comma-separated list.
The -rng_name str option overrides the default random number generator name (mt19937).
The -rng_seed str option overrides the default random number generator seed.
The -set_hdr col_name value option specifies what should be written to the output header columns.
The -log file option outputs log messages to the specified file.
The -v level option overrides the default level of logging (2).
An example of the gsidens2orank calling sequence is shown below:
gsidens2orank diag_conv_ges.mem* \
    -ens_mean diag_conv_ges.ensmean \
    -out diag_conv_ges_ens_mean_orank.txt
In this example, the GSIDENS2ORANK tool will process all of the ensemble members whose file name matches diag_conv_ges.mem*, write output to the file named diag_conv_ges_ens_mean_orank.txt, and populate the output ENS_MEAN column with the values found in the diag_conv_ges.ensmean file rather than computing the ensemble mean values from the ensemble members on the fly.
15.2.2. gsidens2orank output
The GSIDENS2ORANK tool performs a simple reformatting step and thus requires no configuration file. The multiple files passed to it are interpreted as members of the same ensemble. Therefore, each call to the tool processes exactly one ensemble. All input ensemble GSI diagnostic files must be of the same type. Mixing conventional and radiance files together will result in a runtime error. The GSIDENS2ORANK tool processes each ensemble member and keeps track of the observations it encounters. It constructs a list of the ensemble values corresponding to each observation and writes an output ORANK line listing the observation value, its rank, and all the ensemble values. The random number generator is used by the GSIDENS2ORANK tool to randomly assign a rank value in the case of ties.
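The role of the random number generator in ranking can be sketched as follows. This is an illustration of random tie-breaking in general, not the tool's own code: the function name `obs_rank` is hypothetical, the rank is assumed to be 1-based, and Python's generator will not reproduce the random sequence used by GSIDENS2ORANK.

```python
import random


def obs_rank(obs, ens_values, rng):
    """Rank of obs among ens_values (1-based), breaking ties at random."""
    below = sum(1 for e in ens_values if e < obs)
    ties = sum(1 for e in ens_values if e == obs)
    # With ties, the rank may fall anywhere from below+1 to below+ties+1
    return below + 1 + rng.randint(0, ties)


rng = random.Random(42)  # seeded so the tie-breaking is repeatable
print(obs_rank(2.5, [1.0, 2.0, 3.0], rng))       # no ties: rank is 3
print(obs_rank(2.0, [1.0, 2.0, 2.0, 3.0], rng))  # ties: rank is 2, 3, or 4
```

Fixing the seed makes the tie-breaking repeatable from run to run, which is the purpose served by the -rng_seed option.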
The GSIDENS2ORANK tool writes the same set of ORANK output columns for the conventional and radiance data types. However, it also writes additional columns at the end of the ORANK line which depend on the input file type. The extra columns are limited to quantities which remain constant over all the ensemble members and are therefore largely a subset of the extra columns written by the GSID2MPR tool. Those additional columns are described in the following tables.
GSI DIAGNOSTIC CONVENTIONAL ORANK OUTPUT FILE

| Column Number | Column Name | Description |
|---|---|---|
| 1-? | Standard ORANK columns described in Table 13.7. | |
| Last-2 | N_USE | Number of members with ANLY_USE = 1 |
| Last-1 | PREP_USE | read_PrepBUFR usage |
| Last | SETUP_QC | Setup quality control |

GSI DIAGNOSTIC RADIANCE ORANK OUTPUT FILE

| Column Number | Column Name | Description |
|---|---|---|
| 1-? | Standard ORANK columns described in Table 13.7. | |
| Last-24 | N_USE | Number of members with OBS_QC = 0 |
| Last-23 | CHAN_USE | Channel used (1 for yes, -1 for no) |
| Last-22 | SCAN_POS | Sensor scan position |
| Last-21 | SAT_ZNTH | Satellite zenith angle (degrees) |
| Last-20 | SAT_AZMTH | Satellite azimuth angle (degrees) |
| Last-19 | SUN_ZNTH | Solar zenith angle (degrees) |
| Last-18 | SUN_AZMTH | Solar azimuth angle (degrees) |
| Last-17 | SUN_GLNT | Sun glint angle (degrees) |
| Last-16 | FRAC_WTR | Fractional coverage by water |
| Last-15 | FRAC_LND | Fractional coverage by land |
| Last-14 | FRAC_ICE | Fractional coverage by ice |
| Last-13 | FRAC_SNW | Fractional coverage by snow |
| Last-12 | SFC_TWTR | Surface temperature over water (K) |
| Last-11 | SFC_TLND | Surface temperature over land (K) |
| Last-10 | SFC_TICE | Surface temperature over ice (K) |
| Last-9 | SFC_TSNW | Surface temperature over snow (K) |
| Last-8 | TSOIL | Soil temperature (K) |
| Last-7 | SOILM | Soil moisture |
| Last-6 | LAND_TYPE | Surface land type |
| Last-5 | FRAC_VEG | Vegetation fraction |
| Last-4 | SNW_DPTH | Snow depth |
| Last-3 | TFND | Foundation temperature: Tr |
| Last-2 | TWARM | Diurnal warming: d(Tw) at depth zob |
| Last-1 | TCOOL | Sub-layer cooling: d(Tc) at depth zob |
| Last | TZFND | d(Tz)/d(Tr) |

The gsidens2orank output may be passed to the Stat-Analysis tool to derive additional statistics. In particular, users should consider running the aggregate_stat job type to read ORANK lines and compute ranked histograms (RHIST), probability integral transform histograms (PHIST), and spread-skill variance output (SSVAR). Stat-Analysis has been enhanced to parse any extra columns found at the end of the input lines. Users can filter the values in those extra columns using the -column_thresh, -column_str, and -column_str_exc job command options.
An example of the Stat-Analysis calling sequence is shown below:
stat_analysis -lookin diag_conv_ges_ens_mean_orank.txt \
    -job aggregate_stat -line_type ORANK -out_line_type RHIST \
    -by fcst_var -column_thresh N_USE eq20
In this example, the Stat-Analysis tool will read ORANK lines from diag_conv_ges_ens_mean_orank.txt, retain only those lines where the N_USE column indicates that all 20 ensemble members were used, and write ranked histogram (RHIST) output lines for each unique value encountered in the FCST_VAR column.
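The idea behind the ranked histogram can be sketched conceptually: after filtering on N_USE, count how often the observation falls at each possible rank. The code below is an illustration under stated assumptions, not the Stat-Analysis implementation. The function name, the record layout, and the sample records are hypothetical, and ranks are assumed to run from 1 to the ensemble size plus one.

```python
def ranked_histogram(records, n_ens):
    """Count ranks 1..n_ens+1 over records where all n_ens members were used."""
    counts = [0] * (n_ens + 1)
    for rec in records:
        if rec["N_USE"] == n_ens:
            counts[rec["RANK"] - 1] += 1
    return counts


# Made-up records for a 3-member ensemble
records = [
    {"RANK": 1, "N_USE": 3},
    {"RANK": 4, "N_USE": 3},
    {"RANK": 2, "N_USE": 2},  # dropped: not all members used
    {"RANK": 4, "N_USE": 3},
]
print(ranked_histogram(records, n_ens=3))
```

Grouping the records by FCST_VAR before counting would correspond to the -by fcst_var option in the call above.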