23. TC-Pairs Tool

23.1. Introduction

The TC-Pairs tool provides verification for tropical cyclone forecasts in ATCF file format. It matches an ATCF format tropical cyclone (TC) forecast with a second ATCF format reference TC dataset (most commonly the Best Track analysis). The TC-Pairs tool processes both track and intensity adeck data and probabilistic edeck data. The adeck matched pairs contain position errors, as well as wind, sea level pressure, and distance to land values for each TC dataset. The edeck matched pairs contain probabilistic forecast values and the verifying observation values. The pair generation can be subset based on user-defined filtering criteria. Practical aspects of the TC-Pairs tool are described in Section 23.2.

23.2. Practical information

This section describes how to configure and run the TC-Pairs tool. The TC-Pairs tool matches a tropical cyclone model forecast to a corresponding reference dataset. Both the forecast and the reference data must be in ATCF format. Output from the TC-dland tool (a NetCDF gridded distance file) is also a required input for the TC-Pairs tool. To avoid memory issues, it is recommended to run tc_pairs on a storm-by-storm basis rather than over multiple storms or seasons.

23.2.1. tc_pairs usage

The usage statement for tc_pairs is shown below:

Usage: tc_pairs
       -adeck source and/or -edeck source
       -bdeck source
       -config file
       [-out base]
       [-log file]
       [-v level]

tc_pairs has required arguments and can accept several optional arguments.

23.2.1.1. Required arguments for tc_pairs

The -adeck source argument indicates the adeck data source, in a format acceptable to TC-Pairs, containing the tropical cyclone model forecast (output from tracker) data to be verified. Acceptable data formats are limited to the standard ATCF format and the one column modified ATCF file, generated by running the tracker in genesis mode. The argument specifies the name of a file, or of a top-level directory containing files ending in “.dat”, to be processed. The -adeck or -edeck option must be used at least once.
The -edeck source argument indicates the edeck ATCF format data source containing probabilistic track data to be verified. It specifies the name of an ATCF format file or top-level directory containing ATCF format files ending in “.dat” to be processed. The -adeck or -edeck option must be used at least once.
The -bdeck source argument indicates the TC-Pairs acceptable format data source containing the tropical cyclone reference dataset to be used for verifying the adeck source. This source is typically the NHC Best Track Analysis, but could be any TC-Pairs acceptable formatted reference. The acceptable data formats for bdecks are the same as those for adecks. This argument specifies the name of a TC-Pairs acceptable format file or top-level directory containing TC-Pairs acceptable format files ending in “.dat” to be processed.
The -config file argument indicates the name of the configuration file to be used. The contents of the configuration file are discussed below.

23.2.1.2. Optional arguments for tc_pairs

The -out base argument indicates the path of the output file base. This argument overrides the default output file base (./out_tcmpr).
The -log file option directs output and errors to the specified log file. All messages will be written to that file as well as standard out and error. Thus, users can save the messages without having to redirect the output on the command line. The default behavior is no log file.
The -v level option indicates the desired level of verbosity. The contents of “level” will override the default setting of 2. Setting the verbosity to 0 will make the tool run with no log messages, while increasing the verbosity above 1 will increase the amount of logging.

This tool currently only supports the rapid intensification (RI) edeck probability type but support for additional edeck probability types will be added in future releases. At least one -adeck or -edeck option must be specified. The -adeck, -edeck, and -bdeck options may optionally be followed with suffix=string to append that string to all model names found within that data source. This option may be useful when processing track data from two different sources which reuse the same model names.

An example of the tc_pairs calling sequence is shown below:

tc_pairs -adeck aal092010.dat -bdeck bal092010.dat -config TCPairsConfig

In this example, the TC-Pairs tool matches the model track (aal092010.dat) and the best track analysis (bal092010.dat) for the 9th Atlantic Basin storm in 2010. The track matching and subsequent error information are generated with configuration options specified in the TCPairsConfig file.
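
When tc_pairs is driven from a script, the calling sequence above can be assembled programmatically. The following is a minimal sketch, not part of MET: the helper name build_tc_pairs_cmd and the output base out_al092010 are hypothetical, and only the options listed in the usage statement are used.

```python
import shlex


def build_tc_pairs_cmd(adeck, bdeck, config, out=None, log=None, verbosity=None):
    """Assemble a tc_pairs argument list from the documented options.

    This helper is illustrative only; it does not validate the inputs.
    """
    cmd = ["tc_pairs", "-adeck", adeck, "-bdeck", bdeck, "-config", config]
    if out is not None:
        cmd += ["-out", out]          # overrides the default output file base
    if log is not None:
        cmd += ["-log", log]          # also write messages to a log file
    if verbosity is not None:
        cmd += ["-v", str(verbosity)]
    return cmd


cmd = build_tc_pairs_cmd(
    "aal092010.dat", "bal092010.dat", "TCPairsConfig",
    out="out_al092010", verbosity=3,
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a system where tc_pairs is on the PATH. Building one command per storm fits the storm-by-storm recommendation in Section 23.2.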

The TC-Pairs tool implements the following logic:

Parse the adeck, edeck, and bdeck data files and store them as track objects.
Apply configuration file settings to filter the adeck, edeck, and bdeck track data down to a subset of interest.
Apply configuration file settings to derive additional adeck track data, such as interpolated tracks, consensus tracks, time-lagged tracks, and statistical track and intensity models.
For each adeck track that was parsed or derived, search for a matching bdeck track with the same basin and cyclone number and overlapping valid times. If not matching against the BEST track, also ensure that the model initialization times match.
For each adeck/bdeck track pair, match up their track points in time, lookup distances to land, compute track location errors, and write an output TCMPR line for each track point.
For each set of edeck probabilities that were parsed, search for a matching bdeck track.
For each edeck/bdeck pair, write paired edeck probabilities and matching bdeck values to output PROBRIRW lines.
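
The time matching and track location error steps can be illustrated with a short sketch. It assumes that each track is reduced to a mapping from valid time to (latitude, longitude) and that the track error is a great-circle distance computed with the haversine formula on a spherical Earth. The data structures, the function names, and the Earth radius value are assumptions made for illustration; the computation inside tc_pairs may differ in detail.

```python
import math

EARTH_RADIUS_NM = 3440.065  # assumed mean Earth radius in nautical miles


def great_circle_nm(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in nautical miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))


def pair_tracks(adeck, bdeck):
    """Return (valid, track_error_nm) for valid times present in both tracks."""
    pairs = []
    for valid in sorted(adeck.keys() & bdeck.keys()):
        alat, alon = adeck[valid]
        blat, blon = bdeck[valid]
        pairs.append((valid, great_circle_nm(alat, alon, blat, blon)))
    return pairs


# Hypothetical track points keyed by valid time (YYYYMMDD_HH).
adeck = {"20100914_00": (20.0, -60.0), "20100914_06": (20.5, -61.0)}
bdeck = {"20100914_00": (20.0, -60.5), "20100914_12": (21.0, -62.0)}

for valid, err in pair_tracks(adeck, bdeck):
    print(valid, round(err, 1))
```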

23.2.2. tc_pairs configuration file

The default configuration file for the TC-Pairs tool, named TCPairsConfig_default, can be found in the installed share/met/config/ directory. Users are encouraged to copy this default file before modifying its contents. The contents of the configuration file are described below.

storm_id     = [];
basin        = [];
cyclone      = [];
storm_name   = [];
init_beg     = "";
init_end     = "";
init_inc     = [];
init_exc     = [];
valid_beg    = "";
valid_end    = "";
valid_inc    = [];
valid_exc    = [];
init_hour    = [];
init_mask    = [];
lead_req     = [];
valid_mask   = [];
match_points = TRUE;
version      = "VN.N";

The configuration options listed above are common to multiple MET tools and are described in Section 5.

model = [ "DSHP", "LGEM", "HWRF" ];

The model variable contains a list of comma-separated models to be used. Each model is identified with an ATCF TECH ID (normally four unique characters). This model identifier should match the model column in the ATCF format input file. An empty list indicates that all models in the input file(s) will be processed. Note that when reading ATCF track data, all instances of the string AVN are automatically replaced with GFS.

write_valid = [ "20101231_06" ];

The write_valid entry specifies a comma-separated list of valid time strings in YYYYMMDD[_HH[MMSS]] format for which output should be written. An empty list indicates that data for all valid times should be written. This option may be useful when verifying track forecasts in realtime. If evaluating performance for a single valid time, this option can limit the output to that time and skip output for earlier track points.
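
The YYYYMMDD[_HH[MMSS]] format and the empty-list behavior can be sketched as follows. The function names parse_met_time and should_write are hypothetical, and the sketch assumes that omitted hour, minute, and second fields default to zero.

```python
from datetime import datetime

# Map the length of the time string to the matching strptime format.
_FORMATS = {8: "%Y%m%d", 11: "%Y%m%d_%H", 15: "%Y%m%d_%H%M%S"}


def parse_met_time(text):
    """Parse YYYYMMDD[_HH[MMSS]] into a datetime."""
    fmt = _FORMATS.get(len(text))
    if fmt is None:
        raise ValueError(f"unsupported time string: {text!r}")
    return datetime.strptime(text, fmt)


def should_write(valid, write_valid):
    """An empty write_valid list means every valid time is written."""
    if not write_valid:
        return True
    return valid in {parse_met_time(t) for t in write_valid}


write_valid = ["20101231_06"]
print(should_write(datetime(2010, 12, 31, 6), write_valid))
print(should_write(datetime(2010, 12, 31, 0), write_valid))
```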

check_dup = FALSE;

The check_dup flag expects either TRUE or FALSE, indicating whether the code should check for duplicate ATCF lines when building tracks. Setting check_dup to TRUE will check for duplicated lines and produce output information regarding the duplicate. Any duplicated ATCF line will not be processed in the tc_pairs output. Setting check_dup to FALSE will still exclude tracks that decrease with time and will overwrite repeated lines, but specific duplicate log information will not be output. Setting check_dup to FALSE will make parsing the track quicker.

interp12 = NONE;

The interp12 flag expects the entry NONE, FILL, or REPLACE, indicating whether special processing should be performed for interpolated forecasts. The NONE option indicates no changes are made to the interpolated forecasts. The FILL and REPLACE (default) options determine when the 12-hour interpolated forecast (normally indicated with a “2” or “3” at the end of the ATCF ID) will be renamed with the 6-hour interpolated ATCF ID (normally indicated with the letter “I” at the end of the ATCF ID). The FILL option renames the 12-hour interpolated forecasts with the 6-hour interpolated forecast ATCF ID only when the 6-hour interpolated forecast is missing. For example, when a 6-hour interpolated forecast occurs only every 12 hours (e.g. EMXI, EGRI), the 6-hour interpolated forecasts will be “filled in” with the 12-hour interpolated forecasts in order to provide a record every 6 hours. The REPLACE option renames all 12-hour interpolated forecasts with the 6-hour interpolated forecast ATCF ID regardless of whether the 6-hour interpolated forecast exists. The original 12-hour ATCF ID will also be retained in the output file (all modified ATCF entries will appear at the end of the TC-Pairs output file). This functionality expects both the 12-hour and 6-hour early (interpolated) ATCF IDs to be listed in the model field.

consensus = [
   {
      name     = "CON1";
      members  = [ "MOD1", "MOD2", "MOD3" ];
      required = [   true,  false, false  ];
      min_req  = 2;
   }
];

The consensus field allows the user to generate a user-defined consensus forecast from any number of models. All models used in the consensus forecast need to be included in the model field (first entry in TCPairsConfig_default). The name field is the desired consensus model name. The members field is a comma-separated list of model IDs that make up the members of the consensus. The required field is a comma-separated list of true/false values associated with each consensus member. If a member is designated as true, the member is required to be present in order for the consensus to be generated. If a member is false, the consensus will be generated regardless of whether the member is present. The required array must be the same length as the members array. The min_req field is the number of members required in order for the consensus to be computed. The required and min_req field options are applied at each forecast lead time. If any member of the consensus has a non-valid position or intensity value, the consensus for that valid time will not be generated.
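
The interaction of the members, required, and min_req fields at a single lead time can be sketched as below. The function consensus_point is hypothetical, each member is reduced to a (lat, lon, wind) tuple, and the consensus value is assumed to be a simple arithmetic mean of the members present (with no special handling of longitudes that cross the date line). The averaging used by tc_pairs itself is not described in this section, so the mean is an illustrative choice only.

```python
def consensus_point(members, required, min_req, points):
    """Return the mean (lat, lon, wind) of the members present, or None.

    points maps a model ID to a (lat, lon, wind) tuple; a missing key means
    the member has no forecast at this lead time, and a None entry in the
    tuple marks a non-valid value.
    """
    if len(members) != len(required):
        raise ValueError("required must be the same length as members")

    # A required member that is absent blocks the consensus.
    if any(req and m not in points for m, req in zip(members, required)):
        return None

    present = [m for m in members if m in points]
    if len(present) < min_req:
        return None

    values = [points[m] for m in present]
    # Any non-valid position or intensity value blocks the consensus.
    if any(v is None for vals in values for v in vals):
        return None

    n = len(values)
    return tuple(sum(col) / n for col in zip(*values))


members = ["MOD1", "MOD2", "MOD3"]
required = [True, False, False]
points = {"MOD1": (20.0, -60.0, 50.0), "MOD2": (22.0, -62.0, 60.0)}
print(consensus_point(members, required, 2, points))
```

With these settings, removing MOD1 from points suppresses the consensus because MOD1 is required, and supplying MOD1 alone suppresses it because fewer than min_req members are present.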

lag_time = [ "06", "12" ];

The lag_time field is a comma-separated list of forecast lag times to be used in HH[MMSS] format. For each adeck track identified, a lagged track will be derived for each entry. In the tc_pairs output, the original adeck record will be retained, with the lagged entry listed as the adeck name with “_LAG_HH” appended.

best_technique = [ "BEST" ];
best_baseline  = [ "BCLP", "BCD5", "BCLA" ];

The best_technique field specifies a comma-separated list of technique name(s) to be interpreted as BEST track data. The default value (BEST) should suffice for most users. The best_baseline field specifies a comma-separated list of CLIPER/SHIFOR baseline forecasts to be derived from the best tracks. Specifying multiple best_technique values and at least one best_baseline value results in a warning since the derived baseline forecast technique names may be used multiple times.

The following are valid baselines for the best_baseline field:

BTCLIP: Neumann original 3-day CLIPER in best track mode. Used for the Atlantic basin only. Specify model as BCLP.

BTCLIP5: 5-day CLIPER (Aberson, 1998)/SHIFOR (DeMaria and Knaff, 2003) in best track mode for either Atlantic or eastern North Pacific basins. Specify model as BCS5.

BTCLIPA: Sim Aberson’s recreation of Neumann original 3-day CLIPER in best-track mode. Used for Atlantic basin only. Specify model as BCLA.

oper_technique = [ "CARQ" ];
oper_baseline  = [ "OCLP", "OCS5", "OCD5" ];

The oper_technique field specifies a comma-separated list of technique name(s) to be interpreted as operational track data. The default value (CARQ) should suffice for most users. The oper_baseline field specifies a comma-separated list of CLIPER/SHIFOR baseline forecasts to be derived from the operational tracks. Specifying multiple oper_technique values and at least one oper_baseline value results in a warning since the derived baseline forecast technique names may be used multiple times.

The following are valid baselines for the oper_baseline field:

OCLIP: Merrill modified (operational) 3-day CLIPER run in operational mode. Used for Atlantic basin only. Specify model as OCLP.

OCLIP5: 5-day CLIPER (Aberson, 1998)/SHIFOR (DeMaria and Knaff, 2003) in operational mode, rerun using CARQ data. Specify model as OCS5.

OCLIPD5: 5-day CLIPER (Aberson, 1998)/DECAY-SHIFOR (DeMaria and Knaff, 2003). Specify model as OCD5.

anly_track = BDECK;

Analysis tracks consist of multiple track points with a lead time of zero for the same storm. An analysis track may be generated by running model analysis fields through a tracking algorithm. The anly_track field specifies which datasets should be searched for analysis track data and may be set to NONE, ADECK, BDECK, or BOTH. Use BOTH to create pairs using two different analysis tracks.

match_points = TRUE;

The match_points field specifies whether only those track points common to both the adeck and bdeck tracks should be written out. If match_points is selected as FALSE, the union of the adeck and bdeck tracks will be written out, with “NA” listed for unmatched data.
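
The difference between the two settings amounts to an intersection versus a union of track point times. A minimal sketch, assuming each track is reduced to a mapping from valid time to a single value (here a hypothetical maximum wind speed) and using a made-up helper name:

```python
def paired_rows(adeck, bdeck, match_points=True):
    """Return (valid, adeck_value, bdeck_value) rows.

    match_points=True keeps only times common to both tracks;
    match_points=False keeps the union, with "NA" for unmatched data.
    """
    if match_points:
        times = adeck.keys() & bdeck.keys()
    else:
        times = adeck.keys() | bdeck.keys()
    return [(t, adeck.get(t, "NA"), bdeck.get(t, "NA")) for t in sorted(times)]


# Hypothetical maximum wind speeds keyed by valid time.
adeck = {"20100914_00": 45, "20100914_06": 50}
bdeck = {"20100914_00": 40, "20100914_12": 60}

print(paired_rows(adeck, bdeck, match_points=True))
print(paired_rows(adeck, bdeck, match_points=False))
```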

dland_file = "MET_BASE/tc_data/dland_global_tenth_degree.nc";

The dland_file string specifies the path of the NetCDF format file (default file: dland_global_tenth_degree.nc) to be used for the distance to land check in the tc_pairs code. This file is generated using tc_dland (default file provided in installed share/met/tc_data directory).

watch_warn = {
    file_name   = "MET_BASE/tc_data/wwpts_us.txt";
    time_offset = -14400;
 }

The watch_warn field specifies the watch/warning file name and the time offset applied when setting the watch/warning flag. The file_name string specifies the path of the watch/warning file to be used to determine when a watch or warning is in effect during the forecast initialization and verification times. The default file is named wwpts_us.txt, which is found in the installed share/met/tc_data/ directory within the MET build. The time_offset entry is the time window (in seconds) assigned to the watch/warning. Because watches and warnings are issued at non-uniform times, a time window is assigned for which watch/warnings are included in the verification for each valid time. The default watch/warn file is static and therefore may not include warned storms beyond the current MET code release date. If the static file does not contain storms of interest, users may wish to create a post in the METplus GitHub Discussions Forum in order to obtain the most recent watch/warning file.

basin_map = [
   { key = "SI"; val = "SH"; },
   { key = "SP"; val = "SH"; },
   { key = "AU"; val = "SH"; },
   { key = "AB"; val = "IO"; },
   { key = "BB"; val = "IO"; }
];

The basin_map entry defines a mapping of input names to output values. Whenever the basin string matches “key” in the input ATCF files, it is replaced with “val”. This map can be used to modify basin names to make them consistent across the ATCF input files.

Many global modeling centers use ATCF basin identifiers based on region (e.g., ‘SP’ for South Pacific Ocean). However, the best track data provided by the Joint Typhoon Warning Center (JTWC) use just one basin identifier, ‘SH’, for all of the Southern Hemisphere basins. Additionally, some modeling centers may report basin identifiers separately for the Bay of Bengal (BB) and Arabian Sea (AB), whereas JTWC uses ‘IO’.

The basin mapping allows MET to map the basin identifiers to the expected values without having to modify your data. For example, the first entry in the list above indicates that any data entries for ‘SI’ will be matched as if they were ‘SH’. In this manner, all verification results for the Southern Hemisphere basins will be reported together as one basin.

An empty list indicates that no basin mapping should be used. Use this if you are not using JTWC best tracks and you would like to match explicitly by basin or sub-basin. Note that if your model data and best track do not use the same basin identifier conventions, using an empty list for this parameter will result in missed matches.
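
The key/val replacement behaves like a dictionary lookup that falls back to the original basin string. The sketch below mirrors the entries shown above; the function name map_basin is hypothetical.

```python
# Same key/val pairs as the basin_map entry shown above.
BASIN_MAP = {"SI": "SH", "SP": "SH", "AU": "SH", "AB": "IO", "BB": "IO"}


def map_basin(basin, basin_map=BASIN_MAP):
    """Replace the basin with its mapped value; unmapped basins pass through."""
    return basin_map.get(basin, basin)


print(map_basin("SP"))       # mapped to the Southern Hemisphere identifier
print(map_basin("AL"))       # not in the map, returned unchanged
print(map_basin("SP", {}))   # an empty map disables the mapping
```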

23.2.3. tc_pairs output

TC-Pairs produces output in TCST format. The default output file base can be overridden using the -out base argument in the usage statement. The TCST file output from TC-Pairs may be used as input into the TC-Stat tool. The header columns in the TC-Pairs output are described in Table 23.1.

Table 23.1 Header information for TC-Pairs TCST output.
		HEADER
Column Number	Header Column Name	Description
1	VERSION	Version number
2	AMODEL	User provided text string designating model name
3	BMODEL	User provided text string designating model name
4	STORM_ID	BBCCYYYY designation of storm
5	BASIN	Basin (BB in STORM_ID)
6	CYCLONE	Cyclone number (CC in STORM_ID)
7	STORM_NAME	Name of Storm
8	INIT	Initialization time of forecast in YYYYMMDD_HHMMSS format.
9	LEAD	Forecast lead time in HHMMSS format.
10	VALID	Forecast valid time in YYYYMMDD_HHMMSS format.
11	INIT_MASK	Initialization time masking grid applied
12	VALID_MASK	Valid time masking grid applied
13	LINE_TYPE	Output line type (TCMPR or PROBRIRW)
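
The header columns in Table 23.1 can be read by position. The sketch below assumes that each TCST line is whitespace-delimited and that no field contains embedded spaces. The sample line is synthetic, written only to exercise the parser, and the function name parse_header is hypothetical.

```python
HEADER_COLUMNS = [
    "VERSION", "AMODEL", "BMODEL", "STORM_ID", "BASIN", "CYCLONE",
    "STORM_NAME", "INIT", "LEAD", "VALID", "INIT_MASK", "VALID_MASK",
    "LINE_TYPE",
]


def parse_header(line):
    """Map the first 13 whitespace-delimited fields to the Table 23.1 names."""
    fields = line.split()
    if len(fields) < len(HEADER_COLUMNS):
        raise ValueError("line has fewer than 13 header columns")
    header = dict(zip(HEADER_COLUMNS, fields))
    # STORM_ID is BBCCYYYY: basin, cyclone number, year.
    storm_id = header["STORM_ID"]
    header["YEAR"] = storm_id[4:8]
    return header


# Synthetic example line (not real tc_pairs output).
sample = ("VN.N MOD1 BEST AL092010 AL 09 TEST "
          "20100913_000000 240000 20100914_000000 NA NA TCMPR")
header = parse_header(sample)
print(header["STORM_ID"], header["BASIN"], header["CYCLONE"], header["YEAR"])
```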

Table 23.2 Format information for TCMPR (Tropical Cyclone Matched Pairs) output line type.
		TCMPR OUTPUT FORMAT
Column Number	Header Column Name	Description
13	TCMPR	Tropical Cyclone Matched Pair line type
14	TOTAL	Total number of pairs in track
15	INDEX	Index of the current track pair
16	LEVEL	Level of storm classification
17	WATCH_WARN	HU or TS watch or warning in effect
18	INITIALS	Forecaster initials
19	ALAT	Latitude position of adeck model
20	ALON	Longitude position of adeck model
21	BLAT	Latitude position of bdeck model
22	BLON	Longitude position of bdeck model
23	TK_ERR	Track error of adeck relative to bdeck (nm)
24	X_ERR	X component position error (nm)
25	Y_ERR	Y component position error (nm)
26	ALTK_ERR	Along track error (nm)
27	CRTK_ERR	Cross track error (nm)
28	ADLAND	adeck distance to land (nm)
29	BDLAND	bdeck distance to land (nm)
30	AMSLP	adeck mean sea level pressure
31	BMSLP	bdeck mean sea level pressure
32	AMAX_WIND	adeck maximum wind speed
33	BMAX_WIND	bdeck maximum wind speed
34, 35	A/BAL_WIND_34	a/bdeck 34-knot radius winds in full circle
36, 37	A/BNE_WIND_34	a/bdeck 34-knot radius winds in NE quadrant
38, 39	A/BSE_WIND_34	a/bdeck 34-knot radius winds in SE quadrant
40, 41	A/BSW_WIND_34	a/bdeck 34-knot radius winds in SW quadrant
42, 43	A/BNW_WIND_34	a/bdeck 34-knot radius winds in NW quadrant
44, 45	A/BAL_WIND_50	a/bdeck 50-knot radius winds in full circle
46, 47	A/BNE_WIND_50	a/bdeck 50-knot radius winds in NE quadrant
48, 49	A/BSE_WIND_50	a/bdeck 50-knot radius winds in SE quadrant
50, 51	A/BSW_WIND_50	a/bdeck 50-knot radius winds in SW quadrant
52, 53	A/BNW_WIND_50	a/bdeck 50-knot radius winds in NW quadrant
54, 55	A/BAL_WIND_64	a/bdeck 64-knot radius winds in full circle
56, 57	A/BNE_WIND_64	a/bdeck 64-knot radius winds in NE quadrant
58, 59	A/BSE_WIND_64	a/bdeck 64-knot radius winds in SE quadrant
60, 61	A/BSW_WIND_64	a/bdeck 64-knot radius winds in SW quadrant
62, 63	A/BNW_WIND_64	a/bdeck 64-knot radius winds in NW quadrant
64, 65	A/BRADP	pressure in millibars of the last closed isobar, 900 - 1050 mb
66, 67	A/BRRP	radius of the last closed isobar in nm, 0 - 9999 nm
68, 69	A/BMRD	radius of max winds, 0 - 999 nm
70, 71	A/BGUSTS	gusts, 0 through 995 kts
72, 73	A/BEYE	eye diameter, 0 through 999 nm
74, 75	A/BDIR	storm direction in compass coordinates, 0 - 359 degrees
76, 77	A/BSPEED	storm speed, 0 - 999 kts
78, 79	A/BDEPTH	system depth, D-deep, M-medium, S-shallow, X-unknown

Table 23.3 Format information for PROBRIRW (Probability of Rapid Intensification/Weakening) output line type.
		PROBRIRW OUTPUT FORMAT
Column Number	Header Column Name	Description
13	PROBRIRW	Probability of Rapid Intensification/Weakening line type
14	ALAT	Latitude position of edeck model
15	ALON	Longitude position of edeck model
16	BLAT	Latitude position of bdeck model
17	BLON	Longitude position of bdeck model
18	INITIALS	Forecaster initials
19	TK_ERR	Track error of adeck relative to bdeck (nm)
20	X_ERR	X component position error (nm)
21	Y_ERR	Y component position error (nm)
22	ADLAND	adeck distance to land (nm)
23	BDLAND	bdeck distance to land (nm)
24	RI_BEG	Start of RI time window in HH format
25	RI_END	End of RI time window in HH format
26	RI_WINDOW	Width of RI time window in HH format
27	AWIND_END	Forecast maximum wind speed at RI end
28	BWIND_BEG	Best track maximum wind speed at RI begin
29	BWIND_END	Best track maximum wind speed at RI end
30	BDELTA	Exact Best track wind speed change in RI window
31	BDELTA_MAX	Maximum Best track wind speed change in RI window
32	BLEVEL_BEG	Best track storm classification at RI begin
33	BLEVEL_END	Best track storm classification at RI end
34	N_THRESH	Number of probability thresholds
35	THRESH_i	The ith probability threshold value (repeated)
36	PROB_i	The ith probability value (repeated)
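
Because THRESH_i and PROB_i repeat N_THRESH times, a PROBRIRW line has a variable number of columns. The sketch below assumes the repeated values are interleaved as THRESH_1, PROB_1, THRESH_2, PROB_2, and so on, starting immediately after N_THRESH in column 34. That ordering is an assumption inferred from Table 23.3 and should be checked against actual output. The field values are synthetic and the function name is hypothetical.

```python
N_THRESH_INDEX = 33  # column 34 in Table 23.3, as a zero-based index


def parse_thresholds(fields):
    """Return [(threshold, probability), ...] from a split PROBRIRW line."""
    n = int(fields[N_THRESH_INDEX])
    start = N_THRESH_INDEX + 1
    tail = fields[start:start + 2 * n]
    if len(tail) != 2 * n:
        raise ValueError("line is shorter than N_THRESH implies")
    # Assumed layout: THRESH_1 PROB_1 THRESH_2 PROB_2 ...
    return [(float(tail[i]), float(tail[i + 1])) for i in range(0, 2 * n, 2)]


# Synthetic fields: 33 placeholder columns, then N_THRESH and two pairs.
fields = ["NA"] * 33 + ["2", "25", "10", "30", "5"]
print(parse_thresholds(fields))
```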