VectorByte Methods Training: Dengue Forecasting Challenge

Authors (Affiliation)

Alicia Arneson (Virginia Tech and VectorByte)

Leah R. Johnson (Virginia Tech and VectorByte)

Published

June 3, 2025

Overview

Using the skills you have gained in this year’s training, we are going to build some real forecasting models! We will focus on one part of a larger forecasting challenge, the Infodengue-Mosqlimate Dengue Challenge (IMDC), which aims to improve predictions of weekly dengue cases (incidence) in Brazil.

The IMDC involves three validation tests and one forecasting task, each focusing on predicting weekly dengue cases by state (UF) for specific epidemiological seasons:

Validation Test 1: Predict cases for the 2022–2023 season using data from 2010 to mid-2022.

Validation Test 2: Predict cases for the 2023–2024 season using data up to mid-2023.

Validation Test 3: Predict cases for the 2024–2025 season using data up to mid-2024.

Forecast Task: Predict cases for the 2025–2026 season using data up to mid-2025.

Each model must provide median estimates along with 50%, 80%, 90%, and 95% predictive intervals for all 27 Brazilian federative units.
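
As a rough illustration of what a median plus central predictive intervals looks like in practice, the sketch below summarizes a set of simulated forecast draws for a single state and week. The draws are made up (a negative binomial with arbitrary parameters), and the object names `draws`, `pi_levels`, and `intervals` are hypothetical; the IMDC submission format itself is described on the challenge site and is not reproduced here.

```r
# Hypothetical example: 1,000 simulated forecast draws for one state and one week.
# The distribution and its parameters are invented purely for illustration.
set.seed(42)
draws <- rnbinom(1000, mu = 250, size = 5)

# Central predictive intervals at the levels listed above.
pi_levels <- c(0.50, 0.80, 0.90, 0.95)
intervals <- data.frame(
  level = pi_levels,
  lower = as.numeric(quantile(draws, probs = (1 - pi_levels) / 2)),
  upper = as.numeric(quantile(draws, probs = 1 - (1 - pi_levels) / 2))
)

# The median is the point estimate that accompanies the intervals.
pred_median <- median(draws)

print(pred_median)
print(intervals)
```

A 50% interval runs from the 25th to the 75th percentile of the draws, a 95% interval from the 2.5th to the 97.5th, so wider levels always contain the narrower ones.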

See this link for the full challenge details: https://sprint.mosqlimate.org/2025/

To save time, we will make forecasts for just a few of the 27 states, and we will work with part of Validation Test 3.

Instructions

Using the modeling skills you have gained over the course of the workshop, you are going to produce 12 weekly forecasts of dengue incidence for each of 5 states in Brazil. You can use any of the models we discussed in the workshop, or others you have experience with. The data set dengueForecasting.csv contains incidence data and several covariates from 2010 to mid-2024.

The code file dengueForecastingChallenge.R contains the infrastructure you need to quickly read in the data set and export your forecasts as a .csv file in a standard format. Once you have generated your forecasts, you will submit them here: Forecast Submission Form. We will score the forecasts using root mean squared error (RMSE), and a winner will be crowned!
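
For reference, RMSE is the square root of the average squared difference between observed and predicted values. The helper below is a minimal sketch; the function name `rmse` and the numbers are hypothetical, and how scores are aggregated across states and weeks is not specified here.

```r
# Root mean squared error between observed and predicted values.
rmse <- function(observed, predicted) {
  stopifnot(length(observed) == length(predicted))
  sqrt(mean((observed - predicted)^2))
}

# Made-up weekly case counts and forecasts for four weeks.
observed  <- c(120, 150, 180, 210)
predicted <- c(100, 160, 170, 240)

score <- rmse(observed, predicted)
print(score)  # sqrt(375), about 19.36
```

Because the errors are squared before averaging, a few large misses (for example, around an epidemic peak) are penalized more heavily than many small ones.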

Easier version

If that task sounds overwhelming, aim to produce those 12 weekly forecasts for just 1 state of your choice.

Some Tips

  1. Remember: if you use any covariates, you will also need a forecasting model for each of them so that covariate values are available over the forecasting horizon. In real life, we don’t know what the future holds, so we have to make educated assumptions using models!

  2. Don’t forget to split your data into a training and test set to see how your model does on unseen data after you fit it. It is also good to test your covariate forecasts in this way.

  3. Your training and test set selection should be representative of your forecasting goal. Since we want 12 weeks of forecasts, consider holding back the last 12 weeks of data as a test set (there is a convenient column in the data to make this split easy if you choose to do it this way).

  4. Before making your final forecasts, remember to go back and refit your chosen model to the whole data set (training and test sets combined) so that it has access to the most recent data.
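
The tips above can be sketched end to end on a small simulated data set. Everything here is an assumption made for illustration: the toy data only mimics a few columns from the data dictionary (`uf`, `date`, `cases`, `medTemp`, `test`), the state code `"XX"` is a placeholder, and the Poisson regression and seasonal-naive covariate forecast are stand-ins for whatever models you choose. It does not show what dengueForecastingChallenge.R does.

```r
# Toy weekly data for one placeholder state (three years, simulated).
set.seed(1)
n    <- 156
week <- seq_len(n)
dat  <- data.frame(
  uf      = "XX",
  date    = as.Date("2021-01-03") + 7 * (week - 1),
  medTemp = 25 + 4 * sin(2 * pi * week / 52) + rnorm(n, sd = 0.5)
)
dat$cases <- rpois(n, lambda = exp(1 + 0.15 * dat$medTemp))
dat$test  <- ifelse(week > n - 12, "Y", "N")   # last 12 weeks held back

# Tips 2 and 3: split into training and test sets using the test column.
train_set <- dat[dat$test == "N", ]
test_set  <- dat[dat$test == "Y", ]

# Tip 1: forecast the covariate too. Here a seasonal-naive rule is used:
# the forecast for each future week is the value observed 52 weeks earlier.
seasonal_naive <- function(x, h, period = 52) {
  stopifnot(length(x) >= period, h <= period)
  x[length(x) - period + seq_len(h)]
}

# A simple illustrative model: Poisson regression of cases on temperature.
fit_model <- function(d) glm(cases ~ medTemp, family = poisson(), data = d)

# Evaluate on the held-out weeks using the FORECAST covariate, not the observed one.
temp_hat  <- seasonal_naive(train_set$medTemp, h = 12)
fit       <- fit_model(train_set)
pred      <- predict(fit, newdata = data.frame(medTemp = temp_hat),
                     type = "response")
test_rmse <- sqrt(mean((test_set$cases - pred)^2))
print(test_rmse)

# Tip 4: refit on the whole data set, then forecast the next 12 weeks.
final_fit   <- fit_model(dat)
future_temp <- seasonal_naive(dat$medTemp, h = 12)
forecasts   <- data.frame(
  uf    = "XX",
  date  = max(dat$date) + 7 * seq_len(12),
  cases = as.numeric(predict(final_fit,
                             newdata = data.frame(medTemp = future_temp),
                             type = "response"))
)
print(forecasts)
```

Scoring the test set with forecast covariates rather than observed ones gives a more honest picture of how the model will behave over the real forecasting horizon, where the covariates are unknown.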

Data Dictionary

| Variable Name | Definition |
|---|---|
| uf | The Brazilian state code |
| date | The date of the Sunday that falls in the associated epiweek |
| epiweek | A continuous numerical value associated with the weekly timescale |
| cases | The total probable cases within the given state for the given epiweek |
| minTemp | The minimum temperature across the state in the given epiweek |
| medTemp | The median temperature across the state in the given epiweek |
| maxTemp | The maximum temperature across the state in the given epiweek |
| minPrecip | The minimum daily precipitation in mm across the state in the given epiweek |
| medPrecip | The median daily precipitation in mm across the state in the given epiweek |
| maxPrecip | The maximum daily precipitation in mm across the state in the given epiweek |
| minPressure | The minimum daily air pressure at sea level across the state in the given epiweek |
| medPressure | The median daily air pressure at sea level across the state in the given epiweek |
| maxPressure | The maximum daily air pressure at sea level across the state in the given epiweek |
| minRH | The minimum daily relative humidity across the state in the given epiweek |
| medRH | The median daily relative humidity across the state in the given epiweek |
| maxRH | The maximum daily relative humidity across the state in the given epiweek |
| meanRainyDays | The average number of rainy days across the state during the given epiweek |
| meanThermalRange | The average thermal range across the state during the given epiweek |
| enso | The average El Niño/Southern Oscillation (ENSO) value across the state during the given epiweek |
| iod | The average Indian Ocean Dipole (IOD) value across the state during the given epiweek |
| pdo | The average Pacific Decadal Oscillation (PDO) value across the state during the given epiweek |
| population | The total estimated population within the given state during the given epiweek (the same value within the same year) |
| test | A binary variable that tells you whether the observation belongs in the test set if you choose to hold back the last 12 weeks of data (up to you). “Y” indicates that the observation should be part of that test set. |

Citation

BibTeX citation:
@online{arneson2025,
  author = {Arneson, Alicia and Johnson, Leah R.},
  title = {VectorByte {Methods} {Training:} {Dengue} {Forecasting}
    {Challenge}},
  date = {2025-06-03},
  langid = {en}
}
For attribution, please cite this work as:
Arneson, Alicia, and Leah R. Johnson. 2025. “VectorByte Methods Training: Dengue Forecasting Challenge.” June 3, 2025.