The use of textual data to enhance forecasting performance isn’t new.
In financial markets, text data and economic news often play a critical role in producing accurate forecasts, sometimes even more so than historical numeric data.
Recently, many large language models (LLMs) have been fine-tuned for Fedspeak and news sentiment analysis. These models rely solely on text data to estimate market sentiment.
An intriguing new paper, “Context is Key”[1], explores a different approach: how much does forecasting accuracy improve when numerical data is combined with external text data?
The paper makes three key contributions:
- Context-is-Key (CiK) Dataset: A dataset of forecasting tasks that pairs numerical data with corresponding textual information.
- Region of Interest CRPS (RCRPS): A modified version of the continuous ranked probability score (CRPS), a metric for evaluating probabilistic forecasts, that focuses on context-sensitive windows.
- Context-is-Key Benchmark: A new evaluation framework demonstrating how external textual information benefits popular time-series models.
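
To make the metric in the list above more concrete, here is a minimal sketch of a sample-based CRPS estimate, followed by a toy version of the "region of interest" idea. The first function uses the standard estimator E|X - y| - 0.5 E|X - X'|. The second function, `region_weighted_crps`, along with its `roi_mask` and `roi_weight` parameters, is a hypothetical illustration and not the RCRPS definition from the paper, which may differ in weighting and other details.

```python
import numpy as np


def crps_from_samples(samples, observed):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|. Lower is better."""
    samples = np.asarray(samples, dtype=float)
    # How far the forecast samples are from the observed value, on average.
    accuracy = np.mean(np.abs(samples - observed))
    # Half the average distance between all pairs of forecast samples.
    spread = 0.5 * np.mean(np.abs(samples[:, None] - samples[None, :]))
    return float(accuracy - spread)


def region_weighted_crps(sample_paths, observed_path, roi_mask, roi_weight=0.5):
    """Hypothetical region-weighted CRPS (an assumption, not the paper's RCRPS).

    sample_paths: array of shape (n_samples, horizon)
    observed_path: array of shape (horizon,)
    roi_mask: boolean array of shape (horizon,), True where context matters
    roi_weight: assumed share of the score given to the region of interest
    """
    sample_paths = np.asarray(sample_paths, dtype=float)
    observed_path = np.asarray(observed_path, dtype=float)
    roi_mask = np.asarray(roi_mask, dtype=bool)

    # CRPS computed independently at each forecast step.
    per_step = np.array([
        crps_from_samples(sample_paths[:, t], observed_path[t])
        for t in range(observed_path.shape[0])
    ])

    # If the mask does not split the horizon, fall back to a plain average.
    if not roi_mask.any() or roi_mask.all():
        return float(per_step.mean())

    inside = per_step[roi_mask].mean()
    outside = per_step[~roi_mask].mean()
    return float(roi_weight * inside + (1.0 - roi_weight) * outside)
```

With a single sample, `crps_from_samples` reduces to the absolute error, which is why CRPS is often described as a probabilistic generalization of MAE. Raising `roi_weight` makes errors inside the masked window count more heavily toward the final score.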