In this paper, we introduce the Instruction Following Score (IFS), a metric that detects a language model's ability to follow instructions. The metric serves two purposes. First, IFS can distinguish between base and instruct models: we benchmark publicly available base and instruct models and show that the ratio of well-formatted responses to partial and full sentences is an effective measure for separating these two model classes. Second, IFS can serve as an early stopping criterion for instruct tuning. We compute IFS during supervised fine-tuning (SFT) and show that models learn to follow instructions early in training, and that further fine-tuning can change the semantics of the underlying base model. As an example of such a semantic change, we examine the objectivity of model predictions, as measured by an auxiliary metric, ObjecQA. We show that in this case the semantic changes are steepest when IFS plateaus. We hope that decomposing instruct tuning into IFS and semantic factors starts a trend toward more controllable instruct tuning and opens possibilities for designing minimal instruct interfaces for querying foundation models.
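The following is a minimal sketch, not the paper's implementation, of how an IFS-style score and an IFS-based early stopping check might be computed. It assumes IFS is the fraction of responses judged well-formatted among all (partial and full) responses, and it swaps in a hypothetical capitalization-and-punctuation heuristic (`is_well_formatted`) for whatever classifier is used in practice. The plateau rule in `should_stop` and its `patience` and `min_delta` parameters are likewise illustrative assumptions.

```python
from typing import Sequence

TERMINAL_PUNCTUATION = (".", "!", "?")


def is_well_formatted(response: str) -> bool:
    """Hypothetical heuristic (an assumption, not the paper's classifier):
    a response counts as a well-formatted full sentence if it starts with an
    uppercase letter and ends with terminal punctuation."""
    text = response.strip()
    return bool(text) and text[0].isupper() and text.endswith(TERMINAL_PUNCTUATION)


def instruction_following_score(responses: Sequence[str]) -> float:
    """Assumed IFS definition: fraction of responses judged well-formatted."""
    if not responses:
        raise ValueError("need at least one response")
    well_formatted = sum(is_well_formatted(r) for r in responses)
    return well_formatted / len(responses)


def should_stop(ifs_history: Sequence[float], patience: int = 3, min_delta: float = 0.01) -> bool:
    """Hypothetical plateau rule: stop once the best IFS over the last
    `patience` checkpoints improves on the best earlier IFS by less than
    `min_delta`."""
    if len(ifs_history) <= patience:
        return False  # not enough checkpoints to judge a plateau yet
    best_before = max(ifs_history[:-patience])
    recent_best = max(ifs_history[-patience:])
    return recent_best - best_before < min_delta
```

For example, `instruction_following_score(["The capital of France is Paris.", "of France is", "Paris"])` returns 1/3, since only the first response passes the heuristic. Recording the score at each SFT checkpoint and calling `should_stop` on that history approximates the idea of halting instruct tuning once instruction following has been learned, before further fine-tuning shifts the base model's semantics.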