I think you’re referring to the DeepSeek-R1 branch of reasoning models, where a small number of SFT reasoning traces are used as a seed. But for non-“reasoning” models, SFT is very important and definitely improves both capabilities and reliability.