Hacker News
imtringued | 4 months ago | on: Nvidia DGX Spark
Training is performed in parallel with batching and is more FLOPs-heavy. I don't have an intuition for how memory-bandwidth-intensive updating the parameters is, but it shouldn't be much worse than doing a single forward pass.
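One way to build that intuition is a back-of-envelope byte count. The sketch below is illustrative only: it assumes a hypothetical 7B-parameter model with fp16 weights for the forward pass and fp32 optimizer state, and counts only weight/optimizer traffic (activations ignored). None of these figures come from the comment.

```python
# Back-of-envelope: memory traffic of a parameter update vs. one forward pass.
# Assumptions (hypothetical): 7B params, fp16 weights, fp32 optimizer state.
params = 7e9

fp16 = 2  # bytes per fp16 value
fp32 = 4  # bytes per fp32 value

# Forward pass: each weight is read once in fp16 (activation traffic ignored).
forward_read = params * fp16

# Plain SGD with fp32 master weights: read params + grads, write params.
sgd_traffic = params * fp32 * 3

# Adam with fp32 state: read params, grads, m, v; write params, m, v.
adam_traffic = params * fp32 * 7

print(f"forward weight reads: {forward_read / 1e9:.0f} GB")
print(f"SGD update traffic:   {sgd_traffic / 1e9:.0f} GB")
print(f"Adam update traffic:  {adam_traffic / 1e9:.0f} GB")
```

Under these assumptions a bare SGD update moves a small single-digit multiple of the forward pass's weight reads, consistent with the comment's intuition, though a stateful optimizer like Adam multiplies the traffic further.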