Hacker News
imtringued | 4 months ago | on: Nvidia DGX Spark
Training is performed in parallel with batching and is more FLOPs-heavy. I don't have an intuition for how memory-bandwidth-intensive updating the parameters is, but it shouldn't be much worse than doing a single forward pass.
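One way to build that intuition is a back-of-envelope byte count. The sketch below is illustrative only: it assumes a hypothetical 7B-parameter model with fp16 weights for the forward pass and fp32 optimizer state, and counts only weight/optimizer traffic (activations ignored). None of these figures come from the comment.

```python
# Back-of-envelope: memory traffic of a parameter update vs. one forward pass.
# Assumptions (hypothetical): 7B params, fp16 weights, fp32 optimizer state.
params = 7e9

fp16 = 2  # bytes per fp16 value
fp32 = 4  # bytes per fp32 value

# Forward pass: each weight is read once in fp16 (activation traffic ignored).
forward_read = params * fp16

# Plain SGD with fp32 master weights: read params + grads, write params.
sgd_traffic = params * fp32 * 3

# Adam with fp32 state: read params, grads, m, v; write params, m, v.
adam_traffic = params * fp32 * 7

print(f"forward weight reads: {forward_read / 1e9:.0f} GB")
print(f"SGD update traffic:   {sgd_traffic / 1e9:.0f} GB")
print(f"Adam update traffic:  {adam_traffic / 1e9:.0f} GB")
```

Under these assumptions a bare SGD update moves a small single-digit multiple of the forward pass's weight reads, consistent with the comment's intuition, though a stateful optimizer like Adam multiplies the traffic further.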