DVC is great for medium-scale projects in small teams, but that's where I'd stop with it. It only really makes sense for work you're doing on your own machine or on an old-school Linux server type of setup, not for modern-day ML work in a cloud environment.

Also, I've always thought that using Git branches to track experiments was a bad idea. I would never want only one experiment to be "active" at a time. Even if I'm only running one process at a time, I still want to be able to look at outputs and such side by side. Maybe there's some magic tooling they created that makes it workable.



FYI, you can use git worktrees [1] to work on multiple branches simultaneously.

[1] https://git-scm.com/docs/git-worktree


Yeah, I know and love that feature for software projects, especially if I need to switch over to a bugfix while I'm deep in a topic branch.

But for a data project it would be a big pain to have separate worktrees just to work around what IMO is a usage anti-pattern to begin with!


DVC has `dvc exp`, which doesn't require creating commits or branches. It uses custom Git references (technical details [1]), and experiments can be visualized in the CLI or in VS Code [2].

[1] https://iterative.ai/blog/experiment-refs

[2] https://marketplace.visualstudio.com/items?itemName=Iterativ...


Thanks! I've been using DVC solely for tracking data, and had basically ignored all of its other features.

I'll have to take a look at this. Most/all of my projects use small or medium scale data, and I consider DVC indispensable for tracking data therein. I wouldn't mind having a good system for tracking experiment results, although admittedly I find that a spreadsheet or text file does a pretty good job for what I need to do.
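For the "spreadsheet or text file" approach, here is a minimal sketch in Python: each run appends its parameters and metrics as one row of a CSV file, which can then be opened in a spreadsheet and compared side by side. The helper names `log_experiment` and `read_experiments` and the `param_`/`metric_` column prefixes are hypothetical and not part of DVC; the sketch assumes every run logs the same set of keys, so the header written by the first run fits every later row.

```python
import csv
import os
from typing import Any


def log_experiment(path: str, params: dict[str, Any], metrics: dict[str, float]) -> None:
    """Append one run's params and metrics as a CSV row (hypothetical helper).

    Assumes every run logs the same keys; columns are sorted so the
    key order passed in does not matter.
    """
    row = {
        **{f"param_{k}": v for k, v in params.items()},
        **{f"metric_{k}": v for k, v in metrics.items()},
    }
    # Write a header only when the file is new or empty.
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=sorted(row))
        if write_header:
            writer.writeheader()
        writer.writerow(row)


def read_experiments(path: str) -> list[dict[str, str]]:
    """Load all logged runs; values come back as strings, as csv returns them."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))
```

Picking the best run is then a one-liner such as `max(read_experiments("exps.csv"), key=lambda r: float(r["metric_acc"]))`, where `exps.csv` and `acc` are illustrative names.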



