Iterative.ai (Series A) | REMOTE, WORLDWIDE | FULL-TIME | OPEN-SOURCE
Developer tools for ML engineers. We need people who are passionate about building infrastructure that helps ML teams manage data (at large scale), build sane, reproducible workflows, and track models. We are building infrastructure to deal with datasets at the LAION-5B scale. Our flagship open-source tool is DVC.org (12K+ stars); our SaaS product is studio.iterative.ai.
We are looking for senior (SaaS and open source) engineers:
DVC has `dvc exp`, which doesn't require creating commits or branches. It uses custom Git references (technical details [1]), and experiments can be visualized in the CLI or in VS Code.
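To illustrate the idea of keeping experiments out of the branch list, here is a toy in-memory model of Git commits and refs in Python. It is not real Git and not DVC's code: the `ToyRepo` class and the `refs/exps/<baseline>/<name>` layout are hypothetical, chosen only to show that an experiment can be a full snapshot whose parent is the current commit, reachable from a custom ref namespace, while `refs/heads/` is left untouched.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    """Toy in-memory model of Git objects and refs (hypothetical; not Git, not DVC)."""
    objects: dict = field(default_factory=dict)  # object id -> commit payload
    refs: dict = field(default_factory=dict)     # ref name -> object id
    head: str = "refs/heads/main"                # the checked-out branch

    def _store(self, payload: dict) -> str:
        # Content-address the payload, the way Git names objects by hash.
        data = json.dumps(payload, sort_keys=True).encode()
        oid = hashlib.sha1(data).hexdigest()
        self.objects[oid] = payload
        return oid

    def commit(self, files: dict, message: str) -> str:
        # A regular commit: the checked-out branch moves forward.
        parent = self.refs.get(self.head)
        oid = self._store({"files": files, "parent": parent, "message": message})
        self.refs[self.head] = oid
        return oid

    def save_experiment(self, name: str, files: dict) -> str:
        # An experiment snapshot: a commit whose parent is the current HEAD
        # commit, referenced only from a custom namespace (hypothetical layout).
        # The branch does not move and no new branch is created.
        baseline = self.refs[self.head]
        oid = self._store({"files": files, "parent": baseline, "message": f"exp {name}"})
        self.refs[f"refs/exps/{baseline[:7]}/{name}"] = oid
        return oid

    def branches(self) -> list:
        return sorted(r for r in self.refs if r.startswith("refs/heads/"))

    def experiments(self) -> list:
        return sorted(r for r in self.refs if r.startswith("refs/exps/"))


repo = ToyRepo()
base = repo.commit({"params.yaml": "lr: 0.1"}, "baseline")
repo.save_experiment("lr-0.01", {"params.yaml": "lr: 0.01"})
repo.save_experiment("lr-0.001", {"params.yaml": "lr: 0.001"})

print(repo.branches())                       # ['refs/heads/main']
print(len(repo.experiments()))               # 2
print(repo.refs["refs/heads/main"] == base)  # True: the branch did not move
```

Because the snapshots are ordinary commit objects, they keep full history and can later be promoted to a branch; the only difference is which ref namespace points at them.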
Thanks! I've been using DVC solely for tracking data, and had basically ignored all of its other features.
I'll have to take a look at this. Most or all of my projects use small- or medium-scale data, and I consider DVC indispensable for tracking data in them. I wouldn't mind having a good system for tracking experiment results, although admittedly I find that a spreadsheet or text file does a pretty good job for what I need to do.
One of the maintainers here. To be honest, I posted this link specifically to emphasize the experiment management side of DVC. Historically, because of its name (Data Version Control), users have perceived it as a pure replacement for LFS, while in reality it has always had pipelines, metrics, etc.
I 100% agree that managing large datasets by moving them around is not practical, and definitely not in LFS/DVC-style. There should be a level of indirection if reproducibility is needed (pointers are versioned to files, not the data directly, data should be staying in the cloud).
I'd like to mention once more some other cool features DVC has: e.g., the `dvc exp` set of commands, which creates custom Git refs to snapshot experiments, or the DVCLive logger, which helps capture metrics, plots, etc. There is also a VS Code extension [1] that provides a pretty nice experience for the experiment workflow inside VS Code.
The point is that for DVC, the ability to capture large files and directories (that do not fit into Git) has always been a low-level mechanism to support higher-level scenarios (e.g., you need to save a model somewhere as the output of an experiment).
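As a rough illustration of what a training-loop logger captures, here is a hypothetical `ToyLogger` in Python. It is not DVCLive and does not use its API or file layout; it only shows the general split between a per-step history (what plots are drawn from) and a summary of the latest values (what metrics tables compare across experiments). The `metrics.json` and `plots/<name>.tsv` file names are assumptions.

```python
import csv
import json
import tempfile
from pathlib import Path


class ToyLogger:
    """Hypothetical minimal metrics logger (a stand-in, not DVCLive)."""

    def __init__(self, out_dir: Path):
        self.out_dir = out_dir
        self.step = 0
        self.history = {}  # metric name -> list of (step, value) rows for plots
        self.latest = {}   # metric name -> most recent value for the summary

    def log(self, name: str, value: float) -> None:
        self.history.setdefault(name, []).append((self.step, value))
        self.latest[name] = value

    def advance(self) -> None:
        # Move to the next training step (e.g., the next epoch).
        self.step += 1

    def save(self) -> None:
        plots = self.out_dir / "plots"
        plots.mkdir(parents=True, exist_ok=True)
        # One TSV per metric: the full history a plot would be drawn from.
        for name, rows in self.history.items():
            with (plots / f"{name}.tsv").open("w", newline="") as f:
                writer = csv.writer(f, delimiter="\t", lineterminator="\n")
                writer.writerow(["step", name])
                writer.writerows(rows)
        # A small JSON summary of the final values, easy to diff between runs.
        (self.out_dir / "metrics.json").write_text(json.dumps(self.latest, indent=2))


with tempfile.TemporaryDirectory() as tmp:
    logger = ToyLogger(Path(tmp))
    for epoch in range(3):
        logger.log("loss", 1.0 / (epoch + 1))
        logger.advance()
    logger.save()
    print(Path(tmp, "plots", "loss.tsv").read_text())
```

Keeping these outputs as plain files is what lets the underlying versioning layer treat them like any other experiment artifact.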
> I 100% agree that managing large datasets by moving them around is not practical, and definitely not in LFS/DVC-style. There should be a level of indirection if reproducibility is needed (pointers are versioned to files, not the data directly, data should be staying in the cloud).
I am not sure I understand that correctly. Are you saying that LFS/DVC manage the data suboptimally because they do not use some kind of pointer?
I only have some experience with DataLad [0], not with DVC or LFS. DataLad is built on git-annex, which implements the pointer indirection through symlinks or pointer files in Git. You basically manage the directory structure in Git and can "get" and "drop" specific files as you need them. git-annex keeps track of where the data is and how it can be fetched; the location can be any (remote) system, from an HTTP server to S3 to a Nextcloud instance via WebDAV, and more. I always thought DVC did something similar.
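The pointer indirection described above (only small pointers live in Git, while the data lives elsewhere and is fetched on demand) can be sketched in a few lines of Python. This is a hypothetical illustration, not git-annex's or DVC's actual format: the `.ptr` JSON pointer, the `add`/`get`/`drop` helpers, and a local directory standing in for a remote such as S3 or an HTTP server are all assumptions.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def _digest(path: Path) -> str:
    # Hash in 1 MiB chunks so large files need not fit in memory.
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def _object_path(remote: Path, digest: str) -> Path:
    # Content-addressed layout: first two hex chars as a subdirectory.
    return remote / digest[:2] / digest[2:]


def add(path: Path, remote: Path) -> Path:
    """Copy data into a content-addressed store and write a small pointer file.

    Only the pointer (hash + size + name) would be committed to Git.
    """
    digest = _digest(path)
    obj = _object_path(remote, digest)
    obj.parent.mkdir(parents=True, exist_ok=True)
    if not obj.exists():  # identical content is stored only once
        shutil.copyfile(path, obj)
    pointer = path.with_name(path.name + ".ptr")
    meta = {"hash": digest, "size": path.stat().st_size, "path": path.name}
    pointer.write_text(json.dumps(meta))
    return pointer


def drop(pointer: Path) -> None:
    # Remove the local copy; the pointer still records exactly what it was.
    meta = json.loads(pointer.read_text())
    (pointer.parent / meta["path"]).unlink(missing_ok=True)


def get(pointer: Path, remote: Path) -> Path:
    # Fetch the data back using the hash in the pointer, then verify it.
    meta = json.loads(pointer.read_text())
    target = pointer.parent / meta["path"]
    shutil.copyfile(_object_path(remote, meta["hash"]), target)
    if _digest(target) != meta["hash"]:
        raise ValueError("checksum mismatch after fetch")
    return target


with tempfile.TemporaryDirectory() as tmp:
    work, remote = Path(tmp, "work"), Path(tmp, "remote")
    work.mkdir()
    data = work / "images.bin"
    data.write_bytes(b"\x00" * 4096)
    ptr = add(data, remote)
    drop(ptr)
    print(data.exists())  # False
    get(ptr, remote)
    print(data.read_bytes() == b"\x00" * 4096)  # True
```

Because objects are addressed by their hash, the pointer alone is enough to reproduce a given version of the data, which is the property the reproducibility argument relies on.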
Iterative.ai (Series A) | REMOTE, WORLDWIDE | FULL-TIME | OPEN-SOURCE
Developer tools for ML engineers. We need people who are passionate about building infrastructure that helps ML teams manage data (at large scale), build sane, reproducible workflows, and track models. We are building infrastructure to deal with datasets at the LAION-5B scale. Our flagship open-source tool is DVC.org (11K+ stars); our SaaS product is studio.iterative.ai.
We are looking for senior (SaaS and open-source) engineers, as well as product engineers:
> Your project gets to have top tier programmers as maintainers, make all the nifty features, fix all bugs etc. for free or at minimum cost
In most cases at these companies, it's created and maintained at the company's expense. I would expect the company to be behind 90-99% of the code, the product development (talking to users, understanding their needs, etc.), and even devrel/marketing (being at every conference to convince people to use it and that codification is good). That's a lot of money, resources, and effort.
> New startup, cool idea, not much budget to hire engineers
In this case, I think a few folks (most likely the founders) build it from the ground up initially. Please, let's not forget that.
The important thing to discuss here is why they did this (I mean the relicensing). Most likely they are trying to create a moat against the many other VC-funded companies that play in this space? Not sure. It would be great to know their exact concern.
Iterative.ai (Series A) | REMOTE, WORLDWIDE | FULL-TIME | OPEN-SOURCE
Developer tools for ML engineers (MLOps). We need people who are passionate about building infrastructure that standardizes how ML teams manage data and models. We are building infrastructure to deal with datasets at the LAION-5B scale. Our flagship open-source tool is DVC.org (11K+ stars); our SaaS product is studio.iterative.ai.
We are looking for tech leads (SaaS and open-source DVC.org):
Iterative.ai (Series A) | REMOTE, WORLDWIDE | FULL-TIME | OPEN-SOURCE
Developer tools for ML engineers (MLOps). We need people who are passionate about building infrastructure that standardizes how ML teams manage data and models. We are building infrastructure to deal with datasets at the LAION-5B scale. Our flagship open-source tool is DVC.org (11K+ stars); our SaaS product is studio.iterative.ai.
We are looking for a senior Python (backend or systems programming) engineer:
We are also looking for a senior ML Solutions engineer (think an internal senior ML/MLOps consultant, or a customer-facing engineer who can code and enjoys helping people with their needs every day). Remote is okay. Excellent communication skills and English are required:
Apply directly: ivan AT iterative.ai. For a successful personal referral (no agencies), we pay a 10K bonus.
More about us here: