I teach a lot of Python and data science (pandas, Polars, scikit-learn, XGBoost...) in Jupyter.
(I also teach a bunch of software engineering best practices to folks who claim they don't want to be "software engineers".)
My experience is that with training, many of these issues go away. Just saw this again firsthand at a client last week.
My current take is that many people focus on making the code newbie friendly. I think they should level up and start writing code for a professional audience, using software techniques that a professional would use. Newbies won't like this code.
I get a lot of flak from naysayers on social media when I post this, but the overwhelming response from my students and those who have read my books indicates there might be something to this.
I strongly agree with your comment. I'm a software engineer and I was surprised to find out about the practices of some data teams.
I do believe we should bring software best practices to the data world, not only regarding code design but regarding infrastructure and tooling too (like versioning everything).
Still, I get where they're coming from. A software engineer would also be frustrated if they had to learn everything a data scientist knows (probably even more).
I think the tooling itself can solve this issue by encouraging best practices though.
I have a question I now ask people I interview for data science jobs -- can you run a Python hello world from the command line? A non-trivial number of people who say they have multiple years of data science + Python experience do not know how to do this!
People need to know how to write functions to write professional software, not just throw together a soup of notebook cells.
I even disagree with Matt that they won't like this code -- it is just learning basic stuff like file structure and Python environments. See how I approach it in the first two chapters, running stuff from the REPL, here: https://crimede-coder.com/blogposts/2023/EarlyReleasePython.
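For anyone wondering what the interview question above is getting at, here is a minimal sketch of a hello world that is both runnable from the command line and organized into functions. The file name `hello.py` and the function names `greet` and `main` are my own placeholders, not anything from the linked book.

```python
# hello.py (hypothetical file name, chosen for illustration)
import sys


def greet(name: str = "world") -> str:
    """Return the greeting instead of printing it, so it can be reused and tested."""
    return f"Hello, {name}!"


def main(argv: list[str]) -> int:
    # Use the first command-line argument as the name, if one was given.
    name = argv[1] if len(argv) > 1 else "world"
    print(greet(name))
    return 0


if __name__ == "__main__":
    # This branch runs only when the file is executed as a script,
    # not when it is imported from a notebook or another module.
    main(sys.argv)
```

Assuming the file is saved as `hello.py`, running `python hello.py` from a terminal in the same directory prints the greeting, and `python hello.py Ada` passes a name in. The same `greet` function can still be imported into a notebook cell, which is the point about functions over cell soup.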
The people I train can certainly use the command line to run Python, create virtual environments, and install packages. (But many of them have never used it previously.)
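As a rough sketch of the virtual environment step mentioned above, the standard library's `venv` module can create an environment from Python itself. The helper name `create_env` and the `.venv` directory name are assumptions for illustration; most people would just type the equivalent commands in a terminal.

```python
import sys
import venv
from pathlib import Path


def create_env(env_dir: Path, with_pip: bool = True) -> Path:
    """Create a virtual environment and return the path to its interpreter."""
    # with_pip=True asks venv to bootstrap pip into the new environment.
    venv.create(env_dir, with_pip=with_pip)
    # The interpreter lives in Scripts\ on Windows and bin/ elsewhere.
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


if __name__ == "__main__":
    python_path = create_env(Path(".venv"))
    print(f"Environment interpreter: {python_path}")
```

The command-line equivalent is `python -m venv .venv`, followed by something like `.venv/bin/python -m pip install pandas` (or `.venv\Scripts\python.exe -m pip install pandas` on Windows) to install packages into that environment.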