I assume HLL = High-Level Language? Mojo definitely exposes lower-level facilities than Python. Chris has even described Mojo as "syntactic sugar over MLIR". (For example, the native integer type is defined in library code as a struct.)
> Whether it's 1, 2, or N kernels is irrelevant.
Not sure what you mean here. But new kernels are written all the time (flash-attn is a great example), and one can't do that in plain Python. E.g., flash-attn was originally written in CUDA C++ and has since been reimplemented in Triton.
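To give a sense of what "writing a kernel in Triton" means, here is a minimal sketch of a vector-add kernel in Triton's Python-embedded DSL (the kernel name, block size, and launch helper are illustrative, not taken from flash-attn):

```python
import torch
import triton
import triton.language as tl


@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide chunk of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard against the ragged last block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)


def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)  # one program per 1024-element block
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

The point being: it reads like Python, but the kernel body is compiled by Triton to GPU code; CPython never executes it, which is why "plain Python" can't do this on its own.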
Well, Mojo hasn't been released yet, so we can't say precisely what it can and can't do. If it can emit CUDA code, then it does so by transpiling Mojo into CUDA. And there is no reason why Python can't also be transpiled to CUDA.
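In fact, a restricted subset of Python is already compiled to GPU code today. A minimal sketch using Numba's CUDA backend (assuming a CUDA-capable GPU and Numba installed; the kernel and launch configuration are illustrative):

```python
import numpy as np
from numba import cuda


@cuda.jit
def scale_kernel(x, out, factor):
    # Numba compiles this Python function to PTX; each thread scales one element.
    i = cuda.grid(1)
    if i < x.size:
        out[i] = x[i] * factor


x = np.arange(1_000_000, dtype=np.float32)
out = np.empty_like(x)
threads = 256
blocks = (x.size + threads - 1) // threads
scale_kernel[blocks, threads](x, out, 2.0)  # host arrays are copied to the device automatically
```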
What I mean here is that DNN code is written at a much higher level than kernels: kernels are just building blocks you use to instantiate your dataflow.
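For illustration, a hypothetical toy model in PyTorch (layer sizes are arbitrary): the model code only wires existing kernels (matmul, elementwise ops, ...) into a dataflow and never touches kernel internals.

```python
import torch
from torch import nn


class TinyClassifier(nn.Module):
    """Toy model: composes existing kernels; no kernel code is written here."""

    def __init__(self, dim: int = 128, n_classes: int = 10):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(dim, 256),   # dispatches to a matmul kernel
            nn.ReLU(),             # dispatches to an elementwise kernel
            nn.Linear(256, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.backbone(x)


model = TinyClassifier()
logits = model(torch.randn(32, 128))  # shape: (batch, n_classes)
```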