Hacker News | ashirviskas's comments

Oh, hi! Love your work! I'll be rooting for you that day (though most likely after that day, unless you're a superhuman).

Wanted to save a few tokens when passing data to LLMs and did not like anything on the market, so I made minemizer.

Minemizer is a data formatter that produces csv-like output, but supports nested and sparse data, is human readable and super simple.

It produces even fewer tokens than csv for flat data, because most tokenizers do better with full words that are preceded by a space, which leads to less fragmentation.

There are many cool things I discovered while running tons of testing and benchmarking, but it's getting late here.

Code, benchmarks, tokenization examples and everything else can be found in the repo, but it is still very WIP: https://github.com/ashirviskas/minemizer

Or here: https://ashirviskas.github.io

EDIT: Ignore latency timings and token counts in "LLM Accuracy Summary" in benchmarks, as datasets of different sizes were used to generate the accuracy numbers while I was running tons of experiments. For accurate compression numbers, see the compression benchmark results, or look at each benchmark one by one.

I will eventually fix all the benchmark numbers to be representative.


Why the name Minemizer instead of something like Minimizer?

I did the same before I started using devcontainers. They are super useful.

I'm now super interested in that video, what was it like?

from 6.7 to 42 would be my guess.

But being serious, I personally have not seen a degraded e-ink display.


I've seen a couple minor, older-hardware cases when they've been powered off with something on the screen for years, but that's about it. in theory they can also "burn in" by not clearing the display occasionally (afaict it has something to do with accumulating charge) but most or all of those should clear eventually after cycling a bunch (afaict, though it can definitely persist to a minor degree for dozens of full refresh cycles). extreme ghosting, basically.

they seem pretty durable to me.


Even in a lot of direct sunlight or leaving it out in the heat?

No clue, I've only seen either fully working or physically broken ones. Oldest one I have still has mini-usb and no degradation can be seen. Though I only rescued it this year, it seems like it was used pretty roughly.

Religiously updating my TV? It has been patched since spring. Anyone accidentally clicking "yes" on the update notice that randomly appears in the middle of the screen at any point in the past 9 months would have ruined it. I was religiously *not* updating my TV and it still got software too new for the exploit :')

My TV has never touched the internet and never will, so problem solved re: updates.

My LG TV is a little over a year old now and I refuse to allow it to connect to the Internet, ever, so I guess RootMyTV would work fine for me?

It's totally possible! Check it at https://cani.rootmy.tv/. There are multiple exploits, search around

Yep and I find that this really worsens LLM performance. For example `Ben,Alice` would be tokenized as `Ben|,A|lice`. And having to connect `lice` to the name `Alice` does not make it any easier for LLMs. However, formatting it as `Ben, Alice` tokenizes it as `Ben|,| Alice`. I found it kind of useful to improve performance by just formatting the data a bit differently.
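A minimal sketch of the formatting change described above, in case it helps. `format_row` is a hypothetical helper made up for illustration, not minemizer's API. It only shows the two string layouts; the actual token boundaries depend on the tokenizer you use, so run both strings through your own tokenizer to compare.

```python
def format_row(values, sep=", "):
    """Join values into one row; the default separator puts a space after the comma."""
    return sep.join(str(v) for v in values)


names = ["Ben", "Alice"]

# Compact CSV-style row: no space, so a name may get merged with the comma.
compact = format_row(names, sep=",")

# Spaced row: each name after the first starts with a leading space.
spaced = format_row(names)

print(compact)
print(spaced)
```

The spaced version costs one extra character per delimiter, but that character usually attaches to the following word rather than becoming a token of its own.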

I actually just started working on a data formatter that applies principles like these to drastically reduce the number of tokens without degrading performance the way other formats do (looking at you, tson).


Most of the RAM may not be critical enough to crash the whole system. Just some random app you have open or a browser tab. So even if it is true, most bit flips should not crash a system.

Yes, I know that. So why aren’t my applications or tabs crashing at least once a day?

No clue, but you can most likely simulate this for a process on linux. Actually, I might just try that and see what happens.
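As a very rough starting point, here is a toy sketch of a single bit flip. It only flips a bit inside a buffer owned by the script itself and does not touch another process's memory, so it is not the Linux experiment mentioned above. `flip_random_bit` is a hypothetical helper name, and the seed is fixed only to make runs repeatable.

```python
import random


def flip_random_bit(buf: bytearray, rng: random.Random) -> int:
    """Flip one randomly chosen bit in buf in place and return its bit index."""
    bit_index = rng.randrange(len(buf) * 8)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bit_index


rng = random.Random(0)  # fixed seed, only for repeatability
data = bytearray(b"hello world")
original = bytes(data)

flipped = flip_random_bit(data, rng)

# Exactly one byte should differ from the original after a single flip.
changed = [i for i in range(len(data)) if data[i] != original[i]]
print(flipped, changed, bytes(data))
```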

Maybe it's the generation gap? As a kid, I loved that I could "play" on the PC with Clippy, which mostly consisted of trying to get it to appear and make it do something. I get that if you were trying to get some work done, it might have been annoying.


Does it have a terminal assistant that I have not heard of? Otherwise, the parent asks about an assistant that is able to run various tools and stuff, not just talk.

