I worry that 7-Zip is going to lose relevance because of its lack of zstd support. zlib's performance is intolerable for large files, and zlib-ng's SIMD implementation only helps a bit here. Which is a shame, because 7z is a pretty amazing container format, especially with its encryption and file-splitting capabilities.
I use ZSTD a ton in my programming work where efficiency matters.
But for sharing files with other people, ZIP is still king. Even 7z or RAR is niche. Everyone can open a ZIP file, and they don't really care if the file is a few MBs bigger.
Which reveals that "everyone can open a ZIP file" is a lie. Sure, everyone can open a ZIP file, as long as that file uses only a limited subset of the ZIP format features. Which is why formats which use ZIP as a base (Java JAR files, OpenDocument files, new Office files) standardize such a subset; but for general-purpose ZIP files, there's no such standard.
(I have encountered such ZIP files in the wild; "unzip" can't decompress them, though p7zip worked for these particular ZIP files.)
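If you're ever unsure why a ZIP won't open somewhere, it helps to look at which compression methods it actually uses. A quick sketch with Python's zipfile module (the archive name is made up):

```python
import zipfile

# Method IDs from the ZIP APPNOTE: 0 = stored, 8 = deflate (the
# universally supported one), 12 = bzip2, 14 = LZMA, 93 = zstd.
with zipfile.ZipFile("mystery.zip") as zf:
    for info in zf.infolist():
        print(info.filename, info.compress_type)
```

Listing works even for methods the library can't decompress, since the method ID is just a field in the central directory.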
> On 15 June 2020, Zstandard was implemented in version 6.3.8 of the zip file format with codec number 93, deprecating the previous codec number of 20 as it was implemented in version 6.3.7, released on 1 June.[36][37]
> The `zip` command on Ubuntu is 6.0, which was released in 2009 and does not support zstd. It does support bzip2 though!
You probably mean the "unzip" command, for which https://infozip.sourceforge.net/UnZip.html lists 6.0 as the latest version, released on 20 April 2009. Relevant to this discussion, that release added support for 64-bit file sizes, the bzip2 compression method, and UTF-8 filenames.
For the "zip" command, https://infozip.sourceforge.net/Zip.html lists 3.0 as the latest version, released on 7 July 2008. That release likewise added support for 64-bit file sizes, the bzip2 compression method, and UTF-8 filenames.
It would be great if both (or at least unzip) were updated to also support LZMA/XZ/ZSTD as compression methods, but given that there have been no new releases for over fifteen years, I'm not too hopeful.
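Meanwhile, other tooling moved on long ago; Python's standard zipfile module, for one, can write LZMA zips that unzip 6.0 then can't extract. A minimal sketch (the input file is made up):

```python
import zipfile

# ZIP_LZMA (method 14) is valid per the spec, but Info-ZIP's
# unzip 6.0 only goes up to bzip2 (method 12).
with zipfile.ZipFile("out.zip", "w", compression=zipfile.ZIP_LZMA) as zf:
    zf.write("report.txt")
```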
There are A LOT of zip files using LZMA in the wild.
Also, how about people learn to use updated software? Should newer video compression technologies not be allowed in MKV/MP4?
If you can't open it, well... then stop using '90s WinZip.
No. You can't get people to use updated software. You can't get a number of people to update past Windows 7. This has been and will likely remain a persistent issue, and it's certainly not one you're going to fix. All it will do is limit your ability to work with people. This isn't a hill on which you should die.
I'm okay with that. That said, I have not had a single issue delivering zip files with LZMA, and I KNOW that I have received MANY from random sources.
I would also expect people to be able to decode h265 in an mp4 file.
Your proposal is, to put it bluntly, absurd. You would have MP4 frozen on h264 for ETERNITY, and then invent a new format as a replacement? Or would you just say "god has bestowed upon the world h264, and it shall be the LAST CODEC EVER!"?
Get with the program. Things change; you cannot expect to be forward compatible forever. Sometimes people have to switch to newer versions of software.
If your customer is stuck in the 90s because his 90s technology works perfectly fine and he has no intention of fixing things that are not broken, then deliver stuff that is compatible with 90s technology. He will be happy, he will continue to work with you, and you will make money.
If your customer is using the latest technologies and values size efficiency, then use the latest codecs.
I usually default to being conservative, because those who are up to date usually don't have a problem with bigger files, whereas those who are not will have a problem with recent formats. Maybe overly so, but that's my experience working with big companies with decades-long lifecycles.
Your job is not to lecture your customer, unless he asked for it. And if he asked for it, he probably expects better arguments than "update your software, idiot". Your job is to deliver what works for him. Now, of course, it is your right to be picky and leave money on the table; I will be happy to go after you and take it.
Not everything is a client<->customer relationship.
Professionally, I can definitely support old stuff. Most often, it costs extra.
Conservative doesn't have to mean stuck. I am not recommending we send h266 to everyone now, but h265 is well supported, as is AV1.
LZMA in zip has been widely supported for many years at this point. I am going to choose my "sane defaults", and if someone has a problem with that, they can simply do what they need to do to open it, or provide a damn good reason for me to go out of my way.
Installing new software has a real time and hassle cost, and how much time are you actually saving over the long run? It depends on your usage patterns.
The developer is hired by someone who gets to make that decision. Ultimately the customer does. That's why some people spend extreme resources on legacy crap: someone has deemed it worth it.
In the middle of San Francisco, with Silicon Valley-level incomes, very possible. In the real world I still exchange files with users on rustic ADSL, where every megabyte counts. Many areas out there, like rural Mongolia or parts of Africa that have only just gotten internet access, are even worse in that regard.
English is evolving as a hieroglyphic language. That floppy disk icon stands a good chance of becoming simply the glyph meaning "save". The UK still uses an icon of an 1840s-era bellows camera for its speed camera road signs. The origin story will be filed away neatly and only its residual meaning will be salient.
A more 'useful' one is WebP. It has both a lossy and a lossless compression algorithm, which have very different strengths and weaknesses. I think nearly every device supports reading both, but so many 'image optimization' libraries and packages don't, often just doing everything as lossy when it could be lossless (icons and whatnot).
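To be fair to library authors, the lossless path is usually there, just not the default. With Pillow, for instance, it's a one-liner, so you can measure both variants per image (a sketch; icon.png is a stand-in):

```python
import os
from PIL import Image  # third-party: pip install Pillow

img = Image.open("icon.png")  # flat-color art, where lossless shines
img.save("icon_lossy.webp", quality=80)  # lossy path
img.save("icon.webp", lossless=True)     # dedicated lossless path
print(os.path.getsize("icon_lossy.webp"), os.path.getsize("icon.webp"))
```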
It's similarly annoying how many websites take the existence of the lossy format as a license to recompress all WebP uploads, or sometimes other filetypes converted to WebP, even when it causes the filesize to increase. It's like we're returning to ye olden days of JPEG artifacts on every screenshot.
I was thinking about this with YouTube as an example. A lot of people complain about the compression on YouTube videos making things look awful, but I bet there's a reasonable number of high-end content creators out there who would run a native(-ish, probably Electron) app on their local system to do a higher-quality encoding to YouTube's specifications before uploading.
In many (most?) cases, it's possible to get better compression and higher quality if you're willing to spend the CPU cycles on it, meaning that YouTube could both reduce their encoding load and increase quality at the same time, and content creators could put out better quality videos that maintain better detail.
It would certainly take longer to upload the multiple versions of everything, and it would definitely take longer to encode, but it would also ease YouTube's burden and produce a better result.
AFAIK you can upload any bitrate to YouTube as long as the file is <256GB.
So you could upload a crazy-high-bitrate file for a 20-minute video, which I suspect would be close to "raw" quality.
I don't know how many corners YouTube cuts on encoding, though.
I suspect most of the problem is people exporting 4K at a 'web' bitrate preset (15 Mbit/s?), which is actually going to get murdered on the second encode, more than it's the encoding quality on YouTube's side.
So apparently WebP is also RIFF, which is the container for WAV files as well, it seems. I did not know this. Also, WebP has its own specialized lossless algorithm. For things like icon art I generally just continue to use PNG. Is there an advantage to using WebP lossless?
Tar files also have the miserable limitation of having no index; this means that extracting an individual file requires scanning through the entire archive until you find it, and then continuing to scan through the rest of the archive, because a tar file can have the same file path added multiple times.
That makes them useful for transferring an entire set of files that someone will want all or none of, e.g. source code, but terrible for a set of files that someone might want to access arbitrary files from.
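Python's tarfile module makes both problems visible (archive and member names are made up):

```python
import tarfile

with tarfile.open("backup.tar") as tf:
    # getmember() has to walk every header in the archive: there is no
    # central index, and a later entry with the same path overrides
    # earlier ones, so the last occurrence is the authoritative one.
    member = tf.getmember("docs/readme.txt")
    data = tf.extractfile(member).read()
```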
I don't know about that; I had a dicey situation recently where PowerShell's Compress-Archive couldn't handle archives >4GB and I had to use 7-Zip. It is more reliable, and you can ship 7za.exe or create self-extracting archives (I wish those were more of a thing outside of the Windows world).
I understand that security has to compromise for the real world, but a self-extracting archive is possibly one of the worst things one could use in terms of security.
You're assuming things because things are already done insecurely. You can authenticate the self-extractor as well as the extracted content. The user gets a nice message "This is a 7zip self-extracting archive sent to you by Bob containing the files below".
As an incident responder, I've seen regular archives used to social-engineer users much more often than self-extracting archives, because self-extracting is not "content executing". For social engineering, it is better to have users establish trust in the payload first by manually opening the archive; if something "weird" like self-extraction happens first, it might feel less trustworthy.
Oh, and by the way, things like PyInstaller or Electron apps are already self-extracting and self-executing archives. So are JAR files and Android APKs.
JAR files are zip files, so they don't contain "self extract" code; instead, they are associated with already-installed extraction code.
However, once extracted, JAR files do contain executable code, and that is a security issue. The Java model pays attention to security, but if code can do something, it can do something bad. If it can't do something, it's not very useful, is it?
The Windows kernel executes a self-extracting 7z archive; java.exe extracts and executes .jar files. If the 7z self-extractor were .NET CLR bytecode, it would operate very much the same as JAR files. To your point, though, the primary purpose of JAR files is not to compress and transport other files; they're supposed to be executables only. From a user's perspective, abuse potential is the main difference.
What are you compressing with zstd? I had to do this recently, and the "xz" utility still blows it away in terms of compression ratio. In terms of memory and CPU usage, zstd wins by a large margin, but in my case I only really cared about compression ratio.
People tend to care about decompression speed: xz can be quite slow decompressing highly compressed files, whereas zstd decompression speed is largely independent of the compression level.
People also tend to care about how much time they spend on compression for each incremental % of compression ratio, and zstd tends to sit on the Pareto frontier there (at least among open-source algorithms).
This makes sense. A lot of end-users have internet speeds that can outpace the decompression speeds of heavily compressed files. Seems like there would be an irrational psychological aspect to it as well.
Unfortunately for the hoster, they either have to eat the cost of the added bandwidth from a larger file or have people complain about slow decompression.
Well, the difference is quite a bit more manageable in practice, since you're talking about a single-digit-percentage difference in size vs. a 2-100x difference in decompression performance.
I definitely agree; I basically have unlimited time and unlimited CPU for decompressing, and available memory is huge too. The gains from xz were significant enough that I went with it.
I think it depends on what you're compressing. I experimented with my data, full of hex-text XML files. xz -6 is both faster and smaller than zstd -19 by about 10%. For my data, xz -2 and zstd -17 achieve the same compressed size, but xz -2 is 3 times faster than zstd -17. I still use xz for archives because I rarely need to decompress them.
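If anyone wants to rerun this kind of comparison on their own corpus, mine boiled down to roughly this (zstandard is a third-party package, and the sample filename is made up):

```python
import lzma
import time
import zstandard  # third-party: pip install zstandard

data = open("sample.xml", "rb").read()

candidates = [
    ("xz -2", lambda d: lzma.compress(d, preset=2)),
    ("xz -6", lambda d: lzma.compress(d, preset=6)),
    ("zstd -17", lambda d: zstandard.ZstdCompressor(level=17).compress(d)),
    ("zstd -19", lambda d: zstandard.ZstdCompressor(level=19).compress(d)),
]
for name, compress in candidates:
    start = time.perf_counter()
    out = compress(data)
    print(f"{name}: {len(out):,} bytes in {time.perf_counter() - start:.2f}s")
```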
It's pretty clear zstd blows everything else out of the water by a huge margin. And even though compressing with zstd is slightly slower than xz in this case (by less than 10%), decompression is nearly 8x as fast, and you can probably tweak the compression level to make zstd both faster and better than xz.
If the email data is mostly text with markup (like HTML/XML), you might want to try bzip3 too.
It's also possible that a large part of your email is actually already-compressed binary data (like PDFs and images) possibly encoded in base-64. In that case it's likely that all tools are pretty good at compressing the text and headers, but can do little to compress the attachments, which would explain why the results you get are so close.
Interesting; thanks for checking! I had good experiences with bzip3 compressing Wikipedia XML dumps, to the point that it even outperformed xz, so I thought something similar might happen here. Compression remains a bit of a black art, where it's hard to predict what works without trying it out.
Overall I'm still slightly biased towards using zstd as a default, in that I believe:
1. zstd will almost always be among the fastest formats for decompression, which is obviously nice to have, everything else being equal.
2. zstd can achieve a very high compression ratio, depending on tuning; rarely will zstd significantly underperform the next best option.
Overall this is a pretty good case for using zstd by default, even if in some cases it's not noticeably better than other formats. In your case, xz seems to be just as good.
Yup; you could have just tried different -NN levels and seen it. I gave a talk on zstd a couple of years back, and one of the points was that it was better than xz across the board.
Use the pigz command for parallel gzip. Mark Adler also has an example floating around somewhere about how to implement basically the same thing using Z_BLOCK.
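I couldn't dig up Adler's exact example, but the core idea is simple enough to sketch. Here's a simplified Python take using Z_FULL_FLUSH instead of the trickier Z_BLOCK bit-alignment pigz actually uses (each chunk resets the compression window, so you give up a little ratio; zlib releases the GIL, so plain threads parallelize fine):

```python
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1 << 20  # 1 MiB per independently compressed chunk

def _deflate_chunk(chunk, last):
    # Raw deflate (wbits=-15). Z_FULL_FLUSH byte-aligns the output and
    # resets the window, so independently compressed chunks concatenate
    # into one valid deflate stream; the last chunk ends it via Z_FINISH.
    c = zlib.compressobj(6, zlib.DEFLATED, -15)
    return c.compress(chunk) + c.flush(zlib.Z_FINISH if last else zlib.Z_FULL_FLUSH)

def parallel_gzip(data, workers=4):
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)] or [b""]
    flags = [i == len(chunks) - 1 for i in range(len(chunks))]
    with ThreadPoolExecutor(workers) as pool:
        parts = list(pool.map(_deflate_chunk, chunks, flags))
    header = b"\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\xff"  # minimal gzip header
    trailer = struct.pack("<II", zlib.crc32(data), len(data) & 0xFFFFFFFF)
    return header + b"".join(parts) + trailer
```

The output is a normal gzip member, so standard gunzip decompresses it; real pigz also does crc32_combine() of per-chunk checksums and dictionary priming across chunks, which this sketch skips.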
Zip is such a shit standard. Hell, there are parts of it that are still undocumented, and sharing documents between the system zip implementations on Mac and Windows sometimes fails.
7-Zip is the de facto tool on Windows and has been for a long time. It's more than fast enough and compresses well enough for 99% of people's use cases.
It's not going anywhere anytime soon.
The more likely thing to eat into its relevance is that Windows now has built-in basic support for zipping/unzipping (EDIT: and other formats), which relegates 7-Zip to more niche uses.
> 7-Zip is the de facto tool on Windows and has been for a long time.
Agreed. The only thing I think it has been missing is PAR support. They should consider incorporating one of the par2cmdline forks and porting that code to Windows as well, so that it has recovery options similar to WinRAR's. It's not used by everyone, but in my opinion that would eliminate the remaining use cases for WinRAR.
As mentioned in another comment, zip support actually goes back as far as '98, but only Windows 11 added support for handling other formats like RAR/7-Zip/.tar/.tar.gz/.tar.bz2/etc.
That allows it to be a default that 'just works' for most people without installing anything extra.
The vast majority of users don't care about the extra performance or functionality of a tool like 7-zip. They just need a way to open and send files and the Windows built-in tool is 'good enough' for them.
I agree that 7-zip is better, but most users simply do not care.
Windows' built-in zip handling is not in fact good enough. I've run into weird, buggy behavior: hanging on extract, all sorts of nonsense. I can see the argument that a universally adopted solution is better, but that's different from Windows just not working.
I'm not saying I would ever use it. I'm saying that for casual non-power users, it's good enough. They work with it and if it breaks once in a blue moon they don't care. They just want it to open the files they get and give them a way to send files compressed.
That is enough to bite into 7-Zip's share of users.
7-zip, through its .7z format, also supports AES encryption. I'd argue it's probably the easiest way to encrypt individual file archives that you need to access on both Windows and Linux. I have a script I periodically run that makes an encrypted .7z archive of all of my projects, which I then upload for off-site backup. (On-site, I don't bother encrypting.)
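That script is basically a thin wrapper around the 7z command line; a trimmed sketch (archive name and paths are made up, and the binary may be called 7z, 7zz, or 7za depending on the platform/package):

```python
import subprocess
from getpass import getpass

password = getpass("Archive password: ")
# Note: the password lands in the process arguments, which is fine for
# a personal machine but visible to other users on a shared one.
subprocess.run(
    [
        "7z", "a",             # a = add (create/update archive)
        "-mhe=on",             # encrypt the header too, hiding filenames
        f"-p{password}",       # enables AES-256 encryption in .7z
        "projects-backup.7z",  # output archive
        "projects/",           # directory tree to back up
    ],
    check=True,
)
```

With -mhe=on even the file listing is encrypted, which plain ZIP AES encryption can't do.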
Is there something different about the built-in zip context-menu functionality now compared to before? I'm pretty sure you've been able to turn something into a zip file by right-clicking it since forever ago.
It's been a long time since I used Windows, but back in the day I used 7-Zip exactly because it could open more or less $anything. That's also why we installed it on many customer computers.
On Linux bsdtar/libarchive gives a similar experience: "tar xf file" works on most things.
7-Zip is like VLC: maybe not the best, but it’s free (speech and beer) and handles almost anything you throw at it. For personal use, I don’t care much about efficient compression either computationally or in terms of storage; I just want “tar, but won’t make a 700 MB blank ISO9660 image take 700 MB”.
Windows 11 has shipped with bsdtar/libarchive for a few years. The GUI shell support for archive files was recently changed to use libarchive, which has increased the number of archive formats that can be opened in the shell.
That's basically me! I really like 7-Zip because it opens most archive formats I have to work with and also the .7z format has pretty good compression for the stuff I want to store longer term.
In fact, this is the first time I've even heard about it, and I am semi-IT-literate. The prevalence of a compression standard is about how ubiquitous it is. For that one, I would vote "not even on the radar yet".
If by GUI you mean the ability to right-click a .zip file and unzip it through the little window that pops up, you're totally right. At least that, plus the unzipping progress bar, is what I appreciate 7-Zip for.
That's why 7zip should support it. People care about the convenience of the GUI and we all benefit from better compression being accessible with a nice GUI.
If you're expecting a "mobile first" or similar GUI, where most of the screen is dedicated to whitespace, basic features involve 7 or more mouse clicks, and for some reason it all gets redesigned every ~6 months, then yes, the 7-Zip GUI is terrible.
Desktop software usability peaked sometime in the late '90s/early 2000s. There's a reason why 7-Zip still looks like it's ~2004.
You could have taken the 10 seconds to type in "PeaZip GUI" and seen that it is not a mobile interface and it is indeed much nicer than the 7-Zip interface.
Instead you chose to make a useless snarky comment. Be better.
PeaZip is popular? It seems a lot less tested than 7-Zip; the last time I tried to use it, it failed to unpack an archive because the password had a quote character or something like that. I've never had such crazy issues with 7-Zip myself.
I use this one too. It's the best of all the 7-Zip-with-zstd alternatives: the UI is the same, it has all the options, and in my comparisons it was faster than the others at zstd.
Tried it; for zstd, compression is worse than mcmilk's 7-Zip-zstd at the same speed. The removal of text from the toolbar icons is enough for me to never use it again. 7-Zip can change file associations directly and very easily. NanaZip feels worse than 7-Zip in QoL.
Being a bit faster or more efficient won't make most people switch. 7z offers great UX (a convenient GUI and support for many formats) that keeps people around.
Since Windows 11 incorporated libarchive back in October 2023, there is less reason to use 7-Zip on Windows. I would be surprised if any of my friends even knew what a zip file is, let alone zstd.
If you ever try to extract a several-gigabyte archive with hundreds of thousands of files (I know, it's rare), the built-in tool is slow as a turtle compared to 7z.
Glad I'm not the only one who feels this way. WinZip is a slow and bloated abomination, especially compared to 7-Zip. The right-click menu context entry for 7-Zip is very convenient and runs lightning fast. WinZip can't compete at all.
There are lots of 7-Zip-alikes with zstd support (it's effectively a plugin). On [corporate] Windows, NanaZip would be my choice, as it's available in the Windows Store.