One thing I admire about Snowflake is a real commitment to self-cannibalization. They were super out front with Iceberg even though it could disrupt them, because that's what customers were asking for, and they're willing to bet they'll figure out how to make money in that new world.
Have you interacted with Snowflake teams much? We are using external Iceberg tables with Snowflake. Every interaction pretty much boils down to "you really should not be using Iceberg, you should be using Snowflake for storage." It's also pretty obvious some things are strategically not implemented to push you very strongly in that direction.
Not surprised - this stuff isn’t fully mature yet. But I interact with their team a lot and know they have a commitment to it (I’m the other guy in that video)
Even partition elimination is pretty primitive. For the query optimizer, Iceberg is really not a primary target. The overall interaction, even with technical people, gives off a strong "this is a sales org that happens to own an OLAP db product" vibe.
I have to very much disagree on that.
All pruning techniques in Snowflake work equally well on their proprietary format and on Iceberg tables. Iceberg is nowadays a first-class citizen in Snowflake, with pruning working at the file level, row group level, and page level. The same is true for other query optimization techniques. There is even a paper on that: https://arxiv.org/abs/2504.11540
Where pruning differences might arise for Iceberg tables is the structure of the Parquet files and the availability of metadata. Both depend on the writer of the Parquet files. Metadata might be completely missing (e.g., no per-column min/max) or partially missing (e.g., no page indexes), which will indeed impact performance. This is why it's super important to choose a writer that produces rich metadata. The metadata can be backfilled / recomputed after the fact by the querying engine, but that comes at a cost.
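For illustration, here's a minimal sketch (assuming a recent pyarrow; the file name and columns are made up) of writing Parquet with the kind of metadata engines can prune on:

```python
# Minimal sketch: write Parquet with per-column min/max statistics and
# page indexes (column/offset indexes), which enable row-group- and
# page-level pruning. Columns and file name are illustrative.
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({
    "event_date": ["2024-01-01", "2024-01-02", "2024-01-03"],
    "amount": [10.5, 20.0, 7.25],
})

pq.write_table(
    table,
    "events.parquet",
    write_statistics=True,      # per-row-group, per-column min/max
    write_page_index=True,      # page-level column/offset indexes
    row_group_size=128 * 1024,  # rows per row group; tune to your data
)
```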
Another aspect is storage optimization: the ability to skip / prune files is intrinsically tied to how well the table's storage is optimized. If the table is neither clustered nor partitioned, or if it has sub-optimally sized files, then any engine's ability to skip files or subsets thereof will be severely impacted.
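As a rough illustration of the file-sizing point, here's a sketch using pyarrow's dataset API on a plain Parquet directory (paths and row counts are made up; an actual Iceberg table would be compacted through its catalog/engine rather than by rewriting files directly):

```python
# Sketch only: compact many small Parquet files into fewer, larger ones
# so an engine has well-sized files and row groups to prune against.
import pyarrow.dataset as ds

small_files = ds.dataset("data/events_raw/", format="parquet")

ds.write_dataset(
    small_files,
    "data/events_compacted/",
    format="parquet",
    max_rows_per_file=4_000_000,   # aim for a few hundred MB per file
    max_rows_per_group=1_000_000,  # keep row groups at a prunable size
)
```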
I would be very curious if you can find a query on an Iceberg table that shows a better partition elimination rate in a different system.
Supporting Iceberg eventually means people leaving you because they have something better elsewhere, but this is bidirectional: it also means you can welcome people from Databricks because you have feature parity.
It's not going to scale as well as Snowflake, but it gets you into an Iceberg ecosystem which Snowflake can ingest and process at scale. Analytical data systems are typically trending toward heterogeneous compute with a shared storage backend -- you have large, autoscaling systems to process the raw data down to something that is usable by a smaller, cheaper query engine supporting UIs/services.
Different parts of the analytical stack have different performance requirements and characteristics. Maybe none of your stack needs it and so you never need Snowflake at all.
More likely, you don't need Snowflake to process queries from your BI tools (Mode, Tableau, Superset, etc.), but you do need it to prepare data for those BI tools. It's entirely possible that you have hundreds of terabytes, if not petabytes, of input data that you want to pare down to < 1 TB datasets for querying, and Snowflake can chew through those datasets. There are also third-party integrations and things like ML tooling that you need to consider.
You shouldn't really consider analytical systems the same as a database backing a service. Analytical systems are designed to funnel large datasets that cover the entire business (cross-cutting services and any sharding you've done) into subsequently smaller datasets that are cheaper and faster to query. And you may be using different compute engines for different parts of these pipelines; there's a good chance you're not using only Snowflake but Snowflake and a bunch of different tools.
When we first developed pg_lake at Crunchy Data and defined GTM, we considered whether it could be a Snowflake competitor, but we quickly realised that did not make sense.
Data platforms like Snowflake are built as a central place to collect your organisation's data, do governance, large scale analytics, AI model training and inference, share data within and across orgs, build and deploy data products, etc. These are not jobs for a Postgres server.
Pg_lake foremost targets Postgres users who currently need complex ETL pipelines to get data in and out of Postgres, and accidental Postgres data warehouses where you ended up overloading your server with slow analytical queries, but you still want to keep using Postgres.
For testing, we at least have a Dockerfile to automate the setup of the pgduck_server and a MinIO instance, so it Just Works™ once the extensions are installed in your local Postgres cluster.
The configuration mainly involves defining the default Iceberg location for new tables, pointing it to the pgduck_server, and providing the appropriate auth/secrets for your bucket access.
This feels like a contentless article. They gave a statistic and crafted a narrative based on one person’s experience, which leaves me with many more questions than answers.
Agreed, a very shallow article with a few personal opinions. A little breather for Austin proper may be a good thing; housing prices and rents have come down significantly from a few years ago. The infrastructure is presently a mess and will be for the next few years, along with the airport expansion. But the surrounding areas are still growing quickly, and there is no shortage of interesting startups in the area. The one obvious thing the article misses is the weather, which simply is not for everyone.
This is why one should never waste time clicking on links without numbers, specifically the change in the distribution of the data (deciles/quintiles), or at least the median.
If there are no numbers about the distribution, there can be no evidence for the article's claim, hence it is useless and meant to evoke emotion.
I’m a big fan of chezmoi (https://www.chezmoi.io/) which is a very capable dotfile manager. Chezmoi supports some useful advanced capabilities like work/home profiles and secrets manager integration.
Same for me. I'd done the same thing as the author with various methods like stow, symlink farms, etc. over the years. Chezmoi is good enough that I'm willing to let someone else handle maintaining all the logic.
Yup, I tried a number of dotfile managers. I think yadm was the first one I started with and then ended up with chezmoi.
The main reason was that I discovered the power of templating. With yadm it required an external dependency, first envtpl, then j2cli, and both of these became unmaintained, while chezmoi uses the Go text/template standard library. After the task of converting my jinja2 templates to gotmpl I never looked back.
One of the other things I like about chezmoi is that I cut my "scripts" down to just a few, as most of the logic became "deterministic", i.e. I set conditions based on the host in chezmoi.toml.tmpl, and that defines how everything under it runs across multiple hosts and devices.
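For anyone curious what that looks like, here is a minimal sketch (the hostname, file names, and values are made up; `.chezmoi.hostname` and config-defined data like `.work` are standard chezmoi template inputs):

```
# ~/.local/share/chezmoi/.chezmoi.toml.tmpl (sketch)
{{- $work := eq .chezmoi.hostname "work-laptop" -}}
[data]
    work = {{ $work }}

# ~/.local/share/chezmoi/dot_gitconfig.tmpl (sketch)
[user]
    name = Jane Doe
{{- if .work }}
    email = jane@company.example
{{- else }}
    email = jane@home.example
{{- end }}
```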
I migrated to chezmoi recently. My only gripe is `chezmoi cd` opening in a new shell, but `chezmoi git` is usually what I need. The age [0] integration is nice.
I added an alias `cm='cd $(chezmoi source-path)'` to my shell config to cd to the chezmoi directory (without opening a new shell) so I can use all the usual commands (e.g. git) without needing the chezmoi prefix. The alias is in a chezmoi-managed file, naturally.
Hey, I had never heard about chezmoi before reading your comment, but I just installed it. Took less than 10 minutes to set up from start to finish. I noticed that if you choose to use it to manage your `~/.ssh/config/`, by default chezmoi sets it up as `private_dot_ssh/` and so if your dotfiles are public it doesn't expose sensitive data like private key files such as `~/.ssh/id_rsa`. Smart!
The private_ prefix only applies to file permissions, so in this case it makes the .ssh directory readable only by the owner. This is checked for by OpenSSH, and the config will be ignored if it's readable by the group or all.
If you make your dotfiles repo publicly accessible, you will leak your private keys unless you use other features in chezmoi to protect them.
Also a big fan of it, because the templating feature makes it very easy to handle dotfiles that live in different locations on multiple machines, or if you use multiple operating systems. There really aren't that many tools around that have good Windows support.
You jest, but with an ESP32 flashed with ESPHome and a few dollars of electronics to regulate the power, I think controlling model trains would actually be quite doable. Your biggest challenge is probably dealing with network/scripting latency for events that need to happen in quick succession, like when dealing with switches.
Edit: there's also this https://github.com/aaron9589/esphome-for-model-railroading for the more serious model railroad enthusiast, though I'm not 100% sure if that actually controls the trains themselves (or just the switches and lights)
That's why I try as hard as I can to find either truly "dumb" devices with mechanical switches rather than momentary buttons, or devices that remember their last state after AC power is restored. It's hard to figure out the second option without trying it, though, unless a review happens to mention it specifically.
Awesome. The Home Assistant and related (ESPHome, Voice Assistant, Music Assistant) communities are amazing, and there is just a crazy number of projects one can pick up and use.
In my opinion, psql is the "perfect" terminal database client. It's fast, has the ability to easily switch between wide and narrow row formats, and has commands for the things I do frequently.
I have often wanted psql to work with other databases, because the other CLI clients are either bare-bones, or just plain unusable.
I heard a story about an Intel fab in Arizona that would always produce bad silicon at a certain time of day. After some investigation it was determined that a train passed by at that time every day, causing enough seismic activity to disrupt the manufacturing process.
Actually, I love this. I remember having a teeny tiny phone back then (not quite Zoolander-sized), and it was a lot of fun. Now everything is a boring rectangle of glass. Bring back the fun!
Have you tried any of the "human or Dall-E" tests?
How did you score?
I only scored as well as I did because I knew the kind of stylistic choices to look out for. In terms of "quality" I really don't understand how you've reached this conclusion.