pradeepchhetri's comments

Very cool project indeed!

I tried to check what kind of aircraft fly into the world's most dangerous airport (Lukla, Nepal)[0] and found they use the ATR-72 series.

[0] https://adsb.exposed/?dataset=Planes&zoom=12&lat=27.7136&lng...


I prefer to use clickhouse-local for all my CSV needs, as I don't need to learn a new language (or CLI flags) and can just leverage SQL.

    clickhouse local --file medias.csv --query "SELECT edito, count() AS count FROM table GROUP BY ALL ORDER BY count FORMAT PrettyCompact"

    ┌─edito──────┬─count─┐
    │ agence     │     1 │
    │ agrégateur │    10 │
    │ plateforme │    14 │
    │ individu   │    30 │
    │ media      │   423 │
    └────────────┴───────┘
With clickhouse-local, I can do a lot more, since I can leverage the full power of ClickHouse.
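For example, a quick sketch of joining two CSV files via the file() table function (the owners.csv file and its columns are hypothetical here):

    clickhouse local --query "
        SELECT m.edito, count() AS count
        FROM file('medias.csv') AS m
        INNER JOIN file('owners.csv') AS o ON m.id = o.media_id
        GROUP BY ALL
        ORDER BY count DESC
        FORMAT PrettyCompact"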


I used to use q for this sort of thing. Not sure if there are better choices now, as it has been a few years.

https://harelba.github.io/q/


How does it compare with DuckDB, which I usually resort to? What I like about DuckDB is that it's a single binary, no server needed, and it's been happy so far with all the CSV files I've thrown at it.


clickhouse-local is similar to DuckDB: you don't need a clickhouse-server running in order to use clickhouse-local. You just need to download the clickhouse binary and start using it.

  clickhouse local
  ClickHouse local version 25.4.1.1143 (official build).

  :)
There are a few benefits to using clickhouse-local, since ClickHouse can do a lot more than DuckDB. One example is handling compressed files: ClickHouse can read compressed files in formats including zstd, lz4, snappy, gz, xz, bz2, zip, tar, and 7zip.

  clickhouse local --query "SELECT count() FROM file('top-1m-2018-01-10.csv.zip :: *.csv')"
  1000000
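Compression is also detected automatically from the file extension, so a gzipped CSV (hypothetical filename here) can be queried directly:

  clickhouse local --query "SELECT count() FROM file('medias.csv.gz')"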
Also, clickhouse-local is much more efficient at handling big CSV files[0].

[0]: https://www.vantage.sh/blog/clickhouse-local-vs-duckdb


Wanted to try it.

The Debian package is of poor quality: not even sure if clickhouse local is included in there (I believe so), but there is no manpage, no docs at all, and no `clickhouse-server -h`.

Went to the official page looking for a tarball to download, found only the `curl|sh` joke.

Went to github looking for tagged tarballs, couldn't find any. Looked for INSTALL.md, couldn't find any.

Will try harder later, have to weep my tears for now.


ClickHouse is a single binary. It can be invoked as clickhouse-server, clickhouse-client, and clickhouse-local. The help is available as `clickhouse-local --help`. clickhouse-local also has a shorthand alias, `ch`.
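For instance, these are all invocations of the same binary (a quick sketch; output elided, and the ch alias assumes a recent release where it is installed):

  clickhouse local --query "SELECT version()"
  clickhouse-local --query "SELECT version()"
  ch --query "SELECT version()"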

This binary is packaged inside .deb, .rpm, and .tgz, and it is also available for direct download. The curl|sh script selects the platform (x86_64, aarch64 x Linux, Mac, FreeBSD) and downloads the appropriate binary.


> Debian package is of poor quality

Can you elaborate, please? I would love it if you could say what can be improved to bring the Debian package up to standard.


Thank you for your interest.

My comment was really about the state of documentation ("there is no manpage, no doc at all, and no `clickhouse-server -h`"). More specifically:

  % dpkg -S clickhouse-server | grep bin
  clickhouse-server: /usr/sbin/clickhouse-server
  % man clickhouse-server
  No manual entry for clickhouse-server
  % man clickhouse       
  No manual entry for clickhouse
  % /usr/sbin/clickhouse-server --help
  Unknown option specified: help
  % /usr/sbin/clickhouse-server -h    
  Unknown option specified: h
  % ls -l /usr/share/doc/clickhouse-server
  total 60
  -rw-r--r-- 1 root root   235 Dec  5  2022 changelog.Debian.amd64.gz
  -rw-r--r-- 1 root root  1437 Dec  5  2022 changelog.Debian.gz
  -rw-r--r-- 1 root root 33174 Dec 20  2018 changelog.gz
  -rw-r--r-- 1 root root 15057 Oct 29  2022 copyright


I use SQLite in a similar manner, but I'll have to check this out.


ClickHouse has a solid Iceberg integration. It has an Iceberg table function[0] and an Iceberg table engine[1] for interacting with Iceberg data stored in S3, GCS, Azure, Hadoop, etc.

[0] https://clickhouse.com/docs/en/sql-reference/table-functions...

[1] https://clickhouse.com/docs/en/engines/table-engines/integra...
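For instance, a minimal sketch using the table function (the bucket URL and credentials are placeholders):

  SELECT count()
  FROM iceberg('https://my-bucket.s3.amazonaws.com/path/to/table', 'ACCESS_KEY', 'SECRET_KEY')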


I would say it doesn't yet, but it is being actively worked on:

https://github.com/ClickHouse/ClickHouse/issues/52054


DuckDB has the same issue[0]; I submitted a PR, but it has stalled.

0 - https://github.com/duckdb/duckdb-iceberg/pull/78


Oh, they just fixed this 9 days ago, and I guess this comment prompted them to close the issue!


I am looking forward to learning about such upcoming features in the community call: https://clickhouse.com/company/events/v25-1-community-releas...


Oh, and now the developer reopened it because it is not actually fully complete, lol. Yep, Iceberg on ClickHouse is WIP. I am actively watching this because it is relevant for my company.


glad you are using CH :)


> If I had to only pick two databases to deal with, I’d be quite happy with just Postgres and ClickHouse - the former for OLTP, the latter for OLAP.

I completely agree with this statement from the author. In fact, many companies, like Cloudflare, are built with exactly this approach, and it has scaled them pretty well without the need for any third database.

> Another reason I suggest checking out ClickHouse is that it is a joy to operate - deployment, scaling, backups and so on are well documented - even down to setting the right CPU governor is covered.

Another point mentioned by the author that is worth highlighting is the ease of deployment. Most distributed databases aren't easy to run at scale; ClickHouse is much, much easier, and it has become easier still with efficient storage-compute separation.


Sai from ClickHouse here. I have spent the past year living and breathing this, helping customers integrate Postgres and ClickHouse. I totally agree with this statement - there are numerous production-grade workloads solving most of their data problems using these two purpose-built open source databases.

My team at ClickHouse has been working hard to make the integration even more seamless. We work on PeerDB, an open source tool enabling seamless Postgres replication to ClickHouse: https://github.com/PeerDB-io/peerdb/ This integration is now also natively available in the Cloud through ClickPipes. The private preview was released just last week: https://clickhouse.com/cloud/clickpipes/postgres-cdc-connect...


Out of curiosity: why not MySQL? I am also surprised that no one has even mentioned MySQL in any of the comments so far - so it looks like the verdict is very clear on that one.

PS: I am also a fan of Postgres, and we are using it for our startup. But I wouldn't know the answer if someone asked me, why not MySQL? Hence asking.


To my knowledge, both Postgres and MySQL have their own strengths and weaknesses. For example: the MVCC implementation, data replication, connection pooling, and difficulty of upgrades were the major weaknesses of Postgres, and they have much improved over time. Similarly, MySQL's query optimizer is considered less developed than Postgres's.

Overall, I think Postgres's adoption, integrations, and thus its community are much wider than MySQL's, which gives it a major advantage. Also, looking at the number of database-as-a-service companies built on Postgres versus MySQL, we can immediately see that Postgres is much more widely adopted.


A few other things I would add:

- MySQL performs a bit better when reading by primary key

- Postgres performs a bit better when doing random inserts/updates.

- With MySQL you don't need to worry about vacuums.

- The MySQL query optimizer is nice because you can give it hints when it misbehaves. This can be a godsend during certain production incidents (see the sketch after this list).

- Last I checked, MySQL still has a nicer scaling story than Postgres, but I'm not sure what the latest here is.

- Connection pooling is still heavily in MySQL's favor, i.e., you don't need PgBouncer for lots of scenarios.
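A quick sketch of what such hints look like in MySQL 8.0 (the table and index names are hypothetical):

  SELECT /*+ MAX_EXECUTION_TIME(1000) */ o.id, o.total
  FROM orders o FORCE INDEX (idx_created_at)
  WHERE o.created_at > '2024-01-01';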


There was an article from Uber on why they shifted from Postgres to MySQL: https://www.uber.com/en-IN/blog/postgres-to-mysql-migration/

I don't know how many of that article's points are still valid.

The other point in favor of MySQL (in my opinion) is that there are lots of companies that use MySQL in production, so the access patterns and its quirks are very well defined. Companies like Square, YouTube, Meta, Pinterest, and now Uber all use MySQL. From Blind, Stripe was also thinking of moving all its fleet from Mongo to MySQL.

Perception-wise, it looks like companies needing internet-scale data are using MySQL.


I think this might come down to… Oracle.

Obviously there are alternatives like MariaDB, but Postgres is a quality, long-standing open source solution.


Streaming queries are coming to ClickHouse too: https://github.com/ClickHouse/ClickHouse/pull/63312


There was a great talk[0] recently from a Laravel core team member about ClickHouse. You will probably enjoy watching it.

[0] https://www.youtube.com/watch?v=_jjvaFWWKqg


You should look at ClickHouse, which has a good PHP client[0].

[0] https://github.com/smi2/phpClickHouse


Since analytics data is generally write-heavy, I would recommend using ClickHouse. You can use the async insert[0] feature of ClickHouse, so you don't need to worry about batching events on your side. If you are looking for an embedded solution, you can use chDB, which is built on top of ClickHouse.

[0] https://clickhouse.com/blog/asynchronous-data-inserts-in-cli...
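A minimal sketch of enabling it per query (the events table and its columns are hypothetical; the settings are described in the linked post):

  INSERT INTO events SETTINGS async_insert = 1, wait_for_async_insert = 1
  VALUES ('2024-01-01 00:00:00', 'page_view')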


You can leverage ClickHouse to process your music data. ClickHouse supports both the TSV[0] and XML[1] data formats.

[0] https://clickhouse.com/docs/en/interfaces/formats#tabseparat...

[1] https://clickhouse.com/docs/en/interfaces/formats#xml
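For example, a quick sketch reading a TSV file and emitting XML (the filename and columns are hypothetical; note that XML is an output-only format):

  clickhouse local --query "SELECT * FROM file('tracks.tsv', TabSeparatedWithNames) LIMIT 5 FORMAT XML"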


This syntax looks a lot like PRQL. ClickHouse supports writing queries in the PRQL dialect. Moreover, ClickHouse also supports the Kusto dialect.

https://clickhouse.com/docs/en/guides/developer/alternative-...
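A minimal sketch, passing the dialect setting as a CLI option (the tracks table/file is hypothetical, and the PRQL dialect may be experimental depending on your version):

  clickhouse local --dialect prql --query "from tracks | filter plays > 100 | take 10"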

