I had to parse a database backup from Firebase, which was, remarkably, a 300GB JSON file. The database is a tree rooted at a single object, which means that any tool that attempted to stream individual objects wanted to buffer that single 300GB root object. It wasn’t enough to strip off the root either, as the really big records were arrays a couple of levels down, with a few different formats depending on the schema. For added fun, our data included some JSON serialised inside strings too.
This was a few years ago and I threw every tool and language I could at it, but they were either far too slow or buffered records larger than memory; even the fancy C++ SIMD parsers did this. I eventually got something working in Go and it was impressively fast and ran on my MacBook, but we never ended up using it as another engineer just wrote a script that read the entire database from the Firebase API record-by-record, throttled over several days, lol.
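Something along these lines is doable with Go's encoding/json on its own: Decoder.Token() hands back one token at a time and never materialises the enclosing object, so memory stays bounded by the largest single token rather than the largest record. A rough sketch, not the actual tool I wrote; the file name and the depth at which the record keys live are assumptions:

    package main

    import (
        "encoding/json"
        "fmt"
        "io"
        "log"
        "os"
    )

    func main() {
        // "backup.json" is a stand-in for the real dump path.
        f, err := os.Open("backup.json")
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        dec := json.NewDecoder(f)
        depth := 0
        for {
            tok, err := dec.Token()
            if err == io.EOF {
                break
            }
            if err != nil {
                log.Fatal(err)
            }
            switch t := tok.(type) {
            case json.Delim:
                // '{' and '[' open a level, '}' and ']' close one.
                if t == '{' || t == '[' {
                    depth++
                } else {
                    depth--
                }
            case string:
                // Directly under the root, string tokens alternate between
                // keys and string values; for a tree of objects they are the
                // record keys. A real tool would descend further and rebuild
                // each record from the tokens below the depth it cares about.
                if depth == 1 {
                    fmt.Println("top-level key:", t)
                }
            }
        }
    }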
> Utf8JsonReader is a high-performance, low allocation, forward-only reader for UTF-8 encoded JSON text, read from a ReadOnlySpan<byte> or ReadOnlySequence<byte>
Although it's a bit cumbersome to use with a stream [2].
I downloaded a huge Firebase backup looking for a particular record.
I ended up using the “split” shell command to get a bunch of 1 GB files, then grepping for which file had the record I was looking for, then using my own custom script that scanned outward from the position of the matched text until it detected a valid, parsable JSON object within the larger unparseable file, and returned that.
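The “scan outward” step can be approximated by trying to decode a single JSON value from each “{” to the left of the match and keeping the first one that parses and still covers the match. A rough Go sketch, not the original script; the chunk name and search term are placeholders:

    package main

    import (
        "bytes"
        "encoding/json"
        "fmt"
        "os"
    )

    // extractAround walks left from the match position and, at each '{',
    // tries to decode one complete JSON value starting there. The first
    // attempt that parses and spans the matched text is returned.
    func extractAround(data []byte, matchPos int) (json.RawMessage, bool) {
        for start := matchPos; start >= 0; start-- {
            if data[start] != '{' {
                continue
            }
            dec := json.NewDecoder(bytes.NewReader(data[start:]))
            var raw json.RawMessage
            if err := dec.Decode(&raw); err != nil {
                continue // not a complete object from here; keep walking left
            }
            if start+int(dec.InputOffset()) > matchPos {
                return raw, true // the decoded object covers the match
            }
        }
        return nil, false
    }

    func main() {
        // "xaa" is a stand-in for one of the 1 GB split chunks.
        data, err := os.ReadFile("xaa")
        if err != nil {
            panic(err)
        }
        needle := []byte(`"some-record-id"`) // hypothetical search term
        if pos := bytes.Index(data, needle); pos >= 0 {
            if obj, ok := extractAround(data, pos); ok {
                fmt.Println(string(obj))
            }
        }
    }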
Back in the bad old days when XML consumers hit similar problems we’d use an event-based parser like SAX. I’m a little shocked there isn’t a mainstream equivalent for JSON — is there something I’ve missed?
For JSON, given that large files are generally record-based, ndjson is the solution I’ve encountered (http://ndjson.org/), and it works nicely with various tools out there using the .ndjson file extension.
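Since each line is an independent JSON document, you can consume it a record at a time in constant memory. A minimal Go sketch, where the file name and the "id" field are placeholders:

    package main

    import (
        "bufio"
        "encoding/json"
        "fmt"
        "log"
        "os"
    )

    func main() {
        f, err := os.Open("records.ndjson") // hypothetical file name
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        sc := bufio.NewScanner(f)
        // Raise the scanner's line limit so long records don't error out.
        sc.Buffer(make([]byte, 0, 1024*1024), 64*1024*1024)
        for sc.Scan() {
            var rec map[string]any
            if err := json.Unmarshal(sc.Bytes(), &rec); err != nil {
                log.Fatal(err)
            }
            fmt.Println(rec["id"]) // assumes records carry an "id" field
        }
        if err := sc.Err(); err != nil {
            log.Fatal(err)
        }
    }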