I'm pretty happy to see this, as conditionals can really help keep code manageable when trying to define CSS variables or other properties based on combinations of light mode, dark mode, high-contrast, contextual state in a document or component, etc.
if() isn't the only way to do this, though. We've been using a technique in Review Board that's roughly equivalent to if(), but compatible with any browser supporting CSS variables. It involves:
1. Defining your conditions based on selectors/media queries (say, a dark mode media selector, light mode, some data attribute on a component, etc.).
2. Defining a set of related CSS variables within those to mark which are TRUE (using an empty value) and which are FALSE (`initial`).
3. Using those CSS variables with fallback syntax to choose a value based on which is TRUE (using `var(--my-state, fallback)` syntax).

I wrote about it all here, with a handful of working examples: https://chipx86.blog/2025/08/08/what-if-using-conditional-cs...

Also includes a comparison between if() and this approach, so you can more easily get a sense of how they both work.
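Roughly, the pattern from that list ends up looking something like this (a minimal sketch of the idea; the class name and colors are just for illustration, not code from the post):

:root {
  /* An empty value means TRUE; `initial` means FALSE. Light is the default. */
  --if-light: ;
  --if-dark: initial;
}

@media (prefers-color-scheme: dark) {
  :root {
    --if-light: initial;
    --if-dark: ;
  }
}

.card {
  /* This is only a valid value when --if-dark is TRUE (empty). Otherwise it's
     guaranteed-invalid, and the var() fallback below is used instead. */
  --card-bg-dark: var(--if-dark) #1c1e21;

  background: var(--card-bg-dark, #ffffff);
}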
I recently implemented dark/light mode for the first time and was really surprised to find that, in order to add a toggle, I had to duplicate a bunch of vars/styles and use JavaScript. I'm looking forward to not having to deal with that cruft in the future.
Love seeing this one. My uncle was co-founder of Quarterdeck, and I grew up in a world of DESQview and QEMM. It was a big influence on me as a child.
Got a good family story about that whole acquisition attempt, but I don't want to speak publicly on behalf of my uncle. I know we've talked at length about the what-ifs of that moment.
I do have a scattering of some neat Quarterdeck memorabilia I can share, though:
This is a great feature for those who want a mix of positional and keyword-only arguments.
I should have mentioned originally (and I've since updated my post) that this and the kw_only= flag both require Python 3.10 or higher, so code that works with older versions can't opt into it yet.
That's annoying for sure. Though a different problem.
All the kw_only=True argument for dataclasses does is require that you pass any fields you want to provide as keyword arguments instead of positional arguments when instantiating a dataclass. So:
obj = MyDataclass(a=1, b=2, c=3)
Instead of:
obj = MyDataclass(1, 2, 3) # This would be an error with kw_only=True
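For reference, MyDataclass here would be defined something like this (a minimal sketch):

from dataclasses import dataclass

@dataclass(kw_only=True)
class MyDataclass:
    a: int
    b: int
    c: int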
The problem you're describing in boto3 (and a lot of other API bindings, and a lot of more layered Python code) is that methods often take in **kwargs and pass them down to a common function that's handling them. From the caller's perspective, **kwargs is a black box with no details on what's in there. Without a docstring or an understanding of the call chain, it's not helpful.
Python sort of has a fix for this now, which is to use a TypedDict to define all the possible values in the **kwargs, like so:
from typing import TypedDict, Unpack


class MyFuncKwargs(TypedDict):
    arg1: str
    arg2: str
    arg3: int | None


def my_outer_func(
    **kwargs: Unpack[MyFuncKwargs],
) -> None:
    _my_inner_func(**kwargs)


def _my_inner_func(
    *,
    arg1: str,
    arg2: str,
    arg3: int | None,
) -> None:
    ...
By defining a TypedDict and typing **kwargs, the IDE and docs can do a better job of showing what arguments the function really takes, and validating them.
Also useful when the function is just a wrapper around serializing **kwargs to JSON for an API, or something.
But this feature is far from free to use. The more functions you have, the more of these you need to create and maintain.
Ideally, a function could type **kwargs as something like this (hypothetical syntax, nothing Python supports today):
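def my_outer_func(
    # Hypothetical: reference the wrapped function's signature directly.
    **kwargs: Unpack[_my_inner_func],
) -> None:
    _my_inner_func(**kwargs)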
And then the IDEs and other tooling can just reference that function. This would help make the problem go away for many of the cases where **kwargs is used and passed around.
I don't see a point in using them in new code when I could just use a dataclass (or Pydantic in certain contexts). I've only found them useful when interfacing with older code that uses dicts for structured data.
That's always the challenge when iterating on interfaces that other people depend on.
What we do is go through a deprecation phase. Our process is:
* We provide compatibility with the old signature for 2 major releases.
* We document the change and the timeline clearly in the docstring.
* The function gets decorated with a helper that checks the call and, if any keyword-only arguments are provided as positional, warns and converts them to keyword arguments.
* After 2 major releases, we move fully to the new signature.
We built a Python library called housekeeping (https://github.com/beanbaginc/housekeeping) to help with this. One of the things it contains is a decorator called `@deprecate_non_keyword_only_args`, which takes a deprecation warning class and wraps a function that already uses the signature we're moving to. That decorator handles the check logic and generates a suitable, consistent deprecation message.
That normally looks like:
@deprecate_non_keyword_only_args(MyDeprecationWarning)
def my_func(*, a, b, c):
    ...
But this is a bit more tricky with dataclasses, since `__init__()` is generated automatically. Fortunately, it can be patched after the fact. A bit less clean, but doable.
So here's how we'd handle this case with dataclasses:
from dataclasses import dataclass

from housekeeping import BaseRemovedInWarning, deprecate_non_keyword_only_args


class RemovedInMyProject20Warning(BaseRemovedInWarning):
    product = 'MyProject'
    version = '2.0'


@dataclass(kw_only=True)
class MyDataclass:
    a: int
    b: int
    c: str


MyDataclass.__init__ = deprecate_non_keyword_only_args(
    RemovedInMyProject20Warning,
)(MyDataclass.__init__)
Call it with some positional arguments:
dc = MyDataclass(1, 2, c='hi')
and you'd get:
testdataclass.py:26: RemovedInMyProject20Warning: Positional arguments `a`, `b` must be passed as keyword arguments when calling `__main__.MyDataclass.__init__()`. Passing these as keyword arguments will be required in MyProject 2.0.
dc = MyDataclass(1, 2, c='hi')
We'll probably add explicit dataclass support to this soon, since we're starting to move to kw_only=True for dataclasses.
Generally speaking, you probably shouldn't have to deal with these problems unless you're writing a tool that has to interface with certain SCMs, or with SCMs used in certain environments. I'll give you some examples for each of these points:
1. There are two important areas where encoding can matter: The filename and the diff content.
Git pays attention to filename encoding, but most SCMs don't, so when a diff is generated, the filename is written in the local encoding. If there are any non-ASCII characters in that filename, a diff generated in one environment with one encoding can fail to apply in another (or, in our case, the file can't be looked up in the repository). This isn't common, but it can happen (we've seen it with Perforce and Subversion).
Then there's the content. Many SCMs will actually give you a representation of a text file and not the raw contents itself. That text file will be re-encoded for your local/preferred encoding, and newlines may be adjusted as well (`\r\n`, `\n`). The text file is then re-encoded back when pushing the change. This allows people in different environments to operate on the same file regardless of what encoding they're working with.
This doesn't necessarily make its way into the diff, though. So when you send a diff generated with a less-common encoding to a tool for processing, and that tool tries to apply it to a file checked out with a different encoding, it can fail to patch.
The solution is to either know the encoding of the file you're processing, or try to guess it (some tools, like ours, let you specify a list of preferred encodings to try).
It's best if you can know it up-front.
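As a rough sketch of that guessing approach (the function name and the encoding list here are just illustrative, not our actual implementation):

def decode_diff_text(data, preferred_encodings=('utf-8', 'cp1252', 'shift_jis')):
    # Try each preferred encoding in order.
    for encoding in preferred_encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue

    # Last resort: a lossy decode, so processing can continue.
    return data.decode('utf-8', errors='replace')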
Bonus Fun Fact: On some SCMs (Perforce comes to mind), checking out a file on Windows and then diffing it on Linux via a shared mount can get you a diff with `\r\r\n` newlines. It's a bad time and breaks patching. And it used to come up a lot, until we worked around it.
Also, Perforce for a while would sometimes normalize encodings incorrectly and you'd end up with BOMs in the diff, breaking GNU patch.
2. It does when you're working with them directly for applying and patching. If you're handing them off to a tool for processing and there's any risk of one file in a sequence not being included, you can end up with breakages that you might not see until later processing.
It's also just really nice having all the state and metadata up-front so we can process it in one go in a consistent way without having to sanity-check all the diffs against each other.
When working locally, it also depends on your tooling. `git format-patch` and `git am` are great, but are for Git. If I'm working with (let's just say) Subversion, I need to do my own thing or find another tool.
3. It's critical for the kind of information needed to locate files in a repository. Some systems need a commit-wide identifier. Some need per-file identifiers. Some need a combination of the two. Some need those plus additional data not otherwise represented in the path or revision (generally more enterprise SCMs targeting certain use cases).
It's also critical for representing information that isn't in the Unified Diff format (namely, anything but the filename). So, symlink information, file modes, SCM-specific properties on a file or directory, to name a few. This information needs to live somewhere if a SCM provides it, and it's up to every SCM to choose how and where to store that data (and then how it's encoded, etc.).
> Then there's the content. Many SCMs will actually give you a representation of a text file and not the raw contents itself. That text file will be re-encoded for your local/preferred encoding, and newlines may be adjusted as well (`\r\n`, `\n`). The text file is then re-encoded back when pushing the change.
Yeah, don't do that.
> This allows people in different environments to operate on the same file regardless of what encoding they're working with.
No, it causes hard-to-understand bugs, because now what people see on their device and what is tracked in source control differ, defeating the entire purpose of having source control in the first place. This isn't theoretical at all, btw.
> The solution is to either know the encoding of the file you're processing
In general, there is no such encoding - source control tools need to be able to deal with files not valid in any single encoding.
We build a code review product that interfaces with over a dozen SCMs. In about 20 years of writing diff parsers, we've encountered all kinds of problems and limitations in SCM-generated diff files (which we have to process) that we never would have expected to have to think about. This all comes from the pain points and lessons learned in that work, and it's been a huge help in solving them for us.
These aren't problems end users should hopefully ever need to worry about, but they're problems that tools need to worry about and work around. Especially for SCMs that don't have a diff format of their own, have one that is missing data (in some, not all changes can be represented, e.g. deleted files), or don't include enough information for another tool to identify the file in a repository.
Better file formats cannot, by themselves, improve an inferior SCM tool that, for instance, processes files with the wrong text encoding or forgets deleted and renamed files: they would only have helped you for the purpose of developing your code review tool.
Standards are meant for interchange, like (as mentioned in other comments) producing a patch file by any means and having someone else apply it regardless of what they use for version control.
This isn't a tool for viewing changes to files or to ASTs. It's a way to generate a single diff file, for processing or patching, that addresses the kinds of problems we've encountered in over 20 years of building diff parsing tooling and working with over a dozen SCMs whose bespoke diff formats vary in completeness and brokenness.
It's not an end user tool, but a useful format for tools like code review products to use.
In the early drafts, we played with a number of approaches for the structure. Things like "commit-meta", etc. In the end, we broke it down into `#<section_level><section_type>`, just to simplify the parsing requirements. Every meta block is a meta block, and knowing what section level you're supposed to be in and comparing it to the section level you got becomes a matter of "count the dots".
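For instance, a parser's section-header handling stays tiny (a rough Python sketch of the idea, not the actual reference parser):

def parse_section_header(line):
    """Parse a header like "#..meta: format=json, length=270"."""
    assert line.startswith('#')

    name, _, options = line[1:].rstrip().partition(':')
    level = len(name) - len(name.lstrip('.'))  # Section level == number of dots.
    section_type = name.lstrip('.')            # e.g. "meta"

    opts = {}

    for item in options.split(','):
        if '=' in item:
            key, _, value = item.strip().partition('=')
            opts[key] = value

    return level, section_type, opts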
The header formats are meant to be very simple key/value pairs that are known by the parser, and not free-form bits of metadata. That's what the "meta" blocks are for. The parsing rules for the header are intentionally very simple.
JSON was chosen after a lot of discussion between us and outside parties and after experimentation with other grammars. The header for a meta block can specify a format used to serialize the data, in case down the road something supplants JSON in a meaningful way. We didn't want to box ourselves in, but we also don't want to just let any format sit in there (as that brings us back to the format compatibility headaches we face today).
For the other notes:
1. Compatibility is a key factor here, so we'd want to go with base-level JSON. I'd prefer being able to have trailing commas in lists, but not enough to make life hard for someone implementing this without access to a JSON5 parser.
2. If your goal is to simply feed to GNU patch (or similar), you can still split it. This extra data is in the Unified Diff "garbage" areas, so they'll be ignored anyway (so long as they don't conflict, and we take care to ensure that in our recommendations on encoding).
If your goal is to split into two DiffX files, it does become more complicated in that you'd need to re-add the leading headers.
That said, not all diff formats used in the wild can be split and still retain all metadata. Mercurial diffs, for example, have a header that must be present at the top to indicate parent commit information. You can remove that and still feed to GNU patch, but Mercurial (or tools supporting the format) will no longer have the information on the parent commit.
3. Revisions depend heavily on the SCM. Some SCMs use a commit identifier. Some use per-file identifiers. Some use a combination of the two. Some use those plus additional information that either gets injected into the diff or needs to be known out-of-band. There's a wide variety of requirements here across the SCM landscape.
> The header formats are meant to be very simple key/value pairs that are known by the parser, and not free-form bits of metadata. That's what the "meta" blocks are for.
One more thing you should prepare for whenever you have "free-form bits of metadata": they somehow turn into "some user was storing 100MB blobs in there, and that broke our other thing".
> 1. Compatibility is a key factor here, so we'd want to go with base-level JSON. I'd prefer being able to have trailing commas in lists, but not enough to make life hard for someone implementing this without access to a JSON5 parser.
This is what I was referring to. This is not json:
> #..meta: format=json, length=270
> The header formats are meant to be very simple key/value pairs that are known by the parser, and not free-form bits of metadata. That's what the "meta" blocks are for. The parsing rules for the header are intentionally very simple.
Exactly my point. That level of flexibility for a .patch format to support another language embedded in it is overwhelming. Keep in mind that you are proposing a textual format, not a binary format. So people will use 3rd party text parsing tools to play with it. And having 2 distinct languages in there makes that annoying and a pain.
How do they reasonably work around that, though? If they want the ability to move away from JSON, you have to know that it is JSON before trying to parse it. And then you need to know how much data to read. So I can see why they put those 2 tidbits of info above the data block.
Maybe they could have said too bad, JSON for life, we'll never change it. OK. But then you still need the length or a delimiter for the "end of json".
> Compatibility is a key factor here, so we'd want to go with base-level JSON. I'd prefer being able to have trailing commas in lists, but not enough to make life hard for someone implementing this without access to a JSON5 parser.
Everyone has access to a JSON5 parser. Everyone has to suffer for the sake of a few people who don't want to pay the trifling tax of pip installing something, when they're using an external library for a novel file format _anyway_?
That's just a lack of imagination. When you're making a product for teams that span everything from a brand new startup using the latest tooling to teams that are working on software that runs on embedded systems from the 90's, you need to consider things like this.
There are JSON5 parsers written in C89 out there. And your embedded systems from the 90s probably don't have a JSON parser built in at all either... If you're going to build your own JSON parser, adding JSON5 support on top is really trivial.
That doesn't mean it's not going to be difficult to use that parser. Not everyone has the luxury of being able to use third-party code, or having the time allotted to write a JSON5 parser. The JSON parser some places are using may have been written two decades ago and works well enough that there's little motivation to implement JSON5 support. Sometimes it's just company policy or internal politics that prevent the usage.
It's also just not that big a deal overall for the intended use of the DiffX format. It's mainly machine-generated and machine-consumed. There are human-readability concerns, for sure, but the format looks to be designed mainly for tools to create and consume, so missing a few features that JSON5 brings is not that big of a deal.
"That doesn't mean it's not going to be difficult to use that parser. Not everyone has the luxury of being able to use third-party code, or having the time allotted to write a JSON5 parser."
Why are these people the target market?
I understand it may be important to you, but that isn't the same as "matters to target market/audience".
On top of that, the same constraints you mention here would stop you parsing current git patch formats, and lots of other things anyway. So you were never going to be using modern tools that might care here.
This is all also really meta. Who, exactly, is writing software with >1% market share that needs to parse the patch format and can't access a JSON parser?
Instead of this theoretical discussion, let's have a concrete one.
In this specific instance, those people are part of the target market because the project chooses to make them part of the target market. It's worked well enough for Review Board.
So the whole world should suffer through vanilla JSON because someone, somewhere, has an overbearing and paranoid software approval process? That's the attitude that delayed universal Unicode adoption by a decade.
That's a bit dramatic. This isn't something as universal as Unicode. You really only need to care about this if you're writing tools that generate or consume the DiffX format, which is not something most people will be doing. The whole world isn't suffering their decision to use JSON instead of JSON5.
I don't think this is true, and honestly, I think it would be a mistake to consider it - they can't serve everyone, down that path is madness.
FWIW - I even have a JSON parser in my RTOS-that-must-run-in-less-than-512k.
I also think the target of "embedded systems from the 90's" makes no sense, because the tooling for the embedded system (which is what would conceivably want to handle the patch format) ran on the host, which easily had access to a JSON parser.
But let's assume it does matter - let's be super concrete - assume they want to serve 95-99% of the users of patch format (I doubt it's even that high).
Which exact pieces of software with even >1% market share that need to process patch format don't have access to a JSON parser?
Git is using a proprietary variant on top of Unified Diffs. Unified Diffs themselves convey very little information about the file being modified, focusing solely on the line-based contents of text files and allowing vendors to provide their own "garbage" lines containing anything else. Every SCM that tracks information beyond line changes in a diff fills out the garbage data differently.
The intent here isn't to let you copy changes from one type of repository to another, but to have a format that can be generated from many SCMs that a tool could parse in a consistent way.
Right now, tools working with diffs from multiple types of SCMs need at least one diff parser per SCM (some provide multiple formats, or have significantly changed compatibility between releases).
For SCMs that lack a diff format (there are several) or lack one that contains enough information to identify a file or its changes (there are several), tools also need to choose a method to represent that information. That often means yet another custom diff format that is specific to the tool consuming the diff.
We've spent over 20 years dealing with the headaches and pain points here, giving it a lot of thought. DiffX (which is now a few years old itself) has worked out very well as a solution for us. This wasn't done in a vacuum, but rather has gone through many rounds of discussion with developers at a few different SCM vendors who have given thought to these issues and supplied much valuable feedback and improvements for the spec.
Created by the Git team for Git's purposes, rather than something documented or proposed for wider adoption.
Other SCMs can and do use a Git-style diff format, but as there's no defined grammar, there are sometimes important differences. For example, Mercurial's Git-style diffs represent revisions in a different format than Git's (and with different meanings), reuse Git "index" lines for binary files but include SHAs of the file contents instead of any sort of revision, and have a header block that should be stripped out before sending to a Git-style diff parser.
Yep! We spent 20 years dealing with these problems and in those 20 years nobody really solved these pain points. So we talked to some SCM vendors, bounced ideas around, built a spec, got feedback from them, repeated off-and-on for a couple years until we got the current draft, and implemented it for our needs.
It's been a few years now, and so far so good for the purposes we built it for. And it's there for any other tool or SCM authors to use if it also happens to be useful to them.
Feels more like in 20 years nobody else really has those pain points.
1. For most people using multiple SCMs is just a huge and easily-avoidable mistake. Most people can just mandate a single SCM for a project and then all these problems are moot.
2. For the things listed in TFA:
> A single diff can’t represent a list of commits
That's what "patch" and "patch format" are for. It works great.
> There’s no standard way to represent binary patches
Very unclear why anyone needs this. There's no standard way to code-review a binary diff (it depends what the blob is that you're diffing) so how would it help if you had this standard way to represent the diff?
> Diffs don’t know about text encodings (which is more of a problem than you might think)
This goes away if people on a project agree on a particular encoding (which is going to be utf-8, let's face it). If someone sends a diff in an incorrect file encoding via diffx, it will still apply wrong if someone uses a non-diffx-aware (aka standard) tool to apply it. So diffx doesn't really fix this problem.
> Diffs don’t have any standard format for arbitrary metadata, so everyone implements it their own way.
This goes away if you just use one SCM for a project which you should anyway for everyone's sanity.
> 1. For most people using multiple SCMs is just a huge and easily-avoidable mistake. Most people can just mandate a single SCM for a project and then all these problems are moot.
You talk about SCMs, we're talking about VCSs. Where it's not just source code under control, or even source code with a handful of binary assets. Imagine dealing with a VCS that has to handle 15 years and a few petabytes of binary assets. Or individual files that were multiple gigabytes and had changes made to them several times per day. Can git do that gracefully just by itself? Or SVN? Even Perforce struggled with something like that back in the day.
>Very unclear why anyone needs this. There's no standard way to code-review a binary diff (it depends what the blob is that you're diffing) so how would it help if you had this standard way to represent the diff?
A standard way of handling the binary data doesn't mean understanding the binary data. You can leave that up to specific tools. What you need though is a way to somehow package up and describe those binary diffs enough that you can transport the diff data and pick the right tool to show you the actual differences.
> This goes away if you just use one SCM for a project which you should anyway for everyone's sanity.
And if wishes were fishes, I'd never be hungry again. If you have a lot of history, a lot of data, a lot of workflows and tools built up around multiple VCSs, then changing that to just one VCS is going to be a massive undertaking. And not every VCS can handle all of the kinds of data that might get input into it. Some are going to be good at text data, some might handle binary assets better. Some might have a commit model that makes sense for one type of workflow but not for another.

For example, you might be dealing with binary assets where you can only have one person working on a specific file at a time because there's no real way to merge changes from multiple people, so they need to lock it. For text assets though, you might be able to handle having multiple people work on a file. To afford both workflows, your VCS now needs to not only support both locking modes, but be hyper-aware of the specific content to know which kind of locking to permit for specific files.
The world doesn't always fit into the nice little models that the most popular VCSs provide. So if you're trying to not limit your product to supporting just those handful of popular VCSs, you can't just assume everything will fit into one of those models.
> Imagine dealing with a VCS that has to handle 15 years and a few petabytes of binary assets.
Part of the problem is that you're fabricating imaginary problems that no one is actually experiencing, only to argue that the solution to these imaginary problems is a file format.
That's a very strong statement. A less aggressive approach to discussion might involve asking for a concrete example of a problem rather than assuming a bad-faith argument.
Off the top of my head, and just spitballing, I would be more surprised if mature game devs or animation studios didn't want to version control pretty massive asset libraries.
> Off the top of my head, and just spitballing, I would be more surprised if mature game devs or animation studios didn't want to version control pretty massive asset libraries.
I once closed a bug with a comment that it was old enough to drink. Also that the lines mentioned in the bug no longer existed, although the file did. Couldn't even satisfy my curiosity about what it had originally looked like, change history didn't go back that far.