If you have a bound on the size of the largest type in your program, then HM type inference is linear in the size of the program text.
The intuition is that you never need to backtrack, so boolean formulae (ie, SAT) offer no help in expressing the type inference problem. That is, if you think of HM as generating a set of constraints, then what HM type inference is doing is producing a conjunction of equality constraints which you then solve using the unification algorithm.
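To make the constraint-solving picture concrete, here is a minimal sketch of the unification step in Python. The type representation and names are my own toy choices (occurs check and the actual HM constraint generation are omitted), so treat it as an illustration rather than a real implementation:

    # Minimal first-order unification over equality constraints -- a sketch of
    # the solver that HM hands its conjunction of constraints to.
    # A type is either a type variable (a string like "a") or a tuple
    # ("constructor", arg1, arg2, ...), e.g. ("int",) or ("->", t1, t2).

    def resolve(t, subst):
        # Chase the substitution for type variables.
        while isinstance(t, str) and t in subst:
            t = subst[t]
        return t

    def unify(constraints):
        subst = {}
        work = list(constraints)                 # a conjunction of equalities
        while work:
            a, b = work.pop()
            a, b = resolve(a, subst), resolve(b, subst)
            if a == b:
                continue
            elif isinstance(a, str):             # a is a type variable
                subst[a] = b
            elif isinstance(b, str):
                subst[b] = a
            elif a[0] == b[0] and len(a) == len(b):
                work.extend(zip(a[1:], b[1:]))   # decompose matching constructors
            else:
                raise TypeError(f"cannot unify {a} with {b}")
        return subst

    # e.g. one equality constraint of the kind HM would generate:
    print(unify([("a", ("->", ("int",), "b"))]))  # {'a': ('->', ('int',), 'b')}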
Richard Rorty, whose humanism and love of democracy MacIntyre despised.
Over the course of his career, MacIntyre went from an extreme left Marxist to an extreme right Thomist, and the only constant was his hatred of liberalism. He really couldn't stand the idea that people could believe in rationalism, feel the moral force of individual rights, or make purpose and meaning for themselves, all without appealing to an authoritarian source of control.
You're left with either Nietzsche's arbitrary will, or virtues (à la Aristotle). For the latter, MacIntyre attempted to develop a system of morality (? ethics?) based on human biology:
One can certainly tell oneself that there is a certain purpose or meaning to one's life, but if you're a materialist, then (the argument goes (AIUI)) it's not true.
The arrangement of atoms is arbitrary and without meaning, and to call some arrangement(s) "good" or "bad" or better / worse is a value judgement that is just as arbitrary and meaningless.
He didn’t write as if he hated liberalism. Maybe he did. But in his work you get deep, principled critique from the basis of epistemology and selfhood.
Lenin wrote like someone who hates liberalism. Stephen Miller gives that vibe from the right, though I doubt he can write anything coherent at all.
> Perhaps, but 20 years after Rorty's death, he's largely forgotten.
No, he's not. Not at all. Rorty has been and always will be more important, and more famous, than MacIntyre. This is not to insult MacIntyre, who was important within philosophical circles but not so much in the general public, except perhaps within religious groups, with which I'm not well acquainted.
Rorty's breadth of influence was also greater than MacIntyre's, ranging from "Philosophy and the Mirror of Nature" to "Achieving Our Country", addressing vastly different subjects and audiences.
Not to mention the huge posthumous bump that Rorty got for being labeled "The Philosopher who predicted Trump." There was even a new collection of his essays out in 2022 [0].
This is an odd take. Rorty is one of the major philosophers of the 20th century. MacIntyre is more obscure, probably unknown to plenty of academic philosophers.
My sense is that they're pretty comparably famous. I think MacIntyre gets a bit of extra press from people who like him for basically religious reasons (e.g., the OP's author bio begins by saying he's "the Honorary Professor for the Renewal of Catholic Intellectual Life at the Word on Fire Institute"). But I'd guess that most academic philosophers have heard of MacIntyre and could name at least one of his books.
I do agree that it seems very weird to call Rorty "largely forgotten".
(One pair of data points, from the person whose knowledge of such things I know best, namely myself. I am not a philosopher in any sense beyond that of having a bunch of books on philosophy. If you asked me out of the blue to name a book by MacIntyre, I would definitely remember "After Virtue", might remember "Whose Justice? Which Rationality?", and would not be able to think of any more. I could give you a crappy one-or-two-sentence summary of what AV is about (which would e.g. largely fail to distinguish his ideas about ethics from Anscombe's) but couldn't tell you much more about his work. If you asked me out of the blue to name a book by Rorty, I probably wouldn't be able to but would probably recognize a couple of his. I could tell you I thought he did important work in the general area of epistemology but not more than that. So to me MacIntyre is a bit more famous than Rorty. But my sense is that that's a bit unrepresentative among not-really-philosophers, and probably quite a lot unrepresentative among actual philosophers.)
> So to me MacIntyre is a bit more famous than Rorty.
What you mean is that you know MacIntyre better than Rorty. To be famous is literally to be known about by many people, so there's no such thing as "famous to me".
I don't judge fame by my own familiarity, otherwise many obscure people would be "famous" and many famous people "unknown".
> But my sense is that that's a bit unrepresentative among not-really-philosophers, and probably quite a lot unrepresentative among actual philosophers.
Yes, obviously strictly speaking "famous to me" makes no sense. On the other hand, you correctly understood what I meant, and I made it clear at the outset ("One pair of data points") and at the end ("my sense is that that's a bit unrepresentative") that I understand that my own state of familiarity isn't anything like definitive and am not attempting to "judge fame by my own familiarity". So I'm not quite sure what point you're trying to make that actually needed making.
I mean, if you want to complain about people making judgements of relative fame on insufficient evidence, fair enough. But I'm having trouble figuring out why my comment is the one that requires that complaint, when the other three people in this thread passing judgement on the relative fame of Rorty and MacIntyre (1) in no instance give any more evidence than I did, and (2) in fact give no indication at all of where their opinion comes from.
(I actually don't think I quite do mean "that [I] know MacIntyre better than Rorty", though I agree that that's the specific thing I gave a bit of kinda-quantitative evidence about. I think what I actually meant is more like "I have heard more about MacIntyre than about Rorty". That correlates well with who I know more about, for obvious reasons, and in this case it matches up OK, but there are philosophers I know more about than either but who I would consider less famous even with the yes-I-know-strictly-incorrect "to me" qualifier; for instance, I have read zero books by M. or R. but one by Peter van Inwagen, but I have hardly ever heard other people talking about him and I think I encountered his work while browsing bookshop shelves. I know Inwagen better than MacIntyre but I hear about MacIntyre much more often. Again, I admit that you couldn't reasonably have got that distinction from what I actually wrote; to whatever extent I'm offering a correction it's a correction of my previous unclarity, not of any perceived misunderstanding on your part.)
> So I'm not quite sure what point you're trying to make that actually needed making.
My point is that the anecdotal data of one person is completely worthless. And for what it's worth (nothing), my own personal anecdotal data is the opposite of yours, so we cancel each other out. I would also note that the commenters on a MacIntyre obituary are an extremely biased sample.
> the other three people in this thread passing judgement on the relative fame of Rorty and MacIntyre (1) in no instance give any more evidence than I did, and (2) in fact give no indication at all of where their opinion comes from.
It's true that I've offered no empirical evidence for my claim. My objection to you is that you offered your own personal experience as a data point, whereas I did not, and indeed deny that my experience is data: "I don't judge fame by my own familiarity". I actually have no wish to get into a long argument about the relative fame of two persons and was mainly just reacting to the ridiculous, "20 years after Rorty's death, he's largely forgotten", which by the way was not supported with evidence either (and was not even numerically accurate, because Rorty died 18 years ago). In any case, another commenter did mention how Rorty has entered into the wider culture in at least one respect: https://news.ycombinator.com/item?id=44074114
The value of one person's anecdata is in fact not zero. I agree it's small. That's why I festooned what I said with caveats about how my own experience need not be representative, etc., etc. But it's not zero, which is why I thought it worth saying anything.
(Zero plus zero plus zero plus ... plus zero equals zero. But if you ask 1000 people and they all say "I've heard of X but not of Y" or "I've heard of them both but heard more about X than about Y" then you have, in fact, got pretty good evidence that X is more famous than Y. Even if they're in the comments on an article about X, which of course I agree will give you a biased sample.)
Anyway, I think this argument is taking up something like 10x more space than it actually deserves and don't propose to continue it further.
> The value of one person's anecdata is in fact not zero. I agree it's small.
It's less than zero. It's negative. Taking a very biased, unrepresentative anecdote and presenting it as positive evidence for some conclusion is fallacious and misleading. It's worse than presenting no data at all. You should have no confidence in a broad conclusion based on an anecdote.
> But if you ask 1000 people and they all say "I've heard of X but not of Y" or "I've heard of them both but heard more about X than about Y" then you have, in fact, got pretty good evidence that X is more famous than Y. Even if they're in the comments on an article about X, which of course I agree will give you a biased sample.
I couldn't disagree more. If you ask 1000 randomly selected people, that's pretty good evidence, but it's not good evidence if the sample is highly biased.
Let's translate your comment from scientific research to driving:
> If driving on the road suddenly makes car trips possible, then I'd say that the road is indeed designed to make things more difficult for those not driving on the road.
This is obviously silly, because the road makes car travel easier, not harder. Schemes like Horizon Europe make scientific research easier in similar ways.
For funding agencies to give money to universities, they and the universities have to make a whole bunch of critical but basically arbitrary decisions about how to handle the impedance mismatch between their respective organisations' internal finance procedures. Doing this over and over is wasteful of time and effort, and one benefit of big cross-national research funds like this is that a university can make these decisions once, rather than over and over again for each funding agency in Europe.
Also, generous cooperation is an important part of geopolitics. Cooperation is how you convince your neighbors you are an ally rather than a threat.
> Schemes like Horizon Europe make scientific research easier in similar ways.
Roads allow any vehicle anyone built to work on them. Having EU-specific roads that cars built in the EU can easily drive on, while other cars require significant modification, would be a better analogy.
This analogy doesn't work, because it assumes that "roads any vehicle can drive on" is the default position outside the EU. But that's precisely not the case here - the default position in the Non-EU world is that cross-border cooperation and financing bears prohibitive problems. So Horizon is a significant improvement on the default position.
If you want roads as a metaphor, here it is: roads profit those who are based along the road. A road between cities A and B profits those living and doing business in A and B more than anybody else living in a third city which is not connected to the road. That is just a matter of fact, not a deliberate design decision to exclude anyone.
The Horizon research framework exists to make it easier to form research projects across a set of countries. Everybody from a third-party country is at a disadvantage. But that is not because somebody in Brussels wants to exclude them. It is because the third-party country didn't put in the work to align its local rules with the rules of the treaty.
Roads only help people who are in that location. The EU-specific roads (or programs) only help people driving (or researching) in the EU (and affiliated countries).
You are reacting to the title, not the actual article.
1. The author is a CS professor who wanted to make a "CS for non-majors" course that non-majors would find actually useful/interesting. So he asked a historian colleague what she wanted her students to know about computers.
2. She replied that she wanted her students to know that (a) databases exist, and if properly designed/indexed can make complex queries very fast, and (b) websites can be automatically populated from the results of DB queries, which makes these search results human-comprehensible.
3. In his CS department, databases and HCI/web design are courses which come late in the sequence, after stuff like algorithms, data structures, and networking.
4. To make (2) accessible without a bunch of pre-reqs, he designed an extension to Scratch/Snap (block-based visual programming languages) which lets novices more easily write SQL queries and generate database-backed HTML documents (a rough sketch of the underlying idea follows this list).
5. As a result, he can now teach history majors CS concepts in a way which makes their relevance to historical work directly clear.
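To make point 4 concrete, here is a rough sketch of the underlying idea in plain Python/sqlite3 -- not the author's Snap! extension, and the table and column names are made up for illustration:

    # Run a query against a small database and render the results as HTML.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE letters (author TEXT, year INTEGER, subject TEXT)")
    conn.executemany("INSERT INTO letters VALUES (?, ?, ?)", [
        ("Abigail Adams", 1776, "Remember the ladies"),
        ("John Adams", 1776, "Reply from Philadelphia"),
    ])

    rows = conn.execute(
        "SELECT author, year, subject FROM letters WHERE year = ? ORDER BY author",
        (1776,),
    ).fetchall()

    # Populate an HTML table from the query results.
    html = "<table>\n" + "\n".join(
        f"<tr><td>{author}</td><td>{year}</td><td>{subject}</td></tr>"
        for author, year, subject in rows
    ) + "\n</table>"
    print(html)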
"On Inequality" is one of the very worst pieces of serious analytic philosophy I have ever read. The best part of Franfurt's intellectual tradition -- analytic philosophy -- is taking ideas seriously, and so to honor his life I'll try to make a brief but serious critique of "On Inequality".
This book essentially concatenates two essays, "Economic Equality as a Moral Ideal" and "Equality and Respect". In the first essay, Frankfurt argues that economic equality is less important than the worst-off having enough. As a result, political appeals to the importance of equality are bad philosophy. In "Equality and Respect", he broadens this into a general attack on the concept of egalitarianism.
The basic problem with his work is that his picture of the social world is so abstracted that the truth value of his claims is simply irrelevant -- they have no insight to offer about our actual world. In his essay, he considers different distributions of resources, but gives no serious thought to the actual mechanics of how resources are distributed.
This is fatal to his argument because actual resource distributions are endogenous, rather than exogenous, and so they can't be compared in a vacuum -- to say something meaningful, you genuinely have to take seriously the fact that control of resources is a form of power, and that the powerful tend to use their power to reinforce and maintain their status. So politics and policy -- things he explicitly denies having anything to say about -- are fundamental to understanding the issues involved.
Even making the tiniest possible effort towards taking these issues seriously blows up his argument. For example, suppose that we have people living in a perfectly competitive market. This is about the smallest step towards social realism that you can possibly take!
But even here, Negishi's theorem tells us that at a Walrasian equilibrium, the implicit social welfare function maximises the sum of each person's utility, scaled by the inverse of each person's marginal utility of wealth. Now, suppose our people have a utility function which is logarithmic in their wealth -- U(w) = log(w). Then their marginal utility U'(w) = 1/w, and the inverse of that is w. That is, ideal markets value individuals linearly in proportion to their wealth. So the welfare function of the ideal market says that saving Jeff Bezos's life is worth the lives of 1.2 million Americans of median wealth, 8 million Poles, or 36 million Indians.
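To spell out the arithmetic: with log utility the Negishi weight 1/U'(w) is just w, so people are weighted by their wealth. The wealth figures below are rough, illustrative assumptions of mine, chosen only to reproduce the ratios above:

    # Negishi weights under U(w) = log(w): U'(w) = 1/w, so 1/U'(w) = w.
    # All wealth figures are rough assumptions, not data.
    bezos_wealth = 120e9            # ~ $120B (assumed)
    median_wealth = {
        "US": 100e3,                # ~ $100k median net worth (assumed)
        "Poland": 15e3,             # ~ $15k (assumed)
        "India": 3.3e3,             # ~ $3.3k (assumed)
    }
    for country, w in median_wealth.items():
        print(country, round(bezos_wealth / w / 1e6, 1), "million people")
    # -> US 1.2, Poland 8.0, India ~36 million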
It makes one think that Frankfurt must have used "On Bullshit" as an instruction manual when writing "On Inequality".
Actually, there are very similar situations in Chinese culture!
My understanding is that when someone gives you a gift, in Chinese culture there is a social obligation to give them a gift of similar value at some point in the future. So giving a very expensive gift can be a very rude gesture, because it effectively inflicts a large, unexpected expense on the recipient.
In Western culture, there is a similar dynamic when it comes to dating and courtship. The general expectation is that men will ask out the women they are interested in, and the women will then accept or refuse the offer. The difficulty with this is that it puts women in the position of having to reject someone who is 20 cm taller/20 kg heavier than they are -- and that can be very scary.
So it is artful/polite for a man asking a woman out, especially in the initial stages, to do so in a way that is ambiguous enough that if she decides she doesn't want to continue, she can back away without having to forcefully reject him. An overly strong confession of love removes this ambiguity, and thereby forces a woman into the uncomfortable position of having to directly reject the man. This is rude, in exactly the same way that giving someone an overly-expensive gift is rude in Chinese culture.
Naturally, though, in China very close friends wouldn't be bothered by how expensive gifts are, and in the West, a woman pining for a particular man would be delighted if he directly declared his love for her. But we have etiquette to handle the failure cases, not the success cases!
Anyway, I'm always surprised how even though people are basically the same everywhere, and share the same desires and impulses, we end up building very different cultures. It's just amazing how situations can be radically different and utterly familiar at the same time.
Surprisingly, FRP doesn't have anything to do with dataflow constraints at all.
In FRP, a program is fundamentally a function of type Stream Input → Stream Output. That is, a program transforms a stream of inputs into a stream of outputs. If you think about this a bit more, you realise that any implementable function has to be one whose first k outputs are determined by at most the first k inputs -- i.e., you can't look into the future. That is, these functions have to be causal.
The causality constraint implies (with a not-entirely trivial proof) that every causal stream function is equivalent to a state machine (and vice-versa) -- i.e., a current state s, and an update function f : State × Input → State × Output. You get the stream by using the update function to produce a new state and an output in response to each input. (This is an infinite-state Mealy machine for the experts.)
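A minimal sketch of that equivalence in Python, with my own names and exactly one output per input as described above:

    # Run a state machine f : (state, input) -> (state, output) over an
    # input stream to get the corresponding causal stream function.
    def run(f, state, inputs):
        for x in inputs:
            state, y = f(state, x)
            yield y

    # Example: a running sum, where output k depends only on inputs 1..k.
    def running_sum(state, x):
        new_state = state + x
        return new_state, new_state

    print(list(run(running_sum, 0, [1, 2, 3, 4])))   # [1, 3, 6, 10]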
Note that there is no dataflow here: it's just an ordinary state machine. As a result, the GUI paradigm that traditional FRP lends itself to best is immediate-mode GUIs. (FRP can be extended to handle asynchronous events, but doing so in a way that has the right performance model is not trivial. Think about how you'd mix immediate and retained mode to get an idea of the issues.)
When I first started working on FRP I thought it had to be dataflow -- my first papers on it are actually about signals libraries like the one in the post. However, I learned that basing it on dataflow and/or incremental computation was both unnecessary and expensive. IMO, we should save that for when we really need it, but shouldn't use it by default.
1. You seem to be confusing "dataflow constraints" with "dataflow". Though related, they are not the same.
2. Yes, the implementation of Rx-style "FRP" (should have used the scare quotes to indicate I am referring to the common usage, not actual FRP as defined by Conal Elliott) has deviated. And has deviated before. This also happened with Lucid.
3. However, the question is which of the two is the unnecessary bit. As far as I can tell, what people actually want from this is "it should work like a spreadsheet" -- that is, dataflow constraints (also known as spreadsheet constraints). This is also how people understand it when it is used in practice. And of course dataflow is also where all this Rx stuff came from (see Messerschmitt's synchronous dataflow).
4. Yes, the synchronous languages Lustre and Esterel apparently can be and routinely are compiled to state machines. In fact, if I understood the papers correctly, these languages are seen as a convenient way to specify state machines.
5. It would probably help if you added some links to your papers.
Also, as someone who both works on open-source reactive stuff (like MobX) and literally works on a spreadsheet app (Excel), I can say that what we do in each is entirely different because of different constraints, and yet both are reactive.
The gist: MobX just keeps the whole dependency graph of what data needs to update when what computed runs. Excel can't do that because it has to be able to work on large files with partial data.
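A heavily simplified, illustrative sketch of the dependency-graph idea in toy Python (nothing like MobX's or Excel's actual implementations): computeds record which observables they read, and re-run when one of them changes.

    _current = None   # the computed currently being evaluated

    class Observable:
        def __init__(self, value):
            self._value = value
            self._subscribers = set()
        def get(self):
            if _current is not None:
                self._subscribers.add(_current)   # record the dependency
            return self._value
        def set(self, value):
            self._value = value
            for c in list(self._subscribers):
                c.recompute()

    class Computed:
        def __init__(self, fn):
            self.fn = fn
            self.recompute()
        def recompute(self):
            global _current
            prev, _current = _current, self
            self.value = self.fn()     # re-run, re-recording dependencies
            _current = prev

    price, qty = Observable(10), Observable(3)
    total = Computed(lambda: price.get() * qty.get())
    print(total.value)   # 30
    qty.set(4)
    print(total.value)   # 40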
Mathematically, RxJS and Vue/Solid/Qwik-style FRP are equivalent. There is an interesting proof by Meijer (who invented Rx) in the famous "duality and the end of reactive" talk. https://www.youtube.com/watch?v=SVYGmGYXLpY
> Mathematically, RxJS and Vue/Solid/Qwik-style FRP are equivalent.
Yes, these are basically the same thing. However, they have little to do with Conal Elliott's "FRP". Which itself is also badly named.
> Meijer (who invented Rx)
Well "invented". What's a bit surprising is that with both Rx and the Rx-style "FRP", there either is no public history of the ideas at all (Rx) or it is patently wrong (Rx-style "FRP"). Or a bit of both.
For example, the "Reactive" part of Rx-style FRP appears to come from the definition of system styles by Harel. Both the connection of synchronous dataflow with the term "Reactive" and with FP languages are made in the paper on Lustre, which is a language that integrates synchronous dataflow into an FP language. But there is nothing inherently FP-ish about synchronous dataflow, it was previously integrated into to the imperative language Esterel, and they also made a variant of C with synchronous dataflow.
Again, nobody mentions this, it is all presented as having been invented out of thin air and the principles of Functional Programming. (Or as having come from Conal Elliott's FRP, which is not true. Ask Conal Elliott).
Once I figured out the connections, I asked Erik Meijer, who has "I am the original inventor of Rx..." in his bio. He admitted that he was "inspired" by synchronous dataflow. And of course that is pretty much all it is. Except they dropped the requirement for it to be synchronous.
What do you get when you drop "synchronous" from "synchronous dataflow"? FRP, obviously ;-)
Just like Objective-C is Smalltalk + C, and Objective-C - C is ... Swift?
All this is documented in Meijer's actual paper though? ("Your Mouse is a Database"), as well as his aforementioned talk (duality and the end of reactive).
He for sure invented observables (as we know them, and as the mathematical dual of enumerables) - that doesn't mean it was the first ever reactive system, or that the concept of data being dependent on other data was new.
I think the important contribution of Meijer and Rx is realizing the duality of IEnumerable and IObservable, which enabled them to use the same programming constructs for both. The meat of Rx is in all the combinators, so you can express complex behaviours very succinctly.
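A toy illustration of that pull/push duality in Python (my own example, with the filter/map combinators inlined rather than made reusable):

    # Pull (enumerable-style): the consumer drives by asking for the next item.
    def evens_scaled_pull(numbers):
        for n in numbers:
            if n % 2 == 0:
                yield n * 10

    print(list(evens_scaled_pull([1, 2, 3, 4])))    # [20, 40]

    # Push (observable-style): the producer drives by calling a callback.
    # Same filter/map logic, dualized: callbacks in place of iteration.
    def evens_scaled_push(on_next):
        def source_on_next(n):
            if n % 2 == 0:
                on_next(n * 10)
        return source_on_next

    sink = evens_scaled_push(print)
    for n in [1, 2, 3, 4]:
        sink(n)                                     # prints 20, then 40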
Hmm... Smalltalk has had the same iteration methods over streams and collections since at least Smalltalk-80, so I'm not sure what the important contribution here is.
Whether the flow is push or pull is a fairly irrelevant implementation detail for a dataflow system.
That doesn't mean it can't have big effects, but if you're relying on those, or they do become relevant, you should probably ask yourself whether your system is really dataflow.
> Note that there is no dataflow here: it's just an ordinary state machine.
It sounds like you've converted data-flow to its state-space form. It's still data flow, just in a variant that might be easier to compute.
FWIW you probably need a pair of functions:

    next state = F(input, current state)
    output     = G(input, current state)

Which in the signal-processing/control-systems world is

    s' = Ax + Bs
    y  = Cx + Ds

Aka the "state space" formulation, where A, B, C, and D are matrices, x is the input, s is the current state, s' is the next state, and y is the output. There are infinitely many ways to formulate the state space and infinitely many equivalent signal flow graphs that represent the same thing.
> any implementable function has to be one whose first k outputs are determined by at most the first k inputs -- i.e., you can't look into the future. That is, these functions have to be causal.
    def f(input_stream):
        i = next(input_stream)
        j = next(input_stream)
        yield i + j
        yield from f(input_stream)   # keep going: consume the next pair, and so on
This function produces k outputs when given 2*k inputs, so it's either acausal or impossible to execute. Right?
For GP’s stream-transformer / state-machine equivalence to work, you need to sprinkle option types throughout so each input yields some kind of output, even if empty. So more like
    def co():
        i = yield None  # hurray for off-by-one streams
        j = yield None
        while True:
            i = yield i + j
            j = yield None
This won’t help if the output stream produces more than one output item from each input item. You could sprinkle lists instead, but in reality multiple simultaneous events have always been a sore point for FRP—in some libraries you can have them (and that’s awkward in some cases), in some you can’t (and that’s awkward in others).
But I am not criticizing his stream-transformer / state-machine equivalence, I am just curious why he thinks functions of Stream A -> Stream B have to produce exactly 1 output for exactly 1 input.
Now, I know that Haskell in its pre-monad days used to give main the signature [Response] -> [Request]: the lists being lazy, they're essentially streams. Each Request produced by main would result in a Response being provided to it by the runtime. This model actually has to be strictly 1-to-1, and indeed, it was so easy to accidentally deadlock yourself that switching to the IO monad was quite welcome, according to SPJ in his "Tackling the Awkward Squad" paper.
I guess what I wanted to say was that to me (given that the comment was presumably targeted at people who already know how all of FRP, stream transformers, and state transducers work, or can at least make a good guess) it was within the limits of acceptable sloppiness to mix up (State, Input) -> (State, Output), (State, Input) -> (State, Maybe Output), and (State, Input) -> (State, [Output]), or the equivalent on the stream transformation side. The point of the comment does not really depend on what you choose here.
Luckily, lawyers in the 1970s already figured this out, by anticipating the research on causality that computer scientists and mathematicians like Judea Pearl and Peter Spirtes did in the 1990s. Really!
In the first instance, you just can't use race as a feature, since it is a protected characteristic. But, you might also be worried that protected characteristics can generally be easily identified by looking for innocuous traits that correlate (since people tend to cluster into communities). For example, if you know an American's ZIP code and their three favorite musicians, you can determine their race with an accuracy in the high 90s. (Basically, the US is still just as segregated now as it was a hundred years ago, and black and white Americans tend to listen to different music.)
So when the US Civil Rights Act was passed, the courts came up with the idea of "disparate impact" -- when doing something like hiring, you are not allowed to base the decision on features that disproportionately affect one group rather than another, even if they are formally neutral, unless the feature directly impacts the ability to do the job.
In other words, you have to show that the features you are basing the decision on _causally influence_ the outcome you actually care about (here, the ability to do the job), exactly like you see in Pearl's causal influence diagrams or structural equation models or whatever. Eg, if you want to hire a math professor, you can base the decision on the articles they published in math journals, but you can't base it on whether they like old school Goa trance.
So, what about black box neural networks, where you don't know which features are being used? In this case, it's pretty clear that you shouldn't use them directly when making a home loan, because the law wants to know what features are in use, and you can't answer the question of whether you're redlining when you have a black box. However, using black box techniques to learn (eg) the best random forest model to use is fine, because it lets you easily see which factors are going into the decision before deploying it.
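As a hedged sketch of that workflow (scikit-learn, with made-up toy data; the point is only that the deployed model's input features are enumerable and inspectable before you ship it):

    from sklearn.ensemble import RandomForestClassifier

    # Only job-relevant features go in; each row is a (made-up) candidate.
    features = ["publications", "years_teaching", "citation_count"]
    X = [[12, 5, 300], [2, 1, 10], [8, 3, 150], [1, 0, 5]]
    y = [1, 0, 1, 0]   # hired or not (toy labels)

    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

    # Before deployment, check which factors the decision actually rests on.
    for name, weight in zip(features, model.feature_importances_):
        print(name, round(weight, 2))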
FWIW, people have been doing this for decades already. (I did stuff like this back in the 1990s.)
This is really easy to explain: all of the colonial empires were horrifying engines of atrocity, oppression and death. Since imperial apologists hated socialism, that made it a very attractive ideology -- if the worst people alive hate something, it's got to have something to recommend it, right?
The Madras famine of 1877 (in which over 8 million people died) is characteristic. The British Empire didn't just fail to help, it actually made it illegal to offer food relief, on the grounds that it would distort the labour market and violate free-market principles. If you witnessed that, wouldn't you think the market and Moloch were one and the same?
It's worth noting that "concentration camp" is not the English translation of "Konzentrationslager": it's actually the other way around. The German word is the translation of an English phrase used to describe the British Empire's interment camps in the Boer War. IMO, it's impossible to look at, say, a picture of Lizzie van Zyl and not see a premonition of the Holocaust.
You are technically correct about the origin of 'concentration camp'. But this point in relation to the Holocaust is so pedantic as to border on denial.
Most people who died in the Holocaust did not die in 'concentration camps'. About a quarter of them were killed in mass shootings and buried in mass graves. Another quarter were sent to something better described as 'extermination facilities'. There was no camp and no concentration - all victims were gassed to death on arrival. The most famous concentration camp - Auschwitz - also had a facility like this where around a million people died immediately on arrival. While Auschwitz did have the standard features of concentration camps - imprisonment, starvation, forced labor, and brutal punishment and torture - most people who died in Auschwitz never experienced any of this - they were simply executed on arrival.
So, while the 'concentration camps' were of course unimaginably terrible - and not to excuse other crimes against humanity such as the Boer camps, the gulags, etc - to compare other atrocities to the Nazi Holocaust simply because of the presence of concentration camps erases and minimizes the true extent of Nazi crimes against Jews and Roma.
The short answer is: reference counting walks the dead part of the heap, and tracing gc walks the live part of the heap.
When the reference count of an object goes to zero, you recursively decrement the reference counts of everything the object points to. This leads you to trace the object graph of things whose reference count has gone to zero. You never follow pointers from anything which is live (i.e., has a refcount > 0).
When you are doing the copy phase of a gc, you start with the root set of live objects, and follow pointers from everything that is live. Since anything pointed to by a live object is live, you only follow the pointers of live objects. You never follow pointers from anything which is dead (i.e., garbage).
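A rough sketch of the two walks in Python (toy object representation, not any particular runtime):

    # Toy heap objects: each has a refcount and a list of children it points to.
    class Obj:
        def __init__(self, children=()):
            self.refcount = 0
            self.children = list(children)
            for c in self.children:
                c.refcount += 1

    def release(obj):
        # Reference counting: when a count hits zero, walk the *dead* part of
        # the heap, decrementing the counts of everything the dead object points to.
        obj.refcount -= 1
        if obj.refcount == 0:
            for child in obj.children:
                release(child)
            # ... return obj's memory to the allocator here

    def trace(roots):
        # Tracing: walk the *live* part of the heap from the roots; anything
        # never reached is garbage and is never touched (in a copying GC).
        live, stack = set(), list(roots)
        while stack:
            obj = stack.pop()
            if obj not in live:
                live.add(obj)
                stack.extend(obj.children)
        return live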
If object lifetimes are short, most objects will be dead, and so RC will be worse than GC. If object lifetimes are long, most objects will be live, and GC will be worse than RC.
Empirically, the overwhelming majority of objects have a very short lifetime, with only a few objects living a long time. (This is called "the generational hypothesis".) So the optimal memory allocator will GC short-lived objects and RC long-lived objects. Rust/C++ encourages you to do this manually, by stack-allocating things you think will be short-lived, and saving RC for things with an expected long lifetime.
Beyond this, RC has a few really heavy costs.
Reference counting doesn't handle cyclic memory graphs. You need to add tracing to handle those, and if you are going to do tracing anyway, it's tempting to just do tracing really well and skip the refcounts entirely.
This is because the memory overhead of reference counts is high -- empirically, most objects never have more than a single reference to them, and so using a whole word for reference counts is a lot of overhead. Moreover, the need to increment/decrement reference counts is really bad for performance: first, mutations are expensive in terms of memory bandwidth (you've got to maintain cache coherence with the other CPUs), and second, in a multicore setting, you have to lock that word to ensure the updates are atomic.
There are tricks to mitigate this (e.g., Rust distinguishes Arc and Rc for objects which can be shared between threads or not), and there are schemes to optimise away RC assignments with static analysis (deferred reference counting), but if you want to do a really good job of reference counting, then you will be implementing a lot of tracing GC machinery.
And vice versa! The algorithm in the link is partly about adding RC to handle old objects (empirically, as part of the generational hypothesis, objects which have lived a long time will live a long time more). In fact, Blackburn and McKinley (two of the three authors of the above paper), pioneered the combination approach with their paper "Ulterior Reference Counting."
Minor nit: the description above is true for copying GCs, but non-copying mark-sweep collectors generally still touch the header words of both live and dead objects in the heap in order to add dead objects to free lists. Mark-sweep-compact collectors also end up reading all of the mark words in the object headers of both live and dead objects in order to find the holes to be filled by compaction.
It's also not uncommon to have a copying young generation and a mark-sweep-compact tenured generation. That way, you get the advantages of not needing to scan the huge numbers of young dead objects, but the space savings of not needing 2x space for the older generation.