The paper is actually pretty interesting - it's a pity the headline is misleading.
The Reddit comment[1] is a good summary: More accurately: "The Tsetlin Machine - a new approach to ML - outperforms single layer Neural Networks, SVMs, Random Forests, the Naive Bayes Classifier and Logistic Regression on four carefully selected contrived datasets"
Notably, on their (weird!) "Binary Iris" dataset, the NN appears to be undertrained, and it is unclear what the headline "mean" accuracy figure is actually a mean of.
However, once we get over that, it's quite a different approach, and I can imagine places where it could be useful. Notably, as an anomaly detector it would seem to have interesting properties like interpretable results similar to a random forest.
They say that their results are interpretable because they are logical formulas but, later, show a toy problem in which their formula has 10,000 clauses. By that definition, deep learning is probably interpretable too.
However I do think that it is an interesting concept that would probably be worth testing if your inputs and outputs have a natural mapping to booleans.
The Tsetlin machine is based on Tsetlin's automata, the so-called finite automata with linear tactics. These are basically counters of rewards (up) and penalties (down) with various strategies of integration, proportionization and differentiation. They have the minimum necessary for digital emulation of perception functionality. They stem not only from Tsetlin but also from Krinsky, Krylov, and Varshavsky; Mikhail Tsetlin is the most prominent creator. This is work from the USSR in the 60s. So not new, but very elegant!
A Tsetlin automaton always returns true or always returns false depending on its internal state (which is basically a counter set during the learning process).
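To make that concrete, here is a minimal sketch of such a two-action automaton (my own illustration, not code from the linked repo; the split of 2N states into an "exclude" half and an "include" half follows the paper's setup, but the names are mine):

```python
class TsetlinAutomaton:
    """A two-action automaton with 2*n states: states 1..n choose
    "exclude" (False), states n+1..2n choose "include" (True)."""

    def __init__(self, n_states=100):
        self.n = n_states
        self.state = n_states  # start at the boundary, on the "exclude" side

    def action(self):
        # The automaton's answer is fully determined by its counter.
        return self.state > self.n

    def reward(self):
        # Reinforce the current action: move deeper into its half.
        if self.action():
            self.state = min(self.state + 1, 2 * self.n)
        else:
            self.state = max(self.state - 1, 1)

    def penalize(self):
        # Weaken the current action: move toward the opposite half.
        if self.action():
            self.state -= 1
        else:
            self.state += 1
```

The key point is that the decision only flips after enough penalties accumulate to push the counter across the boundary, which is what gives the automaton its memory.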
Using one automaton per literal per clause, you can learn any logical formula in normalized form (the automaton tells you whether to include that literal or not).
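A rough sketch of how those per-literal decisions define a clause (illustrative only; the function name and argument layout are mine). Each input bit gets two automata, one for the literal and one for its negation, and the clause is the AND of every included literal:

```python
def eval_clause(x, include, include_neg):
    """x: list of 0/1 bits. include/include_neg: one boolean per bit,
    each taken from the current action of the corresponding automaton."""
    for xi, inc, inc_n in zip(x, include, include_neg):
        if inc and xi == 0:      # literal x_i is included but false
            return 0
        if inc_n and xi == 1:    # literal NOT x_i is included but false
            return 0
    return 1                     # all included literals are satisfied
```

For example, including `x0` and `NOT x1` gives the clause `x0 AND NOT x1`.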
That section made the paper worth reading for me.
A Tsetlin machine is a refinement of that formula-learning algorithm with better learning properties; once trained, you get a (long) formula (not precisely a logical formula, since it uses a sum and a threshold instead of OR, to improve robustness).
It is important to note that it manipulates boolean inputs and outputs and, thus, is not a general alternative to neural networks.
As far as I can tell from the code that implements it, and ignoring training for the moment, the model itself takes a binary vector, sums (the characteristic function of) ANDs of elements of that vector, and returns whether the sum exceeds a threshold.
So an example model with an input vector `x` might sum a few ANDed terms and compare against a threshold.
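For instance (an illustrative sketch; the original comment's snippet is missing here, so the particular clauses and threshold are made up):

```python
def model(x, threshold=2):
    """x: binary vector [x0, x1, x2]. Returns 1 iff the number of
    satisfied clauses reaches the threshold."""
    x0, x1, x2 = x
    terms = [
        x0 & x1,        # clause 1: x0 AND x1
        x1 & (1 - x2),  # clause 2: x1 AND NOT x2
        x0,             # clause 3: a single literal
    ]
    return int(sum(terms) >= threshold)
```

So `model([1, 1, 0])` satisfies all three clauses and fires, while `model([0, 1, 1])` satisfies none and doesn't.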
EDIT: okay, so that's the effective evaluation model, but for training purposes it is stored internally as a series of numbers, one per potential coefficient, and if a number exceeds the "number of states", that coefficient is considered active. So the model above would be stored in that counter form.
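A rough sketch of that internal representation (illustrative; the counter values and names here are mine, not taken from the repo):

```python
N_STATES = 100  # a literal is active only when its counter exceeds this

# counters[c][l]: state of the automaton for literal l in clause c
counters = [
    [180, 20, 150],  # clause 0: literals 0 and 2 active
    [90, 130, 10],   # clause 1: only literal 1 active
]

def active_literals(counters, n_states=N_STATES):
    """Collapse the trained counters into the boolean include/exclude
    decisions that the evaluation model actually uses."""
    return [[c > n_states for c in clause] for clause in counters]
```

Training nudges the counters up and down, and only the thresholded boolean pattern matters at evaluation time.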
For a detailed understanding, you need the paper. [0]
But, loosely speaking, it is a classifier that works best with less information, utilising a series of formulas. It only really does bitwise ops for recognition, and is easy to implement and fast.
If I were to describe it in more flowery terms, hashing and finite automata had a baby.
Why can't I just get a reasonable definition of what a Tsetlin machine is?
As far as I can tell, it's a function that takes an input (a vector of real numbers in 0..1?), a state (in the form of an integer) and produces an output (in the form of... something?).
None of these approaches are prima facie "unreasonable." But if you get the wrong level it won't feel useful.
I don't think it's crazy to have open teaching discussions on HN, I think you lay out a fair hope that someone can break something down here. But I'm just not sure which level you're looking for.
Precision and simplicity are often competing constraints.
Let me try getting my point across with an insult instead.
Some dude in a marginal university in Norway has managed to convince a few people, a couple of students and a writer for a glittery university magazine, that he is a genius.
His approach is to do something trivial but bury it in a completely obscure terminology. This technically sorta works on a toy example but not really on more challenging tests. But it is very convincing for people who are not trying to understand what is going on but rather are impressed by technical words and academic rank.
That is why it is getting negative feedback here and on other technical forums.
And that is why noone can answer a simple technical question.
>And that is why noone can answer a simple technical question.
Who is "noone"? Some people in a small thread on HN that first read about this today and most of which do not have anything to do with NN or Tsetlin's automata (e.g. startup founders, JS programmers, embedded programmers, and so on)?
Note also that you haven't asked any "simple technical question" -- not a specific one, that is. You just asked for it to be explained, and then rudely complained, when someone did, that it wasn't "reasonably" explained.
You also seem impatient waiting for one (considering that you posted 4 hours ago, and most of the US is still asleep); it has to be pronto for you, it seems...
Here's also a simple explanation from another member:
"The Tsetlin machine is based on Tsetlin's automata, the so-called finite automata with linear tactics. These are basically counters of rewards (up) and penalties (down) with various strategies of integration, proportionization and differentiation. They have the minimum necessary for digital emulation of perception functionality. They stem not only from Tsetlin but also from Krinsky, Krylov, and Varshavsky; Mikhail Tsetlin is the most prominent creator. This is work from the USSR in the 60s. So not new, but very elegant!"
>His approach is to do something trivial but bury it in a completely obscure terminology. This technically sorta works on a toy example but not really on more challenging tests. But it is very convincing for people who are not trying to understand what is going on but rather are impressed by technical words and academic rank. That is why it is getting negative feedback here and on other technical forums.
Possibly. It's also possible that the negative feedback comes from people who don't understand the math either, and/or are too invested in traditional NNs -- the usual resistance to new ideas.
Apparently you don't have the means to classify it to one or the other case, but you do want to be rude and make a judgement nonetheless.
Your problem is your own inability or unwillingness to set aside a few moments to read the - in my opinion reasonably clear - definitions in the linked articles. Don't take it out on others.
You are not entitled to an easily grokkable explanation, tailor made for your exact level of knowledge.
The Tsetlin Machine being a specialized method, it makes sense that it beats neural networks in its home domain (decisions from potentially noisy binary inputs).
Paper: https://arxiv.org/abs/1804.01508
Code: https://github.com/cair/TsetlinMachine