
Awesome! It looks like we could extract the depth from the video, but my knowledge of ffmpeg is too limited for that. Do you know a way to do it?


By default, MP4 files from iPhones do not include depth. The RGBD videos from telegies are custom MKV files that include depth (written directly with libmatroska, not via ffmpeg), which the webapp can display.


Really nice! I would love to add all my notes here, but I sometimes need to include images, and that's a blocker for me. Any plans to include that?

A really nice-to-have would be inline LaTeX math formulas, but that may be too specific for your app.


Thanks! Definitely will be adding file uploads.

Not planning on adding LaTeX anytime soon though.


I need to connect to multiple servers at the same time, and tmux sessions shouldn't be stacked. How can I do that without tabs?


If you have 2 tmux servers with different hotkeys you can totally nest them safely. I have a different hotkey prefix for my machine and then leave the default for servers. Works great!
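For example (a sketch, not my exact config; C-a is just one common alternative prefix), the local ~/.tmux.conf can look like this while the remote servers keep the default C-b:

    # local machine: use C-a as the prefix so the default C-b
    # passes straight through to the inner (remote) tmux
    set -g prefix C-a
    unbind C-b
    bind C-a send-prefix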


Use multiple panes and/or windows? It's a terminal multiplexer; each pane is a separate terminal, just like tabs.


A short optional sample for each composer and piece would be nice, just to get an idea of the music. I can't replay music in my head (a kind of aphantasia for music), so it's really hard for me to choose just from the list.


This is a great idea tbh, most people actually know far more classical music than they think they do - but they couldn't tell you the name of a piece or its composer for the life of them.


What about using machine learning? It sounds like a discovery issue, and recommender systems work very well for that (e.g. the YouTube sidebar).

I just discovered Neocities BTW, it sounds very interesting!


I've got a fancy 1080Ti and Tensorflow. If you have any particular things I could try or should read about, I'm happy to look into doing some research! Googling for "Tensorflow recommender" gave some interesting starting points.
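One very basic place to start (a minimal sketch, nothing like a production recommender; the ID counts and ratings below are made up just to show the shape of the model) is matrix factorization with user and item embeddings in Keras:

    import numpy as np
    import tensorflow as tf

    n_users, n_items, dim = 1000, 200, 32  # made-up sizes

    user_in = tf.keras.Input(shape=(1,))
    item_in = tf.keras.Input(shape=(1,))
    u = tf.keras.layers.Flatten()(tf.keras.layers.Embedding(n_users, dim)(user_in))
    v = tf.keras.layers.Flatten()(tf.keras.layers.Embedding(n_items, dim)(item_in))
    score = tf.keras.layers.Dot(axes=1)([u, v])  # predicted rating = dot product

    model = tf.keras.Model([user_in, item_in], score)
    model.compile(optimizer="adam", loss="mse")

    # Fake interactions just to show the fit call.
    users = np.random.randint(0, n_users, (10000, 1))
    items = np.random.randint(0, n_items, (10000, 1))
    ratings = np.random.rand(10000, 1).astype("float32")
    model.fit([users, items], ratings, epochs=2, batch_size=256)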


Its accuracy isn't good. I have yet to see a real-life dataset where it does better than just calling LogisticRegressionCV from scikit-learn. For bigger datasets, you may use vowpal wabbit or fastText. Logistic regression may be a bit slower to train (but not by much), and it's just as fast at prediction time. What's the point of using an algorithm when another one is simply better?
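If you want to check this on your own data, this is roughly the comparison I mean (a minimal sketch; 20 newsgroups is just a stand-in dataset, swap in your own texts and labels):

    from sklearn.datasets import fetch_20newsgroups
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegressionCV
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.model_selection import cross_val_score

    # Any bag-of-words dataset works; 20 newsgroups is only a convenient example.
    data = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes"))
    X = CountVectorizer(max_features=20000).fit_transform(data.data)
    y = data.target

    print("NB accuracy:", cross_val_score(MultinomialNB(), X, y, cv=3).mean())
    print("LR accuracy:", cross_val_score(LogisticRegressionCV(max_iter=1000), X, y, cv=3).mean())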


There's a reason for that. NB needs to fit more params than LR (and other linear discriminative learners): e.g. for binary classification, NB needs to fit 2*N params while LR needs just N.


Not true. Let's derive it. Here's NB for binary classification:

    p(c|w)*p(w) = p(w|c) * p(c)

    p(c|w) = p(w|c) * p(c) / p(w)

    p(c|w) = p(w|c) * p(c) / [sum_i p(w|c_i) * p(c_i)]
Let's look at the probability of class 1.

    p(c_1|w) = p(w|c_1) * p(c_1) / [sum_i p(w|c_i) * p(c_i)]
Notice how the numerator is going to show up in the denominator. We can simplify that by bringing it into the denominator:

    p(c_1|w) = 1 / [sum_i p(w|c_i) * p(c_i) / p(w|c_1) / p(c_1)]
Then cancel it out:

    p(c_1|w) = 1 / [1 + p(w|c_0) / p(w|c_1) * p(c_0) / p(c_1)]
Now let's apply the NB assumptions:

    p(w|c_0) / p(w|c_1) = prod_i p(w_i | c_0) / p(w_i | c_1)
Now, if you take the log of the final p(c_1|w) I derived, the product in p(w|c_0) / p(w|c_1) turns into an exponentiated sum, giving you one parameter per word, plus an intercept for the log of p(c_0) / p(c_1). You end up with exactly the same 1/(1+exp(linear stuff)), with the same parametrization and form you have in logistic regression [0].

This is an instance of a broader point: a given graphical model with a given fixed parametrization can be trained generatively or discriminatively. They will end up learning different models, but that's because they make different assumptions, not because of different parametrizations.

[0] linear stuff = intercept + sum_i (coefficient of word_i)
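Here's a quick numerical sanity check of the equivalence (a sketch with made-up multinomial NB parameters; nothing here is fit to data):

    import numpy as np

    rng = np.random.default_rng(0)
    n_words = 5

    # Made-up NB parameters: class priors and per-class word distributions.
    p_c1, p_c0 = 0.3, 0.7
    p_w_c1 = rng.dirichlet(np.ones(n_words))
    p_w_c0 = rng.dirichlet(np.ones(n_words))

    counts = rng.integers(0, 4, n_words)  # a toy bag-of-words document

    # Direct NB posterior p(c_1|w) via Bayes' rule with the independence assumption.
    lik1 = np.prod(p_w_c1 ** counts)
    lik0 = np.prod(p_w_c0 ** counts)
    posterior = lik1 * p_c1 / (lik1 * p_c1 + lik0 * p_c0)

    # Same number as 1/(1+exp(-(intercept + counts . coef))): one coefficient per
    # word (the log-probability ratio) plus an intercept (the log prior ratio).
    coef = np.log(p_w_c1) - np.log(p_w_c0)
    intercept = np.log(p_c1) - np.log(p_c0)
    logistic_form = 1.0 / (1.0 + np.exp(-(intercept + counts @ coef)))

    print(posterior, logistic_form)  # identical up to floating point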


I may be missing something, but how do you take a log on the RHS of p(c_1|w)? The denominator has a sum i.e. "1+ something", which the log doesn't distribute over.


Heh, nope, you're right, I was writing too quickly. I meant taking the log of the "stuff" term inside p(c_1|w). So you take p(c_1|w) = 1/(1+stuff) and turn it into p(c_1|w) = 1/(1+exp(log(stuff))). stuff is a product, so log(stuff) is a sum. Good catch.


Oh, I see it now. Thanks for clarifying!

In that case, you needn't have kept the denominator around at all. Since P(w) is not class-conditioned, the classifier could have directly computed P(C_1|w)/P(C_0|w). The P(w) term cancels out, and you end up with the product of ratios of feature probabilities conditioned on the classes.

Note though that, for K classes, K > 2, the number of parameters you need to store would blow up. You would need to store these N ratios: P(w_j|C_x)/P(w_j|C_y) for all possible classes x, y. Here N is the number of features. So, in all, N*C(K,2) values. By contrast, multiclass softmax regression (the discriminative analogue of NB) would need NK parameters.


In the multiclass problem, they have the same number of free parameters. You can see this by choosing a reference class c_y. The NB parameter for class c_x and feature w_j is f_jxy = log(p(w_j|c_x)/p(w_j|c_y)). If you want a new reference class c_w, you can see that f_jxw = f_jxy - f_jwy. No new learned parameters are needed. You need one of those parameters per N features and per (K-1) classes that aren't c_y, so you get N(K-1) parameters. This is the same as for multiclass softmax regression: N(K-1) instead of the NK you wrote (which you can see by working it out with a reference class c_y). It really is the same parametrization. The analogue with NK-parametrized softmax may be more straightforward if you just use NK naive Bayes features of the form f_jx = log(p(w_j|c_x)) for each class and check the equivalence with softmax.

It really is just a special case of a more general rule that a given PGM with a fixed parametrization can be trained discriminatively or generatively.
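A quick numerical illustration of the reference-class trick (a sketch with made-up p(w_j|c_x) values): the ratios derived for a new reference class match the directly computed ones, so no extra parameters are needed.

    import numpy as np

    rng = np.random.default_rng(1)
    N, K = 4, 3  # features, classes

    # Made-up per-class feature probabilities p(w_j|c_x): one column per class.
    p = rng.dirichlet(np.ones(N), size=K).T  # shape (N, K)

    # NB ratio parameters f_jx0 against reference class c_0.
    f_vs_0 = np.log(p) - np.log(p[:, [0]])

    # Ratios against another reference class, say c_2, are just differences
    # of the stored ones: f_jx2 = f_jx0 - f_j20.
    f_vs_2_direct = np.log(p) - np.log(p[:, [2]])
    f_vs_2_derived = f_vs_0 - f_vs_0[:, [2]]

    print(np.allclose(f_vs_2_direct, f_vs_2_derived))  # True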


I'm sorry I didn't have a chance to respond before - you've been very patient with your responses! Thanks!

Yes, this is quite neat!

I will look up the rule/theorem you mention. It seems "kind of" reasonable in the sense that the same number of independent parameters should show up in any representation, but (and this is probably me just being dumb) I need to think about why the form of the representation wouldn't affect the number of parameters.

Although, if m parameters in a representation can be mapped to n (m>n) parameters in another, the m parameters aren't really independent.

OK, now I am just talking to myself :-)

Also, in this context, the results showing that discriminative classifiers learn better than their generative counterparts (asymptotically) are something I need to think about. [1]

Well, there goes my next few evenings.

[1] https://ai.stanford.edu/~ang/papers/nips01-discriminativegen...


Does anyone have a good book for advanced linear algebra? I learned the basics in school (e.g. matrix diagonalization, SVD), but for ML I need to really understand things like angles between flats, tensor decompositions, etc.


SEEKING WORK / Remote or Paris, France

I'm a data scientist with experience in various types of machine learning projects. I have a statistical background, and strong SWE skills.

I have experience in structured data, NLP, images, deep learning, recommendation systems, predictive maintenance, etc.

Why should you hire me?

- I can understand your problem, torture your data, and find the right model
- Or I will try something new and out of the box if you need it
- I know how to follow best practices for code and how to integrate into a workflow
- I love to deliver quality work

Drop me a mail! matthieu@databiz.io


Nice statistical property: this is the same thing.

    P(B|A) > P(B)  <=>  P(A|B) > P(A)
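Both sides are equivalent to P(A and B) > P(A) * P(B):

    P(B|A) > P(B)
      <=> P(A and B) / P(A) > P(B)     (definition of conditional probability, P(A) > 0)
      <=> P(A and B) > P(A) * P(B)
      <=> P(A and B) / P(B) > P(A)     (P(B) > 0)
      <=> P(A|B) > P(A)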


My main issue with R^2 is that it's an artificial indicator. It has no business meaning. The best alternative is to build a metric that relates to your business problem. How many dollars do you lose if you're wrong by 1%? Or by 10 units? Is your business loss linear in your error? Is it symmetric?

These are the kinds of questions you have to ask yourself. Defining a metric is hard, and there is no good shortcut.
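As a concrete illustration (a minimal sketch; the 4-to-1 cost ratio for under- vs over-prediction is a made-up placeholder to be replaced with your real costs), you can wrap such a business cost as a scikit-learn scorer:

    import numpy as np
    from sklearn.metrics import make_scorer

    # Hypothetical costs: under-predicting demand by one unit loses a sale ($4),
    # over-predicting by one unit costs holding/markdown ($1).
    COST_UNDER, COST_OVER = 4.0, 1.0

    def business_cost(y_true, y_pred):
        err = np.asarray(y_pred) - np.asarray(y_true)
        return np.where(err < 0, -err * COST_UNDER, err * COST_OVER).mean()

    # Usable with cross_val_score / GridSearchCV; lower cost is better.
    business_scorer = make_scorer(business_cost, greater_is_better=False)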

