Hacker News
Pyrite – open-source video conferencing (garage44.org)
187 points by jvanveen on Feb 19, 2022 | 43 comments


Not really sure what this offers compared to Jitsi. The author says Jitsi is complicated, but my company switched to on-prem Jitsi at the start of COVID and it's been a pretty smooth ride (5000+ users).


I don't know Jitsi at all, but here are the things about Galene (and Pyrite) I have enjoyed:

* Low resource requirements. You can serve lots of users off a Rasp Pi

* Single binary + JSON Config.

* Designed to be modified. Galene is API driven so you can ship your own custom UI. I co-ran a conference a while back and Galene let us give a custom experience easily. Users had no idea what we were using on the backend, and we customized it exactly to what we wanted.

* Written in Go/Easy to understand. You can get into the guts of Galene's jitter buffer, etc., pretty easily. Great for learning and understanding the software you are running.

* Maintainers are really accessible! Galene's developer runs a mailing list, and you get to bounce ideas off him and some other really smart people on it.


I was curious and looked through the code of Galene briefly and found the following, which may partially answer your question. For context, I am familiar with the Jitsi code and have written a calling server (and written about it: https://signal.org/blog/how-to-build-encrypted-group-calls/).

Galene appears to be less mature than Jitsi. For example, it uses REMB feedback messages from the client to calculate allowable bitrates rather than calculating the bitrates itself (as Jitsi and Signal's SFU do). Worse, it appears that what it does with that information is erroneous. I could be wrong, but it looks like the bitrate allocation code (see https://github.com/jech/galene/blob/e8fbfcb9ba532f733405b1c5...) only allocates the bitrate for one of the video streams, not all of them. Perhaps the author did not realize that there is one REMB sent back for all the video streams by WebRTC rather than one per stream (see, for example, here: https://source.chromium.org/chromium/chromium/src/+/main:thi...). Further, I find the spatial layer switching code to be strange. For example, it doesn't go down a layer unless it's 150% over the estimated allowable bitrate, which gives a lot of opportunity for inducing latency.
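To make concrete the kind of rule I'm describing (the names and the exact threshold below are mine, not Galene's actual code), here's a sketch of a layer switcher that only steps down once the send rate exceeds 150% of the estimate; while the rate sits between 100% and 150% of the estimate, queues, and therefore latency, keep building:

    package main

    import "fmt"

    // nextLayer is a hypothetical sketch of a hysteresis rule for spatial
    // layer switching, not Galene's code: drop a layer only once the
    // layer's rate is more than 150% of the estimated available bitrate,
    // and step up only when there is lots of headroom.
    func nextLayer(current, maxLayer int, layerRate, estimate uint64) int {
        switch {
        case layerRate*2 > estimate*3 && current > 0:
            return current - 1 // more than 150% of the estimate: drop a layer
        case layerRate*2 < estimate && current < maxLayer:
            return current + 1 // under half the estimate: try the next layer up
        default:
            return current
        }
    }

    func main() {
        // Sending 900 kb/s against a 700 kb/s estimate: over the estimate,
        // but not past the 150% threshold, so this rule keeps the current
        // layer while the receiver's queue grows.
        fmt.Println(nextLayer(1, 2, 900_000, 700_000))
    }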

In short, I think Galene has a ways to go before it works as well as Jitsi (Videobridge), and thus Pyrite group calls are unlikely to work as well as Jitsi group calls (for 1:1 calls, I don't know; I didn't look into that).

Oh, and just a reminder, the SFU we use for Signal group calls is also open source: https://github.com/signalapp/Signal-Calling-Service.


Thanks for the review, pthatcherg.

> For example, it uses REMB feedback messages from the client to calculate allowable bitrates rather than calculating the bitrates itself (as Jitsi and Signal's SFU do).

I'm not sure what you are saying. Galene listens to REMB messages and computes its own bitrate, then combines the two values. https://github.com/jech/galene/blob/e8fbfcb9ba532f733405b1c5...

> I could be wrong, but it looks like the bitrate allocation code only allocates the bitrate for one of the video streams, not all of them

The code you point at is not the bitrate allocation code, it's the code that chooses which scalable layer to assign to a given client. The code you're looking for is here: https://github.com/jech/galene/blob/e8fbfcb9ba532f733405b1c5...

> In short, I think Galene has a ways to go before it works as well as Jitsi (Videobridge) [...] the SFU we use for Signal group calls is also open source

Jitsi VideoBridge is a great piece of software, no doubt about it. I'm sure that Signal's SFU is competently done too.


Yes, you are right that there is also a simple loss-based congestion control mechanism (https://github.com/jech/galene/blob/e8fbfcb9ba532f733405b1c5...) and a min() between it and the REMB. I missed that part. However, that appears to still be immature, only in a different way than I thought.
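For readers following along, the combination being described is roughly this (the function and names are invented, just to illustrate the idea, not Galene's code): clamp a server-side, loss-based estimate by the latest REMB value reported by the receiver.

    package main

    import "fmt"

    // allowedBitrate is a minimal sketch: take the server's own loss-based
    // estimate and clamp it by the most recent REMB value reported by the
    // receiving client, i.e. a min() of the two when both are available.
    func allowedBitrate(lossBased, remb uint64) uint64 {
        if remb != 0 && remb < lossBased {
            return remb
        }
        return lossBased
    }

    func main() {
        fmt.Println(allowedBitrate(2_000_000, 1_200_000)) // REMB is the bottleneck
        fmt.Println(allowedBitrate(800_000, 0))           // no REMB yet: loss-based value
    }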

If I'm right that only one server->client video stream (called "down track" in the Galene code) is receiving the REMB message, then only one will use the REMB value and the rest will fall back to loss-based congestion control. If I'm wrong and all the server->client video streams are receiving the REMB message, then all of them will use that same value, which will be higher than what the loss-based congestion control calculates for each stream independently, so in effect they will all be falling back to loss-based congestion control (when there is more than 1 video stream; for 1 video stream it probably works fine).

Either way, it appears that each server->client video stream is independently running a loss-based congestion controller, all of which will be battling each other (like N TCP streams do). That can work, I guess, but it's better to run one congestion controller and then divide that bitrate among the various video streams, which is what I meant by "bitrate allocation".

In other words, selecting video layers to send is exactly what I mean by bitrate allocation. Sorry for being unclear about that. The code you linked to is estimating the client->server bitrate for a given video stream. What I was looking for is code that takes a bitrate (from one congestion control mechanism, whatever it may be) and then divides it between the various video streams that flow server->client by selecting which layers to forward. I couldn't find that, and now I see why: because each video stream has its own congestion controller, and they apparently compete with each other, where they are likely all loss-based in practice.
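Concretely, by "bitrate allocation" I mean something like the following sketch (all names and numbers are invented, not code from any of these projects): one budget for the whole client, divided among the outgoing video streams by picking the highest layer for each that still fits.

    package main

    import "fmt"

    // stream describes one outgoing video stream and the bitrate of each
    // of its spatial layers, lowest first. Purely illustrative.
    type stream struct {
        name       string
        layerRates []uint64
    }

    // allocate divides a single congestion-controlled budget among all
    // streams by selecting layers, instead of letting every stream run
    // its own congestion controller.
    func allocate(budget uint64, streams []stream) map[string]int {
        chosen := make(map[string]int)
        // First pass: give every stream its lowest layer if it fits.
        for _, s := range streams {
            chosen[s.name] = 0
            if s.layerRates[0] <= budget {
                budget -= s.layerRates[0]
            } else {
                budget = 0
            }
        }
        // Second pass: spend whatever is left on higher layers.
        for _, s := range streams {
            for l := 1; l < len(s.layerRates); l++ {
                extra := s.layerRates[l] - s.layerRates[l-1]
                if extra > budget {
                    break
                }
                budget -= extra
                chosen[s.name] = l
            }
        }
        return chosen
    }

    func main() {
        participants := []stream{
            {"alice", []uint64{150_000, 500_000, 1_500_000}},
            {"bob", []uint64{150_000, 500_000, 1_500_000}},
        }
        // With a 1.2 Mb/s budget both streams end up on the middle layer.
        fmt.Println(allocate(1_200_000, participants))
    }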

Loss-based congestion control for video conferencing isn't as good as latency-based congestion control because it will cause more latency.

Thanks for pointing out what I was missing. Now that I understand it better, I can see that it would work fine for 1 video stream (when there are only 2 clients in the call), but then likely falls back to loss-based congestion control for more than 2 clients, which will work, but not as well.

If this is accurate, then I'd make a few suggestions for Galene:

1. Use one "maxBitrate" calculation per-"rtpConn" instead of per-"downTrack" and then divide that bitrate between those downTracks rather than doing N such independent calculations. This will avoid the problems from having congestion controllers competing with each other (see the sketch below).

2. Feed the REMB value from the receiving client into the unified calculation. Then you'll get the benefit of latency-based congestion control (assuming the client is doing latency-based congestion control).

3. Switch the loss-based mechanism in the server to a latency-based system using something like transport-cc (which I think Pion supports).
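A rough structural sketch of suggestion 1, with hypothetical type and field names (not Galene's actual types): one estimate per connection, handed down to the tracks, rather than one estimate per track.

    package main

    import "fmt"

    // downTrack and rtpConn are invented names for this sketch: the
    // connection owns a single congestion-controlled estimate and assigns
    // each outgoing track a share of it, instead of each track estimating
    // its own allowed bitrate independently.
    type downTrack struct {
        id         string
        maxBitrate uint64 // assigned by the connection, not computed locally
    }

    type rtpConn struct {
        estimate uint64 // one estimate per connection (loss- or latency-based)
        tracks   []*downTrack
    }

    // reallocate splits the connection-wide estimate evenly across tracks;
    // a real allocator would pick layers per track, as sketched above.
    func (c *rtpConn) reallocate() {
        if len(c.tracks) == 0 {
            return
        }
        share := c.estimate / uint64(len(c.tracks))
        for _, t := range c.tracks {
            t.maxBitrate = share
        }
    }

    func main() {
        conn := &rtpConn{
            estimate: 1_500_000,
            tracks:   []*downTrack{{id: "a"}, {id: "b"}, {id: "c"}},
        }
        conn.reallocate()
        for _, t := range conn.tracks {
            fmt.Println(t.id, t.maxBitrate) // each track gets 500000
        }
    }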


Ah, I see where your confusion stems from.

Galène doesn't bundle multiple streams in a single PeerConnection: it puts each stream (audio+video) in its own PeerConnection. Thus, if we can assume that the audio traffic does not significantly contribute to congestion, then performing congestion control per-track or per-connection is exactly equivalent.

(It's a tradeoff. Bundling reduces the amount of ICE traffic and makes for faster connection establishment, but it ends up putting multiple unrelated streams into a single transport-layer flow, which confuses traffic shapers and AQMs. I'm betting that things like fq-codel are being deployed as we speak; if my bet is wrong, then bundling will turn out to be the better choice.)


Oh, that's interesting. That's not as bad as I thought, although I am surprised that's the approach that was taken (unbundled).

In that case, if you're relying on REMB from WebRTC with multiple PeerConnections, then, yes, each will be handled separately and you'll likely not end up with loss-based congestion control but will be using the latency-based congestion control WebRTC is doing.

Some potential problems I can think of with this approach:

- It won't scale to a large number of streams. I'm not sure at what point it would cause problems, but I'm guessing it would work OK for... 20? 40? Most of the time, the extra ICE traffic and per-connection RTCP is probably small. A long time ago, there were issues with per-PeerConnection memory usage, but I think we (back when I worked on the WebRTC team) fixed them to a reasonable degree, so it should work for 20-40. You have to do a bunch of DTLS handshakes, but you could get around that by using SDES. You'll be opening a lot of NAT bindings, potentially all at the same time. Again, with a smaller number of streams, this might be fine. But at some point, you may cause issues with consumer-grade NATs.

- The WebRTC receive-side congestion controllers (sending back REMB) will effectively be competing with one another. I'm not sure how well they work with a large number doing so (or even a small number.) I'd be interested to hear how many you can have running at the same time before you notice problems.

- You can't easily prioritize one stream over the others if you want an "active speaker" view. I suppose you could do something where you "steal" bits from one PeerConnection and give them to another, but that will probably contradict what you're trying to do with playing nicely with traffic shapers and AQM.

I'd be very interested to hear your findings on how traffic shapers and AQM respond to non-bundled traffic vs bundled traffic. What characteristics do you see work better or worse? Higher rates? Lower latencies? Lower jitter? Less loss? And how do you test (given that network behavior can vary so widely from one network to the next)?

If you'd like to talk more directly I'm "peter at signal.org".


It looks like we're now understanding each other.

> It won't scale to a large number of streams.

Galene was designed for lectures and conferences, where a small number (1-5) of streams are sent to hundreds or thousands of receivers (and the budget is virtually nonexistent, because teaching and public research are eternally underfunded). It works beautifully for that particular application. It also happens to work well for medium-size meetings (25 senders, 50 receivers), which is a nice bonus, but not what the software was designed for.

> I'd be interested to hear how many you can have running at the same time before you notice problems.

We've been doing staff meetings with 25 senders and 50 receivers (50 people attending, half of whom have their camera switched off). However, our main goal is supporting large lectures (2 flows distributed to hundreds of students, with students only switching their camera on in order to ask a question or show their cat): the commercial offerings work reasonably well for meetings, but are completely unsuited to lecturing.

> You can't easily prioritize one stream over the others if you want an "active speaker" view.

Oh, that. We simply forcibly switch the background streams to the lowest spatial layer. It works very well, but relies on the senders implementing either simulcast or SVC. Which is something we may safely assume (all desktop browsers implement at least simulcast, and lecturing is done with laptops).
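In rough Go (just a sketch of the policy with invented names, not the actual code): the active speaker keeps the requested layer, every other forwarded stream is pinned to the lowest spatial layer, relying on the sender doing simulcast or SVC.

    package main

    import "fmt"

    // forwardedStream and applySpeakerPolicy are hypothetical names used
    // only to illustrate the policy: background streams are forced to the
    // lowest spatial layer, the active speaker keeps the requested one.
    type forwardedStream struct {
        sender       string
        spatialLayer int
    }

    func applySpeakerPolicy(streams []forwardedStream, activeSpeaker string, requested int) {
        for i := range streams {
            if streams[i].sender == activeSpeaker {
                streams[i].spatialLayer = requested
            } else {
                streams[i].spatialLayer = 0 // background: lowest layer
            }
        }
    }

    func main() {
        streams := []forwardedStream{{"alice", 2}, {"bob", 2}, {"carol", 2}}
        applySpeakerPolicy(streams, "bob", 2)
        fmt.Println(streams) // [{alice 0} {bob 2} {carol 0}]
    }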


If your main focus is mostly receiving 1-2 video streams while sending 0, then, yeah, I guess bundle vs non-bundle doesn't matter much. I'm pleasantly surprised to hear that 25 PeerConnections work well for you.

About simulcast: why would a sender not support simulcast? Something to do with VP8 hardware encoders?

I'm still interested to hear what you have learned about AQM and the like.


Galene is a single Go binary AFAIK. Jitsi needs an XMPP server installed.


The separate components of Jitsi make it quite clean to scale and customise though. A monolith doesn't seem superior except for the very smallest deployments where simplicity is key.


I made quite a bit of money as a freelancer at the start of covid setting up Jitsi servers for clients - it isn't easy... especially when you start to have multiple servers etc


It's true that it's complex, but the complexity is easily managed with modern tools. It fits very neatly into k8s for larger deployments, for example.


> A monolith doesn't seem superior except for the very smallest deployments

True. Galene is designed to encourage small deployments: it's a single binary that is simple enough and cheap enough so that every high school can have their own instance and not rely on an external IT team.

For large deployments, where accidental complexity is not as much of an issue, you'd most probably want to use some more complex and more flexible software.


Wait, this is written in Go? Nothing wrong with that, but why is it called Pyrite which evokes that snakey language?


> Pyrite is a web(RTC) client for the Galène videoconference server.

So probably a play on the name of the parent project.


“I searched where the name Galene came from and learned that it is a lead-containing mineral. It would only be logical to name this side-project after another mineral. Pyrite, or a fool's gold, felt to be a good description.”


Not to be confused with https://github.com/microsoft/pyright and dozens more smaller projects.


Are you using Jitsi meet or the lib? I've been trying to work with the API to integrate into a solution and it hasn't been pretty.


The most interesting thing for me is actually the galene server, but playing around with the demo server and looking at the documentation it seems to be a fair bit behind Jitsi in ease of use and deployment.

(I built a one-shot template to deploy and run Jitsi on Azure - https://github.com/rcarmo/azure-ubuntu-jitsi - and it's been trivial to maintain over the past two years, for a small group of friends and monthly "open sessions")

I'm not enamored of the Pyrite UI (again, Jitsi seems simpler), but I'll keep an eye on both.


Something that is desperately missing from video conferencing software is a walkie-talkie-like interface and capabilities that allow for stop-start convos. This might seem silly, but it would help tremendously for those in developing countries.

Lots of our collaborators are in developing countries with terrible internet, so we end up resorting to video chat + phone call.


FreeConferenceCall[.com] allows you to participate in meetings by phone and SMS.

That's in addition to its typical video/chat conferencing capabilities.

The client is a pretty small (~20 MB) monolithic (a.k.a. portable) executable that can be downloaded in seconds even on low-bandwidth networks.

If someone is interested, here https://terrainformatica.com/2022/02/19/freeconferencecall-a... I've explained how it was done (in 9 months).


Phone calling is part of the problem and does not solve the issue at hand.

The central issue is that video conferencing software assumes face to face meetings. However if you use video conf for demonstrating anything it all goes to hell due to choppy video and compression.

If you ever extend your software to have a stop-start system, let me know. It will spread like wildfire in developing countries.


Push-to-talk can be nice, though IME it requires diligence and trust. I had a private Mumble server to chat with coworkers outside the company resources. It was OK-ish. Folks didn't really buy in to the "always stay connected, then push to talk or use side rooms" style.

Slack seems to be trying to find a middle path with huddles.


So Pyrite is a Vue 3 frontend to Galene, the videoconference backend. I do like Go's ease of deployment.

On desktop it appears to be nice, but on Android the UI is barely usable. Is this a desktop-only UI so far?


Not to be confused with https://github.com/microsoft/pyright


It would really help if the author had a list of "features" on the github project. It is a bit difficult to figure out what exact functionality the project supports.

* https://github.com/garage44/pyrite/


Pyrite inherits most of Galene's features, which are listed at https://galene.org. (Galene's server has some extra features that are not used by Galene's native client yet and were designed for third-party frontends, but I don't think Pyrite is using any of those yet.)


Besides this vs Jitsi, I wonder if there is a tldr about how Zoom managed to capture so much of this space.


I think it's region and connection speed dependent, but for several years I noticed that meetings with zoom had wayyyy less problems with freezes, bad quality, audio drops and stuff when compared to all other tools.


This has been my experience, even now. I still prefer it to Teams and WebEx because of the reliability.

I do like that Teams allows for a richer chat experience, but often just keep a Teams chat and Zoom call going when needed.


Then again, Teams is an outlier in terms of performance and reliability. Most things will compare positively with that.


Our company read.ai builds add-ons for videocon clients so I'm pretty familiar with most clients.

- Zoom is just simply the best in terms of stability, performance, user experience, and in particular ease-of-first-use. Call quality is the best.

- Webex is a hot mess. They've been scrambling to catch up and have made strides, but have a ways to go. Install is a nightmare.

- Google Meet feels like it's barely changed in 5 years. Makes my fans take off. Decently stable, at least.

- Teams is decent all-around, featureful, and actually had a good SDK before Zoom, but install/user experience and stability struggle at times.

- Is Skype even a thing still?

Hands down, the two biggest factors are time-until-first-use/friction and stability of the audio/video stream. Zoom simply dunked on the other legacy players in this regard.


Webex got better under pressure from others, but it's what we used at first. The CEO of my mid-size employer got talked into Zoom near the beginning of COVID, and it was just very easy to use and much less problematic than Webex, which we had used before but didn't need nearly as much, so we put up with its faults. Then COVID hit and Zoom just worked better despite the encryption debacle.


It was free and easy to get started with, and worked.


So like Jitsi, except not free as in freedom.


For quite a different definition of "worked", especially 2+ years ago.


Dunno, I used it 2+ years ago with friends who had set up their own instance (I presume meet.jit.si was also a thing back then, but I don't know that for sure) and that worked as well as any other web-based call solution at the time.


And all web-based solutions (maybe with the exception of BBB, didn't use that back then) were quite a bit worse than Zoom with its native apps.


...we were talking about web clients I thought? I've not used a desktop client to do calls since maybe Skype in 2015 or so, I didn't realize anyone still used that because of the extra hassle.

Even MSN did video calls, and it shut down basically the year Zoom was founded, let alone before anyone was using Zoom. This wasn't new among desktop clients, either.


Zoom is widely used with the native clients and mobile apps, so I was talking about those. Not sure how good/not good Zoom on the web is/was.


I've assumed that controlling the endpoint's native client software allows them to focus on removing pain points: codecs, NAT traversal, etc.

Once a critical mass of users, big corporates, etc. found their solution easier, they got the network effect.


During the first year of COVID, Webex wasn't really innovating while Zoom was flinging out new features. Webex is now moving faster and adding more and more features.



