Universe packages are not supported by Ubuntu unless you activate Ubuntu Pro. Thus, if you install ffmpeg on Ubuntu without Pro, it will contain several active vulnerabilities. The full five years of security updates only applies to packages in the main repo.
I wanted to find another reason to not use Ubuntu for servers (besides Snap being forced on everyone) and this was it.
At least in Debian, most of the packages I use on my server come from the main repos. Occasionally a few come from other sources, but by the time a new Debian patch release comes out, those other packages have also been updated.
That is also completely unchanged from how it has always been. Canonical supports "main", while "universe" and "multiverse" get best-effort community support (i.e., inherited from Debian). They now additionally offer a dedicated team for those repos.
Honest question, since the Arch wiki seems surprisingly spotty on this: which Arch repos are covered by their security team? Just core? Or also extra? More than that? Surely not the AUR, right?
Not even "from debian". Sometimes they can't be bothered to copy debian packages that fix security issues if the package is in universe, and just leave it vulnerable for the entire duration of the LTS.
That's not the case for this ffmpeg example (it really isn't patched), but make sure to check the actual changelog. Sometimes the version number stays the same but the patches are backported, so a plain version comparison is not enough.
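If you want to check this yourself, here is a minimal sketch assuming an apt-based system; the package name and CVE id are just placeholders you substitute on the command line. `apt-cache policy` shows which component (main, universe, ...) a package comes from, and `apt-get changelog` pulls the packaging changelog, which lists backported security fixes even when the upstream version number hasn't moved.

```python
#!/usr/bin/env python3
"""Rough check: which repo component a package comes from, and whether its
changelog mentions a given CVE. Package name and CVE id are placeholders
passed on the command line."""
import subprocess
import sys

def repo_component(package: str) -> str:
    # 'apt-cache policy' shows which archive component (main, universe, ...)
    # the installed/candidate version comes from.
    return subprocess.run(
        ["apt-cache", "policy", package],
        capture_output=True, text=True, check=True,
    ).stdout

def changelog_mentions(package: str, cve: str) -> bool:
    # 'apt-get changelog' downloads the Debian/Ubuntu packaging changelog,
    # which lists backported security fixes even when the upstream version
    # number is unchanged.
    log = subprocess.run(
        ["apt-get", "changelog", package],
        capture_output=True, text=True, check=True,
    ).stdout
    return cve in log

if __name__ == "__main__":
    pkg, cve = sys.argv[1], sys.argv[2]   # e.g. ffmpeg CVE-XXXX-XXXXX
    print(repo_component(pkg))
    print(f"{cve} mentioned in changelog: {changelog_mentions(pkg, cve)}")
```

The changelog only tells you what the maintainers say they fixed; the distro's CVE tracker is still the place to confirm whether a particular issue is fixed, deferred, or won't-fix for your release.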
> the training cutoff date isn't some absolute date from which they never allowed any new data to be trained on.
It's not a question of whether they're "allowed" to train on new data; the question is whether the models have been trained on data containing information about current events. If you know they've set up some continuous training pipeline for this, you should link a source. I don't think that's the case, though, since there would otherwise be no reason for a cutoff date at all.
> Instead there are bits and pieces of newer information captured in the updated models, but it's not a meaningful enough amount to ever rely on.
This seems more like an opinion about the technology's limitations in general than an assessment of how likely new information is to be incorporated into the model's weights and biases.
They are most certainly being fed LLM content. However, I think this "model collapse" narrative is oversold. Here are some things to keep in mind:
(1) Real content is not generated via a purely synthetic loop: humans use generative AI in complex ways, intermixing human-generated and AI-generated content. Imagine a person who writes the first draft of an essay, then uses ChatGPT to rewrite parts of it. There are still plenty of human additions, modifications, and stylistic flourishes.
(2) The most dramatic effects of model collapse were seen when training successive generations of models purely on content generated by the previous generation. This is a very academic scenario.
(3) There is already a lot of junk consumed by these models. RLHF is aimed at suppressing those junk responses. I am not aware of any research exploring how the full training cycle is affected when RLHF is employed.
Also, there is a lot of training material out there that was not used by the original GPT-3 model. The primary limitation is hardware.
I have come across an increasing amount of obviously generated content: recipes, product reviews, and everything Buzzfeed was known for. I only expect more and more of it. Just wait until 2024's "top 38 React server component state management libraries you need to learn this year" posts come up on dev.to.
Edit: well, look at that. I'm not saying this was generated, but it might as well have been. These "learn from these repos" posts are everywhere now.
It's also fairly well established now, I believe, that part of OpenAI's secret sauce is focusing on high-quality data sources; that is, probably those least likely to include unmodified ChatGPT outputs.
Is it going to remain academic? I can easily imagine the spammy content farm / listicle business model evolving to be fully automated, creating an input loop.
Sure, there will be some pollution. It's very multivariate and depends on factors like content split, generation quality, and novel information. A scenario in which all of your data is generated by the previous model and you run n training loops is academic.
It's also worth noting that when OpenAI created Whisper, they had to heuristically remove many transcripts from poor ASR systems, and they definitely didn't catch them all.
I think it should be noted that this enforces grammatical constraints on the model's generated text, but it doesn't do anything to properly align the content. It would be useful if you needed to ensure a server delivers well-formatted JSON, but I suspect it won't solve many alignment issues with current language generation. For example, current iterations of Llama and GPT often do not label markdown code blocks correctly. Using grammar-based sampling, you could enforce that the model labels code blocks, but you couldn't enforce correct labeling, since that is context-dependent. You also couldn't invent a novel domain-specific language and expect good output without aligning against that language.
Also important to call out: anytime you have a free-form string, it's pretty much an open invitation for the LLM to go completely haywire and run off into all sorts of weird tangents. So these methods are best combined with other heuristics to bias sampling once you get into free-form text territory (e.g. a repetition penalty).
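To make the "syntax, not semantics" point concrete, here's a toy sketch of the idea behind grammar-constrained sampling (this is not llama.cpp's actual GBNF machinery, just an illustration): at each step the model's scores are masked down to whatever the grammar allows, so the shape of the output is guaranteed, but the choice among the allowed tokens (e.g. which language label goes on a code block) is still entirely up to the model.

```python
import random

# Toy illustration of grammar-constrained sampling. At every step the
# "model's" scores are masked down to the tokens the grammar allows, so the
# output is always well-formed. Nothing forces the *content* (e.g. which
# label gets picked) to be correct; the model here is literally random scores.

VOCAB = ["{", "}", '"lang"', ":", '"python"', '"rust"', '"plain text"', "banana"]

# Grammar as a tiny state machine: state -> (allowed tokens, next state)
GRAMMAR = {
    "start": ({"{"}, "key"),
    "key":   ({'"lang"'}, "colon"),
    "colon": ({":"}, "value"),
    "value": ({'"python"', '"rust"', '"plain text"'}, "close"),
    "close": ({"}"}, "done"),
}

def fake_model_scores() -> dict:
    # Stand-in for LLM logits: every vocabulary token gets a random score.
    return {tok: random.random() for tok in VOCAB}

def constrained_generate() -> list:
    state, out = "start", []
    while state != "done":
        allowed, next_state = GRAMMAR[state]
        scores = fake_model_scores()
        # Mask: only consider tokens the grammar allows in this state,
        # then pick the highest-scoring one (greedy, for simplicity).
        out.append(max(allowed, key=lambda t: scores[t]))
        state = next_state
    return out

if __name__ == "__main__":
    print(" ".join(constrained_generate()))
    # Always prints something shaped like: { "lang" : "rust" }
    # The grammar guarantees the shape; whether the chosen label is the
    # right one for a given code block is still up to the model.
```

In a real implementation the mask is applied to the logits before sampling and the grammar state is tracked per partial parse, but the division of labor is the same: the grammar handles form, the model still handles content.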
The major benefit of tenure is that it protects the pursuit of truth in controversial academic subjects. It protects the behavioral genetics research that takes place at UT Austin from left-wing attacks, and the positions some professors have taken on Texas A&M's annual drag show from right-wing attacks.
As a student in the Texas system, I've raised questions that my professors refused to talk about for fear of political backlash. Tenure is an important institution. I think you can only go so far in incentivizing good teaching through termination.
I've always been curious why people so fervently dislike Gibson. I think the most genuine criticism is that SpinRite is not a backup solution and people may rely on it as such. Ideally, no one should need it, since all data should be replicated and backed up. Any drive can fail at any time, for any reason, and it may be totally unrecoverable.
[Side Note: He also once claimed in a "testimonial" that a special ops team recovered data off of a hard drive during a mission in which they hit a terrorist with a computer.]
That being said, he produces a free security podcast which is quite good. He knows his stuff.
> Any drive can fail at any time for any reason and it may be totally unrecoverable.
While in principle this is true, I have been using hard drives for more than 30 years now in PCs and I have never had one fail. I still back things up to separate drives since there's always a first time, but I've never used SpinRite or any other extra "protection" over and above what my OS provided.
There are stats on failure rates and bathtub curves. Consumer hard drives these days have an AFR (annualized failure rate) of ~1.41%. I've never used SpinRite and don't know whether there's evidence for it, but I suggest you back up your data.
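For a rough sense of what ~1.41% per year adds up to, a quick back-of-the-envelope calculation (treating each drive-year as independent, which is a simplification given the bathtub curve mentioned above):

```python
# Back-of-the-envelope numbers for the ~1.41% annualized failure rate quoted
# above, treating each drive-year as independent (a simplification; real
# failure rates follow a bathtub curve rather than a flat line).

AFR = 0.0141

def p_any_failure(drives: int, years: int, afr: float = AFR) -> float:
    """Probability that at least one of `drives` fails within `years`."""
    return 1.0 - (1.0 - afr) ** (drives * years)

for drives, years in [(1, 1), (1, 5), (4, 5), (8, 5)]:
    print(f"{drives} drive(s) over {years} yr: "
          f"{p_any_failure(drives, years):.1%} chance of at least one failure")

# A single drive over 5 years is already ~7%, and a small 4- or 8-bay array
# over its lifetime is well into "plan for a failure" territory. Hence:
# back up your data.
```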
The podcast is great. It provides solid information, and he's more than happy to issue corrections when someone calls him on something. He takes a very scientific approach to issues.
Seems like the issue here is not EA or a rationalist worldview. The issue seems to be a lack of compartmentalization between professional and personal lives, a lack of procedural boundaries around social conduct, and a general lack of maturity. This doesn't seem particularly rational...
Perspective from someone who maintains Debian packages in the community repo here:
Package maintenance is time-consuming and difficult. It requires a lot of volunteer work, and individual maintainers are overworked and unpaid. Packaging software often means managing complex dependencies, writing documentation, developing packaging toolchains, and patching software.
Furthermore, maintaining a stable release of a particular software version is even more of a challenge for package maintainers. Often upstream FOSS projects only patch HEAD and release a new version; the responsibility for backporting changes to previous versions falls on package maintainers. To provide secure versions of old software, you're asking maintainers to have intimate familiarity with the upstream code bases, follow the dev process, etc.
If I had community-supported software exposed to the internet, I would be very concerned with the current state of things. I would want to ensure that people are invested in maintaining this software in a full-time capacity. It is important that "main" receives free updates. Ubuntu Pro seems like it enhances the OSS ecosystem, and as a personal user, you can get a free subscription courtesy of Canonical.
It is important to remember that as the end user, you are choosing to enable the community repo. Without Canonical, you wouldn't even know this version of the software is vulnerable.