Universe packages are not supported by Ubuntu unless you activate Ubuntu Pro. Thus, if you install ffmpeg on Ubuntu without Pro, it will contain several active vulnerabilities. The full five years of security updates only applies to packages in the main repo.
I wanted to find another reason to not use Ubuntu for servers (besides Snap being forced on everyone) and this was it.
At least in Debian, most of the packages I use on my server come from the main repos. Occasionally a few come from other sources, but by the time a new Debian patch release comes out, those other packages have also been updated.
That is also completely unchanged from how it has always been. Canonical supports "main", while "universe" and "multiverse" get best-effort community support (i.e., inherited from Debian). They now additionally offer a dedicated team for those repos.
Honest question, since the Arch wiki seems surprisingly spotty on this: which Arch repos are covered by their security team? Just core? Or also extra? More than that? Surely not the AUR, right?
Not even "from debian". Sometimes they can't be bothered to copy debian packages that fix security issues if the package is in universe, and just leave it vulnerable for the entire duration of the LTS.
That's not the case for this ffmpeg example (it really isn't patched), but make sure to check the actual changelog. Sometimes the version number stays the same but the patches are backported, so a plain version comparison is not enough.
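If you want to check this yourself, here is a minimal sketch assuming an apt-based system; the package name and CVE id are just placeholders you substitute on the command line. `apt-cache policy` shows which component (main, universe, ...) a package comes from, and `apt-get changelog` pulls the packaging changelog, which lists backported security fixes even when the upstream version number hasn't moved.

```python
#!/usr/bin/env python3
"""Rough check: which repo component a package comes from, and whether its
changelog mentions a given CVE. Package name and CVE id are placeholders
passed on the command line."""
import subprocess
import sys

def repo_component(package: str) -> str:
    # 'apt-cache policy' shows which archive component (main, universe, ...)
    # the installed/candidate version comes from.
    return subprocess.run(
        ["apt-cache", "policy", package],
        capture_output=True, text=True, check=True,
    ).stdout

def changelog_mentions(package: str, cve: str) -> bool:
    # 'apt-get changelog' downloads the Debian/Ubuntu packaging changelog,
    # which lists backported security fixes even when the upstream version
    # number is unchanged.
    log = subprocess.run(
        ["apt-get", "changelog", package],
        capture_output=True, text=True, check=True,
    ).stdout
    return cve in log

if __name__ == "__main__":
    pkg, cve = sys.argv[1], sys.argv[2]   # e.g. ffmpeg CVE-XXXX-XXXXX
    print(repo_component(pkg))
    print(f"{cve} mentioned in changelog: {changelog_mentions(pkg, cve)}")
```

The changelog only tells you what the maintainers say they fixed; the distro's CVE tracker is still the place to confirm whether a particular issue is fixed, deferred, or won't-fix for your release.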
> the training cutoff date isn't some absolute date from which they never allowed any new data to be trained on.
It's not a question of whether they're "allowed" to train on new data; the question is whether the models have been trained on data containing information about current events. If you know they've set up some continuous training pipeline for this, you should link a source. I don't think that's the case, though, since there would otherwise be no reason for a cutoff date at all.
> Instead there are bits and pieces of newer information captured in the updated models, but it's not a meaningful enough amount to ever rely on.
This seems more like an opinion about the technology's limitations in general than an assessment of how likely new information is to be incorporated into the model's weights and biases.
They are most certainly being fed LLM content. However, I think this "model collapse" narrative is oversold. Here are some things to keep in mind:
(1) Real content is not generated via a purely synthetic loop: humans use generative AI in complex ways, intermixing human-generated and AI-generated content. Imagine a person who writes the first draft of an essay, then uses ChatGPT to rewrite parts of it. There are still plenty of human additions, modifications, and stylistic flourishes.
(2) The most dramatic effects of model collapse were seen when training successive generations of models purely on content generated by the previous generation. This is a very academic scenario.
(3) There is already a lot of junk consumed by these models. RLHF is aimed at suppressing those junk responses. I am not aware of any research exploring how the full training cycle is affected when RLHF is employed.
Also, there is a lot of training material out there that was not used by the original GPT-3 model. The primary limitation is hardware.
I have come across an increasing amount of obviously generated content: recipes, product reviews, and everything Buzzfeed was known for. I only expect more and more of it. Just wait until 2024's "top 38 React server component state management libraries you need to learn this year" posts come up on dev.to.
Edit: well, look at that. I'm not saying this was generated, but it might as well have been. These "learn from these repos" posts are everywhere now.
It's also fairly well established now, I believe, that part of OpenAI's secret sauce is focusing on high-quality data sources; that is, probably those least likely to include unmodified ChatGPT outputs.
Is it going to remain academic? I can easily imagine the spammy content farm / listicle business model evolving to be fully automated, creating an input loop.
Sure, there will be some pollution. It's very multivariate and depends on factors like content split, generation quality, and novel information. A scenario in which all of your data is generated by the previous model and you run n training loops is academic.
It's also worth noting that when OpenAI created Whisper, they had to heuristically remove many transcripts from poor ASR systems, and they definitely didn't catch them all.
I think it should be noted that this enforces grammatical constraints on the model's generated text, but it doesn't do anything to properly align the content. It would be useful if you needed to ensure a server delivers well-formatted JSON, but I suspect it won't solve many alignment issues with current language generation. For example, current iterations of Llama and GPT often do not label markdown code blocks correctly. Using grammar-based sampling, you could enforce that the model labels code blocks, but you couldn't enforce correct labeling, since that is context-dependent. You also couldn't invent a novel domain-specific language and expect good output without aligning against that language.
Also important to call out: anytime you have a free-form string, it's pretty much an open invitation for the LLM to go completely haywire and run off into all sorts of weird tangents. So these methods are best combined with other heuristics to bias sampling once you get into free-form text territory (e.g. a repetition penalty).
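To make the "syntax, not semantics" point concrete, here's a toy sketch of the idea behind grammar-constrained sampling (this is not llama.cpp's actual GBNF machinery, just an illustration): at each step the model's scores are masked down to whatever the grammar allows, so the shape of the output is guaranteed, but the choice among the allowed tokens (e.g. which language label goes on a code block) is still entirely up to the model.

```python
import random

# Toy illustration of grammar-constrained sampling. At every step the
# "model's" scores are masked down to the tokens the grammar allows, so the
# output is always well-formed. Nothing forces the *content* (e.g. which
# label gets picked) to be correct; the model here is literally random scores.

VOCAB = ["{", "}", '"lang"', ":", '"python"', '"rust"', '"plain text"', "banana"]

# Grammar as a tiny state machine: state -> (allowed tokens, next state)
GRAMMAR = {
    "start": ({"{"}, "key"),
    "key":   ({'"lang"'}, "colon"),
    "colon": ({":"}, "value"),
    "value": ({'"python"', '"rust"', '"plain text"'}, "close"),
    "close": ({"}"}, "done"),
}

def fake_model_scores() -> dict:
    # Stand-in for LLM logits: every vocabulary token gets a random score.
    return {tok: random.random() for tok in VOCAB}

def constrained_generate() -> list:
    state, out = "start", []
    while state != "done":
        allowed, next_state = GRAMMAR[state]
        scores = fake_model_scores()
        # Mask: only consider tokens the grammar allows in this state,
        # then pick the highest-scoring one (greedy, for simplicity).
        out.append(max(allowed, key=lambda t: scores[t]))
        state = next_state
    return out

if __name__ == "__main__":
    print(" ".join(constrained_generate()))
    # Always prints something shaped like: { "lang" : "rust" }
    # The grammar guarantees the shape; whether the chosen label is the
    # right one for a given code block is still up to the model.
```

In a real implementation the mask is applied to the logits before sampling and the grammar state is tracked per partial parse, but the division of labor is the same: the grammar handles form, the model still handles content.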
The major benefit of tenure is that it protects the pursuit of truth in controversial academic subjects. It protects the behavioral genetics research that takes place at UT Austin from left-wing attacks, and the positions some professors have taken on Texas A&M's annual drag show from right-wing attacks.
As a student in the Texas system, I've raised questions that my professors refused to talk about for fear of political backlash. Tenure is an important institution. I think you can only go so far in incentivizing good teaching through termination.
I've always been curious why people so fervently dislike Gibson. I think the most genuine criticism is that SpinRite is not a backup solution and people may rely on it as such. Ideally, no one should need it, since all data should be replicated and backed up. Any drive can fail at any time, for any reason, and it may be totally unrecoverable.
[Side Note: He also once claimed in a "testimonial" that a special ops team recovered data off of a hard drive during a mission in which they hit a terrorist with a computer.]
That being said, he produces a free security podcast which is quite good. He knows his stuff.
> Any drive can fail at any time for any reason and it may be totally unrecoverable.
While in principle this is true, I have been using hard drives for more than 30 years now in PCs and I have never had one fail. I still back things up to separate drives since there's always a first time, but I've never used SpinRite or any other extra "protection" over and above what my OS provided.
There are stats on failure rates and bathtub curves. Consumer hard drives these days have an AFR (annualized failure rate) of ~1.41%. I've never used SpinRite and don't know whether there's evidence for it, but I suggest you back up your data.
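For a rough sense of what ~1.41% per year adds up to, a quick back-of-the-envelope calculation (treating each drive-year as independent, which is a simplification given the bathtub curve mentioned above):

```python
# Back-of-the-envelope numbers for the ~1.41% annualized failure rate quoted
# above, treating each drive-year as independent (a simplification; real
# failure rates follow a bathtub curve rather than a flat line).

AFR = 0.0141

def p_any_failure(drives: int, years: int, afr: float = AFR) -> float:
    """Probability that at least one of `drives` fails within `years`."""
    return 1.0 - (1.0 - afr) ** (drives * years)

for drives, years in [(1, 1), (1, 5), (4, 5), (8, 5)]:
    print(f"{drives} drive(s) over {years} yr: "
          f"{p_any_failure(drives, years):.1%} chance of at least one failure")

# A single drive over 5 years is already ~7%, and a small 4- or 8-bay array
# over its lifetime is well into "plan for a failure" territory. Hence:
# back up your data.
```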
The podcast is great. It provides solid information, and he's more than happy to issue corrections when someone calls him on something. He takes a very scientific approach to issues.
Seems like the issue here is not EA or a rationalist worldview. The issue seems to be a lack of compartmentalization between professional and personal lives, a lack of procedural boundaries around social conduct, and a general lack of maturity. This doesn't seem particularly rational...
Perspective from someone who maintains Debian packages in the community repo here:
Package maintenance is time-consuming and difficult. It requires a lot of volunteer work, and individual maintainers are overworked and unpaid. Packaging software often means managing complex dependencies, writing documentation, developing packaging toolchains, and patching software.
Furthermore, maintaining a stable release of a particular software version is even more of a challenge for package maintainers. Often upstream FOSS projects only patch HEAD and release a new version; the responsibility for backporting changes to previous versions falls on package maintainers. To provide secure versions of old software, you're asking maintainers to have intimate familiarity with the upstream code bases, follow the dev process, etc.
If I had community-supported software exposed to the internet, I would be very concerned with the current state of things. I would want to ensure that people are invested in maintaining this software in a full-time capacity. It is important that "main" receives free updates. Ubuntu Pro seems like it enhances the OSS ecosystem, and as a personal user, you can get a free subscription courtesy of Canonical.
It is important to remember that as the end user, you are choosing to enable the community repo. Without Canonical, you wouldn't even know this version of the software is vulnerable.