
Some art, technical and scientific illustration in particular, requires a great deal of precision and the ability to interpret information. That work isn't going away any time soon, and it is similar to what is required of professional translation. A lot of art does not require that.


Are you under the impression that right now, as of today, the publicly-available AI models are ready to replace humans for all types of art outside of scientific and technical illustration?

Because that's not true at all. The AI can't even draw hands yet. To say nothing of its ability to handle multiple people and objects interacting in complex scenes.

I'm concerned that the discussions about AI art on forums like HN get distorted because you have people sharing their views on art here, even though they don't actually have a serious and nuanced appreciation of art and they don't have a good understanding of all the types of work that artists do. Maybe you'd be fine with reading a comic book where everyone has seven melting fingers, but people who take comic books seriously as an artistic medium would not.


> Because that's not true at all. The AI can't even draw hands yet. To say nothing of its ability to handle multiple people and objects interacting in complex scenes.

This seems to be purely an issue of the size of the network. Parti (https://parti.research.google/) demonstrates that as the number of parameters increases, with no change to the underlying architecture, a lot of these problems simply go away. Basically just throw more compute and memory at the problem and everything gets fixed.


People who buy art do not buy it because of the technical execution. You may need to execute a piece in some way to get a desired effect, but the technique is the means, not the goal.

This is not to take away from the achievements of AI. It's that creating pictures adhering to a prompt with some degree of creativity is very little of what art is. Maybe it will replace some part of commissioned illustrations where the artist's name does not matter (e.g. some avatar pic?).

We still value, financially, some material goods for much more than they cost to produce. Or for much more than their almost identical mass-produced counterparts.


> It's that creating pictures adhering to a prompt with some degree of creativity is very little of what art is.

I mean, it's the majority of commercial art - you get a prompt from the client, you maybe flesh it out in a few different directions with sketches, then you refine a final piece. And AI is incredibly good at this process - instant results, infinite patience, and it's free. A very hard combination to beat.

Calendars, book covers, video game assets, green screen backgrounds....

Even in a case like video game animations, where the AI can't build every frame, it can still give you a good reference photo. From there you just need a cheap artist to fill out the frames - a huge cost savings, and a big blow to the artistic community.

Where do you get started as an artist, without any of those? Obviously, Fine Arts isn't nearly as affected, but how do you get your start when you can't build a name from your cool book covers, or get famous off Magic: The Gathering card illustrations?


Well, we’ll see how it performs, if it’s ever made public.

The 20B images don’t look that much more impressive than what SD is already doing (aside from the ability to render text), and in some cases they look worse. It’s hard to tell because the resolution is so low, but even in the 20B “astronaut riding a horse through a pond” image, it looks like his hands are still nonsensical.


This nitpick about hands sounds desperate. Here we are, with a tech so powerful that it overshadows the default hype it's surrounded by (no small feat; most technologies fail to live up to their hype, as you know) ... and the critics merely move the goalpost a tiny bit further, even if the tech scales so well as to make their new goalpost irrelevant in a year.


It's not a nitpick. It might be a nitpick if hands were the only thing it couldn't do. But it struggles with a lot more than just hands.

>the tech scales so well as to make their new goalpost irrelevant in a year.

This just brings me back to my original question. Self-driving cars have been "a year away" for many years, and companies are now starting to hint that human assistance may be required for the foreseeable future [1]. So, why the confidence that art will be an easy problem to solve with just more scaling, when that approach hasn't eliminated the need for humans in any other domain?

[1] https://www.reuters.com/technology/truly-autonomous-cars-may...


I have a suspicion that generative art is going to hit a data wall, also. All of these models are constrained in what patterns they can learn because image captions are not very precise. They can rehash common motifs associated with keywords, but they’re not good at following specific instructions. (“The chair is at the corner of the rug, turned 15 degrees to the left, with the leg nearest the camera aligned with the edge of the fireplace.”) For them to meaningfully improve in this regard, I have to imagine someone will need to locate a trove of a few billion images with exceptionally high quality captions, and well distributed throughout the space of possible image types, subjects, themes, and styles.


I think that details like angle and position will be resolved by using basic sketches as a starting point (we can already make images that sort of conform to layouts as well as prompts), subdividing the image into assets it then has to stitch together in subsequent steps, and then adjusting lighting/contrast/style as a set of filters in post-processing. The wall is lowered quite a bit when you don't insist on doing everything from a single magic prompt.
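
To make that concrete, here's a toy sketch of the stitch-then-filter steps with PIL (file names, positions, and enhancement factors are all invented for illustration; the assets are assumed to have been generated separately):

    # Stitch separately generated assets onto a canvas, then apply
    # lighting/contrast adjustments as ordinary post-processing filters.
    from PIL import Image, ImageEnhance

    canvas = Image.open("background_room.png")               # generated asset
    chair = Image.open("asset_chair.png").convert("RGBA")    # generated asset
    canvas.paste(chair, (610, 280), chair)                   # stitch at the rug corner

    canvas = ImageEnhance.Brightness(canvas).enhance(1.08)   # lighting pass
    canvas = ImageEnhance.Contrast(canvas).enhance(1.15)     # contrast pass
    canvas.save("scene_composited.png")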

(This will be great from the point of view of art creation; not so great from the point of view of supposedly rendering humans obsolete)


That makes sense. I don’t think that will render humans obsolete; I think it will just increase their productivity and ultimately raise the standard of quality expected. It means artists can explore and iterate on ideas faster than if they had to lay down preliminary artifacts manually. But it doesn’t eliminate the need for authorship: someone still needs to decide what to communicate visually and how to communicate it.


Sure, but eventually we're going to hit environmental and power-cost limits on training, and it won't be worth the cost to train the model.

AFAIUI, that’s part of the point that Gebru was trying to make before she was fired.


For now I can run my Stable Diffusion on a vintage laptop from a decade ago, on CPU (!), and it doesn't even utilize most of my RAM. And training this model was still cheap compared to, say, a Google senior engineer's yearly salary. The limits of scaling are further off than laymen may want to believe.
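
For reference, the whole CPU setup is only a few lines with the diffusers library (a rough sketch, assuming a recent diffusers version; the model ID, prompt, and step count are illustrative):

    # Run Stable Diffusion on CPU: slow (minutes per image) but workable.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float32
    ).to("cpu")  # float32 on CPU; no GPU required

    image = pipe("an astronaut riding a horse through a pond",
                 num_inference_steps=25).images[0]
    image.save("astronaut.png")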

With an order of magnitude more parameters it won't just do hands, it will do quite a bit more.


Gebru doesn't have anything good to say about AI; with her, it's only downsides. She's biased, being an activist and all.


Her position doesn’t invalidate her arguments. How good is a product that can’t stand up to criticism?

Edit: to tie it back to the original criticism: what’s the maximum training cost we’re willing to accept for the model? How can we guarantee return greater than the increased training cost?


This makes me wonder what the costs of training an actual human artist are. Are AIs less efficient?


You don't have to retrain an AI to spin up another instance - just download a 4 GB weight file via a magnet link floating around the internet and run some Python code in a terminal on your old PC. This kills the comparison.
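
Roughly this, assuming a recent diffusers version (the file name is a placeholder):

    # Load a single downloaded checkpoint file and spin up an instance.
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_single_file(
        "sd-v1-5.safetensors"  # the ~4 GB weight file, however you obtained it
    ).to("cpu")

    pipe("yet another instance, no retraining required").images[0].save("out.png")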

And training a real, living, breathing person in a rich OECD country is going to be costly - no offense meant, I'm actually not from the OECD.


You can create a lot more copies of the AI.

But then again, when it comes to commercial needs, a human doesn't need "retraining" every time you ask them to draw something they weren't familiar with when they went through art school...


Doesn't "the AI" train on art produced by people? "Just expand the dataset, just increase the parameters" seems like it should hit a wall fairly quickly... and still not be very good, because deep learning systems have no insight.


Every Instagram, Facebook, and TikTok photo with associated text data is a potential pair for training.

In the smartphone age, the case for data hunger looks pretty weak.


But not as weak as the case that the route to production-grade commercial art is reached via biasing the training dataset more towards sloppy social media images...


There is no problem here: to anyone moderately "in" the field, it's obvious you can bias the model towards aesthetics by concentrating highly liked images in the dataset. And if that isn't enough, just ask your audience to occasionally rate the aesthetics of an image and use that as a signal for dataset curation.
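
A toy sketch of that curation signal (the record fields and thresholds are invented for illustration):

    # Bias the training set toward images with strong aesthetic signals,
    # using like counts or explicit user ratings as a proxy.
    def curate(records, min_likes=1000, min_rating=4.0):
        kept = []
        for r in records:  # e.g. {"url": ..., "caption": ..., "likes": 2300, "rating": 4.5}
            if r.get("likes", 0) >= min_likes or r.get("rating", 0.0) >= min_rating:
                kept.append((r["url"], r["caption"]))
        return kept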

Artistic styles are often just thin semantic filters over the base 3D geometry that can be learned from photos, and learning these shouldn't require many examples.


> I don't think there's been any industry that's been ended by AI yet, and yet people are strangely confident that art is going to be the first.

Technology is making something that used to take a lot of practice and skill accessible to those without any of it. A monkey can now draw two ovals, label it an owl, and run an image-to-image conversion with Stable Diffusion to get a pretty good sketch of an owl [1].
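
The whole trick is a few lines with the diffusers img2img pipeline (a rough sketch, assuming a recent diffusers version; the model ID, file names, and strength value are illustrative):

    # Turn a crude doodle into a finished sketch via image-to-image.
    import torch
    from diffusers import StableDiffusionImg2ImgPipeline
    from PIL import Image

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float32
    )

    doodle = Image.open("two_ovals.png").convert("RGB").resize((512, 512))

    # strength controls how far the model may drift from the doodle:
    # higher values take more creative liberties with the input.
    owl = pipe("a detailed pencil sketch of an owl",
               image=doodle, strength=0.75).images[0]
    owl.save("owl_sketch.png")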

Is it better than what a good artist could do? Irrelevant.

Is it better than what a cheap illustrator I find on Fiverr could do? Irrelevant.

The only important point is that I no longer need an illustrator to get myself an owl. I draw some lines, I pick some words, and presto I have an illustration.

The question of whether it's "art" is entirely irrelevant.

> Are you under the impression that right now, as of today, the publicly-available AI models are ready to replace humans for all types of art outside of scientific and technical illustration? Because that's not true at all. The AI can't even draw hands yet. To say nothing of its ability to handle multiple people and objects interacting in complex scenes.

I think this severely underplays the speed at which things are changing, and it bases an argument on things the AI currently can't do. DALL-E was announced in Jan 2021 and it's still locked behind API access. Stable Diffusion came out in Aug 2022 and I can run it on a <$2,000 laptop. That's less than 2 years. Do you think hands are going to be a long-term roadblock?

As for complex scenes, you can currently string that together with a Stable Diffusion plugin for Photoshop/GIMP.

[1] https://www.reddit.com/r/StableDiffusion/comments/wwv7zk/sta...


But if I want a good picture of an owl, I Google "owl" and get many more options than I could possibly ever have time to pick from. Stable Diffusion is essentially doing the same thing as Google, except presenting a kind of average result instead of showing me all the results in its DB.

Now, this may actually be helpful in that it gets around copyright claims - but that's the only real difference.


And you are free to search through the whole catalog of Google results until you find an owl that looks exactly like you want. Though this is going to get harder as you want something more specific than a simple owl.

But the approach for Stable Diffusion is just as easy whether you want just "an owl" or "an owl in X's style with A, B, and C".


Changing the prompt until it generates what I want is not that different from changing my search terms until the result I want is closer to the top.

Now, I should of course note that search engines already employ ML techniques to actually interpret search terms, so to some extent the point is moot - ML is important to actually solving this problem.


But searching on Google doesn't "generate" anything. If your image isn't on the web, there's nothing to bring "closer to the top".


Sure, but chances are, it is already on the web.

And of course, it's also possible that the image I want can't be generated by SD/DALL-E/etc.


Go ahead and get me a photo off Google images of an alpaca in a suit playing chess in vibrant digital painting style.

Without meaning to sound rude about it... I'll wait.


I'd be curious to see if you could get that from the AI as well.

I tried generating that exact prompt a few times at theartbutton.ai and all the results were nonsensical.

For example: https://theartbutton.ai/image/OW1HZLfhjg6DFvJtk4vQZzUYqI7pGG...


Here are my best attempts: https://imgur.com/a/obZH7X5

Not a very wide range of what I could do with the idea in terms of composition, but just some variations of finishing touches/intermediate steps. I achieved this with some human-in-the-loop iteration and inpainting, but it was no more than 15-30 minutes toying around with it, and I'm no artist.

If you have a semi-decent graphics card and would like to experiment with a bunch more settings and tools than are readily available online, this is a good repo for that: https://github.com/AUTOMATIC1111/stable-diffusion-webui


> they don't actually have a serious and nuanced appreciation of art and they don't have a good understanding of all the types of work that artists do.

I would extend this lack of nuance and understanding to the deep learning implementation side also. A lot of people seem to have some very foundational misconceptions about what deep learning is and what it does. In the case of generative art: these models are “simply” sampling from the frozen statistical structure they have learned from web images and their captions. They don’t understand the relationship between objects in space, they have no ideas or feelings to express, and they communicate nothing. That’s why even the best output of these models tends to have a perceptible hollowness: you can detect the lack of a coherent authorial intent in it.


I recently went to Stable Diffusion for some art for a D&D campaign guide, to make the thing more immersive. While the pictures are impressive, there are a lot of things about the generated art that just don't make sense: in one picture, a tower had a staircase running a third of the way down from the door toward the ground, just stopping at that point. Most had issues like this. Several other pictures I wanted were impossible to generate.

The field of "art that needs human communication skills" seems to be a lot broader than just scientific illustration.


A significant chunk of what you're describing can be solved by a combination of better prompt engineering and repeated inpainting.

SD obviously doesn't understand language in the same way we do, so it can be tricky to describe things in a way that will match your expectations. Once you start to understand the tricks here, it gets easier and easier.

Inpainting will let you fix a lot of the rest. Staircase stops? Select the area where it stopped, get the AI to generate more. People are already doing this to create very complex artwork where there are issues with faces, hands, etc. https://www.reddit.com/r/StableDiffusion/comments/x9u8qh/img... is a great example of how you can quickly iterate over a scene.
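
A minimal sketch of that staircase fix with the diffusers inpainting pipeline (assuming a recent version; the model ID, file paths, and prompt are placeholders):

    # Mask the broken region and regenerate only that area.
    import torch
    from diffusers import StableDiffusionInpaintPipeline
    from PIL import Image

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float32
    )

    tower = Image.open("tower.png").convert("RGB").resize((512, 512))
    mask = Image.open("mask.png").convert("RGB").resize((512, 512))  # white = regenerate

    fixed = pipe(prompt="a stone staircase descending from the tower door to the ground",
                 image=tower, mask_image=mask).images[0]
    fixed.save("tower_fixed.png")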

One of the other things people struggle with is consistent characters and settings, but people have found ways to improve this with Midjourney - https://docs.google.com/document/u/1/d/e/2PACX-1vRahIr3-h_V3...

There's more of a learning curve to these tools than most people think, but it's also still miles and miles away from the learning curve required to actually be proficient at the technical aspects of making art.



