Everyone has Access to Dall-E, and My Art is Still Better: A Sitdown with Karan4d (Part 2)
The following is the condensed and lightly-edited transcription of a conversation between Karan4d and Max Cohen (@cohenthewriter) recorded in August 2022. This is part 2 of 3. You can read part 1 here and part 3 here.
Max: Let me get a bit more into the artistry aspect. I’m curious about your process, and please keep in mind that I’ve never made anything generative, I have no idea how it works. But philosophically, how do your ideas become distilled into actual art pieces? And also, more immediately, what programs or workflows are you using to bring an idea to fruition?
Karan: Right…so, the standard that has come to be in the current state of computer-vision AI art is prompt art. Txt2img [Text to Image]. Now, the most important thing to keep in mind before I go on about that: having been in the AI art scene before Diffusion or Dall-E2, before any of these Txt2img paired-set models, I can tell you the process was very different. And a lot of artists who work now have forgotten this part of the process, because it didn’t go away, but people take it for granted.
There is, with GANs, an important process called data-setting. You want to have images, or whatever your input data is — if you’re making a text model, text; an image model, images; a video model, video; an audio model, audio, whatever it is — you need a very nuanced, specific, general, etc. data-set. You learn regimens and techniques for how to prepare a data-set. Your data-set matters. It’s the brain of the GAN or the NLP, whatever it may be. No, it’s not the brain, it’s the memory. It’s all the things it knows. It’s not its ability to know things, it’s all the things it knows. Therefore everything that comes out of it is going to be utilizing all those aforementioned things as its inputs, as its paint, as its brushes, as its colors, whatever you want to say.
Nowadays, people want to really try to focus on prompt engineering with these Txt2img pairs. People try to focus on: What should I type that’s going to give me what I want based on the images the model is trained on? Coming up with great prompts means coming up with great images. The better you are at prompt engineering, the more specific you can get your images. The language you’re speaking to a Txt2img pair is different for every single data-set. Most of them are trained on similar stuff, but essentially it comes down to this art of prompt engineering, and a lot of artists are very careful about their prompts, though a lot of artists share their prompts. I have my own models but I also love to use models like Simulacra or something. I used to mess around with Midjourney, and you could definitely find the prompts I’ve used in there too. And you can definitely get very beautiful results if you use my prompts. Feel free. I don’t have a problem with that, though a lot of people do hold theirs close to the chest.
Your prompts are one part of the secret sauce. The thing is, with these Txt2img pair games, people who didn’t make the model, people who are just using it, they forget that there’s a data-set of images…and you can fine-tune that to change what images are in there. You can fine-tune the input of the text to change what captions are in there. So even though the model may not know Skygolpe, or even though it doesn’t know Fewocious, do something about it! Add the images and the captions to the data-set! You can convert the data-set into a CSV file from the TFRecords format, and you can split the CSV into small enough portions that you can actually open it on your computer, and then you just edit one of the cells, and change the image, and change one of the captions, and do it for a couple of them if you want to really reinforce it.
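A minimal sketch of the workflow Karan describes, assuming (hypothetically) that the TFRecords store an image reference and a caption under feature names like "image_path" and "caption"; every file name here is made up for illustration:

```python
import csv

import tensorflow as tf

# 1. Dump the TFRecord data-set to a list of rows you can edit by hand.
rows = []
for record in tf.data.TFRecordDataset("dataset.tfrecord"):
    example = tf.train.Example.FromString(record.numpy())
    feats = example.features.feature
    rows.append({
        "image_path": feats["image_path"].bytes_list.value[0].decode(),
        "caption": feats["caption"].bytes_list.value[0].decode(),
    })

# 2. Split into CSV chunks small enough to actually open on your computer.
CHUNK = 10_000
for i in range(0, len(rows), CHUNK):
    with open(f"dataset_part_{i // CHUNK}.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["image_path", "caption"])
        writer.writeheader()
        writer.writerows(rows[i:i + CHUNK])

# 3. Reinforce a missing artist by editing cells, or appending a few
#    image/caption pairs, before re-encoding back to TFRecords.
with open("dataset_part_0.csv", "a", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["image_path", "caption"])
    writer.writerow({"image_path": "skygolpe_01.png",
                     "caption": "a painting in the style of Skygolpe"})
```

The point isn’t the exact format: the data-set is just rows of images and captions, and editing those rows is editing what the model can know.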
So the two really important things for any AI artist using those things are
- Prompt Engineering, which 90% of people are already on top of, and
- Data-setting and fine-tuning. Most people are just not going to fucking bother with —
Max: When you sit down to look at one of these programs, like, do you have an idea of what concepts you want to explore?
Karan: I have a prepared data-set! So those lists of ideas are where prompt engineering comes into play. It’s the ideas in words.
I have this data-set, I know which artists are in the data-set, I know which styles are in the data-set. Let’s say I want to combine styles: my most recent work is a fusion of the Bauhaus art style and Rembrandt’s work. A fusion of old-master style and Oskar Schlemmer’s work. It’s a fusion of this geometric — you throw in words like geometric, liminal, shimmer, canvas, levels, photorealist, work, blah blah blah — you find these words that work for you, that keep the aesthetics you want consistent, or you just fuck around and see what sticks. What do you like? Then keep the parts you like and change something. What do I want this mix of two genres to be? Do I want it to be the grim reaper, do I want it to be a goddess, do I want it to be a princess, do I want it to be…
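As a toy illustration of that mix-and-match process (these are not Karan’s actual prompts, just the words he mentions above):

```python
# Sweep subject words against style and modifier words, keep what sticks.
from itertools import product

subjects = ["the grim reaper", "a goddess", "a princess"]
styles = ["Bauhaus", "Rembrandt", "old-master", "Oskar Schlemmer"]
modifiers = ["geometric", "liminal", "shimmer", "photorealist"]

for subject, style, modifier in product(subjects, styles, modifiers):
    prompt = f"{subject}, {style} style, {modifier}, canvas"
    print(prompt)  # feed each one to the Txt2img model, keep what you like
```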
A lot of my subconscious inspiration is coming from FromSoftware — Bloodborne, Dark Souls, Elden Ring, etc. — I have always been a massive massive fan, massive Berserk fan…dystopian high fantasy is what clicks for me. So the base for what I’m exploring always comes from this element of dystopian high fantasy, but the work itself gets filtered through all these different art styles and it comes out to be something entirely different.
Max: You said it comes out entirely different: What is it like when you’re ultimately ceding control of your art to an unpredictable program? Is it frightening, is it exciting?
Karan: I used to think it was total guesswork and total surprise every time, but it’s like trying a new mixture of two colors for the first time. And you get a new color! A lot of times it’s this brown goop, but sometimes you get purple, magenta, turquoise, or something. And it’s really not that unexpected.
What happens is you don’t know what you’re getting when you’re trying a bunch of things, and you start trying out different parts you like — it’s like building a molecule! You don’t know what the atoms are, but then you start getting them together, pulling them together, and you start building the molecule, and you say, “Okay, I’ve generated this a hundred times, and it generally looks like this.” The reason people think it’s a surprise and totally ceding control is because we’ve been thinking too small-scale! Once you’ve been doing it for a while, you know what the fuck it’s going to look like! So yes, there’s some level of, “Oh, this is a cool version of this style that I knew it was going to produce,” but I got what I wanted. There are these aesthetic rules that I’m looking for and may not have defined, because they’re new! But I know I like them.
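One way to build that intuition, sketched here with the open-source diffusers library (an assumption purely for illustration; Karan works with his own models): hold the prompt fixed, vary only the seed, and generate a large batch to see what the model tends to produce.

```python
import torch
from diffusers import StableDiffusionPipeline

# Any Txt2img checkpoint works; this model id is just an example.
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe = pipe.to("cuda")

prompt = "the grim reaper, Bauhaus style, geometric, liminal, canvas"
for seed in range(100):  # "I've generated this a hundred times..."
    generator = torch.Generator("cuda").manual_seed(seed)
    image = pipe(prompt, generator=generator).images[0]
    image.save(f"sample_{seed:03d}.png")
```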
For me, in this whole art thing — A meaning is great, a story is great, and it’s certainly good for sales/marketing, etc., but when you’re doing the damn thing, all I care about is if it fucking looks good. If it doesn’t look good, tell me the story, maybe I’ll care about it. But I better be nutting the second I see it.
If people see my work and they’re like, “Oh my god!” before they go, “What does this mean?” then I’ve succeeded. If people first say, “Well what does this mean,” then I’ve failed. Because you can ask me [about the meaning of the piece], I’m sure we’ll find something together. But it’s about a good piece, and you know good when you see it! And if I think it’s good, it’s really fucking good.
Max: So as we were talking about before, you have all these pieces that you share on social media, share on Twitter, but then you have the things you actually mint. How do you know when something is ready to be minted? Is there some kind of —
Karan: It’s almost entirely arbitrary, unless somebody has requested it to be minted, or it’s supposed to appear at some gallery or something. It’s usually just the thing I made when I was in the mood to mint. And I don’t even sell pieces like that anymore. People aren’t trying to buy Karan4d works. My survival in the Web3 space has come from getting to work on these bigger projects outside of just GAN/Txt2img pair stuff. The art is something I’m doing to push a boundary in myself.
Like I don’t just do the Txt2img pair, I have this overpaint algorithm, which paints every single piece of the painting with individualized brushstrokes. You’ll notice it’s unique to my model; it doesn’t show up anywhere else in anyone else’s art in the AI space. They don’t have it. And then I upscale it. So you get these nice, 4K, hand-painted, stroke-for-stroke pieces that are just an overpaint of the GAN generation of the Txt2img pair. That’s the stuff I’m doing because it’s cool, and it’s beautiful, and because it’s what brought me here and what I know, but now my work has come to things like the [Virtual Curator].
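For a sense of what a stroke-based overpaint can mean mechanically (this is emphatically not Karan’s algorithm, which is his own; just a toy sketch of the general idea using Pillow): sample the generated image and re-render it as thousands of short individual strokes, then upscale.

```python
import random

from PIL import Image, ImageDraw

src = Image.open("txt2img_output.png").convert("RGB")  # hypothetical file
canvas = Image.new("RGB", src.size, "white")
draw = ImageDraw.Draw(canvas)

for _ in range(40_000):  # one short brushstroke per sampled point
    x = random.randrange(src.width)
    y = random.randrange(src.height)
    color = src.getpixel((x, y))
    jitter = random.uniform(-0.3, 0.3)  # slight stroke-direction variation
    dx, dy = 6, int(6 * jitter)
    draw.line([(x - dx, y - dy), (x + dx, y + dy)], fill=color, width=3)

# Stand-in upscale; a real pipeline would use a learned upscaler (ESRGAN etc.)
big = canvas.resize((src.width * 4, src.height * 4), Image.LANCZOS)
big.save("overpaint_upscaled.png")
```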
I just wanted to be in this world and this space, and learning how to do AI stuff has definitely opened up some opportunities to make money, like freelancing on my own, which has been good. But in terms of minting and all that, if people care then I’ll get more on that horse.
But this whole thing is really nepotistic at the end of the day, isn’t it? And it shouldn’t be! Because when it is, most of the art is fucking terrible. [Laughs hard.] To be frank with you.
But I don’t know. I mint stuff that I like — when I’m emotional I mint something. When I’m in tension I mint something. It’s very primitive. I tried for a long time to be a calculated minter, and do this whole thing — finally got on SuperRare after so long, and it was like “Wow. I worked so hard to do this, and now I’m going to be really careful, and I’m going to mint this carefully crafted, deep-to-my-heart series that meant the whole world to me.” [I did that] and one guy gave a fuck about it and bought a piece. And one went to MOCA, because another guy did give a fuck about it clearly. And another piece was given to [Bryan Brinkman], cuz I love that guy, and he’s actually had a huge impact on me feeling confident enough to stay in this space from early on.