Interview transcript
Matthieu Rouif
Co-Founder & CEO of Photoroom
"Impact of Gen AI technologies on the photo business."
Hi I’m Caroline Goulard, I’ve created two companies in the field of data visualization. I’ve been working for more than 10 years to build bridges between humans and data and I’m very happy to host this new episode of the Bridge, the Artefact media that makes data and artificial intelligence understandable for all. Today I’m with Matthieu Rouif, Matthieu hello.
Hi Caroline.
Can you introduce yourself?
Yes of course. So I’m Matthieu Rouif, I’m an engineer by training and I’ve been working for the past 15 years on photo editing with a bit of AI, actually a lot of AI. Starting from smartphones but also on all platforms, and it’s been a very interesting past few years working on photo editing as CEO and co-founder of PhotoRoom.
Great! Can you tell me the story of PhotoRoom?
Of course. So as I told you, my past is a lot of working in startups and photo, and what happened is in 2014 I was working in a startup, Stupeflix, that was acquired by GoPro two years later. We were doing video editing for everyone, so video memories from all the content you can have as an individual, and well I was at GoPro and we were doing a lot of AI for these apps, I was in charge of all the photo editing apps there, and well it came to my mind that creating visuals to promote the app, to promote what I did, was very difficult. I’ve been working on Photoshop for 20 years and at the same time we had an amazing AI team on the side and they were doing incredible things, but that was just on paper and not available to all users. So I left GoPro to start PhotoRoom and to make all this AI for photo editing accessible to all. Really the idea of PhotoRoom is to make studio quality photos accessible to everyone in the world. And so we started in 2019.
All right, so you started working with AI quite early?
Yes.
That was before the arrival of generative AI. Did this latest version of the technology change a lot for you?
Yes, it changed a lot, I mean the potential and all you can do from it is now huge and amazing. It didn’t change everything in the company, actually it was kind of a natural step for us. So taking a step back, in 2019 we started doing background removal, so you start from a product or a photo and you create a white background, so that’s pure AI, at least AI made it like 100x better, and we had a ton of success with that, especially since there was the covid part, so in e-commerce it’s really important to have this white background. The thing is that the white background and all you can do from that is not for everyone. If you’re selling fashion, a white background is amazing; if you’re selling food, then you don’t want to have a pizza on a white background. So we had good success, we were leading all around the world in 2022, but we saw that for some users the backgrounds we were offering kind of looked like collage. And we saw this amazing tech coming from DALL-E by OpenAI at that point, and we thought that’s exactly what our users want, and we started to apply it to the products and the photos we take. So you start from a photo and then you create a scene for it. We literally paused the company when we saw that, it was summer 2022, and said “okay this is the future of photo editing, this is the biggest photo innovation in the past 10 years” and we jumped on it. And we were the first to release this generative AI feature for e-commerce in early 2022.
All right, so is e-commerce your main use case or are there some others?
E-commerce is a big thing for us, we started with resellers and now we address everyone who wants to sell products online, so we are the best at product photography, especially creating these scenes for the products people are selling. So e-commerce is definitely a big thing for us, we have a Shopify app, we are powering a lot of e-commerce websites, so we start from mobile but we are also already integrated in e-commerce websites. Then, I mean we have 100 million downloads, so it’s not all e-commerce, we have casual users, we have creators using the app, we also have restaurant owners, like for food as I told you, and we go even bigger. So Barbie, I’m sure you heard about the movie, they used the PhotoRoom tech to create a Barbie poster for all their community and they created this website used by 10 million users. We have very different use cases, but at the end of the day what we do is you start from a photo and we make the photo look pro, we elevate the photo, and for all users.
Right, and from your point of view, do those generative AI technologies sound more like a revolution or an evolution?
No I think it’s a revolution in that case. For us it was a natural step because that’s what the users were asking for, but I do think all you can create from generative AI is a huge change. I mean I started my first startup with the mobile revolution, I took the first iPhone development class at Stanford, I was on campus to create a photo app actually, and it was big, it had a lot of potential, all the usage, it took a few years but it changed our lives, changed our society. I think that’s what we’re seeing with the generative AI part, everything you can create, the way you can build things as a developer, the way you create as a designer, everything is changing.
Aside from this photo editing part, do you see some other big changes in the way we create, we work, we sell stuff?
Yeah. I like the idea of the assistant, people say co-pilots you know, so if you’re using a tool like PhotoRoom, you can use Figma, you can use ChatGPT or Copilot, today it helps you create, so it’s like you have an expert that’s on your side and this expert helps you create, and I think that’s kind of the short-term vision of the incumbents, of like all the tools today. So already that, a co-pilot for almost free, is changing a lot. All the developers at PhotoRoom are using generative AI to code better, faster, it gives you ideas, and all the growth team, the marketing team is also using these generative AI products, so I think it changes the way we work a lot. I think the future is more like an autopilot. I mean if you’re building cars and you think okay I’m going to help you, I’m going to give you a map, I’m going to tell you how to go there, you’ll be the co-pilot in your car, and that’s what you build when you already have a car. But I think the real vision is building the autopilot, which would be like the Tesla version of that, so you say “okay I’m in Paris, I want to go to Cannes, I want to have a scenic view, I want to make a stop in this city and get me there” and it just does the job and you don’t even need to know how to drive. That’s very important for us at PhotoRoom because it means we can help people that don’t know how to do photo, and so it just gets their job done, and I think that’s the future of how we work and how we help people in society.
So it takes you a bit far from photo editing and removing the background. How do you see the next step? Is it just that you enter some text and PhotoRoom creates the perfect scene for you?
Yeah I mean we’re not doing the autopilot for cars to be clear, but I think it’s more like in the future you kind of have an agency for a very little cost and you pay the agency, the autopilot and the result you get. But for us, for e-commerce for instance it’s like okay you tell me what’s your menu, I’ll build the menu with photos to help you, probably go on Instagram, create some posts, make them iterate, try to find the image that sells most in your case, get your customers to do ads maybe, so that’s really a little like if you had a photo marketing agency for free or not for free but like very accessible and very fast, I think that’s the future. It’s not only photos, it’s video it’s text it can iterate, and I think the value is this agency knows you very well. So you can give them a mood board, you tell where you are, what kind of person you are, you tell them your story and what you like and then the autopilot PhotoRoom will create the photo for you automatically as if you had hundreds of people working for you.
So you will become direct competitors of DALL-E or Midjourney?
I think our core audience is e-commerce, there might be some use cases where you can use one or the other in some way. I think Midjourney is more creative, it’s up in the funnel, it’s quite technical you know. I see Midjourney more like a Photoshop competitor, where you have to know what you want, you have to write complex prompts. For us it’s really about the 500 million people who are not amazing designers and how to give them this value of creating amazing photos. And what I would also say, having started from e-commerce, is that one of the problems of generative AI is it hallucinates. So you have that with text, like you ask two plus three and it gives you six you know, or you ask for a scientific paper that’s talking about, I don’t know, the latest photovoltaic cells for energy, and it creates something that doesn’t exist, and you have the same for photo. You can ask for a very good photo of this pasta dish for instance, and well maybe if you ask DALL-E or Midjourney it creates pistachio from that, or you know the problem with hands, it’s creating extra fingers, all these things are hallucinations. And that was one of the issues we faced with PhotoRoom when we launched the first generative AI tool in 2022. People would put in their product and initially it would hallucinate, so it would create extra details, and it’s fine when you’re being creative, but when you sell then the buyer says okay it’s not the product in the photo and they return it. So for e-commerce, DALL-E and Midjourney don’t work, so what we do is we take the photo, we remove the background and then we just create the background but don’t create artefacts around the product, and for e-commerce it’s a requirement, so in that way we’re a bit different from what they do. But the field is moving so fast, so maybe we’ll do text to image in the future, maybe they’ll do a bit more of what we do, so it’s moving fast.
Right, what I like in what you say is making everyone a creator thanks to those artificial intelligence tools, which is a bit like Steve Jobs in the early days saying the computer was like a bicycle for the human mind, to go where it cannot go without it. So how do you see these artificial intelligence tools as a change for society? Will we still live the same way, build companies the same way?
Yeah I think it changes a lot the way we work, I mean I know how to code and develop for instance but I don’t do that on a daily basis, and likewise I don’t do photo editing on a daily basis, so it helps me be very productive, and so I think you give everyone access to a lot of agencies. So tomorrow you’ll have entrepreneurs who have like a photo agency, accounting agency, financing and data probably, and they can do that by themselves, so it kind of removes the need for large teams in some cases. We see that already with the internet: for entrepreneurship, the barrier to creating a company got lower and lower. Today already most of the sales on Shopify or Amazon Marketplace come from teams of five or fewer, so the future for me is really an acceleration of that, and you’ll have people by themselves doing like 10 million in revenue alone and a lot of teams of one to five. We’ll have this explosion of small businesses, maybe 500 million, maybe 1 billion small businesses, and they’ll all have this autopilot to help them in their daily job so that they can focus on their craft, like for a restaurant or jewelry shop, but they have this small assistant that helps them. When you think about it, having large corporations with all these functions, maybe because of globalization, has been a very short period of history, maybe 50 or 60 years in the 20th century, so I think people love to work in small groups and be creative, and I think it’s a fantastic opportunity for society to lean a bit more into entrepreneurship and a lot more creation.
All right, but how many people do we need to build PhotoRoom, Midjourney, or DALL-E? How many are you at PhotoRoom?
Today we’re a bit more than 50, I don’t know the exact number for Midjourney but I think they’re a bit smaller. OpenAI has a huge impact on the world and they are around 500, maybe 600 now, but I mean a few months ago they were fewer than 500, and so you’ll have more and more small companies that are very impactful. That’s what I see from AI, that’s what I see in e-commerce, but actually revenue per employee is going up in a lot of companies, as a general trend, so it does show this is kind of a macro trend that’s going to be accelerated with Gen AI in my view.
All right, so removing all the barriers between the person and the market, like Amazon Marketplace did by allowing everyone to sell everywhere in the world, is that what those marketplaces do?
Exactly, and I mean for PhotoRoom it started as an iPhone app, and we have all the platforms now, but starting on an iPhone, Apple handled the full distribution of PhotoRoom, so in 2019 the first country we were featured in was Vietnam, then we were featured in Japan. It’s really like you have marketplaces where you can distribute yourself, and as an entrepreneur you can really focus on the craft and building the product, and there are a lot of platforms to distribute, but autopilot will help you build, so I think it’s going to be fascinating. I think what I love is that everyone who is starting their own business has kind of a unique story to tell. So you have more diversity in commerce because everyone kind of tells their story, and that’s what we do with Gen AI at PhotoRoom: the image we create for you when you do your marketing, your visual, is unique. So you know that’s the beauty of Gen AI, everyone has a unique story and Gen AI makes sure that the visual story is a single unique story for everyone, and I think it’s beautiful.
Isn’t the ultimate danger for you to have the iPhone or other smartphones as direct competitors, because they can embed those features directly into the devices?
Yeah, I mean I’ve been working on photo apps for 15 years and there are a few features that we invented at Stupeflix and then GoPro and now at PhotoRoom that Apple did after. Of course when you do something smart then they do it for the platform. I think what’s important is to know what user you’re focusing on, and so to give you the example of background removal, we started with background removal, Apple put it on the iPhone I think one or two years ago, and well we didn’t see any change. The truth is Apple is really focusing on making beautiful memories, beautiful photos, that’s how they make ads in all the capitals of the world. At PhotoRoom we are focusing on helping people do their job, and so that might be 10% of all the photos you take but it’s a different need, like it’s what you do in a studio, it’s not what you capture in the real world. So all the features for that are different, like Apple requires a lot of privacy and you can’t put everything on the cloud, stuff like this, and so focusing on e-commerce you do a different product, and that’s kind of the strength. You can’t do like 1,000 photos in two seconds on an iPhone because that’s not what it’s made for, and we do focus on this use case, which makes it an amazing experience. And at the end of the day, time is money for professionals, so having this focused experience that makes it work very well is important.
Sure. So you have been at YC in the US. Can you tell us a bit about how it was to be an entrepreneur in the United States, and what’s the difference for you between the United States scene and the French one?
Yeah so we did YC in 2020 and I think the American tech scene is really good at telling very ambitious stories, they really help you do that. Sometimes, at least for me as an engineer, I think we don’t tell the full ambitious story when we pitch our idea and our vision, and so this really helped us. And I mean they have so much advice. People think it’s just three months that they help you, when in reality they help you during all the life of your company, so we still have calls and so on. Yeah it helped us a lot to set our vision, hire the first people for the company, the community is amazing, the knowledge that is shared in the community is crazy. Now if you look at the generative AI space, well you obviously have a lot more density, a lot more startups, there’s already maybe a dozen unicorns in the generative AI space in Silicon Valley, but Paris is not bad. Paris is actually doing pretty well, the main office of Hugging Face is in Paris, you have Mistral, well PhotoRoom, Dust, LightOn, so you have a very good ecosystem. I think it’s powered by good universities, Grandes Ecoles, math training, it’s really good for AI, good labs like the FAIR lab that Yann LeCun started for Facebook. So all this talent density makes it a very good ecosystem, and for me it has reached escape velocity and it can be independent and create amazing companies in the future. You even have Poolside, with a developer assistant tool, they moved from the US to France because the ecosystem is really good and because you have great talent, so you know it’s great, it’s quite a good ecosystem.
Are you planning to open offices in the US?
That’s a good question, maybe next year. Most of our customers are in the US and we did a partnership with Barbie, Netflix worked with us for their marketing campaign, we’re working with some major marketplaces in the US, so yeah we might have a few people there next year.
Right, and going back to the European scene, there are important discussions about regulation for generative AI and AI in general. Do you take part in this debate? What form does it take?
Not directly, I think the ecosystem is great because the government listens, so I mean that when we voice concerns or what we think, we’re heard. Personally, I’m afraid for European startups, I’m afraid of regulatory capture, which means every time you have a new tech, if you create a law, well it’s easier to adapt to the law when you have thousands of lawyers. So the risk for me is that we create too much regulation, more regulation than the US, and then who can handle the regulation? It’s existing players who have big resources, deep pockets, and they don’t care, it’s fine and they can develop, but for the startups it’s very difficult to adapt, and it’s not only the money but it’s also the time you spend with the lawyers to explain what you do, and it can kill an ecosystem to have that. So yeah I’m a bit afraid that we do more regulation than necessary. Obviously it’s a debate we need to have, but we need to be aware that it’s better to be creating the new technology than regulating it, for us, for the ecosystem, for the economy. And so I think, even if we don’t know everything yet, it is better to regulate by usage than to regulate the large language models themselves, which would create a lot of burden on the tech ecosystem in Europe.
So what could be the necessary minimum?
What everyone was saying, and what was the initial version of the law, was to regulate by usage, so if you’re doing medical stuff then you should be regulated. At the end of the day it’s the usage that matters and not the way you create. You don’t regulate code you know, Python is not regulated just because it’s a powerful developer platform, so I think the way to regulate is by usage. I think open source is very important, and people doing open source don’t have resources, they don’t have lawyers, so it’s important not to block them. I mean if it weren’t for open source, PhotoRoom wouldn’t be here, because we always start from open source and then we train our model. We’re also thinking of actually training our own foundation model for images and releasing it to the community, but if it’s blocked on the open source side then, I mean, it’s kind of giving back to the ecosystem, so if it takes extra money to give back then you don’t do it, and so that’s a real risk if we regulate too much I think.
These links with the open source environment are really interesting. Do you contribute, and how?
Yeah, a small startup is about focus, but I believe the open source part is going so fast on generative AI that the real challenge for research is to stay close to the open source. For instance for background removal, we are the best in the world, and I think it comes from the fact that we’re always close to the research and the open source part, so every six months, every three months, we update our algorithm with the research, so we’re close to the open source community. We contribute for instance to the Stable Diffusion algorithm, we were the first to accelerate all these models by 100% with an open source contribution. So we do contribute on the main trunk, because if you stay close to the main trunk and you have a branch, then if the trunk gets higher your branch will get higher, so that’s kind of how we think about it.
So you said you have an engineering degree. How do you see the evolution of education for engineers, but maybe also for all the other students, to be able to deal with this new generative AI?
I think it’s going to change a lot of things. It’s interesting because when my dad started engineering school, he was doing industrial design, something that now, 20 years later, doesn’t exist as a class anymore, and I think we’ll have a lot of things that don’t exist today that will be all that people learn in school in 20 years. We’ll have a Gen AI native generation that will do things differently. My kids are already using ChatGPT, at least the eldest on a daily basis, and they’re learning from that, they’re having fun, so my eldest is creating drawings for instance for her and all her classmates from ChatGPT. You know in the future it’s going to be like other tools that you use, like we said with Google, it will change a lot how we learn and how our brain works. You know having access to information at the tip of your finger, for our generation with mobile, changed a lot how we see culture, every time you have a debate you kind of search for the information. Now it’s kind of reasoning, so it kind of expands your brain on what you can think about, with a constant assistant. The question for the next generation is can they get the fundamentals when it’s that easy to create something, and I guess people would have said the same thing about the internet 30 years ago, so I’m quite optimistic on that, but you can’t help but wonder what’s going to happen, how we are going to learn in the future. So it’s a big unknown, but I think it’s important that we adapt and learn how to use these tools, and the best way to learn is to be using them and leveraging them on a daily basis I think.
You have a very optimistic vision and it’s great in comparison with a lot of people who are a bit afraid of these generative AI possibilities.
I’m very optimistic, I think I’m relying on the vision that we have many years before we have a general intelligence. I don’t think I saw a lot of pessimistic movies, so it doesn’t need to be the full story. Overall I’m quite optimistic about what we can do because you can create so much, and working in the startup ecosystem you always have to reinvent yourself, so I think that’s the fun part of being in this ecosystem. But it’s true that for a lot of people who have a job today, you have to ask yourself how can I do my job tomorrow, and maybe reinvent. It’s moving so fast that it’s going to change a lot within the current generation in how we work, and that’s the role of society, of the government, to make sure these people are not left on the side and that we provide a way to teach them how to use it. But the best way at the end of the day to not be afraid of a new technology is to use it, play with it and see all you can create with it.
So what’s the next revolution? How do you see this field evolving four, five years from now?
I’m very excited about multimodal and video. Multimodal is the idea that you can leverage different forms of content at the same time, AI can read the photo and understand and do things from that. I think we’re going to be surprised by the services you can create from multimodal. We have hackathons every three months at PhotoRoom and the things the team has built were phenomenal, and so there’ll be many services that help many people coming from the multimodal approach. And then in five years I think you can have new ways, new products we haven’t imagined yet coming from this. It’s so easy to build something new, and you see ChatGPT, it goes from zero to 100 million users in less than six months, so massive distribution.
So what would be your recommendation for people who would like to explore this subject further? Do you have a book, an article or...?
I love learning by doing, so I think my first recommendation would be to use these tools, use PhotoRoom of course, use ChatGPT and try to remove the friction of using it. You can exchange with ChatGPT now, use Gemini from Google. I love this quote, “The future is already here, it’s not evenly distributed.”, and you can actually live in the future using these tools and understand what we’re going to build from that. My first recommendation is using the tools. I think the investors from the US, Sequoia, Benchmark, wrote some very interesting articles on step two of generative AI, especially about the user interfaces of the products we’re going to use. In the future you’ll have new products, new interfaces that are going to be built with Gen AI, and that’s going to be fantastic, and that’s what we see at PhotoRoom: you don’t want a text box to create images, you want to move things around, and that’s what we built. The last part I’m really excited about is the multimodal I told you about just earlier, the idea of photos plus text and even understanding videos and putting all of that together, it’s going to be amazing.