Episode 67

How To Build & Scale A Visual-Dubbing AI Company (NeuralGarage CTO POV) - w/ Subho

Jun 26, 2025 | 00:53:11 | Video episode

In this episode my guest is Subhabrata Debnath. Subho is a co-founder and CTO at NeuralGarage, whose proprietary solution VisualDub provides state-of-the-art lip sync using AI while maintaining exceptionally high visual fidelity.

Who this is for

  • You want to make the thing real enough that strangers can see it, use it, or buy it.
  • You would rather hear Subho's version while the mess is still fresh than get another polished hindsight sermon.

Key takeaways

  • NeuralGarage's VisualDub aligns an actor's lip and facial movements with dubbed audio so that footage looks as if it was shot in the target language.
  • The problem the team started with is making dubbed content seamless, first for advertisements and then for film and OTT content.

Transcript

The full conversation, right here. Auto-captions, lightly cleaned, still very much a real human conversation.

10,680 transcript words | 85 transcript blocks
00:00:02

NeuralGarage is trying to marry and align the audio and visual cues so that it looks as if Squid Games was shot in a language of your choice. What I want is to stop the game once and for all. How? At present the problem that we started tackling primarily is how to make dubbed content seamless. What are some of the others that you have also gotten the chance to work with? Amazon and what else? Coca-Cola have used us for something very interesting. During the last ICC World Cup, what they used us for is momentum marketing. They used to use a VisualDub version and release something where Harsha was speaking about what happened just in the match. Boom. Boom. Boom.

00:00:47

And it did not really come off as an ad because he was giving something which is extremely contextual and relevant to the moment. I'm Naman Pandey. This is the Ready Set Do podcast. And in this episode, my guest, that's right, I'm not calling them not experts anymore, is Subhabrata Debnath. Subho is a co-founder and CTO at NeuralGarage, whose proprietary solution VisualDub provides state-of-the-art lip sync using advanced AI while maintaining exceptionally high visual fidelity. With the hit feature Kesari 2, NeuralGarage also earned the world's first visual dubbing credits. So see, the real challenge that we are tackling is that the content has already been shot. Yeah, I know you were at SXSW recently and I think you shared that you also won. What was that like? We got a

00:01:31

connection to Disney Universal who we met after our wishes. So that was really good as well without putting any dollar on the marketing. In line with our theme of learning from somebody that's just a few steps ahead of us, my goal with this episode is to spotlight the incredible work that's being done at NeuralGarage and how Subho's team is putting India at the very global forefront of advanced AI visual dubbing technology. Subscribe on YouTube and any of your favorite podcast apps for weekly episodes every Wednesday and daily clips from those episodes on Instagram and YouTube. And now without any further ado, here's Subho.

00:02:09

Welcome to the Ready Set Do podcast where we learn from journeys of not experts who are just two steps ahead of us. Subho, welcome. Thank you for having me over. So excited to talk to you about NeuralGarage and all of the wonderful things that you guys have been building out there. Um, to set the stage a little bit as is now tradition on this podcast.

00:02:27

What is your hot take on the AI startup industry that outsiders don't generally know about? Uh, very honestly, you know, I've been doing AI for about the last 10 to 12 years. Uh one of the things is that it is not as sexy as it looks. Uh there are three things which have really happened in the last four to five years, uh which make it look really good. One is the uh content guys have really upped their game. Uh there has been a lot of uh you know open-source research and at present even research scientists are aware of it and even startups that we need to get our

00:02:59

story out. uh because you know we have multiple stakeholders like there are investors, there are clients and uh the more you can actually tell your story, the more they are able to relate to it. uh but in the complete back end of it uh it's all you know maths and algorithms. uh it is something which um I know a lot of people who are interested in AI uh they primarily see the shiny bits of it but if you really want to contribute uh you have to go back to your main and core principles. Those things even today are not easy to do. Uh in terms of an AI solution, when people say stuff like AI is going to take your job and things like

00:03:36

that, uh that's a different vertical. But AI as a vertical itself, if you really want to make a contribution to it, it's still the same that it was 10 to 15 years ago. That's fascinating. And the first thing that comes to mind is uh one of the courses I took at Purdue, it was just called machine learning, right? So this is my first semester. I was like, okay, let's study machine learning. So I show up to class and thankfully this was still the first week so you're allowed to change your courses before they're locked in. And this professor continues to talk about just straight math, matrices, for uh you know a whole hour straight and I was like how is this machine learning like this is not what I

00:04:17

signed up for and of course you're right, that's exactly what you said, right, that is what it all boils down to. Obviously at the time I had no idea so I was like yeah I ended up actually not taking that course just because I was not prepared or actually even interested to do that math heavy of you know a curriculum. But yeah, I think what you're saying definitely rings true. And I do agree that not many people realize that sure you can vibe code your apps now. First of all, there's a limit, right? There's only so far you can go.

00:04:48

But yeah, I do agree that the real uh kind of expertise must be developed by yourself, in you know huddled over putting in those long hours understanding the very bowels of how all of these things work. Because see uh just a small addition to what you said, if you can vibe code, that means that you know enough context was already available on the internet for the model to learn, right? So if you really want to make a contribution that means it is basically something that AI today can't do. That is what the industry would value. If it is already available at scale, uh then you are doing something which is already democratized.

00:05:25

So the value for that is much less. So what's your take on all of the GPT wrappers that are now becoming a thing? Do you think that's just a fad? Uh I think some of them will find their value. uh but what it has really done is uh it has changed SaaS or certain products that we had, right? uh earlier the rate at which you could ship products, especially SaaS products, uh that used to itself be a big defensibility, uh I think, you know, not just distribution and business knowledge and stuff like that but also the rate at which you could ship products. But with some of these wrappers which are

00:05:58

essentially helping you to code faster or go from you know point A to point B faster uh I think that industry is changing very rapidly. Sounds fair. Some of them Yeah. Some of them will find uh value but what I've seen generally uh is that you know the product road map is essentially changing so fast. It is very difficult to you know have a very good prediction as to where it'll go in the next three years. Absolutely. And even big companies like so Grammarly you know one would think that they have a nice big moat and then uh Apple Intelligence dropped its own version of Grammarly now. So what happens to Grammarly now and this is Grammarly we're talking about. So there's way smaller companies than that also that can just get wiped

00:06:41

out overnight by one of these like you know big companies. We have three uh you know just put in one you know point to what you said. Absolutely. For example Slack, uh Slack is something every coder uses in their development teams. So initially Slack had a very good growth, right, and Microsoft Teams uh they offer a bunch of features together uh but they were really lagging behind. But because Microsoft has distribution, it comes in free with a total subscription, they're beating Slack over distribution. I think similar with Apple, because if you are already using it they're wanting you to go for another subscription. Exactly. If it's a little absolutely wow definitely probably the longest uh you know hot take segment that we've ever had but I

00:07:25

think that was pretty pretty enjoyable. So but shifting gears, can you tell us about what NeuralGarage is and why you started it? Sure. Uh NeuralGarage you know if I have to define it in a smaller segment uh it would be we are trying to make communication seamless and uh this doesn't just mean communication between two individuals. It also means the communication that uh a movie producer wants to have with you, a marketing professional wants to have with you. Uh at present the problem that we started tackling primarily is how to make dubbed content seamless. Today when Squid Games is made uh dubbing solves only one part of the problem. It only allows for comprehension. It allows you to

00:08:06

comprehend what is being said but how it is being said is very different because actors' facial movements and expressions are very different in the original Korean content. NeuralGarage is trying to marry and align the audio and visual cues so that it looks as if Squid Games was shot in a language of your choice. As a consequence, what we're trying to do uh we are trying to make the hero of Squid Games a global star, not just a Korean star. We're trying to make Shah Rukh Khan a Hollywood actor. We're trying to make Tom Cruise a Bollywood actor so that you know content access can be democratized. Because today really what happens is uh for example I'm not a big fan of Korean content and one of the

00:08:44

reasons is that I just watch the ones which are really popular. Yeah. Yeah. Because dubbing is a distraction. It really does not come up. Agree. Definitely agree. Yeah. And uh with the you know and good content is good content no matter what the border is. I've seen a few foreign shows as well. For example a German series called Dark. I watched it because it was spins but again there's so much more short uh borders are waiting to get crossed, so NeuralGarage wants to be a catalyst in that movement so that more and more content can cross. Now for our listeners can you help provide kind of a competitive analysis: is this something that NeuralGarage is pretty much at the forefront of or are there other companies that have in the recent past or are currently also trying to solve this problem

00:09:28

because the only reason I ask this, Subho, is I've just simply never heard of something that in my mind now as you say all of this feels like a definitely you know not only robust but low-hanging fruit in terms of this is a problem that we need to solve. So what can you share about that? Sure. So essentially uh you know certain problems are there which are kind of obvious uh but we learn to live with it unless someone shows you another way. It's true. Yeah. uh and if I have to define this problem uh you know it's not just in the cinema industry, this also exists in case of let's say influencers uh someone anywhere

00:10:07

there's media, right, you can probably find a use case for this. Any sort of media, yeah. Right. uh so essentially if I divide the entire uh landscape into people or companies which are focusing on the digital space, meaning YouTube or Instagram, um there would be one segment of companies and then there would be another segment which is focusing on TV plus screens. Uh what happens on TV plus screens is the quality of video which is shot is higher because you have different cameras for that. Uh your movie formats are very different.

00:10:37

Basically things which go on theaters. Uh NeuralGarage is trying to target the second segment and uh both the markets are very large markets. Uh we have certain players, I won't take names, uh in the digital segment who have found a lot of value in this. uh in fact both Google and Facebook uh at their conferences have said that they want to create something and they are creating something which they will integrate directly in their distribution platforms, that is uh Facebook ads and YouTube, right? uh the reason that we are focusing on the you know movie segment is because of two things. uh one is that it is really hard to do, right, and from a very uh technical standpoint we would want some level of defensibility in that

00:11:19

while open source research primarily focuses on things which are available online because you can have the data for that, you can have the algorithms for that, but when it comes to you know screens or uh content which goes on TV plus screens uh the video files are very different. Just for a reference, a 2-hour movie which goes on theaters would be around 500 GB in size. uh here not exposed to these kind of uh video things which are required to train such algorithms. uh algorithms itself they are very different and there are very strong defensive has met. Uh so NeuralGarage is focusing on that. Apart from that what we've really seen over the last few years is that when it comes to let's say influencing or content which is digital there are other factors which play as

00:12:03

well. Uh but when it comes to let's say theatrical content or films or OTT uh people have already realized this problem, at least the buyers. For example Netflix about 5 years ago, they never used to dub their content. uh today with content you know on an average 34 plus languages, right? uh maybe from your location it's just six or seven but if you use a VPN and change the location you'll see there are you know 10 more languages uh which are sanctioned for that geography. Now this problem is already well established. What it does to a filmmaker is that it allows them to monetize their content when they send to OTT or they syndicate their content or sell to broadcast. What it does to OTT is to equalize this demand for content. Uh

00:12:48

even now I watch a lot of OTT content but amount of good content is less, right? Uh so it allows users basically to maximize their past investments and get multiple streams of it. Totally identified this problem. We really wanted to focus on that and uh the second reason was that you know our co-founder uh Anjan, who is also a very good friend of mine for the last 10 years, he actually had this blog and we started the company because of that. Uh I think uh it was COVID times and uh all of us were watching a lot of content and he primarily loves Korean content and one of the weird things is that you know if you really look at something regularly you start seeing it more closely and then

00:13:32

he was finding faults in it, right? uh so as he consumed the content more he started seeing that you know dubbing has certain flaws and uh we needed to do something in order to fix that and uh we have been working on AI for a very long time so we thought of something and then, you know, that's how it's done. Amazing, wow, what an inspirational story. So what is your specific role uh at NeuralGarage, like what are you most concerned with? Uh I'm the chief technology officer of the company. Okay awesome. In our case uh technology specifically needs research. Uh Anjan is the chief product officer. We have purposely separated the product out

00:14:11

because each of them you know needs their uh specific niche and contribution. Absolutely. Y well you know what we are really trying to do is something which is uh very complicated but we know it has immense value. From day one what we've really done is we've spoken to a lot of customers uh or prospective customers and we have regularly incorporated that inside our research. uh and because of that the research segment, the reason we have separated that out is because we feel that as we go forward uh our research team will be one of the prime um you know drivers of the company because again uh iterating fast at a research level is essentially very crucial you know in this AI-first world. Yeah that

00:14:51

definitely rings true because yeah you have to be not just competing with competitors but also with the latest and greatest models out there. So I understand if this is kind of, you know, if you're like, I can't get into that, this is way too hairy, but I am really curious to just, on a very high level, very broad level, understand how this is done, right? Like how you're able to take in, I'm assuming, an input frame which is just a person's face or something as they're talking and then um edit that, just change their expressions and like their lip movements to sync that with the audio. Um, are you able to just share maybe just a very overview version of, I'm just so curious

00:15:33

to learn how this is done and I don't even really come from deep tech. I come from what I'd like to refer to as shallow tech. So, and as do most of my listeners, I can say, so anything you can offer on that would be very very helpful. So see, the real challenge that we are tackling is that the content has already been shot. Correct.

00:15:53

There are some technologies, let's say, which create avatars or which do video generation. There the advantage that those technologies have is that they are creating a new video. So there is no quality reference point. They can say we are creating a high quality 4K content but the moment you have a reference point to match it is very difficult. So in our case we are changing a part of the face, right? The moment you're changing a part of the face you have the rest of the face as a benchmark or a relative comparison, right? And so the things that we do is we use the audio activations as a controller to transform the face. Now what goes into this is we change 40 facial muscles or representation of those muscles in the

00:16:36

face. So that basically means when I say "hey Naman" and "hey Naman" these two lead to different movements. uh starting from my jaw to my smile lines everything has to move in conjunction because the vision that you know we at NeuralGarage have is that it cannot be you know something uh which looks AI-ish. We are not, yeah yeah exactly, trying to, let's say, make Squid Games uh lip sync in English. We are trying to make it such that it looks as if it has been shot in English. Exactly. Y there is real value in that. So that is the part that we're targeting and because of which we transform the facial muscles.

00:17:16

Uh after we do the transformation, a big challenge is also how do you blend it back? Because if you're changing something per frame uh there is also a temporal component, that is, uh there are 25 frames let's say in a second. All of those movements they have to also come together. I cannot suddenly have my lips open and then suddenly close, right? So there's that component as well. So all these components come together. Apart from that there are other challenges.
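The two ideas in this part of the conversation, blending a changed region back into the frame and keeping movement consistent across roughly 25 frames per second, can be illustrated with a toy sketch. This is not NeuralGarage's method: the function names, the single hypothetical `mouth_open` value per frame, and the moving-average window are all assumptions made purely for illustration.

```python
def blend(original, generated, mask):
    """Alpha-blend one pixel value.

    mask = 1.0 keeps the generated value, mask = 0.0 keeps the original.
    """
    return mask * generated + (1.0 - mask) * original


def smooth(track, window=3):
    """Centered moving average over a per-frame signal.

    `track` holds one hypothetical mouth-openness value per frame.
    """
    half = window // 2
    out = []
    for i in range(len(track)):
        lo = max(0, i - half)
        hi = min(len(track), i + half + 1)
        out.append(sum(track[lo:hi]) / (hi - lo))
    return out


# A mouth that snaps open for a single frame and shuts again looks wrong
# on playback, so the spike is spread over neighbouring frames.
mouth_open = [0.0, 0.0, 1.0, 0.0, 0.0]
smoothed = smooth(mouth_open)

# Halfway between an original and a generated pixel value at a soft mask edge.
edge_pixel = blend(100, 200, 0.5)
```

A soft mask near the edge of the edited region is what lets the unchanged rest of the face, the "benchmark" mentioned earlier, meet the changed part without a visible seam.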

00:17:43

For example if I am speaking like this and I say Naman, my you know chin goes down. If I look down and say Naman, my chin goes in and out. So there's also a depth component, right? Yeah, 3D component. Correct. Yeah, 3D component. So all these things are taken into account. There are moving cameras, there are moving persons.

00:18:01

uh but again we are solving one challenge at a time. Uh we are very proud that you know we have been able to contribute uh significantly to the growth of this industry. Uh but we always have a very strict target, as in uh today you know I would say we are at a stage in terms of our quality of outputs that uh if I tell you it's AI and if you look very hard enough maybe you'll find it. Our benchmark is that even if I tell you and it's your content, still you'll not be able to make out.

00:18:30

That's incredible. Wow. That is so cool to even like you know just think about. And also on the development phase, um what were some problems that, so all of these things that you mentioned, you would have probably huddled together before you got started and you would have probably discussed some of these things, right, we'll run into this problem that we need to take into account, blah blah. Um what were some of the unexpected developmental hurdles that you had to face? Especially any that uh come to mind as being especially annoying that you just simply did not expect while this is being built? uh one of the challenges that we did not really anticipate was that you know we never had experience of working with these kind of videos. Uh you mean like the really big 500 GB is that? Yeah.

00:19:15

Yeah. Uh so 500 GB is a movie reference. uh we started with advertisements and Amazon was one of our first clients. Uh so when Amazon shared the content with us, uh our code was not even ready to read it because what happens is that uh I'll just give one small technicality to uh yeah please, we love technical details here. So these variables in uh you know that we use in our algorithm they are generally 32-bit float and 32 bit means uh we have red, green and blue values where each of them are 8 bit, 8 bit, 8 bit and uh when it you know 8 bit means there are you know eight locations in the memory, each of them, and because

00:19:55

it's binary so that means that you can represent 2^8 red uh red values, 2^8 green values and 2^8 uh blue values and the last one because it's a 32-bit float to have in 24. Last one is an alpha channel which means transparency. Yes. Yep. Opacity. Mhm. The challenge with that is that you know these videos once we got from Amazon they were something called a ProRes.
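The arithmetic here is easy to check: 8 bits per channel gives 2^8 = 256 levels, and three colour channels plus an alpha channel fill 32 bits. The sketch below packs four 8-bit values into one 32-bit integer. The RGBA channel order and the function names are assumptions for illustration; the conversation does not say how the team actually stores pixels.

```python
BITS_PER_CHANNEL = 8
LEVELS = 2 ** BITS_PER_CHANNEL  # 256 distinct values per channel


def pack_rgba(r, g, b, a):
    """Pack four 8-bit channel values into one 32-bit integer (RGBA order assumed)."""
    for value in (r, g, b, a):
        if not 0 <= value < LEVELS:
            raise ValueError("channel value must fit in 8 bits")
    return (r << 24) | (g << 16) | (b << 8) | a


def unpack_rgba(pixel):
    """Split a 32-bit integer back into its four 8-bit channels."""
    return (
        (pixel >> 24) & 0xFF,
        (pixel >> 16) & 0xFF,
        (pixel >> 8) & 0xFF,
        pixel & 0xFF,
    )


# Fully opaque orange: the three colour channels use 24 bits, alpha the last 8.
pixel = pack_rgba(255, 128, 0, 255)
channels = unpack_rgba(pixel)
```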

00:20:21

So these are 10 bit per channel RGB. Uh and ours was 8 bit RGB. uh that too on uh you know note quality data took a lot of step to learn this, do it well, a lot of algorithms, some some hacks, this and that, but uh that was really frustrating because that was completely out of syllabus. uh because you know we had worked with images primarily, we had worked with videos, but not these super high quality videos. Because these were inputs to our algorithm and it's like, with AI one thing that you you know are very confident with is that if it's garbage in then it's going to be garbage out
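To see why a pipeline built for 8-bit input struggles with 10-bit footage, compare the number of levels per channel and what happens when 10-bit values are squeezed into 8 bits. The bit-shift conversion below is one simple way to reduce bit depth, shown only as an illustration; it is not described in the conversation.

```python
def levels(bits):
    """Number of distinct values a channel with this many bits can hold."""
    return 2 ** bits


def to_8bit(value_10bit):
    """Reduce a 10-bit channel value to 8 bits by dropping the two lowest bits."""
    return value_10bit >> 2


# Four distinct 10-bit shades collapse into a single 8-bit shade.
distinct_10bit = [512, 513, 514, 515]
as_8bit = [to_8bit(v) for v in distinct_10bit]
```

Each 8-bit level stands in for four 10-bit levels, so smooth gradients in the source can turn into visible steps once converted.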

00:20:57

yeah that's uh the AI, whatever models we create, no matter how good the algorithm is they can represent the intelligence component but the knowledge comes from the data, uh just essentially how the intelligence is applied on the test, right? and uh because our data was significantly different there uh we had a very last second uh challenge that was very frustrating to fix. uh but over the time to lab that we did uh because certain things you know you don't really face unless you go into production uh and Yeah, that's so true. Yeah, these learnings in the video build that we had over the course of the last uh two and a half years have been really massive. Uh you know that was just one challenge. Going forward, today uh you know we work with something called EXRs which are

00:21:43

essentially lossless formats. Uh to someone who has even worked in vision for 10 years, to me PNG was a lossless format, right? Uh so you have JPEG images which are highly compressed and uh then you have PNG images which are not that compressed. So EXRs are literally uh you know uh 12 to 25 MB files. So imagine you have a 1-second video and that 1-second video becomes 25 files each of 20 MB, right? This is the quality uh of the cameras which are used when they shoot something theatrical. So the challenges and the learnings have been massive. So how did you end up fixing that particular problem?
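The per-second figure in this example is simple multiplication, sketched below with the numbers quoted in the conversation (25 frames per second, about 20 MB per frame). The constant and function names are made up for illustration.

```python
FRAMES_PER_SECOND = 25  # frame rate used in the speaker's example
MB_PER_FRAME = 20       # per-frame file size used in the speaker's example


def megabytes_for(seconds, fps=FRAMES_PER_SECOND, mb_per_frame=MB_PER_FRAME):
    """Storage needed when every frame is saved as its own image file."""
    return seconds * fps * mb_per_frame


one_second = megabytes_for(1)   # 25 files x 20 MB = 500 MB
one_minute = megabytes_for(60)  # 30,000 MB, roughly 30 GB
```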

00:22:24

So did you have to go back to your algorithm and make the input sizes way bigger or what's the fix there? Uh yeah so initially uh right now what we did was we had to ultimately retrain our entire algorithm. Oh wow. Oh god. Also rebuild it. We had to also rebuild it. So basically there were two things, right, which happened that time. We had to find a quick fix uh which was, yeah, which goes within let's say about a one month you know leeway period we had. So what we did there was, so RGB is one of the color representations which are commonly used. uh there's another representation which is called HSV. H is for hue, S is saturation and V is, right,

00:23:06

HSV, yep, yeah. and uh so we had studied uh these in our graphics and computer vision courses. So RGB, if it's like a cube, right, in a 3D axis, you turn it uh and make it a cylinder, then you can represent it as an HSV channel where V essentially is like the gray component in the RGB which represents what intensity is there, because even if you see a gray film or a gray thing you know what is happening, you know the edges, all the the uh so if you think about it, for intensity we did not need 10 bit uh because it does not change, uh because what is going on, the sharpness of edges, they almost stay the same. It was mainly

00:23:49

the color components which we had to do. Yeah. And saturation is responsible for the color components. Hue essentially is how much of red or how much of blue, how much of green is there. Uh so that also we used a reduced format but it was a hacky solution. It did not provide the you know ultimate power that we wanted to show, uh but it was a good thing to you know do last second. So essentially we fixed for the another component going into other in the color space but I would say that the main reason that we could do it was because we had taken some course uh you know in our initial year and uh we just remembered that and
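The quick fix described here, moving from RGB to HSV and spending fewer bits on intensity than on the colour components, can be sketched with Python's standard `colorsys` module. How the team actually split precision between channels is not stated, so the choice below (V stored at 8 bits, H and S kept as floats) and the function names are assumptions for illustration only.

```python
import colorsys

MAX_10BIT = 1023
MAX_8BIT = 255


def rgb10_to_hsv_reduced_v(r, g, b):
    """Convert 10-bit RGB to HSV, storing V (intensity) at 8-bit precision."""
    h, s, v = colorsys.rgb_to_hsv(r / MAX_10BIT, g / MAX_10BIT, b / MAX_10BIT)
    v8 = round(v * MAX_8BIT)  # intensity is the channel given fewer bits
    return h, s, v8


def hsv_reduced_v_to_rgb10(h, s, v8):
    """Convert back to 10-bit RGB from the reduced-precision representation."""
    r, g, b = colorsys.hsv_to_rgb(h, s, v8 / MAX_8BIT)
    return tuple(round(c * MAX_10BIT) for c in (r, g, b))


# A saturated orange in 10-bit RGB survives the round trip in this case.
h, s, v8 = rgb10_to_hsv_reduced_v(1023, 512, 0)
restored = hsv_reduced_v_to_rgb10(h, s, v8)
```

In `colorsys`, V is the largest of the three RGB components, which matches the description of V as the intensity part of the colour. Darker colours will not always round-trip exactly, which fits the remark that this was a hacky stopgap before retraining.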

00:24:23

we were able to fix that. So luckily, or you know I would say we are very blessed that somehow we had taken that, no idea it's going to be useful in the future. But yeah. Yeah, that's just how it goes, right? You never know when you might need a certain tool that was in your tool belt for the last 10 years and you've never used it until now where you have to use it and there's just no substitute for that. Um while you were sharing the Amazon story, I am curious.

00:24:49

I know you've done a bunch of work. So I kind of just want to uh let you talk a little bit about what sort of work you've done. So obviously Kesari 2 is big. That's probably, you know, as far as I'm aware, the biggest uh project that you guys have done but what are some of the others that you've also gotten the chance to work with, like you said Amazon and what else? Uh so you know today there are about uh 50 plus global brands uh which are using us. uh so these would include likes of uh Coca-Cola, uh within Amazon also there is Amazon Fashion, there's uh there's Amazon tail, uh all of these brands have used us

00:25:26

uh we have clients like Nestle, there's Packard, uh the list goes on, there's ITC, there's Britannia, uh again a lot of subbrands of Britannia as well. uh the use cases that we have really seen have sprung uh uh it was not something that we had anticipated. uh we were primarily looking to make dubbed content seamless and there the use case being an ad which is shot let's say in Hindi, uh how do you make it more receptible in southern markets let's say for example. uh so when that ad goes into a Tamil market uh a dubbing artist in Tamil just dubbed it, right, but again the fishing mob best go for a so we were already solving that problem but some of our clients invented certain use cases. For example, let's say

00:26:08

you shoot something in December. Uh it loses its relevance in February because now the messaging has to be different, right? The discounts are different. The festival is different. So your only option is to either run the same ad or you go for a re-shoot. Uh again re-shoot is extremely expensive. So what we were doing is we were doing dialogue replacement where you could repurpose something that you have already shot but in the same language you make the person say something else. Uh now the advantage that you have with this is uh you are looking at video as you know one point, right, and now this point is limited because uh in the spatial or the geographical region if you are let's say shot some if you've shot something in

00:26:51

Gujarati, geographically it's only relevant in Gujarat, right? uh you're extending the borders up to this, how much this content is going to be valued across the borders by using VisualDub, that is one. uh but you're also increasing the value of this in the temporal zone, that is you're making it relevant for a longer amount of time. So if I visualize this in a graph essentially it covers more spatial area, it also increases the relevance over time. So these two use cases we've seen big usage. Apart from that Coca-Cola have used us for something very interesting. uh during the last ICC World Cup what they used us for was momentum marketing. So basically uh Harsha

00:27:32

was their brand ambassador and uh what used to happen is that immediately after the match they used to use a VisualDub version and release something where Harsha was speaking about what happened just in the match, for example. Okay. I see. Okay. Right. And it did not really come off as an ad. Uh because he was giving something which is extremely contextual and relevant in the moment. Yes.

00:28:03

And people have only been able to leverage this kind of marketing with influencers but with celebrities it's very difficult to do because of their time crunches. Right. Obviously. And apart from that uh even geographically uh before the you know match used to start uh there was voting etc. So they used to say That's for calc etc etc. So both geo targeting and momentum marketing, these are use cases which are invented by our clients. Uh you know these use cases keep popping up. Uh Kesari again was a very interesting use case. Uh essentially if you know about the movie it's about uh the Jallianwala chapter uh so a lot of the actors there are British actors and uh they have difficulty speaking in Hindi. uh to think about it acting itself is a performance which has

00:29:00

two components, right? One is the acting itself, and the second is more of a memorization component, which is dialogue delivery. Correct. Yeah. What we are able to do is to disentangle these two things, because if you made mistakes, in post you can fix it with VisualDub, right? Because you can essentially change what they need to speak. So you can just focus more on performance and less on dialogue delivery. VisualDub was used for Kesari 2 specifically, because people could not speak something in Hindi and it's a Hindi film. So the actors comfortably acted in English, and later on their lines were changed so that it looks as if they have been shot in

00:29:41

Hindi. So that was the use case. It allows a director or a producer to cast the best person for the role, not being limited by language. Wow. That is just so, I mean, yeah, a lot of this I did have a decent idea of before, you know, when we connected earlier, but the thing that's kind of really blowing my mind is that, and I don't know why I didn't just think of this before, you're not just limited to replacing what somebody said in a certain language. You can add new words, remove words that they said. Like, this is one of those "what is a video" moments, you know, cuz like, what is a video anymore? Like, does it have

00:30:22

to be unedited now? What does it mean to have it be edited? So I guess where I'm getting at is, do you see any ethical constraints around any of this? Like, have you had anybody come up to you and be like, "Hey, why don't you let the actors act and not, you know, change that, so that we can appreciate the performance for what it is instead of AI, you know, tweaking it?" Yeah.

00:30:50

So see, from day one we have been very focused on use cases which allow everyone in the chain to win. For example, we have had a lot of suggestions to do deepfakes or face swapping. That is something we don't want here. No more. Yeah. Right. Cuz where do you stop? Right. You know, might as well. Yeah. And let's say, you know, the content was already being dubbed and used to look bad. So this gives the actor an opportunity. I get that. Yeah. And when it comes to, let's say, the dialogue replacement thing as well, what I think, and what I've seen also, is that contracts have evolved, right?

00:31:26

Initially when we started this, people were not aware of this, but now every time, let's say, an ad is used, at least in some of the contracts that I've seen, an actor can get paid. Earlier, if you think about it, let's say a big actor is there and there's a shoot; he can only do one particular shoot because he's allocated to that, right? Now if there is some value that he's getting out of this, he can be at a million places at the same time. He doesn't even have to come to the shoot if he's shot once, right? So it benefits him as well. It's not that, you know, we're

00:31:57

trying to replace the acting performance; that is something that has to come from the actor. We're trying to only separate out the memorization component, that is, dialogue delivery. Right? So again, internally also we always try to debate and ask ourselves exactly that: what are the problems this industry faces? Today when you interact with anyone, and you will also know it to some extent because you record video, it is very difficult to get a video right, and there are components to it. You know, one of them being this memorization of lines, right, and mistakes that you make which are specifically speaking mistakes. Now consider this: instead of one single person speaking in front of the camera, there's an entire scene, so there are other mobile bus, there are other

00:32:41

people also speaking to you, so everything has to come together. And this I've been given by a director, a reputed director, that about 20 minute, you know, for scene generally is recorded in a, at least when it comes to, you know, Indian god, because, you know, an actor often makes dialogue mistakes. Yeah, I'm sure. Yeah, it must happen all the time.

00:33:03

In a multi-character scene, it is even more difficult because everyone has to do the scene perfectly, right? Now, if you can fix all of that in post, it benefits everyone. You know, you can get the film faster and you're still focusing on performance. And ultimately, what I've seen is that if everyone wins and if the technology is good, then there is no stopping it.

00:33:25

You know, people are going to adopt it no matter what. Yeah, definitely. This idea that you're sharing of kind of separating the performance and the delivery is not something I had considered ever, actually. So yeah, I'm going to need to think about how I feel about that. But continuing forward here, because you mentioned the industry, I do think that must have been so fascinating, right, for somebody like you who's just, you know, an engineer who does his thing, to suddenly be exposed to this fast-moving, glamorous Bollywood and Bollywood-adjacent industry. So what was that experience like, and do you have

00:34:07

any anecdotes that come to mind when it comes to interacting with people from this industry, be it, as you said, directors or actors or, you know, what have you? So, you know, one of the important things that we did when we started the company was that we got a co-founder who understands this industry. So our founder and CEO, Mandar, he used to head Viacom, he had run a revenue business for MTV for quite some time in India. Got it. And he spent 22 years in media and entertainment, so he has a lot of connections, and the advantage that we had with that was that

00:34:41

from day one we went to these studios and asked them what their file formats were, you know, what exactly do you have, what you don't have, what you got up top, what software you use. So the client interactions have been really good, because it's very difficult to actually interact with a client and understand his actual needs in the industry. You know, some of the directors and producers, I'm a fan of them myself, so it was really interesting to talk with them. Some interesting stuff, you know, I can't name the actor, but again, one of the top actors in Bollywood, what we heard is that he, because, you know, essentially we were

00:35:19

pitching this, that you don't need to memorize your lines, and that particular actor never memorizes his lines. What he does is he pastes them on his co-star's or the cameraman's head, and if you observe, again, he's one of the top actors in India, you will never see him looking at the camera. He's always looking here, he's looking here, saying a dialogue, but he's reading a line. And he focuses more on the performance rather than the dialogue delivery. But it was an interesting thing for me, as I've been a fan for, you know, over 25 years. So yeah. So you were at this shoot? No. So we have not met

00:35:52

him directly, but he was. Okay. Okay. This is something that was communicated. Yeah. Cuz I feel like they don't really allow anybody at these shoots. Isn't that right? No, no, no. So that's not really the case. I have not been in buff because this interaction was by the agency who handles that, and he was interested in the investment as well. So we were interacting with them and pitching this as a use case, and they said, you know, FYI, my guy does not do this because of the. Okay, I see. We have been to several shoots. Essentially we wanted to understand also, you know, how shots are done, because what happens is that, let's say that you have

00:36:28

occupied some time of an actor and you want to shoot for a particular brand. What happens is that there are certain lines also which the actor has to say, right? We wanted to see how much time, you know, he actually spends and how many retakes there are, because all of this is important as to how much our software will be valued in this industry. Yeah, totally. You've been to several shoots. We've been to, I think, the shoots of Batai, there have been a lot of them, because, you know, the brands which have them as brand ambassadors wanted us in the shoot as well so that everything

00:37:06

goes right. So yeah. Got it. Huh. And then, yeah, so you mentioned one of the founders that had connections at Viacom and MTV. You said he had spent a bunch of time in the industry. So really only asking this for, you know, any enterprising individuals such as yourself that are younger, that have tech skills, that now want to congregate with others in a certain industry to also become co-founders. How did you come across this person, or how did you meet?

00:37:38

Was this just online, or how did that happen? That time it was online, because it was COVID when we really started interacting. But basically he was a childhood friend of one of my college professors. Wow. And what had really happened was that he was also taking an industry break. Basically, I think he was going to one of the, you know, top premier management colleges to do his business degree, because he wanted some gap from his corporate career, but that got converted to a distance degree and he was not really interested in that, and he was enjoying some good time with his family. But when we had this idea and we spoke to him, we really wanted to

00:38:23

have some insights as to how much the industry would value this, whether this is a problem in our head or is it really a problem. And what he immediately got us was that he went and, along with us, visited about 50-odd studios in Mumbai, all top studios, and we spoke to the technical heads there, people who do dubbing, voice actors, everybody. And after that only we realized that, look, this is a much bigger problem than we were thinking. And we should probably get together and swap, and the chemistry was really good. So yeah, that's how we decided to overcome. Wow. Huh. So I think, yeah, the takeaway here is just upskill, right?

00:39:06

Learn actual helpful skills that help you build products, or at least underlying technologies for these products, and then maybe eventually, if not just by network effects, you will probably get in the right room with the right people that can help you realize that, and if not, then you can just realize that yourself. So that's the CEO, CTO, CPO done. Who's the fourth person? I think you mentioned four co-founders, right? Me, Anjel and Shubash, we know each other since 2013. Uh so the Swas you know have taken courses in it.

00:39:42

I mean, that time AI was not a buzzword. It was one of the can-happen-in-the-next-decade buzzwords. But we really did it because we liked it, and that was one of the common, you know, bonding factors. So Shubash looks after operations. So he's also officer. But at the back of it, you know, we are all technologists, and at an early stage in a startup, obviously, you will focus and contribute as much to technology as you can, because, as you know, with this AI wave, one of the things that has happened is, very frankly, the cost of hiring a candidate has really gone up, especially

00:40:20

if he has some AI under his belt. So if you have worked on AI and you can contribute there, that helps more than the operations. That'll be real. That's actually so interesting to hear you say that, because I feel like in a lot of ways, from what I hear with some of my peers and really some of the people that I help mentor in the US at least, it's really hard out here, even if you have, like, actually helpful AI skills or machine learning skills even. Yeah. But I'm very glad to hear that you have a

00:40:52

different take, right? Because obviously that means that things are looking up, in India at least, for these types of roles, which they should, according to me. So yeah, I know you were at SXSW recently and I think you shared that you also won. So do you mind sharing your experience around that? What was that like? How was it to go up against international companies or international tools that do maybe similar things, maybe slightly different? But yeah, how was that whole experience?

00:41:20

You know, last year has been really crazy for us, I mean, when I say last year I primarily mean the financial year. We were a part of TechCrunch Disrupt, we got selected by AWS, and SXSW happened after that. And all of these are again global events, crazy experiences. Got to meet a lot of other global founders who were solving world-level problems, sub problems and other things. But SXSW was very special. The reason was that it was very specific to the industry that we are in, and the segment that we were nominated for was also entertainment, sports and content.

00:41:58

Oh wow. And, you know, there was a pitching competition which happened there, and we were very proud when we showed that demo on stage on the large screen. There was, you know, a complete 10 seconds of clapping, and we hadn't accounted for that in the pitch, but luckily we were able to finish on time. But it was a very pleasing moment for us. The experience itself was amazing because, you know, a lot of Hollywood actors were there. I didn't expect Batman, Ant-Man, and Iron Man to be at the same event. So that's crazy, dude.

00:42:40

Wow. The main reason it was difficult was that there were crazy lines, obviously. Uh-huh. But as a part of being a winner of SXSW, or top five finalists, we also had a premium pass. So we had to choose between whether you want to watch Robert Downey Jr. or Ben Affleck. You know, there were a lot of films which were also being launched. I think there was a movie called Death of a Unicorn, so we watched that at the Paramount Theatre, and then again there was the movie of Ben Affleck, I forget the name. We couldn't watch the

00:43:13

full movie because there was another, you know, event, meeting investors, etc. But again, really amazing experience. And there were a lot of people also who are upcoming filmmakers, right, and upcoming actors, and it was very important for us to talk to them as well, because how they view technology was very important. There were certain events where the labor union of motion pictures was speaking, and there was one specific event which was kind of AI versus how the labor union sees it. And initially, entering the room, I was a bit skeptical as someone who's building in this area, because there's a general perception which a lot of people have pushed

00:43:54

that, yeah, AI is going to replace everything, this and that, which is not really, you know, a case that most are building for. But what we heard inside was that people also really want to adopt it. And I remember, I forget her designation, but she was one of the heads of one of the labor unions of motion pictures, she said, look, there's no point in resisting technology. She said that she distinctly remembers there was one entire group which they had within their union who represented projection, which used to happen in those days, right, before IMAX etc. technology came, and the union resisted

00:44:43

digital cinema a lot. I see. And her point was that, look, they don't exist today. Some people have played thems and moved on, because ultimately, you know, for the filmmaker, for the actor, everyone, I won't say money exactly, but, you know, where they find value, it can be in different houses within the industry. Totally. Right. And technology sometimes dictates terms, no matter how much you plan, no matter how much we do but subtitles. We just need to adopt technology and move forward. So that was a big takeaway for us. You know, when it comes to SXSW, there was also a music part of it, but we primarily focused on the filmmakers

00:45:23

and the film part of it. But, you know, pretty good experience in terms of the interactions that we have had. We've also had good interactions in terms of international media houses. We got a connection to Disney, Universal, multiple others, whom we met after the official, so that was really good as well, without putting any dollar on the marketing budget. Yeah, seriously. And that's, in my mind, the biggest benefit of events like this, right? That you just get put in a room with everybody from random adjacent industries, not even related industries sometimes, but that can actually really add value, and that goes both ways, right? It's like a mutually beneficial relationship in rooms like that, which

00:46:03

is why I've always been so fascinated by these events. One other note on that is, when you interact with founders from the West, right, have you picked up on any recurring patterns? Yeah, really looking for, like, have you found any core differences between Indian founders versus founders in the West? I'm just curious. So, you know, one of the things is that, at the core of it, I'm also a geek and a nerd, right? And even if I meet someone there, ultimately it takes some amount of time to understand who we actually are, and after that, people who are, I mean, general coders,

00:46:45

they are mostly the same, so you can interact with them very fast; you realize that it's the same spirit. So in that segment it is the same. The main difference I've seen is this, and I'll separate it out, and the reason I'm making this separation is because I'm from a technical background. We have founders who have really made their story look really good. They have very good narration skills, and they can tell the story of their product and their life really well. So when it comes to those founders in the US, I think they have a much better skill set, because of the interactions that they have had, because Silicon Valley in general, especially when it

00:47:20

comes to VCs and investors, they realize that you can lose a lot even if you pass on something, so there is more patient capital that exists there. So the encouragement that the founders get for carrying something forward is much more, and that helps the economy. I mean, even now, most of my work is talking to an entity which responds in commands, right? So that does not really help our speaking skills. The networking events, no matter how much we prepare, those are opportunities for us to improve that. But the successful founders that I've seen, they have very, very good narration skills compared to, you know, founders in this part. So essentially

00:48:02

you're kind of hinting that in a lot of ways it just kind of boils down to PR. It doesn't boil down to PR. Yeah, like, you have to do everything else right, obviously, but what you're saying is the separation can be how one markets themselves. Now that can be how you talk, the things you say, how polished your pitch is, all of that. Just to make sure I'm getting that right. Yeah, essentially that. But, you know, also how big you can dream. Here I think there's a cap on that as well, because of the, you know, for example, in India, I

00:48:37

would say venture capital is still evolving if I compare it to, let's say, something in Silicon Valley. Yeah. Totally. Yeah. Right. So in that sense the entire ecosystem, or every stakeholder there, would also evolve. People who have gone there, and, you know, I'm not really differentiating between an Indian founder and a US founder, I'm also talking about founders there who are Indians, right? They have also improved that, because that's just the environment that you are in. Yeah. Yeah. That's such a good point. Yeah, I think Naval had a quote where he was like, deciding where you live is the most important decision you'll make in your life, and yeah, if you're a tech founder, why

00:49:19

would you be anywhere that's not Silicon Valley, right? Which is what people kind of extrapolated from that, which makes sense. Like you said, everyone you need is there, the money is there, everything that makes you a better founder, person, whatever, techie, is there. So yeah, that's such an interesting call out. Yeah. And not something I would have gotten to. Final question before we let you go here. To anybody out there that's building AI-forward or AI-core products or services, what is one piece of advice that you would like to give them? For people specifically building with AI? I think if people are building with AI, if they're looking at today itself, then obviously exposure to a lot of tools, platforms which give you

00:50:04

you know, a lot of tools together, just get exposed to them so that you know how to get your work done faster. But if you are really thinking about starting with AI, or trying to create a playbook where you use AI strongly, then I feel that it's better that you look at core principles, because these tools etc. will also evolve with time, and they'll probably saturate in about one to two years' time, and then you would have a more refined set. But what won't change, today there are transformers and there are GPTs, but tomorrow maybe they'll not change, they'll be better

00:50:41

overall, right? But if you focus on vector algebra, probability theory, these are what make up the core, and you'll be in a position not just to catch up with these faster but also to contribute to them. So if you start. Yeah, absolutely. Wow. And so what lies ahead though for NeuralGarage and for you? Like, what's the immediate near future looking like for you guys? I can't name it, but in about, let's say, a couple of months or so, again we'll be releasing one of India's largest films this year with one of India's largest production houses. Let's go. Love it. And multiple more such pleasant surprises for us also this year, and yeah, I mean, hopefully the

00:51:28

future and the stars align. No, totally. And again, you're making the stars align, right? It's not like you're sitting around, you know, twiddling your thumbs. And I feel like that is the thing that I've taken away from this chat, which I've so thoroughly enjoyed. It's just that same notion of, you know, if you want to do something, you just have to go out and build it. Like, there's no point just doing math or just making your little machine learning projects. Like, all of that is fine, but if you don't actually take the jump to, you know, doing what it is that you actually want to do, yeah, like your life would

00:52:05

have obviously been completely different, right? You wouldn't be winning SXSW and featuring in Kesari 2 and, you know. So yeah, Subho, thanks so much for taking the time here today. Like I said, I thoroughly enjoyed this so much, and I'm sure any listener that's tuning in would also be able to take away so much value. Is LinkedIn still a good place to find you? Cuz I'll be linking your LinkedIn in the description box for anybody to reach out. Is that the best way to get to you?

00:52:37

I'm active on LinkedIn and Instagram. Okay, awesome. I can link both of those. And yeah, so anybody listening, if you're building something cool or if you have any questions for Subho, feel free to reach out, or if any brands are listening that could leverage NeuralGarage, you know where to find him. But yeah, thanks so much for taking the time. This has been such a pleasure. Thank you, Nan. Thank you for having me. Really enjoyed this chat.

Transcript-backed moments

A few lines worth stealing before you hand over the full hour.

00:00:02

NeuralGarage is trying to marry and align the audio and visual cues so that it looks as if Squid Games was shot in a language of your choice. What I want is to stop the game once and for all. How? At present the problem

00:00:20

that we started tackling primarily is how to make dubbed content seamless. What are some of the others that you have also gotten the chance to work

00:00:28

with? Amazon and what else? Coca-Cola have used us for something very interesting. During the last ICC World Cup, what they used us for is momentum marketing. They used to use a VisualDub

00:00:37

version and release something where Hash was speaking about what happened just in the match. Boom. Boom. Boom. And it did not really come off as an ad because he was giving something which is

00:00:49

extremely contextual and relevant to the moment. I'm Nan Pandi. This is the Ready Set Do podcast. And in this episode, my guest, that's right, I'm not calling

Show notes

In this episode my guest is Subhabrata Debnath. Subho is a co-founder and CTO at Neuralgarage, whose proprietary solution VisualDub provides state-of-the-art LipSync using AI while maintaining exceptionally high visual fidelity. With the hit feature Kesari 2, Neuralgarage also earned the world's first visual dubbing credits.
