How to Break Into Data Science (Interview Prep Masterclass from ex-Amazon and Walmart Data Scientist) - w/ KarunGuide
Data scientist career guide
The data career path looks clean from far away and weirdly foggy up close. Data scientist, data analyst, data engineer, business analyst, ML engineer. Cool. Which one is the actual door? This guide pulls together the episodes where guests stop speaking in job-board language and get practical.
Best for students, analysts, engineers, and career switchers trying to pick the right data lane.
A data science story, data engineering story, and analytics story should not sound identical.
Dashboards, pipelines, models, and business decisions all count when the work is clear.
Where people get stuck
They learn tools before choosing the problem.
Python, SQL, Tableau, Spark, statistics, ML. All useful. Also a great way to wander forever if you never decide what kind of data work you are trying to be hired for.
Where the guests get practical
They talk about interviews, dashboards, pipelines, and judgment.
The best data conversations on Ready Set Do are not abstract. They get into the work: what to build, how to explain it, and how to stop sounding like a bootcamp brochure.
First moves
Start here if the problem on your desk is real right now.
Short enough to scan. Direct enough to use.
From the transcripts
The lines worth clipping.
These are short on purpose. If one of them lands a little too hard, good.
...to break into data science, how should these people be thinking about landing their first role? I'm Naman Pandey, and in this episode the featured not-expert is Karun Thankachan. Karun got his master's in...
How to Break Into Data Science (Interview Prep Masterclass from ex-Amazon and Walmart Data Scientist) - w/ Karun
What exactly does a master's in econometrics entail? When we actually do econometrics, we combine real-world data with a theory. With econometrics, many times we are trying to estimate...
How To Transition From Economics Academia To A Career In Data Science - w/ Bhoomika
...many times we are trying to estimate causal effect and not correlation. I'm Naman Pandey, and in this episode the featured not-expert is Bhoomika Jagi. Bhoomika is a data scientist and has two...
How To Transition From Economics Academia To A Career In Data Science - w/ Bhoomika
For somebody trying to make the jump into data engineering, what are some of the important skills? Any resources that you found that really helped you, or maybe any tools that are often needed?
How To Get Hired As A Data Engineer - w/ Sam
...maybe any tools that are often needed? I have interviewed people before, like candidates for data engineering roles. If you don't have that real experience that you can show me, then the next best...
How To Get Hired As A Data Engineer - w/ Sam
Full transcript
The full EP 45 conversation is here too.
If you came here for the raw language instead of the cleaned-up takeaway version, good. That is the whole point.
Anchor episode: How to Break Into Data Science (Interview Prep Masterclass from ex-Amazon and Walmart Data Scientist) - w/ Karun
There are six basic algorithms that you really need to know: regression, logistic regression, k-means, KNN, support vector machines, and trees.
Somebody that is trying to break into data science, how should these people be thinking about landing their first role?
I'm Naman Pandey, and in this episode the featured not-expert is Karun Thankachan. Karun got his master's in data science from CMU, after which he had stints at Amazon as well as Walmart.
Was lucky enough to land an internship at Amazon and convert that, worked there as a data scientist and applied scientist, and now working as senior data scientist at Walmart.
Your experiences at CMU: how is that different from a master's in computer science?
You have the LeetCode Blind 75, LeetCode SQL 50, and Advanced SQL 50. If you are
comfortable with all three, you would be good in 80% of interviews.
Could you help us untangle data engineering, data science, and machine learning engineering?
I recently completed 18,000 bookings, 4,000 one-on-ones, so most of my impressions of the job market come from those conversations.
Our discussion today is focused on data science and everything that it entails for you to get hired as a data scientist.
Most mentees skimp out on the preparation for dynamic programming, and that usually comes to bite them.
In 2025, Meta is going to start replacing their mid-level engineers with AI.
PySpark is very common, Docker is very common, Airflow is very common. Visualization tools: Tableau is the market leader.
Data science has the distinct roles: product
data science and applied data science.
In keeping with our theme of learning from somebody that's just two steps ahead of us instead of an expert, my goal with this episode is to highlight the incredible journey that Karun has had in the data science field and what you can learn from it to get hired yourself. This is The Ready Set Do Podcast, and to support it, subscribe to me on YouTube and leave me up to a five-star rating on Spotify or your favorite podcast app. There are also links in the description. And now, my friends, without any further ado, here's Karun.
Welcome to The Ready Set Do Podcast, where we learn from journeys of not-experts who are just two steps ahead of us. Karun, welcome.
Hey, hi.
We're so, so
glad to have you. I want to kick off here by firstly congratulating you on being in the top 100 in the data Topmate list that had come up recently, I think. So from there I want to open the door for you to share a little bit about the type of mentoring that you do in this field, and especially some of the most common pitfalls that you see in most of the students that reach out to you for help.
For sure. Maybe just to give a bit of context on why I started: when
I was looking for a job in 2023, I had built a somewhat small network on LinkedIn by then. When I made an "open to work" post, quite a few people in my network reached out to me and said that they could help navigate the current job market and figure out how to land a role in their companies. Some of them I did know, some of them were a bit new, and I was pleasantly surprised by people's willingness to help. I wanted to pay it forward. That's how I
started on Topmate. Primarily on Topmate I do a lot of one-on-one mentoring. I talk to at least one person a day.
Man, wow, that's just incredible.
Yeah, it also helps in multiple ways. One, just to keep the energy up: people who are actually looking for a job have a different kind of drive than once you get settled into your job. Folks who are fresh out of college or in college work on really interesting, really amazing, cutting-edge projects, which again inspires you. It also helps drive most of the content I generate on LinkedIn. I get to understand what
problems they're facing in the current job market and what kind of questions they get in interviews, and that informs a lot of my content as well. Apart from one-on-one sessions, I do a lot of webinars on tackling the job market. I think I recently completed 18,000 bookings, 4,000 one-on-ones, so most of my impressions of the job market come from those bookings and conversations. Based on that, I would say the most common pitfalls, at least in terms of landing a role, I guess I could divide into
two areas. One is on the projects side. Within projects, it's basically how they pick problems to solve. Most students, I feel, don't really pick business-oriented problems. The most accessible datasets tend to be the ones commonly used on Kaggle, like the housing price prediction one or the handwriting recognition one.
I know there's some medical stuff on there as well, right?
Yeah, medical being breast cancer detection, a few things like that. They are more of what we call toy problems. They're really good for helping you understand machine learning and get a handle on how to build models, but may
not really come across as interesting or relevant to a hiring manager or a recruiter who's going through your resume. So selecting actual, relevant business problems really helps you stand out in the current market, and that is one of the most common issues I see with most of the mentees I work with: the kind of problems they select. Then there are a few other things with how they approach problems. You need to be aware that model metrics and business metrics are different. If in your resume you write that you got an accuracy of 90%, or an F1 score, those don't have a lot of meaning to a hiring manager. It's more about the kind of impact you're able to create in terms of business metrics.
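A small Python sketch of the distinction drawn above between model metrics and business metrics. Everything in it is an illustrative assumption rather than something from the episode: the churn-retention scenario, the `ConfusionCounts`, `model_metrics`, and `net_retention_value` names, and all of the counts and dollar figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Holdout results for a hypothetical churn classifier."""
    tp: int  # churners correctly flagged
    fp: int  # loyal customers wrongly flagged
    fn: int  # churners the model missed
    tn: int  # loyal customers correctly left alone


def model_metrics(c: ConfusionCounts) -> dict:
    """Model-side metrics. Assumes at least one predicted and one actual positive."""
    total = c.tp + c.fp + c.fn + c.tn
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": (c.tp + c.tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def net_retention_value(c: ConfusionCounts, customer_value: float,
                        offer_cost: float, save_rate: float) -> float:
    """Business-side metric under assumed economics: every flagged customer
    gets a retention offer, and a fraction `save_rate` of true churners stay."""
    saved_revenue = c.tp * save_rate * customer_value
    offer_spend = (c.tp + c.fp) * offer_cost
    return saved_revenue - offer_spend


# Hypothetical numbers, chosen only to make the contrast visible.
counts = ConfusionCounts(tp=80, fp=40, fn=20, tn=860)
metrics = model_metrics(counts)
net = net_retention_value(counts, customer_value=500.0, offer_cost=20.0, save_rate=0.3)

print(f"accuracy={metrics['accuracy']:.2f}  f1={metrics['f1']:.2f}")
print(f"estimated net value of the retention campaign: ${net:,.0f}")
```

The first printed line is the kind of number that tends to end up on a student resume. The second is closer to the impact framing described above, and it only exists once you commit to assumptions about what a prediction is used for.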
Then the other thing is, when they're talking to you in an interview, it's about how you present how you tackle that problem. Most mentees have this habit of, "Oh, I tried this particular set of three algorithms, this one worked the best, so I went with that." Typically in the industry that may not be how you go about it. Usually you do what's called a literature review to see how people have solved this problem in the past, and based on that figure out, "Okay, these methods seem to have worked well, therefore I will try and test out these methods." Typically, whenever you make a decision or a choice behind a model or a metric, it has to be backed by some kind of, not necessarily research, but just a survey. And it becomes kind
of simplified in this day and age, because tools like Perplexity and ChatGPT make literature review a lot easier. So the selection of the project and that process, I feel like that is the main pitfall. And then there's another area with applications, but we can dive into that later.
For sure. Yeah, I was going to bring that up a little bit down the road. Already, from what you've laid out, I think that's very interesting, in that, first, it makes total sense to me, right? They've probably seen that same problem being solved a few hundred times by now, so what are you doing to differentiate yourself? So I
really like what you said about picking a problem that you want to solve and that not everyone out there is solving. The second thing from there, and admittedly in a lot of this discussion I'll be trying to catch up with what you're laying down, just because this is not my field. It kind of is, because I do come from technology, but I'm probably not as deeply embedded in the code as you are. So I guess what I'm wondering, Karun, is: say somebody is just a student. They don't actually have a lot of experience under their belt outside of projects, or just solving algorithms on LeetCode and such. How do you recommend somebody that is just
trying to break into data science, irrespective of whether they've finished their bachelor's or master's? I feel like there probably isn't a lot of difference if you're just an amateur student without any work experience. So how should these people be thinking about landing their first role? I know you mentioned projects, but what are some other things that they could potentially be looking at?
So projects is the one that we highlight the most, because that's the most effective way to convince hiring managers that you can actually deliver. But apart from that, the next way of landing a role in data in general would be networking, to a certain extent,
primarily because the job market is kind of noisy right now. It's very hard to stand out. There are a lot of AI tools that will tailor your resumes and cover letters for you, so recruiters aren't really able to distinguish between one candidate and another. There are even a lot of companies coming up that will do the applications for you at a low price. So yeah, the market is definitely quite noisy, which is why, if you can get a referral from an employee who's working in the role that you're targeting, like a data analyst referring you for a data analyst role, that can carry a lot of weight. Apart from that, yeah.
And a
portfolio, right, probably? Is that a thing that happens in this field, or not really?
Yeah, if you bundle the projects together in a website, that is definitely standard. But again, as long as you select good projects and present them well on the resume, that's like 80% of the lifting. The portfolio is maybe the 20% optimization you can do. But the project selection comes first. If you feel your projects are good, the next best way to stand out is networking. Networking is what I emphasize. There have been some people who have mentioned that building a brand can kind of help, in the sense of, yeah,
posting on LinkedIn.
Yeah, creating posts on LinkedIn. It's just that, in my opinion, it takes a while to create that kind of presence and get that kind of traction. So for me it's usually projects number one, and second is networking. Networking isn't limited to sending connection requests on LinkedIn. In most cities that you are in, there would be meetups happening, so feel free to go and talk to people there and connect with people.
That's a really good callout. And it's funny, because just earlier today I saw this post on LinkedIn that did such a great job of demystifying what it means when somebody says networking. So the way this
particular post approached it was: just make a list of, well, actually, you would just search your field, right? So just search "data science" and a bunch of people will show up, and then every day send out, whatever, 10 to 15 connection requests. And don't stop there. Once they accept, reach out to them and further build that connection. Maybe don't bug them for a coffee chat immediately, but if you have stuff in common, then you can be like, "Hey, I really need help with this. Do you think you could look at my project?" So obviously there are lots of resources out there. I know you share a ton, so for anybody listening to this: just pause, go to LinkedIn, follow Karun right now. I promise it will be worth
your while, because just in this past week, while I was looking through your stuff to prepare for this discussion, I was blown away by not just the depth but the breadth of the content that you share. So I really appreciate you doing that.
Yeah, of course.
Man, you're doing God's work here. So, shifting gears just a little bit: for our audience that is just tuning in, and also really for myself, do you mind providing a really quick and dirty snapshot of your career path so far, just so that we can contextualize where all of the information that you're sharing is coming from? Obviously, in addition to the 20,000 or so people that you've helped already, it would be good for us to kind
of know what the basis of all of your expertise is.
For sure. So I did my bachelor's in engineering, obviously, a guy in India. It was computer science and engineering. I landed a role as a full-stack developer, kind of an app developer. I was in that role for about two years and was very lucky to meet my first mentor, who wanted to develop an analytics wing for our engineering department. I essentially started working as a data analyst, worked on Hadoop, so I guess a big data analyst, and then had the opportunity to work a bit more on the predictive
side. I got introduced to ML and met a few more mentors who taught me a lot, but reached a point where a lot of those folks had moved away from the company. I plateaued in terms of learning and didn't have a lot of mentors who could guide me at that point, and that's when I decided, okay, maybe a master's might be a good idea. This was in my third year. I tried the first time. I had a decent GRE score and a few papers, and I think by that time, at Dell, we had filed
a patent, so you could say I was maybe overconfident that I would get an admit at one of the colleges. I only applied to five, and those five were big colleges. I got rejects from all of them. So yeah, that definitely humbled me. The next time, I refocused and essentially re-established for myself: okay, how did I get here? I got here because of mentors, because of a lot of feedback, so let me go into that process again. Reach out to a lot of folks who are working at
or graduated from a few of my colleges, specifically my dream college, which was CMU. Reach out to them: how do you land your admit? Again, mostly everything was through LinkedIn. I got them to review a lot of my SOPs, and asked how to write good LORs, how to write a good resume, what kind of projects to choose, how to present those projects, all those things. Going back into that mentor and feedback loop helped re-elevate my application, and I was very lucky to get an admit from CMU. I did my master's there, a master's in data science. And then again I got a bit lucky, in the sense that we joined in 2019. In 2020,
obviously, the pandemic hit. A couple of us got our initial internship offers rescinded. The beginning of 2020, January, February, March, was a bit vague in the job market, because companies were essentially calling back their offer letters. It improved eventually from summer onwards, because Amazon and a lot of the e-commerce and FAANG companies started hiring en masse, and I was lucky enough to land an internship at Amazon and convert that. I worked there as a data scientist and applied scientist for about two and a half years, and I'm now working as a senior data scientist at Walmart.
Wow. And so at Walmart you've been around two years, give or take?
Yeah, almost two years. A year and a half.
Wow, awesome. So we will definitely be going into, you
know, day-in-the-life stuff just a little bit down the road. For now I want to shift the light a little bit to your experience at CMU. While you share what some of the most notable courses or experiences that you got there were, if you could also juxtapose that with what, generally, a course or a master's in data science entails, that would be helpful for anybody that's considering studying that. So how is that different from just a computer science, maybe a master's in
computer science? Obviously I get that it would be more specialized, but in what specific ways? And then if you could also share some of the best courses that you did at CMU and how it helped you get to where you are.
For sure. So I guess for the master's at CMU, in terms of the kind of information you actually get, almost all colleges are at about the same level, because it's basically free. Anybody can learn anything they want from the internet. It's mostly about hearing that from a professor who's well regarded in the field. CMU happens to have professors who work on the cutting edge, or essentially have established the
cutting edge in their field, so it's always good that whenever you have a question, you can ask them, and they typically always have an answer. So you definitely learn a lot faster. In terms of the course content, again, it's about the same everywhere. I would say it's more the supporting structure. Every course has a humongous number of TAs, so whenever you get stuck with code, you have office hours almost every other day, and these TAs can really help you debug the code. So again, learning is exponentially faster. The other thing is, the assignments and exams are extremely hard.
Oh wow. And that's
true for all of these colleges, right, that fall in that bracket?
Yeah. CMU especially is kind of hedonistic, in the sense that they actually pride themselves on the fact that their SCS, or School of Computer Science, students get the least amount of sleep.
Really?
So yeah, the assignments are definitely grueling. A lot of late nights. Essentially, in data science problems you're trying to fit a model, apply different types of techniques to improve that model, and get a certain amount of performance. Depending on the amount of performance you get, you get a certain grade. So let's say if it's 70% you get a C, then 80, and it gets exponentially harder to get a better grade. So yeah, you
essentially, I still remember: okay, I'm letting the model run. If it's DL models, you're running it over epochs, so you set a timer for when you think 100 epochs are going to finish, you go to sleep, you get up an hour later, and you see how the model is performing. So yeah, the assignments are definitely grueling. But the learning is exponentially faster than any other place that I have been. In terms of courses, CMU is again really good in that manner: most of the courses are available openly on YouTube. Intro to Machine Learning, Intro to Deep Learning, Machine Learning for Text
Mining: these are three courses that I really like, and I still use the notes from those courses before I go into interviews. I keep my notes along with my passport, they're of that importance. So I still refer to those notes before my interviews. All the courses are available on YouTube. You can just search for them.
Amazing. Yeah, I'll be linking all of those in the show notes for anybody willing to check those out. I want to keep this moving, but I do also want to pause and ask you about this one thing that you said about that assignment. It's funny, because
for my master's I went to Purdue. I just studied engineering management, so nothing as crazy or as deeply tech as what you did. But one of the courses that I did take, which was probably the best course I took in my entire master's, was called Artificial Intelligence, or Intro to AI, something like that. It was a 6000-level course, so there were some PhDs and such that also took it in their first years. I remember for the final project for that course we had to implement a paper that had been published in the last two years in, I forget the names of the journals, but I'm sure you would know. It wasn't even like you could just pick anything,
but you had to pick one of those two or three journals, from the last two or three years, and then you had to reimplement what they had done. So obviously, I mean, it just drove me crazy. Somehow, I don't even know how, I was able to find an easy enough paper to implement. My reason for sharing all of these random things with you is: when you say that you have to train the model until it gets to a certain degree of accuracy, which I think is what you said, can you just give us an
example for us to understand what exactly is going on here? If you could maybe lay out the problem statement, just to contextualize exactly what it is that makes it such a challenging assignment or problem to solve.
For sure. Let's say, okay, in Intro to DL, we have five to six main assignments, and then you have a major project. In one of the assignments, it was essentially a computer vision problem, a multiclass classification problem. Essentially you just need to bucket it into one of
the 128 classes. The essential idea is that you have to figure out what kind of CV DL framework would actually work well on the dataset that is provided, and in addition, what kind of additional data processing techniques to use. Things that notoriously work well in CV don't work in other domains. CV has a lot of specific stuff, like data augmentation, that works really well, and a lot of techniques like teacher forcing that also work fairly well in terms of fine-tuning a model. So it's basically, the simplest way to look at it is
you have an objective function, you want to minimize it as much as possible, and you're trying a bunch of different techniques, changing the model, changing the number of layers in the model, trying different augmentation techniques, trying different learning techniques, to get that minimization as far as possible.
So a lot of hit and trial?
Yeah.
Yeah, so that's what I was getting at. I guess one way of thinking about it would be: you're choosing between k-means versus k-nearest neighbors, and you would have to implement both of those and find out which did better. Is that way off, or is that kind of it?
It's sort of like that.
The only difference, I guess the only callout I want to make, is that typically, yes, folks do that, in the sense that they fit each model and figure out which one works the best. The only addition I would make to that is: fit each model, figure out what went wrong, based on what went wrong make an assumption about how it could improve, and then go and choose the next model, instead of sequentially fitting every model. But yeah, basically it's that.
Wow, that is so cool. These are some of the times that I really feel like, you know, I should have done more there, but yeah,
maybe the YouTube courses that you mentioned will hopefully scratch that itch for me. So, moving forward here, next I want to understand what it was like for you to go through the whole internship process. I know you said lots of late nights, you're doing all of these assignments which seem to never end, but on the side you also have to keep applying for these internships and such. So can you paint that picture for us? In general, what does that internship search look like? And then what are some of the ways that you would recommend
current students in master's degrees all over the US or the world, maybe we can focus on the US I guess, can maximize their chances of landing that internship?
So how my process looked would be quite a lot different from what I recommend, because I learned a lot. My process was a bit haphazard. The first and foremost thing is to make sure you have your portfolio, or the set of projects that you want to sell in your resume, ready.
Oh yeah.
So that I was able to get together fairly quickly, primarily
because the course at CMU was fairly intensive. Within the first semester itself we had three to four projects we could put on our resume, which is pretty good. We got that feedback from our seniors: these are the courses that, if you take them, by the end of the first semester itself you'd have good projects on your resume. So that was quite helpful. By the end of the first semester, most of us had the resume down. The networking part was where I was pretty late to the game. I would say I didn't network enough, and I focused a lot more on cold applications. I was lucky enough to get a few callbacks, I think primarily because of those projects that
our seniors said would work well in the market. Mostly, how the application process looked, I guess, is: you spend the majority of your day on your assignments and your coursework. Towards the tail end of the day, for an hour at night, you focus on applications. You set a timer and you try to apply as much as possible, because it is a very disheartening process.
I mean, nobody looks forward to that, I can assure you.
Yeah, it's definitely something that can burn you out fairly quickly. So you take an hour, put on some music, and just apply. Try and
take the emotional aspect out of it. Apart from that, for our roles LeetCode was required, and I was a bit weak in LeetCode, so in the mornings I would try to spend maybe an hour solving at least one or two problems. The rest, the theory portion, my coursework took care of, like ML theory, and I had my notes that I would refresh before any interview came around. So that was the rhythm I fell into. The only optimization I would make on that rhythm: definitely continue with the LeetCode, definitely continue with the
applications. If you aren't in a course that helps keep your ML/DL theory up to date, or if you don't have that set of notes that you can quickly refer to before interviews, make sure you have those. The only additional thing is: maybe you have 100 connection requests you can send out a week. I mean, LinkedIn updated itself, and now you can only send five for free, I guess, a week or a month, so you might have to go for LinkedIn Premium. But if you have LinkedIn Premium, that's 100 connection requests a week with connection messages. Use that wisely.
The advice I give is: your connection message should be to the point, requesting a referral for this specific job ID, with a one-liner on a project that relates to that job description. If you don't have that one-liner, then don't send that connection request, because people may not respond.
Wow, that is not something that I had heard before, but it does make so much sense. Having been somebody that has been there also, even though I wasn't applying for data science roles per se, I definitely had a one-liner, but come to think of it, it used to be really generic, and it wasn't really adding any weight
to why I would be considered for that versus somebody else. So I really love that callout, because it immediately makes you stand out: not only have you done the work, it's right there for somebody to see if they're interested, you've implemented it. So that's really cool. Are there any LeetCode playlists and such that are fairly easy to find on this, or is that a more hidden type of information, specifically for data science? Typically I ask mentees to focus on arrays, strings and dynamic programming; that's what's most commonly asked. Most mentees skimp out a bit on the
preparation for dynamic programming, and that usually comes back to bite them. So I would say focus on those three. If you go to LeetCode and just search "dynamic programming patterns", there is a really good post that breaks it down; I think there are about six main patterns that it's broken into. It's like learning trig in high school: initially you just learn a lot of formulas by heart and have no idea what to use when, but if you keep at it long enough, the pattern recognition in your brain kicks in and you're like, oh, this is that pattern, I just need to use it here. You just need to keep trying till that pattern
recognition starts to kick in. The LeetCode Blind 75 is the list that I recommend everybody start with, and also before your interviews. Makes sense. And then just to confirm, it does sound like even if somebody were to spend all their time just doing LeetCode, it would probably not be enough, right? They would still need to understand the theory, they would need to have projects. It sounds like there are almost three different sectors that you need to be covered in well in order to land a position. That's right, definitely. In terms of data science interview prep, data science has sort of
two distinct kinds of roles: there's product data science and applied data science. Product data science is a bit more focused on working closely with your business stakeholders and helping them make go/no-go decisions, so it's a lot more SQL, statistics, causal inference and dashboarding, essentially to help the business make better decisions. Applied data science is a bit more modeling focused. For applied data science, the interview has, I would say, three main verticals. One is obviously the coding part: LeetCode (data structures and algorithms) and SQL. These are the two things that you need to be fairly good at. For LeetCode you have the LeetCode Blind 75; similarly you have LeetCode SQL 50 and Advanced
SQL 50. These are three lists; if you are comfortable with all three, you would do pretty well in, I would say, 80% of interviews. Apart from that, the next vertical is the stats and basic ML vertical. Statistics means hypothesis testing and A/B testing; these are the most commonly asked topics, so be very good in those two. And in ML there are six basic algorithms that you really need to make sure you know: linear regression, logistic regression, KNN, K-means, support vector machines and trees. When I say trees I mean the full spectrum, from bagging on, so all the trees. So for those six algorithms: what the model assumptions are, when to use what, what each one's sort
of strong points and weaknesses are. That's typically where questions come from, like: what are the assumptions for linear regression, how do you test for heteroscedasticity in linear regression, why is randomization important in random forest. Those kinds of questions can come up. The next vertical is your specialization vertical, or the team's specialization vertical. It could be a DL field, or it could be a more machine learning, business domain oriented field, where you see more case study questions. If you're going for a recsys team, they might ask you questions like how do you design Instagram's For You page. They're trying to figure out what kind of data you're hoping to leverage, what kind of approaches you would experiment with and how you
evaluate and improve those. And this was on the product side, you said, right, what you're currently talking about with the For You page, as opposed to the applied? Okay, makes sense. No, no, the For You page is just the third vertical of the applied. Oh, I see, makes sense. For product data science, the interviews would look quite a bit different. You would still have your SQL coding rounds, and you typically also have stats rounds, but the case study rounds are more what are called metrics rounds, where you really need to understand business metrics and how to define them. And more importantly, there are questions that typically, I
would say, product manager or consulting roles get. Like a case study, a case interview almost, right? Kind of, yeah, those kinds of questions. Typical questions are like: the traffic on the site went down over a period of time, how would you try and resolve it? So more of root-causing an issue. There are about four main categories. There is a book, I think it's called Cracking the PM Interview; it's a pretty good book. Oh yeah, it's so famous. Yeah, so that's a pretty good book to get started with. I think that's very interesting, because until just now I hadn't even considered that there is such a strong link between data science and these types of roles. So I'm almost wondering why even go through
the trouble of what you did, all of those assignments, all of those nights, all of that LeetCode grinding, if somebody knows that they want to for sure get into the product side of data science? Is there, I mean obviously the answer is no, but I'm still tempted to ask, a quote-unquote shortcut they could follow to only aim for these roles and not the applied ones, or is that just not possible? Right now the market is fairly competitive, so if somebody has the same projects, somebody with a master's degree would be preferred. So right now the market is competitive; in 2025 it could get better. I would say a master's definitely does
improve your chances, but it's definitely not the only way. I know a lot of product data science folks who, after their bachelor's, got into data analytics. They did SQL, dashboarding and helping drive decision making, and that matured into product data science roles. So that is definitely another way to go about it; it's a fairly common path in product data science to come from data analytics. And data analytics is very receptive to freshers. That's true, that's very true. Yeah, and it doesn't prefer master's students that much; it's more focused, I would say, on whether you can provide insight. Makes sense. And I guess from there, I think the difference between data analytics and data science is now
pretty clear to anyone listening. Could you similarly help us untangle data science, data engineering and machine learning engineering? I know there are some overlaps at least, but for people who are confused about where one ends and the other begins, it would be great if you could separate those out. For sure. So let's start with data engineering, because that's sort of top of the funnel. Their role essentially is to make usable data available. Usable in the sense that the
quality of the data is high and it's delivered at a specific or agreed-upon time, at a location, at whatever the agreed frequency is. So it involves a lot of engineering, in the sense that if you need to bring the data over, you might need to know streaming services, not just SQL DBs but streaming services like Apache [unclear]. If it's a lot of data and you need to pre-process it, you need to know distributed computing like PySpark. Sometimes you need to bring data over in an async manner, so you need to know stuff like Apache Kafka. So a lot of that, and then a lot of the ops side, which you see in pretty much any kind of
engineering, which is making sure everything's working fine. Of that type? It could be used in the distributed processing, but by ops I mean more like monitoring, in the sense of making sure everything is up and running, and if anything goes wrong they get alerts and they're able to fix it. So a bit of that. Mostly the field is engineering for data; that's top of the funnel, making usable data available. The high performers in the field essentially create datasets from which you can extract a lot of value. And if you are really good at your
job, it's something that pays off in a lot of different ways, because your single high-quality, valuable dataset can be used by so many different people to create so many different reports, which can drive a lot of insights. Other people are depending upon you, so your work gets exponential recognition, which is what the high performers do. Data science would be just below that in the funnel. They utilize the data: in product data science to help drive decision making, and in applied data science to build models that either drive decision making in one way or improve customer experiences, in manners like recommendation systems. So that would be the data science field. They have mostly a heavy focus on statistics,
machine learning and modeling. Machine learning engineering is sort of the bottom of the funnel. Okay. The field essentially is mostly about this: once the model is defined, okay, this is the model that we are going to be using to make, let's say, our predictions for this particular problem, they are responsible for hosting, serving, deploying, scaling and maintaining that model. That's typically what machine learning engineers do. But it differs a lot from company to company. At companies like Facebook, MLEs are the ones who choose and improve the model; I don't think they have applied data scientists, they have just product
data scientists and machine learning engineers. At Google, almost everything is handled by software engineers; they have roles like software engineer, machine learning, and software engineers, and then they have data scientists who are just product data scientists. So it differs a lot from company to company. Most of my experience is contextualized by Amazon, which has data scientist, applied scientist and machine learning engineer divided out. Mm, that makes a lot of sense, especially because on this podcast I had Nikhil Pentul, who I know you might know. He's at Adobe, he's a senior machine learning engineer, and he shared that a lot of his job is
not just what you mentioned, so actually tweaking the models and making sure that they work to the degree that they would want them to, but also worrying about the UX side of things, like how do you use that exact model to make sure that the product goal is served. It was really interesting to me that he was wearing both those hats, because it kind of makes intuitive sense for that to be the case, but I can also see another company where they would divvy that up and say, your job ends here, just write the model and you're good, and then someone else, maybe a more proactive or more hands-on product manager, would then
be responsible for the UX side of things or how to implement it, and they would work with, what are they called, product designers, I believe is the term, one of whom I also talked to. He helped design, or he currently helps design, Edge, in product design. That was really cool, because he mentioned that at Microsoft they have entire swaths and armies of model developers, and his job is to make sure that it gets deployed one small feature at a time, so they don't ship something so big that everybody loses their minds and says, what is this, we didn't ask for this, we don't
want this. So it's really cool how this entire landscape is so diverse, and I'm just really intrigued by how it all goes down on the ground level. Yeah, really appreciate that. The roles can definitely overlap a lot; they depend on the company and how they define the JD. But I guess from the other side of things, as an interviewee or as an applicant, I think if you have your USP well defined, like, this is what I'm good at, this is my entire experience, all pointing towards this one thing, then after that it just becomes a matter of
making sure that you do a really good job at highlighting what that thing is, and then you kind of pray from there. The reason I bring that up is Nikhil, for instance: his whole career just happened to be all around documents. Since his first job after his bachelor's he was working with documents, his internship was with a legal insurance document company, and then it just made so much intuitive sense that he ended up getting hired at Adobe, which is literally the only company that comes to mind when you think documents. So from there, shifting gears a little bit, can you talk about your interview experience, or maybe actually just the hiring experience, at Walmart? Because I think that was
when you were no longer a student, you were a full-blown person in the market. I think you would qualify as probably junior level, or, actually, what was your level at Amazon before? Mid, I would say. Okay, mid, perfect. So when you were that mid-level data science engineer in the market, thriving, what was that experience like for you, and for people like that, what would you share in terms of how they should be looking at getting hired for the next role? Again, I guess in tackling the job market the main pillars remain the same: projects come first, have your resume ready, and next is networking, with
your projects, your resume and your LinkedIn up to date. So those mechanisms remain pretty much the same. It just differs in how you communicate what you have done in the resume. If you are an entry-level or junior-level data scientist, you would focus a bit more on execution, whereas for mid-level it's a bit more on design and ideation, so you talk a little more about how you experimented with things and chose, whereas at entry level it's more about implementation and how you achieved a particular objective. And in terms of an internship versus full-time, are
there any differences in the process, or is that also kind of the same? Pretty much the same. For me it was slightly different, primarily because, like I mentioned, I had an offer pulled, so I needed to land an offer soon. So my approach was that I leveled down: I aimed for an entry-level internship. So an entry-level internship and applying for a senior role, which is what I did at Walmart, obviously have certain differences. But in terms of an internship versus a full-time role, I think the process is roughly the same from an interview standpoint. Yeah, the process
is roughly the same in interviews, like the number of interviews and what you're tested on. So this is also something that I feel comes up across industries: when you're trying to, as you said, level down, where your experience says that you should be joining as mid-level but you're okay with joining at a junior level, how do you play those cards with the hiring manager and such? Because in my experience, the one time that happened with me, it did not go well. I was basically told, hey, you're wasting your time as well as ours, so maybe don't do that. But I'm curious to learn how you played that, because it is
tricky, right? It's not as easy as it sounds. Yeah, for sure. For me it's mostly the kind of action verbs I use, primarily in my resume. Really? Yeah. I felt that entry-level resumes should be focused a bit more on implementation and driving execution, whereas mid-level resumes are more about leading and designing and, like I mentioned, ideation. So tweaking those things a little bit, making my experience section slightly smaller and my project section slightly bigger, those I felt might essentially not make it seem to them that, hey, I'm just landing this role because I need something, and when something better
comes along I'll just leave. So those two tweaks in the resume, I felt, made me suitable for entry level. Yeah, no, that's a great callout actually, that makes a lot of sense, and I don't know if many people would think to do that, which makes it a really good callout. And then I guess from there, can you share what a day in the life looks like for an applied data science professional in the mid-level category? For sure. So a day in the life differs based on where
you are in the project, so maybe I'll just walk through the phases of a project. Initially, we mostly get our problem statements from business or from leadership, like data science leadership, who establish the vision for the next one year. Based on those visions, those goals, what we call the one-year plan goals, that gets trickled down and we have sub-goals within the team. So we have a bit of direction from our data science leadership as well as direction from business on what problems are good to solve; that gets mixed up and we get a roadmap for the year. So initially when you're on a project
you're in what we call the discovery phase. You're trying to understand how best to actually solve that problem, which teams should be involved, what kind of data is available, what the quality of that data is. Sometimes at the end of discovery we might actually find that the quality of the data is not sufficient to build a quality model, and the project might be dropped then and there. But typically it's a lot of, I would say, simple stats and visualizations to check the quality of data, and communication: talking to other teams to see if they would be open to collaborating and giving us data so that we can improve the model. A lot of
hypothesis testing, a lot of data quality assessment. So the day is a lot of meetings and a lot of data testing. Once the discovery is over and you figure out that, okay, this is solvable, you essentially develop what's called a BRD, a business requirements document: this is the requirement, this is how we're going to solve it, these are the rough timelines, stuff like that. After that is when you actually start trying to solve the problem. So you do a bit of literature survey. Again, it varies; in that phase the day-to-day life differs, but typically you do a bit of literature review of how people have solved this problem in the past, you
figure out what kind of models you want to experiment with, and you try and test out those models. Things don't always go well, but you try and solve problems as and when they come up, and you try and build out a v0 version. You may get four to six weeks to try out different experiments, and then: for a v0 version this is what I got, this is how the model falls short, and these are some improvements that I think I can make in the future. Then you need to put your v0 into production and see if it's actually making an impact. So typically
you have a no-ML baseline solution, that's your A, and your B is your ML solution. You compare and see if you're making an impact; if you make a good impact, you get the green light to continue improving the model. So it depends on which phase you are in in the project: in the discovery phase it's talking and data testing, then it's experimentation, then it's doing error analysis on the model to figure out where it's falling short and how to improve the model. And this last phase can go on for years. Exactly, yeah. So first of all, really appreciate that, that was such a great deep dive. Two questions off of that. First, is agile a thing in terms of how
you run those projects, at least from a life cycle POV? And secondly, what are some of the tools and/or languages that come to mind when you think of each of those phases that you just... For sure. Is agile a thing? Yes, in certain ways. At Amazon we didn't have week-long sprints. Our PM essentially felt that a week is too short a time for a data scientist to work on a model and get insights, which is kind of correct, so he ran four-week sprints. At the start of the month we would have what we call a grooming session, to
make sure that everything is in Jira, and then weekly meetings would be there for each project you're working on, where you would give updates, and at the end of the month you would essentially have what we call a retro, a retrospective. I see, yeah, that sounds pretty agile to me, all things considered. Yeah, that's basically agile; it's just that the sprint, instead of one week, is basically four. Walmart is kind of similar. And in terms of tools, in the initial phase, like I said, discovery, everybody uses different things; some people like just Excel. Myself, I like Python a lot, and most of the packages in Python
support whatever I want, and whatever is the internal tool for communication and meetings is what we use. Again, for model experimentation, in terms of literature review, Google Scholar and a mix of ChatGPT and Perplexity is typically what I use. Okay. And again, Python and a couple of packages within it would be what's heavily used. And in terms of how you interact with the infra, whatever your infra is, that's what you'd use. At Amazon, obviously, AWS is what everything runs on. Right. At Walmart it's Google, so pretty much GCP is what everything runs on, and we have a couple of internal tools, primarily for ETL, like handling the data. Every
company has its own specific tools, so you might need to pick up a few of those. And once you go into A/B testing and productionizing, a few open source tools come into the picture: [unclear] is very common, Docker is very common, Airflow is very common, and a couple of visualization tools; Tableau is sort of the market leader. And I thought it would be Power BI, but I guess probably not; if you're not affiliated with Microsoft, that doesn't make as much sense. Possible, possible, yeah. I mean, AWS has its own version called QuickSight, so we were encouraged to use QuickSight. I'd never even heard of that. Yeah, but I did see a lot of teams still using Tableau. And Google also has its own version called, I think, Looker, but a lot of teams still do use Tableau. So there are a lot of visualization tools, but I think Tableau still persists. I mean, like A/B testing, it again depends upon the cloud platform you are on. For sure, yeah, that makes sense. And I guess in hindsight, while you were answering that, I just realized that, hey, if somebody just had all of those things on their resume, wouldn't that be
great? To me that was an added bonus of you sharing that; that's not even why I was asking, I was just genuinely curious about how much you're jumping around from tool to tool. And then really just the last question here before I let you go, and I'm just looking for your take more than anything else. With all of this AI stuff, and literally last week we had Mark Zuckerberg say that in 2025 Meta is going to try to start replacing their mid-level engineers with AI, basically not replace them but just do away with them, so in this entire
era that we are in currently, I was going to say that we're headed into, but actually that's wrong because we're literally in it now, how do you think this all plays out? Do you see AI writing models for us, or do you think that some software engineering fields will be impacted and others not so much? Just generally looking for your take on the whole software engineering AI doomsday theories out there. I don't know if I would call it a doom theory. So I used to code heavily in C when I was in college; now I'm pretty
much on Python, and I could never even imagine going back to C. So for me, AI is just another huge wrapper language on what I want to do. The essence of a software developer, which is that kind of logical thinking, figuring out how to get things to work, is still going to be required. The amount of work that you can get done would probably improve exponentially, because you're probably going to be communicating the same way that we are. Definitely not in 2025; that's sort of like an Elon Musk "self-driving cars are going to be around" kind of prediction. But I mean, they sort of have to make those
kinds of predictions so that they can motivate their teams. But probably in our lifetime? Definitely, yeah. I think his proposition of mid-level engineers being replaced by AI, definitely I think in our lifetimes that could happen. But it just means that, just like when, instead of coding something in Pascal, you might take days to implement it in machine assembly code. Right, yeah. Essentially it's just another step so that you can exponentially increase your output. So the role of the software developer will never go away. That logical thinking, the idea of getting systems to work, all of that is still going to be valuable. You just need to dive deeper into the business, specialize in your business domain a lot more, so that you can translate those
business goals better into software. I think a lot more new roles will hopefully open up. Absolutely. As we wrap up, any final pieces of advice, it doesn't even have to be for jobseekers, but really for any person that's even remotely interested in data science? I think primarily just having mechanisms that help you keep up to date in the field is going to work. Don't force yourself to be that person who reads a research paper every week or anything like that. Mechanisms that you enjoy: it can be something like a podcast, it can be just following a creator on LinkedIn or even Instagram; a lot of us are on Instagram nowadays. So anything
that helps you keep up to date, so that whenever you hear something interesting you can maybe, on one of the weekends, read up on it a bit more or watch another podcast on it, just so that you're up to date, so that whenever you feel like, okay, maybe I should look into upskilling, you can. Just having that mechanism is good, and I think that's pretty much all there is to it. There's no urgent doomsday coming. Everything's going to roll out slowly; everybody's going to have time. So just make sure you keep an open mind, are
willing to upskill, and you should be fine. Love that optimism. And I just want to add to that, as a person that doesn't really come from the field, like I'm pretty far from data science if we're being honest, more of an on-the-border, on-the-fence enthusiast type of guy. To your point about having mechanisms to keep oneself up to date: I listened to the Lex Fridman interview with the developers that made Cursor, which is basically purely technical, they go really into the weeds of things, and I was just
blown away by how much I enjoyed it, and I definitely did not think I would enjoy that at all. So what you're saying rings very true, at least in my case. So yeah, people out there, it feels inaccessible sometimes, but just give it a shot, and I think you'd be surprised by just how much you can glean and apply to your life to really make it better. So I really want to thank you today, Karun, for taking the time to talk to us and to share your expertise and your experience. Catch you all in the next one. New episodes every V
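For anyone who wants to see the A/B step from the transcript in code: Karun describes comparing a no-ML baseline (A) against the ML solution (B) to check whether the model is making an impact. Below is a minimal sketch of one common way to run that comparison, a two-proportion z-test on conversion counts. The episode does not say which test his teams use, so the choice of test, the function name `two_proportion_z_test`, and the example counts are all assumptions made for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rate between arms A and B.

    conv_a / conv_b are conversion counts, n_a / n_b are the users in each arm.
    Returns (z, p_value). Hypothetical helper, not taken from the episode.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both arms convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: A is the no-ML baseline, B is the ML solution.
z, p = two_proportion_z_test(conv_a=480, n_a=10_000, conv_b=560, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A positive z means arm B converted better than arm A, and a small p-value suggests the gap is unlikely to be noise. In practice the significance threshold and the sample size should be fixed before the experiment starts rather than read off afterwards.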
Source episodes
These are the conversations this page is built from.
Go to the source if you want the longer version, the full transcript, or the guest in their own words.
Episode 45
How to Break Into Data Science (Interview Prep Masterclass from ex-Amazon and Walmart Data Scientist) - w/ Karun
Data science interviews have become their own weird theater: LeetCode, dashboards, vague case studies, and a whole lot of pretending. Karun keeps it grounded here and walks through what actually matters if you want the job, not just the buzzwords.
Karun • Jan 29, 2025
Episode 53
How To Transition From Economics Academia To A Career In Data Science - w/ Bhoomika
A clean career pivot sounds nice until you are the one in the middle of it. Bhoomika walks through moving from economics academia into data science, what her background gave her, and how to make a non-linear path feel honest instead of apologetic.
Bhoomika • Mar 19, 2025
Episode 57
How To Get Hired As A Data Engineer - w/ Sam
Data engineering is the role people find after they get tired of vague 'learn data' advice. Sam makes the path concrete: what the job really asks for, which tools matter, and how to get hired without pretending you woke up fluent in all of it.
Sam • Apr 17, 2025
Episode 23
How To Break Into A Career In Data Analytics (Without Any Prior Experience) - w/ Sai
Sai is the Lead Data Analyst at Blue Cross Blue Shield of South Carolina. Coming from an electrical engineering background, Sai managed not only to get into a top Data Analytics program at the University of Connecticut, but also to land his current position without any prior work experience whatsoever. I believe anybody who is currently in the job market can immediately appreciate what a gargantuan task that is, and Sai breaks down the exact strategy he used to succeed.
Sai • Sep 4, 2024
Episode 15
How to Perform Storytelling with Data - w/ Sohan
Sohan is a data visualization aficionado who loves effective storytelling with data. He has also been a Top Data Visualization Voice on LinkedIn on numerous occasions, spreading knowledge and empowering others to excel at business analytics and data visualization.
Sohan • Jul 11, 2024
Episode 69
How To Crack Machine Learning Interviews (Microsoft & Walmart Sr Data Scientists POV) - w/ Nirmal & Karun
Machine learning interviews have become a strange mix of theory, product sense, and please-do-not-waste-my-time energy. Nirmal and Karun pull the curtain back on what candidates keep getting wrong, what hiring teams actually notice, and how to stop rehearsing answers that sound smart but do not land.
Nirmal • Jul 9, 2025
FAQ
The obvious questions are usually the right ones.
So here are the straight answers.
How do beginners choose between data analyst, data scientist, and data engineer roles?
Start with the work you want to do most often. Analysts explain decisions. Data scientists model and test. Data engineers build the systems that make data usable. The titles overlap, but the daily work feels different.
What projects help most for data science jobs?
Projects with a clear question, a clean method, and a plain-English result. A smaller project with judgment beats a giant notebook no one can understand.
