Episode 57
How To Get Hired As A Data Engineer - w/ Sam
One of the twenty most-watched Ready Set Do episodes on YouTube right now.

Data engineering is the role people find after they get tired of vague 'learn data' advice. Sam makes the path concrete: what the job really asks for, which tools matter, and how to get hired without pretending you woke up fluent in all of it.
Who this is for
- You are trying to get hired without sounding like everybody else in the pile.
- You would rather hear Sam's version while the mess is still fresh than get another polished hindsight sermon.
Key takeaways
- Switching to Tech from a Communications Background
- Learning just enough Python to hold conversations about it fundamentally changed the trajectory of Sam's career.
Transcript
The full conversation, right here. Auto-captions, lightly cleaned, still very much a real human conversation.
For somebody trying to make the jump into data engineering, what are some of the important skills? Any resources that you found really helped you, or any tools that are often needed?
I have interviewed people before, candidates for data engineering roles. If you don't have that real experience that you can show me, then the next best thing you can do is tell me about a project that you worked on.
I'm Naman Pandy, and in this episode our featured not-expert is Sam Lafell. Sam is a solutions architect at Snowflake and has extensive past experience in data engineering, which is the subject of our discussion.
I said, let me learn enough about Python to at least have these conversations, and that one decision fundamentally changed the trajectory of where my life was headed.
What are some tools that are must-haves on your resume, and maybe some niche or lesser-known tools that are only now becoming prevalent?
I've known a lot of people who have gotten hired out of college, at 21, 22, 23, as solutions architects at different companies.
...just interacting with my data engineer friends, of which I have a grand total of two.
Three.
Yeah, three now.
There you go.
I used to feel like, no, I'm not touching AI. It's too new. It's changing too much.
In line with our theme of learning from somebody just two steps ahead instead of an expert, my goal with this episode is to supply you with all the knowledge, tools, and skills you need to either start or transition into a career in data engineering, and to explain what this career track entails exactly.
Both of those are very helpful courses to jump into as well.
This is a Ready Set Do podcast.
Subscribe for weekly episodes featuring not-experts from all walks of life, and daily clips. And now, without any further ado, here's Sam.
Welcome to the Ready Set Do podcast, where we learn from the journeys of not-experts who are just two steps ahead of us. Sam, welcome.
Thanks. Excited to jump into data engineering and explore that field.
Before we jump in, I always like to begin with your controversial opinion, or hot take, on the overall field of data engineering as you perceive it.
Sure. Hot take: data engineering is maybe one of the least engineering fields that has the name "engineering" in it.
And why is that? Data engineering is a lot of things. It's product engineering. Data engineering is stakeholder communication. Data engineering is a lot of things other than just "how can we most efficiently get the data from point A to point B?" There are a lot of things within data engineering that you don't necessarily know about until you're working in the role, and then you see, oh, this is actually a really big part of my job: not only to do it, but also to tell other people why we should do it like that. Not only the how, because I think we learn all about the how. A lot of times, though, our job becomes the why.
So I definitely feel called out here, because that's exactly where I land. If you had quizzed me, "Hey, what's data engineering?", that's exactly what I would have said: you have things called pipelines that you use to get data from point A to point B, and that's that. But as you said, there's clearly so much more to it, and I would love to explore that. Before we make our way there, can you help contextualize where data engineering lands among the overall, quote-unquote, "data umbrella" roles? I know data science, and obviously, what was the other one, analytics, right? So I know it has to do somehow with these things as well, but if you could help us delineate where one ends and the other begins, that would be super helpful.
If you find anyone who can do that, let me know. And why do I say that?
Because data engineering is inherently also data analytics. One is contained within the other, and sometimes analytics is contained within engineering; I'd say that's the most common trend. But sometimes you need to be a better analyst than you are an engineer to be a good engineer. So I would say data analytics and data engineering really have a lot of overlap, and it can be hard to delineate between the roles. If you're already a really good data analyst, you can probably be a decent data engineer as long as you pick up some software engineering chops. But let's try our best, right? Data engineering is thinking about source systems: where does data come from, how is data created, the big bang of any data. That's what we spend our time thinking about. As a data engineer, you spend your time thinking about source systems and where the data comes from, and then you also spend your time thinking about where you're going to land it and how it's going to land. That can have a lot of overlap with application developers and software engineers, because sometimes you're not the one creating that data.
Sometimes you're the one just taking the data from wherever it landed. Let's go the route where the data engineer doesn't think about the source; they only pick the data up from wherever it landed. Say it lands in an object store like S3. Sometimes that's the first place the data engineer touches it, and then they push it through, maybe into Snowflake or into Iceberg, and then Databricks Unity Catalog is picking it up or something. We push it into a usable format for data scientists and data analysts to work with. So I think maybe the clearest delineation of a data engineer is that you're picking up data in some sort of raw form, whether it's the rawest form or basically the raw form that just landed somewhere you can access, and then your analyst is touching it however they need to touch it. A lot of times that's going to be building data. When you think about something like Snowflake or Databricks, the term "medallion architecture" has been coined, and the medallion architecture just means three stages of data.
Okay.
Or at least a delineation between data stages. So you have your raw data, which is the bronze layer, then your silver layer and your gold layer.
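The bronze, silver, and gold stages Sam describes can be sketched in a few lines of pandas. This is a minimal illustration under our own assumptions, not how Snowflake or Databricks implement it: the order records, the column names (`order_id`, `amount`, `ordered_at`), and the cleaning rules are all hypothetical. Bronze keeps records exactly as they landed, silver enforces types and drops rows that fail to parse or are duplicated, and gold aggregates to a time grain a dashboard can read.

```python
import pandas as pd

def to_bronze(raw_rows):
    """Bronze: keep records as they landed, every field stored as text."""
    return pd.DataFrame(raw_rows).astype(str)

def to_silver(bronze):
    """Silver: enforce types, drop rows that fail to parse, remove duplicate orders."""
    silver = pd.DataFrame({
        "order_id": pd.to_numeric(bronze["order_id"], errors="coerce"),
        "amount": pd.to_numeric(bronze["amount"], errors="coerce"),
        "ordered_at": pd.to_datetime(bronze["ordered_at"], errors="coerce"),
    })
    return silver.dropna().drop_duplicates(subset="order_id")

def to_gold(silver, freq="D"):
    """Gold: order count and revenue per time bucket ("D" = daily) for a dashboard."""
    return (
        silver.set_index("ordered_at")
        .resample(freq)["amount"]
        .agg(["count", "sum"])
        .reset_index()
    )

# Hypothetical raw records, including a duplicate delivery and an unparseable amount.
raw = [
    {"order_id": "1", "amount": "10.5", "ordered_at": "2024-01-01 09:00:00"},
    {"order_id": "1", "amount": "10.5", "ordered_at": "2024-01-01 09:00:00"},
    {"order_id": "2", "amount": "oops", "ordered_at": "2024-01-01 10:00:00"},
    {"order_id": "3", "amount": "4.5", "ordered_at": "2024-01-02 08:30:00"},
]
gold = to_gold(to_silver(to_bronze(raw)))
print(gold)
```

Passing `freq="h"` would give an hourly grain instead (recent pandas spelling; older versions use `"H"`). Keeping each stage as its own function mirrors the point that the layers are separate, inspectable steps rather than one opaque transformation.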
Okay.
It's also sometimes unclear who's responsible for those layers, but a lot of times it'll be the data engineer who helps build the pipelines that pass the data through the layers, once the analysts and the data scientists know how they need to receive it. Analysts, a lot of times, will be responsible for dashboards and interacting with product teams, doing a lot more of that business talk. They talk the talk of the business and then come back to the engineer and say, "Okay, this is what we need: this data that flows through every day for the dashboard," the Looker or Power BI dashboard that the analyst is building. And then the data scientist comes in. Again, data scientists really need to have a lot of good analytics skills as well. The data scientist will create a model, usually a machine learning model, to do some sort of predictive or prescriptive analytics. So even that is analytics; it's just forward-facing analytics. Instead of explaining why something in the past happened, we're trying to explain what's going to happen in the future. So data scientists and data analysts also do a lot of similar work.
They have a lot of overlap in the skill set; it's just a different focus, a different idea. So I don't know if that was at all helpful, because there are a lot of things that overlap here.
No, that was very helpful, especially to put into perspective how closely these roles relate and adapt to each other. My one follow-up on that, and I have wondered about this before as well, just interacting with my data engineer friends, of which I have a grand total of two...
Three.
Yeah, three now. There you go. So the thing that always trips me up, Sam: you mentioned you set up those pipeline infrastructures, and all of that is done. What I always wonder is, once you do that, isn't it supposed to be, at least on paper, a one-and-done type deal? Once that's set up, every time you have new data coming into your S3 bucket or wherever, it will just continue to flow. So can you help me understand what circumstances come up regularly that make you go back again and again, build new pipelines, and make changes?
Man, I'm sure a lot of people have had those questions in the past, and it's definitely something I've been asked before. With data engineering, and with data in general, data is not like software engineering, where I wrote the code for the button on the web page and now the code always compiles, always runs, and it's always a button on the web page. That's not data. Data is people doing things online or in person, and we're trying to record actions people have taken. If anyone knows anything about software testing, there's a whole field where we just try to break things, or data quality testers who try to break things. We essentially imitate what we think people might do, pass through some bad data, and see how we'd handle it. But we can't always cover 100% of those situations. That's especially true of data pipelines written using only a portion of the data that actually exists, because a lot of times when you're writing a pipeline, you'll sample the data and build the pipeline based on just what you know about the sample. You don't always get everything. So what does that mean? It means you define something as an integer column, because the number should be, I don't know, 1 through 10. But then somehow there's a 9.5 in there. There's always a 9.5. Now our whole pipeline breaks, because our column was defined as an integer, and 9.5 is not an integer. You would think, wow, you guys are so smart, how did you not handle that? But a lot of times we build very specifically in data warehouses for speed, performance, and optimization. We want to define that it's an integer because there's a lot of optimization happening behind the scenes: if the data warehouse knows it's an integer, it can handle the whole column as an integer, and things go faster. So why did the pipeline break? Because some bad data came in. And "bad" doesn't always mean bad; it just means unexpected, right? Something we've never seen.
Exactly. Yeah, it doesn't have to be malicious. It's just something that wasn't proactively prepared to be handled.
Absolutely.
That's super helpful. I think at this point it would be a great time for you to share, if you don't mind, a brief snapshot of your career and the various places you've gotten to. I know you're currently a solutions architect, which we'll be getting into in just a bit, but I want you to set the context for where all of your insights are coming from.
Great question. How can I tell my life story in less than five minutes? Let's find out. When I was in college, and we'll go back 10 years and fly through this, I was doing non-STEM activities. I majored in communication, and also in Spanish, and I minored in psychology. I was doing my best to stay as far away from math and programming as I could.
Was that intentional, or was that just how it was at the time?
There's a little bit of both. It was a little bit of ignorance: when I was moving from high school to college, I just didn't know what I wanted to do. And I knew I liked talking to people, so I said, "Let's just go talk to people all the time. That'll be my career."
That makes sense.
And in a way, that is my career. We'll get into that.
Yep.
Going through those couple of years of university, I learned a lot and had a good time, but after university it came to: okay, now you need a job. And of course, my four years of university learning how to talk didn't really get me a great career, we'll say.
Yeah, I mean, through no fault of yours, it's just tough, right? How many jobs are there to begin with? Maybe I'm ignorant, but it has never seemed like a thriving market to me.
Exactly. And they tell you, or at least they told us when we went into communication as a major, "You can do anything." That was the thing that got you: oh, I can do anything. But "you can do anything" also kind of means you can do nothing.
Exactly.
Yep. So you get out, and I had a couple of jobs, met some really cool people, and learned things along the way, of course. Luckily, my first job really changed my path and my trajectory. I was making calls to random people I didn't know; it was cold-calling sales, essentially, and I was selling courses. Some of those courses were Python courses. Whenever anyone responded to me, which was maybe one out of every hundred people, because everyone hung up on me, I didn't know how to talk to them. Once I got past the initial "Why are you calling me?", I didn't know how to continue those conversations, because I knew nothing about Python. I was used to getting hung up on, and overall my morale was pretty low.
I bet it was rough, man.
Yeah, it was really tough. But something happened. And I would say this is a takeaway: your curiosity can take you places you never imagined it could. I said, let me learn enough about Python to at least have these conversations, and that one decision changed the rest of my life. It fundamentally changed the trajectory of where my life was headed. I started learning some Python on DataCamp. This was 2018, and I said, let's just get a little bit of Python knowledge. Well, that little bit of Python knowledge turned into "I really like this." I started playing around with pandas and Python, learning how to bring in a CSV and how to analyze it on a pretty basic level.
Wow.
I said, hmm, there's something here: something about the patterns and the numbers. And I felt cool typing on a keyboard and doing pandas stuff. I was like, "Wow, I can really make something happen here." So I talked to my alma mater, where I went for undergrad (I went to the same place for grad school). I talked to their program.
Sorry, where was this?
This was at NC State in Raleigh, North Carolina.
Nice.
So I talked to one of their master's programs and said, hey, how do I do this? How do I get in here? What do I need to do?
What do I need to know? They gave me some classes I needed to take to get prepared, because I didn't do enough math in undergrad, so I had to get prepared for a grad program. And that was a decision I made, because I said, well, I could do this on my own. I don't have to go to grad school. I could self-learn, build a portfolio, and build projects. It would probably take me three to four years, and then I could probably break into the space with my own learning and my own portfolio. I identified that, and then I said: or I can cut that time down to about one year and just do grad school. Of course, the trade-off there is money. I took out a loan, went to grad school, and decided to go this route. I would say, though, for me, in that position, given my time value of money and the way I weighed those pros and cons, it was 100% worth it. From pre-grad school to post-grad school, I tripled my salary, and of course we all love that.
That's incredible. The money element of the data field, right?
Exactly, yeah. But I also found something that I really love doing. I found something that I wake up every day and I'm just happy to do. Of course you have your days where maybe you don't love work, but overall this is a career that I really love. Honestly, over the last couple of years I've hopped around a lot. I had really focused on data science until September of 2023.
Okay.
I took the jump. I found an opportunity that I had been preparing myself for. I had started to identify that data engineering was something I was more interested in than data science, and there are reasons for that, of course. But I took the leap in September 2023. I was given a pretty challenging opportunity: to go rearchitect and rebuild a pipeline that was written in Scala Spark, and I also had a team working with me. So I had to learn Scala, Hadoop, MapReduce, Hive. I had to learn all these different things on the fly in this role. But it really helped me grow as an engineer. And I think that's another thing: you have to be curious, and you have to be willing to step in somewhere you don't know anything about and figure it out. Otherwise, you'll probably just fall on your face, or you won't go as far as you think you might want to go.
And that was it. From that point on, I haven't looked back. I mean, I'm working as a solutions architect now, but I'm still very much a data engineer at heart, with some data science chops to go with it.
Very cool. First of all, that was so concise, which I know was your goal, but also so illuminating in terms of where you've been, what drives you, and all the roads you've taken. I do want to double-click on two things from what you just shared. The first: could you take us back to the time when you were first exposed to Python and trying to learn it? Up until then, I'm sure you had basically no exposure to any of it, whether programming, math, or whatever the case might be. I understand the motivation, since you had to sell these courses, so it makes sense that you'd want to understand them. But what were some of the things you did that really helped you overcome that initial stage where it seems so overwhelming, like an ocean, and you don't know where to even start? I'm asking from the perspective of a person who, like you at the time, is not from the tech field in any way but is trying to teach themselves programming. Any tips or insights you could offer there would be helpful.
Oh man, people aren't going to like it. You've got to be comfortable being uncomfortable. I think that's the first thing. You have to get used to existing in a place of growth, living in that growth and in being uncomfortable, because it's not going to come easy. Even the basic print statement, right?
Or an import of a package like pandas, with maybe a basic transformation. What I would call a basic transformation now was not a basic transformation for me in 2018.
Oh yeah, right.
And then maybe a print statement to print the mean of one of the columns of that data, for example. That's another thing: everything I did was based around data. But I had to get comfortable being uncomfortable, and then I had to push myself past the "I can't do this." When I hit the "I can't do this," that's when I had to push harder.
Mm-hmm.
And now, with AI and ChatGPT and all these other things flying around our heads every day, it's easy not to do that. It's easy to go, okay, three minutes in and I can't figure it out, let me ask ChatGPT. I didn't have that.
Paste the solution in and call it a day.
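For anyone starting where Sam did in 2018, the exercise he describes (import pandas, bring in a CSV, apply a basic transformation, print the mean of a column) fits in a few lines. The CSV contents and column names below are made up, and the data is read from an in-memory string so the snippet runs as-is; with a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical data; with a real file, use pd.read_csv("your_file.csv").
csv_text = """region,units,price
east,10,2.5
west,4,3.0
east,6,2.5
"""

df = pd.read_csv(io.StringIO(csv_text))

# A basic transformation: derive a new column from two existing ones.
df["revenue"] = df["units"] * df["price"]

# Print the mean of one column.
print(df["revenue"].mean())
```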
I mean, I had Stack Overflow, so a lot of times I was trying to piece together some stuff from Stack Overflow and see if it applied to my situation. Early on it did, but as you get further along, things stop applying to you because you're getting more specific.
And the second thing I wanted to double-click on: you mentioned that later in your journey you found data engineering to be more fun, or you were attracted to it more than data science, which is what you had been doing until that point. So I'm curious what about data engineering pulled you toward it.
I think data engineering makes data science possible, and analytics too, really anything. A lot of places I had been before had not had great data engineering practices, and because of that, data science and data analytics suffered. You'd have data scientists or data analysts spending most of their time finding and cleaning data, which is actually a very common issue, not just at the places I was. And that wasn't my job. I felt this really big disconnect between my title, what the outside world thought I did, and what I was actually doing day to day, and that disconnect was really uncomfortable for me. Even though, why does it matter, right? You have the title, you do something at work; why does it really matter what you do at work? It just did. Should it matter? I don't know. For me, it mattered. And I also didn't feel fulfilled in the work I was doing every day because, again, it wasn't my job. I felt like this wasn't contributing to what I wanted to be achieving.
Right. Or the reason why you were hired in the first place.
Right. It was probably under the guise of "we're going to build some models, and you're going to get to deploy these things out to production," and blah blah blah. That never really happened. I'd build a model, but it was a one-off model that I put into a PowerPoint for some exec to present one time, and then that model never made it into production. So I never felt like I was actually having an impact. I was brutally honest with myself sometimes and said, "What if I just disappeared tomorrow? What if I just didn't log on tomorrow? Would they miss me? Would they even know I was gone?" And a lot of times the answer was no, probably not.
That's pretty humbling.
Yeah, that sucked. I hated feeling like that. But it was necessary to acknowledge that I wasn't where I wanted to be. And that's not the same for everyone. Other people might feel: if they don't know whether I'm working tomorrow or not, great, it means I can go work remotely by the pool, drink a piña colada or whatever, and have a relaxing day. That's not me. When I'm logged in, I'm locked in. I want to get stuff done, and I want to make a difference.
Yeah, so I think the recurring theme there is impact. It was important for you to make as much of an impact as possible, and it sounds like, from your experience up until that point, you realized the most impact was in data engineering. So naturally, for somebody trying to make the jump into data engineering, what are some of the important skills? Any resources that you found really helped you, or any tools that are often needed? I know a lot of this information you could just Google, but I'm trying to understand from your perspective any flavor you could add to what somebody's Google search might turn up.
Google's going to tell you that you should have side projects, and unfortunately, that's true. You should have a side project. You should do something on your own that shows your curiosity and shows that you're willing to take the time and effort to do something that doesn't necessarily contribute to the company you're working at, but shows that you want to learn this and you want to do this. The second thing I'd say is that there's some theory you need to know. There are some things in data engineering that don't exist in data science, or things you don't think about in data science. There are books like Fundamentals of Data Engineering, by Joe Reis, that you should probably read when you're getting into the field. He's awesome, also. And once you're in the field, there are a lot of other books you can read. But I'd say the first one is side projects, pet projects, to show that you can do this.
What are some examples of good projects that people could be doing?
Good question.
Or if there's a resource you're aware of, I'd be happy to link it. If you know of any, that would also be helpful.
A good project is something you care about. A good project is something you're genuinely interested in when you talk to me about it. I have interviewed people before, candidates for data engineering roles in a similar spot. At this stage, you don't even need to talk about optimizations. You don't need to say, "Hey, I cut pipeline execution down from 40 seconds to 10 seconds."
So the project you're showing me: you pulled from an API, or even if you generated some random data, I think that's fine as long as the data is challenging to work with. Not just "I generated three integer columns"; give me something a little bit more than that.
Yeah.
And then: I went from this source system into this landing stage, I moved it over to this spot, and this is how I did it.
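Sam's bar for generated data is that it should be challenging to work with, not three clean integer columns. One way to get there, sketched in plain Python under assumptions of our own, is to bake in the problems real sources tend to have: missing values, inconsistent casing and whitespace, numbers arriving as text (including a 9.5 and an "N/A"), and duplicate deliveries. The second function lands the rows untouched as a timestamped CSV in a landing folder, the "source system into landing stage" step. The field names, the messiness rules, and the folder layout are all hypothetical.

```python
import csv
import random
from datetime import datetime, timedelta
from pathlib import Path

def generate_messy_events(n, seed=42):
    """Generate n hypothetical events, plus occasional duplicate deliveries."""
    rng = random.Random(seed)  # seeded so the same "mess" is reproducible
    start = datetime(2024, 1, 1)
    rows = []
    for i in range(n):
        row = {
            "event_id": i,
            # Inconsistent casing, stray whitespace, and missing values.
            "user_id": rng.choice(["u1", "u2", "U1 ", None]),
            # Numbers arriving as text: sometimes fractional, blank, or "N/A".
            "amount": rng.choice([str(rng.randint(1, 10)), "9.5", "", "N/A"]),
            "event_time": (start + timedelta(minutes=rng.randint(0, 10_000))).isoformat(),
        }
        rows.append(row)
        if rng.random() < 0.1:
            rows.append(dict(row))  # the same event delivered twice
    return rows

def land_raw(rows, landing_dir):
    """Write rows as-is to a timestamped CSV in the landing folder; return its path."""
    landing_dir = Path(landing_dir)
    landing_dir.mkdir(parents=True, exist_ok=True)
    path = landing_dir / f"events_{datetime.now():%Y%m%dT%H%M%S}.csv"
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)  # None values are written as empty fields
    return path
```

A call like `land_raw(generate_messy_events(1_000), "landing/events")` produces one raw file per run, which a later cleaning step can pick up; that hand-off is also a natural box to draw in the architecture diagram Sam asks for.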
this spot and then this is how I did it. So I think the just showing that you can move data and also if you can have a visual for me like an architecture visual. Yeah. Because I build those a lot at work and and that is something important that even if it's bad right it at least shows me that you're thinking about okay I need to present this to uh someone else someone else is going to have to understand what I did. So what's the best way to communicate that information? And we go back to the very beginning of this call when when I talk about how data engineering is not just engineering, it's stakeholder management, it's communication, it's product and project management. So there there's that piece of it is going to come in too when you're showing off a
come in too when you're showing off a project. Um if you're not showing code, you're showing me the the architecture diagram and you're telling me about the project. If you're showing code, then I'm going to expect some other things like comments, right? You should have your code commented. It should be documented. You should have a read me so we understand how to use it or how to walk through it. So, it's a lot of communication and documentation that's going to go into your project that as a junior you might not be thinking it's that important because you might be thinking the the code and the the performance is really important. It's actually not because your code and your performance is going to be bad because you're a junior. We all know that. So
you're a junior. We all know that. So what's actually more important is your ability to communicate what you're doing or at least what you're trying to do. If you can do that, then I can work with you on the technical piece. Uh but what I can't work with is someone who doesn't know how to communicate or doesn't want to communicate. And so what about um so yeah, just so I understand this generally the so I'm just trying to put myself in the shoes of somebody that's making a project. So I start with say a bunch of data on some S2 server that I just got off like a student discount or whatever. We'll just assume that this is still free. I know there are other
resources one could use too, but let's not get into that too much. So we start there, and then you're saying one should use these libraries, or any Python libraries, to bring the data into Python itself, then do some sort of analytics, separate it, clean it, et cetera, and then throw it somewhere for somebody to see? Can you help flesh out the gaps that exist in my head right now? So yeah, let's say you're hitting a public Amazon S3 bucket. Okay. Right.
That's where your data lives. So you're taking that, and we're going to assume in this context that the data is not ready to go into Power BI or Tableau just yet. Perfect. Yeah, there are other steps you need to do to get it into Power BI or Tableau. So you connect to the data in Python, you connect to that S3 bucket, right? That's your first step. Past that, you need to decide how you're going to build the data: whether you're just going to bring it in in its raw form, transform it however you want to transform it, and then send it to
a destination, which you'll then pull from in Tableau. Or, if you want to add another layer to this, we can have three layers. We could have a bronze, or raw, layer, which is the raw data living in the public S3 bucket. Then a silver, or transformed, layer, which is, you know, I did some things to clean the data. And then a third layer, which is your gold, or curated, zone or layer. At this point the data should be ready to throw into Tableau or Power BI. What's different about that third layer is usually that it's an aggregate of some form, whether that aggregate is built on an hourly basis or a daily basis, right?
We're assuming some sort of time interval here, because with the data we work with in organizations, there's a time element to everything, right? We did this many sales last week; we did this many sales this week. There's time throughout, unless maybe we're talking about something else, like biology. There are other fields where time is not as important, but in this space, in most spaces, time is the constant. Yep, 100%. So we aggregate it up into the curated layer, and at this point we're ready to send it to Tableau or Power BI. And I think it would be good in a project just to have
a dashboard or a couple of charts that you can show off and say, okay, the data in its raw form was this; I took it through these two transformation steps to get it into the gold layer; and here it is, plotted or charted in one of these tools. And you can do everything for free, right? You don't have to spend any money on any of it. It's just a way to show that you know how to work with data, transform it from one layer to another, and then present it out to some sort of audience. Amazing.
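For readers who want to see the bronze, silver, and gold flow Sam describes as code, here is a minimal sketch in plain Python. It assumes the raw data has already been pulled down from the S3 bucket (that connection step is skipped), and the sales columns (`order_id`, `ts`, `amount`), the sample rows, and the function names are all hypothetical. It is one way to structure the layers for a portfolio project, not Sam's own project.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Bronze (raw) layer input. Hypothetical sample standing in for a file pulled
# from a public S3 bucket; note the duplicate order 2 and the bad amount in order 3.
RAW_CSV = """order_id,ts,amount
1,2024-03-01T09:15:00,19.99
2,2024-03-01T17:40:00,5.00
2,2024-03-01T17:40:00,5.00
3,2024-03-02T11:05:00,not_a_number
4,2024-03-02T12:30:00,42.50
"""


def load_bronze(text: str) -> list[dict[str, str]]:
    """Bronze: parse the raw CSV but keep every value as the original string."""
    return list(csv.DictReader(io.StringIO(text)))


def to_silver(rows: list[dict[str, str]]) -> list[dict]:
    """Silver: cast types, drop malformed rows, and de-duplicate on order_id."""
    seen: set[str] = set()
    clean: list[dict] = []
    for row in rows:
        if row["order_id"] in seen:
            continue  # already kept a valid row for this order
        try:
            ts = datetime.fromisoformat(row["ts"])
            amount = float(row["amount"])
        except ValueError:
            continue  # unparseable timestamp or amount: drop the row
        seen.add(row["order_id"])
        clean.append({"order_id": row["order_id"], "ts": ts, "amount": amount})
    return clean


def to_gold(rows: list[dict]) -> dict[str, float]:
    """Gold: aggregate silver rows into total sales per calendar day."""
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row["ts"].date().isoformat()] += row["amount"]
    return {day: round(total, 2) for day, total in sorted(totals.items())}


def gold_to_csv(gold: dict[str, float]) -> str:
    """Serialize the gold table as CSV, a simple hand-off format for a BI tool."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["day", "total_sales"])
    for day, total in gold.items():
        writer.writerow([day, total])
    return buf.getvalue()


if __name__ == "__main__":
    gold = to_gold(to_silver(load_bronze(RAW_CSV)))
    print(gold)  # {'2024-03-01': 24.99, '2024-03-02': 42.5}
    print(gold_to_csv(gold))
```

The gold output is a small daily table, the kind of time-based aggregate Sam mentions, which you could export as a CSV and chart in Tableau or Power BI. In a real project you would usually persist each layer (as files or tables) rather than pass it around in memory, so one step can be rerun without redoing the others.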
Yeah, that makes complete sense now, and I really appreciate you filling in the gaps there, because I come from tech too, just not the data space as much. I've been a transformation analyst, so I work more with low-code/no-code tool environments. It's funny, because when you were discussing that, one of the tools that came to mind, one I use and that you probably don't use as much because you're probably more advanced, is Alteryx. Are you at
all familiar with Alteryx and how it works? I've come into scenarios where I'm rebuilding something that previously existed in Alteryx. Okay. But I've never worked in Alteryx. I do understand that it's essentially a drag-and-drop data transformation tool, kind of similar to Dataiku but a little less fleshed out than Dataiku, I think. Yeah. Gotcha. I don't know that one; I've never used it. But what I was getting at was, it would not have occurred to me that what Alteryx does, where you have a tool or an object to clean up rows or to remove duplicates and so on, is essentially what you laid out: all of that being done just
straight up in Python. That's the most fundamental light-bulb moment I had, and it really helps put things into perspective, so I appreciate you sharing that. And then from there, what are some tools that are definite must-haves on your resume? And if there are any niche or lesser-known tools that are only now becoming prevalent, that you would obviously be aware of, that would also be appreciated. Sure. Python, obviously, is a must. Of course some people might argue with me on this, but I do think Python is a must, just because every company is using Python in some way or another, and Python really is the biggest language in
the data engineering space right now. Uh-huh. There are of course other languages, like Golang and Rust, that are also gaining popularity in this space. And then within Python, think about something like Polars, which is a Python package that is essentially replacing pandas as something that can handle a lot more data, right? It can process a lot more data in the same amount of time. There are reasons behind that, and how it works behind the scenes, that we won't get into now. But Polars, and also DuckDB; DuckDB is also a really good tool, again within the Python umbrella, but another good
tool to have an understanding of. Those are the two maybe lesser-known ones I would speak to, but again, they require Python knowledge, right? They sit under the Python umbrella. What I am curious about, however: when we were talking earlier, you shared that you happen to be one of the younger employees in the solutions architect domain. So I'm very curious how all of your expertise in data science and engineering must have bubbled up in some way to make you go down this route. What does that look like, what drew you toward this, and is this a
viable option for more people? I know you can't really speak for the general population, but for most data engineers, would this type of path make sense? Sure. I also want to level-set the 'one of the younger ones' part. I'm currently at Snowflake, so maybe one of the younger solutions architects at Snowflake. Each company is going to have a different idea, right, of what they want from their solutions architects. I've known a lot of people who got hired out of college, at 21, 22, 23, as solutions architects at different companies. Gotcha. Those companies will often have some sort of onboarding and training program that lasts six months to a year, where they teach you all about their product,
right? And then you become a solutions architect at that company, because they took you from the ground up and taught you all about that product. Those are also really great roles to get into; I have friends who have done that and come out really well from those programs. Currently, as a solutions architect at Snowflake, the landscape here is a little bit different, because it's not just that I'm a solutions architect at Snowflake, right? It's that I majored in communication, that I used to be a data scientist, that I'm in a data engineering mindset right now. And what does that mean? It means I can talk to you about anything that Snowflake does
and not just pull from the Snowflake experience, but pull from previous experience, from a previous life. I've had five years across data spaces, and I spent four years in college and two more after college doing a lot of practice with communication and a lot of practice talking to people. Right? We're talking about years of storytelling and years of data, and those two things together are what make me able to do the role right now at Snowflake. And I think a lot of companies in tech will have a similar experience with solutions architects: it's not just what that company is good at, but what the other companies around that company are good at, and what
you generally think about when you're working in data engineering. What does a data engineer think about on a day-to-day basis, right? I need to know that. What does a data scientist think about on a day-to-day basis? I have to know that too. So my experience puts me in a really good position to be where I am today, where I have a mix. Right?
Now, what I do is very heavily technical and very heavily non-technical, right? I talk to people about their problems all day, and then I also work out ways to solve those problems. So I have to be able to do both. I have to be able to talk to the CTO, and I have to be able to talk to the data engineer. That's why I'm one of the younger ones: it takes a lot of different skills to work as a solutions architect, specifically at Snowflake, I think, and across tech in general.
Makes sense. Totally. And it does sound like they're pretty closely interrelated, or intertwined may be the better word, and I'm referring to data engineering and solutions architect work. So were there any skills or tools you had to learn separately, or was it more about consolidating all the know-how you've accumulated over the years to do this specific job? I've had to get a lot better at storytelling and non-technical communication, because even though I have the background, since I became a data engineer I hadn't had to flex those muscles. So as a solutions architect, I've had to practice a lot of storytelling, a lot of going from point A to point Z and helping
people follow a story. There are also other areas I had never thought about, like platform administration and systems administration, from a logistics and a cost perspective on Snowflake, and also things like role-based access control. So we think about giving or providing access, and assigning roles to users and groups; those are things I hadn't thought a lot about before that I have to think about a lot more now.
So it's more eagle-eye. I have to be able to do the 5,000-foot or 50,000-foot view, or whatever they call it in the business world, a lot more than I used to. And I also now have to think about AI a lot more, because people are asking about AI: how does Snowflake use AI, and how can they use Snowflake's AI at their company? So I have to have a lot more conversations around that. I'm reading some books and doing some things on the side with AI just so I'm better equipped to have
those conversations. Which is a perfect segue to what was going to be my next question. I understand if there's stuff you're not supposed to talk about, but what are some things about AI in the data engineering space that current job seekers in this domain should be looking at, or be aware of, that would help them? Yeah.
Yeah. AI is huge right now, and I think it will be huge for the next couple of years. I used to have different feelings about that. I used to feel like, no, I'm not touching AI; it's too new, it's changing too much. But I think AI, especially generative AI with LLMs, large language models, and LMMs, large multimodal models, is not going anywhere. It's going to stick around, and we need to know how to have those conversations. We also need to understand the limitations of what GenAI can and can't do. And to help ourselves, there are a couple of
courses online: one is by Hugging Face, another is by Stanford. I definitely recommend both; different levels of technicality, of course, but both are very helpful courses to jump into. So is the book AI Engineering by Chip Huyen. I think it's about having your vocabulary sound and understanding how to talk about AI: the different methods to fine-tune or to prompt a model, and which models are meant for which situations. You need to be able to have those conversations even if you don't have a project with AI. A project would be helpful too, I'm sure. But even if you don't have one, you can at least talk about it without making things up, right? You're talking about it
and you know you're saying the right things, approaching it from the seat of: this is just something else I have to learn, right? Don't treat it like a fad. Don't treat it like magic, because it's not either of those things. It's just another topic we need to understand. You'd also be better off understanding NLP, natural language processing, because that's kind of the predecessor to LLMs.
Exactly. It's just, you know, summarization. Summarization existed before ChatGPT. Relating one thing to another within a big body of text existed before ChatGPT. So if you can understand some of those precursor concepts as well, it'll be very helpful. Super great resources you threw out there; I'll be sure to link them in the show notes for anybody who's curious. And I really appreciate you taking the time today, Sam. I've learned so much, as somebody who's not really even from this field, and that gives me a decent amount of confidence that somebody who is would still be able to take away so much. So I really appreciate you taking the time, and I just wanted to add one last
thing: I talk to a lot of guests as a consequence of the podcast, and very rarely do I come across people who actually care about what they're doing, at least when it comes to their professional life or their work. Most people will be really obsessed with something they're doing for themselves, a hobby or whatnot. It's just rare, and honestly it's so refreshing to find that there are in fact people who love working, who actually believe in the work they do, and who put themselves in positions where they are able to do the type of work they
want to do. So yeah, I just wanted to share that random observation. Thanks so much, man, I really appreciate it. I appreciate it. And I think what you just said, being in a position where I can do the work I want to do, is exactly why I work hard, work extra, and do things on the side when I can. Appreciate it. That brings us to the end of this episode with Sam Lafell. If you would like to support me, the easiest way to do that is by subscribing on YouTube and leaving a five-star rating on Spotify or any of
your favorite podcast apps. It also goes a long way if you share this with a friend or tell somebody, hey, I just found my new favorite podcast. Thank you all for your time. New episodes every Wednesday. Catch you all in the next one.
Transcript-backed moments
A few lines worth stealing before you hand over the full hour.
For somebody trying to make the jump into data engineering, what are some of the important skills? Any resources that you found that really helped you or maybe any tools that are often needed? I have interviewed people before like candidates for data engineering roles. If you don't have that real experience that you can show me, then the next best thing you can do is you can tell me about a project that you worked on. I'm Naman Pandy and in this episode featured not expert is Sam Lafell. Sam is a solutions architect at Snowflake and has extensive past experience in data engineering which is the subject of our discussion. I said let me learn enough about Python to at least have these conversations and that one decision fundamentally changed the trajectory of
Show notes
Data engineering is the role people find after they get tired of vague 'learn data' advice. Sam makes the path concrete: what the job really asks for, which tools matter, and how to get hired without pretending you woke up fluent in all of it. If data science has started to feel crowded, this is a cleaner door in.
