Data engineering may not be a ‘new’ field per se, but it is one that has seen tremendous growth in the past decade, and has steadily risen in importance within organizations of all stripes. As a result of the increase in demand, people with a variety of technical and mathematical backgrounds have been drawn to the profession. Given the requirements of the data engineer profession, it’s not surprising that many software engineers have also taken up the gauntlet and transitioned to data engineering jobs. In this blog, we’ll discuss what some of the requirements of the data engineering profession are, how they relate to software engineering, and talk about some tips for making the transition.
Want to learn more about the move from software engineer to data engineer?
Check out Chapter 52 of the free O'Reilly eBook 97 Things Every Data Engineer Should Know, written by John Salinas, a software engineer turned data engineer at USAA.
Why switch from software to data engineering?
Data engineering is appealing as a career option for many reasons. After all, it’s stimulating, cutting edge, and in high demand (which translates to higher salaries and job security). But all of this is true of software engineering too. So what’s the appeal?
While it’s true that average salaries are a bit higher for data engineers, this is far from the only benefit. As data engineer John Salinas put it, “The move from software engineering to data engineering is rewarding and exciting.” First, you get to retain all of the elements that make software engineering a great career, such as solving technical challenges in ways that maximize simplicity for end-users.
But, as Salinas also put it, “you expand your craft to include analytical and data-related problems.” This includes looking at business from a highly dynamic perspective. While software certainly needs to be updated and redesigned, data infrastructure has to adapt on a daily, even hourly basis to a shifting landscape of data inputs and business needs. And, you get to design new paradigms that ultimately will become the data backbone of new software applications.
Data science is a hybrid field that combines the skills of a statistician, a business expert, and a programmer. The programmer’s skills may be the hardest of the three to acquire, and software developers also bring an understanding of data structures that you won’t get in a college stats class, which doesn’t cover arrays, data frames, stacks, or queues. All of this knowledge is necessary to make the leap from comprehending structured data stored in a tabular database to the more abstract concepts of Big Data.
Skills for the transition
A software developer already has in their toolkit a good chunk of the skills that make a great data engineer, such as:
Understanding of networks
Programming (especially if you’re fluent in Python)
With that said, there’s still a bit of a learning curve, even for a software engineer. For one, you need to bring an understanding of statistics. If you’re building a data pipeline, for example, do you know which data to clip? It’s critical to know, from a statistical standpoint, what constitutes an outlier, and answering that question may also require domain expertise.
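To make the outlier question concrete, here is a minimal Python sketch of one common rule of thumb: treating anything more than 1.5 interquartile ranges outside the middle 50% of the data as an outlier, and clipping it to that boundary. The function names, the 1.5 multiplier, and the sample readings are illustrative assumptions, not something prescribed by any particular pipeline; the right threshold usually depends on the domain.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (low, high) bounds using the interquartile-range rule.

    k=1.5 is a conventional default, assumed here for illustration.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clip_outliers(values, k=1.5):
    """Clip every value to the IQR bounds instead of dropping it."""
    low, high = iqr_bounds(values, k)
    return [min(max(v, low), high) for v in values]


# Hypothetical sensor readings with one suspicious spike.
readings = [10, 12, 11, 13, 12, 11, 95]
clipped = clip_outliers(readings)
print(clipped)
```

Clipping keeps the row count stable, which can matter for downstream joins, but it also hides the fact that something unusual happened. Whether to clip, drop, or flag such values is exactly the kind of decision that calls for statistical and domain knowledge.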
Here are a few areas to check your knowledge and skills, and if necessary, ramp up/brush up:
Statistics. A data engineer may not need to have the analytics chops of a data scientist, but as you’re going to be providing data for this audience, you have to have a pretty solid understanding of their needs. That means brushing up on stats. Fortunately, there are a ton of online courses to help with this. If you don’t know where to start, you might check out Khan Academy.
Data warehousing platforms. This is where it gets a little complicated, because there’s a plethora of platforms, and the ones that are required are going to vary greatly from company to company. However, we did a bit of research into the top skills for a data engineer, and some of the big ones that came up commonly in job descriptions were Snowflake, Amazon Redshift, Google BigQuery, IBM Db2, and Oracle Autonomous Data Warehouse.
Data visualization platforms. The top ones tend to be Tableau, Power BI, Looker, and Domo. The good news is that if you master one of these, it’s quite easy to transition to another one. So pick one and master it.
People skills. Data engineers, scientists, and analysts all have to provide information that’s useful to line-of-business users. That means you have to be able to deal with a lot of different people, who may have different priorities, skills, and levels of understanding. It may not demand the level of soft skills needed in sales or marketing, but it nonetheless requires an ability to work well with diverse teams.
Good luck with your journey from software engineer to data engineer! If, along the way, you want to learn how to make data engineering much easier, and sidestep such complicated processes as ETL, learn about the data fabric.