Guest Post by Vladislav Supalov
I’m Vladislav! If you care about AI, machine learning, and data science, you should have heard of data engineering. If you haven’t, or would like to learn more – then this is *exactly* for you. Helping companies to make use of their data is a fascinating topic! I’ve spent quite a bit of time building MVP data pipelines and would like to help you avoid one of the worst mistakes you can make when starting out on a serious project.
Having solid data plumbing in place is pretty darn important if you want to work with company data without wasting time and money. The natural train of thought when people want to make use of data “the right way” usually ends at “we should hire a data scientist”.
That’s a mistake in almost every case. You need to take care of data engineering before that. Here are a few of my favourite pieces of writing on the topic:
This one is brief but worth a read. The most important points it makes are the wasted time and the observed tendency of data scientists to quit when they are not given the right tools.
A complete story of getting an analytics team up and running within 500px. Samson did a lot of stuff right, which is admirable. Take note of the tech choices, Luigi in particular, for getting data into a data warehouse. A great example of a well-thought-out way to work with data. One of the major mistakes he points out: not putting enough effort into data evangelism.
An utterly amazing interview, full of great advice. I especially love that he points out that you should take care of making both event and operational transaction data available. Only when you combine them do you have a complete picture.
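To make the point about combining the two kinds of data a bit more concrete, here is a minimal sketch in plain Python. It is not taken from the interview: the records, the field names (`user_id`, `event`, `amount`) and the `combine` helper are all made up for illustration. It only shows the idea that event data tells you what users did, transaction data tells you what they paid, and you need both keyed on the same user to see the whole story.

```python
from collections import defaultdict

# Hypothetical event data, e.g. collected by a tracking system.
events = [
    {"user_id": 1, "event": "view_product"},
    {"user_id": 1, "event": "add_to_cart"},
    {"user_id": 2, "event": "view_product"},
    {"user_id": 3, "event": "view_product"},
]

# Hypothetical operational transaction data, e.g. from the production database.
transactions = [
    {"user_id": 1, "amount": 40.0},
    {"user_id": 3, "amount": 15.5},
]


def combine(events, transactions):
    """Return one summary per user: number of events and total revenue."""
    summary = defaultdict(lambda: {"events": 0, "revenue": 0.0})
    for e in events:
        summary[e["user_id"]]["events"] += 1
    for t in transactions:
        summary[t["user_id"]]["revenue"] += t["amount"]
    return dict(summary)


combined = combine(events, transactions)
for user_id, row in sorted(combined.items()):
    print(user_id, row)
```

Looking at either list alone, you would miss that user 2 browsed but never bought anything. In a real setup this join would usually happen in the data warehouse rather than in application code.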
A very long interview with the Head of BI at Stylight. Konstantin did an impressive job in his first year and shares a lot of insight. This is not exactly about data engineering, but about giving a company access to data and how to approach it. One of the most important takeaways for me was his advice to secure a small win for as many people as possible in the company when starting out. There is a lot of low-hanging fruit, and you get the best ROI and a lot of goodwill from making it available.
Hope you’ll get a lot of value from those articles! If you want to learn more about data engineering, data pipelines and the stuff I do, scroll to the bottom of the last article and subscribe to Beyond Machine and Vladislav’s mailing list.