Why Is It Important to Know Data Science If You Are a Product or Business Decision Maker?

By Dr. Stylianos (Stelios) Kampakis

Data science can be difficult to understand. It is a field that spans a history of more than 100 years and encapsulates artificial intelligence, machine learning, statistics, and other disciplines that have tried to solve the problem of extracting useful insights from data. Decision makers in small, medium, and big enterprises encounter difficult technical dilemmas all the time. It doesn’t matter whether you are the CEO of a small company, a startup founder, a manager in a big corporation, or a product manager; you will face questions such as:

  1. How can I use data science?
  2. How relevant is technology X (e.g. Natural Language Processing, or Deep Learning) to my business?
  3. Should I go for commoditized services or develop machine learning capabilities in-house?
  4. How should I choose which data scientist to hire?

Decision makers rarely have the time to spend days educating themselves about a subject. What they need is distilled information which will help them make an informed decision as fast as possible. This is why we partnered with Stylianos from The Tesseract Academy and developed a Bootcamp that teaches all the basics of data science in half a day. However, I realised, through the feedback of my clients, that they needed something to accompany the seminar: something which would help solidify the knowledge.

You can join our Data Science Bootcamp for Decision Makers in Berlin on the 29th of March. Apply here.

DECISION MAKER’S HANDBOOK TO DATA SCIENCE

This is why I wrote the Decision Maker’s Handbook to Data Science. In the 142 pages of this book I outline topics such as:

  • History of data science.
  • Differences between AI, machine learning and statistics.
  • Data management and building the right data strategy.
  • Thinking like a data scientist (without being one).
  • Hiring and managing data scientists.
  • Building the right culture for data science.

Obviously, a book is not the same as an interactive session where we go through exercises and participants can ask questions. But it is the next best thing. The book always takes the perspective of the non-technical decision maker, making difficult concepts simple and explaining how they can be used in business in order to extract maximum value.


How Can Predictive Analytics Help Your Company?

What Is Predictive Analytics?

Predictive analytics combines advanced statistics, modeling, data mining, and machine learning techniques to dig into data and allow analysts to make predictions. Predictive analytics is used to forecast what will happen in the future.

For any kind of business, predictive analytics enables you to identify potential events and opportunities ahead of time, and either avoid or capitalize on them as the case may be. The key is using historical data to identify the indicators of those events and opportunities, and to make predictions that benefit the business.

The real value of predictive analytics can best be illustrated by describing the major use cases that exist in business today. Below are the seven main use cases and applications.

1. Churn Prevention

According to an article by Neil Patel, the well-known SEO and online traffic expert, predicting churn can help massively for subscription-model businesses: customers who are likely to leave can be identified early and targeted with retention offers before they cancel.
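As a minimal sketch of what this looks like in practice, a churn model can be a classifier trained on historical customer behaviour. The data file and column names below are hypothetical:

```python
# Churn-prediction sketch (hypothetical data and column names).
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# One row per customer; 'churned' marks whether they cancelled.
df = pd.read_csv("customers.csv")
features = ["monthly_logins", "support_tickets", "tenure_months", "plan_price"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["churned"], test_size=0.2, random_state=42
)

model = GradientBoostingClassifier()
model.fit(X_train, y_train)

# Churn probability per customer; the riskiest get a retention offer.
churn_risk = model.predict_proba(X_test)[:, 1]
at_risk = X_test.assign(churn_risk=churn_risk).nlargest(100, "churn_risk")
print(at_risk.head())
```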

2. Customer Lifetime Value

Measuring customer lifetime value is widely used in the retail and media industries. Predicting customer lifetime value can facilitate marketing decisions and budget allocation: if you know roughly what a customer will be worth, you know how much you can afford to spend on acquiring and retaining them.
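As a back-of-the-envelope illustration (all numbers here are made up), a simple historical CLV estimate multiplies average order value, purchase frequency, and expected customer lifespan:

```python
# Simple historical CLV estimate (illustrative numbers only).
avg_order_value = 40.0      # average spend per purchase, in EUR
purchases_per_year = 3.5    # average purchase frequency
expected_lifespan = 4.0     # average years a customer stays active

clv = avg_order_value * purchases_per_year * expected_lifespan
print(f"Estimated CLV: EUR {clv:.2f}")  # Estimated CLV: EUR 560.00
```

Predictive CLV models go further and forecast these quantities per customer from their purchase history, but the budgeting logic stays the same.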

3. Product Propensity

Propensity modeling tracks customers’ buying habits as well as other actions, such as opening a marketing email, signing up to a loyalty program, or participating in feedback surveys. The model correlates customer characteristics with anticipated behaviors or propensities.
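In practice, a propensity model usually boils down to a per-customer probability score that marketing can use for targeting. A hedged sketch with hypothetical engagement features:

```python
# Product-propensity sketch (hypothetical data and column names).
import pandas as pd
from sklearn.linear_model import LogisticRegression

# 'bought_product' is the observed behavior we want to anticipate.
df = pd.read_csv("engagement.csv")
features = ["emails_opened", "loyalty_member", "surveys_completed"]

model = LogisticRegression()
model.fit(df[features], df["bought_product"])

# Rank customers by propensity and target, say, the top 10%.
df["propensity"] = model.predict_proba(df[features])[:, 1]
top_targets = df.nlargest(max(1, len(df) // 10), "propensity")
print(top_targets[["propensity"]].head())
```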

4. Sentiment Analysis

Sentiment analysis is the process of determining the emotional tone behind a series of words, used to gain an understanding of the attitudes, opinions and emotions expressed within an online mention.

Sentiment analysis allows retailers and brands alike to understand the opinions expressed in consumer feedback and user-generated content. Is it negative or positive? What is the context of the opinion? Are they talking about the product as a whole, or just one feature of it?

As a result, more and more companies are using sentiment analysis to make sense of the huge amount of consumer feedback that is coming their way, and it helps drive conversion.
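For English-language feedback, off-the-shelf tools already get you quite far. A minimal sketch using NLTK's VADER sentiment analyzer (the review texts are made up):

```python
# Sentiment-analysis sketch using NLTK's VADER lexicon.
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # one-time download
sia = SentimentIntensityAnalyzer()

reviews = [
    "Great product, the battery lasts for days!",
    "The camera is fine, but shipping took forever.",
]
for text in reviews:
    scores = sia.polarity_scores(text)  # neg/neu/pos plus compound in [-1, 1]
    label = "positive" if scores["compound"] >= 0 else "negative"
    print(f"{label:8s} {scores['compound']:+.2f}  {text}")
```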

Shop.com, for example, uses sophisticated sentiment analysis and Artificial Intelligence technologies to analyze online opinions about the products it sells. Those opinions are then displayed on product pages across the site and turned into actionable insights and recommendations, so shoppers can find answers that match their own specific needs.

5. Up- and Cross-Selling

Cross-Selling

Amazon attributes up to 35% of its revenue to cross-selling – both the “Frequently Bought Together” and “Customers Who Bought This Item Also Bought” recommendations contribute to this figure.
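The simplest form of “Frequently Bought Together” is plain co-occurrence counting over past orders. A sketch with pandas, assuming a hypothetical file of order line items:

```python
# "Frequently bought together" via pairwise co-occurrence counts.
from collections import Counter
from itertools import combinations

import pandas as pd

# Hypothetical order data: one row per (order_id, product) line item.
orders = pd.read_csv("order_lines.csv")

pair_counts = Counter()
for _, items in orders.groupby("order_id")["product"]:
    for a, b in combinations(sorted(set(items)), 2):
        pair_counts[(a, b)] += 1

# The most frequent pairs become cross-sell suggestions.
for (a, b), n in pair_counts.most_common(5):
    print(f"{a} + {b}: bought together {n} times")
```

Production systems replace raw counts with measures like lift, or with the collaborative filtering methods discussed later, but the idea is the same.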

Up-Selling

JetBlue’s “Even More Space” initiative allows passengers to buy seats with more legroom. This upsell was projected to net JetBlue $190 million in additional revenue in 2014 (compared to $45 million in 2008, when it first launched). Looking at the travel industry more broadly, 48% of airline passengers and 59% of hotel guests are interested in upgrades and additional services.

The Value of Predictive Analytics

The use cases mentioned above are widely needed in business today. If your company has no plan to implement predictive analytics, you risk losing significant money to competitors who do. With predictive analytics, your company can take a proactive approach to increasing revenue and reducing costs. Predictive analytics can help you plan for the future and identify new areas of business.

How To Build A Recommender System With Machine Learning And Deep Learning?

Building Recommender Systems Using Different Approaches: Machine Learning and Deep Learning

What is the most requested machine learning and deep learning application in Berlin? With the numerous e-commerce companies based in Berlin, there are just as many job openings for data scientists to build recommender systems for their platforms.

Learn how to build recommender systems from our trainer from London: Stylianos Kampakis has spent over eight years teaching, training, and coaching in Data Science, Machine Learning, and Deep Learning.

Automated recommendations are everywhere – on Netflix, YouTube, the Zalando app, and Amazon. Machine learning algorithms learn about your unique interests and show the best products or content for you as an individual.

However, do you actually know how a recommender system works? Do you know there are several ways to build one? Do you want to learn all of them? Recommender systems are complex, but you can certainly get started within 1-2 days.

Learning is hands-on: you’ll develop your own framework for evaluating and combining many different recommendation algorithms, and you’ll even build your own neural networks using TensorFlow to generate recommendations from real-world cases.
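To give a flavour of the neighborhood-based methods listed below, here is a hedged sketch of item-based collaborative filtering on a toy ratings matrix (all data invented):

```python
# Item-based collaborative filtering on a toy user-item rating matrix.
import numpy as np

# Rows = users, columns = items; 0 means "not rated".
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [0, 1, 5, 4],
    [1, 0, 4, 5],
], dtype=float)

# Cosine similarity between item columns.
norms = np.linalg.norm(R, axis=0)
sim = (R.T @ R) / np.outer(norms, norms)

def predict(user, item):
    """Predict a rating as a similarity-weighted average of the user's ratings."""
    rated = R[user] > 0
    weights = sim[item, rated]
    return float(weights @ R[user, rated] / weights.sum())

print(predict(user=0, item=2))  # low score: item 2 resembles items user 0 dislikes
```

Matrix factorization, SVD, and the deep learning approaches replace this hand-built similarity with learned representations, but the evaluation questions stay the same.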

We’ll cover:

  • Building a recommendation engine
  • Evaluating recommender systems
  • Content-based filtering using item attributes
  • Neighborhood-based collaborative filtering with user-based, item-based, and KNN CF
  • Model-based methods including matrix factorization and SVD
  • Applying deep learning, AI, and artificial neural networks to recommendations
  • Real-world challenges and solutions with recommender systems
  • Case studies
  • Building hybrid, ensemble recommenders

Who should join? 

  • Software developers interested in applying machine learning and deep learning to product or content recommendations
  • Engineers working at, or interested in working at large e-commerce or web companies
  • Computer Scientists interested in the latest recommender system theory and research

How To Hire A Data Scientist For Your Company?

What Does A Data Scientist Do?

A data scientist is someone who makes value out of data. Such a person proactively fetches information from various sources and analyzes it for a better understanding of how the business performs, and to build AI tools that automate certain processes within the company.

There are many definitions of this job, and it is sometimes mixed with the Big Data engineer occupation. A data scientist or engineer may be X% scientist, Y% software engineer, and Z% hacker, which is why the definition of the job becomes convoluted. The actual ratios vary depending on the skills required and type of job. Usually, it’s considered normal to bring people with different sets of skills into the data science team.

Data scientist duties typically include creating various machine learning-based tools or processes within the company, such as recommendation engines or automated lead scoring systems. People within this role should also be able to perform statistical analysis.

In this article, we present a sample data scientist job description for you to adjust to your actual needs, so you can create the perfect job advertisement and find the person that will help you get the answers you are looking for.

Data Scientist – Job Description and Template

Company Introduction

{{Write a short and catchy paragraph about your company. Make sure to provide information about the company culture, perks, and benefits. Mention office hours, remote working possibilities, and everything else you think makes your company interesting. Data scientists like to take on challenges – anything that shows how the role could make an impact might help attract top talent.}}

Job Description

We are looking for a data scientist that will help us discover the information hidden in vast amounts of data, and help us make smarter decisions to deliver even better products. Your primary focus will be in applying data mining techniques, doing statistical analysis, and building high-quality prediction systems integrated with our products. {{Depending on your needs, you can write very specific requirements here, like: “automate scoring using machine learning techniques”, “build recommendation systems”, “improve and extend the features used by our existing classifier”, “develop internal A/B testing procedures”, “build system for automated fraud detection”, etc.}}

Responsibilities

  • Selecting features, building and optimizing classifiers using machine learning techniques
  • Data mining using state-of-the-art methods
  • Extending the company’s data with third-party sources of information when needed
  • Enhancing data collection procedures to include information that is relevant for building analytic systems
  • Processing, cleansing, and verifying the integrity of data used for analysis
  • Doing ad-hoc analysis and presenting results in a clear manner
  • Creating automated anomaly detection systems and constantly tracking their performance
  • {{Select from the above and add other responsibilities that are relevant}}

Skills and Qualifications

  • Excellent understanding of machine learning techniques and algorithms, such as k-NN, Naive Bayes, SVM, Decision Forests, etc.
  • Experience with common data science toolkits, such as R, Weka, NumPy, MATLAB, etc. {{depending on specific project requirements}}. Excellence in at least one of these is highly desirable
  • Great communication skills
  • Experience with data visualization tools, such as D3.js, ggplot, etc.
  • Proficiency in using query languages such as SQL, Hive, Pig {{actual list depends on what you are currently using in your company}}
  • Experience with NoSQL databases, such as MongoDB, Cassandra, HBase {{depending on project needs}}
  • Good applied statistics skills, such as distributions, statistical testing, regression, etc.
  • Good scripting and programming skills {{if you expect that the person in this role will integrate the solution within the base application, list any programming languages and core frameworks currently being used}}
  • Data-oriented personality
  • {{Mention any other technology that such a person is going to commonly work with within the organization}}
  • {{List education level or certification you require}}

What Happens When You Hire a Data Scientist Without a Data Engineer?

Hey Folks,
Guest Post by Vladislav Supalov
I’m Vladislav! If you care about AI, machine learning, and data science, you have probably heard of data engineering. If you haven’t, or would like to learn more, then this is *exactly* for you. Helping companies make use of their data is a fascinating topic! I’ve spent quite a bit of time building MVP data pipelines and would like to help you avoid one of the worst mistakes you can make when starting out on a serious project.
Having solid data plumbing in place is pretty darn important if you want to work with company data without wasting time and money. The natural train of thought when people want to make use of data “the right way” usually ends at “we should hire a data scientist”.
That’s a mistake in almost every case. You need to take care of data engineering before that. Here are a few of my favourite pieces of writing on the topic:

What Happens When You Hire a Data Scientist Without a Data Engineer

This one is brief but worth a read. The most important points made are the wasted time, and an observed high tendency of data scientists who are not given the right tools to quit.
Next, a complete story of getting an analytics team up and running within 500px. Samson did a lot of stuff right, which is admirable. Take note of the tech choices, Luigi in particular, to get data into a data warehouse. A great example of a well-thought-out way to work with data. One of the major mistakes he points out: not putting enough effort into data evangelism.

Your Data Is Your Lifeblood — Set up the Analytics It Deserves

An utterly amazing interview, full of great advice. I especially love that he points out that you should take care of making both event data and operational transaction data available. Only if you combine them do you have a complete picture.
A very long interview with the Head of BI at Stylight. Konstantin did an impressive job in his first year and shares a lot of insight. This is not exactly about data engineering, but about giving a company access to data and how to approach it. One of the most important takeaways for me was his advice to secure a small win for as many people as possible in the company when starting out. There is a lot of low-hanging fruit, and you get the best ROI and a lot of goodwill from making it available.
Hope you’ll get a lot of value from those articles! If you want to learn more about data engineering, data pipelines, and the stuff I do, scroll to the bottom of the last article and subscribe to Beyond Machine and Vladislav’s mailing list.

How Does OLX Group Utilize Their Data and Machine Learning Algorithms?

The Machine Learning Secret Behind One of the Biggest Online Marketplaces, the OLX Group

OLX Group is a global online marketplace operating in 45 countries and is the largest online classified ads company in India, Brazil, Pakistan, Bulgaria, Poland, Portugal, and Ukraine. It was founded by Alec Oxenford and Fabrice Grinda in 2006.

A platform that connects buyers and sellers in more than 40 countries and has hundreds of millions of customers per month faces many challenges that are to some extent similar to those of online retail, but also somewhat different.

There are 3 main challenges:

Challenge 1: User experience: what the user sees when navigating the platform, and which recommendations and results appear when doing searches, etc.

Challenge 2: Identifying what it is that makes some advertisements much more liquid (easy to sell) than others.

Challenge 3: Knowing when to stop showing an item after it has been purchased: predicting if an item is sold within 15 days of its entry into the system. If you want to learn more about predictive analytics, register for our predictive analytics with Python workshop here.

As part of the solution for a good user navigation and browsing experience, it is useful to have a good estimate of whether a specific advertisement has already been sold, so that we don’t show it again in the recommendation or search output. This is a probabilistic time-series prediction problem. Another important aspect, connected to the previous case, is identifying what it is that makes some advertisements much more liquid (easy to sell) than others. For this particular case, understanding how the model makes decisions is really important, as the outcome can be provided to the sellers to improve the liquidity of their advertisements. For the remainder of this article we will focus on this specific liquidity prediction problem, predicting if an item is sold within 15 days of its entry into the system, and we will use XGBoost and eli5 for modeling and explaining the predictions respectively.

XGBoost is a well-known library for “boosting”, the process of iteratively adding models to an ensemble, with each new model targeting the remaining error (pseudo-residuals). These “weak learners” are simple models that are only good at dealing with specific parts of the problem space on their own, but the iterative fitting approach followed in constructing this type of ensemble can significantly reduce bias while controlling variance, giving a good model in the process. The data we have available for this problem include textual data (the title and textual description of the original advertisement, plus any chat interactions between the seller and potential buyers), as well as categorical and numeric data (the category of the advertisement, the brand and model of the item, the price, the number of buyer/seller interactions for each day after entry, etc.). The data sample we are using here is a relatively small subset covering some countries and categories only, so in many of its properties it is not representative of the entire item collection. Nevertheless, let’s start with some basic data munging.
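A hedged sketch of what this munging and plotting step might look like (the dataset and column names are hypothetical; the actual OLX pipeline is not public):

```python
# Basic munging and exploration sketch (hypothetical columns).
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("ads_sample.csv")  # small sample of advertisement data

# Distribution of the day an item was sold, relative to its entry date.
df["days_to_sale"].hist(bins=30)
plt.xlabel("days until sold")
plt.show()

# Pair plot of selected columns, with univariate KDE plots on the diagonal.
cols = ["price", "buyer_interactions_day1", "title_length"]
sns.pairplot(df, vars=cols, hue="sold_within_15_days", diag_kind="kde")
plt.show()
```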

The histogram of the day an item was sold is shown above. We can easily see that most items are sold in the first days after the respective advertisement is placed, but there are still significant sales happening a month later as well. With respect to the day an advertisement is added to the platform, we can see that there is a peak on weekends, but other days are roughly at the same level. Finally, with respect to the hour an advertisement is added to the platform, we can see in the figure below that there is a peak around lunchtime and a second peak after work hours.

One way to capture more complicated relations is to use the pair plot functionality of the seaborn library. In this case we get the combinations of scatterplots for the selected columns, while on the primary diagonal we can plot something different, like the respective univariate distributions. We can see that the number of buyer interactions in the first day is a strong predictor of whether an item will be sold early or late. We can also see that the category is a very important predictor as well, as some categories in general tend to be much more liquid than others. Now that we are done with the basic data munging, we can proceed to build a model using the XGBoost library.
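A sketch of the model-fitting step, continuing from the hypothetical dataframe above:

```python
# Liquidity model: will an item sell within 15 days of entry?
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

features = ["category_id", "price", "title_length", "buyer_interactions_day1"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["sold_within_15_days"], test_size=0.2, random_state=0
)

# min_child_weight >= 10 tends to yield better-calibrated probabilities
# (see the discussion below), usually at little cost in accuracy.
model = XGBClassifier(n_estimators=300, max_depth=6, min_child_weight=10)
model.fit(X_train, y_train)
print(model.predict_proba(X_test)[:5, 1])  # probability the item sells in time
```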

Using a hyperparameter optimization framework, we can find the hyperparameters that work best for this data. Since we are also interested in the confidence of the prediction itself (and not only in the class), it is typically a good idea to use a value for min_child_weight that is equal to or larger than 10 (provided we don’t lose predictive performance), as the probabilities will tend to be better calibrated. The ranking of the features from the XGBoost model is shown in the figure above. Feature rankings from tree ensembles can be biased (favoring, for example, continuous features or categorical features with many levels over binary features or categorical features with few levels), and if features are highly correlated their effect can be split between them in a non-uniform way; still, the ranking is already a good indication for many purposes. Now we select one specific instance at prediction time. Using eli5, we get an explanation of how this instance was handled internally by the model, together with the features that were the most important positive and negative influences for this specific sample.
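A sketch of the corresponding eli5 calls, continuing from the model above:

```python
# Explaining the XGBoost model with eli5.
import eli5

# Global feature ranking (keep the bias caveats above in mind).
print(eli5.format_as_text(eli5.explain_weights(model)))

# Per-instance explanation: which features pushed this prediction up or down?
print(eli5.format_as_text(eli5.explain_prediction(model, X_test.iloc[0])))
```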

As we can see, the sample was classified as liquid, but there was still some pull-down from the text properties (title length, title words, etc.), which we can use to provide guidance to the seller for improving the advertisement.

Check out our other predictive analytics articles here and register now for Beyond Machine: Machine Learning Predictive Analysis on the 27th of March.


5 Problems With Traditional Conferences That We Tried To Solve

You should abandon traditional conferences and come to our open-air AI Summit instead.

  1. It is time to break the rule of only going to traditional conferences hosted in hotel rooms. Why? Who wants to sit in a closed environment for the whole day and listen to a lot of keynote talks?
  2. Those conferences have talks lasting from 30 minutes to 1 hour. Human attention spans only last about 20 minutes, so as an organizer, why would you ask speakers to prepare talks of at least 30 minutes?
  3. There are numerous tech/startup festivals; they are fun and relaxed, with endless alcohol and food. We know it is amazing, but what do you actually bring home from the “FUN” festivals?
  4. Finally, those conferences/shows cost you a fortune.

The agenda is announced on the website.

Final Chance to RSVP!

7 Reasons You Should Join Us: The World's First OPEN AIR AI Summit in Berlin

 

WHY SHOULD YOU COME?

  1. Statistics show the size of the global market for artificial intelligence for enterprise applications growing from 2016 to 2025. In 2016, the enterprise AI market was estimated to be worth around 360 million U.S. dollars worldwide.

2. We are the world’s FIRST OPEN AIR MACHINE INTELLIGENCE SUMMIT. We held our two last Evening Summits in rooftop bars in Berlin and Paris, and this summer’s summit will be held in a Biergarten. We know boring conference rooms kill creativity.

AI market estimation from 2016 to 2025

3. We have awesome speakers confirmed to talk on stage:

  • Reiner Kraft, VP@Zalando
  • Alex Housley, CEO@Seldon
  • Romeo Kienzler, Chief Data Scientist@IBM
  • Claudio Weck, Head of Data Science@MHPLab: A Porsche Company
  • Johannes Schaback, Co-founder@Visual Meta
  • Ulrike Franke, Drone & Warfare Scholar@Oxford

4. We have mentoring sessions hosted by Techstar, SAP Rockstar, and other exciting accelerators/incubators.

5. We also offer a workshop session hosted by IBM Watson Chief Data Scientist Romeo Kienzler, who is also an instructor at Coursera.

You can find the description of his workshop on the M.I.E Summit Berlin website.

6. We are INTERNATIONAL: we are the first media & community from Berlin expanding outside of Berlin, and we have had speakers from London, Paris, NYC, Amsterdam, Zurich…

For more info, please visit our website.

7. Early bird tickets will be gone soon! Hurry up before the end of May.


The founding story behind Beyond Machine (rebranded from M.I.E)

Beyond Machine (rebranded from M.I.E) was spawned from Lele and Irene’s constant frustration during the founding of their AI startups. At the end of 2015, they both left their jobs at rising mobile ad tech and product companies. Lele first started SoCrowd and pivoted to Deckard after 3 months. Irene wanted to tackle the challenges of visual recognition.

It quickly became apparent that there was a need for a more developed community and an outlet for media around Machine Intelligence. After running a fruitful and inspiring Evening Summit, they decided to take Beyond Machine to the next level, founding a media company focused on training, re-education, and networking in the field of AI and innovative technology. Irene decided to leave FindEssence, the first company she co-founded, to push forward the growth of Beyond Machine.

 

Beyond Machine’s mission: to connect people in the AI industry globally, to bring profound and engaging content, and to start a conversation about job substitution issues.

The blurb of M.I.E Summit 2017:

M.I.E. Summit Berlin 2017 is the World’s first open-space machine intelligence summit, which will be held on the 20th of June 2017.

This event will give you the opportunity to learn, discuss and network with your peers in the MI field. Backdropped in one of Berlin’s most vibrant and artistic locations, break free from traditional conference rooms and share a drink in a typical Berliner Biergarten.

The M.I.E Summit Berlin 2017 will provide you with two in-depth event tracks (keynotes, workshops, and panels) as well as over 25 leading speakers and unparalleled networking opportunities.

 

The agenda is announced on the website.