ML & DL Bootcamp Berlin – Recommender Systems

How to Build Recommender Systems with Machine Learning and Deep Learning

Do you know which machine learning and deep learning topic is the most requested in Berlin?

Numerous e-commerce companies are based in Berlin, and many of them are hiring data scientists to build recommender systems for their platforms.

Learn how to build recommender systems from our London-based trainer, Stylianos Kampakis, who has spent over eight years teaching, training, and coaching in Data Science, Machine Learning, and Deep Learning.

Automated recommendations are everywhere – on Netflix, YouTube, the Zalando app, and Amazon. Machine learning algorithms learn about your unique interests and show the best products or content for you as an individual.

 

However, do you actually know how a recommender system works? Do you know there are several ways to build one? Do you want to learn all of them? Recommender systems are complex, but you can certainly get started in one or two days.

Learning is hands-on: you’ll develop your own framework for evaluating and combining many different recommendation algorithms, and you’ll even build your own neural networks using TensorFlow to generate recommendations for real-world cases.

We’ll cover:

  • Building a recommendation engine
  • Evaluating recommender systems
  • Content-based filtering using item attributes
  • Neighborhood-based collaborative filtering with user-based, item-based, and KNN CF (see the short sketch after this list)
  • Model-based methods including matrix factorization and SVD
  • Applying deep learning, AI, and artificial neural networks to recommendations
  • Real-world challenges and solutions with recommender systems
  • Case studies
  • Building hybrid, ensemble recommenders
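
To give a concrete taste of the neighborhood-based methods listed above, here is a minimal, illustrative Python sketch of item-based collaborative filtering with cosine similarity. The tiny ratings matrix, user names, and item names are invented for the example and are not from any real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical user x item ratings matrix (0 = not rated)
ratings = pd.DataFrame(
    [[5, 4, 0, 1],
     [4, 5, 1, 0],
     [1, 0, 5, 4],
     [0, 1, 4, 5]],
    index=["user_a", "user_b", "user_c", "user_d"],
    columns=["item_1", "item_2", "item_3", "item_4"],
)

def cosine_sim(a, b):
    """Cosine similarity between two rating vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return a @ b / denom if denom else 0.0

# Item-item similarity matrix
items = ratings.columns
sim = pd.DataFrame(
    [[cosine_sim(ratings[i].values, ratings[j].values) for j in items] for i in items],
    index=items, columns=items,
)

def recommend(user, k=2):
    """Score unseen items by similarity-weighted ratings of the user's rated items."""
    user_ratings = ratings.loc[user]
    scores = {}
    for item in items:
        if user_ratings[item] == 0:  # only recommend items the user has not rated
            neighbors = sim[item].drop(item).nlargest(k)
            rated = user_ratings[neighbors.index]
            weights = neighbors[rated > 0]
            if weights.sum() > 0:
                scores[item] = (rated[weights.index] * weights).sum() / weights.sum()
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(recommend("user_a"))
```

In production you would compute similarities over a sparse matrix with millions of users and items, but the idea is the same: score unseen items by how the user rated similar items.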

Who should join? 

  • Software developers interested in applying machine learning and deep learning to product or content recommendations
  • Engineers working at, or interested in working at, large e-commerce or web companies
  • Computer Scientists interested in the latest recommender system theory and research

What You Need to Know about Data Mining and Predictive Analytics

 

Have you ever wondered how Netflix knew to suggest that new sci-fi comedy that’s now your go-to binge watch? How does the service keep making smash-hit original shows? It’s not because its programming team is really good at throwing darts at an idea board. Netflix seems to know you because it actually does.

Marketers are living in the world of big data. One of the greatest challenges they face isn’t getting information on consumers. Rather, it’s pulling something useful from those gigantic stores of data. Two methods of digging out useful insights are data mining and predictive analytics.

Data mining and predictive analytics are sometimes confused with each other or rolled together, but they are two distinct specialties. As you examine the big data your company collects, it’s important you understand the differences between data mining and predictive analytics, the unique benefits of each, and how using these methods together can help you provide the products and services your customers want.

What is data mining?

Much of what you do produces data. Did you use a loyalty card last time you went grocery shopping? You can bet the grocery store was eager to collect all the information it could about this specific trip and your buying habits. Your credit card company got in on the game, too. Then, after you put the groceries away and sat down to watch your new favorite sci-fi show on Netflix, the media giant was learning about you through data points.

What happens to all of this data? How do your grocery store, your credit card company, and Netflix use it to give you more personalized service? How do they use it to encourage you to buy more?

Data mining plays a key role in this process.

Investopedia has an excellent definition of data mining: It’s “a process used by companies to turn raw data into useful information. By using software to look for patterns in large batches of data, businesses can learn more about their customers and develop more effective marketing strategies.”

In other words, data alone is pretty useless, even if you have massive amounts of it. To make any sense of the data, you need a system of organizing it, and then searching for patterns and insights. That’s exactly what data mining does, and it’s important to understand some data mining techniques and how they work.

Data mining is all about organizing and interpreting data.

If you own an online clothing retail shop, you obviously need to understand your customers as well as possible so you can offer them the clothing choices they want. When customers log on to your site, you can use cookies to track their activities. You’ll see data points that may include:

  • What time they visited your site
  • What device they used to access your site
  • Which pages they visited
  • Which items they put into their shopping cart
  • Which items they purchased together
  • Whether they compared items
  • How often they come back to your site

This is only a fraction of what you can learn about a single person. Think about what you could learn from all the visitors who land on your site each day. Once you’ve captured all that information, it’s time to process and use it.

Step One: Data Warehouse

Unsurprisingly, the first step in the data mining process is collecting all of that information and electronically storing it in a data warehouse. A warehouse can exist on a company’s private server or on the cloud.

Step Two: Organization

There’s no way you can glean useful insights from unprocessed data. Many companies choose to hire a data scientist to create organizational rules for the data warehouse.

Step Three: Insights

With the right organization, you can use specialized software to begin identifying patterns and trends in your data. For example, you may discover that women aged 30 to 35 from Massachusetts are more likely to buy Product B if they first purchase Product A. It stands to reason that if someone in that demographic purchases Product A, you should create an algorithm on your site that encourages them to buy Product B as well.
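
As a rough illustration of the kind of insight described above, the sketch below computes how much more likely a purchase of Product B becomes once Product A is in the order. The tiny transaction table and the column names (bought_a, bought_b) are hypothetical.

```python
import pandas as pd

# Hypothetical transaction data: one row per order
orders = pd.DataFrame({
    "bought_a": [1, 1, 0, 1, 0, 0, 1, 1],
    "bought_b": [1, 1, 0, 0, 0, 1, 1, 1],
})

# Baseline probability of buying B, and probability of B given that A was bought
p_b = orders["bought_b"].mean()
p_b_given_a = orders.loc[orders["bought_a"] == 1, "bought_b"].mean()

lift = p_b_given_a / p_b
print(f"P(B) = {p_b:.2f}, P(B | A) = {p_b_given_a:.2f}, lift = {lift:.2f}")
```

A lift well above 1 would suggest the cross-sell rule is worth wiring into the site’s recommendation logic.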

Use data mining to get to know your customer

The more you know about your customers, the better you can serve them. Effective data mining allows you to:

  • Discover patterns in massive amounts of data that would be impossible for a human alone to comb through
  • Make better purchasing and pricing decisions
  • Market more effectively and more personally to consumers

The results of data mining are easy to predict. You save on costs, increase your ROI, and impress your happy, loyal customers. Here’s one more big benefit of data mining: It is essential for effective predictive analytics.

What is predictive analytics?

Data mining gives you the insights, but what are you going to do with this information? In many ways, predictive analytics is the logical continuation of data mining. Predictive analytics is the means by which a data scientist uses information, usually garnered from data mining, to develop a predictive score for a customer or for the likelihood of a certain event occurring.

Companies often use these predictive scores to:

  • Assign a consumer a lifetime value based on how much they are predicted to spend with a company
  • Determine the best next offer to a customer based on demographic information and past actions
  • Develop marketing models for future ad spends
  • Forecast future sales numbers

One good way to understand how predictive analytics works is through an event roughly 64% of Americans have faced: applying for a mortgage. Banks, understandably, don’t want to give mortgages to risky applicants who may default. Therefore, when potential homeowners come in to request a mortgage, they have to give the bank lots of information, including:

  • Current income
  • Employment status
  • Savings-to-debt ratio
  • Credit score

The bank uses this information to predict whether the applicant would be a low or high risk for a mortgage. It also uses the information to determine how much money and what interest rate it is willing to offer the applicant. Of course, banks will never be able to predict with perfect accuracy who will pay their mortgage and who will not. The 2007–2008 housing crisis demonstrated the fallibility of bad predictive models. However, strong predictive analytics can certainly improve decision-making and overall accuracy.
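
As a minimal sketch of this kind of risk scoring, the snippet below fits a logistic regression on a handful of made-up applicant records. The features, labels, and numbers are purely illustrative and are not how any real bank scores mortgages.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical historical applicants: income (thousands), savings-to-debt ratio, credit score
X = np.array([
    [35, 0.2, 580],
    [48, 0.5, 640],
    [72, 1.1, 720],
    [95, 1.8, 780],
    [28, 0.1, 540],
    [60, 0.9, 700],
])
y = np.array([0, 0, 1, 1, 0, 1])  # 1 = repaid, 0 = defaulted (made-up labels)

# Scale the features, then fit a logistic regression risk model
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)

# Score a new applicant: the predicted repayment probability drives the lending decision
applicant = np.array([[55, 0.7, 660]])
p_repay = model.predict_proba(applicant)[0, 1]
print(f"Predicted probability of repayment: {p_repay:.2f}")
```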

Predictive analytics works off of good, clean data.

How is Netflix so good at pinpointing the right show for you, and how does it decide which new shows to greenlight for its viewers? Good predictive modeling requires three important predictive analytics tools:

Data

The first ingredient for predictive analytics is good data. According to Thomas H. Davenport in the Harvard Business Review, “Lack of good data is the most common barrier to organizations seeking to employ predictive analytics.”

Statistical Analysis

Not just anyone can dive into mined data and figure out whether a grocery store should increase its order of Pop-Tarts by 25% for the third quarter. Many large companies hire data scientists to carefully comb the data and pull out correlations and predictions. This is most often done using a method called regression analysis.
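
For illustration, here is a minimal regression-analysis sketch in the spirit of the Pop-Tarts example: a simple linear trend fitted to made-up quarterly sales and extrapolated one quarter ahead. All numbers are invented.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical quarterly sales (units) over the past eight quarters
quarters = np.arange(1, 9).reshape(-1, 1)
sales = np.array([1000, 1050, 1120, 1180, 1260, 1310, 1400, 1460])

model = LinearRegression().fit(quarters, sales)

# Forecast the next quarter and express the change versus the last observed quarter
forecast = model.predict(np.array([[9]]))[0]
change = (forecast - sales[-1]) / sales[-1] * 100
print(f"Forecast for next quarter: {forecast:.0f} units ({change:+.1f}% vs. last quarter)")
```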

Educated Assumptions

Every predictive analysis is undergirded by certain assumptions, which must be monitored and updated over time as trends and opinions change. One of the reasons banks were so willing to approve mortgages so often in the early 2000s, even for applicants with low income and poor credit, was because they operated under the assumption that housing prices always go up. As soon as housing prices started to sink and overstretched customers went underwater, defaults skyrocketed. This outcome can largely be blamed on basing decisions on unsupported assumptions.

Your company benefits from predictive analytics.

It’s invaluable to know what your customers are most likely to do, what they are most likely to want, and how much they’ll likely spend to get it. With the right information, predictive analytics can dramatically improve your marketing success by helping you to find the right audience at the right time at the right place with the right message.

Your recent Netflix binge of that recommended sci-fi show is proof that predictive analytics works.

How should you use data mining and predictive analytics?

Both data mining and predictive analytics deal with discovering secrets within big data, but don’t confuse these two different methodologies. The best way to understand how they differ is to remember that data mining uses software to search for patterns, while predictive analytics uses those patterns to make predictions and direct decisions.

In this way, data mining often functions as a stepping stone to effective predictive analytics. While data mining is passive and provides insights, predictive analytics is active and offers clear recommendations for action.

As a marketer, you need both as you navigate the world of big data. Yes, that avalanche of information can seem intimidating, but rather than running away, embrace it. Tools like data mining and predictive analytics can give you priceless insights into consumers, as well as into greater trends in your industry.

With the help of data mining and predictive analytics, you can save money, increase your ROI, and potentially convince your customers you’re a bit psychic — just like Netflix.

Check out our upcoming Machine Learning Bootcamp – Predictive Analytics on the 28th of November in Berlin.

Secure your spot NOW.  

 

About the Author:

Jessica Bennett  is a writer, editor, and novelist. Her clients span a number of industries, and she’s written blog posts, product descriptions, articles, white papers, and press releases— all in the name of inbound marketing. She’s proud to be Inbound Certified, but her VP of Morale, Avalon, doesn’t quite get what all the fuss is about. But he’s a rabbit, so you can’t really blame him.
The original post was from Salesforce. You can find it here.

What Happens When You Hire a Data Scientist Without a Data Engineer? Guest Post by Vladislav Supalov

Hey Folks,
I’m Vladislav! If you care about AI, machine learning, and data science, you should have heard of data engineering. If you haven’t, or would like to learn more – then this is *exactly* for you. Helping companies to make use of their data is a fascinating topic! I’ve spent quite a bit of time building MVP data pipelines and would like to help you avoid one of the worst mistakes you can make when starting out on a serious project.
Having solid data plumbing in place is pretty darn important if you want to work with company data without wasting time and money. The natural train of thought when people want to make use of data “the right way” usually ends at “we should hire a data scientist”.
That’s a mistake in almost every case. You need to take care of data engineering before that. Here are a few of my favourite pieces of writing on the topic:

What Happens When You Hire a Data Scientist Without a Data Engineer

This one is brief but worth a read. The most important points made are the wasted time and the observed high tendency of data scientists who are not given the right tools to quit.
A complete story of getting an analytics team up and running within 500px. Samson did a lot of stuff right, which is admirable. Take note of the tech choices, Luigi in particular, for getting data into a data warehouse. A great example of a well-thought-out way to work with data. One of the major mistakes he points out: not putting enough effort into data evangelism.

Your Data Is Your Lifeblood — Set up the Analytics It Deserves

An utterly amazing interview, full of great advice. I especially love that he points out that you should take care of making both event and operational transaction data available. Only by combining them do you have a complete picture.
A very long interview with the Head of BI at Stylight. Konstantin did an impressive job in his first year and shares a lot of insight. This is not exactly about data engineering, but it is about giving a company access to data and how to approach it. One of the most important takeaways for me was his advice to secure a small win for as many people as possible in the company when starting out. There is a lot of low-hanging fruit, and you get the best ROI and a lot of goodwill from making it available.
Hope you’ll get a lot of value from those articles! If you want to learn more about data engineering, data pipelines and the stuff I do, scroll to the bottom of the last article and subscribe to Beyond Machine and Vladislav’s mailing list.

The Secret Behind One of the Biggest Online Marketplaces, OLX Group: How Do They Utilize Their Data and Machine Learning Algorithms?

OLX Group is a global online marketplace operating in 45 countries and is the largest online classified ads company in India, Brazil, Pakistan, Bulgaria, Poland, Portugal, and Ukraine. It was founded by Alec Oxenford and Fabrice Grinda in 2006.

A platform that connects buyers and sellers in more than 40 countries and has hundreds of millions of customers per month faces many challenges that are to some extent similar to, but also somewhat different from, those of online retail.

There are three main challenges:

Challenge 1: User experience. What the user sees when navigating the platform, and what recommendations and results are returned when doing searches, etc.

Challenge 2: Identifying what it is that makes some advertisements much more liquid (easy to sell) than others.

Challenge 3: Keeping track of items after purchase. Predicting whether an item is sold within 15 days of its entry into the system.

 

As part of the solution for a good user navigation and browsing experience, it is useful to have a good estimate of whether a specific advertisement has already been sold, so that we don’t show it again in the recommendation or search output. This is a probabilistic time-series prediction problem. Another important aspect connected to the previous case is identifying what it is that makes some advertisements much more liquid (easy to sell) than others. For this particular case, understanding how the model makes decisions is really important, as the outcome can be provided to the sellers in order to improve the liquidity of their advertisements. For the remainder of this post we will focus on this specific liquidity prediction problem, predicting whether an item is sold within 15 days of its entry into the system, and we will use XGBoost and eli5 for modelling and for explaining the predictions, respectively.

 

 

XGBoost is a well-known library for “boosting”, the process of iteratively adding models to an ensemble so that each new model targets the remaining error (the pseudo-residuals). These “weak learners” are simple models that are only good at dealing with specific parts of the problem space on their own, but the iterative fitting approach used to construct this type of ensemble can significantly reduce bias while controlling variance, giving a good model in the process. The data we have available for this problem include textual data (the title and textual description of the original advertisement, plus any chat interactions between the seller and potential buyers), as well as categorical and numeric data (the category of the advertisement, the brand and model of the item, the price, the number of buyer/seller interactions for each day after the entry, etc.). The data sample we are using here is a relatively small part of the data, from some countries and categories only, so in many of its properties it is not representative of the entire item collection. Nevertheless, let’s start with some basic data munging.
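
Before modelling, a basic munging pass like the sketch below is typical: define the 15-day target and derive simple features. The column names and the ads_sample.csv file are assumptions made for illustration, not the actual OLX schema or pipeline.

```python
import pandas as pd

# Hypothetical advertisement data; price and buyer_interactions_day1 are assumed raw columns
ads = pd.read_csv("ads_sample.csv", parse_dates=["created_at", "sold_at"])

# Target: was the item sold within 15 days of entering the system?
days_to_sale = (ads["sold_at"] - ads["created_at"]).dt.days
ads["sold_within_15d"] = ((days_to_sale >= 0) & (days_to_sale <= 15)).astype(int)

# Simple numeric/categorical features of the kind described above
ads["title_length"] = ads["title"].str.len()
ads["created_hour"] = ads["created_at"].dt.hour
ads["created_dayofweek"] = ads["created_at"].dt.dayofweek
ads["category_id"] = ads["category"].astype("category").cat.codes

feature_cols = ["price", "title_length", "created_hour",
                "created_dayofweek", "category_id", "buyer_interactions_day1"]
X = ads[feature_cols]
y = ads["sold_within_15d"]
```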

 

The histogram of the day on which an item was sold is shown above. We can easily see that most items are sold in the first days after the respective advertisement is placed, but significant sales still happen a month later as well. With respect to the day an advertisement is added to the platform, we can see that there is a peak on weekends, while other days are roughly at the same level. Finally, with respect to the hour an advertisement is added to the platform, we can see in the figure below that there is a peak around lunchtime and a second peak after work hours.

One way to capture more complicated relations is to use the pairplot functionality of the seaborn library. In this case we get the combinations of scatterplots for the selected columns, while on the main diagonal we can plot something different, such as the respective univariate distributions. We can see that the number of buyer interactions on the first day is a strong predictor of whether an item will be sold early or late. We can also see that the category id is a very important predictor as well, as some categories in general tend to be much more liquid than others. Now that we are done with the basic data munging we can proceed to build a model, using the XGBoost library.
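
The pairplot step just described can be sketched as follows, continuing from the hypothetical ads DataFrame above; the selected columns are assumptions, not necessarily the ones used in the original analysis.

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Pairwise scatterplots for a few columns, with univariate distributions on the diagonal
cols = ["price", "buyer_interactions_day1", "category_id", "sold_within_15d"]
sns.pairplot(ads[cols], hue="sold_within_15d", diag_kind="kde")
plt.show()
```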

 

Using a hyperparameter optimization framework we can find the hyperparameters that work best for this data. Since we are also interested in the output confidence of the prediction itself (and not only in the class), it is typically a good idea to use a value for min_child_weight that is equal to or larger than 10 (provided we don’t lose predictive performance), as the probabilities will tend to be better calibrated. The ranking of the features from the XGBoost model is shown in the figure above. Feature rankings from tree ensembles can be biased (favouring, for example, continuous features or categorical features with many levels over binary features or categorical features with few levels), and if features are highly correlated the effect can be split between them in a non-uniform way; still, this ranking is already a good indication for many purposes. Now we select one specific instance at prediction time. Using eli5 we get an explanation of how this instance was handled internally by the model, together with the features that were the most important positive and negative influences for this specific sample.
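
A sketch of the modelling and explanation steps, again continuing from the hypothetical features above. Apart from min_child_weight >= 10, which is discussed in the text, the hyperparameter values are placeholders rather than the tuned values from the actual OLX model.

```python
import eli5
import xgboost as xgb
from sklearn.model_selection import train_test_split

# Split the hypothetical feature matrix defined earlier
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# min_child_weight >= 10 helps keep the predicted probabilities calibrated, as noted above
model = xgb.XGBClassifier(
    n_estimators=300,
    max_depth=5,
    learning_rate=0.05,
    min_child_weight=10,
)
model.fit(X_train, y_train)

# Global feature ranking of the trained ensemble
print(eli5.format_as_text(eli5.explain_weights(model)))

# Explanation of a single prediction: which features pushed it towards or away from "liquid"
print(eli5.format_as_text(eli5.explain_prediction(model, X_test.iloc[0])))
```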

 

As we can see, the sample was classified as liquid, but there was still some pull-down from the text properties (title length, title words, etc.), which we can use to provide guidance to the seller on improving the advertisement.

XAIN- Leif-Nissen

5 Problems with Traditional Conferences That We Tried to Solve

You should abandon traditional conferences and come to our open-air AI Summit instead.

  1. It is time to break the rule of only going to traditional conferences hosted in hotel rooms. Why? Who wants to sit in a closed environment for the whole day and listen to one keynote talk after another?
  2. Talks at those conferences last from 30 minutes to an hour. A human being’s attention span is only about 20 minutes, so as an organizer, why ask speakers to prepare at least 30 minutes of material?
  3. There are numerous tech/startup festivals; they are fun and relaxed, with endless alcohol and food. We know it is amazing, but what do you actually bring home from the “FUN” festivals?
  4. Finally, those conferences/shows cost you a fortune.

The agenda is announced on the website.

Final Chance to RSVP!

MIE Summit

The founding story behind Beyond Machine (rebranded from M.I.E)

Beyond Machine (rebranded from M.I.E) was spawned from Lele and Irene’s constant frustration during the founding of their AI startups. At the end of 2015, they both left their jobs at rising mobile ad tech and product companies. Lele first started SoCrowd and pivoted to Deckard after 3 months. Irene wanted to tackle the challenges of visual recognition.

It quickly became apparent that there was a need for a more developed community and an outlet for media around Machine Intelligence. After running a fruitful and inspiring Evening Summit, they decided to take Beyond Machine to the next level, founding a media company focused on training, re-education, and networking in the field of AI and innovative technology. Irene decided to leave FindEssence, the first company she co-founded, to push forward the growth of Beyond Machine.

 

Beyond Machine’s mission: To connect people in the AI industry globally, to bring profound and engaging content, and to start a conversation about job substitution issues.

The blurb of M.I.E Summit 2017:

M.I.E. Summit Berlin 2017 is the World’s first open-space machine intelligence summit, which will be held on the 20th of June 2017.

This event will give you the opportunity to learn, discuss, and network with your peers in the MI field. Set against the backdrop of one of Berlin’s most vibrant and artistic locations, you can break free from traditional conference rooms and share a drink in a typical Berliner Biergarten.

The M.I.E Summit Berlin 2017 will provide you with two in-depth event tracks (keynotes, workshops, and panels) as well as over 25 leading speakers and unparalleled networking opportunities.

 

The agenda is announced on the website.