Slides - Big Data Strategy Conference in Vilnius

Document technical information

Format pdf
Size 2.8 MB
First found Nov 13, 2015

Transcript

Big Data: 3 years of fun
“If you are not digging dirt,
it is not data mining”
Sarunas Chomentauskas
CEO
[email protected]
+37 068506502
exacaster.com
Company and Team
Exacaster was founded in 2011
by a group of CRM, consumer
marketing managers and
technologists with the mission to
give marketers FREEDOM to
work directly with their customers
and data.
Our support team:
Egidijus
Pilypas, Co-founder
Pranas Vaitkus
Machine Learning
Advisor, VU Professor
Alain Glickman
Darren Ball
Lead Consultant / ex Orange
Global
Lead Consultant / ex Virgin
Offices:
USA (New York/Atlanta) – sales and account
management for North America.
Lithuania (EU) – analyst teams and product
development.
Jolita Bernotiene
Sales Director
Arunas Slekys,
Hughes Network Systems
Board Member / North America entry
Delivered by a seasoned group of 17 data scientists, big data platform
engineers and analysts.
Exacaster serves
customers on four continents
Over the last 24 months, Exacaster has implemented projects in Belgium, Sweden, the United
States, the United Kingdom, Belize, Honduras, Paraguay, Suriname and the Baltics
Large fixed
line operator
Large Scandinavian
grocery retail chain
2 leading mobile
operators in
Latin America
Product & Services Portfolio
Exacaster Platform
For Telecom and SaaS industries
Built on top of Cloudera Hadoop, Exacaster Platform is a business application that tracks customers
down to transaction level, predicts their behavior with propensity models, executes model-driven or
event-triggered multi-channel campaigns direct to customer, and measures impact via reports and KPIs,
all in one platform and cost-effectively.
We fuse 3 stand-alone concepts under a single platform:
Marketing Analytics
Propensity Modeling
Campaign
Automation
Telco Case Study
Reducing mobile churner targeting cost by 89%
XXXs indicators maintained for every
pre-paid mobile subscriber:
Market leader in prepaid in Paraguay: 3.5 million
customers
Data Sciences Team: The data preparation approach
and models developed by the Exacaster team allowed our
South American telecom client to target 20% of
churners at 9.1 times lower cost than before.
Platform: Running it on IBM SPSS and Oracle EDW
takes 4 FTEs vs 1 FTE with Exacaster Platform.
Demographical: location, handset type, tenure, no. of related accounts, etc.
Usage: no. of calls, amount of data, number of bundles, etc.
Usage trends: SMS usage trend over last 60 days, data usage trend
Billing: percent of calls charged from core balance, etc.
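The two figures on this slide describe the same result: a cost that is 9.1 times lower is a reduction of roughly 89%. A quick check in Python (the function name is ours, for illustration only):

```python
def cost_reduction(times_lower: float) -> float:
    """Fraction of cost saved when the new cost is `times_lower` times smaller."""
    return 1.0 - 1.0 / times_lower

# 9.1 times lower cost, as stated on the slide
print(f"{cost_reduction(9.1):.0%}")
```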
Product & Services Portfolio
Exacaster Platform
For Grocery and Retail
Built on top of Cloudera Hadoop, Exacaster Platform for Retail ingests POS data and runs personalized
loyalty programs via in-store kiosks, mobile apps and web.
We cover it as an end-to-end offering:
Mobile
POS data
In-store display
External
data
Web
Loyalty & Campaign
Reporting
Predictive Loyalty
Program
Automated D2C
Communication
Case Study: Grocery offers personalization
Increased Campaign Sales by 100% in Retail
Product & Services Portfolio
Exacaster Real Time Pricing API beta
For online/e-commerce industries
Our cloud-based real-time machine learning API decides how much you should charge to maximize
your margin.
Your initial
prices for any
product
Optimized price with
maximized margin
e.g. 10 EUR
e.g. 12 EUR
Machine Learning API for
each product, each
customer
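The idea of "optimized price with maximized margin" can be sketched as a search over candidate prices. This is only an illustration: the linear demand curve and the unit cost below are invented assumptions, not Exacaster's model or API, which per the slide uses machine learning for each product and each customer.

```python
# Illustrative only: the demand curve and unit cost are made-up assumptions.
UNIT_COST_EUR = 6.0

def expected_demand(price_eur: float) -> float:
    """Hypothetical linear demand: fewer units sold as the price rises."""
    return max(0.0, 90.0 - 5.0 * price_eur)

def margin(price_eur: float) -> float:
    """Total margin = margin per unit times expected units sold."""
    return (price_eur - UNIT_COST_EUR) * expected_demand(price_eur)

def best_price(candidates):
    """Pick the candidate price with the highest total margin."""
    return max(candidates, key=margin)

# Candidate prices from 8.00 to 16.00 EUR in 0.50 EUR steps
candidates = [p / 2 for p in range(16, 33)]
optimum = best_price(candidates)
print(optimum, margin(10.0), margin(optimum))
```

In a real system the demand estimate would come from a trained model rather than a fixed formula; the search over candidate prices stays the same.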
Lessons learned:
Customer realities
Technology
Team
The future
Lesson #1:
Most organizations manage to use
at best 10 percent of the possible
business value in their data.
Is this really true?
Let’s do some math…
Keyword:
sequential process
Structural fix
Human
Systems
Sensors
Data
captured
Storage
Environment
Query /
Workflow
Table
Chart
Action
Machine
learning
Ongoing fix
Let’s say each step is producing output at ___% of maximum performance:
70%, 90%, 99%, 40%, 80%, 70%, 70%, ?
Then, performance of the entire process will be, measured at the beginning of each
step:
70%, 63%, 62%, 25%, 20%, 14%, 10%
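The second row of numbers is the running product of the first. A minimal sketch that reproduces it (variable names are ours; the step left as "?" on the slide is omitted):

```python
from itertools import accumulate
from operator import mul

# Per-step performance taken from the slide (the last step is left as "?").
step_performance = [0.70, 0.90, 0.99, 0.40, 0.80, 0.70, 0.70]

# Running product: the share of maximum value that survives after each step.
cumulative = list(accumulate(step_performance, mul))

for step, value in zip(step_performance, cumulative):
    print(f"step at {step:.0%} -> chain at {value:.0%}")
```

Even with several steps at 70% or better, a single 40% step drags the whole chain down to about a quarter of its potential.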
The results may vary…
[Chart: performance in % (axis 0 to 80) at each step: Systems, Data capture, Storage, Query, Interpretation, Action, Value]
… but the “physics” stays.
Despite great cost and effort at every step, the overall
analytics-to-value chain is only as strong as its weakest link.
Weak link #1: Not HW or SW.
It’s what information you have!
Granular/Detailed
Records
Hard to get
Low value
Your Competition info
Government Records
Geographical Profiles
Your Customer Preferences
High value
Summaries
Easy to get
Your web traffic
Your email logs
Your app usage
Your video cameras
Your telemetry data
Your transactions
Need information?
Sorry, confidential.
Learning from the Best:
“All information that is truly harmful if revealed
has been removed by CEO decision. The rest is
freely accessible for all employees without
arguing. Netflix has hundreds of analytical apps,
serving many different departments.”
Head of Netflix Data Infra
Netflix has sales over 1 bn.
USD, est. 1997.
My employees know Excel.
Can you make a
report for me?
Learning from the best:
“Every new Uber employee must learn
SQL and gets direct access to data.
DIY!”
From: Maksim Golivkin
Uber sales are over 1 bn USD, est. 5 years ago.
Lesson #2:
“Being near state-of-the-art means
you are going to spend most of
your time educating customers”
For example: what is Machine Learning, and why is Big Data
its father?
Statistical Modeling: Two Cultures
In 2001, Leo Breiman, one of the foremost statistical practitioners of our time, outlined two
cultures of statistics: the data modeling culture and the algorithmic modeling culture.
Data modeling
• Focuses on understanding problem domain and
generation mechanism.
• Easy to understand and interpret models with a
simplistic view of the world – but poor in prediction.
This is not machine
learning
Algorithmic modeling
• Does not try to understand the generative
mechanism, instead attempts to predict as
accurately as possible.
• Uses many weak models which in combination
make up a very strong whole.
This is machine
learning
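Why "many weak models" can add up to a strong whole is easy to see with a little arithmetic. The sketch below assumes the models vote independently and are each right 60% of the time; both are simplifying assumptions of ours (real models are correlated, so real gains are smaller).

```python
from math import comb

def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a majority of n independent voters, each correct
    with probability p, gives the right answer (n must be odd)."""
    if n % 2 == 0:
        raise ValueError("use an odd number of models to avoid ties")
    need = n // 2 + 1  # smallest number of correct votes that wins
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))

for n in (1, 11, 101):
    print(n, round(majority_vote_accuracy(0.6, n), 3))
```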
Why did Machine Learning
come of age?
10 most important variables
50 most important variables
500 most important variables
These charts from Exacaster R&D show how predictive accuracy increases across many
different ML algorithms as you provide more information.
Starting at 30% error, the experiment ends at <4% error – the only thing that changed is the
amount of information provided.
Why does it matter in business?
Easy to use
Visual exploration
Machine learning
Full data search
Deep learning
Computerized
decision
SQL
Pig
R
Graph QL
Hard to use
Human
decision
ML starts with solutions to
common problems…
Guess the object’s
attributes based on other
objects
Find similar behavior
clustering
classification
community detection
Machine learning
Forecast a value
regression
anomaly detection
Guess preference
recommendation
Consider multiple factors and
find the best combination fitting
the objective
optimisation
… and becomes Powerful and
Dangerous in 2014
Deep Learning:
Picture and Video tagging: identify objects in pictures
Speech recognition: recognize human conversation
Translation: map meaning between languages
Skills training: train computers to use human interfaces
Remember the importance of
the right information?
Google Book Corpus
“Our book scanning effort, now in its eighth year, has
put tens of millions of books online. Beyond the obvious
benefits of being able to discover books and search
through them, the project lets us take a step back and
learn what the entire collection tells us about culture
and language.”
Can you guess why Google
went for such a large expense?
Why it is a very Big Deal
“Whenever someone has used a deep learning model to tackle one of the challenges, it
has performed better than any model ever previously devised to tackle that specific
problem”
Jeremy Howard, Kaggle
Word2Vec: a revolution
in computational linguistics
Which phrase does not fit?
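With word vectors, "which one does not fit?" becomes a geometry question: find the vector that points furthest away from the group's average direction. A minimal sketch with invented toy vectors (real Word2Vec vectors are learned from text and have hundreds of dimensions; the function names are ours):

```python
from math import sqrt

def normalize(v):
    """Scale a vector to unit length."""
    length = sqrt(sum(x * x for x in v))
    return [x / length for x in v]

def odd_one_out(vectors):
    """Return the word whose vector is least similar (cosine) to the mean
    of all the unit-length vectors."""
    unit = {w: normalize(v) for w, v in vectors.items()}
    dims = len(next(iter(unit.values())))
    mean = normalize([sum(v[i] for v in unit.values()) / len(unit) for i in range(dims)])
    return min(unit, key=lambda w: sum(a * b for a, b in zip(unit[w], mean)))

# Toy 3-dimensional vectors, invented for illustration.
toy = {
    "breakfast": [0.9, 0.1, 0.0],
    "lunch": [0.8, 0.2, 0.1],
    "dinner": [0.85, 0.15, 0.05],
    "car": [0.1, 0.9, 0.3],
}
print(odd_one_out(toy))
```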
Why it is a very Big Deal
The training dataset behind
395,909 tagged images
Our aim:
Replacing human decisions in business
one workflow at a time
People hate boring, systematic, repetitive
actions – welcome to the world of lower-level
management decisions.
Potential for enhancement with machine learning
is huge.
Automate the whole process from Data to Action
– our mission at Exacaster.
Lesson #3:
Technology only matters AFTER
you know what you want to do
Then it’s easy – because most likely
you are not alone
Big Data means challenges in
every data processing aspect
We may have 10x or even
100x more fine-grained
logs and information, but
querying them fast and
cheaply is not simple
Internal and external data
sources must be combined,
but how to maintain?
variety
volume
“We want to keep all data
we can keep”
velocity
Fast, live data is needed
alongside historical data
RDBMS technology has its
limitations
Hard drives will never spin faster, because the edge of the platter would start exceeding
Mach 1 and break the sound barrier. The drive breaks, too.
Therefore, new technologies have arrived: MapReduce, NoSQL, MPP, In-memory, Direct To Flash –
each optimized for specialized tasks. Analytics is getting very specialized… and expensive.
Mach 1:
340.92 m/s
The task chooses the tool
ML workflow example
Process
Status API
Data configuration
Evaluation data prep WF
Prediction data prep WF
RF Prediction
Evaluation of previous
predictions
Accuracy
metadata
List of scores
Train data prep WF
Training
Key variables
metadata
Estimated confidence
boundaries
Parallelism
and multistep data
processing are
key
Next step
Scheduling API
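One way to picture "parallelism and multistep processing" is as a dependency graph that a scheduler walks in batches. The step names below come from the slide; the dependencies between them are our assumption, chosen only to illustrate the idea, and this is not Exacaster's actual scheduler.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on (assumed, not from the slide).
workflow = {
    "data configuration": set(),
    "train data prep WF": {"data configuration"},
    "prediction data prep WF": {"data configuration"},
    "evaluation data prep WF": {"data configuration"},
    "training": {"train data prep WF"},
    "RF prediction": {"training", "prediction data prep WF"},
    "evaluation of previous predictions": {"evaluation data prep WF"},
}

def parallel_batches(graph):
    """Group steps into batches; steps within one batch do not depend on
    each other, so a scheduler could run them in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches

for number, batch in enumerate(parallel_batches(workflow), start=1):
    print(number, batch)
```

Under these assumed dependencies the three data preparation workflows land in the same batch, which is where running steps in parallel pays off.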
The task chooses the tool
Reporting requirements are totally something else
Reports list
Add/ Copy /
Edit /Delete
Report
Change report
dimensions
Define
dimensions
Change
metrics
Show chart /
report status UI
widget
Change
filter
Define fact file columns
Schedule
(re)build
Run joins and
filters
Run cube
aggregation
Here it’s pre-aggregated Cubes
vs Live Query
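The trade-off can be shown in a few lines: a live query scans the fact rows every time, while a cube is aggregated once and then answers by lookup. The fact rows and function names below are hypothetical, for illustration only.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, sales amount).
facts = [
    ("north", "milk", 10.0),
    ("north", "bread", 4.0),
    ("south", "milk", 7.0),
    ("north", "milk", 5.0),
]

def live_query(rows, region, product):
    """Scan every fact row at query time."""
    return sum(a for r, p, a in rows if r == region and p == product)

def build_cube(rows):
    """Aggregate once, ahead of time, by (region, product)."""
    cube = defaultdict(float)
    for r, p, a in rows:
        cube[(r, p)] += a
    return dict(cube)

cube = build_cube(facts)  # rebuilt on a schedule, not per query
print(live_query(facts, "north", "milk"), cube[("north", "milk")])
```

The cube answers instantly but is only as fresh as its last rebuild; the live query is always current but pays the scan cost on every request.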
The task dictates what tools to use
Wanna do search on live data?
Problem <-> Architecture fit is key
IMHO:
• Information wants to be free. Data wants to be joined…
• All data will still end up in one place. We bet 3 years ago that it
would be in Hadoop, and we still think that is true.
• Don’t forget why SQL was invented: life was very miserable
before it, with FoxPro file drivers …
• Hadoop brings the luxury of parsing the same data with MapReduce, search, SQL
or machine learning approaches.
• Optimized and specialized kits are right for extreme use cases; for most,
what matters is the agility to rapidly solve many varied problems.
What will happen to
traditional EDW?
Asked: Will Hadoop replace or expand DW?
“There’s no relationship between the EDW and Hadoop right now: they are going to be
complementary. It’s NOT about rip and replace:
we’re not going to get rid of RDBMS or MPP, but
instead use the right tool for right job, and that
will very much be driven by price.”
Alasdair Anderson,
HSBC Head of IT infrastructure, Hadoop Summit 2014
Lesson #4:
Building the right Team
Broad roles vs Experts
What is the right approach?
Big Data team requires broad roles:
• Many Big Data tasks are complex, so communication overhead quickly
kills productivity.
• The typical LT “IT guru” is not a true generalist: the right attitude, learning
“not my language” as fast as possible, is more important than skill.
• Team members who understand the problem, pick a suitable tool and implement it
are priceless.
Lesson #5:
Best Big Data – invisible Big Data
Thinking beyond BI
Intelligence everywhere?
Make your Big Data analytics
disappear – just magic!
We’re here to help you with your Big Data
challenges:
Understanding,
Predicting
and taking Action.
Hiring:
Big Data Engineer ready to try
new things.
Front-end/JS developer with a
sense of beauty.
Sarunas Chomentauskas
CEO
[email protected]
tel +37 068506502
Exacaster UAB
Rugiu st2, LT-08418 Vilnius
Lithuania, EU