eIQ Insights: Analyzing Ashley Madison Leaked Data to Understand Big Data


Why you’re reading this article

Harvard Business Review calls it a “management revolution”. McKinsey released a whopping 156-page report touting it as “the next frontier for innovation, competition, and productivity.” Palantir, a startup that used it to help the US government track down Osama Bin Laden, is now one of the hottest companies in Silicon Valley, valued at $20B in its latest funding round. Forget Google, Facebook, and Twitter: bright college grads have already set their sights on Palantir. Big data has become the new black.

The big data wave isn’t just creating companies slated for multi-billion dollar IPOs and exits; it is also creating new job opportunities. What used to be a boring number-crunching chore is now called data science, which Harvard Business Review dubbed the “sexiest job of the 21st century”. Every cool startup now boasts a data science team led by some chief data scientist. As usual, digital agencies are jumping on the bandwagon, with some of them creating new units that supposedly “bring together data sciences, social, new age content, and emerging marketing technology with sound business thinking to create a proposition that’s truly integrated.” Whatever that means.

“There’s gold in the streets, just waiting for someone to scoop it up.” – Walter White in Breaking Bad

Using big data to look at big data, Google shows that search volume for ‘big data’ follows a nice hockey-stick trajectory envied by many startups. It’s pretty clear – big data is big business.

The Real Reason why you’re reading this article

“Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it…” – Dan Ariely

Despite all the media hype about big data, the sad reality is that hardly anyone truly understands it. What’s causing all this misunderstanding, and why are we long overdue for a paradigm shift in big data?

Mythbusters: Debunking Big Data myths

1. Big Data requires an expensive enterprise platform

The biggest lie in big data is that it’s complicated and tech-heavy. It’s also the primary reason why companies fail to adopt big data. We’re talking specifically about fear of technology and/or obsession with technology. The notion that big data requires a tech-savvy professional puts off a lot of people and prevents them from taking the first baby steps towards working with data in their organizations. On the other hand, there are those who focus too much on technology for technology’s sake. Big data platforms are a means to achieve business goals, not an end in themselves.

Peter Thiel, the billionaire venture capitalist who co-founded PayPal and Palantir, argues against over-emphasizing technology. In his bestseller Zero to One: Notes on Startups, or How to Build the Future, Thiel says that “we’ve let ourselves become enchanted by big data only because we exoticize technology. We’re impressed with small feats accomplished by computers alone but we ignore big achievements from complementarity because the human contribution makes them less uncanny.”

Only when people see through this smokescreen of big data complexity, created by what we’d like to call the “Big Data-Industrial Complex” (the sum of companies with three-letter acronym names that peddle big data technology products), will we be able to move on to addressing the real challenges of big data.

Doing a simple search on Google for ‘big data’ shows how competitive this space is and how much money is at stake for the Big Data Industrial Complex.

This supplier-side bias is compounded by a consumer side that’s often clueless about big data. CXOs in Fortune 500 companies insist on purchasing the latest big data platforms and technologies in order to ensure their “competitive advantage”. In reality, most of what these companies are trying to achieve can be done with a fraction of the technology at a fraction of the cost. People still go for the flashiest platforms out of fear: “a fancy tool just gives the second-rater one more pillar to hide behind,” says Hugh MacLeod, blogger, cartoonist, and best-selling author.

2. Big Data needs a lot of data (Duh?)

The second myth in big data is that we need to have a lot of data in order to do “big data”.

“Today’s companies have an insatiable appetite for data, mistakenly believing that more data always creates more value. But big data is often dumb data,” says Peter Thiel.

The reality is that most companies don’t need that much data. Unless your company is in the business of finding a cure for cancer or tracking down terrorists, there’s no need for mountains of data to properly sell your product.

The reason why people, mostly in large companies, end up obsessing over endless data is very simple: they’re afraid. Afraid of making decisions based on less-than-perfect data. Afraid of having to do actual work. Afraid of taking responsibility, when they could hide behind the smokescreen instead. People fail to realize that the real value lies in the action that comes after analysing the data set, big or small.

“Companies brag about the size of their datasets the way fishermen brag about the size of their fish. They claim access to endless terabytes of information. The advantages seem obvious: the more you know, the better,” says Slater Victoroff in his brilliant TechCrunch article.

Not enough data Just enough data

Like the Lean movement that encourages companies and employees to take an “MVP” approach towards building businesses and products, big data is long overdue for an MVP revolution. You don’t need a lot, you simply need enough.

3. Big Data is the domain of data scientists

There are countless cases where companies invest millions of dollars in big data tech but still fail because they don’t have the right people in place to analyse and execute. As Thiel said, “Computers can find patterns that elude humans, but they don’t know how to compare patterns from different sources or how to interpret complex behaviors. Actionable insights can only come from a human analyst.”

According to McKinsey,

“There will be a shortage of talent necessary for organizations to take advantage of big data. By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.”

Data Scientists are not the solution

Hiring ‘sexy’ data scientists won’t fix the problem. According to Josh Attenberg and Foster Provost, who teach the practical data science course at NYU Stern, “one of the complaints about the data scientists trained in computer science departments is that they’re ‘just technical’, understanding algorithms well, but lacking important skills in problem formulation, evaluation, and analysis generally. On the other hand, those trained in business schools tend to have underdeveloped technical skills.” Getting organizations up to speed on working with big data requires more than just hiring traditional data scientists or MBAs; instead, everyone needs to be able to work with data.

There have been positive changes though, especially in marketing. “The new job title of ‘growth hacker’ is integrating itself into Silicon Valley’s culture, emphasizing that coding and technical chops are now an essential part of being a great marketer. The role of the VP of Marketing, long thought to be a non-technical role, is rapidly fading and in its place, a new breed of marketer/coder hybrids have emerged,” says Andrew Chen, who popularized the term growth hacker.

Auren Hoffman, CEO of LiveRamp, shares on Quora: “The role of the chief marketing officer (CMO) is changing dramatically and is becoming ‘moneyballed’ and very data oriented. Today’s Moneyballer CMO plans her marketing initiatives the way Billy Beane built the Oakland A’s. She leverages granular data on customer actions to expand beyond the traditional CMO role, influencing product strategy, customer service, and optimized sales pitches.”

Buzzwords aside, a quick look at job postings for marketing positions at Facebook and Uber, for example, illustrates the transformation we’re going through. Uber’s growth marketers are expected to use tools like Tableau and understand languages like Python and SQL, in addition to being able to process and analyze complex data sets. Where do you find these folks? Among graduates with majors in engineering, computer science, math, economics, or statistics. Meanwhile, traditional digital agencies are still stuck in 2005, hiring communications majors for “performance marketing” roles (good luck with that).

Ashley Madison leak reveals if bigger is better…

To illustrate our point that a small-data, people-focused, lean approach can lead to useful insights, we’ve analyzed the leaked Ashley Madison data dump to answer the following four questions:

  1. Are Sagittarius men more likely to cheat?
  2. What are the most popular sexual kinks?
  3. Do sexual preferences change over time?
  4. What is the churn rate and LTV (lifetime value) of Ashley Madison users?

Tools and technologies used: MySQL, Python, PHP, Excel, Notepad++

Q1: Are Sagittarius men more likely to cheat?

“He’s the main cheater of the zodiac. He may espouse high morals, but these can loosen when he sees a pretty face or nice body. Tie your Saggi to the bedpost,” says one believer.

But is this really true? After running our SQL query, we get the results below. There’s obviously one outlier, Capricorn, caused by the default month and year settings in the (previous) Ashley Madison registration dropdown menu. After removing the Capricorn outlier, we see that contrary to popular belief, Saggies are not the biggest cheaters in the zodiac.
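
The idea behind this kind of count can be sketched in a few lines of Python. This is an illustration, not the query behind the results above: the sign boundaries follow one common convention (sources differ by a day or so), and the function names and sample birth dates are made up for the example.

```python
from collections import Counter
from datetime import date

# Last (month, day) of each sign in calendar order, using one common
# convention. Capricorn wraps around the new year (Dec 22 to Jan 19).
SIGN_END_DATES = [
    ((1, 19), "Capricorn"), ((2, 18), "Aquarius"), ((3, 20), "Pisces"),
    ((4, 19), "Aries"), ((5, 20), "Taurus"), ((6, 20), "Gemini"),
    ((7, 22), "Cancer"), ((8, 22), "Leo"), ((9, 22), "Virgo"),
    ((10, 22), "Libra"), ((11, 21), "Scorpio"), ((12, 21), "Sagittarius"),
]


def zodiac_sign(birth_date):
    """Map a birth date to its zodiac sign."""
    key = (birth_date.month, birth_date.day)
    for end, sign in SIGN_END_DATES:
        if key <= end:
            return sign
    return "Capricorn"  # December 22 to 31


def sign_shares(birth_dates, exclude=()):
    """Share of users per sign, after dropping any signs named in `exclude`."""
    counts = Counter(zodiac_sign(d) for d in birth_dates)
    for sign in exclude:
        counts.pop(sign, None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {sign: n / total for sign, n in counts.items()}


# Made-up sample: two users left on a default date, two real-looking dates.
sample = [date(1980, 1, 1), date(1980, 1, 1), date(1975, 12, 1), date(1982, 8, 1)]
shares = sign_shares(sample, exclude=("Capricorn",))
```

Dropping a whole sign is the blunt way to deal with a default-value outlier. If the exact default date is known, filtering only those rows would keep the genuine Capricorns in the count.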

Q2: What are the most popular sexual kinks?

When signing up on Ashley Madison, users indicate their sexual preferences. We used a combination of SQL and Python to parse the preferences and map them out by gender.
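
As a rough sketch of what the parsing step could look like, assume each user row carries a gender value and a single delimited string of preference labels. The pipe delimiter, the labels, and the function names below are assumptions for illustration, not the actual layout of the dump.

```python
from collections import Counter, defaultdict


def parse_preferences(raw, sep="|"):
    """Split a delimited preference string into clean, non-empty labels."""
    return [token.strip() for token in raw.split(sep) if token.strip()]


def preference_counts_by_gender(rows):
    """Count how many users of each gender list each preference.

    `rows` is an iterable of (gender, raw_preference_string) pairs.
    A preference repeated within one profile is counted once.
    """
    counts = defaultdict(Counter)
    for gender, raw in rows:
        counts[gender].update(set(parse_preferences(raw)))
    return counts


# Made-up rows in the assumed format.
rows = [
    ("M", "cuddling|role play"),
    ("F", "cuddling| "),
    ("M", "cuddling|cuddling"),
]
counts = preference_counts_by_gender(rows)
top_for_men = counts["M"].most_common(1)
```

Calling `most_common(n)` on each gender's counter gives the ranked list that a chart would be drawn from.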

Q3: Do sexual preferences change over time?

Yes, apparently they do. By mapping out sexual preferences by birth year, we found that the younger generation is more open to experimenting and one-night flings whereas older people enjoy cuddling and naughty talk.
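
One simple way to make such a comparison is to bucket users by birth decade and compute the share of each bucket that lists a given preference. The sketch below assumes preferences are already parsed into a set per user. The decade bucketing, the labels, and the sample rows are illustrative assumptions.

```python
from collections import Counter


def preference_share_by_decade(rows, preference):
    """Share of users in each birth decade who list `preference`.

    `rows` is an iterable of (birth_year, set_of_preferences) pairs.
    """
    totals = Counter()
    hits = Counter()
    for birth_year, prefs in rows:
        decade = birth_year // 10 * 10  # 1985 -> 1980
        totals[decade] += 1
        if preference in prefs:
            hits[decade] += 1
    return {decade: hits[decade] / totals[decade] for decade in sorted(totals)}


# Made-up sample rows.
rows = [
    (1985, {"one-night stands"}),
    (1989, set()),
    (1962, {"cuddling"}),
    (1968, {"cuddling", "one-night stands"}),
]
cuddling_by_decade = preference_share_by_decade(rows, "cuddling")
```

Using shares rather than raw counts matters here, because the number of users differs a lot between age groups.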

Q4: What’s the churn rate and LTV (Lifetime Value) of Ashley Madison users?

As marketers, we’re naturally interested in measuring churn rate and LTV because these numbers can make or break a business. According to Andrew Chen, investors usually don’t fund dating startups because of the built-in (and typically high) churn rates as well as high customer acquisition costs (CAC) associated with the industry. Typical annual churn rates can go as high as 93%. Looking at the Ashley Madison data, we’re seeing churn rates of 80%.

Ashley Madison LTVs are roughly $400. Their monthly cohorts show a jump in user quality starting in October 2013. This could be due to new product initiatives such as pay for mobile access, business travel, pay to get noticed, and, ironically, pay to get your account fully removed.
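
For readers new to these metrics, here is a minimal sketch of one common back-of-the-envelope way to compute them. It is not necessarily how the figures above were derived: it assumes a constant churn rate, which makes the expected customer lifetime 1 / churn periods, and all numbers and function names are made up for illustration.

```python
def churn_rate(customers_at_start, customers_retained):
    """Fraction of customers lost over a period (e.g. one year)."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return (customers_at_start - customers_retained) / customers_at_start


def simple_ltv(revenue_per_customer_per_period, churn):
    """Simplified lifetime value: revenue per period times expected lifetime.

    With a constant churn rate, the expected lifetime is 1 / churn periods.
    """
    if not 0 < churn <= 1:
        raise ValueError("churn must be in (0, 1]")
    return revenue_per_customer_per_period / churn


# Made-up example: 1,000 paying users at the start of the year, 200 still
# paying at the end, each paying user worth $100 per year.
annual_churn = churn_rate(1000, 200)
ltv = simple_ltv(100, annual_churn)
```

A cohort-based calculation, which follows each signup month separately, gives a more faithful picture when churn changes over time.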

A few parting words

You can make use of big data as long as you emphasize people over platforms, processes, and politics, and understand that small (data) can be beautiful if you know a little SQL and/or Python. Don’t be afraid, learn the critical tools, and make big data your friend.