Category Archives: Analytics

Data privacy – is the juice worth the squeeze?

I love YouTube, Google and Twitter, and for years I have felt that if they want to track my 200-episode obsession with Turkish period dramas or my cat video likes, then so be it. I’m not doing anything wrong, so why would I care how it impacts my privacy? I then came across this quote in Oliver Stone’s Snowden movie and thought it was time to look into it further.

Saying you don’t care about privacy because you have nothing to hide is like saying you don’t care about freedom of speech because you don’t have anything to say. 

Edward Snowden

So what are they doing with your data?

An outraged father stormed into a well-known US store to speak to the manager because the marketing team had sent his school-age daughter discount vouchers for baby clothes and cribs. The store apologised profusely and said they would look into it. A few days later the father called back to apologise and explain that his daughter was indeed pregnant.

Targeting is one of the most common uses of big data. The marketing department that so offended the pregnant girl’s father probably used a process like this:

  1. Segment – They purchased a list of new mothers or asked some to come forward as part of a survey. Next they found who on that list also had a store loyalty card or used a payment card
  2. Profile – Using payment or loyalty card data they could draw up a list of common product combinations these women had purchased while pregnant, e.g. unscented lotions, folic acid and handbags that are big enough to hold nappies
  3. Engage – Looking at other customers who were buying those product combinations they generated a list of people who were probably pregnant and sent them Facebook adverts, coupons or email promotions
  4. Measure – Collected commission / bonus because of increased sales and boasted how good their predictive models were
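
The segment, profile and engage steps above can be sketched in a few lines of Python. This is a toy illustration only: the card IDs, baskets, the "bought by every known mother" rule and the threshold of two products are all invented for the example, and it is not a description of how any real retailer builds its models.

```python
from collections import Counter

# Hypothetical purchase histories keyed by loyalty card ID (all data invented).
purchases = {
    "card_a": {"unscented lotion", "folic acid", "large handbag"},
    "card_b": {"unscented lotion", "folic acid", "bread"},
    "card_c": {"beer", "crisps", "bread"},
    "card_d": {"folic acid", "unscented lotion", "milk"},
    "card_e": {"milk", "bread", "large handbag"},
}

# 1. Segment: cards matched to a purchased or surveyed list of new mothers.
known_mothers = {"card_a", "card_b"}

# 2. Profile: products bought by every known mother become the "signal" set.
counts = Counter(item for card in known_mothers for item in purchases[card])
signal_products = {item for item, n in counts.items() if n == len(known_mothers)}


# 3. Engage: flag other customers who bought at least `threshold` signal products.
def likely_pregnant(purchases, known, signals, threshold=2):
    return sorted(
        card
        for card, basket in purchases.items()
        if card not in known and len(basket & signals) >= threshold
    )


targets = likely_pregnant(purchases, known_mothers, signal_products)
print(targets)
```

Even in this tiny example the "prediction" is only a guess based on shopping baskets, which is why such targeting can be both unnervingly right and clumsily wrong.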

Google, Facebook and many others hold vast stores of data about huge numbers of people, which can be used to target you on the off chance that you might want to buy a washing machine three weeks after you searched for one online and then bought it in store. Some people find that creepy; I find it clumsy. But if they want to use my data for that, broadly speaking, I am not that bothered.

Can you trust large corporations to look after your data?

Half my life’s photos are on Facebook. When I needed to prove my relationship status to the Australian government for visa purposes, I used my Facebook timeline, which showed over 5 years of dating with timestamps, places and photos. That is useful data to me; Facebook store it and make it easy for me to share. In return they know where I go out, who I hang out with, where I live, my likes, dislikes and opinions on political issues, and the products I buy second hand on Marketplace.

All of that sounded like a good idea when I first started using the site, but since the Cambridge Analytica scandal, the Equifax data breach and the Sony hack there are some companies that I don’t trust anymore, and I would like my data back please. It is the law after all. Great, thank you, but how do I know it is all there, and can I easily upload it to a similar company? Unfortunately that bit is not so easy.

I would like to see a situation where when I hand my data over to a company they sign a list of my terms and conditions rather than the endless, unread end user licence agreements (EULAs) I click away to when I sign up to a new free service.

Tim Berners-Lee, inventor of the World Wide Web, has recognised this and has developed an open source specification called Solid that enables people to take back control of their data and privacy. It is only accessible to app developers at the moment, but he has started a company called Inrupt to help organisations work with personal data in a way that benefits both parties, with ultimate ownership of the data residing with the individual.

Broadly speaking, the idea is to create a massive decentralised database where people store their data in a standardised format wherever they want. In my Facebook example I would upload a picture to my timeline, but it would be stored where I tell them to store it and I would give them a key to access it. If I stopped trusting them I would change the locks and give the keys to another platform. The NHS, BBC, NatWest Bank and the Flanders government are early adopters of this specification. It remains to be seen whether it will catch on.
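
The "keys and locks" idea can be pictured with a small Python sketch. To be clear, this is not the Solid API or anything Inrupt ships. The class `PersonalDataStore`, its methods and the app names are hypothetical, made up purely to show the shape of the idea: the data stays with me, and platforms only get access for as long as I allow it.

```python
class PersonalDataStore:
    """Toy stand-in for a user-controlled data store (not the Solid API)."""

    def __init__(self):
        self._items = {}       # item name -> content
        self._allowed = set()  # apps currently holding a "key"

    def put(self, name, content):
        self._items[name] = content

    def grant(self, app):
        """Hand an app a key."""
        self._allowed.add(app)

    def revoke(self, app):
        """Change the locks for one app."""
        self._allowed.discard(app)

    def read(self, app, name):
        if app not in self._allowed:
            raise PermissionError(f"{app} has no access")
        return self._items[name]


store = PersonalDataStore()
store.put("holiday.jpg", b"...image bytes...")

# The social platform can show my photo while it holds a key.
store.grant("social_app")
print(store.read("social_app", "holiday.jpg"))

# I stop trusting it, so I revoke access and give a key to another platform.
store.revoke("social_app")
store.grant("other_app")
```

After the last two lines the photo has not moved anywhere; only the list of who may read it has changed.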

How can you make them give your data back?

The fact that you want to buy a sofa, TV or a chocolate bar is a valuable piece of information to the people who sell those things, not because of the value of your sale but because of the future sales these companies will make due to a deeper understanding of their customers. It is possible that you could share that information and have companies fight over your sale in the form of discounts or benefits in kind, on condition that you can have your data back at any point if you want it. Companies like Invisibly, started by Jim McKelvey (co-founder of Square), are experimenting with this at the moment.

The likes of Google, YouTube and Facebook have shown how valuable our data is to them by the sheer quality and scale of the ‘free’ products they offer us to harvest that information. The internet is now bubbling with decentralised apps ready to leverage better ways of sharing our data by building trust between individuals and organisations on a more level playing field.


The same data used to predict the likelihood of a person getting cancer can be used by health professionals to provide better proactive care, or by unscrupulous health insurance companies to suspend health cover before they become liable to pay for it.

To opt out of sharing health data, or of using loyalty or bank cards, because there may be a bad actor out there is to ignore the main issue, which is that we need more robust data privacy protections if we want to live in a modern world and take advantage of all that involves.

It will be hard, but the juice of organisations striving to be trusted by their customers is worth the squeeze of setting up an infrastructure that enables customers to take their data away from negligent, corrupt or greedy organisations. However, without an active body of individuals and government officials striving to guide companies, that infrastructure will never materialise.


How Target Figured Out A Teen Girl Was Pregnant Before Her Father Did

Tim Berners-Lee about Inrupt – turning the web right side up

Home · Solid

A new era of innovation and trust in data | Inrupt

Are you making data driven mistakes?

The age of big data means we are all being encouraged to tell stories with data and compile numbers to show we are data driven. Clever dashboards and visualisations show us needles that need to be pushed, numbers that need to go green and curves that need to continually trend up. Much of this means we don’t need to think about why the numbers must go green – but go green they must.

I still feel a bit like a deer in headlights when I am asked to comment on data for the first time in meetings and haven’t had time to think about it. It feels a lot like brainstorming where I feel compelled to say either vague things confidently or stupid things because I haven’t had time to think them through. In this article I will look at some common data driven mistakes followed by some questions I use to sniff out potential problems that will lead to bad decisions.

Using only quantitative data to make your decisions

Robert McNamara was the US Secretary of Defense during the Vietnam War and modeled what success would look like based only on quantitative data. His plan sounds spookily familiar: he created clear objectives and achievable goals in the form of metrics so success could be reported on and easily understood by people not working in the war department. McNamara said that all the important quantitative measures indicated that they were winning the war, despite what his generals were telling him. Daniel Yankelovich summarised the quantitative fallacy in 1972 like this:

The first step is to measure whatever can be easily measured. This is OK as far as it goes. The second step is to disregard that which can’t be easily measured or to give it an arbitrary quantitative value. This is artificial and misleading. The third step is to presume that what can’t be measured easily really isn’t important. This is blindness. The fourth step is to say that what can’t be easily measured really doesn’t exist. This is suicide.

McNamara had put too much faith in the data and had not factored in many variables that were hard or impossible to measure like the resilience of a soldier fighting for his home rather than his president.

This happens all the time when too much blind faith is put into a system, metric or some new technology. McNamara also believed that learning technologies could be used to make people smarter and as a result lowered the IQ requirement for the draft to 80, a policy subtly alluded to in the movie Forrest Gump.

None of the data driven decisions I make will ever have the impact of these mistakes, but I strive to learn from history.

Looking for insight within the data you have rather than the data you need

Like the old joke where a drunk man is looking for his keys under a lamppost rather than where he lost them, in corporate training it is common to use happy sheets, NPS and attendance rates to measure the effectiveness of a program. We know we should be measuring knowledge transfer and performance improvement after the training, but it is hard to measure so often we don’t. After all, by looking under the lamppost the drunk was at least eliminating that spot as the place where he had lost his keys!

Using data like a drunk uses a lamp post – for support rather than illumination

I recently spoke to a training manager who had just done a successful presentation about return on investment for a new training system he had implemented. He used a data driven forecast of how many people would be using his training system next year based on uptake this year. He had proven his point and everybody loved the visualisations. However, when he looked at the numbers again over a longer timeframe, he was quite surprised to see he would have double the population of Australia using his corporate training system within 5 years which seemed unlikely.
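
A quick sanity check like the one below would have caught the problem before the presentation. The numbers are hypothetical (I don’t know the manager’s real figures): I assume uptake went from 500 to 5,000 users in the first year and naively apply that same tenfold growth every year, and I use a rough 26 million for the population of Australia purely for scale.

```python
AUSTRALIA_POPULATION = 26_000_000  # rough figure, for scale only


def extrapolate(users_now, yearly_growth_factor, years):
    """Naively compound this year's growth factor forward."""
    return [users_now * yearly_growth_factor ** y for y in range(years + 1)]


# Hypothetical uptake: 500 users grew to 5,000 in the first year (10x).
forecast = extrapolate(users_now=5_000, yearly_growth_factor=10, years=4)

for year, users in enumerate(forecast):
    flag = "  <-- more than the whole country" if users > AUSTRALIA_POPULATION else ""
    print(f"year +{year}: {users:>12,}{flag}")
```

Under these assumptions the forecast passes the population of Australia in the fourth extra year, which is a strong hint that early growth rates cannot simply be carried forward.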

How to talk back to the data

Darrell Huff, in his 1954 book ‘How to Lie with Statistics’, outlines five questions you should ask about data which are still relevant today. Applying them in retrospect is easy and is a gross simplification of a complex subject, but these are great questions to ask when presented with any data.

Going back to the Vietnam example of a data driven decision I am going to apply the five questions to it:

All quantitative metrics indicate we are winning the war and therefore we should continue until we are victorious.  

  1. Who says so? The US Secretary of Defense. OK, there is a chance of bias
  2. How do they know? They are using quantitative data, metrics like body count and boots on the ground, and comparing them to the enemy
  3. What’s missing? Qualitative data from his generals, and the fact that the US had never fought a conflict like this
  4. Did somebody change the subject? A fact has been stated and a conclusion made, but is the conclusion related to the fact? Do those metrics indicate that we are winning the war?
  5. Does it make sense? We have been winning this war for nineteen years, so how come we haven’t won it yet?

H G Wells said that ‘One day statistical thinking will be as necessary for efficient citizenship as the ability to read and write’. Almost one hundred years later I think that day has come. Nice one, Bertie!

At twelve years old I got two percent in my maths exam, which you get for putting both your first and last name on the paper. With a huge amount of work, I went on to do a degree in accountancy and statistics, but number analysis still doesn’t come to me easily. This article is the first of a series that will explore ways to look at data in a practical, fun way that I hope will help you (and me) use data better.