Managing ethics in the age of big data

The Big Data revolution has raised a myriad of ethical issues related to privacy, confidentiality, transparency and identity. Who owns all that data that you’re analyzing? Are there limits to what kinds of inferences you can make, or what decisions can be made about people based on those inferences?

Navigating the fast-paced world of data isn’t easy. So it’s important that we put a good framework in place because the consequences of a slip-up can be severe.

Andrew Fryer, a Technical Evangelist at Microsoft, talks you through some common ethical scenarios. He discusses good and bad usages of data, teaching you how to tell the difference. Finally, he gives you three things you should be doing to manage data ethically in your company.

Watch the video of him explaining it (from Extract London 2015) and then read the article for more in-depth insights.


Just because you can, doesn’t mean you should

Ethics. What does it really mean? How do we define it in terms of the web and data?

We could define it in terms of the law, but I think it’s fair to say that the law in the UK (or indeed any country) generally lags behind technology.

The other issue with basing our ethical actions on the law is the concept of case law. When a law is passed, say a dangerous dog act, we still have to have a conversation about what a dangerous dog is. That question gets tested in court and a precedent is set. So by the time a law can be applied in practice, it is even further behind where it needs to be than when it was passed in the first place.

Which means that if we want to act ethically, we can’t rely purely on the law to guide our actions.

Ethical dilemmas in data

Historically, ethical problems with data arise when we start mashing data together.

Let’s say that a robot makes the brakes in my car. And then the car has an accident in which the brakes are shown to be at fault. Because we kept the data from the robot, we can go back and work out which robot made the faulty brakes and do a selective recall on those cars. We can react quickly, get the proper brakes fitted and hopefully reclaim our brand reputation. And because we’re only doing a selective recall we save ourselves a lot of money.
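To make the recall idea concrete, here is a minimal sketch of the lookup. It assumes a production log that records which robot made each brake unit and which car it was fitted to. The field names (`brake_id`, `robot_id`, `vin`), the sample records and the `cars_to_recall` function are all hypothetical, invented for illustration rather than taken from any real manufacturer's system.

```python
# Hypothetical production log: each record links a brake unit to the robot
# that made it and to the car (identified by VIN) it was fitted to.
PRODUCTION_LOG = [
    {"brake_id": "B-001", "robot_id": "R-1", "vin": "VIN-100"},
    {"brake_id": "B-002", "robot_id": "R-2", "vin": "VIN-101"},
    {"brake_id": "B-003", "robot_id": "R-1", "vin": "VIN-102"},
    {"brake_id": "B-004", "robot_id": "R-3", "vin": "VIN-103"},
]


def cars_to_recall(production_log, faulty_brake_ids):
    """Return the sorted VINs of every car fitted with brakes made by a
    robot that produced at least one known-faulty brake."""
    faulty = set(faulty_brake_ids)
    # Step 1: work out which robots made the brakes shown to be at fault.
    suspect_robots = {
        record["robot_id"]
        for record in production_log
        if record["brake_id"] in faulty
    }
    # Step 2: select only the cars whose brakes came from those robots.
    return sorted(
        record["vin"]
        for record in production_log
        if record["robot_id"] in suspect_robots
    )


print(cars_to_recall(PRODUCTION_LOG, ["B-001"]))  # ['VIN-100', 'VIN-102']
```

Only two of the four cars come back, which is the whole point of a selective recall: the data is used for the purpose it was kept for, and the customer benefits.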

I think we can all agree that that sounds like a clever and appropriate use of data. But if we take the same scenario and tweak it slightly, it becomes much harder to defend.

Let’s say a typical car’s brake discs last for 20,000 miles. Then one day the brake manufacturers discover that some cars need to have their brakes replaced after 10,000 miles. They sell that data on to an insurance company, which thinks that maybe the reason for the early brake change is that these people are bad drivers. So the insurance company decides to raise the premiums of anyone whose brakes don’t last 20,000 miles. Not good.

We could spend ages debating who is at fault or whether or not a law was technically broken, but I think we can all agree that the practice is unethical.

Then of course there is stuff that’s downright illegal. Take store cards for example. You could theorize that people who buy the same stuff will like each other, and you could create a dating site off the back of that. It could ping you when you buy Jalfrezi to say that there is a girl in the next aisle and she likes it too.

That would be illegal in the UK because the data on those store cards was collected for the specific purpose those people signed up for, not for a dating app.
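One way to keep that principle in front of your analysts is to record, next to each dataset, the purposes people actually signed up for, and to check every new idea against that list. The sketch below is only an illustration of the idea, not a statement of what the law requires. The registry, the dataset name and the purpose labels are all hypothetical.

```python
# Hypothetical registry: for each dataset, the purposes customers agreed to
# when they signed up. The names here are invented for illustration.
CONSENTED_PURPOSES = {
    "store_card_purchases": {"loyalty_rewards", "stock_planning"},
}


def is_use_permitted(dataset, proposed_purpose, registry=CONSENTED_PURPOSES):
    """Return True only if the proposed purpose is one that customers
    agreed to for this dataset. Unknown datasets are refused by default."""
    return proposed_purpose in registry.get(dataset, set())


print(is_use_permitted("store_card_purchases", "loyalty_rewards"))  # True
print(is_use_permitted("store_card_purchases", "dating_matching"))  # False
```

Refusing by default matters here: a dataset nobody has documented should not be treated as free to use.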

Ethical risks

Unfortunately, users are accident-prone. They’ll sign up for any old service or app because it’s useful to them. But they won’t read the small print until one day they are horrified that their social media site has sold their private holiday photos to a travel agent who is using them to promote a vacation package. That type of thing tends to make people very angry.

As a company using data, you need to ask yourself: What would happen if my analytics got out in the wild?

Here are two stories from UK department store Marks and Spencer (M&S), both of which were in the press, that illustrate good and bad uses of data.

Good use of data

When you go to the store to buy a t-shirt you want them to have the right size. But how do they know how many of each size to carry? M&S created a bell curve of the most popular sizes in stores across the country, and it turned out that in Liverpool they were selling stuff on average two sizes bigger than in London. When the press got a hold of this information, which M&S shared with them, they ran with: “Marks and Spencer thinks people in Liverpool are fat”.

In this scenario, M&S didn’t do anything wrong, other than maybe have a bad PR team. They didn’t break any laws and they were genuinely doing this data analysis to help their customers get the right item when they went to the store.
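For a feel of what that kind of analysis involves, here is a minimal sketch that works out the share of each size sold per city from aggregated sales alone, with no customer identities involved. The sales records are invented for illustration and are not M&S figures, and `size_shares_by_city` is a hypothetical helper.

```python
from collections import Counter, defaultdict

# Invented (city, size) sales records, for illustration only.
SALES = [
    ("London", "M"), ("London", "M"), ("London", "S"), ("London", "L"),
    ("Liverpool", "L"), ("Liverpool", "XL"), ("Liverpool", "L"), ("Liverpool", "M"),
]


def size_shares_by_city(sales):
    """Return, for each city, the fraction of sales accounted for by each size."""
    counts = defaultdict(Counter)
    for city, size in sales:
        counts[city][size] += 1
    return {
        city: {size: n / sum(sizes.values()) for size, n in sizes.items()}
        for city, sizes in counts.items()
    }


shares = size_shares_by_city(SALES)
print(shares["London"]["M"])     # 0.5
print(shares["Liverpool"]["L"])  # 0.5
```

A store could use shares like these to decide how many of each size to stock locally, which is the customer-facing benefit described above.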

Bad use of data

Every Christmas you get an email from your M&S loyalty card telling you your wife’s size. That scenario is fraught with danger in more ways than one. And what irks me most about this scheme is that M&S is doing this out of their own self-interest. It’s not really helping the customer in any way; it’s just trying to encourage them to buy more stuff.

Managing ethics

The first thing to do is understand the potential risks yourself. Think critically about what it is your business is doing with data and ask yourself: who benefits from it? Is it you? Or is it your customers? Or both?

The second step is to educate your users so that they understand the risks of giving away their data in the first place. It’s important to have conversations with your users about this stuff. Some of it will be obvious, but other stuff might be a little scary for them and you as a company have to be able to assure users that you won’t use their data in ways they aren’t comfortable with.

Finally, develop a code of conduct for your business and teach it to all of your employees. At Microsoft, we have mandatory training every year on how to access data and how to mash it together. If you’ve got technology that can protect your data, use it and use it well.

About the author

Andrew Fryer is the Technical Evangelist at Microsoft in the UK. He specializes in Data Management and the private cloud.

