Getting To Know Your Data?
Organisation crown jewels
Data is now the lifeblood of modern society. The rapid development of business technology is a revolution that facilitates the processing and analysis of more data, in richer and more complex forms. The outcomes for commercial organisations and Government Departments from this work inform medium and longer-term decisions about policy and services.
At the heart of this revolution is the collation of huge amounts of data – commercial, personal, business, intellectual – the crown jewels of the organisation. There are massive benefits from this to organisations and societies but also dangers. For a host of reasons including processing capability, efficiency, management, access and of course protection, data tends to get centralised in internally managed infrastructure and systems or externally hosted environments that do give legal and security teams some very difficult headaches. Big data equals big juicy target. Lots of data equals identification and management issues. Many different data sets equal different content, different legal and regulatory implications. So how can organisations deal with these issues in a way that drives a consistent level of data protection?
The data value issue
The Office for National Statistics is a huge consumer of data. We obtain open source, commercial, market sensitive and personal information. We use these sources to produce official statistics that are used to shape Government and local services across decades. We also want to explore new data sources and produce more detailed, granular statistics and analysis, including exploratory work for statistics such as migration, trade, education, and enhanced financial accounts. The range of data is enormous. It also poses challenges. Is open source, anonymised licensed data the same as geolocation mobile phone data? Is private taxpayer data the same as property data available freely on the Internet? Is there a difference in how these are processed separately or together?
As a Government department, our data sets have designated owners. Each Information Asset Owner (IAO) is a real person who is responsible for specific data sets. It is this owner who determines the appropriate use of the data. Historically this use has been reviewed in a ‘silo’ way where the data is generally used in isolation from other data sets or linked with other data using highly controlled methods. That is the old world, though. The new world sees ONS undergoing significant change within its statistical environment to meet the challenges of new data sources, technologies, statistical production and the Digital Economy Act. At its core, this results in a single, organisation-wide platform that contains multiple data sets that can be linked and matched for statistical purposes. The silo approach to assessing data set content and determining its business use is no longer valid in this new world. ONS needed a more tangible approach that enabled improved understanding of the actual content of a data set and how it can be linked and matched with other data sets. Importantly, this new approach also catered for the perspectives of security and regulatory issues, including supplier agreements on the access and use of data they supply. Typical questions arising from this were:
• Where is the data?
• Who can access it?
• Can it be shared?
• How should it be managed?
• What compliance is needed?
These are all issues that affect any business dealing with data. ONS has found a way to deal with them using a simple but very powerful technique.
A model for data sensitivity?
If a magic wand were to be waved then when an organisation obtained data it would know a lot about it, particularly what its content was, how sensitive it was, how it could be used and what controls were needed to protect it. It so happens that these are burning issues with modern data protection legislation and the General Data Protection Regulation.
Over a period of several months, ONS built a model. For this model to work it had to deal with a huge variety of data sets, it had to apply organisation-wide, and it certainly needed to be consistent. It also needed to recognise that the data has a ‘value’ based on its content that should provide an indication of its protection requirements. Finally, it had to be a repeatable method that could stand the test of time as data evolves. As a shopping list, it was a pretty big one.
What emerged was the Data Sensitivity Model. This utilises two key concepts: descriptive criteria that show the makeup of the data through seven lenses; characteristics that describe the range within the lens from effectively not risky to risky.
The descriptive criteria and characteristics are summarised in the following table.
In the ONS world, the owner is responsible for a sensitivity assessment of their data set with support from security and data specialists. This is underpinned by a tool that enables the selection of characteristics for each descriptor to generate a score that is translated into a simple Red, Amber or Green rating. Subsequent management and use of the data set is then based on its rating.
Low sensitivity (Green). This applies to open source and non-disclosive data that can be shared across ONS for statistical research purposes. Data agreements typically associated with this allow full access by ONS employees with an approved business need;
Medium sensitivity (Amber). This applies to data that is commercially sensitive, market sensitive or contains attributes that could be used to identify sensitive information relating to individuals or groups of individuals. Data agreements typically associated with this allow for some access by ONS employees with an approved business need;
High sensitivity (Red). This applies to data that contains significant aggregate information relating to individuals, groups or enterprises. Data agreements typically associated with this provide conditions for access by ONS employees with an approved business need.
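The scoring step described above, where a characteristic is selected for each descriptive criterion and the total is translated into a Red, Amber or Green rating, might be sketched as follows. This is an illustration only and not the ONS tool: the criterion names are placeholders, and the 0 to 3 scale and the two thresholds are assumptions made purely so the example runs.

```python
from enum import Enum
from typing import Dict, Tuple


class Rating(Enum):
    GREEN = "Green"
    AMBER = "Amber"
    RED = "Red"


# Placeholder names for the seven descriptive criteria (hypothetical).
CRITERIA = tuple("criterion_{}".format(i) for i in range(1, 8))

# Assumed scale: each characteristic scores 0 (not risky) to 3 (risky).
MAX_PER_CRITERION = 3

# Assumed thresholds on the total score; the real cut-offs are not given here.
AMBER_THRESHOLD = 7
RED_THRESHOLD = 14


def rate(selections: Dict[str, int]) -> Tuple[int, Rating]:
    """Sum the selected characteristic scores and map the total to a rating."""
    missing = set(CRITERIA) - set(selections)
    if missing:
        raise ValueError("No characteristic selected for: {}".format(sorted(missing)))
    for name in CRITERIA:
        if not 0 <= selections[name] <= MAX_PER_CRITERION:
            raise ValueError("Score out of range for {}".format(name))

    total = sum(selections[name] for name in CRITERIA)
    if total >= RED_THRESHOLD:
        return total, Rating.RED
    if total >= AMBER_THRESHOLD:
        return total, Rating.AMBER
    return total, Rating.GREEN


# Example: an owner selects the least risky characteristic for every criterion.
open_data = {name: 0 for name in CRITERIA}
open_total, open_rating = rate(open_data)

# Example: an owner selects the most risky characteristic for every criterion.
risky_data = {name: MAX_PER_CRITERION for name in CRITERIA}
risky_total, risky_rating = rate(risky_data)
```

Because every data set answers the same seven questions on the same scale, two owners assessing similar data should arrive at the same rating, which is the consistency the model is after.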
Running data through the model for three typical data sets within the Office generates the following outputs.
What have we got from this?
Consistency, consistency, consistency. Every data set has a rating, and everybody has a guide on how it should be managed from the security perspective. This has helped us understand data set content. And as each specific data set is assessed, it also helps identify those that can be combined for greater analysis and those that should have restrictions for sensitivity, security or data partner reasons.
The implementation of the model has:
• Enabled a measured judgement of the sensitivity of the content of a dataset;
• Improved understanding and management of data set information within the IAO, security and business communities;
• Enabled consistent assessment of potential individual and aggregated data combining a range of sensitivities;
• Provided a basis for protecting data in relation to its content in a consistent manner;
• Enabled the development of a data set ‘matrix’ highlighting data that can be aggregated and data that cannot.
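To make the idea of a data set ‘matrix’ concrete, here is a minimal sketch of a pairwise table recording which data sets may be combined. It rests on an assumption made only for illustration: that a pair can be aggregated unless it appears in an explicit list of prohibited pairs, for example because of a supplier agreement. The data set names and the `build_matrix` helper are hypothetical, and the rule ONS actually applies may well differ.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set


def build_matrix(
    datasets: Iterable[str],
    prohibited: Set[FrozenSet[str]],
) -> Dict[str, Dict[str, bool]]:
    """Return matrix[a][b] == True when data sets a and b may be combined."""
    names = sorted(datasets)
    matrix = {a: {} for a in names}  # type: Dict[str, Dict[str, bool]]
    for a, b in combinations(names, 2):
        # Assumed rule: allowed unless the pair is explicitly prohibited.
        allowed = frozenset((a, b)) not in prohibited
        matrix[a][b] = allowed
        matrix[b][a] = allowed  # the relationship is symmetric
    return matrix


# Hypothetical data sets and one hypothetical prohibited pairing.
datasets = ["dataset_a", "dataset_b", "dataset_c"]
prohibited = {frozenset(("dataset_b", "dataset_c"))}

matrix = build_matrix(datasets, prohibited)
```

Looking up `matrix["dataset_b"]["dataset_c"]` then gives an immediate yes or no for that pairing, without anyone having to re-read the underlying agreements.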
For ONS it has been a real win-win. Better understanding, better control, better compliance.