How To Deal With Data Sprawl In Your Organization

by Bill Shaouy
March 8, 2018

From healthcare to employee intranets, to financial services and non-profits, one of the most common challenges a growing organization faces is data sprawl. This is the phenomenon where data that should be located in one place sprawls across multiple applications and documents. For example, an organization may write part of its customer data (name, address, etc.) in a sales database application, part in an accounting spreadsheet, and part in a project dossier document. Worse, it is not uncommon for the same field (e.g. address) to appear in more than one of these places.

[Image: fields entered redundantly in a sales database]

What are the dangers of data sprawl?

This redundancy leads to inefficiencies and sometimes even human error. Multiply all this by the typically large number of data fields an organization can accumulate for a customer, among many other entities, and the organization has a big challenge on its hands.

General Data Protection Regulation (GDPR) compliance is another critical consideration. GDPR is an EU regulation that grants a broad set of rights protecting the personal data of individuals in the EU. It takes effect in May 2018, with fines of up to 4% of worldwide annual revenue for the most serious violations. Research by Citrix cites data sprawl as one of the major obstacles to GDPR compliance.

How can we manage these risks?

The following steps can be taken to mitigate data sprawl. It's a good idea to set up this effort as a project in a project tracking system such as Jira, Trello, or Asana.

Step 1. Create a document listing all data items currently being captured across all applications, spreadsheets, and documents. A spreadsheet works well but is not a must.

[Image: a spreadsheet mitigates data sprawl]

Step 2. Add a column to the document from step 1 that lists the multiple possible names for any data item. For example, a data item labeled "Address" in one app can be labeled "Addr." in another.
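
If the inventory from Steps 1 and 2 is kept in a form a script can read, it might look something like the sketch below. This is only an illustration: the `DataItem` class, the `build_inventory` helper, and the source names (`sales_db`, `accounting_sheet`) are hypothetical, and a plain spreadsheet serves the same purpose.

```python
from dataclasses import dataclass, field
from typing import Set

@dataclass
class DataItem:
    """One row of the inventory document from Steps 1 and 2 (hypothetical layout)."""
    official_name: str
    aliases: Set[str] = field(default_factory=set)     # other labels seen for this item
    written_in: Set[str] = field(default_factory=set)  # apps/sheets/docs where it is entered

def build_inventory(observations):
    """Group (source, label, canonical_name) observations into one DataItem per name."""
    inventory = {}
    for source, label, canonical in observations:
        item = inventory.setdefault(canonical, DataItem(official_name=canonical))
        item.written_in.add(source)
        if label != canonical:
            item.aliases.add(label)
    return inventory

# Made-up observations: where each label was found and what it really refers to.
observations = [
    ("sales_db", "Address", "Address"),
    ("accounting_sheet", "Addr.", "Address"),
    ("sales_db", "Name", "Name"),
]
inventory = build_inventory(observations)
```

Here `inventory["Address"]` ends up with the alias "Addr." and two places where it is written, which is exactly the kind of row that deserves attention in the later steps.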

Step 3. Work with your organization's stakeholders to prioritize the highest-value data items to simplify. Simplification can mean assigning only one writer of the data item or even deleting the data item. The team should also record which stakeholders benefit from each data item.

Then, for each high-priority data item, determine whether there is more than one app/sheet/doc that the data item is manually written to. If so:

Step 4.1 Determine with stakeholders which app/sheet/doc should be the only one that the data item is written to. The rest should only read from that source.

Step 4.2 Work with stakeholders to create the action items that will get the data item to the desired state – one app/sheet/doc that gets the manual write, the rest automatically reading from that source.
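
The multiple-writer check and the desired end state of Steps 4.1 and 4.2 can be sketched roughly as follows. The function name `plan_single_writer` and the source names are assumptions made for illustration; the choice of writer still comes from the stakeholders, not from the code.

```python
def plan_single_writer(written_in, chosen_writer):
    """For each data item written in several places, keep one writer and
    turn every other app/sheet/doc into a reader of that source."""
    plan = {}
    for item, sources in written_in.items():
        if len(sources) < 2:
            continue  # only one place writes this item, nothing to change
        writer = chosen_writer[item]  # decided with stakeholders (Step 4.1)
        if writer not in sources:
            raise ValueError(f"{writer!r} does not currently hold {item!r}")
        plan[item] = {"writer": writer, "readers": sorted(sources - {writer})}
    return plan

# Hypothetical inventory: data item -> places where it is manually entered.
written_in = {
    "Address": {"sales_db", "accounting_sheet", "project_dossier"},
    "Name": {"sales_db"},
}
plan = plan_single_writer(written_in, chosen_writer={"Address": "sales_db"})
print(plan)
# {'Address': {'writer': 'sales_db', 'readers': ['accounting_sheet', 'project_dossier']}}
```

Each entry in `plan` maps naturally to action items: one per reader that needs to stop accepting manual entry and start pulling from the writer.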

Step 4.3 Determine whether the data item has a different name across apps/sheets/docs. If so, determine with stakeholders what the official name should be, and the action items needed to modify the apps/sheets/docs to use the official name. Further, define naming conventions with stakeholders, for example, "always use 'University', not 'U' or 'Univ'". Place the results in an official naming conventions reference document.
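
One way to make the naming conventions from Step 4.3 actionable is to keep them as a simple lookup table and derive the list of renames from it. The sketch below assumes a hypothetical `NAMING_CONVENTIONS` mapping and made-up source names; the real rules belong in the reference document agreed on with stakeholders.

```python
# Hypothetical conventions: label as found -> official name.
NAMING_CONVENTIONS = {
    "U": "University",
    "Univ": "University",
    "Addr.": "Address",
}

def to_official_name(label, conventions=NAMING_CONVENTIONS):
    """Return the official name for a label, or the trimmed label if no rule applies."""
    label = label.strip()
    return conventions.get(label, label)

def rename_plan(labels_by_source, conventions=NAMING_CONVENTIONS):
    """List the (source, old_label, official_name) renames each app/sheet/doc needs."""
    return [
        (source, label, to_official_name(label, conventions))
        for source, labels in labels_by_source.items()
        for label in labels
        if to_official_name(label, conventions) != label
    ]

labels_by_source = {
    "sales_db": ["Address", "Univ"],
    "accounting_sheet": ["Addr.", "University"],
}
print(rename_plan(labels_by_source))
# [('sales_db', 'Univ', 'University'), ('accounting_sheet', 'Addr.', 'Address')]
```

Every tuple in the result is a candidate task for the project ticket.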

Step 4.4 Create a project ticket, including:

  • Tasks as defined in the previous steps
  • Acceptance criteria
  • The final arbiter, i.e. the stakeholder who is most impacted by the change

Step 4.5 Execute the project ticket.

Step 4.6 With stakeholders, perform an impact assessment on the change. Confirm that automatic data reads are working properly and that stakeholders have been notified of any name changes.
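
Part of the check in Step 4.6 can be automated by comparing each reader's copy of a data item against the single source. The following is a minimal sketch under the assumption that values can be exported as dictionaries keyed by a record ID; `find_stale_reads` and the sample data are hypothetical.

```python
def find_stale_reads(source_values, reader_values):
    """Compare each reader's copy of a data item with the single source of truth.

    Returns a list of (reader, record_id, expected, actual) mismatches;
    a missing record shows up with actual=None.
    """
    mismatches = []
    for reader, values in reader_values.items():
        for record_id, expected in source_values.items():
            actual = values.get(record_id)
            if actual != expected:
                mismatches.append((reader, record_id, expected, actual))
    return mismatches

# Made-up export of the "Address" item from the writer and from two readers.
source = {"cust-1": "12 Main St", "cust-2": "9 Oak Ave"}
readers = {
    "accounting_sheet": {"cust-1": "12 Main St", "cust-2": "9 Oak Ave"},
    "project_dossier": {"cust-1": "12 Main St", "cust-2": "9 Oak Avenue"},
}
print(find_stale_reads(source, readers))
# [('project_dossier', 'cust-2', '9 Oak Ave', '9 Oak Avenue')]
```

An empty result suggests the automatic reads are in sync; any mismatch is worth raising with the stakeholders during the impact assessment.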

Step 4.7 If this is the first iteration, perform a retrospective with stakeholders and tweak the process if need be.

Step 4.8 Go back to Step 4.1 and repeat for each remaining high-priority data item.

The above steps will go a long way to getting better control over your data. Keep in mind that while a data item can be redundantly written between documents and applications, it can sometimes be redundantly written within a document or application as well.
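
Redundancy inside a single document can be spotted with the same inventory idea. As a rough sketch, assuming a hypothetical alias table and a list of column headers from one spreadsheet, the helper below groups headers that refer to the same data item:

```python
from collections import defaultdict

def duplicate_fields_within(columns, aliases):
    """Group the column labels of one sheet by official name and
    return only the names that appear under more than one label."""
    groups = defaultdict(list)
    for column in columns:
        groups[aliases.get(column, column)].append(column)
    return {name: cols for name, cols in groups.items() if len(cols) > 1}

# Made-up headers from one sheet, plus a hypothetical alias table.
columns = ["Name", "Address", "Phone", "Addr."]
aliases = {"Addr.": "Address"}
print(duplicate_fields_within(columns, aliases))
# {'Address': ['Address', 'Addr.']}
```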

If you’d like to learn more about managing data sprawl in your organization, please contact us. At Mediacurrent, we've invested in Data and Business Analytics personnel and processes to make digital transformation a reality for our clients. We’re happy to chat with you!

Meet team member, Bill Shaouy

Bill is a senior technical professional who has been working with Drupal for over ten years. He has innovated client-centered, Drupal-based solutions for non-profit and for-profit organizations, placing a premium on fostering lasting relationships with clients and teammates in equal measure.

Bill was first introduced to Drupal in 2007, when he served as I/T Architect and Development Lead for the DC Comics Zuda web content management system. Since then, Bill has led Drupal projects in both the profit and nonprofit space, for such organizations as the State of Georgia, Jane Addams Hull House, the Mohonk Nature Preserve, the New York Hall of Science, Mentorplace, the World Bank, and most recently, the Atlanta Falcons. He also served as one of the original board members of the Atlanta Drupal User’s Group, giving multiple presentations there and contributing to the running of the early Atlanta Drupalcamps, and was the founder and longtime chairman of the IBM Drupal User's Group. Bill has also conducted many technology roadmap consulting engagements over the past several years, many of them in the nonprofit space. These engagements helped clients form a long-term technology strategy based on current and projected requirements.

When he’s not working, Bill is an active musician and songwriter, with two albums of his own and several more with him accompanying on piano.
