# The Corporate view on Data




# Where to start?

I often hear the term “data-driven company” thrown around like there is no tomorrow.
But what exactly is it supposed to mean?

Data-driven decision making (DDDM) is the process of making organizational decisions based on actual data rather than intuition or observation alone.
Northeastern University

OK, so the term “data-driven company” is shorthand for a company that practices data-driven decision-making.
To make a decision, we first need a question, right?

As it turns out, and as I found in this article from Forbes magazine, this problem is not new.

As early as the 1960s, in her landmark study of the Pearl Harbor attack, Roberta Wohlstetter argued that the Japanese attack succeeded because of an overabundance of data: “At the time of Pearl Harbor the circumstances of collection in the sense of access to a huge variety of data were…close to ideal.” Problems arose, not from too little information, but from too much, and from the inability to glean useful “information” from mere “data.”

The more data a company has, the lower its signal-to-noise ratio (SNR) tends to become.
Therefore, the company has to do more processing to extract a useful “signal” from it.
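In signal processing, SNR is usually defined as the power of the signal divided by the power of the noise, often expressed in decibels. In a company the “noise” is not a physical quantity, so take this only as an analogy. Still, a minimal Python sketch of the textbook definition, with made-up sample values, shows how stronger noise drags the ratio down:

```python
import math


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise).

    The power P of a sequence is taken as the mean of its squared samples.
    """
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    return 10 * math.log10(p_signal / p_noise)


# Hypothetical samples, chosen only for illustration.
signal = [1.0, -1.0, 1.0, -1.0]
weak_noise = [0.1, -0.1, 0.1, -0.1]
strong_noise = [0.5, -0.5, 0.5, -0.5]

print(round(snr_db(signal, weak_noise), 1))    # 20.0
print(round(snr_db(signal, strong_noise), 1))  # 6.0
```

The signal itself did not change between the two calls; only the amount of noise around it did, and the ratio dropped from 20 dB to about 6 dB.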

# Data Definition


Northeastern University points out how to become a DDDM company:

  1. Know your mission
  2. Identify data sources
  3. Clean and organize data
  4. Perform statistical analysis
  5. Draw conclusions
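To make the five steps tangible, here is a minimal Python sketch that maps each step onto a function. The mission question, the sensor names, and the readings are all hypothetical; in a real company, step 2 would point at actual systems rather than a hard-coded dictionary.

```python
from statistics import mean

# Step 1, know your mission (hypothetical question):
# "Which of our sensors reports the highest average temperature?"

# Step 2, identify data sources: a hard-coded stand-in for real systems.
RAW_SOURCES = {
    "sensor_a": ["21.5", "22.0", "n/a", "21.8"],
    "sensor_b": ["24.1", "", "23.9", "24.3"],
}


def clean(raw_values):
    """Step 3, clean and organize: keep only values that parse as numbers."""
    cleaned = []
    for value in raw_values:
        try:
            cleaned.append(float(value))
        except ValueError:
            continue  # drop unparseable entries such as "n/a" or ""
    return cleaned


def analyze(sources):
    """Step 4, statistical analysis: the mean reading per source."""
    return {name: mean(clean(raw)) for name, raw in sources.items()}


def conclude(stats):
    """Step 5, draw conclusions: answer the mission question."""
    return max(stats, key=stats.get)


print(conclude(analyze(RAW_SOURCES)))  # sensor_b
```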

This sounds reasonable. We already found that most companies struggle with step 1, but I am also troubled by step 4: perform statistical analysis.

Why would you do that? What conclusion should I draw from a statistical analysis of Fluffy the cat's body temperature?

Now I wonder whether I really have the same understanding of the term “data” as whoever wrote these smart articles.
What if there are two ways of thinking about data?

# Technical Data


This data is mostly used by technical personnel performing automation, controlling, or auditing tasks. Errors in this kind of data have direct or indirect consequences.
Those consequences are expensive, so such errors should be avoided at all costs.

# Economic Data


This data is used by so-called data engineers, data scientists, or data analysts.
It consists of approximations derived from technical data through statistical analysis, so a few errors are not the end of the world.
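Back to Fluffy: a minimal Python sketch of how the two views treat the same readings. The temperature values and the alert threshold are invented for illustration (the threshold is an assumption, not a veterinary reference value). The technical view checks every single value, while the economic view summarizes them, and here the summary hides exactly the reading that matters.

```python
from statistics import mean

# Hypothetical readings of Fluffy's body temperature in °C.
READINGS = [38.5, 38.6, 38.4, 40.1, 38.5]

# Assumed alert threshold for this sketch only.
FEVER_THRESHOLD_C = 39.5


def technical_view(readings, threshold):
    """Technical data: every value counts; return the readings that need action."""
    return [r for r in readings if r > threshold]


def economic_view(readings):
    """Economic data: an approximation (here the mean) over many values."""
    return mean(readings)


alerts = technical_view(READINGS, FEVER_THRESHOLD_C)
average = economic_view(READINGS)
print(f"alerts: {alerts}")            # alerts: [40.1]
print(f"average: {average:.2f} °C")   # average: 38.82 °C
```

The average looks perfectly healthy, yet one reading says Fluffy needs a vet right now.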

Let's keep this in mind and look at the list again.
Step 2 is to identify the data sources.

# Assessment


It is essential to assess what data a company has, much like the asset management most companies are already familiar with.
The company first needs to know what data is stored where.
Who owns it, who is responsible for it, and what liabilities are attached to it?
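These questions map naturally onto a simple inventory record. Here is a minimal Python sketch, assuming a hypothetical `DataAsset` structure whose fields mirror the questions above; the example entries (including GDPR as an attached liability) are illustrative and not taken from a real company.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataAsset:
    """One entry in a hypothetical data inventory (field names are illustrative)."""
    name: str
    location: str                       # where the data is stored
    owner: Optional[str] = None         # who owns it; None means nobody knows yet
    responsible: Optional[str] = None   # who looks after it day to day
    liabilities: List[str] = field(default_factory=list)  # obligations attached to it


INVENTORY = [
    DataAsset("customer_master", "CRM database", "Sales", "IT Operations", ["GDPR"]),
    DataAsset("machine_logs", "file share"),
]


def unowned(inventory):
    """Return the names of assets that have no known owner."""
    return [asset.name for asset in inventory if asset.owner is None]


print(unowned(INVENTORY))  # ['machine_logs']
```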

Asset management is a systematic approach to the governance and realization of value from the things that a group or entity is responsible for, over their whole life cycles.
Wikipedia

I need to do more research on that part!

# Clean and Organize Data


Step 3 is to clean and organize the data.
The question is whether to do it up front, when the data is created, or at the time of use.
For me, this depends on the cost, and on whether doing it late is even feasible.
In this chapter, the words “cleanup” and “transform” mean the same thing.

# ETL vs. ELT


In the industry, you will find the abbreviations ETL and ELT, which stand for
E - Extract, T - Transform, L - Load. The order of the letters tells you when the cleanup happens.

# ETL


Means: do the cleanup before the data is loaded into storage, i.e. close to the time of creation.
This has some pros and cons:

# Pro

  • Data quality is high.
  • Data processing is steady and predictable.

# Con

  • Cost is higher, because data that may never be used is cleaned up anyway.
  • Storage requirements are costly.
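A minimal Python sketch of the ETL order, with a plain list standing in for the target storage and invented sample records. The point is only the sequence: invalid records are discarded during the transform and never reach the store.

```python
# Hypothetical raw records as they might arrive from a source system.
RAW_RECORDS = [" 12.5 ", "n/a", "7", ""]


def extract():
    """Extract: read the raw records from the source."""
    return list(RAW_RECORDS)


def transform(records):
    """Transform: strip whitespace and keep only numeric values."""
    cleaned = []
    for record in records:
        try:
            cleaned.append(float(record.strip()))
        except ValueError:
            pass  # invalid records are discarded before loading
    return cleaned


def load(store, records):
    """Load: append records to the target store (a plain list here)."""
    store.extend(records)
    return store


# ETL: the cleanup happens before anything is stored.
warehouse = load([], transform(extract()))
print(warehouse)  # [12.5, 7.0]
```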

# ELT


Means: load the raw data first and do the cleanup at the time of use.
This has some pros and cons as well:

# Pro

  • Storage costs can be lowered.
  • Raw data is preserved, which gives more flexibility.

# Con

  • It may no longer be possible to clean the data up, because the data needed to correlate it is missing.
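The same invented records, this time in ELT order, as a minimal Python sketch. The raw values stay in the store, so a later question could apply a different cleanup. What no read-time transform can do, though, is recover context that was never stored in the first place.

```python
# Hypothetical raw records, identical to what the source system delivered.
RAW_RECORDS = [" 12.5 ", "n/a", "7", ""]


def extract():
    """Extract: read the raw records from the source."""
    return list(RAW_RECORDS)


def load(store, records):
    """Load: store the records exactly as they arrived."""
    store.extend(records)
    return store


def transform_on_read(store):
    """Transform at time of use; the stored raw records stay untouched."""
    cleaned = []
    for record in store:
        try:
            cleaned.append(float(record.strip()))
        except ValueError:
            pass  # invalid records are skipped for this query only
    return cleaned


# ELT: store everything first, clean up when someone asks a question.
lake = load([], extract())
print(transform_on_read(lake))  # [12.5, 7.0]
print(lake)                     # [' 12.5 ', 'n/a', '7', '']
```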

# Standardize, Serialize and Organize


Once we have transformed the data into a clean and factual state, we need a standardized form to store it in: a serialized format.

What does that mean?

# Standardized


# Serialized


Each process in a company starts or ends with a document.
But if that document is not machine-readable and interchangeable, i.e. serialized, it is dark data.

# Documents

Documents can be:

  • Unstructured = paper or proprietary formats
  • Semi-structured = OCR-scanned or PDF…
  • Structured = serialized

Source
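To show what “structured = serialized” means in practice, here is a minimal Python sketch using JSON, one standardized serialization format (RFC 8259) that Python supports out of the box. The invoice and its field names are hypothetical.

```python
import json

# Unstructured: a person can read it, but a program would have to guess
# where the customer, the total, or the due date is.
unstructured = "Invoice 1001 for ACME Corp, total 250.00 EUR, due 2024-01-31"

# Structured: the same (hypothetical) invoice with explicit, named fields.
invoice = {
    "invoice_id": 1001,
    "customer": "ACME Corp",
    "total": 250.00,
    "currency": "EUR",
    "due_date": "2024-01-31",
}

# Serialize to text that any system understanding JSON can exchange...
serialized = json.dumps(invoice, sort_keys=True)

# ...and deserialize it back into data without guessing.
restored = json.loads(serialized)
print(restored["total"])  # 250.0
```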

# Organized


This is where the buzzword “Single Source of Truth” comes in.
But data also has metadata attached to it. Where does that data go?
Glad you asked! 😳

# Next Up


We will have a look at the data landscape of an organization
and dive deeper into the differences between economic and technical data.