A Data Science Central Community
“Data! Data! Data! I can’t make bricks without clay!” Data is crucial, as Sherlock Holmes pointed out more than a century ago. In today’s business world, data is more important and valuable than ever, thanks to the accessibility of big data analytics tools.
As companies continue to collect, store and analyze data, it grows in size and eventually becomes difficult and expensive to manage, especially for corporations, where the overall enterprise data footprint is expected to reach 40 zettabytes by 2020 (according to IDG). Specifically, excess data can slow the performance of applications and infrastructure, increase data management cost and complexity, extend system outage windows and make governance and compliance difficult.
So how can organizations manage their data effectively? Information Lifecycle Management (ILM) is the idea of managing enterprise data throughout its whole lifecycle. An effective ILM strategy will move data smoothly and efficiently, usually through five phases in the ILM continuum: creation and receipt, distribution, use, maintenance and disposition.
But for most companies, the importance of data declines with age as it gradually becomes less active. A winning ILM model partitions data, based on its age and activity, into four primary tiers: the production tier, partition tier, database archive tier and Apache Hadoop tier:
Production Tier: The production tier should contain active data less than a year old, reserved for the highest-performing and most expensive infrastructure to ensure speed and reliability.
Partition Tier: Semi-active data (one to three years old) should live in the partition tier. This tier indexes and subdivides data into ranges based on different parameters, so that queries touch only the relevant ranges and the remaining data does not add processing overhead.
Database Archive Tier: Inactive data (three to seven years old) should be archived. Properly archived data still retains native access, as well as the ability to de-archive into the production database.
Apache Hadoop Tier: Apache Hadoop can be used as a cost-effective archiving solution for data over seven years old. Hadoop is able to deliver massive scalability and excellent workload performance, at a cost nearly 55.5 times lower than tier-one infrastructure (according to Monash Research).
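To make the age boundaries above concrete, here is a minimal Python sketch that assigns a record to one of the four tiers based on its creation date. The function names, the tier labels and the handling of exact boundaries (a record that is exactly three or seven years old moves to the older tier) are assumptions made for illustration; a real ILM policy would also weigh activity, not age alone.

```python
from datetime import date

# Upper age bound in whole years (exclusive) for each tier, youngest first.
# Anything at or beyond the last bound falls through to the Hadoop tier.
# Boundary handling is an assumption: the article does not say which tier
# owns a record that is exactly one, three or seven years old.
TIER_BOUNDS = [
    (1, "production"),
    (3, "partition"),
    (7, "database archive"),
]


def age_in_years(created: date, today: date) -> int:
    """Return the number of completed years between created and today."""
    years = today.year - created.year
    # Subtract one if this year's anniversary has not been reached yet.
    if (today.month, today.day) < (created.month, created.day):
        years -= 1
    return years


def tier_for(created: date, today: date) -> str:
    """Pick the ILM tier for a record from its creation date."""
    age = age_in_years(created, today)
    if age < 0:
        raise ValueError("creation date is in the future")
    for upper_bound, tier in TIER_BOUNDS:
        if age < upper_bound:
            return tier
    return "hadoop"
```

Calling `tier_for(date(2018, 6, 1), date(2020, 1, 1))` would place an 18-month-old record in the partition tier. Keeping the bounds in one table means the policy can be tuned without touching the lookup logic.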
Integrating Hadoop into the ILM strategy also lays the foundation for enterprise analytics, due to the framework’s powerful processing features and integration with other platforms.
Napoleon once said, “War is 90 percent information.” Every company that wants to survive in today’s highly competitive business world should find an effective and efficient way to manage its data, instead of keeping it all in a single-tier database. It’s time for companies to start implementing their own winning ILM models.