Getting the most from data analytics projects: How cities can handle the data challenge
By Chip George
CIOs of municipal and county governments are often overwhelmed by the groundbreaking potential of big data within their cities. Projects running the gamut from traffic light sensors, to snow plow distribution, to identifying trends in infant mortality unlock valuable information that helps local governments make better governing decisions and determine budgetary priorities. These CIOs are intrigued by the thought of big data analytics, but are often unsure how to go about implementing a solution within their jurisdictions. Even more limiting is their lack of knowledge of all the components necessary to create a viable data analytics solution.
Gathering the data is only the beginning. There are many types of data – structured data, which can reside in clear fields within a database; unstructured data, which is information that doesn’t fit into a traditional database format; or semi-structured, which has components of both. Unstructured and semi-structured data, such as video or social media, are where the data explosion is really taking place. Video is being captured at an incredible rate from video surveillance cameras positioned on buildings, to red-light cameras, to body cameras on law enforcement officers. Cities need to be able to store and provide access to this data at high speeds in order to turn that data into actionable information.
The need for a carefully planned storage solution is often overlooked in a data analytics system, even though storage can be the most prohibitive obstacle to harnessing the power of data analytics. As the application of data grows, so too will the volume collected. Municipalities will have to find ways to handle the massive amounts of data that need to be stored and accessed.
Three methods of data storage are most often employed: flash storage, traditional mechanical disk storage, and cloud storage. Cities can collect all of the data in the world, but if they don’t employ smart strategies for storing and accessing that data, a data analytics program will never get off the ground. When it comes to a successful solution, the data can’t just stand alone. CIOs need to make sure they are using the right storage solution for the job.
Flash Storage
When it comes to data storage, flash storage is synonymous with speed and performance. Solid state flash drives have no moving parts; information is stored in microchips. This lack of moving parts is what makes solid state drives so much faster than hard disk drives. However, this speed comes at a higher cost. When local governments analyze their storage needs, they need to turn a critical eye to determining which data needs to be accessed most frequently by the most people. Solid state drives are more costly upfront, but they produce results much faster, which can cut costs in the long run. Mission-critical applications that are sensitive to performance should be deployed using flash technology. Some storage vendors today also incorporate flash as a caching tier to speed up workloads on traditional mechanical disk storage.
Traditional Mechanical Disk Storage
Considered by many to be the faithful workhorse of the storage landscape, hard disk drives are a cost-effective way to handle routine data for many workloads. While slower than solid state drives, hard disk drives offer greater capacity than their solid state counterparts, which means that they can store far more information. Hard disk drives are a good home for data that is not in active use, or whose use does not depend on speed and accessibility from multiple sources. Some vendors today make this storage as efficient as possible with technologies like compression and deduplication. Virtual copies, or snapshots, can also simplify data protection and make backup and recovery very fast.
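To make the compression and deduplication idea concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular vendor implements it: the function names, the fixed 4 KB block size and the choice of SHA-256 and zlib are all assumptions made for this example.

```python
import hashlib
import zlib


def deduplicate(data: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks and keep one compressed copy of each
    unique block. The block size and hashing scheme are illustrative assumptions."""
    store = {}   # block digest -> compressed block contents
    recipe = []  # ordered digests needed to rebuild the original data
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:
            # Only blocks not seen before consume new space
            store[digest] = zlib.compress(block)
        recipe.append(digest)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original data from the unique blocks and the recipe."""
    return b"".join(zlib.decompress(store[digest]) for digest in recipe)


# Hypothetical input: three identical blocks followed by one different block
data = b"A" * 4096 * 3 + b"B" * 4096
store, recipe = deduplicate(data)
stored_bytes = sum(len(block) for block in store.values())
```

In this made-up input, four blocks are written but only two unique blocks are kept, and `restore(store, recipe)` returns the original bytes. Real savings depend entirely on how repetitive and compressible the data is.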
Cloud Storage
While most state and local governments will prefer to keep their most sensitive data on-premises, unstructured and semi-structured data is especially well suited to cloud storage, with significant advantages:
- Cost reduction – Data analytics can require a significant amount of computing resources to analyze and process large volumes of data. Using the cloud reduces costs to the city, particularly in a pay-per-use, utility pricing model.
- Reduced overhead – By taking advantage of cloud technologies, IT teams can reallocate dollars normally spent on physical hardware. Where private on-premises data centers are at capacity, using the cloud can be a strategy to extend the life of an existing data center.
- Provisioning and scaling – Utilizing the cloud provides the ability to scale easily as required by the amount and type of data being processed. Moving to a hybrid cloud model makes it possible to store data in a private data center while taking advantage of public cloud compute resources. This model enables institutions to maintain complete sovereign control over their data, which helps them avoid cloud lock-in and maintain a strong negotiating posture with their cloud vendor.
Data analytics enables cities, municipalities and local governments to harness the power of big data to help inform their governing and budgetary decisions. For most public sector entities, the right storage tool to support their analytics effort is a combination of all three methods. Taking the time to do a thorough analysis of the data collected and how it needs to be accessed allows IT teams to determine the right combination of storage to help them achieve their mission. Providing the proper support on the storage side can help smooth the way for local governments to be powered by big data.
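The kind of analysis described here can be sketched as a simple placement rule. The Python below is a rough illustration under stated assumptions, not a definitive policy: the `Dataset` fields, the `choose_tier` function, the 1,000-reads-per-day threshold and the sample datasets are all hypothetical, and a real decision would also weigh cost, retention rules and compliance requirements.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    reads_per_day: int        # how often the data is accessed
    latency_sensitive: bool   # does a mission-critical application depend on speed?
    sensitive: bool           # should the data stay on-premises?
    structured: bool          # does it fit clear fields in a database?


def choose_tier(ds: Dataset, hot_reads_per_day: int = 1000) -> str:
    """Suggest a storage tier; the threshold is an assumption for illustration."""
    # Performance-sensitive or frequently accessed data goes to flash
    if ds.latency_sensitive or ds.reads_per_day >= hot_reads_per_day:
        return "flash"
    # Sensitive or routine structured data stays on on-premises disk
    if ds.sensitive or ds.structured:
        return "disk"
    # Remaining unstructured and semi-structured data is a candidate for cloud
    return "cloud"


# Hypothetical datasets, invented for this example
datasets = [
    Dataset("traffic_signal_feed", 50000, True, False, True),
    Dataset("archived_permits", 5, False, True, True),
    Dataset("public_meeting_video", 20, False, False, False),
]
placement = {ds.name: choose_tier(ds) for ds in datasets}
```

With these made-up inputs, the traffic signal feed lands on flash, the archived permits on disk and the public meeting video in the cloud. The order of the checks matters: performance needs are evaluated first, so hot data goes to flash even when it is also sensitive.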
Chip George is the senior director of state and local government and education (SLED) for NetApp. In this role, Chip is responsible for the field alignment of over 100 sales, engineering, channel, program capture and marketing team members dedicated to the creation and execution of NetApp’s SLED strategy. Chip and his team serve the unique data storage and data management needs of state government, local government, higher education, K-12 and teaching hospitals.