Artificial Intelligence: The Future is Data Capture, Not Machine Learning
A 2021 report from KPMG shows artificial intelligence (AI) is progressing much faster than anticipated, with skyrocketing adoption driven partly by the Covid-19 pandemic. Researchers at Oxford University in England estimate that AI will be better than humans at translation by 2024, will write bestselling books by 2049, and will perform surgeries by 2053. Machine learning (ML), the ability of a machine to mimic the human capacity to accumulate knowledge and use it to drive insights, is generally considered the basis of AI.
AI’s Dependence on Data
Although AI might depend on its machine learning abilities, we need to take a step back and realize ML doesn’t happen in a vacuum. ML is driven by big data, without which it can’t take place. In effect, AI depends completely on the amount of data we can capture and the methods we use to process and manage it. For this reason, I believe we need to pay more attention to data capture, transport, processing, and storage if we want to realize the promise of AI in the future.
The Importance of Data Capture
Capturing data is essential, whether it’s for software-based AI applications, smart robots based on AI, or machine learning. When AI products were initially designed, developers spent huge research and development resources collecting human behavioral data, both on the industry side and the consumer side. In healthcare, many smart applications offer predictive analysis for prognoses and treatments. While these programs are becoming progressively smarter, they could be made even more accurate by applying increased intelligence gathered from human data.
User data is critical for developing technologies with higher intelligence, whether these are software systems, hardware devices, IoT devices, or home automation equipment. However, one of the most difficult aspects of capturing data in edge environments is transmitting it securely to a data center because of the threat of ransomware attacks or viruses.
With Data, More Is More
Projections from Statista indicate that by the end of 2025, the world will potentially generate 181 zettabytes of data, an increase of 129% over 2021’s 79 zettabytes. This applies particularly in medical science, where various organizations collect massive amounts of data.
For example, data from the first Covid-19 vaccines administered helped to determine the accuracy of doses for all age groups. Similarly, we need more data to achieve greater accuracy and more effective devices, whether for software, robotics, or anything else.
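The growth figure quoted from Statista above can be checked with simple arithmetic. The short Python sketch below does so using only the two numbers given in the text; the function and variable names are illustrative and do not come from any report.

```python
def percent_increase(old: float, new: float) -> float:
    """Return the percentage increase from old to new."""
    return (new - old) / old * 100.0


# Figures quoted in the text, in zettabytes (variable names are illustrative)
data_2021_zb = 79
data_2025_zb = 181

growth = percent_increase(data_2021_zb, data_2025_zb)
print(f"Increase from 2021 to 2025: {growth:.0f}%")
```

Put another way, the projected 2025 volume is roughly 2.3 times the 2021 volume.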
We also need more data from real edges, whether these are static or moving, and regardless of how remote their location, to be able to run timely AI and ML applications.
The future of AI will depend on capturing more data through real-time applications from edges such as a gas pipeline, a submarine in the ocean, a defense front, healthcare, IoT devices, satellites, or rockets in space.
The Challenges of Managing Data
To optimize AI for the future, we also need high-performance systems. These could be storage or cloud-based systems whose data is processed by modern, data-hungry applications. The more data you feed these applications, the faster they can run their algorithms and deliver insights, whether these are for micro strategy tools or business intelligence tools. This is usually called data mining, and, in the past, we did it by putting the data into a warehouse and then running applications to process it.
However, these methods are rife with challenges. Data-generating devices are now continuously churning out ever-growing amounts of information. Whether the source is autonomous vehicles or healthcare, and whether the platform is a drone or an edge device, everything is capable of generating larger amounts of data than before. Until now, the data management industry has not been able to capture these quantities, whether through networks, 5G, cloud, or any storage method.
These circumstances have led to 90% of data gathered being dropped because of inadequate storage capacity and the inability to process it quickly and deliver it to a data center. The same is true of critical data captured at remote sites that have no connectivity or cloud applications running at the edge.
Forward to the Future
The more data we have, the better AI performs. The more information we can gather in real time from real users on the ground, the smarter we can make our AI devices. The more we can make AI applicable to the use cases, the more human we can make the connection, and the better we can solve users' problems. To date, much of the big data we generate goes unused, primarily because organizations cannot capture, transport, and analyze it fast enough to create real-time insights. It’s essential for us to develop ways to resolve these challenges so that we can enjoy the advantages of putting AI to work for humanity.