A Data Science Central Community
Continuum Analytics, H2O.ai and MapD Technologies Create Open Common Data Frameworks for GPU In-Memory Analytics
SAN JOSE, CA—May 8, 2017—Continuum Analytics, H2O.ai, and MapD Technologies have announced the formation of the GPU Open Analytics Initiative (GOAI) to create common data frameworks enabling developers and statistical researchers to accelerate data science on GPUs. GOAI will foster the development of a data science ecosystem on GPUs by allowing resident applications to interchange data seamlessly and efficiently. BlazingDB, Graphistry and Gunrock from UC Davis led by CUDA Fellow John Owens have joined the founding members to contribute their technical expertise.
The formation of the Initiative comes at a time when analytics and machine learning workloads are increasingly being migrated to GPUs. However, while individually powerful, these workloads have not been able to benefit from the power of end-to-end GPU computing. A common standard will enable intercommunication between the different data applications and speed up the entire workflow, removing latency and decreasing the complexity of data flows between core analytical applications.
At the GPU Technology Conference (GTC), NVIDIA’s annual GPU developers’ conference, the Initiative announced its first project: an open source GPU Data Frame with a corresponding Python API. The GPU Data Frame is a common API that enables efficient interchange of data between processes running on the GPU. End-to-end computation on the GPU avoids transfers back to the CPU or copying of in-memory data reducing compute time and cost for high-performance analytics common in artificial intelligence workloads.
Users of the MapD Core database can output the results of a SQL query into the GPU Data Frame, which then can be manipulated by the Continuum Analytics’ Anaconda NumPy-like Python API or used as input into the H2O suite of machine learning algorithms without additional data manipulation. In early internal tests, this approach exhibited order-of-magnitude improvements in processing times compared to passing the data between applications on a CPU.
“The data science and analytics communities are rapidly adopting GPU computing for machine learning and deep learning. However, CPU-based systems still handle tasks like subsetting and preprocessing training data, which creates a significant bottleneck,” said Todd Mostak, CEO and co-founder of MapD Technologies. “The GPU Data Frame makes it easy to run everything from ingestion to preprocessing to training and visualization directly on the GPU. This efficient data interchange will improve performance, encouraging development of ever more sophisticated GPU-based applications.”
“GPU Data Frame relies on the Anaconda platform as the foundational fabric that brings data science technologies together to take full advantage of GPU performance gains,” said Travis Oliphant, co-founder and chief data scientist of Continuum Analytics. “Using NVIDIA’s technology, Anaconda is mobilizing the Open Data Science movement by helping teams avoid the data transfer process between CPUs and GPUs and move nimbly toward their larger business goals. The key to producing this kind of innovation are great partners like H2O and MapD.”
“Truly diverse open source ecosystems are essential for adoption - we are excited to start GOAI for GPUs alongside leaders in data and analytics pipeline to help standardize data formats,” said Sri Ambati, CEO and co-founder of H2O.ai. “GOAI is a call for the community of data developers and researchers to join the movement to speed up analytics and GPU adoption in the enterprise.”
The GPU Open Analytics Initiative is actively welcoming participants who are committed to open source and to GPUs as a computing platform.
Details of the GPU Data Frame can be found at the Initiative’s Github link -
In conjunction with this announcement, MapD Technologies has announced the immediate open sourcing of the MapD Core database to foster open analytics on GPUs. Anaconda and H2O already have large open source communities, which can benefit from this project immediately and drive further development to accelerate the adoption of data science and analytics on GPUs.
About Anaconda Powered by Continuum Analytics
Anaconda is the leading Open Data Science platform powered by Python, the fastest growing data science language with more than 13 million downloads and 4 million unique users to date. Continuum Analytics is the creator and driving force behind Anaconda, empowering leading businesses across industries worldwide with solutions to identify patterns in data, uncover key insights and transform data into a goldmine of intelligence to solve the world’s most challenging problems. Learn more at continuum.io.
H2O.ai is focused on bringing AI to businesses through software. Its flagship product is H2O, the leading open source platform that makes it easy for financial services, insurance and healthcare companies to deploy AI and deep learning to solve complex problems. More than 9,000 organizations and 80,000+ data scientists depend on H2O for critical applications like predictive maintenance and operational intelligence. The company -- which was recently named to the CB Insights AI 100 -- is used by 169 Fortune 500 enterprises, including 8 of the world’s 10 largest banks, 7 of the 10 largest insurance companies and 4 of the top 10 healthcare companies. Notable customers include Capital One, Progressive Insurance, Transamerica, Comcast, Nielsen Catalina Solutions, Macy's, Walgreens and Kaiser Permanente.
About MapD Technologies
MapD Technologies is a next-generation analytics software company. Its technology harnesses the massive parallelism of modern graphics processing units (GPUs) to power lightning-fast SQL queries and visualization of large data sets. The MapD analytics platform includes the MapD Core database and MapD Immerse visualization client. These software products provide analysts and data scientists with the fastest time to insight, performance not possible with traditional CPU-based solutions. MapD software runs on-premise and on all leading cloud providers.
Founded in 2013, MapD Technologies originated from research at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). MapD is funded by GV, In-Q-Tel, New Enterprise Associates (NEA), NVIDIA, Vanedge Capital and Verizon Ventures. The company is headquartered in San Francisco.