50+ Open Source Tools for Big Data (See Anything Missing?)

Originally posted on BigDataStudio, by Fari Payandeh.


Hadoop Distributions


Cloud Operating System

Cloud Foundry — By VMware

OpenStack — Worldwide participation and well-known companies


Fusion-io — Not open source, but very supportive of open source projects; flash-aware applications.

Development Platforms and Tools

REEF — Microsoft’s Hadoop development platform

Lingual — By Concurrent

Pattern — By Concurrent

Python — Awesome programming language

Mahout — Machine learning library

Impala — Cloudera

R — MVP among statistical tools

Storm — Stream processing by Twitter

LucidWorks — Search, based on Apache Solr

Giraph — Graph processing by Facebook

NoSQL Databases

MongoDB, Cassandra, HBase

SQL Databases

MySQL — Belongs to Oracle

MariaDB — Partnered with SkySQL

PostgreSQL — Object Relational Database

TokuDB — Improves RDBMS performance

Server Operating Systems

Red Hat — The de facto OS for Hadoop servers

BI, Data Integration, and Analytics





Comment by Vincent Granville on September 18, 2013 at 9:02am

What about visualization, hardware and security? Also, I would add a few in Analytics (a category of its own), including RapidMiner. As for BI / Dashboards, I would add Actuate (BIRT).

Comment by Michael Clayton on September 18, 2013 at 5:20pm    

Tomcat Java servlets were used to support SQL queries of Postgres and other DBMSs in a large semiconductor chip factory. The factory has "big data" in structured format, with a hierarchical-relational schema designed to capture data from the sensors of hundreds of process machines and raw metrology data, as well as electrical test data on hundreds of tests, for thousands of die, on each of 10,000 wafers per month. The goal was to support the visualization and modeling efforts of hundreds of engineers in 5 countries, all connected to these DB tools. The commercial visualization and modeling software, JMP by SAS, was the existing engineering stat tool already in place.

JSL scripts link to the Apache Java servlets, which query the existing engineering and manufacturing DBs to generate CSV files that can be used by Excel, JMP, R routines, or whatever local stat or graphing tool is common at each site. The user base has grown over the 3 years since the integration work was completed. Most of the integration effort used communications protocols common to our industry, parsing the data files with Python (or Perl in earlier years) to load them into the various engineering or manufacturing DBs. While many of the smaller local DBs used Postgres, the older legacy yield management DB and SPC DB used Oracle, which supported the ODBC interface, allowing the SQL queries to pull data from those systems and merge it with other DB context information on even older VMS-based proprietary systems. This mixed-mode scenario has been augmented by many open source tools over time, to avoid the millions of dollars and months of effort needed to replace the legacy factory MES and SPC systems, and the hundreds of tester systems (many PC based), just to get 21st century data communications.
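
The parse, load, query, and CSV-export flow described in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions, not the factory's actual code: it uses Python's built-in sqlite3 module as a stand-in for Postgres or Oracle, and the file layout, the table name `measurements`, and the function names `load_measurements` and `export_csv` are all hypothetical.

```python
import csv
import io
import sqlite3


def load_measurements(conn, raw_text):
    """Parse lines of 'wafer_id,die_id,test_name,value' and load them into a table.

    The four-column layout is an assumed, simplified data-file format.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS measurements "
        "(wafer_id TEXT, die_id INTEGER, test_name TEXT, value REAL)"
    )
    rows = []
    for line in raw_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        wafer_id, die_id, test_name, value = line.split(",")
        rows.append((wafer_id, int(die_id), test_name, float(value)))
    conn.executemany("INSERT INTO measurements VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def export_csv(conn, query, params=()):
    """Run a SQL query and return the result as CSV text with a header row."""
    cursor = conn.execute(query, params)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # cursor.description holds one tuple per result column; item 0 is the name
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor)
    return out.getvalue()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for a real DB
    raw = (
        "# hypothetical tester output\n"
        "W001,1,vth,0.42\n"
        "W001,2,vth,0.45\n"
        "W002,1,vth,0.40\n"
    )
    load_measurements(conn, raw)
    print(
        export_csv(
            conn,
            "SELECT wafer_id, AVG(value) AS mean_value FROM measurements "
            "WHERE test_name = ? GROUP BY wafer_id ORDER BY wafer_id",
            ("vth",),
        )
    )
```

The CSV text returned by `export_csv` is the kind of output that Excel, JMP, or R could then read; in the setup described above, that step would sit behind a servlet rather than a local function call.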

