
50+ Open Source Tools for Big Data (See Anything Missing?)

Originally posted on BigDataStudio, by Fari Payandeh.


Hadoop Distributions

Hortonworks

Cloud Operating System

Cloud Foundry — By VMware

OpenStack – Backed by worldwide participation and well-known companies

Storage

Fusion-io – Not open source, but very supportive of open source projects; flash-aware applications.

Development Platforms and Tools

REEF — Microsoft’s Hadoop development platform

Lingual — By Concurrent

Pattern — By Concurrent

Python — Awesome programming language

Mahout – Scalable machine learning library for Hadoop

Impala — Cloudera

R — MVP among statistical tools

Storm — Stream processing by Twitter

LucidWorks — Search, based on Apache Solr

Giraph — Graph processing by Facebook

NoSQL Databases

MongoDB, Cassandra, HBase

SQL Databases

MySQL – Owned by Oracle

MariaDB – Partnered with SkySQL

PostgreSQL — Object Relational Database

TokuDB – Storage engine for MySQL and MariaDB that improves performance

Server Operating Systems

Red Hat – The de facto OS for Hadoop servers

BI, Data Integration, and Analytics

Talend

Pentaho

Jaspersoft



Comment by Vincent Granville on September 18, 2013 at 9:02am

What about visualization, hardware and security? Also, I would add a few tools under Analytics (a category of its own), including RapidMiner. As for BI/dashboards, I would add Actuate (BIRT).

Comment by Michael Clayton on September 18, 2013 at 5:20pm

http://tomcat.apache.org/    

Tomcat Java servlets were used to support SQL queries against Postgres and other DBMSs in a large semiconductor chip factory. The factory has "big data" in structured form, stored in a hierarchical-relational schema designed to capture raw data from the sensors of hundreds of process machines, metrology data, and electrical test data (hundreds of tests, for thousands of die, on each of 10,000 wafers per month). The queries support the visualization and modeling work of hundreds of engineers in 5 countries, all connected globally to these DB tools. For visualization and modeling, the engineers used commercial software, JMP by SAS, which was already in place as the engineering stat tool.
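The hierarchical-relational layout described above (wafers containing die, each die carrying results for many electrical tests) can be sketched as a small relational schema. This is a minimal illustration, not the factory's actual schema: the table and column names and the test names `vth` and `idsat` are hypothetical, and Python's built-in `sqlite3` stands in for Postgres so the example runs without a server. The last lines estimate monthly row volume from the figures in the comment, assuming 1,000 die per wafer and 100 tests per die as lower bounds for "thousands" and "hundreds".

```python
import sqlite3

# Hypothetical schema: each wafer has many die, each die has many test results.
# sqlite3 is used only as a stand-in for Postgres.
SCHEMA = """
CREATE TABLE wafer (
    wafer_id  TEXT PRIMARY KEY,
    lot_id    TEXT NOT NULL
);
CREATE TABLE die (
    die_id    INTEGER PRIMARY KEY,
    wafer_id  TEXT NOT NULL REFERENCES wafer(wafer_id),
    x         INTEGER NOT NULL,
    y         INTEGER NOT NULL,
    UNIQUE (wafer_id, x, y)
);
CREATE TABLE test_result (
    die_id    INTEGER NOT NULL REFERENCES die(die_id),
    test_name TEXT NOT NULL,
    value     REAL,
    PRIMARY KEY (die_id, test_name)
);
"""

def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one wafer, two die and a few results."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO wafer VALUES ('W01', 'LOT42')")
    conn.executemany(
        "INSERT INTO die (die_id, wafer_id, x, y) VALUES (?, ?, ?, ?)",
        [(1, "W01", 0, 0), (2, "W01", 0, 1)],
    )
    conn.executemany(
        "INSERT INTO test_result VALUES (?, ?, ?)",
        [(1, "vth", 0.41), (1, "idsat", 1.2e-3), (2, "vth", 0.43)],
    )
    return conn

def results_for_test(conn, test_name):
    """Walk the hierarchy back up (result -> die -> wafer) for one test."""
    return conn.execute(
        """
        SELECT w.lot_id, w.wafer_id, d.x, d.y, t.value
        FROM test_result t
        JOIN die d   ON d.die_id = t.die_id
        JOIN wafer w ON w.wafer_id = d.wafer_id
        WHERE t.test_name = ?
        ORDER BY d.x, d.y
        """,
        (test_name,),
    ).fetchall()

# Rough monthly volume, assuming lower bounds of 1,000 die/wafer and 100 tests/die.
WAFERS_PER_MONTH = 10_000
DIE_PER_WAFER = 1_000
TESTS_PER_DIE = 100
rows_per_month = WAFERS_PER_MONTH * DIE_PER_WAFER * TESTS_PER_DIE

if __name__ == "__main__":
    conn = build_demo_db()
    print(results_for_test(conn, "vth"))
    print(f"{rows_per_month:,} test_result rows per month")
```

Even at those assumed lower bounds, the result table gains on the order of a billion rows per month.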

JSL scripts call the Apache Tomcat Java servlets, which query the existing engineering and manufacturing DBs to generate CSV files that can be used by Excel, JMP, R routines, or whatever local stat or graphing tool is common at each site. The user base has grown over the past 3 years, since the integration work was completed. Most of the integration effort used communications protocols common to our industry, parsing the data files with Python (or Perl in earlier years) to load them into the various engineering or manufacturing DBs. While many of the smaller local DBs used Postgres, the older legacy yield management DB and SPC DB used Oracle, which supported the ODBC interface; this allowed the SQL queries to pull data from those systems and merge it with other DB context information on even older VMS-based proprietary systems. Over time, many open source tools have been added to this mixed-mode setup, avoiding the millions of dollars and months of effort it would take to replace the legacy factory MES and SPC systems, and the hundreds of tester systems (many PC-based), just to get 21st-century data communications.
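The flow in this comment (parse tester data files with Python, load them into a database, then return SQL query results as CSV files that Excel, JMP or R can open) can be sketched end to end in a few lines. This is a minimal sketch, not the commenter's code: the comma-separated `wafer,die,test,value` file format, the `results` table and the function names are hypothetical, `sqlite3` stands in for Postgres or Oracle, and `query_to_csv` only plays the role the Tomcat servlets played.

```python
import csv
import io
import sqlite3

# Hypothetical tester output: one "wafer,die,test,value" record per line,
# with '#' comment lines. Real tester formats vary by vendor.
SAMPLE_FILE = """\
# tester: T07
W01,1,vth,0.41
W01,1,idsat,0.0012
W01,2,vth,0.43
"""

def parse_tester_file(text):
    """Yield (wafer_id, die_no, test_name, value) tuples, skipping comments and blank lines."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        wafer_id, die_no, test_name, value = line.split(",")
        yield wafer_id, int(die_no), test_name, float(value)

def load(conn, records):
    """Insert parsed records into a flat results table, creating it if missing."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        "wafer_id TEXT, die_no INTEGER, test_name TEXT, value REAL)"
    )
    conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?)", records)
    conn.commit()

def query_to_csv(conn, sql, params=()):
    """Run a query and return the rows as CSV text with a header row,
    roughly what a servlet would send back to a JSL script."""
    cur = conn.execute(sql, params)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([col[0] for col in cur.description])  # column names
    writer.writerows(cur)
    return buf.getvalue()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, parse_tester_file(SAMPLE_FILE))
    print(query_to_csv(
        conn,
        "SELECT wafer_id, die_no, value FROM results "
        "WHERE test_name = ? ORDER BY die_no",
        ("vth",),
    ))
```

Keeping parsing, loading and export as separate functions mirrors the split described in the comment: ingestion scripts feed the DBs, and a thin query layer serves whatever local tool each site prefers.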


© 2017 BigDataNews.com is a subsidiary of DataScienceCentral LLC and not affiliated with Systap.
