
Top 10 Pointers in New Apache Spark 1.6 Release

With the new year 2016 under way, there is good reason to be excited: the Apache Spark community has announced the availability of Apache Spark 1.6, the 7th release on the 1.x line.

  • Contributors – The number of contributors to Spark has crossed 1,000, double the earlier count.
  • Patches – Apache Spark 1.6 includes 1,000 patches.
  • Run SQL queries on files – This feature lets users and applications run SQL queries directly on files without creating a table first. It is similar to a feature available in Apache Drill. For example: select id from json.`path/to/json/files` as j.
  • Star (*) expansion for StructTypes – This feature makes it easier to nest and unnest arbitrary numbers of columns. It is pretty common for customers to do regular extractions of update data from an external data source (e.g. MySQL or PostgreSQL). This was already possible, but the new release makes it easier through some small improvements to the analyzer. The goal is to allow users to execute two queries, as well as their DataFrame equivalents: one to find the most recent record for each key, and one to unnest the struct returned by that group-by query.
  • Parquet Performance – Parquet has been one of the most commonly used data formats with Apache Spark, and Parquet scan performance has a pretty big impact on many large applications.
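
The SQL-on-files and star-expansion items above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table name data, the columns key and timestamp, the temporary table name latest, and all helper function names are hypothetical, and sql_context is assumed to be an existing Spark SQLContext. The queries have not been run against a Spark 1.6 cluster here.

```python
def sql_on_files_query(path, fmt="json", column="id"):
    """Build a query that reads files directly, with no table created first."""
    # Backticks quote the path so the parser treats it as a single identifier.
    return "SELECT {col} FROM {fmt}.`{path}` AS j".format(
        col=column, fmt=fmt, path=path)


def most_recent_record_query(table="data", key="key", ts="timestamp"):
    """Build a group-by query that keeps the latest record for each key."""
    # struct(ts, *) puts the timestamp first and then expands every column.
    # Assumption: max() orders structs field by field, so the timestamp wins.
    return ("SELECT max(struct({ts}, *)) AS mostRecentRecord "
            "FROM {table} GROUP BY {key}").format(ts=ts, table=table, key=key)


def unnest_query(source):
    """Build a query that expands the struct back into top-level columns."""
    return "SELECT mostRecentRecord.* FROM {source}".format(source=source)


def run_examples(sql_context):
    """Run the three queries; sql_context is assumed to be a SQLContext."""
    ids = sql_context.sql(sql_on_files_query("path/to/json/files"))
    latest = sql_context.sql(most_recent_record_query())
    # registerTempTable is the Spark 1.x way to expose a DataFrame to SQL.
    latest.registerTempTable("latest")
    unnested = sql_context.sql(unnest_query("latest"))
    return ids, unnested


# Printing the query strings does not require a Spark installation.
print(sql_on_files_query("path/to/json/files"))
print(most_recent_record_query())
print(unnest_query("latest"))
```

Keeping the query text in plain functions makes it easy to inspect the SQL before sending it to a cluster; run_examples is only called once a real context is available.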

