A Data Science Central Community
Guest blog post by Michel Bruley
LinkedIn was founded in 2003, currently reports revenue of 243 million, and employs 1,797 people. That does not make it a large company. However, LinkedIn has 175 million members in 200 countries, 50% of them outside the U.S.; two new members join the network every second; and analysts say that executives from every Global 500 company are members. Under these conditions, LinkedIn has a high volume of data to process: its information systems must support 2 billion member searches a year and handle 75 TB of data and 10 billion rows every day.
By analyzing all this data, LinkedIn can establish, for example, the list of words members use most often to describe their capabilities, and these words differ from one country to another. In the United States and Canada, members highlight their extensive experience; in Italy, France, and Germany they call themselves innovative; in Brazil and Spain they are dynamic; and in Great Britain they emphasize their motivation.
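As a rough illustration of this kind of analysis, the sketch below counts how often a few descriptive terms appear in profile summaries and reports the most frequent one per country. It is only a toy example: the sample profiles, the list of tracked terms, and the function name `top_term_by_country` are all hypothetical, and nothing here is meant to reflect how LinkedIn actually computes its lists.

```python
from collections import Counter, defaultdict

# Hypothetical sample: (country code, free-text profile summary).
PROFILES = [
    ("US", "Extensive experience leading analytics teams"),
    ("US", "Motivated engineer with extensive experience"),
    ("FR", "Innovative product manager"),
    ("FR", "Innovative and dynamic consultant"),
    ("BR", "Dynamic sales lead"),
    ("GB", "Motivated graduate"),
]

# Hypothetical list of descriptive terms to track.
TERMS = ["extensive experience", "innovative", "dynamic", "motivated"]


def top_term_by_country(profiles, terms):
    """Return the most frequent tracked term for each country."""
    counts = defaultdict(Counter)
    for country, summary in profiles:
        text = summary.lower()
        for term in terms:
            # Count case-insensitive occurrences of the term in the summary.
            counts[country][term] += text.count(term)
    return {
        country: counter.most_common(1)[0][0]
        for country, counter in counts.items()
    }


result = top_term_by_country(PROFILES, TERMS)
for country, term in sorted(result.items()):
    print(country, term)
```

At LinkedIn's scale the same grouping and counting would presumably run on a distributed platform rather than in a single process, but the logic of "group by country, count terms, keep the top one" stays the same.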
LinkedIn is certainly one of the companies driving the development of what the business world now calls "data science", which draws on know-how from computer science, mathematics, data analysis, and business management. Specifically, the process is to collect raw data quickly, explore and analyze it, and translate it into actionable information, thereby reducing the overall time between discovering relevant facts, characterizing a business opportunity, and triggering action.
But what does LinkedIn do with its data? Like any company, it runs analyses to better understand and manage its activities, but above all it creates products and services based on the information it generates, either at the global level, as with the most-used words seen above, or at the individual level, with recommendation systems (people you may know, jobs that ...). For example, the data make it possible to identify influential people, viral processes, and social trends; to test new products, services, and site designs in order to maximize the business impact of members' connections and use of the site; to understand how service use evolves over time by subscription level and means of connection (PC, mobile, ...); to produce detailed reports analyzing advertising revenue; to assess the impact of viral marketing campaigns; to optimize the recommendation engine; and to create specialized functions for business services (marketing, recruitment, ...).
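To give a feel for what an individual-level recommendation such as "people you may know" can involve, here is a minimal sketch that ranks a member's non-connections by the number of mutual connections. This is a common textbook heuristic, not LinkedIn's actual method; the sample graph, the member names, and the function `people_you_may_know` are hypothetical.

```python
from collections import Counter

# Hypothetical undirected connection graph: member -> set of direct connections.
CONNECTIONS = {
    "ana": {"bob", "cho"},
    "bob": {"ana", "cho", "dev"},
    "cho": {"ana", "bob", "dev", "eli"},
    "dev": {"bob", "cho"},
    "eli": {"cho"},
}


def people_you_may_know(graph, member, limit=3):
    """Rank non-connections of `member` by number of mutual connections."""
    direct = graph[member]
    mutual = Counter()
    for friend in direct:
        for candidate in graph[friend]:
            # Skip the member themselves and people already connected.
            if candidate != member and candidate not in direct:
                mutual[candidate] += 1
    # Sort by mutual-connection count (descending), then by name for stable output.
    ranked = sorted(mutual.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


suggestions = people_you_may_know(CONNECTIONS, "ana")
print(suggestions)
```

A production recommendation engine would weigh many more signals (shared employers, schools, profile views, and so on), which is one reason the tuning and testing described above matter.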
To obtain these results from its data, LinkedIn had to develop its own applications for managing data flows, storage, search, network analysis, etc., and of course its own dashboards. To do so, the company sourced the tools and solutions it needed, mainly: Teradata Aster, Hadoop, Azkaban, Kafka, Project Voldemort, Pig, Python, Prefuse, MicroStrategy, and Tableau Software.
To go further into the LinkedIn case, you can follow the 50-minute video presentation below, entitled "Data Science @ LinkedIn: Insight & Innovation at Scale", by Manu Sharma, Principal Research Scientist and Group Manager, Product Analytics at LinkedIn: https://www.youtube.com/watch?v=W7ZcUJEHAOk