Data science can be divided into several areas, ranging from big data infrastructure, through the analysis itself, to the tools that output the results and make them useful.
Here’s a list of fields related to the concepts of data science and big data.
Data Analysis/Analytics
This covers anything that involves analyzing information (data). Analyzing large data sets with tools such as SAS or JMP is one example. Big data analytics works through volumes of data that conventional analytics can’t handle. Analytics is needed to process billions of data combinations in various formats and determine what is important and what isn’t. Once that is done, the data can start to become a valuable asset.
Data Mining
Data mining covers any technique used to discover previously unknown properties of the data. It has drawn much inspiration from machine learning and statistics, but its goals are different. An important differentiator is that data mining is conducted by a person working with a specific data set and a clear goal in mind. This person is trying to make sense of the data. Usually, the goal is either to gain preliminary insights in an area where little was known, or to predict future observations accurately. Common data mining techniques include cluster analysis, neural networks, and classification and regression trees.
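As a rough illustration of cluster analysis, one of the techniques listed above, here is a minimal k-means sketch for one-dimensional data. The function name `kmeans_1d`, the purchase counts, and the starting centers are all made up for this example; a real project would more likely rely on an established library.

```python
from statistics import mean

def kmeans_1d(values, centers, iterations=10):
    """Tiny k-means for one-dimensional data (illustrative only)."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        # An empty cluster keeps its previous center.
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break  # Converged: the centers no longer move.
        centers = new_centers
    return centers, clusters

# Hypothetical data: monthly purchase counts for ten customers.
purchases = [1, 2, 2, 3, 1, 20, 22, 19, 21, 23]
centers, clusters = kmeans_1d(purchases, centers=[0, 10])
print(centers)
print(clusters)
```

With this data the algorithm separates occasional buyers from frequent buyers without being told in advance where the boundary lies, which is the kind of previously unknown structure data mining looks for.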
Machine Learning
Machine learning focuses on prediction based on known properties learned from data. Essentially, it’s a group of algorithms and techniques used to design systems that can learn from data. These algorithms have a strong mathematical and statistical basis, but they do not take data pre-processing or domain knowledge (knowledge about the environment in which the target system operates) into account. That work falls to the data scientist: after acquiring the data, they need to clean it and transform it into something useful. Once the data has been transformed, the data scientist decides, based on domain knowledge, which statistical method or machine learning algorithm is best suited to the specific problem at hand.
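The clean, choose, and predict workflow described above can be sketched in a few lines. This is only an illustration under stated assumptions: the records are invented, `None` is assumed to mark a missing value, and a straight-line least-squares fit stands in for whatever method domain knowledge would actually suggest.

```python
# Hypothetical raw records: (advertising spend, sales); None marks a missing value.
raw = [(1.0, 2.1), (2.0, 3.9), (None, 5.0), (3.0, 6.2), (4.0, None), (5.0, 9.8)]

# Clean: drop records with a missing field.
clean = [(x, y) for x, y in raw if x is not None and y is not None]

def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Learn from the cleaned data (the "known properties").
slope, intercept = fit_line(clean)

def predict(x):
    """Predict sales for a spend value using the fitted line."""
    return slope * x + intercept

print(predict(6.0))
```

The algorithm itself never notices the missing values or questions whether a straight line is appropriate; those decisions are made by the person preparing the data.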
Data Visualization
Data visualization is the presentation of data in a pictorial or graphical format. It has become increasingly important to represent data in layman’s terms. Everyone involved in a data project, whatever their background, has to be able to understand the results of the analysis and draw valuable insights from them. Make sure you invest as much in the people who will use the data as in big data itself. As more data is collected and analyzed, decision makers at all levels and in different fields are embracing data visualization software that lets them see analytical results presented visually, find relevance among millions of variables, and even make predictions. It’s easier for people to comprehend the meaning of data when it’s displayed in graphs rather than spread across dozens of spreadsheets.
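To make the point about graphs versus raw numbers concrete, here is a small sketch that renders a handful of figures as a plain-text bar chart. The helper `text_bar_chart` and the quarterly sales numbers are invented for illustration; in practice a dedicated charting tool would be used.

```python
def text_bar_chart(data, width=40):
    """Render a mapping of label -> non-negative number as horizontal text bars."""
    largest = max(data.values())
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        # Scale each bar relative to the largest value in the data.
        bar = "#" * round(width * value / largest) if largest else ""
        lines.append(f"{label.ljust(label_width)} | {bar} {value}")
    return "\n".join(lines)

# Hypothetical quarterly sales figures.
sales = {"Q1": 120, "Q2": 340, "Q3": 275, "Q4": 410}
chart = text_bar_chart(sales)
print(chart)
```

Even in this crude form, the relative sizes of the quarters are visible at a glance, which is harder to see when scanning a column of numbers.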
Big Data
Big Data relates to the technologies and professional services needed to make sense of huge amounts of data. Technologies such as Hadoop, Spark, and MapReduce, among others, have emerged in recent years to complement traditional relational database systems. These systems are built specifically to store and organize vast amounts of data and can be scaled relatively easily. Several companies have built professional services practices around Big Data, helping organizations implement Big Data initiatives using best practices.
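The MapReduce model mentioned above can be sketched with a word count, its classic introductory example. This is a single-process illustration only: the function names and sample documents are made up, and it does not use the Hadoop or Spark APIs. In a real cluster the map and reduce steps would be distributed across many machines.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: combine each key's values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}

# Hypothetical input: each string stands in for one document.
documents = ["big data is big", "data science uses data"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle(pairs))
print(word_counts)
```

Because each document is mapped independently and each key is reduced independently, the work can be split across machines, which is what lets these systems scale.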