Data is ubiquitous, and it comes from many places and in many formats. A key challenge of data mining is synthesizing information from disparate sources into a form amenable to analysis. Data is found in relational databases, in various text formats, online through APIs, and even scattered across many webpages that must be scraped. To compound the problem, the real world is messy. Data files are frequently malformed, data points are missing or faulty, and data is mislabeled and miscategorized in unpredictable ways. These challenges must be addressed regardless of whether the final goal is to draw up some descriptive charts in an Excel spreadsheet or to train a deep neural network.
Why Study Machine Learning?
Machine learning is a set of techniques for automatically finding patterns in large sets of data. Such patterns are not easily found by human analysts because they are either too counterintuitive to ever be considered or too subtle, involving a delicate balance among many variables that is difficult or impossible for a human to discern.
This programme is designed to provide students with knowledge and applied skills in data science, big data analytics and business intelligence. It aims to develop analytical and investigative knowledge and skills using data science tools and techniques, and to enhance data science knowledge and critical interpretation skills. Students will understand the impact of data science on modern processes and businesses, and will be able to identify and implement specific tools, practices, features and techniques to enhance the analysis of data.
This course is an introduction to the R statistical programming language for data science. It aims to give participants a fundamental understanding of the language so that they can use it effectively in future projects, as well as an appreciation of the myriad applications of R.
Why Study Probability and Statistics?
Probability and statistics are the classic mathematical tools for making sense of large sets of data. The simplest use of statistics is to determine the mean, median, mode, and variance of a dataset. These quantities, referred to as summary statistics, provide us with a way of reducing a long list of numbers to a few that characterize the original set. Statistics is a critical tool for interpreting and comparing sets of data. Students will learn not only how to compute these quantities, but also how to interpret them in the context of business decisions.
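As a rough illustration of the summary statistics named above, the following minimal sketch uses Python's standard `statistics` module. The `daily_orders` list is made-up sample data chosen for illustration only, and the variance shown is the sample variance (n - 1 denominator), which is one of two common conventions.

```python
import statistics

# Hypothetical sample data: orders received per day over one week.
daily_orders = [12, 15, 15, 18, 20, 22, 45]

summary = {
    "mean": statistics.mean(daily_orders),      # arithmetic average
    "median": statistics.median(daily_orders),  # middle value when sorted
    "mode": statistics.mode(daily_orders),      # most frequent value
    # Sample variance (divides by n - 1); use statistics.pvariance
    # for the population variance (divides by n).
    "variance": statistics.variance(daily_orders),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

In this made-up sample the single busy day (45 orders) pulls the mean above the median, which is the kind of difference that matters when these figures feed into a business decision.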
This course provides an overview of Big Data and Data Science. It gives participants a high-level understanding of how Data Science can be used to obtain insight from Big Data in order to drive strategic decision making.
Why Data Visualization?
A picture is worth a thousand words, especially when you have data and you are trying to understand its structure and the relationships between its variables, and to glean interesting insights from it. This is especially true for big data, which could include thousands of variables, making pictures a necessity for wrapping our heads around it. Data visualization is the art of presenting data in an intuitive and easy-to-understand graphical format. The idea of using pictures to understand data has been around for a long time in the form of maps and graphs. Now, with the advent of data science and big data technologies, data visualization has become a rapidly evolving field that combines art and science to enable intuitive and fluent communication of information. Anyone working with data must learn to present data in a way that yields insight and understanding. This means understanding data visualization and the principles of human perception, cognition and communication that are the cornerstones of effective visualizations. In this course, students learn the concepts and principles that enable effective and efficient graphics and visualizations. Hands-on exercises walk students through various types of charts and graphs, visual constructs for networks and text, and interactive web applications.