Big Community: What does data wrangling mean to you?
Jason Bissell: Data wrangling is the process of collecting, correcting and formatting data before it can be turned into insights. Other terms, such as data blending, data preparation or data unification, are often used for the same idea. They all point to the need to empower a wide range of data workers within the enterprise, not only IT professionals but also data analysts, stewards, scientists and even line-of-business users, so that they can autonomously access the data they need and turn their daily tasks into data-driven activities.
At Talend, we recognize that organisations today are swimming in data, but most companies are only utilising a fraction of what they collect. By implementing a self-service data preparation strategy, companies can enable more widespread use of data throughout their organisation and create a data-driven culture.
Big Community: Why is data wrangling an important part of big data analytics?
Jason Bissell: While many companies have made good progress on their path to becoming more data-driven, a major stumbling block is often the last mile of the journey, as companies work to get accurate and actionable information into the hands of more business users. While the concept of the data lake has dramatically cut the cost and time needed to collect any data in any form, and in real time, there is still a gap between this raw data and the smart data that business users need to turn into insights.
Self-service closes this gap. Employees are no longer dependent on IT resources or ‘information brokers’. They have the freedom to prepare and analyse data themselves, helping speed time to insight, while IT is able to maintain data security and governance. Data professionals such as data analysts or scientists no longer have to spend 80% of their time preparing data before delivering business insights or designing predictive models.
Big Community: Can you wrangle without investing in tools?
Jason Bissell: Yes. Before data preparation tools like Talend Data Preparation, business users had to specify their needs to IT professionals, with a business justification, so that their requests would reach the top of the IT backlog. If they wanted to meet their needs autonomously, without IT involvement, data workers spent up to 80% of their time on laborious data wrangling in spreadsheets.
However, these manual processes were costing businesses an estimated 500 hours and $22,000 per year. That underlines the need for automated tools that address two common challenges: overburdened IT teams that can’t keep up with the growing data demands of the business, and data analysts who all too frequently spend more time wrangling data than supplying insights.
Big Community: Do technologies that store data for analytics (e.g. Hadoop) have built-in wrangling capabilities?
Jason Bissell: The Hadoop platform does not have a built-in process for data wrangling. This means the overwhelming majority of data preparation work in Hadoop is currently done by writing code in languages like Hive, Pig or Python, which imposes a high technical barrier for individuals who want to participate in the preparation process.
To address this, Talend provides a big data software solution that includes a Hadoop connector for virtually any data source. With Talend, developers can quickly integrate data from almost any source in an easy-to-use graphical environment. Without needing to learn new skills or write complicated code, developers can visually map big data sources and targets to create and transform massive data sets for social data mining, sensor data analytics and other big data operations. With Talend Data Preparation on top, any data worker who has the credentials can then access the data from the data lake, augment it with their own data sources, and put this data to work for analytics or other data-driven tasks.
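To give a sense of the hand-written preparation code mentioned in this answer, here is a minimal Python sketch using only the standard library. It is not a Talend or Hadoop job. The sample data, the column names, the accepted date formats and the function names `parse_date` and `prepare` are all assumptions made up for illustration.

```python
import csv
import io
from datetime import datetime

# Hypothetical raw export; column names and values are invented for illustration.
RAW_CSV = """customer_id,name,signup_date,country
1, alice SMITH ,2017-03-01,uk
2,Bob Jones,01/04/2017,UK
2,Bob Jones,01/04/2017,UK
3,carol white,,fr
"""

# Assumed input date formats (ISO, then day/month/year).
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(value):
    """Return an ISO date string, or None if the value is empty or unrecognised."""
    value = value.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def prepare(raw_text):
    """Collect rows from CSV text, correct obvious issues, format them consistently."""
    cleaned = []
    seen = set()
    for row in csv.DictReader(io.StringIO(raw_text)):
        record = {
            "customer_id": int(row["customer_id"]),
            # Collapse stray whitespace and normalise capitalisation.
            "name": " ".join(row["name"].split()).title(),
            "signup_date": parse_date(row["signup_date"]),
            "country": row["country"].strip().upper(),
        }
        key = tuple(record.values())
        if key in seen:  # drop exact duplicates after cleaning
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned


if __name__ == "__main__":
    for record in prepare(RAW_CSV):
        print(record)
```

Even this small sketch needs decisions about date formats, duplicates and missing values to be written out in code, which is the kind of technical barrier described above.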
Big Community: What’s unique about your own data wrangling capabilities?
Jason Bissell: Talend Data Preparation runs natively where your data is, whether that is a traditional on-premises data warehouse, a cloud-based environment, or a data lake on Hadoop at extreme scale. It is also the first to support Apache Beam, a unified programming model for executing both batch and streaming data processing pipelines that are portable across a variety of runtime platforms.
Talend Data Preparation goes beyond standalone preparation for analytics. It enables enterprise-wide, self-service data access and collaboration. For example, a preparation can start in the hands of a data scientist who is building a personalization engine to predict a customer’s next best action. That model and its related preparations might then have to be embedded into a mobile app or e-commerce site and run in real time. Ultimately, by combining intuitive data preparation capabilities with proper IT oversight, Talend goes beyond personal decision-making and allows enterprises to turn those decisions into actions.
Our solution also allows anyone in IT or a business role to access, blend, clean, and enrich data with an Excel-like, easy-to-use, point-and-click user interface. It is not geared only towards a specific persona, such as a data analyst. Rather, it is a collaborative tool that empowers business users to solve their data preparation challenges on their own, without waiting for technical IT resources, but in a governed and controlled way.