Import.io Transform helps customers reduce the time it takes to prepare data for analysis by cleaning the data directly in Import.io. Below is a walk-through of using Transform to clean and prepare data from Amazon.com and analyze customer ratings by brand. Our end-goal is to visualize and monitor how different brands of televisions perform, as far as customer ratings correlated to average price, and then track this overtime. We used Tableau for the final charts to show the price distribution of televisions and average ratings across brands.
Here’s how we extracted and transformed the data to analyze customer ratings by brand.
After copying the Amazon.com URL where the data is located into Import.io Extract, we captured each column of data needed including brand name, price, size of television, and rating.
You can now see that the data is structured in columns, but the data is not in the right form to easily feed into a data analytics tool. For instance, having the extra words “out of 5 stars” in the rating column won’t allow us to analyze it easily. Also, the price column does not have consistent decimal points and has a space between the $ symbol and the price.
So, we can use Import.io Transform to easily and quickly clean the data and prepare it for analysis.
Once in Transform you can start cleaning the columns with more than 100 Excel functions. If you are not familiar with all of these (and who is?), then you can click on the blue info button next to each to see a description and the syntax for formulating your new columns.
For this example, you can get rid of the extra words in the ratings column with a two steps formula creating a new column showing where the word “out” is located using the function FIND, then using that column to create a clean ratings column with the function LEFT. We used this example of showing how to calculate a column from another calculated column, but you could have also done this in one step using REMOVE.
Next, we need to clean up the pricing column using the REPLACE function to remove the dollar symbol and SUBSTITUTE to add a decimal.
Then, we want to include television size in our analysis, but we need just the number and not the measurement type “in”, so we use a simple LEFT function to create a new column of just the number.
Once this Transform is saved, every time you rerun that extractor to get the most up to date information, the Transform is automatically updated and saved in the Import.io cloud for fresh analysis and access from any device.
The output includes the original columns and the transformed columns of data that are ready for your visualization and analytics software.
In this instance, we were able to quickly drag and drop the clean data in to Tableau to visualize average price by screen size and brand, average rating by brand, and a scatter plot showing correlation of price to rating by screen size.
This competitive comparison information is helpful to vendors looking to improve and analyze customer ratings. A vendor may want to run the data extraction and transformation in Import.io on a daily or weekly basis to see if product improvements move the ratings higher overtime.
Rather than downloading data to an Excel spreadsheet or using a 3rd party package and trying to keep it up-to-date when a website changes, Import.io does the work for you. Once a transformation is created, then every time your scheduled data extraction runs, the transformation is also applied, leaving you with clean, ready-to-use data for visualization and analysis. All of the data, both original and transformed, is stored in Import.io’s cloud based service available for you to perform further insights.