In the blossoming world of big data, the data miner is king. Although your own business may already see the value in data, it’s more difficult to understand how to data mine for success.
First, let’s take a look at what data mining is. Data mining is the process of automatically sorting through large data sets to look for patterns and trends. It goes well beyond the analysis you can do manually. Data mining can quickly take the guesswork out of data by automating predictions of behavior. That’s huge for anyone with limited time or know-how, allowing them to harness and interpret large amounts of data on their own.
That means data mining has many benefits you can integrate into your knowledge discovery process to pull together information that impacts your bottom line. For example, the finance industry could use it to look at historical data and new trends to determine what type of stocks to invest in. Many companies use data mining to improve their bottom line and quickly make profitable decisions.
But data mining requires more than a few keystrokes and some sweat equity. You also need the right tools to get there. With today’s tools, anyone can collect data from almost anywhere, but not everyone can pull the important nuggets out of that data. Slapping your data into Tableau is an OK start, but it’s not going to give you the business-critical insights you’re looking for. To truly make your data come alive, you need to mine it and find the diamond in the rough.
Jumpstarting your data mining journey can be an uphill battle if you didn’t study data science in school. Not to worry! Few of today’s brightest data scientists did. So, for those of us who may need a little refresher on data mining or are starting from scratch, here are 45 great resources to learn data mining concepts and techniques.
Learn Data Mining Languages: R, Python and SQL
W3Schools – Fantastic set of interactive tutorials for learning different languages. Their SQL tutorial is second to none. You’ll learn how to manipulate data in MySQL, SQL Server, Access, Oracle, Sybase, DB2 and other database systems.
Treasure Data – The best way to learn is to work towards a goal. That’s what this helpful blog series is all about. You’ll learn SQL from scratch by following along with a simple, but common, data analysis scenario.
10 Queries – This course is recommended for the intermediate SQL-er who wants to brush up on his/her skills. It’s a series of 10 challenges coupled with forums and external videos to help you improve your SQL knowledge and understanding of the underlying principles.
TryR – Created by Code School, this interactive online tutorial system is designed to step you through R for statistics and data modeling. As you work through their seven modules, you’ll earn badges that track your progress and help you stay on course.
Leada – If you’re a complete R novice, try Leada’s introduction to R. In their 1 hour 30 minute course, they’ll cover installation, basic usage, common functions, data structures, and data types. They’ll even set you up with your own development environment in RStudio.
Advanced R – Once you’ve mastered the basics of R, bookmark this page. It’s a fantastically comprehensive style guide to using R. We should all strive to write beautiful code, and this resource (based on Google’s R style guide) is your key to that ideal.
Swirl – Learn R in R: a radical idea, certainly, but that’s exactly what Swirl does. It interactively teaches you how to program in R and do some basic data science at your own pace, right in the R console.
Python for beginners – The Python website actually has a pretty comprehensive and easy-to-follow set of tutorials. You can learn everything from installation to complex analyses. It also gives you access to the Python community, who will be happy to answer your questions.
PythonSpot – A complete list of Python tutorials to take you from zero to Python hero. There are tutorials for beginners, intermediate and advanced learners.
Best Data Mining Books
Data Jujitsu: The Art of Turning Data into Product – This free book by DJ Patil gives you a brief introduction to the complexity of data problems and how to approach them. He gives nice, understandable examples that cover the most important thought processes of data mining. It’s a great book for beginners but still interesting to the data mining expert. Plus, it’s free!
Data Mining: Concepts and Techniques – The third (and most recent) edition will give you an understanding of the theory and practice of discovering patterns in large data sets. Each chapter is a stand-alone guide to a particular topic, making it a good resource if you’re not into reading in sequence or you want to know about a particular topic.
Mining of Massive Datasets – Based on the Stanford Computer Science course, this book is often cited by data scientists as one of the most helpful resources around. It’s designed at the undergraduate level with no formal prerequisites. It’s the next best thing to actually going to Stanford!
Big Data, Data Mining, and Machine Learning: Value Creation for Business Leaders and Practitioners – This book is a must-read for anyone who needs to do applied data mining in a business setting (i.e., practically everyone). It’s a complete resource for anyone looking to cut through the Big Data hype and understand the real value of data mining. Pay particular attention to the section on how modeling can be applied to business decision making.
Data Smart: Using Data Science to Transform Information into Insight – The talented (and funny) John Foreman from MailChimp teaches you the “dark arts” of data science. He makes modern statistical methods and algorithms accessible and easy to implement.
Hadoop: The Definitive Guide – As a data scientist, you will undoubtedly be asked about Hadoop. So you’d better know how it works. This comprehensive guide will teach you how to build and maintain reliable, scalable, distributed systems with Apache Hadoop. Make sure you get the most recent edition to keep up with this fast-changing technology.
Online Learning: Data Mining Webinars and Courses
DataCamp – Learn data mining from the comfort of your home with DataCamp online courses. They have free courses on R, Statistics, Data Manipulation, Dynamic Reporting, Large Data Sets and much more.
Coursera – Coursera brings you all the best university courses straight to your computer. Their online classes will teach you the fundamentals of interpreting data, performing analyses and communicating insights. They have topics for beginners and advanced learners in Data Analysis, Machine Learning, Probability and Statistics and more.
Udemy – With a range of free and paid data mining courses, you’re sure to find something you like on Udemy no matter your level. There are 395 in the area of data mining! All their courses are uploaded by other Udemy users, meaning quality can fluctuate, so make sure you read the reviews.
CodeSchool – These courses are handily organized into “Paths” based on the technology you want to learn. You can do everything from build a foundation in Git to take control of a data layer in SQL. Their engaging online videos will take you step-by-step through each lesson and their challenges will let you practice what you’ve learned in a controlled environment.
Udacity – Master a new skill or programming language with Udacity’s unique series of online courses and projects. Each class is developed by a Silicon Valley tech giant, so you know what you’re learning will be directly applicable to the real world.
Treehouse – Learn from experts in web design, coding, business and more. The video tutorials from Treehouse will teach you the basics and their quizzes and coding challenges will ensure the information sticks. And their UI is pretty easy on the eyes.
Learn from the Best: Top Data Miners to Follow
Dr. Vincent Granville – With proven success in bringing measurable value to everything from startups to Fortune 100 companies, Vincent Granville is a data science pioneer, author, CEO and investor known for developing and deploying new techniques like hidden decision trees.
Bruce Ratner – Best-selling author and President of DM STAT-1 Consulting, Bruce Ratner prides himself on providing the best non-hyped solutions to business problems.
Bob Gourley – Author of The Cyber Threat, Bob Gourley brings decades of experience as a former USN intelligence officer and DIA CTO.
Monica Rogati – Former VP of Data at Jawbone, data scientist and AI advisor Monica Rogati turns data into products and stories.
Klint Finley – As a reporter for WIRED, Klint Finley reports on telecommunications, software development, technology law, network neutrality, machine learning, open source, hacker and startup culture, code literacy and more.
David Smith – As a Cloud Developer Advocate at Microsoft, David Smith has launched several products, including data mining applications and financial analysis suites.
Kristen Nicole – Contributor at Time, Kristen Nicole possesses a deep understanding of research and analysis, and was named by Forbes as a top influencer in big data.
John Foreman – Chief Data Scientist at MailChimp and author of Data Smart, John is worth a follow for his witty yet poignant tweets on data science.
DJ Patil – Author and Chief Data Scientist at The White House OSTP, DJ tweets everything you’ve ever wanted to know about data in politics.
Nate Silver – He’s Editor-in-Chief of FiveThirtyEight, a blog that uses data to analyze news stories in Politics, Sports, and Current Events.
Andrew Ng – As the Chief Data Scientist at Baidu, Andrew is responsible for some of the most groundbreaking developments in Machine Learning and Data Science.
Bernard Marr – He might know pretty much everything there is to know about Big Data.
Christian Rudder – As the Co-founder of OKCupid, Christian has access to one of the most unusual datasets on the planet, and he uses it to give fascinating insight into human nature, love, and relationships.
Practice What You’ve Learned: Data Mining Competitions
Kaggle – This is the ultimate data mining competition. The world’s biggest corporations offer big prizes for solving their toughest data problems.
Stack Overflow – The best way to learn is to teach. Stack Overflow offers the perfect forum for you to prove your data mining know-how by answering fellow enthusiasts’ questions.
TunedIT – With a live leaderboard and interactive participation, TunedIT offers a great platform to flex your data mining muscles.
DrivenData – You can find a number of nonprofit data mining challenges on DrivenData. All of your mining efforts will go towards a good cause.
Quora – Another great site to answer questions on just about everything. There are plenty of curious data lovers on there asking for help with data mining and data science.
Meet Your Fellow Data Miner: Social Networks, Groups and Meetups
Reddit – Reddit is a forum for finding the latest articles on data mining and connecting with fellow data scientists. We recommend subscribing to a variety of subreddits to dig into how data mining is being used, and the latest trends.
Facebook – As with many social media platforms, Facebook is a great place to meet and interact with people who have similar interests. There are a number of very active data mining groups you can join, including:
- Machine Learning Forum
- Big Data, Data Science, Data Mining & Statistics
- Data Mining/Big Data – Social Network Analysis
- Machine Learning, Artificial Intelligence and Data Analytics
LinkedIn – If you’re looking for data mining experts in a particular field, look no further than LinkedIn. There are hundreds of data mining groups ranging from the generic to the hyper-specific. Here’s where to get started:
- Big Data and Analytics
- Innovation Enterprise
- RData Mining
- Data Science Central
- Data Mining, Statistics, Big Data, Data Visualization, and Data Science
- Actuary / Actuarial, Predictive Modeling, Data Mining, and Statistics News / Jobs / Careers Group
- Data Mining Technology
- Healthcare Data Mining and Modeling
Meetup – Want to meet your fellow data miners in person? Attend a meetup! Just search for data mining in your city and you’re sure to find an awesome group near you. Here are some data mining groups to explore:
- SF Data Science Meetup
- SF Data Mining
- Boston Data Mining
- Miami Data Science Meetup
- NYC Data Mining Meetup
- Data Science DC
Data Mining Concepts and Techniques
After you explore the resources above and learn more about data mining, it’s time to leverage those concepts and apply them. There are several best practices and techniques to use in data mining to help shape your results and streamline the process. Depending on the needs of your company, you can use data mining to do everything from predicting buyer behavior to finding the best leads for your business.
Cluster analysis groups records that share characteristics (ranging from education to location) within a database. This can be useful for businesses like real estate, where agents might be looking for home sellers in a specific region with certain education levels, to help them locate their perfect potential customers. Luxury travel agents looking to sell to high-income couples without children can also look at clustering to identify their ideal customer base.
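To make the idea concrete, here is a minimal sketch of k-means, one common clustering algorithm, written in plain Python. The customer records, the choice of two clusters, and the starting centroids are all made up for illustration; a real project would scale the features first and would probably lean on an established library.

```python
from math import dist

# Hypothetical customers: (annual income in $1000s, number of children).
customers = [(45, 2), (50, 3), (48, 2), (150, 0), (160, 0), (155, 1)]


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        for p in points
    ]


def k_means(points, centroids, iterations=10):
    """Alternate between assigning points and moving centroids to cluster means."""
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid where it was
        centroids = updated
    return centroids, assign(points, centroids)


# Assumption: seed the two clusters with the first and fourth customers.
centroids, labels = k_means(customers, [customers[0], customers[3]])
print(labels)     # cluster index for each customer
print(centroids)  # the "average customer" of each cluster
```

Each label tells you which group a customer landed in, and each centroid describes the typical member of that group, which is the profile a travel agent or real estate agent would target.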
Association learning explores where common associations lie in large volumes of data. Take the example of Amazon. They’re skilled at using their data to uncover insights into what types of products customers may want based on what they’ve already bought. For example, Amazon may see that someone who purchases kitchen knives will likely be interested in cutting boards, cutting gadgets for fruits and vegetables, and specialized knife cleaners.
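As a rough illustration, the sketch below counts how often pairs of products show up in the same basket and turns those counts into two standard measures: support (how common the pair is overall) and confidence (how often buying one item goes along with buying the other). The baskets and the `min_support` threshold are invented for this example, and this is not a description of how Amazon actually does it.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase history: one set of products per order.
baskets = [
    {"kitchen knives", "cutting board"},
    {"kitchen knives", "cutting board", "knife cleaner"},
    {"kitchen knives", "apple slicer"},
    {"cutting board"},
    {"kitchen knives", "cutting board"},
]


def pair_rules(baskets, min_support=0.4):
    """Map (if_bought, then_bought) to (support, confidence) for frequent pairs."""
    total = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / total
        if support < min_support:
            continue  # pair is too rare to base a recommendation on
        rules[(a, b)] = (support, count / item_counts[a])
        rules[(b, a)] = (support, count / item_counts[b])
    return rules


rules = pair_rules(baskets)
for (if_bought, then_bought), (support, confidence) in rules.items():
    print(f"{if_bought} -> {then_bought}: support={support}, confidence={confidence}")
```

With this toy data, only the knives and cutting board pairing clears the threshold, so that is the one recommendation worth making.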
It can be downright overwhelming to figure out how to make the best decision for your company’s bottom line. Where do you start? This is where decision trees can help. You can see the costs and benefits of each decision based on large volumes of historical data. The data ultimately breaks down into subsets so you can see how each decision will impact your business. This can help improve your project risk management options and help you make more profitable decisions for your company.
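Here is a small sketch of the step at the heart of building a decision tree from historical data: picking the question that best separates good outcomes from bad ones. It uses Gini impurity, one common measure of how mixed a subset is. The project records and field names (`budget`, `team`, `profitable`) are hypothetical.

```python
from collections import Counter

# Hypothetical history of past projects and whether they turned a profit.
projects = [
    {"budget": "high", "team": "experienced", "profitable": True},
    {"budget": "high", "team": "new", "profitable": True},
    {"budget": "low", "team": "experienced", "profitable": False},
    {"budget": "low", "team": "new", "profitable": False},
    {"budget": "high", "team": "experienced", "profitable": True},
    {"budget": "low", "team": "experienced", "profitable": False},
]


def gini(labels):
    """Gini impurity: 0.0 for a pure subset, higher when outcomes are mixed."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def weighted_gini(rows, feature, target):
    """Average impurity of the subsets created by splitting on one feature."""
    subsets = {}
    for row in rows:
        subsets.setdefault(row[feature], []).append(row[target])
    return sum(len(s) / len(rows) * gini(s) for s in subsets.values())


def best_split(rows, features, target):
    """Pick the feature whose split leaves the purest subsets."""
    return min(features, key=lambda f: weighted_gini(rows, f, target))


root = best_split(projects, ["budget", "team"], "profitable")
print(root)
```

A full decision tree repeats this step on each resulting subset until the subsets are pure or too small to split further.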
As the name implies, classification sorts records into categories: pre-classified samples are used to build a model, which can then label new data automatically. Now consider how to apply this in real life. A credit card company could use classification to analyze customer data and determine whether or not a credit card offer should be sent out, or what kind of bonus rewards to offer.
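One simple way to classify is k-nearest neighbors: find the labeled samples most similar to a new record and let them vote. The sketch below assumes made-up customer data, with credit score and income rescaled to comparable ranges, and hypothetical labels `send` and `skip`. It illustrates the technique only and is not a real credit policy.

```python
from collections import Counter
from math import dist

# Hypothetical pre-classified samples:
# ((credit score / 100, income in $10,000s), decision made in the past).
training = [
    ((7.2, 8.0), "send"),
    ((6.9, 6.5), "send"),
    ((7.5, 9.0), "send"),
    ((5.1, 3.0), "skip"),
    ((5.5, 4.0), "skip"),
    ((4.8, 2.5), "skip"),
]


def knn_predict(training, sample, k=3):
    """Label a new sample by majority vote among its k closest training samples."""
    neighbors = sorted(training, key=lambda item: dist(item[0], sample))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


strong_applicant = knn_predict(training, (7.0, 7.0))
weak_applicant = knn_predict(training, (5.0, 3.2))
print(strong_applicant, weak_applicant)
```

The "model" here is just the stored samples plus a distance measure, which makes it easy to reason about before moving on to more sophisticated classifiers.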
If your business relies heavily on sales forecasting, you could benefit from regression analysis. Think about how every variable of your business affects the others, from the seasonality of your sales to the location you’re selling in. A regression model estimates the relationship between those variables and an outcome such as revenue, so you can forecast future sales and analyze trends. Now you have clear, straightforward, and data-driven information on where to allocate your time and resources.
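The simplest version is a straight-line fit using ordinary least squares. The sketch below fits a trend line to six months of invented sales figures and extends it one month ahead. Real forecasts usually involve more variables, such as seasonality and location, but the mechanics are the same.

```python
# Hypothetical monthly sales, in $1000s.
months = [1, 2, 3, 4, 5, 6]
sales = [10, 13, 14, 17, 18, 21]


def fit_line(xs, ys):
    """Ordinary least squares for one variable: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(months, sales)
forecast = intercept + slope * 7  # projected sales for month 7
print(f"trend: {slope:.2f} per month, month 7 forecast: {forecast:.1f}")
```

The slope is the average monthly growth the data supports, and the forecast is only as trustworthy as the assumption that the trend continues.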
Data mining isn’t going away, and its value will only continue to increase. The question is whether or not you want to leverage the data available to you to help improve your bottom line and scale your business.