Excel is an indispensable tool for data analysis. With the right datasets and techniques, beginners can learn to uncover insights and make informed decisions. Its intuitive interface and powerful functionality support a wide range of tasks, such as data manipulation, data visualization and statistical analysis.

What are “Excel Datasets”?

Excel datasets are collections of data stored and organized in a spreadsheet that can be opened in Excel, a widely used application for creating, manipulating and analyzing data in a structured format. These datasets come in two main formats: Excel (.xlsx) and Comma-Separated Values (.csv). The Excel format supports more advanced features for organizing and analyzing complex data, including formulas and visualizations. CSV is a simpler plain-text format that is compatible with a wide range of software applications, which makes it easier to share data between programs.
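To make the difference between the two formats concrete, here is a minimal Python sketch that uses only the standard library's csv module. The column names and rows are invented for illustration and do not come from any dataset in this article. It shows that a CSV file is plain text and that every value is read back as a string, so numbers have to be converted before any analysis.

```python
import csv
import io

# Hypothetical rows, invented purely for illustration.
rows = [
    {"product": "Chair", "units": 3, "price": 49.5},
    {"product": "Desk", "units": 1, "price": 120.0},
]


def to_csv_text(records):
    """Write a list of dicts to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=["product", "units", "price"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def from_csv_text(text):
    """Read CSV text back into dicts; every value comes back as a string."""
    return list(csv.DictReader(io.StringIO(text, newline="")))


csv_text = to_csv_text(rows)
loaded = from_csv_text(csv_text)

# CSV stores no types or formulas, so numbers must be converted explicitly.
revenue = sum(int(r["units"]) * float(r["price"]) for r in loaded)

print(csv_text)
print(revenue)
```

An .xlsx workbook, by contrast, keeps cell types, formulas and charts, which is why it is usually the better choice when the analysis itself lives in the file.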

In this article, we have compiled a list of 15 Excel Datasets for Data Analytics Beginners. With these Excel datasets covering topics like financial analysis, market analysis and time series analysis, beginners can practice data analysis techniques such as data cleaning, pivot tables and charts while gaining insights into real-world scenarios.

List of the Excel Datasets for Data Analytics Beginners

  1. Superstore Sales
  2. Iris
  3. Titanic
  4. Wine Quality
  5. Adult Census Income
  6. Boston Housing
  7. Breast Cancer Wisconsin Dataset
  8. Online Shoppers Purchasing Intention
  9. Bank Marketing
  10. Avocado Prices
  11. Amazon Top 50 Bestselling Books 2009 – 2019
  12. FIFA World Cup
  13. New York City Airbnb Open Data
  14. World Happiness Report
  15. Stock Price

1. Superstore Sales

The Superstore Sales data provides sales data for a fictional retail company, including information on products, orders and customers. It is often used to practice data analytics.

This Excel dataset includes the following variables:

2. Iris

This dataset includes measurements of the sepal length, sepal width, petal length and petal width of 150 iris flowers belonging to 3 species: setosa, versicolor and virginica. It has 150 rows and 5 columns: the four measurements plus a column for the species of each flower.

The description of its variables includes:

One use case of the Iris dataset in Excel is to analyze the relationship between the different features of the Iris flower and classify the flower species based on the feature values. This can be done using techniques such as correlation analysis, inferential statistics, and predictive modeling.
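As a rough illustration of the correlation analysis mentioned above, the sketch below computes the Pearson correlation coefficient between two measurement columns, which is the same quantity Excel's CORREL function returns. The numbers are illustrative values chosen for this example, not rows copied from the Iris dataset, and the function name is hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Illustrative values only, not taken from the actual dataset.
petal_length = [1.4, 1.5, 4.5, 4.7, 5.9, 6.1]
petal_width = [0.2, 0.3, 1.4, 1.5, 2.1, 2.3]

r = pearson(petal_length, petal_width)
print(round(r, 3))
```

A value close to 1 means the two features rise together, which suggests that either one could help separate the species.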

You can also download this Excel dataset on Kaggle by clicking here.

3. Titanic

This popular open-source dataset offers information on the passengers on board the Titanic when it sank on April 15, 1912. It can be used by data analytics beginners interested in data cleaning and preprocessing, descriptive statistics, data visualization and predictive modeling.

Some of the variables included in the dataset:

4. Wine Quality

The Wine Quality dataset contains information on red and white wine samples. It is typically used to predict the quality of a wine from chemical properties such as pH, density, alcohol content and citric acid content.

The common variables included in this Excel dataset:

5. Adult Census Income

This Excel dataset is a collection of information about individuals living in the United States, extracted from the 1994 Census database. It contains various demographic, social and economic attributes about each individual.

Some of the attributes included in this dataset:

The “income” attribute is the target variable and the dataset is very useful to data analytics beginners.

6. Boston Housing

The Boston Housing dataset consists of information on housing in the area of Boston, Massachusetts. It has 506 rows and 14 columns of data.

Some of the variables in the dataset include:

This dataset can be used to analyze the relationship between house prices and various features of the housing market, and to generate insights from the results.
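One simple way to study such a relationship is a straight-line fit of price against a single feature, which is what Excel's SLOPE and INTERCEPT functions compute. The sketch below is a minimal least-squares implementation under that assumption. The function name, the feature and all values are hypothetical and are not taken from the Boston Housing dataset.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical values for illustration only.
rooms = [4.0, 5.0, 6.0, 7.0, 8.0]
price = [15.0, 18.5, 22.0, 30.0, 38.5]

slope, intercept = fit_line(rooms, price)
print(f"price = {slope:.2f} * rooms + {intercept:.2f}")
```

The slope estimates how much the price changes for each additional unit of the feature. A scatter plot with a trendline in Excel shows the same fit visually.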

7. Breast Cancer Wisconsin Dataset

This Excel dataset consists of information about breast cancer tumours and was initially created by Dr. William H. Wolberg. The dataset was created to assist researchers and machine learning practitioners in classifying tumours as either malignant (cancerous) or benign (non-cancerous).

Some of the variables included in this dataset:

8. Online Shoppers Purchasing Intention

The Online Shoppers Purchasing Intention dataset is a collection of data related to purchase patterns and consumer behaviour in the context of online shopping. Each row describes one browsing session on an e-commerce website, including whether the session ended in a purchase.

Some of the variables in this dataset include:

This Excel dataset is used in research and analytics related to e-commerce and online marketing. It can help businesses to understand the factors that drive customer behaviour and is also useful for data analytics beginners.

9. Bank Marketing

This popular dataset is used to study the marketing campaigns of a Portuguese banking institution. It contains information about the bank’s marketing campaigns, as well as customer demographics and economic indicators.

Some of the variables included in this dataset:

10. Avocado Prices

The Avocado Prices dataset consists of data related to the prices of avocados in the United States. The data is collected from various sources like the Hass Avocado Board and the United States Department of Agriculture (USDA).

Some of the variables in this dataset include:

It can also be used by businesses in the food industry to make strategic decisions about buying and selling avocados.

11. Amazon Top 50 Bestselling Books 2009 – 2019

This Excel dataset is a collection of data related to the top 50 best-selling books on Amazon for each year between 2009 and 2019.

The dataset includes the following variables:

The Amazon Top 50 Bestselling Books can be used to explore trends in book sales on Amazon over a decade and is useful to data analytics beginners.

12. FIFA World Cup

The FIFA World Cup dataset is a collection of data related to the FIFA World Cup, which is held every four years. It contains information on every World Cup tournament from 1930 to 2014.

Some of the variables in this dataset include:

The dataset can be used to analyze trends in the World Cup over time, such as changes in the number of teams that participate or the number of goals scored.

13. New York City Airbnb Open Data

This Excel dataset consists of public information about Airbnb listings and metrics in New York City. The 2019 New York City Airbnb Open Data includes information on about 50,000 Airbnb listings in the city. The data was originally published by Inside Airbnb, an independent project that tracks the impact of short-term rentals on cities.

Some of the variables in the dataset include:

14. World Happiness Report

This dataset covers the happiness levels of over 150 countries, along with the economic, social and health factors that contribute to happiness. It is useful to data analytics beginners for practicing data exploration, visualization and regression analysis.

Some of the variables in this dataset include:

15. Stock Price

This dataset includes the daily stock prices of various companies, such as Apple, Google and Amazon. It is useful for practicing time series analysis and predicting future stock prices.
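A common first step in time series analysis is to smooth daily prices with a simple moving average, much like applying Excel's AVERAGE function over a sliding range of cells. The sketch below assumes a plain list of closing prices. The function name and the prices are hypothetical and do not come from the dataset.

```python
def moving_average(values, window):
    """Simple moving average; returns one value per full window."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i : i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical closing prices, invented for illustration.
closes = [10.0, 11.0, 12.0, 13.0, 14.0]

print(moving_average(closes, 3))  # prints [11.0, 12.0, 13.0]
```

A longer window gives a smoother line but reacts more slowly to recent price changes, so it is worth trying several window sizes on the same chart.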

The variables in this dataset:

Common Practice Questions for These Excel Datasets

Superstore Sales

Iris

Titanic

Wine Quality

Adult Census Income

Boston Housing

Breast Cancer Wisconsin Dataset

Online Shoppers Purchasing Intention

Bank Marketing

Amazon Top 50 Bestselling Books 2009 – 2019

FIFA World Cup

New York City Airbnb Open Data

World Happiness Report

Stock Price

Final Thoughts

Excel offers a wide range of tools for data analytics beginners, and you can improve your skills by practicing with the Excel datasets listed in this article.

You can also create various types of visualizations such as line charts, bar charts, scatter plots, histograms and pie charts to answer the questions above.


The lead image of this article was generated via HackerNoon's AI Stable Diffusion model using the prompt 'Excel datasets'.
