E-Commerce Analysis

nicodemusnaisau
6 min readNov 27, 2022

--

image by jcomp on Freepik

Background Project

Doing data analysis on e-commerce data which saved in relational databases by processing up to data visualization. This analysis uses some assists of libraries such as Pandas, Seaborn etc.

In the available data, there is a hidden message. If it is developed, it will help business processes and how companies can make the right decisions in the future in the future, these are products, sales, and customers.

Processing up to see how the trend of each needs are expected to help companies to do suitable strategic efforts based on the results of existing data analysis.

This project will make an effort to be able to do a search instead of every implied information contained in the existing data.

Relational Tables on the dataset

Before conducting an analysis to answer the objective, several stages of cleaning data must be done first, they are:

  1. Handling Missing Value
  2. Handling Parsing Date & Handling Inconsistency format
  3. Handling Duplicate Data

Tools

  1. Jupyter Notebook
  2. Visual Studio Code

Objective And Research Question

Product

  • The company wants to know which category of products are sold the most and which product category that is less desirable to buyers.

Sales

  • The company wants to know the time series of sales during the 2017 and 2018 period based on this acquisition, and the analysis of the year with the highest sales.

Customer

  • Where the locations for buying the most customer orders are and the lowest states in making orders.
  • The customer payment methods while shopping using what kind of payment method during the data period.

Cleaning Data

Handling Missing Value

After checking each table column, it was found that there were 3 tables with a null value, such as the order table, product table, and review table.

we don't need to drop data reviews that contain missing values, because from the order reviews sometimes people don't fill the review completely

After handling it by using the drop function on the table order, there hadn’t more NaN.

Handling Parsing Date & Handling Inconsistency format

Checking data that has an inappropriate data type on some tables contains data which has timestamp values ​​but is has object data type, so we need to change the data type.

After handling it by using astype function, the data format is suitable with the appropriate data type and the value of the attribute data itself.

Handling Duplicate Data

In each table, data duplication is often happened, which may be caused due to data input or system errors. At this stage, we will check the data that has a same value of each attribute. After checking on each table, it does not find duplicate values.

Explanatory Data Analysis

Product

Product Category that sells the most

Product is one of the keys in every sales market because with the sale of the products, it will be the main driver in obtaining profits sale. Judging from the trend of categories of products that have value high sales in order to increase the product.

Product sales that are most ordered by buyers are in the category bed bath table reaching more than 10,000 orders. Many of the buyers do orders for household needs. Health and Beauty also became the buyer’s favorite. The recorded sales in the health and beauty category reaching orders around 9,000 orders.

Product sales that are lowest ordered by buyers are in the category of security and services

Sales

Time Series

On the time-series sales chart from 2016 to the end of 2018 shows that sales in 2017 experienced a significant movement in November which has the explosion, so it increased the sales. In 2018, there were months that experienced a decline in sales in the second quarter.

Let’s check the detail of the sales that occurred throughout 2017.

It can be seen that November reached 7000 orders. So, it became the month which had the highest sale during the period sales in 2017.

The cause of the sale can be seen from the events that happened in that time, an annual event with a discount called Black Friday. Black Friday is events with shopping discounts on certain products. Because of public enthusiasm for the Black Friday event, it causes a spike in transactions

Customers

Customer payment methods with the number of transactions.

Based on the acquisition of the data obtained by the visualization, we can see that buyers mostly use credit-card payment methods compared with other payment methods. It is possible based on the schema 0% installments offered with the scheme of each provider bank.

Installment-credit schemes allow the buyers to shop for expensive household needs. So, choosing the credit option with installments per month is an option that makes it easy for the buyers. The least used payment method by buyers is the debit card and the voucher.

The location of the buyer’s distribution area for each region

Purchases from the location of the most state customers is in the SP state or known as São Paulo by reaching a total order of more than 40,000 number. São Paulo is a state with a Brazilian financial center. So that people will do transactions more often because the city is the financial center of Brazil. Furthermore, São Paulo is a largest city in Brazil and the 8th largest metropolitan area. So, the transactional activity in this area is quite large. RJ or known as Rio de Janeiro is the second largest state after São Paulo.

Let’s look at the state with the lowest number of transactions

RR or known as Roraima is the lowest state with a total number of transactions are less than 50. Based on the number of state cities, RR has a small number of cities compared to the total number of existing states. Roraima only has 41 states.

To seeing it further, we can see the correlation between the number of cities in each state with the total transactions that exist in that state.

It was found that correlation 1 is a correlation with the variable itself while the correlation value of 0.66 indicates the correlation between variables. Based on that thing, the correlation of 0.66 indicates a positive correlation which means that every increase of variable of the cities number results in an increase of the number of existing transactions.

Conclusion

Sales with discount events will certainly increase the number of transaction levels on company. So, the company can take advantage of some discount momentum for increasing the number of existing transactions.

Companies also need to communicate with its sellers to be able to do some preparations related to the stock requirement of sales items. So that when the event takes place, the needs of goods still available and able to meet the needs of the order.

Sales from a state with a small number of cities have a positive correlation number of existing transactions.

Next Development🚀

  • The analysis of the use of vouchers for each transaction.
  • The comparison of performance on delivery and ratings given by buyers.

GitHub Repository

Repository

Let’s Get Connect

Linkedin

--

--

No responses yet