ChatGPT Data Analysis and Cleanup Examples

In this guide, we'll explore how to harness the power of ChatGPT for data analysis tasks. ChatGPT is a versatile AI model that can assist in a wide range of natural language processing tasks, including text classification, analysis and more. We'll walk you through the process, step by step, and provide practical examples along the way.

 

Data Analysis

Data analysis is both difficult and time consuming. This is especially true in a business environment, where things can often move very slowly.

Most probably, the actual job of data analysis isn't going anywhere soon. LLMs and ChatGPT are great new tools that can be used really by anyone. Below is a screenshot of what's possible in the free version of ChatGPT today. Please note that in the newer versions of GPT-4 and up, it is possible to upload PDF files and even Excel.

ChatGPT basic data analysis

 

What is Data Analysis?

Data analysis is the systematic process of examining, cleaning, interpreting, and modeling data to uncover meaningful insights and inform decision-making. It involves collecting, processing, and visualizing data to extract valuable information and patterns, making it a cornerstone of informed actions and discoveries.

 

How to do Data Analysis in ChatGPT?

Using the free version of ChatGPT, the data can simply be pasted into the prompt, and questions can be asked. 
The prompt format goes as follows:

  • Provide context for the data - what is it about?
  • Provide the data itself, in the cleanest possible way so ChatGPT can parse it.
  • Ask questions about the data.

 

Example Data Analysis Prompt in ChatGPT?

Please see the sample prompt below as an example for a data analysis request for ChatGPT:


We are given the following data table of sales during the past 7 days:
weekday, product sold, customer city, sale price
Sunday, Widget1, New York, $100
Sunday, Widget1, San Francisco, $100
Monday, Widget1, New York, $100
Monday, Widget1, San Francisco, $100
Tuesday, Widget1, New York, $100
Tuesday, Widget1, San Francisco, $100
Wednesday, Widget2, New York, $200
Wednesday, Widget1, New York, $100
Wednesday, Widget1, San Francisco, $100
Thursday, Widget1, New York, $100
Friday, Widget2, New York, $200
Saturday, Widget1, New York, $100
Saturday, Widget2, New York, $200
Which products have the highest demand?
 
 
> Based on the provided sales data, it appears that Widget1 and Widget2 are the products being sold. To determine which product has the highest demand, we can analyze the frequency of each product sold:
Widget1 was sold on Sunday, Monday, Tuesday, Wednesday, Thursday, and Saturday.
Widget2 was sold on Wednesday and Friday.
From this analysis, Widget1 appears to have the highest demand since it was sold on more days compared to Widget2.

 

Data Cleanup

Clean and complete data is a major problem in the "real world".
Here is a typical scenario: business might ask their marketing department to create a landing page with a registration of some sort (maybe for a raffle or giveaway). Marketing in turn might copy an existing template, change some content and push it out to their leads. Not too much thought is placed on the data collected, because... deadline.

And some data gets collected over the following weeks...

Eventually this new list of customers needs to be used, with all the flawed data that was collected. What could some of these flaws possibly be, you ask?

  • People mistyped their information
  • Project scope changed a bit, now you need more data
  • Some obvious data was just missed

 

Now, we have a "dirty" set of data, which could take a lot of time and effort to clean up or expand, and this is a real world problem where someone has to manually make adjustments.

Please see the screenshot below, of how you can take a simple line of data and actually Generate additional data for it:

ChatGPT basic data collection and generation

 

Using ChatGPT to generate user data?

Obviously, there are limits to how much data can be generated. For example there's no way to generate someone's birthday just using their name and address (or is there...?).

However, generating basic knowledge about a customer, by infering from already known information is work wonders if that's all you need! 

 

Correcting User Inputs with ChatGPT

We have done a lot of tests on ChatGPT to clean up lists of data.  By clean up, we want it to be fixed up for obvious mistypes or misspellings.  The results are just stunning, and resemble what a person my do manually!

Below is a screenshot, of how ChatGPT can automatically correct email addresses. Notice how the results is actually very good, given simple prompt to clean the email list. Now, imagine building an API connection to ChatGPT, and sending all emails to be processed like this!

ChatGPT basic data analysis

 

Generating New User Data with ChatGPT

ChatGPT makes it possible to generate new content based on publicly known information and some current starting point to expand on. In the example below, we request additional address details based on an incomplete address provided by the user:


What is the zip code, telephone area code, country ISO code, for the following address: "1600 Amphitheatre Parkway in Mountain View, California"?

>
>The information for the address "1600 Amphitheatre Parkway, Mountain View, California" is as follows: ZIP Code: 94043 Telephone Area Code: 650 Country ISO Code: US (United States)

 

Conclusion

LLMs and ChatGPT in particular have now proven themselves to be capable of many tasks, which would be very time consuming for a mere human to accomplish.
Tapping into these new tools and capabilities deserves a good look, especially now in a world where productivity is paramount. With the above article and provided examples, you should be on a good way to improve your data today.

We hope the above tips were helpful, and here are some more tips on designing good prompts.




The fields marked with * are required.

I have read the privacy policy.