November 24, 2023

Data Transformation with GPT-4 (Preppin Data, 2023, Week 1)

 

Environment:  ChatGPT Plus, GPT-4


Data:  Preppin Data, 2023, Week 1


Objective:  Transform data without prompting GPT-4 step-by-step



This video from The Information Lab tested whether ChatGPT can transform data.  Instead of prompting GPT-4 to transform the data step by step, the approach was to upload two files (one input and one output) and ask GPT-4 to produce an output file similar to the uploaded output file.


Let’s test GPT-4 the same way and see how it performs.



1)  Output 1:  Upload 2 data files and prompt GPT-4 to produce an output file without providing step-by-step instructions.




GPT-4 described the steps it used to transform the data and produced the output.





2)  Output 2:  Again, upload 2 data files and prompt GPT-4 to produce an output file without providing step-by-step instructions.




3)  Output 3:  Finally, upload another 2 data files and prompt GPT-4 to produce an output file without providing step-by-step instructions.




In all 3 scenarios, GPT-4 transformed the data, described the steps, and produced the outputs.  Note that these data sets are simple.  The results are posted here.
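Since this approach asks GPT-4 to match an uploaded output file, one quick way to check a generated file against the expected one is to compare their rows while ignoring row order. This is a minimal sketch in plain Python, not what GPT-4 or the video used; the function names and the toy CSV text are hypothetical. It compares values as exact strings and requires the same column order, so "100" and "100.0" would count as different.

```python
import csv
import io
from collections import Counter

def rows_as_multiset(csv_text):
    """Parse CSV text; return the header and a Counter of data rows (row order ignored)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    return header, Counter(tuple(row) for row in reader)

def outputs_match(expected_csv, actual_csv):
    """True if both CSVs have the same header and the same rows, in any order."""
    exp_header, exp_rows = rows_as_multiset(expected_csv)
    act_header, act_rows = rows_as_multiset(actual_csv)
    return exp_header == act_header and exp_rows == act_rows

# Hypothetical toy data, not the Preppin Data files.
expected = "Group,Total\nA,100\nB,250\n"
generated = "Group,Total\nB,250\nA,100\n"
print(outputs_match(expected, generated))  # True
```

For real files, read each file's text (for example with `open(path, newline="")`) and pass it in.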



Verdict:  For simple data sets, GPT-4 can transform data without step-by-step prompting.  This is remarkable because GPT-4 acts like a human analyst with the cognitive ability to think through the process, transform the data, and explain the steps.


For complex data sets, GPT-4 cannot yet transform data without step-by-step prompting.  However, Sam Altman, the CEO of OpenAI, recently said that the company has discovered an emergent new cognitive capability.  Perhaps, with this new capability for critical thinking, the next version of ChatGPT will be able to transform complex data sets.



November 17, 2023

Data Transformation with GPT-4 (Preppin Data, 2023, Week 31)

 

Environment:  ChatGPT Plus, GPT-4


Data:  Preppin Data, 2023, Week 31


Objective:  Use GPT-4 to transform data and fill in the missing IDs in file ee_dim_input.csv.



This data set was selected because it deals with HR data, and HR data is almost always complex.  The requirement is to fill in the missing IDs.  The data set has 2 CSV files:  (1) ee_dim_input.csv contains the list of employees, and (2) ee_monthly_input.csv is a monthly snapshot of the employees who worked during the month.



1)  Upload Data:  upload the CSV files.


The next 3 steps are for removing duplicated rows.



2)  Remove Duplicates:  remove duplicated rows with the same employee_id in file ee_monthly_input.csv and use the output as a lookup table.  The output file is named ee_monthly_input_cleaned.csv.
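Step 2 boils down to "keep the first row for each employee_id". Here is an illustrative sketch in plain Python, not the code GPT-4 generated; the function name, the toy rows, and the `month` column are hypothetical, and rows with a blank key are kept untouched on the assumption that a blank is not a real ID.

```python
def drop_duplicates_by(rows, key):
    """Keep the first row for each non-blank value of `key`; later repeats are dropped.
    Rows with a blank key are kept as-is (assumption: a blank is not a real ID)."""
    seen = set()
    kept = []
    for row in rows:
        value = row[key]
        if not value:
            kept.append(row)
        elif value not in seen:
            seen.add(value)
            kept.append(row)
    return kept

# Hypothetical toy rows standing in for ee_monthly_input.csv.
monthly = [
    {"employee_id": "E1", "guid": "g-1", "month": "2023-01"},
    {"employee_id": "E1", "guid": "g-1", "month": "2023-02"},
    {"employee_id": "E2", "guid": "g-2", "month": "2023-01"},
]
lookup_table = drop_duplicates_by(monthly, "employee_id")
print(len(lookup_table))  # 2
```

With the real file, `list(csv.DictReader(open("ee_monthly_input.csv", newline="")))` would supply the rows (every value is read as a string). Calling the same function with key "guid" is one way to express step 4.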





3)  Double-Check for Duplicates:  double-check for duplicates in field ‘employee_id’ in file ee_monthly_input_cleaned.csv.  The result confirmed that there are no duplicates.


Double-check for duplicates in field ‘guid’ in file ee_monthly_input_cleaned.csv.  The result showed one duplicate.
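The double-check in step 3 is a count of how often each value appears in a field. This is a minimal sketch in plain Python; the function name and the toy rows are hypothetical, and blank values are skipped on the assumption that they are missing IDs rather than real duplicates.

```python
from collections import Counter

def duplicated_values(rows, field):
    """Return {value: count} for non-blank values of `field` that appear more than once."""
    counts = Counter(row[field] for row in rows if row[field])
    return {value: n for value, n in counts.items() if n > 1}

# Hypothetical toy rows standing in for ee_monthly_input_cleaned.csv.
cleaned = [
    {"employee_id": "E1", "guid": "g-1"},
    {"employee_id": "E2", "guid": "g-2"},
    {"employee_id": "E3", "guid": "g-2"},  # same guid under a different employee_id
]
print(duplicated_values(cleaned, "employee_id"))  # {}
print(duplicated_values(cleaned, "guid"))         # {'g-2': 2}
```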




4)  Remove Duplicates:  remove the duplicated row.  The output file is named ee_monthly_input_cleaned_updated.csv.


The next 2 steps are for filling in the missing values in fields ‘employee_id’ and ‘guid’.



5)  Fill In Field ‘guid’:  link the 2 files by field ‘employee_id’ and fill in the missing values in field ‘guid’.  The output file is named ee_dim_input_updated_guid_linked_to_cleaned.csv.






6)  Fill In Field ‘employee_id’:  link the 2 files by field ‘guid’ and fill in the missing values in field ‘employee_id’.  The output file is named ee_dim_input_updated_employee_id_linked_to_cleaned.csv.  This file is the final result.
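Steps 5 and 6 amount to a lookup. Build a dictionary from the cleaned file keyed by one ID, then fill the blank values of the other ID. This is a minimal sketch in plain Python, assuming blank strings mark missing values; the function name `fill_missing` and the toy rows are hypothetical, not GPT-4's actual code.

```python
def fill_missing(rows, lookup_rows, key, target):
    """Return copies of `rows` where a blank `target` is filled from the lookup row with the same `key`."""
    # Only lookup rows that have both IDs are useful for filling.
    lookup = {r[key]: r[target] for r in lookup_rows if r[key] and r[target]}
    filled = []
    for row in rows:
        row = dict(row)  # copy so the input rows are not modified
        if not row[target] and row[key] in lookup:
            row[target] = lookup[row[key]]
        filled.append(row)
    return filled

# Hypothetical toy rows standing in for the cleaned lookup table and ee_dim_input.csv.
lookup_table = [
    {"employee_id": "E1", "guid": "g-1"},
    {"employee_id": "E2", "guid": "g-2"},
    {"employee_id": "E3", "guid": "g-3"},
]
ee_dim = [
    {"employee_id": "E1", "guid": ""},   # missing guid
    {"employee_id": "", "guid": "g-2"},  # missing employee_id
    {"employee_id": "E3", "guid": "g-3"},
]
step5 = fill_missing(ee_dim, lookup_table, key="employee_id", target="guid")
step6 = fill_missing(step5, lookup_table, key="guid", target="employee_id")
print(step6)
# [{'employee_id': 'E1', 'guid': 'g-1'}, {'employee_id': 'E2', 'guid': 'g-2'}, {'employee_id': 'E3', 'guid': 'g-3'}]
```

The order mirrors the post: guids are filled first (step 5), then employee IDs (step 6).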







Verdict:  It took me more than 1 hour to transform this data set.  The challenge was knowing how to write the proper prompts so GPT-4 would do what was needed.  The final result met the requirement.


GPT-4 is remarkable because it can transform data just like Tableau Prep or Alteryx.  The billion-dollar question is how to integrate GPT-4 with corporate IT systems so that it can connect to databases, work with millions of records, and refresh data on a schedule.


November 10, 2023

Data Transformation with GPT-4 – Rank Percentile


Environment:  ChatGPT Plus, GPT-4.

 

Data:  Article counts dataset.

 

Objective:  Transform data to create different tiers with rank percentile.


In my post on 9/22/2023, I described the steps to transform data into different tiers using the RANK_PERCENTILE function in Tableau Prep.  Let’s see if GPT-4 can do the same data transformation.


1) Upload Data:  Upload the article counts dataset.


2) Count Articles Based on Views:  Count the number of articles grouped by number of views.
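Grouping and counting like step 2 can be sketched with `collections.Counter`. The list of view counts below is hypothetical toy data (one entry per article), not the article counts dataset.

```python
from collections import Counter

# Hypothetical toy data: one entry per article with its number of views.
article_views = [120, 45, 120, 300, 45, 45]

# Number of articles for each distinct view count.
articles_per_views = Counter(article_views)
for views, n_articles in sorted(articles_per_views.items()):
    print(views, n_articles)
# 45 3
# 120 2
# 300 1
```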






3) Rank Percentile:  Compute the percentile rank based on average views.
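One common way to compute a percentile rank is the SQL PERCENT_RANK formula, (rank - 1) / (n - 1), where tied values share the lowest rank. This sketch uses that definition. It is not necessarily what GPT-4 or Tableau Prep's RANK_PERCENTILE does, since tools differ in how they handle ties, so the values may not match exactly. The function name and sample values are hypothetical.

```python
def rank_percentile(values):
    """Ascending percentile rank (rank - 1) / (n - 1); tied values share the lowest rank.
    A single value gets 0.0, as with SQL PERCENT_RANK."""
    n = len(values)
    if n == 1:
        return [0.0]
    ordered = sorted(values)
    # The first position of a value in sorted order equals (rank - 1) under competition ranking.
    first_index = {}
    for i, v in enumerate(ordered):
        first_index.setdefault(v, i)
    return [first_index[v] / (n - 1) for v in values]

avg_views = [6, 9, 9, 14]  # hypothetical average views
print([round(p, 2) for p in rank_percentile(avg_views)])  # [0.0, 0.33, 0.33, 1.0]
```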





4) Group Percentile Rank into Tiers:
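The tier cutoffs GPT-4 used are not recoverable here, so this sketch uses hypothetical thresholds (0.9 and 0.5) and labels only to show the mapping from a percentile rank to a tier.

```python
def tier_for(percentile, cutoffs=((0.9, "Tier 1"), (0.5, "Tier 2"))):
    """Map a percentile rank in [0, 1] to a tier label.
    Cutoffs are hypothetical placeholders and must be listed from highest to lowest."""
    for threshold, label in cutoffs:
        if percentile >= threshold:
            return label
    return "Tier 3"  # everything below the lowest cutoff

for p in [0.95, 0.9, 0.6, 0.2]:
    print(p, tier_for(p))
# 0.95 Tier 1
# 0.9 Tier 1
# 0.6 Tier 2
# 0.2 Tier 3
```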





5) Export Table to Excel File:  Export the table to an Excel file for visualization in Tableau.


Verdict:  GPT-4 can definitely transform data if it is given the correct prompts.  The new skill for data analytics professionals is learning how to write proper prompts.  GPT-4 can be your powerful analytical assistant as it can analyze, visualize, and transform data.


November 3, 2023

Exploratory Data Analysis with GPT-4

 

Environment:  ChatGPT Plus, GPT-4 with Advanced Data Analysis

 

Data:  Superstore dataset.

 

Objective:  Run Exploratory Data Analysis with GPT-4.

 

In ChatGPT, you can enter custom instructions so that GPT-4's answers are tailored to your role.  Click on your account name and select ‘Custom instructions’.  Enter instructions about your role/profession, responsibilities, and expertise.




After I uploaded the Superstore dataset and prompted GPT-4 to analyze the data, it recommended these steps:  Data Cleaning, Exploratory Data Analysis (EDA), Advanced Analysis, Data Visualization, and Insight Generation.  Notice that these steps were specific and relevant to the data analyst role entered in the custom instructions above.





Stage 2:  Recommendations for Exploratory Data Analysis:



When prompted to proceed with time-based analysis, GPT-4 created this Monthly Sales Trends graph:
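The aggregation behind a monthly sales trend chart is a sum of Sales per calendar month. This is a minimal sketch in plain Python, assuming the rows have already been parsed so that "Order Date" is a `date` and "Sales" is a number; the toy rows are hypothetical, and this is not the code GPT-4 ran.

```python
from collections import defaultdict
from datetime import date

# Hypothetical toy rows; the real Superstore file has many more columns and rows.
orders = [
    {"Order Date": date(2023, 1, 5), "Sales": 200.0},
    {"Order Date": date(2023, 1, 20), "Sales": 50.0},
    {"Order Date": date(2023, 2, 3), "Sales": 125.5},
]

def monthly_sales(rows):
    """Sum Sales per calendar month, keyed by 'YYYY-MM', in chronological order."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["Order Date"].strftime("%Y-%m")] += row["Sales"]
    return dict(sorted(totals.items()))

print(monthly_sales(orders))  # {'2023-01': 250.0, '2023-02': 125.5}
```

If the dates arrive as text, something like `datetime.strptime(text, "%m/%d/%Y").date()` would parse them, assuming a month/day/year format.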




Stage 3:  Plan for Advanced Analysis




When prompted to proceed with forecasting, GPT-4 delivered this Sales Forecast graph:
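The post doesn't show which forecasting method GPT-4 used. For illustration only, this sketch swaps in the simplest option: a straight-line trend fitted by ordinary least squares to monthly totals and extended forward. It ignores seasonality. The function name and input series are hypothetical.

```python
def linear_trend_forecast(series, periods):
    """Fit y = a + b*t by ordinary least squares (t = 0, 1, ...) and extend it `periods` steps."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    slope = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series)) / sum(
        (t - t_mean) ** 2 for t in range(n)
    )
    intercept = y_mean - slope * t_mean
    return [intercept + slope * t for t in range(n, n + periods)]

monthly = [100.0, 110.0, 120.0, 130.0]  # hypothetical monthly sales totals
print(linear_trend_forecast(monthly, 2))  # [140.0, 150.0]
```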



This Customer Lifetime Value analysis has some interesting insights.  In Tableau, the equivalent would be a bar chart showing the customers with the highest sales.
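Mirroring that Tableau bar chart, this sketch approximates lifetime value as total historical Sales per customer and ranks customers from highest to lowest. That is a simplification: GPT-4's exact CLV definition isn't shown. The "Customer Name" and "Sales" columns follow the Superstore layout, and the toy rows are hypothetical.

```python
from collections import defaultdict

def top_customers(rows, n=3):
    """Total Sales per customer as (customer, total) pairs, highest total first."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["Customer Name"]] += row["Sales"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]

# Hypothetical toy rows, not actual Superstore customers.
orders = [
    {"Customer Name": "Customer A", "Sales": 120.0},
    {"Customer Name": "Customer B", "Sales": 300.0},
    {"Customer Name": "Customer A", "Sales": 80.0},
    {"Customer Name": "Customer C", "Sales": 50.0},
]
print(top_customers(orders, n=2))  # [('Customer B', 300.0), ('Customer A', 200.0)]
```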




Stage 4:  Data Visualization



The KPI dashboard is rudimentary.  It's better to give GPT-4 specific instructions to design a dashboard that fits your needs.


Stage 5:  Insights



Verdict:  When GPT-4 knows your role as a data analyst, it offers a deeper Exploratory Data Analysis with many insights.  Your job is to select the relevant graphs and insights you need and steer your analysis in the direction you want.  GPT-4 can be your awesome assistant.