My experience with data visualization started with Power BI. It is great and all, with its many charting options, Drill Through, and interactive chart UI elements, among other complex features. I understand that it is industry software, but the steep learning curve and counterintuitive UI don't make it very approachable. In simple words, it's not fun. I put this problem down in my personal Book of Ideas as something I could build my own take on. As a React developer, I was satisfied working with it professionally, but I wanted to do a proper project of my own. I had gotten a little familiar with NextJS, the React framework, and I knew what I was going to make: my AI-powered data visualization tool, DataGlass.
Deciding The Stack
Frontend: NextJS was my choice for this project, as I wanted to learn the framework on the go while developing with it. The App Router is really cool: the folder hierarchy of your project defines the paths on your site.
Backend: NextJS is a great framework for implementing both the frontend and the backend, since you can make API endpoints using the App Router. It goes without saying that my backend programming would be done in JavaScript.
The Work Begins
I explored shadcn/ui, as I had heard a lot in the React community about how easy it is to use with Tailwind CSS for building ready-made, user-friendly UI. It was a breeze making the main page layout. I had thought of making the whole app on one page, so the main page consisted of a dropzone for uploading CSV files. Once a file is uploaded, these components are rendered:
- A table would be rendered for perusing the data.
- Statscard: A Card component where you can see stats for each column. I kept a boolean array that records whether each column is numerical or categorical. For a numerical column, the card shows the mean, median, mode and standard deviation. For a categorical column, only the mode is shown.
- The data's headers, plus the metadata of the columns (more on this later), are sent to the OpenAI API, which returns chart recommendations in a JSON object. Each kind of chart is then rendered accordingly.
- And of course, I made a component for each kind of chart to render: Bar, Area, Line, Scatter, Bubble and Pie Chart. The chart cards are pushed into a Grid component. Initially, every card had the same size. Later on, I made some cards bigger according to their type. For example, a Line Chart takes twice the space of a normal card, so the user can observe the trend better.
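The Statscard logic can be sketched roughly like this. It is a minimal illustration, not the actual DataGlass code: the app computes these stats in JavaScript, and the function names here (`is_numerical`, `column_stats`) are made up for the example. It also assumes the population standard deviation, since nothing above pins down which variant is used.

```python
import statistics
from typing import Dict, List


def is_numerical(values: List[str]) -> bool:
    """Treat a column as numerical if every non-empty cell parses as a float."""
    cells = [v for v in values if v.strip()]
    if not cells:
        return False
    try:
        for cell in cells:
            float(cell)
    except ValueError:
        return False
    return True


def column_stats(values: List[str]) -> Dict[str, object]:
    """Return the stats a Statscard would show for one column of CSV strings."""
    cells = [v.strip() for v in values if v.strip()]
    if not cells:
        return {"kind": "empty"}
    if is_numerical(values):
        numbers = [float(c) for c in cells]
        return {
            "kind": "numerical",
            "mean": statistics.mean(numbers),
            "median": statistics.median(numbers),
            "mode": statistics.mode(numbers),
            # Assumption: population standard deviation.
            "std_dev": statistics.pstdev(numbers),
        }
    # Categorical columns only get a mode.
    return {"kind": "categorical", "mode": statistics.mode(cells)}
```

Calling `column_stats(["1", "2", "2", "5"])` gives a numerical card, while `column_stats(["a", "b", "a"])` gives a categorical card with only a mode.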
I chose a package called Papaparse for parsing CSV data for use in my JavaScript code. For making the charts, I chose Chart.js for now. Also, using the NextThemeProvider, I made a Dark Mode toggle, cuz why not?
The First Hurdle
On the surface, the idea was: use AI, give it data, and it will generate insights. It would be simple: send data from a CSV file to an OpenAI GPT model through their API, get chart recommendations in JSON format, and render the matching Chart components (bar chart, area chart, etc.) based on the chart type and the attributes for both axes that it suggests. Simple, right? Right?
Wrong.
Initially, I was naive enough to think I could pass the entire dataset to the AI. However, that is not cost-friendly, as the number of input tokens the LLM has to process is A LOT. Even with my choice of gpt-4o-mini, it is just not economical at a large scale of usage.
My first thought was to pass only the header names and the metadata on whether each column is numerical or categorical, thinking that was enough context for the LLM to decide what kind of charts should be rendered. But what if a CSV file with a lot of attributes is passed to it? Still not economical.
My second solution was to drop the AI usage for now. My dashboard will not be AI-powered for the time being, but that is fine. There are data science methods, available through Python packages, that can give us adequate insights on how to go about it.
It was a pain, but I had to switch to Python for the backend. I used the FastAPI framework, which was really intuitive, and I got used to it real fast thanks to my experience with a similar framework, Flask. Why did I not choose Flask? Just to learn a new technology. For the love of the game.
On the backend, I judged factors like the frequency of values in a column and whether a column is numerical or categorical, among others, to build the JSON object of chart recommendations. Now isn't that economical? I released this as v0.1.0 on my GitHub as a milestone in the development of my awesome tool.
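To make the "headers plus metadata" idea concrete, here is a rough sketch of building that compact payload. It is only an illustration under assumptions: the function name `build_column_metadata` and the `name`/`type` keys are hypothetical, and at this stage the app did this step in JavaScript rather than Python.

```python
import csv
import io
import json
from typing import Dict, List


def _is_number(text: str) -> bool:
    """True if a single cell parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def build_column_metadata(csv_text: str) -> List[Dict[str, str]]:
    """Describe each column by name and type instead of sending every row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return []
    headers, body = rows[0], rows[1:]
    metadata = []
    for index, name in enumerate(headers):
        # Collect the non-empty cells of this column.
        cells = [r[index] for r in body if index < len(r) and r[index].strip()]
        numeric = bool(cells) and all(_is_number(c) for c in cells)
        metadata.append(
            {"name": name, "type": "numerical" if numeric else "categorical"}
        )
    return metadata


sample = "city,sales\nPune,10\nDelhi,20\n"
payload = json.dumps(build_column_metadata(sample))
```

The payload no longer grows with the number of rows, but it still gains one entry per column, which is why a very wide CSV keeps the token count high.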
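A rule-based recommender of this kind might look something like the sketch below. The rules, the `max_pie_slices` threshold, and the names `profile_column` and `recommend_charts` are assumptions for illustration only, not the rules DataGlass actually ships.

```python
from typing import Dict, List


def profile_column(name: str, values: List[str]) -> Dict[str, object]:
    """Summarise one column: is it numeric, and how many distinct values?"""
    cells = [v.strip() for v in values if v.strip()]
    numeric = bool(cells)
    for cell in cells:
        try:
            float(cell)
        except ValueError:
            numeric = False
            break
    return {"name": name, "numeric": numeric, "unique": len(set(cells))}


def recommend_charts(
    columns: Dict[str, List[str]], max_pie_slices: int = 5
) -> List[Dict[str, str]]:
    """Pick chart types from column profiles (hypothetical heuristics)."""
    profiles = [profile_column(n, v) for n, v in columns.items()]
    numeric = [p for p in profiles if p["numeric"]]
    categorical = [p for p in profiles if not p["numeric"] and p["unique"] > 0]

    charts = []
    # Category vs. number: pie for few categories, bar for many.
    for cat in categorical:
        kind = "pie" if cat["unique"] <= max_pie_slices else "bar"
        for num in numeric:
            charts.append({"type": kind, "x": cat["name"], "y": num["name"]})
    # Number vs. number: scatter for every distinct pair.
    for i, first in enumerate(numeric):
        for second in numeric[i + 1:]:
            charts.append(
                {"type": "scatter", "x": first["name"], "y": second["name"]}
            )
    return charts
```

The returned list has the same shape the frontend expected from the LLM (a chart type plus the attributes for both axes), so the rendering side would not need to change.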
Deleting Charts from The Grid
In my vision for the app, I had left most of the work to the LLM. It was time to bring in some user interactivity, like being able to add and delete your own charts. For the delete part, I created a delete function on the Dashboard page that is passed down to each generated chart, and each chart calls that function from its Delete button. It was working wonderfully. I did have one bug: after deleting all the charts from the grid, the app would query the API again and regenerate all of them. It got fixed real quick once I noticed that a useEffect hook tied the grid to a boolean value denoting whether the dashboard was empty.
Changes Planned
- In the sidebar, I am planning to add a panel for adding your own charts to the grid.
- I am also thinking of adding a prompt input field that takes text input from the user and, based on it, appends more charts to the grid.
- For charts like Line Charts displaying huge data, the points can sometimes look mish-mashed together, making the chart hard to make sense of. If only we could zoom in on a point by scrolling, to see the trend around that data point... Hmmmmm...
See you next time for another DataGlass devlog. Check out my project at: https://github.com/AtomicRogue1/DataGlass and some pics of my app on my LinkedIn: https://www.linkedin.com/in/yash-verma-49b256196/