Show HN: Sourcetable – AI Spreadsheet and Data Platform

91 points by mceoin 17 hours ago | 59 comments

Hi HN! I’m Eoin, founder of Sourcetable (https://sourcetable.com).

Sourcetable is an AI-native spreadsheet that syncs with all your data. Users pair with an AI copilot that helps them do their spreadsheet work, as well as more database-centric analysis and SQL.

Sourcetable syncs with databases including Postgres, MySQL, and MongoDB, and over 100 business applications including Stripe, Zendesk, Hubspot, Quickbooks and Google Analytics. That data is available in a spreadsheet, and any models you build automatically update in near-real-time as new data flows in. The core primitives are AI + spreadsheet + data sync + storage + compute.

If you want to play with Sourcetable today, the easiest way is to upload a CSV and start asking questions.

Who is it for? Sourcetable is for analysts, operators and finance folk doing data-centric work in a spreadsheet. Sourcetable’s spreadsheet-based AI assistant understands workbook range selection and can adjust scope context to the datasets you are working with. You can talk directly to your database and SaaS integrations, which is great for analysis, data search and retrieval, SQL writing & editing (including writing joins across different datasets), and automatic chart creation.

Niching down, if you work in operations at a <50 person startup or SMB and your company relies on a Postgres or MySQL database, Sourcetable is an affordable reporting tool with turnkey data infrastructure that doesn’t require code or engineers to set up.

Spreadsheets are the most used analytical tool on the planet. AI is a platform shift with broad applications. We are staying open-minded about users and use cases since everything is so new.

Backstory: I spent ten years working in de-facto operations and technical roles at startups. Sourcetable draws from that experience of needing better data tooling inside spreadsheets, and constantly hacking ad hoc solutions to fill the gap. Andrew (CTO / co-founder) previously had a deep learning company and was initially drawn to the idea that Sourcetable could be an operating system for the web. We’re both Aussie expats in the Bay Area, which is how we met. Internally, we think of Sourcetable as an application platform, with AI applications being a useful and interesting place to focus.

Features & Use Cases:

- Talk to your CSV files, spreadsheets, integrations, and datasets using LLMs.

- AI + data work: Text-to-SQL, search and retrieval from databases, LLM-based data analysis. (This is an entirely different experience to what Copilot/Gemini & Excel/Sheets provide, since they are thin workbooks and not data platforms.)

- AI + spreadsheet work: formula assist, workbook analysis, data cleaning, chart creation, error handling, summarization, chat, etc.

- Automated reporting: data is synced, so reports you build stay up to date.

- No-code data access: give the business team safe database access so they will leave you alone!

- Centralizing data for cross-channel reporting (e.g. Postgres + Stripe + Mailchimp).

- Analyzing large CSV files: Sourcetable can handle multi-gigabyte files. (Google Sheets can’t handle large data and the experience in Excel is rather cumbersome.)
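To make the cross-channel reporting idea concrete, here is a minimal sketch of a join across two synced sources. It uses Python's built-in `sqlite3` purely as a stand-in for the real data layer; the `users` and `charges` tables, their columns, and the sample rows are all hypothetical and not taken from Sourcetable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables standing in for a synced app database and a payments feed.
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, plan TEXT);
    CREATE TABLE charges (customer_email TEXT, amount_cents INTEGER);
    INSERT INTO users VALUES (1, 'a@example.com', 'pro'), (2, 'b@example.com', 'free');
    INSERT INTO charges VALUES
        ('a@example.com', 4900), ('a@example.com', 4900), ('b@example.com', 900);
    """
)

# Join the two sources on email and report revenue per plan.
revenue_by_plan = conn.execute(
    """
    SELECT u.plan, SUM(c.amount_cents) / 100.0 AS revenue
    FROM users AS u
    JOIN charges AS c ON c.customer_email = u.email
    GROUP BY u.plan
    ORDER BY u.plan
    """
).fetchall()
print(revenue_by_plan)  # [('free', 9.0), ('pro', 98.0)]
```

Once both sources live in one queryable place, the report is a single query rather than two exports stitched together by hand.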

Technical Details: Sourcetable was built to be fast. It was also built to scale.

AI: Llama 3 (via Groq), Claude, GPT-4o, LiteLLM, custom LLMs

Frontend: DuckDB, React, ShadCN, AntV / Bizcharts, Plotly, CodeMirror, Hookstate

Backend: DuckDB, Python, Cassandra, Redis, NGINX, Cloudflare

Data Eng & Transformations: Fivetran, dbt, Apache Arrow, SQLGlot

Distributed Computing & Scaling: Daft, Ray, CloudFormation

Other: Linux Namespaces, Dill (U.Queensland)

A huge thank you to the open source community, and a special shout-out to DuckDB for being so damn fast. Thank you also to Groq & Anthropic for the rate limit increases in time for this Show HN post!

Feedback: Product feedback is welcome! eoin@sourcetable.com

primitivesuave 4 hours ago | next |

This is incredible. I uploaded a CSV with ~6000 rows containing campaign finance data for a particularly corrupt local politician and asked "what was the total contributed amount in [year]". Not only did it produce the correct answer (in around the same amount of time it took me to calculate it on my end) but it also seemed to understand that the spreadsheet was related to campaign finance in the "summary" portion of the response.

The most useful aspect was that I could ask "what was the total contributed amount between January and June of 2020" and get an accurate answer for that as well. Since the date column is provided as an "MM/DD/YYYY" string, I would normally have to do some boilerplate work to sanitize this.
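For comparison, the manual boilerplate for that date-range question looks roughly like the sketch below in plain Python. The column names `date` and `amount` and the sample rows are assumptions for illustration, not the actual dataset.

```python
import csv
import io
from datetime import date, datetime


def total_between(rows, start, end, date_col="date", amount_col="amount"):
    """Sum amount_col for rows whose MM/DD/YYYY date falls in [start, end]."""
    total = 0.0
    for row in rows:
        # Dates arrive as strings, so parse them before comparing.
        when = datetime.strptime(row[date_col].strip(), "%m/%d/%Y").date()
        if start <= when <= end:
            total += float(row[amount_col])
    return total


# Hypothetical sample standing in for the uploaded CSV.
SAMPLE = """date,contributor,amount
01/15/2020,Alice,250.00
06/30/2020,Bob,100.00
07/01/2020,Carol,500.00
12/31/2019,Dave,75.00
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
first_half_2020 = total_between(rows, date(2020, 1, 1), date(2020, 6, 30))
print(first_half_2020)  # 350.0
```

Comparing the raw "MM/DD/YYYY" strings directly would sort by month before year, which is why the parse step cannot be skipped.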

For my particular use case, the charting aspect left a few things to be desired - once I grouped campaign donations by contributor, I could only see the first 10 rows in the AI response, with no option to expand the output. But overall I was truly blown away that something like this is even possible for a small team to build.

mceoin 3 hours ago | root | parent |

> For my particular use case, the charting aspect left a few things to be desired - once I grouped campaign donations by contributor, I could only see the first 10 rows in the AI response, with no option to expand the output.

Insert it as a table on the page (you should see a button), it will then print the whole table result from that query into the spreadsheet. Also, you can check the SQL first and validate it, then print to table after that.

Try a few million rows and see what happens!

dioptre 3 hours ago | root | parent |

Also keep an eye out on the limit - we default to 10,000 to keep it snappy but if you want to make it larger it's a click away. The "summarize table" button should auto limit to 1B+ rows.

mmckelvy 5 hours ago | prev | next |

Interesting. I think you're on to something here. I fully agree that a combination of spreadsheets and SQL is the ideal tool for data analysis -- not a SaaS GUI.

> Niching down, if you work in operations at a <50 person startup or SMB and your company relies on a Postgres or MySQL database, Sourcetable is an affordable reporting tool with turnkey data infrastructure that doesn’t require code or engineers to set up.

With the rise of AI, companies like Tembo that help you set up all-in-one databases, and tools like this, I'm increasingly of the mind that many companies should start bringing things like analytics and observability in-house. I don't see the need to pay Mixpanel or Datadog thousands of dollars per month when a self-serve solution that relies on tried and true tech is more or less at your fingertips.

mceoin 5 hours ago | root | parent | next |

Agree. A general thesis I have is that the API-ification of the web fragmented business information, and with every new SaaS tool we fragment our company's data further. The trend at all company sizes is to be increasingly analytical, but for SMBs it's too hard to get access to your data (mainly due to technical limitations). So it makes sense to centralize data somewhere, and we think that somewhere is inside the data tool that everyone actually uses: the spreadsheet.

Many other advantages of this data centralization too. Data + spreadsheets + compute is a nice application base for agents.

threeseed 5 hours ago | root | parent |

> So it makes sense to centralize data somewhere

Modelling and integrating datasets that you don't own is extremely hard.

Shopify for example updates their API every 3 months.

How much time and money do you think an SMB can afford to spend on this before the ROI becomes so poor that they abandon it entirely?

mceoin 4 hours ago | root | parent | next |

Yes, some integrations are excellent (hey Stripe : ), some are terrible (no comment on who). We're finding that LLMs are increasingly able to fill the gap around organizing data schema for that initial data prep piece where someone has to build the data tables that others consume. To your specific question/problem set, when a schema updates you end up with a "fuzzy schema matching problem"; we are solving that separately anyway for a separate product feature requirement.

Strong note here that the current state of technology is much better for SMB scale data and not enterprise scale data with messy schemas.
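To illustrate what a "fuzzy schema matching problem" looks like in its simplest form, here is a sketch that maps old column names to renamed ones by string similarity, using `difflib` from the Python standard library. This is only an illustration of the problem shape, not how Sourcetable solves it; the schemas and the `match_columns` helper are hypothetical.

```python
from difflib import get_close_matches


def match_columns(old_columns, new_columns, cutoff=0.6):
    """Map each old column name to its closest new name, or None if no match."""
    mapping = {}
    remaining = list(new_columns)
    for old in old_columns:
        candidates = get_close_matches(old, remaining, n=1, cutoff=cutoff)
        if candidates:
            mapping[old] = candidates[0]
            # Each new column can be claimed by at most one old column.
            remaining.remove(candidates[0])
        else:
            mapping[old] = None
    return mapping


# Hypothetical schemas before and after an upstream API change.
old_schema = ["customer_id", "email", "created_at", "total_price", "fax"]
new_schema = ["customerId", "email_addr", "createdAt", "total_price_usd"]

mapping = match_columns(old_schema, new_schema)
for old, new in mapping.items():
    print(f"{old} -> {new}")
```

Pure string similarity breaks down when a column is renamed to something unrelated (say `total_price` to `gross`), which is presumably where a model that can look at the values as well as the names has an edge.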

mceoin 4 hours ago | root | parent | prev |

There is a separate answer here which is many (most?) SMBs can't afford technical folk, so the ability to integrate data at all, talk to it and model it (using SQL or AI), is already a big step forward for them.

My personal use case tends to involve a lot of Postgres data and transaction events for my reporting. We see "simple" businesses like parts manufacturers, print shops, vineyards, etc. all doing something similar.

threeseed 5 hours ago | root | parent | prev |

Minus the AI part tools like this have existed for decades.

And companies are not dumping their SaaS tools and switching to them en masse.

Because (a) data silos have dramatically increased, pushing dreams of a unified data schema out of reach, (b) technology stacks have become far more complex, necessitating tools like Datadog, and (c) competition is stronger than ever, meaning that skimping on paying for tools like Mixpanel is often short-sighted and counterproductive.

Companies like this will do fine and there will always be a demand for them, especially in the SMB space. But there simply isn't the business value in bringing a lot of analytics and observability in-house in almost all cases.

mmckelvy 4 hours ago | root | parent |

Not yet. But in the analytics case, suppose you could build a tool that collected data on your own infrastructure, allowed you to write plain SQL against a PostgreSQL database to get whatever analytics data you need, had an AI-driven text-to-SQL option so non-technical users could get whatever analytics data _they_ need, and output everything to a universal interface, i.e. a spreadsheet? No vendor flavored DSL, GUI, or workflows to learn. That product would be tough to beat. It wasn't built in the past because it was hard. But with AI and something like Tembo or Timescale, is it actually hard anymore?

aerosmile 6 hours ago | prev | next |

It’s amazing that Microsoft - given their focus on AI and decades of experience in spreadsheets - doesn’t offer this type of functionality. Corporate bureaucracy vs startup agility!

mceoin 6 hours ago | root | parent |

At risk of poking the bear, they should have done this decades ago. Except for LLMs they have had everything they needed to bundle this stack into a single product solution; this would be much better for users.

And yes! We're definitely of the opinion that as a startup we can outcompete the two trillion-dollar death stars when it comes to product experience. AI is a platform shift!

longstaff2009 3 hours ago | prev | next |

That's a spicy example dataset!

I like that it's able to infer information from the context of the cells, e.g. being able to run a query across continents when the data only contains the country.

Being able to ask it to interpret the results is helpful; it would be cool if it automatically told you if there was enough data to have statistical significance in the conclusions it was presenting.

mceoin 3 hours ago | root | parent |

You may see that we try to suggest follow-up questions or question improvements where we think better context-in will result in a better result-out.

Curious what will happen if you modify the question to be more explicit?

I have seen that PMs and data-trained folk tend to be very articulate in asking for exactly what they want and that tends to lead to significantly better LLM responses.

Brajeshwar 4 hours ago | prev | next |

You might want to check who is blacklisting you and request to unblock. AdGuard blocked sourcetable.com as "Scam".

https://www.dropbox.com/scl/fi/np92pyo0eb0zphysc9wwz/screens...

sim_123 12 hours ago | prev | next |

This is amazing. I’ve been scouting for such a solution as we’ve outgrown Excel. Giving it a spin.

mceoin 12 hours ago | root | parent |

A very common use case we see is SMBs having outgrown their spreadsheet but not wanting to move to a full-blown BI tool. They want the power, but not the change in interface/medium.

I didn't go into details above but a nice thing is that we leverage cloud compute and storage, so you can query billion-row data in sub-second time. (Courtesy of Duck!)

yawnxyz 13 hours ago | prev | next |

> Niching down, if you work in operations at a <50 person startup or SMB and your company relies on a Postgres or MySQL database, Sourcetable is an affordable reporting tool with turnkey data infrastructure that doesn’t require code or engineers to set up.

I'm already using Retool for these kinds of tasks - what does Sourcetable do that I can't already do with Retool?

edit: also, did you build your own spreadsheet engine, or use an off-the-shelf one? (also will it be open source ;P)

mceoin 13 hours ago | root | parent | next |

Category Comparison (table-based solutions): "How are you different than Retool/Airtable/Coda/Notion/Zapier Tables, etc."

The primary difference vs table-based solutions is that Sourcetable is a spreadsheet in the common sense of the word, similar to Excel and Sheets. We have A1 notation and cell-based referencing. This is what most users expect, and this flexibility/familiarity has a big impact on the breadth of users and use cases within a team.
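For readers unfamiliar with the term, A1 notation addresses a cell by column letters plus a row number. A minimal sketch of the conversion to and from zero-based indices is below; the function names are hypothetical and this is not Sourcetable's engine code.

```python
import re

# One or more column letters followed by a row number with no leading zero.
_A1_PATTERN = re.compile(r"([A-Za-z]+)([1-9][0-9]*)")


def a1_to_index(ref):
    """Convert an A1-style reference like 'B3' to zero-based (row, col)."""
    match = _A1_PATTERN.fullmatch(ref)
    if match is None:
        raise ValueError(f"not a valid A1 reference: {ref!r}")
    letters, digits = match.groups()
    col = 0
    for ch in letters.upper():
        # Column letters form a bijective base-26 number: A=1 ... Z=26, AA=27.
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(digits) - 1, col - 1


def index_to_a1(row, col):
    """Convert zero-based (row, col) back to an A1-style reference."""
    letters = ""
    n = col + 1
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return f"{letters}{row + 1}"


print(a1_to_index("B3"))   # (2, 1)
print(index_to_a1(9, 26))  # AA10
```

Because any single cell is addressable this way, a formula can point at one value anywhere on the sheet, which is the flexibility that column-level referencing in table tools gives up.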

The formula referencing system of these table-based solutions is usually limited to columns/rows (not cells), and is a set of SQL-based queries that are much more limited than the 500+ formulas and functions spreadsheet users commonly expect.

Retool specifically: I tend to think of Retool as a lightweight custom-ERP software system, whereas Sourcetable is more like Excel + PowerBI + Data Warehouse, so we will generally be much stronger for reporting and analysis. We definitely have some overlap in potential users since technical operators should like us both. FWIW - Retool is an excellent product.

dioptre 12 hours ago | root | parent | prev |

Hi I'm Andy, Cofounder & CTO @ Sourcetable.

We use a heavily modified licensed engine that prevents us from open sourcing everything (for now). We have plans to open source our agentic/plugin framework, and other parts of the system. We also have a strong ethos of contributing back to open source where we can (contributed back to Arrow, DuckDB etc.).

I'd also add that while everyone knows how to use and work with spreadsheets, we also provide a SQL layer on top that you can use to query data sources as an advanced user (we developed a nomenclature to work within sheets/across sheets/files/our data-warehouse). This allows more technical users to work side-by-side in the same environment as non-technical users without crossing pythonic or reporting boundaries.

On top of this, the AI assistant can answer most of the questions you might have of all this data.

I think as ML gets more sophisticated, we will in general need to be less technical. The "tooling" might even disappear, but we will still need something to communicate important data centric decisions. Whether you like it or not spreadsheets are the foundation of human research and operations and have been for thousands of years, and I feel humanity will need less complicated "tools" and we will keep to our roots.

alooPotato 6 hours ago | prev | next |

Cool.

How did you build so many integrations so fast?

Selfishly, would love to see Streak (CRM) integration as well.

mceoin 6 hours ago | root | parent |

Mostly Fivetran, a little Airbyte, and a few custom integrations. Would love to add Streak (can you get it into Fivetran? We can usually crank those integrations out within an hour.)

djbiggs 6 hours ago | prev | next |

Awesome, have you got any mining-specific worked examples or spatial examples? Thinking about lidar point clouds and running deltas for stockpile management. Looking at building a new mine, and typically at any mine site there are Excel macros embedded in the operations which might take an hour to run. Often developed by older engineers, who will default to Excel. Any suggestions on how best to drive technical user adoption (aside from dropping it on the kids in the engineering departments, can't wait that long)?

dioptre 5 hours ago | root | parent | next |

The underlying datatypes we support in our data-warehouse support 3D and 4D data, so we can do vector queries on these and do transformations over different spaces. I think given what you need we can put your data in our data-warehouse, and then present it to the older engineers in an Excel format with 3D plotting. We might want to chat about the details though, give me a holler at andrew@sourcetable.com

mceoin 5 hours ago | root | parent | prev |

Yes actually! My cousin is a mining engineer so I spent a bunch of time playing around with mining data during testing. Turns out all New South Wales government data is public. Right now you can talk to any CSV or database using LLMs. I've also played around with a bunch of marine biology datasets too!

(p.s. I think Andrew, CTO, is going to jump in here as he has more experience in this space.)

mceoin 5 hours ago | root | parent |

Can you email me -- eoin@sourcetable.com -- more about the Excel macros? This might be easy to help you out with agents. A lot of compute-intensive stuff that takes ages in Excel is nearly instant in Sourcetable because we are leveraging cloud compute, but it really depends on your use case.

smcleod 4 hours ago | prev | next |

Are you open sourcing the product for non-commercial use?

mceoin 4 hours ago | root | parent |

Would love to but unfortunately there are pieces we can't open-source for various reasons. We'll open source bits and pieces over time, and generally are excited to start blogging about AI & technical learnings now that the product is out of stealth mode.

Small plug for the analytics tracker we are using which Andrew (CTO) built and is open source: https://github.com/sfproductlabs/tracker

HeralFacker 5 hours ago | prev | next |

What external checks are included to verify the chatbot output?

SoulAuctioneer 5 hours ago | root | parent |

Wherever possible, the chatbot output is deterministic, in that to answer a query, we're realtime generating and running code or SQL against your data. Our LLM orchestrates that, and finally evaluates whether the output correctly and adequately answers the question.
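The generate-then-execute pattern described above can be sketched in a few lines. The point is that the number in the answer comes from running a query, not from the model's text. Everything here is an assumption for illustration: `generate_sql` is a canned stand-in for the LLM call, SQLite stands in for the real query engine, and the `contributions` table is made up.

```python
import sqlite3


def generate_sql(question):
    """Stand-in for the LLM call; returns a canned query for the demo."""
    # Hypothetical: a real system would prompt a model with the schema here.
    return "SELECT SUM(amount) FROM contributions WHERE year = 2020"


def run_readonly(conn, sql):
    """Execute model-written SQL, refusing anything that is not a SELECT."""
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contributions (year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO contributions VALUES (?, ?)",
    [(2019, 75.0), (2020, 250.0), (2020, 100.0)],
)

sql = generate_sql("what was the total contributed amount in 2020?")
answer = run_readonly(conn, sql)[0][0]
print(sql, "->", answer)  # ... -> 350.0
```

Since the SQL is a visible artifact, a user can read it and rerun it, so a wrong answer traces back to a wrong query rather than an unexplainable number. The prefix check is only a toy guard; a real deployment would rely on read-only database credentials.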

We also extensively use synthetic data and examples to guide and constrain our models.

Another way we're ensuring good-quality output is to ensure good-quality _input_ -- by enriching the detail and specificity of the user's question, and asking the user to disambiguate when we determine the question is too broad.

escot 13 hours ago | prev | next |

Very cool. It would be great to have auto complete across cells.

mceoin 13 hours ago | root | parent |

Yes we don't yet have the full auto-suggest magic that Sheets offers, but you can click-drag for auto-complete the same way Excel offers.

We released Sourcetable today with the AI chatbot & AI data analysis features, but a very limited cell-based AI (only "summarize" and "fix formula"). We'll be releasing a big AI-based magic-autofill solution in the coming weeks.

_hfqa 11 hours ago | prev | next |

Congrats on the launch! It’s wild to see AI stepping into spreadsheets like this. Pretty soon there won’t be a part of our workflow AI hasn’t touched.

mceoin 11 hours ago | root | parent |

Thanks _hfqa! We think there's massive potential here. It's a big platform shift, and spreadsheets weren't really impacted by the mobile or cloud compute waves, so it's a space long-overdue for disruption. (The last shift was back when Google Sheets took spreadsheets to the browser 17 years ago!!)

halfcat 4 hours ago | prev | next |

I always wonder where these spreadsheet/database apps will land. Usually it falls flat for one of a few reasons I’ve observed:

- Fundamental gap in skillset, in that if you want to have ultimate flexibility to slice and dice the data and report on whatever you’re seeking, you’ve ultimately needed SQL skills in the past (which isn’t rocket science, but also isn’t something most accounting users can run with on their own).

- Fundamental desire of users to work with unstructured data. This goes back at least as far as Excel vs Lotus Improv in the early 90’s. Joel Spolsky talked about this, how they were terrified that Lotus Improv was going to kill Excel, because Improv was built to work with structured data, which users could then query and ask questions of to get any answer they want. But it turned out, as they observed people using both apps, there were zero users that used 100% normalized, structured data.

- Imperfect translation between spreadsheet and database. I’ve seen these work well 99.9% of the time, but at some point a column gets added or something that throws off formulas. And 0.1% error is basically catastrophic in accounting.
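The third failure mode is easy to reproduce. In the sketch below (hypothetical data, plain Python), a total that references a column by position silently returns the wrong number once a column is inserted, while a lookup by header name keeps working.

```python
import csv
import io

BEFORE = "date,amount\n01/15/2020,250\n06/30/2020,100\n"
# Same data after someone inserts a numeric "fee" column in the middle.
AFTER = "date,fee,amount\n01/15/2020,5,250\n06/30/2020,3,100\n"


def total_by_position(text, index=1):
    """Sum the column at a fixed position, like a hard-coded column letter."""
    rows = list(csv.reader(io.StringIO(text)))[1:]  # skip the header row
    return sum(float(row[index]) for row in rows)


def total_by_name(text, name="amount"):
    """Sum the column found by header name, wherever it sits."""
    return sum(float(row[name]) for row in csv.DictReader(io.StringIO(text)))


print(total_by_position(BEFORE))  # 350.0
print(total_by_position(AFTER))   # 8.0, wrong and with no error raised
print(total_by_name(AFTER))       # 350.0
```

The positional version does not crash; it just reports the fee total as if it were the amount total, which is the kind of quiet error that is catastrophic in accounting.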

Maybe LLMs help overcome these challenges. Wish you luck.

SoulAuctioneer 3 hours ago | root | parent | next |

Agree with you, and we're definitely trying to thread the needle!

We're generating the SQL to answer natural language questions, so folks can just get answers and results tables if that's all they need, with the option for power users to fiddle with the SQL either directly or via a query editor GUI.

There's a ton of use cases for working with unstructured and semi-structured data and that's coming down the pipe!

mceoin 3 hours ago | root | parent | prev |

This is 100% the correct insight in my experience.

TL;DR, most technical people massively overestimate the technical / data abilities of regular spreadsheet users. We find simple use cases are best, and with each new LLM release the UX around more complex data improves significantly.

The reason we chose to build as a full-blown spreadsheet instead of just a table-based solution was that we saw that most people want the flexibility of a regular spreadsheet, but access to their (structured) business data. Table-based solutions wedge you into AI and you can never get out of that.