It actually happened to me at work, where I had a table containing almost 2 million records.
I had created an importer which reads CSVs from Google Drive and saves the values from those files as records in the table, which are then displayed in charts on the website.
The importer runs for an hour every two hours (via a cronjob). A month ago it still had 33k files on Google Drive to process (and every 10 minutes, 3 new files are added), and then I went on vacation for a month, so yesterday I checked up on the website.
What I saw: the charts were incredibly slow (when the page did not throw an exception because it was trying to allocate too many bytes), and the importer still had 27k files to go.
When running the importer manually, I found out that it took 2 minutes :clue: to save 1 file in the database.
So I had two problems on my plate: slow charts (front end, reading/SELECT) and slow importing (back end, writing/INSERT).
I prioritized the charts, since that’s what our client sees. So yesterday I made an SQL dump of the database on the server to work with on my laptop, updated a lot of queries for the charts (the client requested some quite complex formulas), and added an index on one column. The charts have a much more acceptable loading time now.
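For reference, the index change was essentially something like this (the table and column names below are placeholders, not my real schema, and I'm writing MySQL-style syntax):

```sql
-- Placeholder schema: the real table/column names differ.
-- Add an index on the column the chart queries filter and group on:
ALTER TABLE measurements
  ADD INDEX idx_measurements_recorded_at (recorded_at);

-- Sanity check that a typical chart query can actually use the index:
EXPLAIN
SELECT DATE(recorded_at) AS day, AVG(value) AS avg_value
FROM measurements
WHERE recorded_at >= '2015-07-01'
GROUP BY DATE(recorded_at);
```

With the index in place, the range lookup in the WHERE clause no longer needs a full scan over ~2 million rows, which is presumably where most of the chart time was going.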
So today I was going to do some refactoring, since I had updated all those queries (before updating them I did a lot of hacking in PHP, long story). And since I am on my PC instead of my laptop today, I went to make an SQL dump again. To my surprise, I saw two files on the server:
- dump_18-8.sql 134 MB
- dump_19-8.sql 219 MB
I immediately ran the importer manually and, sure enough, fewer than 15k files were left (the importer removes them after they have been read), and it now takes ~2 seconds to save one file to the database… :o
How is it possible that an INSERT statement takes considerably less time (from 2 minutes down to 2 seconds) after adding an index to the table? I always thought indexes slow down writes, but apparently that rule is not the whole story?
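In case it helps to narrow this down, I could profile a single insert along these lines (placeholder names again, and I'm assuming MySQL-style session profiling here):

```sql
-- Hypothetical example: table/columns are placeholders, MySQL syntax assumed.
SET profiling = 1;

INSERT INTO measurements (recorded_at, value)
VALUES ('2015-08-19 10:00:00', 42.0);

-- Durations of the statements run in this session:
SHOW PROFILES;

-- Per-stage breakdown for the first profiled statement:
SHOW PROFILE FOR QUERY 1;
```

That should at least show whether the time is spent in the insert itself or somewhere around it.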