Chris Tankersley

PHP 5.2 and MySQL 5 vs SQLQUERY and IBM DBU on V5R4

Posted on 2010-12-19

This is completely not at all scientific, but at work we needed to do what amounted to a giant SQL query to generate a stats file. The query itself was against two tables - one was 200,000 rows long and the other was 4.6 million rows long. We needed to add up many of the columns to get a row-by-row analysis on the individual pieces of a product, per customer.

The iSeries is a single CPU Power5 CPU running at 2.7ghz (I think), with 8gb of RAM and about 100gb of free harddrive space. We first ran the query on the iSeries using SQL. Within about 20 minutes we had eaten up 50gigs worth of hard drive space on the black box and had capitalized the CPU. That posed a problem as that meant we had used about 95% of the available harddrive space on the box, and the iSeries tends to shut down when that happens. The query was eventually killed.

Since the analysis was just a giant SQL query, I suggested moving it to one of the MySQL servers. It was an Ubuntu 8.04 box with half a gig of ram, 34gig of free hard drive, and a dual-core Xeon at 2.5ghz, all inside VMWare ESXi that is also running about 7 other VMs. Its actually one of our smaller virtual servers, but I figured we could sacrifice raw power for time.

We copied the tables to CSV, created the tables on the MySQL server, and then ran a PHP script to import the CSV files. All-in-all it took about 6 hours between beginning and end as giant 7gig files were shunted around the network and imported. I took the original query and modified it a bit for the joins, added an outfile to save the results, and set it to run.

I watched it for a bit. The CPU on the box never went about 75% on the query, though we had hit 100% during the import of the 4.6 million row database, and memory usage was never really above 20%. For the size of the tables, I was actually surprised. I let it just run.

In total, the first run took about 3.5 hours to run. I then learned that outfiles need an absolute path, not relative. Oh well. I ran it again and this time it ran in 2 hours, 48 minutes, which included actually generating the file. The file itself was almost 3gigs and 10 million lines long.

Not too shabby I'd say. The younger DB server on a small virtual machine did much better on resources than the more mature OS and hardware.


Comments