The Challenge
Lakeshore Software needed to analyze two very large data sets. At over 10 GB each, their size made it impractical to even attempt the analysis on a single high-powered PC.
Our Solution
We implemented a network of inexpensive, diskless PCs running Linux. By splitting one of the data sets across five machines, that entire data set could be held in RAM. One central computer (another PC) managed a) splitting data set #1 into five subsets, b) passing subsets of data set #2 to the Linux machines, and c) combining the results.
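The split/distribute/combine workflow above can be sketched in Python. This is a minimal illustration, not Lakeshore's actual code: the helper names (`split_dataset`, `analyze`, `run`) are hypothetical, and a process pool on one machine stands in for the five networked Linux PCs. The counting analysis is a placeholder for whatever algorithm runs on each node.

```python
from concurrent.futures import ProcessPoolExecutor

def split_dataset(records, n_parts):
    """a) Split data set #1 into n roughly equal subsets (hypothetical helper)."""
    return [records[i::n_parts] for i in range(n_parts)]

def analyze(subset, probe):
    """Stand-in for the per-machine analysis: compare one record of
    data set #2 against the subset resident on this 'machine'."""
    return sum(1 for r in subset if r == probe)

def run(dataset1, dataset2, n_workers=5):
    parts = split_dataset(dataset1, n_workers)
    total = 0
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        for probe in dataset2:
            # b) pass each piece of data set #2 to every worker in parallel
            results = pool.map(analyze, parts, [probe] * n_workers)
            # c) combine the partial results on the central node
            total += sum(results)
    return total
```

Because each subset stays resident in a worker for the whole run, only the (smaller) records of data set #2 and the partial results cross the network, which is the key to the design.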
In addition to the network, Lakeshore Software created a flexible software framework that allowed any type of data to be passed between the central computer and the Linux machines, meaning that changes can quickly be made to the data sets or analysis algorithms.
A single Linux machine delivered a speed improvement of more than 5 times, and the five-machine network an overall improvement of more than 25 times, all at a total cost of under $2,000. Furthermore, the network is scalable and can be expanded to as many as 40 machines.