Wednesday, March 17, 2010

Power of UNIX

UNIX/Linux is a powerful operating system with many simple commands for small tasks. Its power lies in its ability to handle very large, complex data sets that would bring many other operating systems to a screeching halt.


For example, in my personal experience, simple utilities like grep, gzip, gunzip, and split handle data ranging from 1 KB to many gigabytes with the same elegance and accuracy.

Say we have a file of size around 36 GB.

The following operations are performed on that file:
1. gzip   : size comes down to 4.2 GB.
2. gunzip : size grows back to 36 GB.
3. grep   : search for a pattern in this file and write the matching lines to a new file of around 32 GB. This takes around half an hour!
4. gzip   : compress the new file; its size comes down to 4.1 GB.
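
The four steps above can be sketched as a small shell script. This is only an illustration on a tiny stand-in file: the file names (big.log, matches.log) and the pattern ERROR are made up for the example, and the sizes and timings quoted above will not be reproduced by it.

```shell
#!/bin/sh
# Sketch of the gzip -> gunzip -> grep -> gzip sequence on a small sample file.
# File names and the pattern "ERROR" are hypothetical, chosen for illustration.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Build a small sample file standing in for the 36 GB original.
i=0
while [ "$i" -lt 1000 ]; do
    echo "line $i INFO ok"
    echo "line $i ERROR failed"
    i=$((i + 1))
done > big.log

gzip big.log                         # step 1: creates big.log.gz and removes big.log
gunzip big.log.gz                    # step 2: restores big.log
grep 'ERROR' big.log > matches.log   # step 3: write the matching lines to a new file
gzip matches.log                     # step 4: creates matches.log.gz

ls -l big.log matches.log.gz
```

If the uncompressed file is not needed on disk, steps 2 to 4 can probably be combined into one pipeline: gunzip -c big.log.gz | grep 'ERROR' | gzip > matches.log.gz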

Alternatively, you can use the split command between steps 2 and 3 to split the big file into a number of smaller chunks and then perform the remaining operations on each chunk.
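
A minimal sketch of the split variant, again on a small stand-in file. The names big.txt, chunk_ and matches.txt, the chunk size of 250 lines, and the pattern are assumptions made for the example, not values from the original run.

```shell
#!/bin/sh
# Sketch: split a file into chunks, search each chunk, then compress the result.
# File names, chunk size, and the pattern are hypothetical.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Sample input: 1000 lines, "record 1" through "record 1000".
awk 'BEGIN { for (i = 1; i <= 1000; i++) print "record " i }' > big.txt

# Split into pieces of 250 lines each: chunk_aa, chunk_ab, chunk_ac, chunk_ad.
split -l 250 big.txt chunk_

# Search every chunk and collect the matches in one file.
# grep exits with status 1 when a chunk has no match, so "|| true"
# keeps "set -e" from stopping the script in that case.
for f in chunk_*; do
    grep 'record 9' "$f" >> matches.txt || true
done

gzip matches.txt                     # creates matches.txt.gz

ls -l chunk_* matches.txt.gz
```

Because the shell expands chunk_* in sorted order, the matches end up in the same order as they appear in the original file.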



The operations performed above are quite simple. What makes the problem hard is the huge data set they are applied to. That is where the power of UNIX comes in.



