Crunch-time Begins Now For Mac

Apple Computer Inc. has been courting the life sciences market as a new business opportunity, given the Macintosh's operating system's Unix underpinnings and a wealth of. The combination of Apple's beautiful Aqua user interface with the FreeBSD base underpinning Mac OS X has made working with an Apple G4 a pleasure. And now comes the more powerful Power Mac G5.

As a life science researcher, I spend most of my time handling large amounts of data - hundreds of thousands of database records in data sets that can reach 2GB in size and take two weeks to generate on a dual-processor Linux server. It's my job to design the experiments, use the right software to address questions, write scripts to control software tools, generate data, write more scripts to process that data, then load and present the results in a form that other researchers can understand. Creating, parsing, compiling and sorting data takes time - and processor power. The more power you have, the faster you can work. That's why Apple's G5 represents an important step forward for anyone with compute-intensive work.

It's one intensely fast machine. To do my work, I use a combination of Perl, the National Center for Biotechnology Information's (NCBI) BLAST program and FileMaker Pro. Perl 5.6, a popular and flexible programming language, is installed with Apple's Developer Tools, obtainable from the Apple Developer Connection. BLAST, which stands for Basic Local Alignment Search Tool, is a publicly available tool kit of applications for searching and comparing DNA and protein sequences.

It's used extensively in the bio-IT field to match genes or DNA and protein sequences. The program is very processor-intensive (hence those two-week runs) and requires copious amounts of memory to handle those 2GB databases. I also use BioPerl, a framework of modules that sits atop Perl and reduces the time needed to code various routines for handling life sciences data.

Using Perl and BioPerl, I can write routines to control research programs like BLAST and parse the information they generate. Finally, FileMaker Pro is a great database tool for research groups. With personnel and time perennially short, simple solutions that require minimal effort are key. Less than an hour is needed to develop two elegant FileMaker Pro databases that can hold DNA sequence queries and their BLAST results. And it saves time when searching a database with almost 1 million records.

With that background, I put Apple's newest Power Mac G5, the dual 2-GHz model loaded with 2GB of SDRAM, to the test. I compared it with a late 2002 dual 1.25-GHz Power Mac G4 with 1GB of RAM. That G4 model, since discontinued, has 2MB of Level 3 cache memory per processor and was the top of the line last fall.

The tests included compiling the NCBI tool kit using the GNU C (GCC) compiler. Both the G4 and G5 were equipped with Apple's developer tool kit and the December GCC update. Apple's X11 and X11 software development kit (SDK) were also installed, as they're needed by the NCBI tool kit. After configuring each build, I ran the 'make' utility to actually perform my compile.

On the dual 1.25-GHz G4, this took 12 minutes, 10 seconds. On the new G5, it took just 8 minutes and 10 seconds, or 32% less time.
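The percentage comes from converting both wall-clock times to seconds and dividing the time saved by the G4 baseline. A minimal Python sketch of that arithmetic follows; the helper names are illustrative assumptions and not part of any tool used in these tests.

```python
def to_seconds(clock: str) -> int:
    """Convert 'mm:ss' or 'h:mm:ss' to a number of seconds."""
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def percent_less_time(baseline: str, candidate: str) -> float:
    """Percentage by which the candidate run is shorter than the baseline run."""
    base = to_seconds(baseline)
    return 100.0 * (base - to_seconds(candidate)) / base


# Compile times reported above: 12:10 on the G4, 8:10 on the G5.
print(f"{percent_less_time('12:10', '8:10'):.1f}% less time")
```

The same two helpers apply to any of the timing pairs reported in the rest of the tests.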

That's an understandable, but still very nice, gain in speed. The real meat of the work went into using NCBI's BLAST. To search the databases, I constructed a file of 10 DNA sequences representing full-length genes from an organism commonly called the Acorn worm (Saccoglossus kowalevskii). Analyzing all sequences in one file, as opposed to each individually, means BLAST can load the database into memory just once at the start of the run. I compared the two Power Macs using our newly compiled binary (blastgcc), as well as a G4-optimized binary (blastg4) from Apple. Using BLAST with a match size of 11 DNA bases against a nucleotide database that was nearly 2GB in size, I saw good performance: Using blastgcc, the G4 needed 26:32 for the test, the G5 just 17:45 - a 33% reduction in time. The blastg4 binary performed even better: The G4 was done in 17:12, while the dual G5 needed only 12:13 - a 29% reduction.
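Batching the queries amounts to writing every sequence into a single FASTA file, one header line per record. A minimal Python sketch, assuming made-up identifiers and bases (these are placeholders, not the actual Acorn worm genes) and an in-memory buffer standing in for a real file:

```python
import io


def write_fasta(records, handle, width=60):
    """Write (identifier, sequence) pairs as FASTA records to one handle."""
    for identifier, sequence in records:
        handle.write(f">{identifier}\n")
        # Wrap long sequences so each line holds at most `width` bases.
        for start in range(0, len(sequence), width):
            handle.write(sequence[start:start + width] + "\n")


# Placeholder records for illustration only.
queries = [("query_1", "ATGGCGTACGTTAGC"), ("query_2", "ATGCCCTTTGGGAAATAG")]

buffer = io.StringIO()  # stands in for an opened file such as a hypothetical queries.fasta
write_fasta(queries, buffer)
fasta_text = buffer.getvalue()
print(fasta_text, end="")
```

Passing one such file to the search lets the database load be paid once for the whole batch rather than once per sequence.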

Increasing our match size to 30 bases, which indicates we're looking for more perfect comparisons, yielded modest speed gains. With blastgcc, the G5 took about the same time to do the more complicated match, 17:41. Using blastg4, it took 17:06 on the G4, and a swift 10:33 on the G5. That's 38% less time than on the G4.
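One way to picture the match size is as a seed length: only exact runs of that many bases start a comparison, so a larger value discards weaker similarities earlier. The Python sketch below is a simplified illustration of that idea, not NCBI's implementation, and both sequences are made up.

```python
from collections import defaultdict


def seed_hits(query: str, subject: str, word_size: int):
    """Return (query_pos, subject_pos) pairs where word_size bases match exactly."""
    # Index every word of the subject by its starting position.
    index = defaultdict(list)
    for pos in range(len(subject) - word_size + 1):
        index[subject[pos:pos + word_size]].append(pos)
    # Look up each word of the query in that index.
    hits = []
    for pos in range(len(query) - word_size + 1):
        for subject_pos in index.get(query[pos:pos + word_size], []):
            hits.append((pos, subject_pos))
    return hits


query = "ACGTACGTTAGCCA"
subject = "TTACGTACGTTAGGCA"  # shares the 11-base run ACGTACGTTAG, starting at subject offset 2

short_seeds = seed_hits(query, subject, word_size=4)
long_seeds = seed_hits(query, subject, word_size=11)
print(len(short_seeds), len(long_seeds))
```

Short words produce many seeds, including repeats that lead nowhere; the long word leaves only the one stretch where the two sequences agree over the full length.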

Another typical scenario involved comparing our sequences against the protein databases that were less than half the size of the first database. Here, I used BLAST to translate the DNA sequences six different ways (three forward-reading frames, three reverse) before making comparisons. Using the default match size of 3, blastgcc took 8:46 on the G4, 5:33 on the G5, or 36% less time. Interestingly, blastg4 didn't perform much better, coming in at 8 minutes even on the G4 and 5:15 on the G5.
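The six-way translation can be sketched in a few lines of Python: three reading frames on the sequence as given, and three on its reverse complement. This is an illustration using the standard genetic code, not BLAST's own code, and the function names are assumptions.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate complete codons only; unrecognized codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna: str) -> dict:
    """Return the three forward and three reverse-complement translations."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


frames = six_frame_translation("ATGGCCTAA")  # made-up nine-base example
for name in sorted(frames):
    print(name, frames[name])
```

Each DNA query therefore becomes six protein queries, which is part of why translated searches carry extra work per sequence.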

On both machines, blastgcc and blastg4 pegged both CPUs at 100% for lengthy periods of time. The processors would dip for a breather between sets to between 10% and 20% for a few seconds, then jump back up to 100%. That didn't happen during the first round of tests. I suspect that each uptick represented the comparison and its subsequent write to disk. As it appeared that RAM was limiting both machines in our nucleotide tests, and since I wanted to compare both machines fairly, I added another 256MB RAM to the G4 (bumping it to 1.25GB). Then I ran the test using a smaller 700MB database that matched human genomic nucleotides and the human copies of the Acorn worm genes.

Here, the G5 screamed! With a match size of 11, blastgcc took 11:36 on the G4, but only 4 minutes on the G5, a whopping 65% reduction in time. The blastg4 took 7:55 on the G4 and 4:03 on the G5. Although the latter number was virtually the same as in the previous test, it was still about 50% faster than the G4. With the data in hand, I next parsed the relevant information - with Perl and BioPerl coming to the rescue. Using the BLAST parsing routines, I tested both machines' ability to analyze 8 million lines of data generated from BLASTing just over 7,000 DNA sequences. Each BLAST report involved loading data from disk, generating the data objects in memory, iterating through each object and writing information back to disk: In other words, I was measuring disk I/O, memory usage and CPU time.
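The load, build, iterate and write cycle described above can be sketched as follows. This is a Python stand-in for illustration only, not the BioPerl routines used in the test; the tab-delimited column layout (query, subject, percent identity, e-value) and all row values are assumptions.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Hit:
    query: str
    subject: str
    percent_identity: float
    evalue: float


def parse_hits(handle):
    """Yield one Hit per tab-delimited line, skipping blank lines and '#' comments."""
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        query, subject, identity, evalue = line.split("\t")[:4]
        yield Hit(query, subject, float(identity), float(evalue))


def write_best_hits(hits, handle):
    """Keep the lowest e-value hit per query and write the survivors as TSV."""
    best = {}
    for hit in hits:
        if hit.query not in best or hit.evalue < best[hit.query].evalue:
            best[hit.query] = hit
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for hit in best.values():
        writer.writerow([hit.query, hit.subject, hit.percent_identity, hit.evalue])
    return best


# In-memory buffers stand in for the report and summary files on disk.
report = io.StringIO(
    "# made-up example rows\n"
    "q1\ts1\t98.5\t1e-50\n"
    "q1\ts2\t80.0\t1e-10\n"
    "q2\ts3\t91.2\t3e-22\n"
)
summary = io.StringIO()
best = write_best_hits(parse_hits(report), summary)
print(summary.getvalue(), end="")
```

Because the parser is a generator, only the best hit per query stays in memory while the report streams past, which keeps the work dominated by reading, object creation and writing.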

The G4 took an hour and 10 minutes to plow through the data; the G5 needed 52 minutes - almost 20 minutes less - a 25% decrease in processing time. Interestingly, only one processor was used most of the time during the test, although the second would spike on occasion. My guess: Perl and its routines weren't taking advantage of the second processor. Finally, I had to manage all of the data I'd created with FileMaker Pro. I needed to import 100,000 such lines of data into a set of databases I had created while out on a field study. In an ideal situation, data can be imported into a clean database; there are no other indices to update and current block utilization to worry about.

The G4 imported the data in 3:03. The G5 took 2:47, 8% less time. But in the real world, users will often need to update pre-existing databases.

To add to a database already containing more than 700,000 records, the G4 needed 8:30. For some reason, however, the G5 took longer: 10:51.

(I don't know whether this was a one-time anomaly, and time constraints kept me from repeating the tests to dig deeper.) Then I tried the reverse. What if - in our real-world scenario - we had made a mistake and needed to remove all of those records? The G4 and G5 culled the data back out of our data set in 18:58 and 18 minutes flat, respectively. In each case, on the G5, both processors were taxed at 30% to 60% of their capacity. The G4 exhibited similar behavior, though rarely did processor use drop below 40%. My final thoughts: Even when using software that hasn't yet been optimized for the G5, I saw significant speed gains, ranging from 25% to 65% for most of the various tests I performed. For short tasks, the speed gains might seem small.

But for large, complex and time-consuming tasks, the G5 could mean the difference between seeing results in one week rather than two. To me, this much is clear: With the introduction of the G5, lab scientists now have even more of a reason to put a Power Mac on their desk - especially given the operating system's Unix base, which makes running open-source scientific applications a snap. And with each passing day, more scientific software, both commercial and open-source, is being released that runs on Mac OS X. Personally, I can't wait. And now I have an even better excuse for a holiday present to myself.

Bob Freeman has a Ph.D. in virology and is a bioinformatics and technology development consultant at Boston-based Bioinfoworks. He has been using Macs since the Mac SE and programming before he took Algebra I. He currently assists a group of researchers with the bioinformatics of the Acorn worm and can be reached at.
