StarCluster - Mailing List Archive

Re: Tophat run on a 2-node cluster

From: Rayson Ho <no email>
Date: Fri, 2 Aug 2013 18:06:04 -0400

I just saw that Jacob already answered some of your questions; I
wanted to add a few more things:

- for the hardware configuration of each instance type, I found the most
detailed and easy-to-read info at the bottom of the page at:
http://aws.amazon.com/ec2/instance-types/

- if you submitted your job through Grid Engine, then qacct can tell
you more about its memory usage history (see the sketch after this list).

- lastly, /mnt has over 400GB of free space, so you can take
advantage of the instance's free ephemeral storage (also shown in the
sketch below).
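
As a minimal sketch, assuming a stock StarCluster/Grid Engine setup (the
job id 42 and the output directory /mnt/topout are placeholders; for
reference, the instance-types page lists a c1.xlarge at 8 virtual cores
and 7 GB of RAM, so your 2-node cluster has 16 cores in total):

  # list each node and the number of CPU slots Grid Engine sees on it
  qhost

  # peak memory (maxvmem) of a finished job; 42 is a placeholder job id
  qacct -j 42 | grep maxvmem

  # point TopHat's output at the large ephemeral disk instead of the
  # nearly full root volume; -p sets the number of mapping threads
  tophat -o /mnt/topout -p 8 <bowtie2-index> <reads.fastq>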

Rayson

==================================================
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/


On Fri, Aug 2, 2013 at 4:51 PM, Manuel J. Torres <mjtorres.phd_at_gmail.com> wrote:
> I am trying to run the TopHat software to map ~38 GB of RNA-seq reads in
> FASTQ format to a reference genome on a 2-node cluster with the following
> properties:
> NODE_IMAGE_ID = ami-999d49f0
> NODE_INSTANCE_TYPE = c1.xlarge
>
> Question: How many CPUs are there on this type of cluster?
>
> Here is a df -h listing of my cluster:
> root_at_master:~# df -h
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/xvda1      9.9G  9.9G     0 100% /
> udev            3.4G  4.0K  3.4G   1% /dev
> tmpfs           1.4G  184K  1.4G   1% /run
> none            5.0M     0  5.0M   0% /run/lock
> none            3.5G     0  3.5G   0% /run/shm
> /dev/xvdb1      414G  199M  393G   1% /mnt
> /dev/xvdz        99G   96G     0 100% /home/large-data
> /dev/xvdy        20G  5.3G   14G  29% /home/genomic-data
>
> I created a third volume for the output; it does not appear in this listing,
> but it is defined in my config file, and I verified that I can read and write
> to it. I wrote the output files to this larger, empty volume.
>
> I can't get TopHat to run to completion. It appears to be generating
> truncated intermediate files. Here is the TopHat output:
>
> [2013-08-01 17:34:19] Beginning TopHat run (v2.0.9)
> -----------------------------------------------
> [2013-08-01 17:34:19] Checking for Bowtie
> Bowtie version: 2.1.0.0
> [2013-08-01 17:34:21] Checking for Samtools
> Samtools version: 0.1.19.0
> [2013-08-01 17:34:21] Checking for Bowtie index files (genome)..
> [2013-08-01 17:34:21] Checking for reference FASTA file
> [2013-08-01 17:34:21] Generating SAM header for
> /home/genomic-data/data/Nemve1.allmasked
> format: fastq
> quality scale: phred33 (default)
> [2013-08-01 17:34:27] Reading known junctions from GTF file
> [2013-08-01 17:36:56] Preparing reads
> left reads: min. length=50, max. length=50, 165174922 kept reads
> (113024 discarded)
> [2013-08-01 18:24:07] Building transcriptome data files..
> [2013-08-01 18:26:43] Building Bowtie index from Nemve1.allmasked.fa
> [2013-08-01 18:29:01] Mapping left_kept_reads to transcriptome
> Nemve1.allmasked with Bowtie2
> [2013-08-02 07:34:40] Resuming TopHat pipeline with unmapped reads
> [bam_header_read] EOF marker is absent. The input is probably truncated.
> [bam_header_read] EOF marker is absent. The input is probably truncated.
> [2013-08-02 07:34:41] Mapping left_kept_reads.m2g_um to genome
> Nemve1.allmasked with Bowtie2
> [main_samview] truncated file.
> [main_samview] truncated file.
> [bam_header_read] EOF marker is absent. The input is probably truncated.
> [bam_header_read] invalid BAM binary header (this is not a BAM file).
> [main_samview] fail to read the header from
> "/home/results-data/top-results-8-01-2013/topout/tmp/left_kept_reads.m2g_um_unmapped.bam".
> [2013-08-02 07:34:54] Retrieving sequences for splices
> [2013-08-02 07:35:16] Indexing splices
> Warning: Empty fasta file:
> '/home/results-data/top-results-8-01-2013/topout/tmp/segment_juncs.fa'
> Warning: All fasta inputs were empty
> Error: Encountered internal Bowtie 2 exception (#1)
> Command: /home/genomic-data/bin/bowtie2-2.1.0/bowtie2-build
> /home/results-data/top-results-8-01-2013/topout/tmp/segment_juncs.fa
> /home/results-data/top-results-8-01-2013/topout/tmp/segment_juncs
> [FAILED]
> Error: Splice sequence indexing failed with err =1
>
> Questions:
>
> Am I running out of memory?
>
> How much RAM does the AMI have, and can I make it larger?
>
> No matter what StarCluster configuration I define, I can't seem to make my
> root directory larger than 10GB, and it appears to be full.
>
> Can I make the root directory larger than 10GB?
>
> Thanks!
>
> --
> Manuel J Torres, PhD
> 219 Brannan Street Unit 6G
> San Francisco, CA 94107
> VOICE: 415-656-9548
>
> _______________________________________________
> StarCluster mailing list
> StarCluster_at_mit.edu
> http://mailman.mit.edu/mailman/listinfo/starcluster
>
Received on Fri Aug 02 2013 - 18:06:05 EDT