Package picard.sam

Class FastqToSam


  • @DocumentedFeature
    public class FastqToSam
    extends CommandLineProgram
    Converts a FASTQ file to an unaligned BAM or SAM file.

    Output read records will contain the original base calls, and quality scores will be translated depending on the base quality score encoding: FastqSanger, FastqSolexa, or FastqIllumina.

    There are also arguments to provide values for SAM header and read attributes that are not present in FASTQ (e.g., see RG or SM below).

    Inputs

    One FASTQ file name for single-end or two for paired-end sequencing input data. These files may be gzip-compressed (when the file name ends with ".gz").

    Alternatively, for larger inputs you can provide a collection of FASTQ files indexed by their name (see USE_SEQUENTIAL_FASTQS below for details).

    By default, this tool will try to guess the base quality score encoding. However, you can indicate it explicitly using the QUALITY_FORMAT argument.
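
    To make the offset-based encodings concrete, here is a minimal sketch of decoding a quality string and re-encoding it with the Standard offset. The class QualityOffsetSketch and its methods are hypothetical helpers, not part of Picard or htsjdk; the offsets 33 and 64 are the ones quoted in the QUALITY_FORMAT documentation. Solexa is omitted because it is generally described as using a log-odds scale rather than a plain Phred offset.

```java
/** Hypothetical helper, not part of Picard: decodes offset-encoded FASTQ quality strings. */
final class QualityOffsetSketch {
    static final int STANDARD_OFFSET = 33; // "Standard (phred scaling + 33)"
    static final int ILLUMINA_OFFSET = 64; // "Illumina (phred scaling + 64)"

    /** Converts each quality character to its numeric Phred score. */
    static int[] decode(String qualities, int offset) {
        int[] scores = new int[qualities.length()];
        for (int i = 0; i < qualities.length(); i++) {
            scores[i] = qualities.charAt(i) - offset;
        }
        return scores;
    }

    /** Re-encodes numeric Phred scores as a Standard (+33) quality string. */
    static String encodeStandard(int[] scores) {
        StringBuilder sb = new StringBuilder(scores.length);
        for (int score : scores) {
            sb.append((char) (score + STANDARD_OFFSET));
        }
        return sb.toString();
    }
}
```

    For instance, encodeStandard(decode("hT@", ILLUMINA_OFFSET)) turns an Illumina-encoded string into its Standard-encoded equivalent.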

    Output

    A single unaligned BAM or SAM file. By default, the records are sorted by query (read) name.

    Usage examples

    Example 1:

    Single-end sequencing FASTQ file conversion. All reads are annotated as belonging to the "rg0013" read group that in turn is part of the sample "sample001".

     java -jar picard.jar FastqToSam \
          F1=input_reads.fastq \
          O=unaligned_reads.bam \
          SM=sample001 \
          RG=rg0013
     

    Example 2:

    Similar to example 1 above, but for paired-end sequencing.

     java -jar picard.jar FastqToSam \
          F1=forward_reads.fastq \
          F2=reverse_reads.fastq \
          O=unaligned_read_pairs.bam \
          SM=sample001 \
          RG=rg0013 
     
    • Field Detail

      • FASTQ

        @Argument(shortName="F1",
                  doc="Input fastq file (optionally gzipped) for single end data, or first read in paired end data.")
        public File FASTQ
      • FASTQ2

        @Argument(shortName="F2",
                  doc="Input fastq file (optionally gzipped) for the second read of paired end data.",
                  optional=true)
        public File FASTQ2
      • USE_SEQUENTIAL_FASTQS

        @Argument(doc="Use sequential fastq files with the suffix <prefix>_###.fastq or <prefix>_###.fastq.gz",
                  optional=true)
        public boolean USE_SEQUENTIAL_FASTQS
      • QUALITY_FORMAT

        @Argument(shortName="V",
                  doc="A value describing how the quality values are encoded in the input FASTQ file.  Either Solexa (phred scaling + 66), Illumina (phred scaling + 64) or Standard (phred scaling + 33).  If this value is not specified, the quality format will be detected automatically.",
                  optional=true)
        public htsjdk.samtools.util.FastqQualityFormat QUALITY_FORMAT
      • OUTPUT

        @Argument(doc="Output SAM/BAM file. ",
                  shortName="O")
        public File OUTPUT
      • READ_GROUP_NAME

        @Argument(shortName="RG",
                  doc="Read group name")
        public String READ_GROUP_NAME
      • SAMPLE_NAME

        @Argument(shortName="SM",
                  doc="Sample name to insert into the read group header")
        public String SAMPLE_NAME
      • LIBRARY_NAME

        @Argument(shortName="LB",
                  doc="The library name to place into the LB attribute in the read group header",
                  optional=true)
        public String LIBRARY_NAME
      • PLATFORM_UNIT

        @Argument(shortName="PU",
                  doc="The platform unit (often run_barcode.lane) to insert into the read group header",
                  optional=true)
        public String PLATFORM_UNIT
      • PLATFORM

        @Argument(shortName="PL",
                  doc="The platform type (e.g. illumina, solid) to insert into the read group header",
                  optional=true)
        public String PLATFORM
      • SEQUENCING_CENTER

        @Argument(shortName="CN",
                  doc="The sequencing center from which the data originated",
                  optional=true)
        public String SEQUENCING_CENTER
      • PREDICTED_INSERT_SIZE

        @Argument(shortName="PI",
                  doc="Predicted median insert size, to insert into the read group header",
                  optional=true)
        public Integer PREDICTED_INSERT_SIZE
      • PROGRAM_GROUP

        @Argument(shortName="PG",
                  doc="Program group to insert into the read group header.",
                  optional=true)
        public String PROGRAM_GROUP
      • PLATFORM_MODEL

        @Argument(shortName="PM",
                  doc="Platform model to insert into the group header (free-form text providing further details of the platform/technology used)",
                  optional=true)
        public String PLATFORM_MODEL
      • COMMENT

        @Argument(doc="Comment(s) to include in the merged output file\'s header.",
                  optional=true,
                  shortName="CO")
        public List<String> COMMENT
      • DESCRIPTION

        @Argument(shortName="DS",
                  doc="Inserted into the read group header",
                  optional=true)
        public String DESCRIPTION
      • RUN_DATE

        @Argument(shortName="DT",
                  doc="Date the run was produced, to insert into the read group header",
                  optional=true)
        public htsjdk.samtools.util.Iso8601Date RUN_DATE
      • SORT_ORDER

        @Argument(shortName="SO",
                  doc="The sort order for the output sam/bam file.")
        public htsjdk.samtools.SAMFileHeader.SortOrder SORT_ORDER
      • MIN_Q

        @Argument(doc="Minimum quality allowed in the input fastq.  An exception will be thrown if a quality is less than this value.")
        public int MIN_Q
      • MAX_Q

        @Argument(doc="Maximum quality allowed in the input fastq.  An exception will be thrown if a quality is greater than this value.")
        public int MAX_Q
      • STRIP_UNPAIRED_MATE_NUMBER

        @Deprecated
        @Argument(doc="Deprecated (No longer used). If true and this is an unpaired fastq any occurrence of \'/1\' or \'/2\' will be removed from the end of a read name.")
        public Boolean STRIP_UNPAIRED_MATE_NUMBER
        Deprecated.
      • ALLOW_AND_IGNORE_EMPTY_LINES

        @Argument(doc="Allow (and ignore) empty lines")
        public Boolean ALLOW_AND_IGNORE_EMPTY_LINES
    • Constructor Detail

      • FastqToSam

        public FastqToSam()
    • Method Detail

      • determineQualityFormat

        public static htsjdk.samtools.util.FastqQualityFormat determineQualityFormat​(htsjdk.samtools.fastq.FastqReader reader1,
                                                                                     htsjdk.samtools.fastq.FastqReader reader2,
                                                                                     htsjdk.samtools.util.FastqQualityFormat expectedQuality)
        Looks at fastq input(s) and attempts to determine the proper quality format. Closes the reader(s) as a side effect.
        Parameters:
        reader1 - The first fastq input
        reader2 - The second fastq input, if necessary. To not use this input, set it to null
        expectedQuality - If provided, will be used for sanity checking. If left null, autodetection will occur
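
        As a rough illustration of what autodetection can look like, the sketch below guesses an encoding from the lowest quality character observed. This is a common heuristic and an assumption on our part; it is not claimed to be the algorithm this method uses. QualityFormatGuessSketch, its Guess enum, and the cut-off values 59 and 64 are all hypothetical choices for the example.

```java
import java.util.List;

/** Hypothetical heuristic, not Picard's or htsjdk's detector. */
final class QualityFormatGuessSketch {
    enum Guess { STANDARD, SOLEXA, ILLUMINA }

    /** Guesses the encoding from the smallest quality character seen in the sample. */
    static Guess guess(List<String> qualityStrings) {
        int min = Integer.MAX_VALUE;
        for (String qualities : qualityStrings) {
            for (int i = 0; i < qualities.length(); i++) {
                min = Math.min(min, qualities.charAt(i));
            }
        }
        if (min == Integer.MAX_VALUE) {
            throw new IllegalArgumentException("No quality characters to inspect");
        }
        if (min < 59) {
            return Guess.STANDARD; // characters below ';' only occur with the +33 offset
        }
        if (min < 64) {
            return Guess.SOLEXA;   // assumed Solexa range: scores slightly below zero at +64
        }
        // Ambiguous in practice: uniformly high-quality Standard data also lands here.
        return Guess.ILLUMINA;
    }
}
```

        The final branch shows why an expectedQuality argument is useful for sanity checking: a sample containing only high characters cannot be classified with certainty.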
      • main

        public static void main​(String[] argv)
        Stock main method.
      • getSequentialFileList

        protected static List<File> getSequentialFileList​(File baseFastq)
        Get a list of FASTQs that are sequentially numbered based on the first (base) fastq. The files should be named: <prefix>_001.<extension>, <prefix>_002.<extension>, ..., <prefix>_XYZ.<extension>. The base files should be: <prefix>_001.<extension>. An example would be: RUNNAME_S8_L005_R1_001.fastq, RUNNAME_S8_L005_R1_002.fastq, RUNNAME_S8_L005_R1_003.fastq, RUNNAME_S8_L005_R1_004.fastq, where `baseFastq` is the first in that list.
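
        The naming scheme can be sketched as follows. SequentialFastqSketch is a hypothetical stand-in, not the Picard implementation: it assumes a three-digit counter preceded by an underscore and stops at the first missing file.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of sequential FASTQ discovery; not the Picard implementation. */
final class SequentialFastqSketch {
    // Prefix ending in '_', three digits, then the extension,
    // e.g. RUNNAME_S8_L005_R1_001.fastq or RUNNAME_S8_L005_R1_001.fastq.gz
    private static final Pattern NAME = Pattern.compile("^(.*_)(\\d{3})(\\..+)$");

    /** Returns the sibling file name that carries the given sequence number. */
    static String nameFor(String baseName, int index) {
        Matcher m = NAME.matcher(baseName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a sequentially numbered name: " + baseName);
        }
        return m.group(1) + String.format("%03d", index) + m.group(3);
    }

    /** Collects baseFastq and its consecutively numbered siblings until one is missing. */
    static List<File> list(File baseFastq) {
        Matcher m = NAME.matcher(baseFastq.getName());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a sequentially numbered name: " + baseFastq);
        }
        List<File> files = new ArrayList<>();
        int index = Integer.parseInt(m.group(2));
        File next = baseFastq;
        while (next.exists()) {
            files.add(next);
            index++;
            next = new File(baseFastq.getParentFile(), nameFor(baseFastq.getName(), index));
        }
        return files;
    }
}
```

        Stopping at the first gap means a missing _003 file silently hides _004 and later, so a caller may want to verify the expected file count separately.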
      • doWork

        protected int doWork()
        Description copied from class: CommandLineProgram
        Do the work after the command line has been parsed. A RuntimeException may be thrown by this method and is reported appropriately.
        Specified by:
        doWork in class CommandLineProgram
        Returns:
        program exit status.
      • makeItSo

        public void makeItSo​(htsjdk.samtools.fastq.FastqReader reader1,
                             htsjdk.samtools.fastq.FastqReader reader2,
                             htsjdk.samtools.SAMFileWriter writer)
        Handles the FastqToSam execution on the FastqReader(s). In some circumstances it might be useful to circumvent the command-line-based instantiation of this class; note, however, that there are no guardrails when running in this manner. It is the caller's responsibility to close the reader(s).
        Parameters:
        reader1 - The FastqReader for the first fastq file
        reader2 - The second FastqReader if applicable. Pass in null if only using a single reader
        writer - The SAMFileWriter where the new SAM file is written
      • doUnpaired

        protected int doUnpaired​(htsjdk.samtools.fastq.FastqReader freader,
                                 htsjdk.samtools.SAMFileWriter writer)
        Creates a simple SAM file from a single fastq file.
      • doPaired

        protected int doPaired​(htsjdk.samtools.fastq.FastqReader freader1,
                               htsjdk.samtools.fastq.FastqReader freader2,
                               htsjdk.samtools.SAMFileWriter writer)
        More complicated method that takes two fastq files and builds pairing information in the SAM.
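
        One way to picture the "pairing information" is through the SAM FLAG field. The sketch below assumes that each mate of an unaligned pair is marked as paired, unmapped, and mate-unmapped, plus first or second in pair; that convention is an assumption here and is not stated by this page. The bit values are those defined in the SAM specification, and UnalignedPairFlagsSketch is a hypothetical class.

```java
/** Hypothetical sketch: SAM FLAG values for the two mates of an unaligned pair. */
final class UnalignedPairFlagsSketch {
    static final int PAIRED = 0x1;          // template has multiple segments
    static final int READ_UNMAPPED = 0x4;   // this read is unmapped
    static final int MATE_UNMAPPED = 0x8;   // the mate is unmapped
    static final int FIRST_OF_PAIR = 0x40;  // first segment in the template
    static final int SECOND_OF_PAIR = 0x80; // last segment in the template

    /** Combines the bits for one mate of an unaligned pair. */
    static int flagFor(boolean firstOfPair) {
        int flag = PAIRED | READ_UNMAPPED | MATE_UNMAPPED;
        return flag | (firstOfPair ? FIRST_OF_PAIR : SECOND_OF_PAIR);
    }
}
```

        Under these assumptions the first mate gets flag 77 and the second gets 141.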
      • createSamFileHeader

        public htsjdk.samtools.SAMFileHeader createSamFileHeader()
        Creates a simple header with the values provided on the command line.
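
        For orientation, this sketch shows how command-line values such as RG=rg0013 and SM=sample001 might appear as a textual @RG header line. It assumes that READ_GROUP_NAME becomes the ID tag and that unset optional arguments are simply left out; ReadGroupLineSketch is a hypothetical helper and does not use the htsjdk header classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a textual @RG line; not how htsjdk builds headers. */
final class ReadGroupLineSketch {
    /** Joins tag/value pairs into a tab-separated @RG line, skipping null values. */
    static String rgLine(Map<String, String> tags) {
        StringBuilder sb = new StringBuilder("@RG");
        for (Map.Entry<String, String> tag : tags.entrySet()) {
            if (tag.getValue() != null) {
                sb.append('\t').append(tag.getKey()).append(':').append(tag.getValue());
            }
        }
        return sb.toString();
    }

    /** Builds the line from RG and SM values plus optional LB and PL values (null if unset). */
    static String example(String rg, String sm, String lb, String pl) {
        Map<String, String> tags = new LinkedHashMap<>(); // keeps insertion order
        tags.put("ID", rg); // assumption: READ_GROUP_NAME maps to the ID tag
        tags.put("SM", sm);
        tags.put("LB", lb);
        tags.put("PL", pl);
        return rgLine(tags);
    }
}
```

        With the values from Example 1, example("rg0013", "sample001", null, null) yields a line containing only the ID and SM tags.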
      • customCommandLineValidation

        protected String[] customCommandLineValidation()
        Description copied from class: CommandLineProgram
        Put any custom command-line validation in an override of this method. clp is initialized at this point and can be used to print usage and access argv. Any options set by command-line parser can be validated.
        Overrides:
        customCommandLineValidation in class CommandLineProgram
        Returns:
        null if the command line is valid. If the command line is invalid, returns an array of error messages to be written to the appropriate place.