Specifications
Fastq sequence description
Fields in fastq description:
| Key | Description |
|---|---|
| | Instrument ID |
| | Run number on instrument |
| | Flowcell identifier |
| | Flowcell IDS |
| | Lane number |
| | Tile number |
| | Position X of cluster |
| | Position Y of cluster |
| | Optional, appears when UMI is specified in the sample sheet. UMI sequences for Read 1 and Read 2, separated by a plus [+] |
| | Read number - 1 can be single read or Read 2 of paired-end |
| | Y if the read is filtered (did not pass), N otherwise |
| | 0 when none of the control bits are on, otherwise it is an even number. On HiSeq X and NextSeq systems, control specification is not performed and this number is always 0 |
| | Index of the read |
See also https://help.basespace.illumina.com/files-used-by-basespace/fastq-files
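As a rough illustration of how the fields above fit together, the following sketch splits a description line into its parts. It assumes the colon-separated layout used in Illumina's FASTQ documentation linked above (`@instrument:run:flowcell:lane:tile:x:y[:UMI] read:filtered:control:index`). The `FastqDescription` type, its field names, the `parse_description` function and the sample lines are hypothetical and not part of bcl2fastq.

```python
from typing import NamedTuple, Optional


class FastqDescription(NamedTuple):
    """Hypothetical container; field names are illustrative only."""
    instrument: str
    run_number: int
    flowcell_id: str
    lane: int
    tile: int
    x_pos: int
    y_pos: int
    umi: Optional[str]      # None when no UMI is present
    read_number: int
    is_filtered: bool       # True when the read did not pass the filter ("Y")
    control_number: int
    index: str


def parse_description(line: str) -> FastqDescription:
    """Parse one description line under the assumed layout."""
    text = line.strip()
    if text.startswith("@"):
        text = text[1:]
    left, right = text.split(" ", 1)
    loc = left.split(":")
    if len(loc) not in (7, 8):
        raise ValueError(f"unexpected number of fields: {len(loc)}")
    umi = loc[7] if len(loc) == 8 else None
    read, filtered, control, index = right.split(":", 3)
    return FastqDescription(
        instrument=loc[0],
        run_number=int(loc[1]),
        flowcell_id=loc[2],
        lane=int(loc[3]),
        tile=int(loc[4]),
        x_pos=int(loc[5]),
        y_pos=int(loc[6]),
        umi=umi,
        read_number=int(read),
        is_filtered=(filtered == "Y"),
        control_number=int(control),
        index=index,
    )


# Illustrative line, not taken from a real run.
desc = parse_description("@SIM:1:FCX:1:15:6329:1045:GATTACT+GTCTTAAC 1:N:0:ATCCGA")
print(desc.lane, desc.tile, desc.umi, desc.is_filtered)
```

The UMI is treated as optional by checking whether the first group has seven or eight colon-separated fields, which matches the "appears when UMI is specified" note in the table.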
Filter file
The filter files can be found in the BaseCalls directory.
The filter file specifies whether a cluster passed filters.
Filter files are generated at cycle 26 using 25 cycles of data. For each tile, one filter file is generated.
Location: Data/Intensities/BaseCalls/L001
File format: s_[lane]_[tile].filter
The format is described below:
| Bytes | Description |
|---|---|
| 0-3 | Zero value (for backwards compatibility) |
| 4-7 | Filter format version number |
| 8-11 | Number of clusters |
| 12-(N+11) | One unsigned 8-bit integer per cluster, where N is the number of clusters. Bit 0 is the pass/fail filter flag |
Filter bytes example:
import struct

cluster_count = 3
data = (
    bytes([0, 0, 0, 0])                 # bytes 0-3: zero prefix
    + bytes([3, 0, 0, 0])               # bytes 4-7: format version 3
    + struct.pack("<I", cluster_count)  # bytes 8-11: number of clusters, little-endian unsigned int
    + bytes([1] * cluster_count)        # one unsigned 8-bit integer per cluster; bit 0 is the filter flag
)
# Bit 0 == 1: cluster passed the filter
# Bit 0 == 0: cluster did not pass the filter
In hexdump:
BYTES 0-3    BYTES 4-7    BYTES 8-11   BYTES 12-14
00 00 00 00  03 00 00 00  03 00 00 00  01 01 01
Bytes 8-11 say there are 3 clusters, and each cluster is represented by one unsigned 8-bit integer in bytes 12-14.
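Reading a filter file back is the inverse of the layout above. The sketch below is one possible way to do it, assuming the zero prefix, version and cluster count are all little-endian unsigned 32-bit integers (which is what the hexdump suggests). The function names `build_filter` and `parse_filter` are hypothetical helpers, not part of bcl2fastq.

```python
import struct


def build_filter(passed, version=3):
    """Serialise per-cluster pass/fail flags into the layout described above."""
    header = struct.pack("<III", 0, version, len(passed))
    return header + bytes(1 if p else 0 for p in passed)


def parse_filter(data):
    """Return (version, [bool, ...]) from the raw bytes of a filter file."""
    if len(data) < 12:
        raise ValueError("filter data is shorter than the 12-byte header")
    zero, version, count = struct.unpack_from("<III", data, 0)
    if zero != 0:
        raise ValueError("bytes 0-3 are expected to be zero")
    flags = data[12:12 + count]
    if len(flags) != count:
        raise ValueError("filter data is truncated")
    # Only bit 0 carries the pass/fail information.
    return version, [bool(b & 1) for b in flags]


raw = build_filter([True, True, True])
print(raw.hex(" "))
print(parse_filter(raw))
```

Masking with `& 1` rather than comparing the whole byte to 1 keeps the parser tolerant of any other bits being set, since only bit 0 is described.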
Control file
The control files are binary files containing control results.
| Bytes | Description |
|---|---|
| 0-3 | Zero value (for backwards compatibility) |
| 4-7 | Format version number |
| 12-(2xN+11) | |
Locations file
The BCL to FASTQ converter can use different types of position files and expects a type based on the version of RTA used. The locs files can be found in the Intensities/L<lane> directories.
Bcl file
The BCL files can be found in the BaseCalls directory inside the run directory: Data/Intensities/BaseCalls/L<lane>/C<cycle>.1
They are named as follows:
s_<lane>_<tile>.bcl
Format:
| Bytes | Description |
|---|---|
| 0-3 | Number of clusters N, as an unsigned 32-bit little-endian integer |
| 4-(N+3) | |
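The header described above is enough to check that a BCL file is complete. The sketch below reads the cluster count and returns the N per-cluster bytes unchanged, since their meaning is not described in this section. `read_bcl` is a hypothetical helper name.

```python
import struct


def read_bcl(data):
    """Return (cluster_count, per_cluster_bytes) from raw BCL file content."""
    if len(data) < 4:
        raise ValueError("BCL data is shorter than the 4-byte header")
    (count,) = struct.unpack_from("<I", data, 0)
    body = data[4:4 + count]
    if len(body) != count:
        raise ValueError(f"expected {count} cluster bytes, found {len(body)}")
    # The per-cluster bytes are returned as-is; decoding them is out of scope here.
    return count, body


# Synthetic content for three clusters, not taken from a real run.
sample = struct.pack("<I", 3) + bytes([0x01, 0x02, 0x03])
print(read_bcl(sample))
```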
Stat file
The stats files can be found in the BaseCalls directory inside the run directory: Data/Intensities/BaseCalls/L00<lane>/C<cycle>.1
They are named as follows:
s_<lane>_<tile>.stats
The Stats file is a binary file containing base calling statistics; the content is described below.
The data is for clusters passing filter only:
| Start | Description | Data type |
|---|---|---|
| Byte 0 | Cycle number | integer |
| Byte 4 | Average cycle intensity | double |
| Byte 12 | Average intensity for A over all clusters with intensity for A | double |
| Byte 20 | Average intensity for C over all clusters with intensity for C | double |
| Byte 28 | Average intensity for G over all clusters with intensity for G | double |
| Byte 36 | Average intensity for T over all clusters with intensity for T | double |
| Byte 44 | Average intensity for A over clusters with base call A | double |
| Byte 52 | Average intensity for C over clusters with base call C | double |
| Byte 60 | Average intensity for G over clusters with base call G | double |
| Byte 68 | Average intensity for T over clusters with base call T | double |
| Byte 76 | Number of clusters with base call A | integer |
| Byte 80 | Number of clusters with base call C | integer |
| Byte 84 | Number of clusters with base call G | integer |
| Byte 88 | Number of clusters with base call T | integer |
| Byte 92 | Number of clusters with base call X | integer |
| Byte 96 | Number of clusters with intensity for A | integer |
| Byte 100 | Number of clusters with intensity for C | integer |
| Byte 104 | Number of clusters with intensity for G | integer |
| Byte 108 | Number of clusters with intensity for T | integer |
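The byte offsets in the table map onto a single packed record of 112 bytes. The sketch below unpacks it with Python's `struct` module under several assumptions that the table does not state: values are little-endian with no padding, "integer" means a signed 32-bit integer, "double" means a 64-bit IEEE float, and the double at byte 36 is the average intensity for T over all clusters with intensity for T. The field names and `parse_stats` are hypothetical.

```python
import struct

# 1 integer, 9 doubles, 9 integers: 4 + 72 + 36 = 112 bytes, no padding ("<").
STATS_STRUCT = struct.Struct("<i9d9i")

# Hypothetical field names, in the order given by the byte offsets.
STATS_FIELDS = (
    "cycle",
    "avg_cycle_intensity",
    "avg_intensity_a", "avg_intensity_c", "avg_intensity_g", "avg_intensity_t",
    "avg_intensity_called_a", "avg_intensity_called_c",
    "avg_intensity_called_g", "avg_intensity_called_t",
    "num_called_a", "num_called_c", "num_called_g", "num_called_t", "num_called_x",
    "num_with_intensity_a", "num_with_intensity_c",
    "num_with_intensity_g", "num_with_intensity_t",
)


def parse_stats(data):
    """Return a dict of the statistics stored in raw stats file content."""
    if len(data) < STATS_STRUCT.size:
        raise ValueError(f"expected at least {STATS_STRUCT.size} bytes, got {len(data)}")
    return dict(zip(STATS_FIELDS, STATS_STRUCT.unpack_from(data, 0)))


# Synthetic record, not taken from a real run.
synthetic = STATS_STRUCT.pack(7, *[float(i) for i in range(9)], *range(100, 109))
stats = parse_stats(synthetic)
print(STATS_STRUCT.size, stats["cycle"], stats["num_called_x"])
```

Using the `<` prefix matters here: with native alignment, the first double would be padded to an 8-byte boundary and the offsets would no longer match the table.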
References
bcl2fastq source code from illumina downloads https://support.illumina.com/sequencing/sequencing_software/bcl2fastq-conversion-software/downloads.html
Spec file from illumina support https://support.illumina.com/content/dam/illumina-support/documents/documentation/software_documentation/bcl2fastq/bcl2fastq_letterbooklet_15038058brpmi.pdf
https://help.basespace.illumina.com/files-used-by-basespace/fastq-files
https://docs.python.org/3/library/struct.html#format-characters
See also the mkdata.sh file in the bcl2fastq source code for insights on the bcl format.