GEvo Blastn Bug

From CoGepedia

Bug Description

When using blastn in GEvo, a large HSP in the middle of the region appears or disappears when the query sequence length is changed by 1 nucleotide.

This problem was identified by Mike Freeling.

Visualization

The only difference between the two analyses is the addition of 1 nucleotide to the top panel of the top analysis.

Analysis may be regenerated at: http://genomevolution.org/r/4i6h
Analysis may be regenerated at: http://genomevolution.org/r/4i6i

Report of "disappearing" HSP in blast file

Top portion of blast report which contains the large HSP (problem HSP starts at "Query 129"):

BLASTN 2.2.24+

Query=  AT1G75520 
Length=478

Subject=  Bra008203 
Length=17361


 Score = 68.0 bits (35),  Expect = 3e-14
 Identities = 76/94 (80%), Gaps = 4/94 (4%)
 Strand=Plus/Plus

Query  1      CTAGGTTTCGTGTTCCACTGATCAAAGATTTGAAAAAAAACATATACTTAGTAAACTTCA  60
              |||||||| ||||||||||||||||||| ||    || |||||||    |  |||||| |
Sbjct  9943   CTAGGTTTTGTGTTCCACTGATCAAAGAGTT----AAGAACATATTTCAATAAAACTTTA  9998

Query  61     AGCAATTTTTATATTACCCAATTGAATTTCTCCA  94
              |  |||||||||| |||||| |||||||||||||
Sbjct  9999   AATAATTTTTATACTACCCAGTTGAATTTCTCCA  10032


 Score = 54.5 bits (28),  Expect = 3e-10
 Identities = 32/34 (94%), Gaps = 0/34 (0%)
 Strand=Plus/Plus

Query  442    TTGGCGTAGATGAATGTAAACGGATGGTAATATA  475
              |||||||| |||||||||||||||| ||||||||
Sbjct  10342  TTGGCGTATATGAATGTAAACGGATAGTAATATA  10375


 Score = 50.7 bits (26),  Expect = 4e-09
 Identities = 77/95 (81%), Gaps = 5/95 (5%)
 Strand=Plus/Plus

Query  129    ATATATATATACCCAACAACTGAGAAAAGATGGAAAAAGTTTAGTTAAAAACTGGTCCTG  188
              ||||||||||| |||||| ||  |||||||||||| || || |||||||||     | | 
Sbjct  10072  ATATATATATAACCAACATCTCGGAAAAGATGGAACAAATT-AGTTAAAAAAA---CATT  10127

Query  189    GGCGGCTTTAAATTATATTTATGCACTTAAATTTA  223
              ||||||||||||| ||||| ||| | |||||||||
Sbjct  10128  GGCGGCTTTAAATCATATTCATGTA-TTAAATTTA  10161


 Score = 48.8 bits (25),  Expect = 2e-08
 Identities = 47/58 (81%), Gaps = 0/58 (0%)
 Strand=Plus/Plus

Query  316    GGTTTTTACTTAGATAATATCGTGTCATTCCATCTAGATTCAACCCCTGTCTACAATA  373
              |||||| ||| |||| |||| ||  || | |  ||||||||||||||| |||||||||
Sbjct  10207  GGTTTTCACTGAGATGATATTGTTCCAGTTCCACTAGATTCAACCCCTCTCTACAATA  10264


 Score = 25.7 bits (13),  Expect = 0.15
 Identities = 13/13 (100%), Gaps = 0/13 (0%)
 Strand=Plus/Plus

Query  127   TAATATATATATA  139
             |||||||||||||
Sbjct  7327  TAATATATATATA  7339

Top portion of blast report which DOES NOT contain the large HSP:

BLASTN 2.2.24+


Query=  AT1G75520 
Length=479

Subject=  Bra008203 
Length=17361


 Score = 69.9 bits (36),  Expect = 7e-15
 Identities = 77/95 (81%), Gaps = 4/95 (4%)
 Strand=Plus/Plus

Query  1      ACTAGGTTTCGTGTTCCACTGATCAAAGATTTGAAAAAAAACATATACTTAGTAAACTTC  60
              ||||||||| ||||||||||||||||||| ||    || |||||||    |  |||||| 
Sbjct  9942   ACTAGGTTTTGTGTTCCACTGATCAAAGAGTT----AAGAACATATTTCAATAAAACTTT  9997

Query  61     AAGCAATTTTTATATTACCCAATTGAATTTCTCCA  95
              ||  |||||||||| |||||| |||||||||||||
Sbjct  9998   AAATAATTTTTATACTACCCAGTTGAATTTCTCCA  10032


 Score = 54.5 bits (28),  Expect = 3e-10
 Identities = 32/34 (94%), Gaps = 0/34 (0%)
 Strand=Plus/Plus

Query  443    TTGGCGTAGATGAATGTAAACGGATGGTAATATA  476
              |||||||| |||||||||||||||| ||||||||
Sbjct  10342  TTGGCGTATATGAATGTAAACGGATAGTAATATA  10375


 Score = 48.8 bits (25),  Expect = 2e-08
 Identities = 47/58 (81%), Gaps = 0/58 (0%)
 Strand=Plus/Plus

Query  317    GGTTTTTACTTAGATAATATCGTGTCATTCCATCTAGATTCAACCCCTGTCTACAATA  374
              |||||| ||| |||| |||| ||  || | |  ||||||||||||||| |||||||||
Sbjct  10207  GGTTTTCACTGAGATGATATTGTTCCAGTTCCACTAGATTCAACCCCTCTCTACAATA  10264


 Score = 25.7 bits (13),  Expect = 0.15
 Identities = 13/13 (100%), Gaps = 0/13 (0%)
 Strand=Plus/Plus

Query  128   TAATATATATATA  140
             |||||||||||||
Sbjct  7327  TAATATATATATA  7339
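
To pinpoint which HSP is absent from the second report, the alignments can be compared by their subject coordinates, since the query coordinates shift by 1 between the two runs. The following is a minimal sketch: the coordinates are copied from the two reports above, while the tuple layout and the names `subject_overlap` and `missing_hsps` are illustrative assumptions and not part of GEvo or BLAST.

```python
# HSP coordinates copied from the two reports above, as
# (query_start, query_end, subject_start, subject_end).
HSPS_LEN_478 = [
    (1, 94, 9943, 10032),
    (442, 475, 10342, 10375),
    (129, 223, 10072, 10161),
    (316, 373, 10207, 10264),
    (127, 139, 7327, 7339),
]
HSPS_LEN_479 = [
    (1, 95, 9942, 10032),
    (443, 476, 10342, 10375),
    (317, 374, 10207, 10264),
    (128, 140, 7327, 7339),
]


def subject_overlap(a, b):
    """Return True if two HSPs overlap on the subject sequence."""
    return a[2] <= b[3] and b[2] <= a[3]


def missing_hsps(reference, other):
    """Return HSPs in `reference` with no subject-overlapping HSP in `other`."""
    return [h for h in reference
            if not any(subject_overlap(h, o) for o in other)]


if __name__ == "__main__":
    for hsp in missing_hsps(HSPS_LEN_478, HSPS_LEN_479):
        print("only in the length-478 report:", hsp)
```

Matching on subject coordinates rather than query coordinates avoids having to correct for the 1-nucleotide offset in the query.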

Bottom portion of blast report which contains the large HSP:

Lambda     K      H
    1.33    0.621     1.12 

Gapped
Lambda     K      H
    1.33    0.621     1.12 

Effective search space used: 8066820




Matrix: blastn matrix 1 -2
Gap Penalties: Existence: 5, Extension: 2

Bottom portion of blast report which DOES NOT contain the large HSP:

Lambda     K      H
    1.33    0.621     1.12 

Gapped
Lambda     K      H
    1.33    0.621     1.12 

Effective search space used: 8084168




Matrix: blastn matrix 1 -2
Gap Penalties: Existence: 5, Extension: 2

Blast Commands

Extra HSP, Legacy Blast:

/usr/local/bin/legacy_blast.pl bl2seq -p blastn -o /opt/apache/CoGe/tmp/GEvo/52376346_1-2.bl2seq -i /opt/apache/CoGe/tmp/GEvo/f34524790ds39598r1c1u-2233d777g2dsg3.faa -j /opt/apache/CoGe/tmp/GEvo/f103007203ds48732r1cA02u8000d8000g1dsg12468.faa  -W 7 -G 5 -E 2 -q -2 -r 1 -e 30 -F F

Extra HSP, Blast+:

blastn -query /opt/apache/CoGe/tmp/GEvo/f34524790ds39598r1c1u-2233d777g2dsg3.faa -subject /opt/apache/CoGe/tmp/GEvo/f103007203ds48732r1cA02u8000d8000g1dsg12468.faa -evalue 30 -gapopen 5 -gapextend 2 -word_size 7 -penalty -2 -reward 1 -dust no

Missing HSP, Legacy Blast:

/usr/local/bin/legacy_blast.pl bl2seq -p blastn -o /opt/apache/CoGe/tmp/GEvo/22224470_1-2.bl2seq -i /opt/apache/CoGe/tmp/GEvo/f34524790ds39598r1c1u-2232d777g2dsg3.faa -j /opt/apache/CoGe/tmp/GEvo/f103007203ds48732r1cA02u8000d8000g1dsg12468.faa  -W 7 -G 5 -E 2 -q -2 -r 1 -e 30 -F F

Missing HSP, Blast+:

blastn -query /opt/apache/CoGe/tmp/GEvo/f34524790ds39598r1c1u-2232d777g2dsg3.faa -subject /opt/apache/CoGe/tmp/GEvo/f103007203ds48732r1cA02u8000d8000g1dsg12468.faa -evalue 30 -gapopen 5 -gapextend 2 -word_size 7 -penalty -2 -reward 1 -dust no
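
The legacy and Blast+ commands above use the same parameters under different option names. The sketch below records that correspondence as read off the two command pairs and rebuilds a Blast+ argument list from the legacy-style options. The function name `blastn_plus_args` and the file names `query.fa` and `subject.fa` are hypothetical placeholders. Only the `-F F` case is handled, because it is the only filtering setting shown above.

```python
# Option correspondence read off the command pairs above
# (legacy bl2seq flag -> BLAST+ blastn flag).
LEGACY_TO_PLUS = {
    "-W": "-word_size",
    "-G": "-gapopen",
    "-E": "-gapextend",
    "-q": "-penalty",
    "-r": "-reward",
    "-e": "-evalue",
}


def blastn_plus_args(query, subject, legacy_opts):
    """Build a BLAST+ blastn argument list from legacy-style options."""
    args = ["blastn", "-query", query, "-subject", subject]
    for flag, value in legacy_opts.items():
        if flag == "-F":
            # "-F F" in the legacy command corresponds to "-dust no".
            if value != "F":
                raise ValueError("only '-F F' is covered by this sketch")
            args += ["-dust", "no"]
        else:
            args += [LEGACY_TO_PLUS[flag], str(value)]
    return args


# Options used in the legacy commands above.
GEVO_LEGACY_OPTS = {"-W": 7, "-G": 5, "-E": 2, "-q": -2, "-r": 1,
                    "-e": 30, "-F": "F"}

if __name__ == "__main__":
    # Hypothetical file names; substitute the two query files to compare runs.
    print(" ".join(blastn_plus_args("query.fa", "subject.fa", GEVO_LEGACY_OPTS)))
```

Building both runs from one option set makes it explicit that the query file is the only thing that differs between the "extra HSP" and "missing HSP" commands.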

Conclusions

  • The bug is in blastn. Current version is 2.2.24
  • Possible causes:
    • Change in sequence search space (sequence length) causes a change in e-value
    • Change in sequence causes an "edge-effect" where the exact sequence pattern at the end of the character sequence causes a change in HSP identification
    • Repeat sequences are causing an internal problem in how blast identifies and categorizes HSPs
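
The first possible cause can be checked numerically. BLAST statistics relate the e-value to the bit score and the effective search space by E = (effective search space) × 2^(−bit score). The sketch below applies that relation to the bit score of the problem HSP (50.7) and the two effective search spaces printed in the reports. The function and constant names are illustrative.

```python
def expect_value(bit_score, effective_search_space):
    """E-value from a bit score: E = search_space * 2 ** (-bit_score)."""
    return effective_search_space * 2.0 ** (-bit_score)


PROBLEM_HSP_BITS = 50.7             # bit score of the HSP starting at "Query 129"
SEARCH_SPACE_WITH_HSP = 8066820     # report for query length 478
SEARCH_SPACE_WITHOUT_HSP = 8084168  # report for query length 479
EVALUE_CUTOFF = 30                  # -evalue 30 / -e 30 in the commands

if __name__ == "__main__":
    for space in (SEARCH_SPACE_WITH_HSP, SEARCH_SPACE_WITHOUT_HSP):
        e = expect_value(PROBLEM_HSP_BITS, space)
        print(f"search space {space}: E = {e:.1e}, "
              f"below cutoff: {e < EVALUE_CUTOFF}")
```

Under this relation the two search spaces give nearly identical e-values for a 50.7-bit alignment, both on the order of the reported 4e-09 and far below the cutoff of 30. This suggests that the change in search space alone would not be expected to remove the HSP, although it does not rule out the other possible causes.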

Download

The sequences, blast reports, log files, and GEvo Images can be obtained at: http://genomevolution.org/CoGe/data/distrib/gevo_blast_bug.tar.gz

Comparison to Blast 2.2.25+

The same problem exists in Blast 2.2.25+; see: GEvo bug Blast 2.2.25+