2014-04-09

BLAST+ 2.2.29 upset by [key=value] entries in queries

I recently got a weird error/warning message, repeated many times, in my BLAST+ stderr output:

Error: (1431.1) FASTA-Reader: Warning: FASTA-Reader: Ignoring FASTA modifier(s) found because the input was not expected to have any.


This turns out to be due to having [key=value] tags in my query FASTA file, and appears to be a new bug introduced in BLAST+ 2.2.29 (as BLAST+ 2.2.26 through 2.2.28 inclusive are not affected).

Update (31 October 2014): This was fixed in BLAST+ 2.2.30 (released yesterday).


Here's a snippet from the source code giving the cryptic message (file ncbi-blast-2.2.29+-src/c++/src/objtools/readers/fasta.cpp from ncbi-blast-2.2.29+-src.tar.gz on the BLAST+ FTP site):
        ...
        // user did not request fAddMods, so we warn that we found
        // mods anyway
        smp.ParseTitle(
            title, 
            CConstRef<CSeq_id>(bioseq.GetFirstId()),
            1 // "1" since we only care whether or not there are mods, not how many
            );
        CSourceModParser::TMods unused_mods = smp.GetMods(CSourceModParser::fUnusedMods);
        if( ! unused_mods.empty() ) {
            FASTA_WARNING(iLineNum,
                "FASTA-Reader: Ignoring FASTA modifier(s) found because "
                "the input was not expected to have any.",
                ILineError::eProblem_ModifierFoundButNoneExpected,
                "defline");
        }
        ...
This is apparently an error about some unexpected modifiers (whatever the NCBI meant by that) in a FASTA record's title. Confusingly, the number 1431.1 is not a line number but a unique constant, eProblem_ModifierFoundButNoneExpected.

Since BLAST+ is open source, I could recompile it locally with a small change to this error message to include the title of the problem FASTA entry:
        ...
        CSourceModParser::TMods unused_mods = smp.GetMods(CSourceModParser::fUnusedMods);
        if( ! unused_mods.empty() ) {
            FASTA_WARNING(iLineNum,
                "FASTA-Reader: Ignoring FASTA modifier(s) found because "
                "the input was not expected to have any: " << title,
                ILineError::eProblem_ModifierFoundButNoneExpected,
                "defline");
        }
        ...

Compiling BLAST+ takes a while, but this confirmed the error/warning message was triggered by every record in my FASTA file, which came from Prokka run with default settings and has title lines like this:

$ grep "^>" prokka.fsa
>husec41_c1 [gcode=11] [organism=Genus species] [strain=strain]
>husec41_c10 [gcode=11] [organism=Genus species] [strain=strain]
>husec41_c100 [gcode=11] [organism=Genus species] [strain=strain]
...
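
A rough way to see which records carry these tags, without recompiling BLAST+, is to scan the title lines directly. The sketch below is only an approximation: the regular expression and the find_modifiers helper are illustrative names invented here, and they do not reproduce the rules of the NCBI CSourceModParser.

```python
import re

# Approximation of a "[key=value]" FASTA modifier. This is an assumption made
# for illustration; the real NCBI parser has its own rules.
MODIFIER = re.compile(r"\[\s*([^\[\]=\s]+)\s*=\s*([^\[\]]*?)\s*\]")


def find_modifiers(lines):
    """Yield (line_number, record_id, [(key, value), ...]) for tagged title lines."""
    for number, line in enumerate(lines, start=1):
        if not line.startswith(">"):
            continue  # sequence line, not a title line
        mods = MODIFIER.findall(line)
        if mods:
            fields = line[1:].split(None, 1)
            record_id = fields[0] if fields else ""
            yield number, record_id, mods


# Small in-memory example; for a real file, pass an open file handle instead.
example = [
    ">husec41_c1 [gcode=11] [organism=Genus species] [strain=strain]",
    "ACGTACGT",
    ">plain_record no tags here",
    "ACGT",
]
hits = list(find_modifiers(example))
for number, record_id, mods in hits:
    print(number, record_id, mods)
```

Only the first record in the example is reported, since the second title line has no [key=value] tags.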

This raises a number of issues:
  • Can BLAST+ give a more helpful message (like mine)? Or only show the message once?
  • Could the "Error:" prefix be removed from this warning message? Are other warning messages being upgraded to errors in the same way?
  • Why is BLAST checking these [key=value] tags anyway? Was this a deliberate change in BLAST+ 2.2.29?
Unfortunately, while this seems to be a warning, because it starts with the scary word "Error" my Galaxy BLAST+ wrappers cautiously treat it as a real error and declare such jobs a failure... perhaps I can tweak my regex to ignore this false positive?
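
As a minimal sketch of that kind of tweak: skip lines matching this one known message before looking for the word "error" in stderr. This is not the actual Galaxy wrapper code; the real_error_lines helper and the simple "contains error" check are assumptions made for illustration, and the pattern is built only from the message quoted above.

```python
import re

# The one message discussed in this post. BLAST+ 2.2.29 prefixes it with
# "Error:" even though the text itself says "Warning:".
KNOWN_FALSE_POSITIVE = re.compile(
    r"^Error: \(\d+\.\d+\) FASTA-Reader: Warning: FASTA-Reader: "
    r"Ignoring FASTA modifier\(s\) found because "
    r"the input was not expected to have any\.\s*$"
)


def real_error_lines(stderr_text):
    """Return stderr lines that mention an error and are not the known warning."""
    problems = []
    for line in stderr_text.splitlines():
        if KNOWN_FALSE_POSITIVE.match(line):
            continue  # mislabelled warning, ignore it
        if "error" in line.lower():
            problems.append(line)
    return problems


warning = (
    "Error: (1431.1) FASTA-Reader: Warning: FASTA-Reader: Ignoring FASTA "
    "modifier(s) found because the input was not expected to have any."
)
# The second message is a made-up placeholder, not real BLAST+ output.
stderr_text = warning + "\n" + warning + "\nError: something else went wrong\n"
print(real_error_lines(stderr_text))
```

Matching the whole message, rather than just the "FASTA-Reader" prefix, keeps the filter narrow so that other errors from the FASTA reader are still reported.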

Update (31 October 2014): This was fixed in BLAST+ 2.2.30 (released yesterday). In reply to my email reporting this back in April, I was told some changes to FASTA in the C++ Toolkit had accidentally snuck into the BLAST code.

2 comments:

  1. To be clear, I can safely ignore these "errors"? I'm seeing them using BLAST+ 2.2.29 with MAKER.

    1. Yes, from my reading of the code you can safely ignore them - they are (irrelevant) warnings mislabelled as errors.