Comet release 2021.01

Documentation for parameters for release 2021.01 can be found here.

release 2021.01 rev. 0 (2021.01.0), release date 2021/06/23

Update the expectation value (E-value) calculation by improving the determination of the tail region of the xcorr cumulative distribution for the linear regression fit.
New ThreadPool code by D. Shteynberg. The previous code apparently had intermittent issues when using many (70+) threads.
Make flanking (previous and next) residue reporting consistent when a peptide is present in a protein multiple times and thus could have different flanking residues within the same protein. Previous versions did not consistently report the same set of flanking residues in repeated/replicate searches. The flanking residues for the first occurrence of the peptide in a protein will now always be reported.
Added a parameter "old_mods_encoding" to enable using the old character-based modification encodings (e.g. DLYM*NCK) instead of mass-based encodings (e.g. DLYM[15.9949]NCK) in the SQT output files. Add "old_mods_encoding = 1" to the comet.params file to use the old modification character encodings. This functionality was added to support post-processing tools that have not been updated to handle the numeric modification encodings. This is a "hidden" parameter in that it is not present in the example params file generated by "comet.exe -p", nor is it in the sample parameters files available for download. It must be manually added to your comet.params.
The "print_expect_score" parameter is now deprecated; it will be treated as a hidden parameter. Anyone using SQT output who would rather have the Sp score instead of the E-value reported will now have to manually add "print_expect_score = 0" to their params file.
Added a no digestion (aka "no_cut", aka don't cleave anywhere) entry to the comet.params file.
The Windows Visual Studio solution is updated to compile with v142 build tools using Visual Studio 2019.
This version of Comet will also run using comet.params files from the 2020.01 releases as there have been no significant changes to the parameter entries.
Known bug: in the mzIdentML output, the attribute "dBSequence_ref" for element "PeptideEvidence" is incorrectly written as "DBSeqence_ref". On 2021/08/04, the release files were updated to correct this. Thanks to A. Collins for reporting the error.
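As an illustration of the two hidden parameters described above, the lines below show what one might append by hand to an existing comet.params. This is only a sketch: the parameter names and values are taken from the notes above, while the comment wording and placement within the file are assumptions.

```text
# Hidden parameters: not written by "comet.exe -p", so they must be added manually.
old_mods_encoding = 1     # SQT output uses character-based encodings, e.g. DLYM*NCK
print_expect_score = 0    # SQT output reports the Sp score instead of the E-value
```

Leave both lines out to keep the default behavior (mass-based encodings such as DLYM[15.9949]NCK, and the E-value reported in SQT output).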
Here's a list of some known bugs that weren't addressed for this release. Hopefully I can address some of these in a follow-up maintenance release.
- Reported PEFF modification and substitution positions are off by 1 when the start methionine residue is cleaved (using "clip_nterm_methionine = 1").
- The program will access restricted memory (negative array position) when "precursor_NL_ions = 1" is set. Presumably this can happen with other specified neutral loss masses besides "1" although that hasn't been tested yet. Until the underlying bug can be identified, Comet will now report a warning ("Error3") and skip the analysis of those neutral loss peaks and allow the search to complete.
- The reported calculated peptide masses can vary by one digit in the sixth decimal place between replicate searches. This occurs very infrequently, if at all.
- The preliminary score rank (Sp rank) can vary between replicate searches. I've only observed the ranks differ by 1, e.g. 12 in one run and 13 in the other. This is sadly an issue associated with threading that cannot be addressed without a huge performance hit so this behavior will continue to exist going forward. Fortunately I don't believe this occurs frequently, especially for the "good" IDs. Plus the preliminary score rank and score are old retrofit values that are added solely for backwards compatibility; they were made unnecessary with the fast xcorr calculation from 2008.
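In contrast to the replicate-search variability noted above, flanking residue reporting is now deterministic: the first occurrence of a peptide within a protein decides the previous and next residues. The Python sketch below illustrates that rule only and is not Comet's actual code. The function name `first_occurrence_flanks`, the example protein sequence, and the use of "-" to mark a protein terminus are all assumptions made for illustration.

```python
from typing import Optional, Tuple

# Assumption: "-" stands in for "no residue" at a protein terminus.
TERMINUS = "-"


def first_occurrence_flanks(protein: str, peptide: str) -> Optional[Tuple[str, str]]:
    """Return (previous, next) residues for the first occurrence of peptide.

    Returns None when the peptide does not occur in the protein.
    """
    # str.find returns the lowest matching index, so repeated searches
    # always select the same occurrence regardless of processing order.
    start = protein.find(peptide)
    if start < 0:
        return None
    end = start + len(peptide)
    prev_res = protein[start - 1] if start > 0 else TERMINUS
    next_res = protein[end] if end < len(protein) else TERMINUS
    return prev_res, next_res


if __name__ == "__main__":
    # Hypothetical protein in which the peptide DLYMNCK occurs twice,
    # flanked by K/A the first time and R/T the second time.
    protein = "MKDLYMNCKAGRDLYMNCKT"
    print(first_occurrence_flanks(protein, "DLYMNCK"))
```

Under this rule the second occurrence (flanked by R and T in the hypothetical sequence) never influences the reported residues, which is what makes the output reproducible across replicate searches.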

Go download from the download page.
