MAKER Test
Latest revision as of 21:45, 17 March 2015
MAKER is a genome annotation pipeline[1]. It lets a researcher or group of researchers take a genome and some amount of evidence (for example, an EST file and a protein file, both in FASTA format, plus a repeat file and potentially more) and create structural annotations for that genome. It can also train HMM files to provide better annotations for a genome with little evidence, although this takes many runs. This page documents the work being done to add MAKER to CoGe.
How to Download and Install MAKER
MAKER may be downloaded from the Yandell lab, here: http://www.yandell-lab.org/software/maker.html. The full installation instructions may be found here: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial. The instructions here will just serve as a brief overview for getting MAKER running on the command line in UNIX.
1. Register and download MAKER from the Yandell lab MAKER software page.
2. Unpack MAKER in whichever folder it will be run from.
3. Download and install prerequisites if they are not installed. The minimum prerequisites are:
a. BioPerl and various other Perl modules, listed below (see the MAKER documentation for a complete list[2]).
i. BioPerl
ii. DBI
iii. Error
iv. Error::Simple
v. File::NFSLock
vi. File::Which
vii. Inline
viii. Perl::Unsafe::Signals
ix. Proc::Signal
x. URI::Escape
xi. Bit::Vector
xii. Inline::C
xiii. PerlIO::gzip
xiv. forks
xv. forks::shared
b. SNAP
c. Exonerate
d. RepeatMasker
e. NCBI BLAST
4. Add MAKER and its prerequisites to $PATH. For example, the paths might look something like:
a. MAKER: export PATH="/home/user/maker/bin:$PATH"
b. RepeatMasker: export PATH="/home/user/RepeatMasker:$PATH"
c. Exonerate: export PATH="/home/user/exonerate-2.2.0-x86_64/bin:$PATH"
d. SNAP: export PATH="/home/user/snap:$PATH"
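Once the paths are exported, it can help to confirm that the shell actually finds each prerequisite. The following is a minimal sketch, not part of MAKER itself. The helper names check_tools and check_perl_modules are hypothetical, and the executable names (maker, RepeatMasker, exonerate, snap, blastn) and the use of Bio::SeqIO as a stand-in for BioPerl are assumptions that may need adjusting for a particular installation.

```shell
# check_tools NAME...: report which commands can be found on $PATH.
# Returns 0 if all are present, 1 if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "MISSING: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# check_perl_modules MODULE...: report which Perl modules can be loaded.
# A module also counts as missing if perl itself is not installed.
check_perl_modules() {
    missing=0
    for module in "$@"; do
        if perl -M"$module" -e 1 >/dev/null 2>&1; then
            echo "found:   $module"
        else
            echo "MISSING: $module"
            missing=1
        fi
    done
    return "$missing"
}

# Executable and module names below are assumptions; edit them to match
# what was actually installed in step 3.
check_tools maker RepeatMasker exonerate snap blastn \
    || echo "Some tools are not on PATH yet."
check_perl_modules Bio::SeqIO DBI File::Which URI::Escape \
    || echo "Some Perl modules are not installed yet."
```

Anything reported as MISSING either still needs to be installed or needs its folder added to $PATH as in step 4.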
Running MAKER from the UNIX command line
1. Set up the MAKER control files by typing "maker -CTL".
a. To set the genome sequence, open "maker_opts.ctl" and enter the path to the genome FASTA file after "genome=". So, this might look like "genome=dpp_contig.fasta". Leave a space between this and the commented description (which starts with a "#" symbol).
b. Set the EST or mRNA data by typing the path to the desired EST or mRNA FASTA file after "est=".
c. To have MAKER generate structural annotations directly from EST or mRNA data, change "est2genome=0" to "est2genome=1".
d. Change any other desired settings. For more details on these settings and how to set them, see the full MAKER tutorial[3].
2. Open the MAKER boot options file ("maker_bopts.ctl") and ensure that the correct BLAST search type is selected.
a. For example, to use NCBI-BLAST, set "blast_type=ncbi+".
3. Edit the MAKER options file ("maker_opts.ctl") to the desired settings.
4. Run MAKER by typing "maker" on the command line.
a. To prevent MAKER from eating up too many system resources, use the "nice" command.
b. To avoid being spammed with constant messages, redirect both stdout and stderr to a file with "&> somefile.txt". To also run MAKER in the background, append "&" to the command.
c. Putting both "a" and "b" together, the command might look something like "nice maker &> log.txt &".
5. MAKER may take a day or more to run, depending on the size of the genome it is attempting to annotate and the evidence it is given. Once finished, it should place all files into a "genomename.maker.output" folder, with the data split into separate folders by contig. Each of these folders has several sub-folders, one of which should contain a "chromosomename.gff" file holding the structural annotations and repeats; the file always ends with the contig sequence.
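The control-file edits in the steps above can also be scripted rather than done by hand. The following is a minimal sketch under stated assumptions: the helper names set_ctl_option and list_gff are hypothetical (they are not MAKER commands), each option is assumed to sit on its own line of the form "key=value #description", and the file name dpp_contig.fasta is just the example used above.

```shell
# set_ctl_option FILE KEY VALUE
# Rewrite the "KEY=" line of a control file, keeping the trailing "#"
# description and leaving a space before it.
set_ctl_option() {
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp) || return 1
    awk -v key="$key" -v value="$value" '
        index($0, key "=") == 1 {
            pos = index($0, "#")
            if (pos > 0) print key "=" value " " substr($0, pos)
            else         print key "=" value
            next
        }
        { print }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# list_gff OUTPUT_DIR: list every .gff file under a MAKER output folder.
list_gff() {
    find "$1" -type f -name '*.gff'
}

# Only touch the control files if "maker -CTL" has already created them.
if [ -f maker_opts.ctl ] && [ -f maker_bopts.ctl ]; then
    set_ctl_option maker_opts.ctl genome dpp_contig.fasta
    set_ctl_option maker_opts.ctl est2genome 1
    set_ctl_option maker_bopts.ctl blast_type ncbi+
fi
```

After the control files are set, a run in the background with all messages sent to a log could be started with "nice maker > log.txt 2>&1 &" (the portable spelling of "&>"), and "list_gff genomename.maker.output" would then list the per-contig GFF files once MAKER finishes.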
Testing MAKER annotation accuracy with minimal data
The point of this exercise is to see how little data can be used with MAKER and still get reasonably accurate annotations.
The Arabidopsis thaliana genome was annotated using MAKER with the Arabidopsis lyrata mRNA data as evidence. Both the genome and the mRNA annotations were downloaded from GenBank. The A. thaliana genome with the annotations from MAKER using the A. lyrata mRNA evidence was uploaded to CoGe (genome id 25440) and compared to the existing A. thaliana annotations (genome id 16911). The analysis can be recreated here: https://genomevolution.org/r/f6kd.
A fair amount of synteny exists between the two genomes, although noise is present. The actual gene regions are currently being compared to assess accuracy directly. The SynMap listed above is used, and then BLAST is set to "BlastN: Small Regions" in GEvo. Also, under "Sequence Options," "Mask Sequence" is set to "Non-CDS" for both sequences.
Table explanation:
Sample Number: Numerically tracks each trial.
Coords id16911: The coordinates in the original genome.
Coords id39871: The coordinates in the MAKER-annotated genome.
Total Genes id16911: The total number of genes annotated in the original genome. Does not include genes only partially visible in the genome-viewer.
Total Genes id39871: The total number of genes annotated in the MAKER-annotated genome. Does not include genes only partially visible in the genome-viewer.
Close Match: The CDS regions of the gene look the same by crude visual analysis. Numbers in parentheses represent the number of matches where a small piece of the gene matched to a different gene.
Similar Match: A few small differences in the CDS regions can be seen. Numbers in parentheses represent the number of matches where a small piece of the gene matched to a different gene.
Partial Match: Only a relatively small piece of the gene matches. Numbers in parentheses represent the number of matches where a small piece of the gene matched to a different gene.
No Match: The original gene is not represented in the new annotations.
Pseudogene Misannotated: A pseudogene has been annotated as a functional gene in the MAKER annotations.
New: A gene in the MAKER annotations is not found in the original genome's annotations.
Multiple Match (Original:MAKER): These are shown as matching to multiple regions. Each entry gives the number of such matches, followed by the ratio of the number of places that match on the original genome annotations to the number of places that match on the MAKER-annotated genome. For example, "1 2:1" means one such multiple match was found, with two matching places in the original annotations and one in the MAKER annotations.
Split (One gene to two+): Matches where only one gene in the original is present and multiple genes in the MAKER-annotated genome are found. This is different from a multiple match because each region is only matching one place.
Joined (Two+ genes to one): Matches where multiple genes are present in the original and only one gene is present in the MAKER-annotated genome. This is different from a multiple match because each region is only matching one place.
Sample Number | Coords id16911 | Coords id39871 | Total Genes id16911 | Total Genes id39871 | Close Match | Similar Match | Partial Match | No Match | Pseudogene Misannotated | New | Multiple Match (Original:MAKER) | Split (One gene to two+) | Joined (Two+ genes to one) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 544697-648473 | 1 544697-648473 | 30 | 16 | 10 | 4 | 1 | 12 | 0 | 0 | 0 | 0 | 0 |
2 | 1 71582-180099 | 1 71582-180099 | 36 | 21 | 15(5) | 2 | 1 | 11 | 0 | 0 | 1 2:1 | 0 | 1 |
3 | 1 1202212-1307510 | 1 1202212-1307510 | 26 | 17 | 12 | 3 | 2 | 7 | 0 | 0 | 1 2:1 | 0 | 0 |
4 | 1 1455642-1559002 | 1 1455642-1559002 | 29 | 16 | 16(9) | 0 | 5 | 8 | 0 | 0 | 1 2:1 | 0 | 1 |
5 | 1 1928762-2039295 | 1 1928762-2039046 | 33 | 21 | 19(2) | 1 | 0 | 13 | 1 | 0 | 3 2:1 | 0 | 0 |
6 | 1 2540944-2656892 | 1 2542248-2656892 | 31 | 19 | 15(6) | 1 | 6 | 9 | 0 | 0 | 4 3:1, 2 2:1 | 0 | 1 |
7 | 1 2879268-2981804 | 1 2879268-2981656 | 30 | 18 | 14(2) | 3 | 8 | 9 | 0 | 0 | 0 | 0 | 1 |
8 | 1 3382313-3487979 | 1 3382313-3487979 | 27 | 18 | 14(2) | 1 | 2 | 9 | 0 | 0 | 2 2:1 | 0 | 0 |
9 | 1 3901597-4006840 | 1 3903894-4006840 | 32 | 15 | 10(1) | 3 | 2 | 14 | 0 | 0 | 0 | 0 | 1 |
10 | 1 4197703-4300444 | 1 4197703-4300444 | 30 | 15 | 15(3) | 0 | 2 | 13 | 0 | 0 | 1 1:2 | 0 | 0 |
11 | 1 4958502-5064486 | 1 4958502-5064486 | 27 | 17 | 15(3) | 2 | 2 | 7 | 0 | 0 | 1 2:1 | 0 | 0 |
12 | 1 5380446-5485921 | 1 5380446-5485921 | 35 | 17 | 13(1) | 2(1) | 4 | 15 | 0 | 0 | 2 1:2 | 0 | 0 |
13 | 1 7384303-7486702 | 1 7379980-7486702 | 28 | 13 | 6 | 0 | 2 | 8 | 0 | 0 | 4 1:3, 5 1:2, 1 2:1 | 0 | 1 |
14 | 1 8513858-8619927 | 1 8513858-8618644 | 30 | 15 | 9(1) | 5(2) | 2 | 12 | 0 | 0 | 0 | 0 | 1 |
15 | 1 9389859-9495818 | 1 9384718-9495818 | 31 | 14 | 10(2) | 1(1) | 5 | 11 | 0 | 0 | 4 1:2 | 0 | 2 |
16 | 1 9389859-9495818 | 1 9384718-9495818 | 26 | 15 | 8 | 5 | 2 | 8 | 0 | 0 | 3 1:2 | 0 | 0 |
17 | 1 12158853-12263571 | 1 12158853-12263571 | 22 | 5 | 4 | 0 | 1 | 17 | 0 | 0 | 0 | 0 | 0 |
18 | 1 12859293-12963916 | 1 12859293-12963916 | 18 | 7 | 3(1) | 3(3) | 0 | 10 | 0 | 0 | 1 2:1, 2 1:2 | 0 | 0 |
19 | 1 14108617-14211652 | 1 14108617-14211652 | 5 | 2 | 1 | 1 | 0 | 3 | 0 | 0 | 0 | 0 | 0 |
20 | 1 14108617-14211652 | 1 14108617-14211652 | 5 | 2 | 1 | 1 | 0 | 3 | 0 | 0 | 0 | 0 | 0 |
21 | 1 15553892-15657802 | 1 15553892-15657802 | 4 | 2 | 1 | 1 | 0 | 2 | 0 | 0 | 0 | 0 | 0 |
22 | 1 17682582-17808194 | 1 17682582-17805715 | 29 | 19 | 11(1) | 6(1) | 2 | 8 | 0 | 0 | 2 1:2 | 0 | 1 |
23 | 1 20236917-20340245 | 1 20236944-20340245 | 28 | 20 | 12(1) | 5(2) | 4 | 7 | 0 | 0 | 0 | 0 | 0 |
24 | 1 21984661-22089844 | 1 21984661-22089844 | 24 | 13 | 10(1) | 1(1) | 1 | 9 | 0 | 0 | 1 2:1, 2 1:2 | 0 | 1 |
25 | 1 26732839-26838712 | 1 26732839-26838712 | 33 | 16 | 15(2) | 1 | 0 | 17 | 0 | 0 | 0 | 0 | 0 |
26 | 5 185117-290911 | 5 185117-290911 | 30 | 19 | 12(2) | 5(1) | 4(1) | 8 | 0 | 0 | 0 | 0 | 0 |
27 | 5 2484720-2590086 | 5 2484720-2590086 | 29 | 18 | 14(6) | 2 | 2(2) | 6 | 0 | 0 | 2 3:1 | 0 | 0 |
28 | 5 5240999-5347779 | 5 5240999-5347779 | 30 | 20 | 18(4) | 2 | 1 | 8 | 0 | 0 | 1 2:1 | 0 | 0 |
29 | 5 7948309-8052594 | 5 7948309-8052594 | 29 | 16 | 12(4) | 2(1) | 2 | 6 | 0 | 0 | 0 | 0 | 2 |
30 | 5 10746648-10848147 | 5 10746648-10848147 | 12 | 1 | 1 | 0 | 0 | 11 | 0 | 0 | 0 | 0 | 0 |
31 | 5 13424499-13532238 | 5 13434606-13537546 | 10 | 4 | 4(1) | 0 | 0 | 5 | 0 | 0 | 1 2:1 | 0 | 0 |
32 | 5 16020402-16123101 | 5 16020402-16123101 | 29 | 9 | 6(3) | 1(1) | 4 | 17 | 0 | 1 | 1 2:1, 2 1:2 | 0 | 0 |
33 | 5 18741802-18845407 | 5 18741802-18845407 | 25 | 17 | 12(2) | 6(3) | 1 | 3 | 0 | 0 | 1 2:1, 2 1:2 | 0 | 0 |
34 | 5 21699561-21801099 | 5 21384155-21488362 | 26 | 15 | 0 | 0 | 3(1) | 23 | 0 | 0 | 1 1:2 | 0 | 0 |
35 | 5 24375824-24480269 | 5 24375824-24480269 | 19 | 13 | 8(5) | 5(1) | 1 | 5 | 0 | 0 | 1 1:2 | 0 | 0 |
36 | 5 26726994-26835104 | 5 26726994-26835104 | 34 | 17 | 14(5) | 3 | 2 | 12 | 0 | 0 | 1 2:1 | 0 | 0 |
37 | 5 1395509-1499568 | 5 1395509-1499568 | 28 | 19 | 17(2) | 2 | 1(1) | 8 | 0 | 0 | 0 | 0 | 0 |
38 | 5 3963813-4071018 | 5 3963813-4070909 | 29 | 21 | 20(2) | 1 | 1 | 7 | 0 | 0 | 0 | 0 | 0 |
39 | 5 6645731-6751247 | 5 6645731-6751247 | 35 | 21 | 19(4) | 2 | 2 | 12 | 0 | 0 | 0 | 0 | 0 |
40 | 5 9395950-9500584 | 5 9395950-9500584 | 28 | 16 | 12 | 3 | 1 | 12 | 0 | 0 | 0 | 0 | 0 |
41 | 5 12035328-12138569 | 5 12035328-12138569 | 4 | 4 | 3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
42 | 5 14198202-14303272 | 5 14198202-14303110 | 17 | 8 | 6 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
43 | 5 17439327-17544830 | 5 17439468-17544830 | 28 | 14 | 8(1) | 5(1) | 3(1) | 11 | 0 | 0 | 1 2:1 | 0 | 0 |
44 | 5 20126385-20238307 | 5 20126385-20238307 | 21 | 14 | 11(7) | 2(1) | 1 | 5 | 0 | 0 | 1 1:2, 4 1:4 | 0 | 1 |
45 | 5 22960801-23065559 | 5 22960764-23065559 | 26 | 14 | 10(5) | 4(2) | 2 | 8 | 0 | 0 | 2 2:1 | 0 | 0 |
46 | 5 25587492-25693902 | 5 25587492-25693902 | 30 | 23 | 19(7) | 3 | 3(1) | 5 | 0 | 0 | 7 3:1 | 0 | 0 |
47 | 3 1096943-1203341 | 3 1096943-1203341 | 32 | 19 | 17(5) | 0 | 2 | 8 | 0 | 0 | 3 1:2, 1 2:1 | 0 | 0 |
48 | 3 2215142-2329383 | 3 2215142-2329383 | 29 | 21 | 15(4) | 3(2) | 3(1) | 7 | 0 | 0 | 3 1:2 | 0 | 0 |
49 | 3 3432575-3541667 | 3 3432575-3541908 | 36 | 21 | 15(4) | 6(2) | 4 | 11 | 0 | 0 | 0 | 0 | 0 |
50 | 3 4704624-4811185 | 3 4704624-4811185 | 25 | 15 | 15(5) | 0 | 3 | 7 | 0 | 0 | 0 | 0 | 0 |
51 | 3 5886108-5996205 | 3 5886108-5996205 | 30 | 15 | 11 | 4 | 0 | 15 | 0 | 0 | 0 | 0 | 0 |
52 | 3 7009098-7112660 | 3 7009100-7112660 | 31 | 16 | 12(1) | 3(1) | 1 | 8 | 0 | 0 | 1 6:1, 1 2:1 | 0 | 0 |
53 | 3 8153548-8257243 | 3 8153548-8257243 | 18 | 6 | 4(3) | 2 | 3 | 9 | 0 | 0 | 0 | 0 | 0 |
54 | 3 9258942-9363353 | 3 9258942-9363353 | 25 | 12 | 9(2) | 3 | 2 | 10 | 0 | 0 | 0 | 0 | 0 |
55 | 3 10561071-10666301 | 3 10543921-10648775 | 16 | 5 | 1 | 4 | 0 | 9 | 0 | 0 | 5 1:3 | 0 | 0 |
56 | 3 11760867-11863509 | 3 11760867-11863509 | 3 | 1 | 1 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 |
57 | 3 12982458-13085348 | 3 12983120-13085348 | 7 | 1 | 0 | 1 | 0 | 6 | 0 | 0 | 0 | 0 | 0 |
58 | 3 14035354-14143449 | 3 14035354-14143449 | 5 | 3 | 2 | 1 | 0 | 2 | 0 | 0 | 0 | 0 | 0 |
59 | 3 15184235-15295034 | 3 15184235-15290697 | 12 | 6 | 5(2) | 1 | 0 | 6 | 0 | 0 | 0 | 0 | 0 |
60 | 3 16347038-16449725 | 3 16347038-16449725 | 20 | 13 | 9(6) | 2(1) | 0 | 8 | 0 | 0 | 2 1:2, 3 1:3 | 1 | 0 |
61 | 3 17415629-17517888 | 3 17415629-17517888 | 23 | 13 | 11(2) | 2(1) | 0 | 10 | 0 | 0 | 2 1:2 | 0 | 0 |
62 | 3 18636527-18750533 | 3 18636527-18750533 | 26 | 12 | 8(3) | 3(1) | 1 | 11 | 0 | 0 | 1 4:1, 1 1:2, 3 1:3 | 0 | 0 |
63 | 3 19855826-19960027 | 3 19855826-19960027 | 36 | 25 | 22(2) | 3(1) | 1 | 10 | 0 | 0 | 0 | 0 | 0 |
64 | 3 21065078-21171051 | 3 21065078-21171051 | 32 | 19 | 15(3) | 4(2) | 6 | 6 | 0 | 0 | 2 2:1, 1 3:1 | 0 | 0 |
65 | 3 22173829-22279195 | 3 22173829-22279195 | 23 | 14 | 12(4) | 1 | 0 | 10 | 1 | 0 | 1 3:1 | 0 | 0 |
66 | 3 23140428-23245727 | 3 23140428-23245727 | 31 | 15 | 11(2) | 3 | 1 | 13 | 0 | 0 | 2 1:2 | 0 | 1 |
67 | 2 902313-1009004 | 2 902313-1007874 | 23 | 9 | 7 | 1 | 2 | 12 | 0 | 0 | 1 2:1 | 0 | 0 |
68 | 2 1933901-2035341 | 2 1925574-2026339 | 17 | 7 | 5(4) | 1 | 0 | 6 | 0 | 0 | 1 2:1, 5 1:3 | 0 | 0 |
69 | 2 2535215-2637601 | 2 2535215-2637601 | 8 | 2 | 2 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 |
70 | 2 2997623-3107099 | 2 2997623-3107099 | 9 | 1 | 1 | 0 | 0 | 7 | 0 | 0 | 0 | 0 | 0 |
71 | 2 4569145-4671448 | 2 4569145-4671448 | 7 | 1 | 1 | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 0 |
72 | 2 5064881-5168486 | 2 5064881-5168486 | 4 | 3 | 3 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
73 | 2 5645124-5756134 | 2 5645124-5756134 | 18 | 6 | 6(1) | 0 | 0 | 12 | 0 | 0 | 2 1:2 | 0 | 0 |
74 | 2 6724150-6827899 | 2 6724150-6827068 | 18 | 11 | 6(3) | 2 | 0 | 5 | 0 | 0 | 2 1:2, 3 1:3 | 1 | 1 |
75 | 2 7734455-7852230 | 2 7734455-7852230 | 28 | 18 | 14(1) | 3 | 2 | 9 | 0 | 0 | 0 | 0 | 0 |
76 | 2 8809211-8914699 | 2 8809211-8914699 | 37 | 18 | 16(2) | 2(1) | 2 | 16 | 0 | 0 | 1 2:1 | 0 | 0 |
77 | 2 9716127-9819766 | 2 9716127-9819766 | 28 | 13 | 10(1) | 1(1) | 1 | 10 | 0 | 0 | 7 1:7 | 0 | 2 |
78 | 2 10817070-10922077 | 2 10817070-10921395 | 26 | 13 | 8(2) | 2 | 1 | 12 | 0 | 0 | 0 | 0 | 0 |
79 | 2 11690670-11794867 | 2 11690670-11794867 | 33 | 17 | 15(3) | 1 | 1 | 15 | 0 | 0 | 1 2:1, 2 1:2 | 0 | 0 |
80 | 2 12802632-12907369 | 2 12802632-12907608 | 25 | 16 | 14(1) | 1 | 1 | 8 | 0 | 0 | 1 2:1 | 0 | 0 |
81 | 2 13702665-13806233 | 2 13702665-13806233 | 26 | 18 | 14(5) | 2 | 1 | 9 | 0 | 0 | 2 1:2 | 0 | 0 |
82 | 2 14604611-14707443 | 2 14604644-14707443 | 19 | 13 | 9(4) | 2 | 3 | 4 | 0 | 0 | 0 | 0 | 0 |
83 | 2 15650550-15755165 | 2 15650550-15755165 | 25 | 12 | 10(2) | 2 | 1 | 11 | 0 | 0 | 1 2:1 | 0 | 0 |
84 | 2 16665089-16773406 | 2 16665089-16773406 | 26 | 15 | 13(3) | 0 | 3 | 9 | 0 | 0 | 0 | 0 | 0 |
85 | 2 17684541-17788679 | 2 17684541-17788679 | 24 | 15 | 14(3) | 1 | 3 | 5 | 0 | 0 | 1 2:1 | 0 | 0 |
86 | 2 18707881-18822229 | 2 18707881-18822229 | 32 | 18 | 15(6) | 3(1) | 5 | 8 | 0 | 0 | 4 1:3 | 0 | 0 |
87 | 4 751713-858018 | 4 751713-858018 | 31 | 16 | 10(1) | 3 | 0 | 12 | 0 | 0 | 1 2:1 | 0 | 2 |
88 | 4 1722114-1824380 | 4 1722169-1824380 | 3 | 2 | 0 | 1 | 0 | 2 | 1 | 0 | 0 | 0 | 0 |
89 | 4 17593684-17695729 | 4 17593684-17695729 | 23 | 13 | 10 | 3(1) | 0 | 9 | 0 | 0 | 1 2:1 | 0 | 0 |
90 | 4 16618075-16721974 | 4 16618075-16721974 | 25 | 15 | 11(3) | 4(2) | 2 | 8 | 0 | 0 | 1 2:1 | 0 | 0 |
91 | 4 2696288-2802663 | 4 2696288-2802663 | 21 | 11 | 9(3) | 2 | 0 | 6 | 0 | 0 | 1 2:1, 5 1:2 | 0 | 0 |
92 | 4 3714499-3816439 | 4 3714499-3816439 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
93 | 4 4837112-4938939 | 4 4837112-4938939 | 4 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
94 | 4 6473657-6578058 | 4 6473696-6578058 | 22 | 11 | 8(5) | 2 | 1 | 10 | 0 | 0 | 9 1:2 | 0 | 1 |
95 | 4 8312586-8416525 | 4 8312586-8416525 | 23 | 12 | 7(3) | 3 | 2(1) | 11 | 0 | 0 | 0 | 0 | 0 |
96 | 4 10189947-10297372 | 4 10189947-10297372 | 30 | 15 | 13(3) | 1 | 4 | 11 | 0 | 0 | 1 3:1 | 0 | 0 |
97 | 4 11983703-12093572 | 4 11983703-12093572 | 26 | 16 | 13(1) | 3 | 2 | 8 | 0 | 0 | 1 2:1 | 0 | 0 |
98 | 4 13898993-14007840 | 4 13898993-14007840 | 38 | 23 | 22(2) | 0 | 2 | 14 | 0 | 0 | 0 | 0 | 0 |
99 | 4 14876974-14979681 | 4 14876974-14979681 | 32 | 22 | 21(1) | 1(1) | 2 | 8 | 0 | 0 | 2 1:2, 3 1:3 | 0 | 0 |
100 | 4 15741039-15845643 | 4 15741039-15845643 | 22 | 19 | 16(5) | 3(2) | 0 | 3 | 0 | 0 | 2 1:2 | 0 | 0 |
Column Totals | N/A | N/A | 2362 | 1312 | 1006(219) | 198(42) | 156(9) | 852 | 3 | 1 | N/A | 2 | 22 |
Summary Percentages
1312/2362 = 56% of genes annotated
1006/1312 = 77% had same CDS
1204/1312 = 92% had similar or better CDS (1006 close matches + 198 similar matches)
852/2362 = 36% of genes were not annotated
Note: 56% + 36% = 92%, so where is the missing 8%? Many genes had multiple syntenic matches, and these are accounted for in the "Partial Match," "Joined," and "Split" columns. If these were separated out and counted as missed, the total number of missed genes would be (2362 * 0.08) + 852 ≈ 1041.
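The percentages above can be re-derived from the Column Totals row of the table. The following is a minimal sketch, assuming awk is available; the helper name pct and the variable names are hypothetical, and the numbers are copied from the totals row.

```shell
# pct NUMERATOR DENOMINATOR: print the ratio as a whole-number percentage.
pct() {
    awk -v n="$1" -v d="$2" 'BEGIN { printf "%.0f\n", 100 * n / d }'
}

total_original=2362   # Total Genes id16911
total_maker=1312      # Total Genes id39871
close=1006            # Close Match
similar=198           # Similar Match
no_match=852          # No Match

echo "annotated:         $(pct "$total_maker" "$total_original")%"
echo "same CDS:          $(pct "$close" "$total_maker")%"
echo "similar or better: $(pct "$((close + similar))" "$total_maker")%"
echo "not annotated:     $(pct "$no_match" "$total_original")%"

# Fold the unaccounted-for 8% into the missed genes, as in the note above.
awk -v t="$total_original" -v m="$no_match" \
    'BEGIN { printf "missed if separated out: %.0f\n", t * 0.08 + m }'
```

Each figure is rounded to the nearest whole number, which is why the summary values are approximate rather than exact.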