MAKER Test: Difference between revisions

From CoGepedia
Lkosin (talk | contribs)
 
(21 intermediate revisions by the same user not shown)
Line 11: Line 11:


3. Download and install prerequisites if they are not installed. The minimum prerequisites are:
     a. BioPerl and various other Perl modules, listed below (see the MAKER documentation for a complete list[http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial]).
          i. BioPerl
          ii. DBI
          iii. Error
          iv. Error::Simple
          v. File::NFSLock
          vi. File::Which
          vii. Inline
          viii. Perl::Unsafe::Signals
          ix. Proc::Signal
          x. URI::Escape
          xi. Bit::Vector
          xii. Inline::C
          xiii. PerlIO::gzip
          xiv. forks
          xv. forks::shared
     b. SNAP
     c. Exonerate
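
The Perl modules listed above can be checked before MAKER is installed. The following is a minimal sketch, not part of MAKER itself: it assumes that perl is on the PATH and that a module counts as installed when `perl -M<Module> -e 1` exits with status 0. The function names are hypothetical, and only a few module names from the list are used as examples.

```python
import shutil
import subprocess


def perl_check_command(module):
    """Build the command that asks perl to load one module and then exit."""
    return ["perl", f"-M{module}", "-e", "1"]


def missing_perl_modules(modules):
    """Return the modules perl fails to load (all of them if perl is absent)."""
    if shutil.which("perl") is None:
        return list(modules)
    missing = []
    for module in modules:
        # A non-zero exit status is taken to mean the module could not be loaded.
        result = subprocess.run(perl_check_command(module), capture_output=True)
        if result.returncode != 0:
            missing.append(module)
    return missing


if __name__ == "__main__":
    # Example subset of the prerequisite list; extend as needed.
    wanted = ["DBI", "File::Which", "Bit::Vector", "URI::Escape"]
    for name in missing_perl_modules(wanted):
        print("missing:", name)
```

Any module reported as missing would still need to be installed by whatever method the system normally uses for Perl modules.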
Line 21: Line 36:
     c. Exonerate: export PATH="/home/user/exonerate-2.2.0-x86_64/bin:$PATH"
     d. SNAP: export PATH="/home/user/snap:$PATH"


== Running MAKER from the UNIX command line ==
Line 52: Line 66:
A fair amount of synteny exists between the two genomes, although noise is present. The actual gene regions are currently being compared to assess accuracy directly. The SynMap listed above is used, and then BLAST is set to "BlastN: Small Regions" in GEvo. Also, under "Sequence Options," "Mask Sequence" is set to "Non-CDS" for both sequences.


Table explanation:


For the region: Arabidopsis thaliana Col-0 (thale cress) (TAIR v10.02, unmasked) AT1G26850.1 (chr: 1 9251146-9353432) and Arabidopsis thaliana (genbank vLJK-Rotation, unmasked) maker-gi|1-exonerate_est2genome-gene-93.12-mRNA-1 (chr: 1 9251146-9353432):
    Sample Number: Numerically tracks each trial.
 
    Coords id16911: The coordinates in the original genome.
Out of 29 genes in the original, 10 appear to match closely in the MAKER-annotated sequence, 1 matches with some noticeable changes, and 2 are only partial matches, leaving 16 that appear to be missing. One pseudogene in the original is annotated as an actual gene in the MAKER-annotated sequence.
    Coords id39871: The coordinates in the MAKER-annotated genome.
 
    Total Genes id16911: The total number of genes annotated in the original genome. Does not include genes only partially visible in the genome-viewer.
For the region: Arabidopsis thaliana Col-0 (thale cress) (TAIR v10.02, unmasked) AT2G27460.1 (chr: 2 11690670-11794867) and Arabidopsis thaliana (genbank vLJK-Rotation, unmasked) maker-gi|2-exonerate_est2genome-gene-117.3-mRNA-1 (chr: 2 11690670-11794867):
    Total Genes id39871: The total number of genes annotated in the MAKER-annotated genome. Does not include genes only partially visible in the genome-viewer.
 
    Close Match: The CDS regions of the gene look the same by crude visual analysis. Numbers in parentheses represent the number of matches where a small piece of the gene matched to a different gene.
Out of 32 genes in the original, 13 appear to match relatively closely, 1 appears to be a very close match, 13 appear to be missing, and 5 appear to be partial matches.  
    Similar Match: A few small differences in the CDS regions can be seen. Numbers in parentheses represent the number of matches where a small piece of the gene matched to a different gene.
 
    Partial Match: Only a relatively small piece of the gene matches. Numbers in parentheses represent the number of matches where a small piece of the gene matched to a different gene.
For the region: Arabidopsis thaliana Col-0 (thale cress) (TAIR v10.02, unmasked) AT1G17540.1 (chr: 1 5979551-6082641) and Arabidopsis thaliana (genbank vLJK-Rotation, unmasked) maker-gi|1-exonerate_est2genome-gene-57.6-mRNA-1 (chr: 1 5684234-5787298):
    No Match: The original gene is not represented in the new annotations.
 
    Pseudogene Misannotated: A pseudogene has been annotated as a functional gene in the MAKER annotations.
Out of 27 genes total, none appeared to match closely, and only 3 had partial matches. 8 genes appear to be missing entirely, one pseudogene has been annotated as a functional gene, and another gene has been added.
    New: A gene in the MAKER annotations is not found in the original genome's annotations.
 
    Multiple Match (Original:MAKER): Genes that match multiple regions. Each entry gives the number of such matches, followed by a ratio: the number of places that match in the original genome annotations, then the number of places that match in the MAKER-annotated genome. For example, "1 2:1" means that one such multiple match was found, in which two places in the original annotations match one place in the MAKER annotations.
For the regions: Arabidopsis thaliana Col-0 (thale cress) (TAIR v10.02, unmasked) AT1G41830.1 (chr: 1 15553892-15657802) and Arabidopsis thaliana (genbank vLJK-Rotation, unmasked) maker-gi|1-exonerate_est2genome-gene-156.0-mRNA-1 (chr: 1 15553892-15657802):
    Split (One gene to two+): Matches where only one gene in the original is present and multiple genes in the MAKER-annotated genome are found. This is different from a multiple match because each region is only matching one place.
 
    Joined (Two+ genes to one): Matches where multiple genes are present in the original and only one gene is present in the MAKER-annotated genome. This is different from a multiple match because each region is only matching one place.
Out of 4 genes in the original, 2 appear to be close matches and 2 appear to be missing. None of the 13 pseudogenes in the original were annotated in the MAKER-annotated genome.
<table border="1" cellpadding="5" cellspacing="5">
 
<tr>
For the regions: Arabidopsis thaliana Col-0 (thale cress) (TAIR v10.02, unmasked) AT5G37060.1 (chr: 5 14592741-14695414) and Arabidopsis thaliana (genbank vLJK-Rotation, unmasked) maker-gi|5-exonerate_est2genome-gene-146.9-mRNA-1 (chr: 5 14592741-14695414):
<th>Sample Number</th>
 
<th>Coords id16911</th>
Out of 15 genes in the original, 9 appear to be close matches, 2 appear to be partial matches, and 4 appear to be missing.
<th>Coords id39871</th>
 
<th>Total Genes id16911</th>
For the regions: Arabidopsis thaliana Col-0 (thale cress) (TAIR v10.02, unmasked) AT2G25050.1 (chr: 2 10604108-10709383) and Arabidopsis thaliana (genbank vLJK-Rotation, unmasked) maker-gi|2-exonerate_est2genome-gene-106.9-mRNA-1 (chr: 2 10604108-10709383):
<th>Total Genes id39871</th>
 
<th>Close Match</th>
Out of 19 genes in the original, 2 appear to be close or nearly exact matches, 9 appear to be relatively close matches, 3 appear to be partial matches, and 5 appear to be missing. 1 pseudogene has been annotated as a gene and 1 gene has been split into two genes.
<th>Similar Match</th>
 
<th>Partial Match</th>
For the regions: Arabidopsis thaliana Col-0 (thale cress) (TAIR v10.02, unmasked) AT3G56140.1 (chr: 3 20779407-20882669) and Arabidopsis thaliana (genbank vLJK-Rotation, unmasked) maker-gi|2-exonerate_est2genome-gene-168.5-mRNA-1 (chr: 2 16819379-16922569):
<th>No Match</th>
 
<th>Pseudogene Misannotated</th>
Out of 31 genes in the original, 4 appear to be close or relatively close matches, 6 appear to have partial matches, and the rest (21) do not really match up at all. That said, there are 17 genes total in the MAKER-annotated genome, 7 of which do not appear to have any matching annotations.
<th>New</th>
 
<th>Multiple Match (Original:MAKER)</th>
For the regions: Arabidopsis thaliana Col-0 (thale cress) (TAIR v10.02, unmasked) AT1G51820.1 (chr: 1 19187407-19291883) and Arabidopsis thaliana (genbank vLJK-Rotation, unmasked) maker-gi|2-exonerate_est2genome-gene-124.1-mRNA-1 (chr: 2 12416941-12518612):
<th>Split (One gene to two+)</th>
 
<th>Joined (Two+ genes to one)</th>
Out of 20 genes in the original, 0 have close matches, 14 have partial matches, and 6 have no matches. Oddly, almost all the partial matches in the original are matching up to one gene in the MAKER-annotations. Only 10 genes are annotated in the MAKER-annotations.
</tr>
 
<tr>
For the regions: Arabidopsis thaliana Col-0 (thale cress) (TAIR v10.02, unmasked) AT4G36180.1 (chr: 4 17070209-17173698) and Arabidopsis thaliana (genbank vLJK-Rotation, unmasked) maker-gi|4-exonerate_est2genome-gene-171.7-mRNA-1 (chr: 4 17070209-17173812):
<td>1</td>
 
<td>1 544697-648473</td>
Out of 23 genes in the original, 8 appear to have close matches, 1 is mostly close but is missing a few small parts of the gene, 5 look close but are matching with multiple genes, 2 appear to have partial matches, and 7 appear to have no matches. Only 14 genes are annotated in the MAKER-annotated genome.
<td>1 544697-648473</td>
 
<td>30</td>
For the regions: Arabidopsis thaliana Col-0 (thale cress) (TAIR v10.02, unmasked) AT4G06676.1 (chr: 4 3848998-3951589) and Arabidopsis thaliana (genbank vLJK-Rotation, unmasked) maker-gi|4-exonerate_est2genome-gene-39.0-mRNA-1 (chr: 4 3848998-3951589):
<td>16</td>
 
<td>10</td>
Out of 2 genes in the original, 1 appears to be relatively close, and 1 has no corresponding annotations. Only 1 gene is present in the MAKER-annotations, and the original has 20 pseudogenes that are not annotated in the MAKER-annotated genome.
<td>4</td>
 
<td>1</td>
For the regions: Arabidopsis thaliana Col-0 (thale cress) (TAIR v10.02, unmasked) AT4G10730.1 (chr: 4 6559793-6664786) and Arabidopsis thaliana (genbank vLJK-Rotation, unmasked) maker-gi|4-exonerate_est2genome-gene-66.2-mRNA-1 (chr: 4 6559793-6664786):
<td>12</td>
 
<td>0</td>
Out of 25 genes in the original, 6 appear to be close matches, 1 has a partial match, and 18 have no syntenic matches. Only 6 genes are annotated in the MAKER-annotated genome, but all 6 have close matches.
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>2</td>
<td>1 71582-180099</td>
<td>1 71582-180099</td>
<td>36</td>
<td>21</td>
<td>15(5)</td>
<td>2</td>
<td>1</td>
<td>11</td>
<td>0</td>
<td>0</td>
<td>1 2:1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>3</td>
<td>1 1202212-1307510</td>
<td>1 1202212-1307510</td>
<td>26</td>
<td>17</td>
<td>12</td>
<td>3</td>
<td>2</td>
<td>7</td>
<td>0</td>
<td>0</td>
<td>1 2:1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>1 1455642-1559002</td>
<td>1 1455642-1559002</td>
<td>29</td>
<td>16</td>
<td>16(9)</td>
<td>0</td>
<td>5</td>
<td>8</td>
<td>0</td>
<td>0</td>
<td>1 2:1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>5</td>
<td>1 1928762-2039295</td>
<td>1 1928762-2039046</td>
<td>33</td>
<td>21</td>
<td>19(2)</td>
<td>1</td>
<td>0</td>
<td>13</td>
<td>1</td>
<td>0</td>
<td>3 2:1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>6</td>
<td>1 2540944-2656892</td>
<td>1 2542248-2656892</td>
<td>31</td>
<td>19</td>
<td>15(6)</td>
<td>1</td>
<td>6</td>
<td>9</td>
<td>0</td>
<td>0</td>
<td>4 3:1, 2 2:1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>7</td>
<td>1 2879268-2981804</td>
<td>1 2879268-2981656</td>
<td>30</td>
<td>18</td>
<td>14(2)</td>
<td>3</td>
<td>8</td>
<td>9</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>8</td>
<td>1 3382313-3487979</td>
<td>1 3382313-3487979</td>
<td>27</td>
<td>18</td>
<td>14(2)</td>
<td>1</td>
<td>2</td>
<td>9</td>
<td>0</td>
<td>0</td>
<td>2 2:1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>9</td>
<td>1 3901597-4006840</td>
<td>1 3903894-4006840</td>
<td>32</td>
<td>15</td>
<td>10(1)</td>
<td>3</td>
<td>2</td>
<td>14</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>10</td>
<td>1 4197703-4300444</td>
<td>1 4197703-4300444</td>
<td>30</td>
<td>15</td>
<td>15(3)</td>
<td>0</td>
<td>2</td>
<td>13</td>
<td>0</td>
<td>0</td>
<td>1 1:2</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>11</td>
<td>1 4958502-5064486</td>
<td>1 4958502-5064486</td>
<td>27</td>
<td>17</td>
<td>15(3)</td>
<td>2</td>
<td>2</td>
<td>7</td>
<td>0</td>
<td>0</td>
<td>1 2:1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>12</td>
<td>1 5380446-5485921</td>
<td>1 5380446-5485921</td>
<td>35</td>
<td>17</td>
<td>13(1)</td>
<td>2(1)</td>
<td>4</td>
<td>15</td>
<td>0</td>
<td>0</td>
<td>2 1:2</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>13</td>
<td>1 7384303-7486702</td>
<td>1 7379980-7486702</td>
<td>28</td>
<td>13</td>
<td>6</td>
<td>0</td>
<td>2</td>
<td>8</td>
<td>0</td>
<td>0</td>
<td>4 1:3, 5 1:2, 1 2:1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>14</td>
<td>1 8513858-8619927</td>
<td>1 8513858-8618644</td>
<td>30</td>
<td>15</td>
<td>9(1)</td>
<td>5(2)</td>
<td>2</td>
<td>12</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>15</td>
<td>1 9389859-9495818</td>
<td>1 9384718-9495818</td>
<td>31</td>
<td>14</td>
<td>10(2)</td>
<td>1(1)</td>
<td>5</td>
<td>11</td>
<td>0</td>
<td>0</td>
<td>4 1:2</td>
<td>0</td>
<td>2</td>
</tr>
<tr>
<td>16</td>
<td>1 9389859-9495818</td>
<td>1 9384718-9495818</td>
<td>26</td>
<td>15</td>
<td>8</td>
<td>5</td>
<td>2</td>
<td>8</td>
<td>0</td>
<td>0</td>
<td>3 1:2</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>17</td>
<td>1 12158853-12263571</td>
<td>1 12158853-12263571</td>
<td>22</td>
<td>5</td>
<td>4</td>
<td>0</td>
<td>1</td>
<td>17</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>18</td>
<td>1 12859293-12963916</td>
<td>1 12859293-12963916</td>
<td>18</td>
<td>7</td>
<td>3(1)</td>
<td>3(3)</td>
<td>0</td>
<td>10</td>
<td>0</td>
<td>0</td>
<td>1 2:1, 2 1:2</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>19</td>
<td>1 14108617-14211652</td>
<td>1 14108617-14211652</td>
<td>5</td>
<td>2</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>3</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>20</td>
<td>1 14108617-14211652</td>
<td>1 14108617-14211652</td>
<td>5</td>
<td>2</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>3</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>21</td>
<td>1 15553892-15657802</td>
<td>1 15553892-15657802</td>
<td>4</td>
<td>2</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>2</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>22</td>
<td>1 17682582-17808194</td>
<td>1 17682582-17805715</td>
<td>29</td>
<td>19</td>
<td>11(1)</td>
<td>6(1)</td>
<td>2</td>
<td>8</td>
<td>0</td>
<td>0</td>
<td>2 1:2</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>23</td>
<td>1 20236917-20340245</td>
<td>1 20236944-20340245</td>
<td>28</td>
<td>20</td>
<td>12(1)</td>
<td>5(2)</td>
<td>4</td>
<td>7</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>24</td>
<td>1 21984661-22089844</td>
<td>1 21984661-22089844</td>
<td>24</td>
<td>13</td>
<td>10(1)</td>
<td>1(1)</td>
<td>1</td>
<td>9</td>
<td>0</td>
<td>0</td>
<td>1 2:1, 2 1:2</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>25</td>
<td>1 26732839-26838712</td>
<td>1 26732839-26838712</td>
<td>33</td>
<td>16</td>
<td>15(2)</td>
<td>1</td>
<td>0</td>
<td>17</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>26</td>
<td>5 185117-290911</td>
<td>5 185117-290911</td>
<td>30</td>
<td>19</td>
<td>12(2)</td>
<td>5(1)</td>
<td>4(1)</td>
<td>8</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>27</td>
<td>5 2484720-2590086</td>
<td>5 2484720-2590086</td>
<td>29</td>
<td>18</td>
<td>14(6)</td>
<td>2</td>
<td>2(2)</td>
<td>6</td>
<td>0</td>
<td>0</td>
<td>2 3:1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>28</td>
<td>5 5240999-5347779</td>
<td>5 5240999-5347779</td>
<td>30</td>
<td>20</td>
<td>18(4)</td>
<td>2</td>
<td>1</td>
<td>8</td>
<td>0</td>
<td>0</td>
<td>1 2:1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>29</td>
<td>5 7948309-8052594</td>
<td>5 7948309-8052594</td>
<td>29</td>
<td>16</td>
<td>12(4)</td>
<td>2(1)</td>
<td>2</td>
<td>6</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>2</td>
</tr>
<tr>
<td>30</td>
<td>5 10746648-10848147</td>
<td>5 10746648-10848147</td>
<td>12</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>11</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>31</td>
<td>5 13424499-13532238</td>
<td>5 13434606-13537546</td>
<td>10</td>
<td>4</td>
<td>4(1)</td>
<td>0</td>
<td>0</td>
<td>5</td>
<td>0</td>
<td>0</td>
<td>1 2:1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>32</td>
<td>5 16020402-16123101</td>
<td>5 16020402-16123101</td>
<td>29</td>
<td>9</td>
<td>6(3)</td>
<td>1(1)</td>
<td>4</td>
<td>17</td>
<td>0</td>
<td>1</td>
<td>1 2:1, 2 1:2</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>33</td>
<td>5 18741802-18845407</td>
<td>5 18741802-18845407</td>
<td>25</td>
<td>17</td>
<td>12(2)</td>
<td>6(3)</td>
<td>1</td>
<td>3</td>
<td>0</td>
<td>0</td>
<td>1 2:1, 2 1:2</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>34</td>
<td>5 21699561-21801099</td>
<td>5 21384155-21488362</td>
<td>26</td>
<td>15</td>
<td>0</td>
<td>0</td>
<td>3(1)</td>
<td>23</td>
<td>0</td>
<td>0</td>
<td>1 1:2</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>35</td>
<td>5 24375824-24480269</td>
<td>5 24375824-24480269</td>
<td>19</td>
<td>13</td>
<td>8(5)</td>
<td>5(1)</td>
<td>1</td>
<td>5</td>
<td>0</td>
<td>0</td>
<td>1 1:2</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>36</td>
<td>5 26726994-26835104</td>
<td>5 26726994-26835104</td>
<td>34</td>
<td>17</td>
<td>14(5)</td>
<td>3</td>
<td>2</td>
<td>12</td>
<td>0</td>
<td>0</td>
<td>1 2:1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>37</td>
<td>5 1395509-1499568</td>
<td>5 1395509-1499568</td>
<td>28</td>
<td>19</td>
<td>17(2)</td>
<td>2</td>
<td>1(1)</td>
<td>8</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>38</td>
<td>5 3963813-4071018</td>
<td>5 3963813-4070909</td>
<td>29</td>
<td>21</td>
<td>20(2)</td>
<td>1</td>
<td>1</td>
<td>7</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>39</td>
<td>5 6645731-6751247</td>
<td>5 6645731-6751247</td>
<td>35</td>
<td>21</td>
<td>19(4)</td>
<td>2</td>
<td>2</td>
<td>12</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>40</td>
<td>5 9395950-9500584</td>
<td>5 9395950-9500584</td>
<td>28</td>
<td>16</td>
<td>12</td>
<td>3</td>
<td>1</td>
<td>12</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>41</td>
<td>5 12035328-12138569</td>
<td>5 12035328-12138569</td>
<td>4</td>
<td>4</td>
<td>3</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>42</td>
<td>5 14198202-14303272</td>
<td>5 14198202-14303110</td>
<td>17</td>
<td>8</td>
<td>6</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>43</td>
<td>5 17439327-17544830</td>
<td>5 17439468-17544830</td>
<td>28</td>
<td>14</td>
<td>8(1)</td>
<td>5(1)</td>
<td>3(1)</td>
<td>11</td>
<td>0</td>
<td>0</td>
<td>1 2:1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>44</td>
<td>5 20126385-20238307</td>
<td>5 20126385-20238307</td>
<td>21</td>
<td>14</td>
<td>11(7)</td>
<td>2(1)</td>
<td>1</td>
<td>5</td>
<td>0</td>
<td>0</td>
<td>1 1:2, 4 1:4</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>45</td>
<td>5 22960801-23065559</td>
<td>5 22960764-23065559</td>
<td>26</td>
<td>14</td>
<td>10(5)</td>
<td>4(2)</td>
<td>2</td>
<td>8</td>
<td>0</td>
<td>0</td>
<td>2 2:1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>46</td>
<td>5 25587492-25693902</td>
<td>5 25587492-25693902</td>
<td>30</td>
<td>23</td>
<td>19(7)</td>
<td>3</td>
<td>3(1)</td>
<td>5</td>
<td>0</td>
<td>0</td>
<td>7 3:1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>47</td>
<td>3 1096943-1203341</td>
<td>3 1096943-1203341</td>
<td>32</td>
<td>19</td>
<td>17(5)</td>
<td>0</td>
<td>2</td>
<td>8</td>
<td>0</td>
<td>0</td>
<td>3 1:2, 1 2:1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>48</td>
<td>3 2215142-2329383</td>
<td>3 2215142-2329383</td>
<td>29</td>
<td>21</td>
<td>15(4)</td>
<td>3(2)</td>
<td>3(1)</td>
<td>7</td>
<td>0</td>
<td>0</td>
<td>3 1:2</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>49</td>
<td>3 3432575-3541667</td>
<td>3 3432575-3541908</td>
<td>36</td>
<td>21</td>
<td>15(4)</td>
<td>6(2)</td>
<td>4</td>
<td>11</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>50</td>
<td>3 4704624-4811185</td>
<td>3 4704624-4811185</td>
<td>25</td>
<td>15</td>
<td>15(5)</td>
<td>0</td>
<td>3</td>
<td>7</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>51</td>
<td>3 5886108-5996205</td>
<td>3 5886108-5996205</td>
<td>30</td>
<td>15</td>
<td>11</td>
<td>4</td>
<td>0</td>
<td>15</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>52</td>
<td>3 7009098-7112660</td>
<td>3 7009100-7112660</td>
<td>31</td>
<td>16</td>
<td>12(1)</td>
<td>3(1)</td>
<td>1</td>
<td>8</td>
<td>0</td>
<td>0</td>
<td>1 6:1, 1 2:1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>53</td>
<td>3 8153548-8257243</td>
<td>3 8153548-8257243</td>
<td>18</td>
<td>6</td>
<td>4(3)</td>
<td>2</td>
<td>3</td>
<td>9</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>54</td>
<td>3 9258942-9363353</td>
<td>3 9258942-9363353</td>
<td>25</td>
<td>12</td>
<td>9(2)</td>
<td>3</td>
<td>2</td>
<td>10</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>55</td>
<td>3 10561071-10666301</td>
<td>3 10543921-10648775</td>
<td>16</td>
<td>5</td>
<td>1</td>
<td>4</td>
<td>0</td>
<td>9</td>
<td>0</td>
<td>0</td>
<td>5 1:3</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>56</td>
<td>3 11760867-11863509</td>
<td>3 11760867-11863509</td>
<td>3</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>2</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>57</td>
<td>3 12982458-13085348</td>
<td>3 12983120-13085348</td>
<td>7</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>6</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>58</td>
<td>3 14035354-14143449</td>
<td>3 14035354-14143449</td>
<td>5</td>
<td>3</td>
<td>2</td>
<td>1</td>
<td>0</td>
<td>2</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>59</td>
<td>3 15184235-15295034</td>
<td>3 15184235-15290697</td>
<td>12</td>
<td>6</td>
<td>5(2)</td>
<td>1</td>
<td>0</td>
<td>6</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>60</td>
<td>3 16347038-16449725</td>
<td>3 16347038-16449725</td>
<td>20</td>
<td>13</td>
<td>9(6)</td>
<td>2(1)</td>
<td>0</td>
<td>8</td>
<td>0</td>
<td>0</td>
<td>2 1:2, 3 1:3</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>61</td>
<td>3 17415629-17517888</td>
<td>3 17415629-17517888</td>
<td>23</td>
<td>13</td>
<td>11(2)</td>
<td>2(1)</td>
<td>0</td>
<td>10</td>
<td>0</td>
<td>0</td>
<td>2 1:2</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>62</td>
<td>3 18636527-18750533</td>
<td>3 18636527-18750533</td>
<td>26</td>
<td>12</td>
<td>8(3)</td>
<td>3(1)</td>
<td>1</td>
<td>11</td>
<td>0</td>
<td>0</td>
<td>1 4:1, 1 1:2, 3 1:3</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>63</td>
<td>3 19855826-19960027</td>
<td>3 19855826-19960027</td>
<td>36</td>
<td>25</td>
<td>22(2)</td>
<td>3(1)</td>
<td>1</td>
<td>10</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>64</td>
<td>3 21065078-21171051</td>
<td>3 21065078-21171051</td>
<td>32</td>
<td>19</td>
<td>15(3)</td>
<td>4(2)</td>
<td>6</td>
<td>6</td>
<td>0</td>
<td>0</td>
<td>2 2:1, 1 3:1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>65</td>
<td>3 22173829-22279195</td>
<td>3 22173829-22279195</td>
<td>23</td>
<td>14</td>
<td>12(4)</td>
<td>1</td>
<td>0</td>
<td>10</td>
<td>1</td>
<td>0</td>
<td>1 3:1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>66</td>
<td>3 23140428-23245727</td>
<td>3 23140428-23245727</td>
<td>31</td>
<td>15</td>
<td>11(2)</td>
<td>3</td>
<td>1</td>
<td>13</td>
<td>0</td>
<td>0</td>
<td>2 1:2</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>67</td>
<td>2 902313-1009004</td>
<td>2 902313-1007874</td>
<td>23</td>
<td>9</td>
<td>7</td>
<td>1</td>
<td>2</td>
<td>12</td>
<td>0</td>
<td>0</td>
<td>1 2:1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>68</td>
<td>2 1933901-2035341</td>
<td>2 1925574-2026339</td>
<td>17</td>
<td>7</td>
<td>5(4)</td>
<td>1</td>
<td>0</td>
<td>6</td>
<td>0</td>
<td>0</td>
<td>1 2:1, 5 1:3</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>69</td>
<td>2 2535215-2637601</td>
<td>2 2535215-2637601</td>
<td>8</td>
<td>2</td>
<td>2</td>
<td>0</td>
<td>0</td>
<td>4</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>70</td>
<td>2 2997623-3107099</td>
<td>2 2997623-3107099</td>
<td>9</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>7</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>71</td>
<td>2 4569145-4671448</td>
<td>2 4569145-4671448</td>
<td>7</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>5</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>72</td>
<td>2 5064881-5168486</td>
<td>2 5064881-5168486</td>
<td>4</td>
<td>3</td>
<td>3</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>73</td>
<td>2 5645124-5756134</td>
<td>2 5645124-5756134</td>
<td>18</td>
<td>6</td>
<td>6(1)</td>
<td>0</td>
<td>0</td>
<td>12</td>
<td>0</td>
<td>0</td>
<td>2 1:2</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>74</td>
<td>2 6724150-6827899</td>
<td>2 6724150-6827068</td>
<td>18</td>
<td>11</td>
<td>6(3)</td>
<td>2</td>
<td>0</td>
<td>5</td>
<td>0</td>
<td>0</td>
<td>2 1:2, 3 1:3</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>75</td>
<td>2 7734455-7852230</td>
<td>2 7734455-7852230</td>
<td>28</td>
<td>18</td>
<td>14(1)</td>
<td>3</td>
<td>2</td>
<td>9</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>76</td>
<td>2 8809211-8914699</td>
<td>2 8809211-8914699</td>
<td>37</td>
<td>18</td>
<td>16(2)</td>
<td>2(1)</td>
<td>2</td>
<td>16</td>
<td>0</td>
<td>0</td>
<td>1 2:1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>77</td>
<td>2 9716127-9819766</td>
<td>2 9716127-9819766</td>
<td>28</td>
<td>13</td>
<td>10(1)</td>
<td>1(1)</td>
<td>1</td>
<td>10</td>
<td>0</td>
<td>0</td>
<td>7 1:7</td>
<td>0</td>
<td>2</td>
</tr>
<tr>
<td>78</td>
<td>2 10817070-10922077</td>
<td>2 10817070-10921395</td>
<td>26</td>
<td>13</td>
<td>8(2)</td>
<td>2</td>
<td>1</td>
<td>12</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>79</td>
<td>2 11690670-11794867</td>
<td>2 11690670-11794867</td>
<td>33</td>
<td>17</td>
<td>15(3)</td>
<td>1</td>
<td>1</td>
<td>15</td>
<td>0</td>
<td>0</td>
<td>1 2:1, 2 1:2</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>80</td>
<td>2 12802632-12907369</td>
<td>2 12802632-12907608</td>
<td>25</td>
<td>16</td>
<td>14(1)</td>
<td>1</td>
<td>1</td>
<td>8</td>
<td>0</td>
<td>0</td>
<td>1 2:1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>81</td>
<td>2 13702665-13806233</td>
<td>2 13702665-13806233</td>
<td>26</td>
<td>18</td>
<td>14(5)</td>
<td>2</td>
<td>1</td>
<td>9</td>
<td>0</td>
<td>0</td>
<td>2 1:2</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>82</td>
<td>2 14604611-14707443</td>
<td>2 14604644-14707443</td>
<td>19</td>
<td>13</td>
<td>9(4)</td>
<td>2</td>
<td>3</td>
<td>4</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>83</td>
<td>2 15650550-15755165</td>
<td>2 15650550-15755165</td>
<td>25</td>
<td>12</td>
<td>10(2)</td>
<td>2</td>
<td>1</td>
<td>11</td>
<td>0</td>
<td>0</td>
<td>1 2:1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>84</td>
<td>2 16665089-16773406</td>
<td>2 16665089-16773406</td>
<td>26</td>
<td>15</td>
<td>13(3)</td>
<td>0</td>
<td>3</td>
<td>9</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>85</td>
<td>2 17684541-17788679</td>
<td>2 17684541-17788679</td>
<td>24</td>
<td>15</td>
<td>14(3)</td>
<td>1</td>
<td>3</td>
<td>5</td>
<td>0</td>
<td>0</td>
<td>1 2:1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>86</td>
<td>2 18707881-18822229</td>
<td>2 18707881-18822229</td>
<td>32</td>
<td>18</td>
<td>15(6)</td>
<td>3(1)</td>
<td>5</td>
<td>8</td>
<td>0</td>
<td>0</td>
<td>4 1:3</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>87</td>
<td>4 751713-858018</td>
<td>4 751713-858018</td>
<td>31</td>
<td>16</td>
<td>10(1)</td>
<td>3</td>
<td>0</td>
<td>12</td>
<td>0</td>
<td>0</td>
<td>1 2:1</td>
<td>0</td>
<td>2</td>
</tr>
<tr>
<td>88</td>
<td>4 1722114-1824380</td>
<td>4 1722169-1824380</td>
<td>3</td>
<td>2</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>2</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>89</td>
<td>4 17593684-17695729</td>
<td>4 17593684-17695729</td>
<td>23</td>
<td>13</td>
<td>10</td>
<td>3(1)</td>
<td>0</td>
<td>9</td>
<td>0</td>
<td>0</td>
<td>1 2:1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>90</td>
<td>4 16618075-16721974</td>
<td>4 16618075-16721974</td>
<td>25</td>
<td>15</td>
<td>11(3)</td>
<td>4(2)</td>
<td>2</td>
<td>8</td>
<td>0</td>
<td>0</td>
<td>1 2:1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>91</td>
<td>4 2696288-2802663</td>
<td>4 2696288-2802663</td>
<td>21</td>
<td>11</td>
<td>9(3)</td>
<td>2</td>
<td>0</td>
<td>6</td>
<td>0</td>
<td>0</td>
<td>1 2:1, 5 1:2</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>92</td>
<td>4 3714499-3816439</td>
<td>4 3714499-3816439</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>93</td>
<td>4 4837112-4938939</td>
<td>4 4837112-4938939</td>
<td>4</td>
<td>2</td>
<td>2</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>94</td>
<td>4 6473657-6578058</td>
<td>4 6473696-6578058</td>
<td>22</td>
<td>11</td>
<td>8(5)</td>
<td>2</td>
<td>1</td>
<td>10</td>
<td>0</td>
<td>0</td>
<td>9 1:2</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>95</td>
<td>4 8312586-8416525</td>
<td>4 8312586-8416525</td>
<td>23</td>
<td>12</td>
<td>7(3)</td>
<td>3</td>
<td>2(1)</td>
<td>11</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>96</td>
<td>4 10189947-10297372</td>
<td>4 10189947-10297372</td>
<td>30</td>
<td>15</td>
<td>13(3)</td>
<td>1</td>
<td>4</td>
<td>11</td>
<td>0</td>
<td>0</td>
<td>1 3:1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>97</td>
<td>4 11983703-12093572</td>
<td>4 11983703-12093572</td>
<td>26</td>
<td>16</td>
<td>13(1)</td>
<td>3</td>
<td>2</td>
<td>8</td>
<td>0</td>
<td>0</td>
<td>1 2:1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>98</td>
<td>4 13898993-14007840</td>
<td>4 13898993-14007840</td>
<td>38</td>
<td>23</td>
<td>22(2)</td>
<td>0</td>
<td>2</td>
<td>14</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>99</td>
<td>4 14876974-14979681</td>
<td>4 14876974-14979681</td>
<td>32</td>
<td>22</td>
<td>21(1)</td>
<td>1(1)</td>
<td>2</td>
<td>8</td>
<td>0</td>
<td>0</td>
<td>2 1:2, 3 1:3</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>100</td>
<td>4 15741039-15845643</td>
<td>4 15741039-15845643</td>
<td>22</td>
<td>19</td>
<td>16(5)</td>
<td>3(2)</td>
<td>0</td>
<td>3</td>
<td>0</td>
<td>0</td>
<td>2 1:2</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>Column Totals</td>
<td>N/A</td>
<td>N/A</td>
<td>2362</td>
<td>1312</td>
<td>1006(219)</td>
<td>198(42)</td>
<td>156(9)</td>
<td>852</td>
<td>3</td>
<td>1</td>
<td>N/A</td>
<td>2</td>
<td>22</td>
</tr>
</table>
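
The Multiple Match (Original:MAKER) cells in the table above pack a count and a ratio into one string, such as "4 3:1, 2 2:1". The following is a minimal sketch of how such a cell could be split into numbers for further analysis. The function name is hypothetical, and the sketch assumes that every entry has the form "count original:maker" and that "0" means no multiple matches.

```python
def parse_multiple_match(cell):
    """Parse a cell such as '4 3:1, 2 2:1' into (count, original, maker) tuples."""
    cell = cell.strip()
    # "0" (no multiple matches) and "N/A" (totals row) carry no entries.
    if cell in ("", "0", "N/A"):
        return []
    entries = []
    for part in cell.split(","):
        count, ratio = part.split()
        original, maker = ratio.split(":")
        entries.append((int(count), int(original), int(maker)))
    return entries


if __name__ == "__main__":
    # Sample 6 in the table: four 3:1 matches and two 2:1 matches.
    print(parse_multiple_match("4 3:1, 2 2:1"))
```

Each tuple reads as (number of such matches, places matching in the original annotations, places matching in the MAKER annotations).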


For the regions: Arabidopsis thaliana Col-0 (thale cress) (TAIR v10.02, unmasked) AT3G22270.1 (chr: 3 7824480-7927857) and Arabidopsis thaliana (genbank vLJK-Rotation, unmasked) maker-gi|3-exonerate_est2genome-gene-78.4-mRNA-1 (chr: 3 7824480-7927857):

Out of 24 genes in the original, 9 appear to be close matches, 6 appear to have partial matches, 4 look close but are matching up with multiple genes, and 5 have no matches. 17 genes are annotated in the MAKER-annotated genome.

For the regions: Arabidopsis thaliana Col-0 (thale cress) (TAIR v10.02, unmasked) AT2G18470.1 (chr: 2 7955285-8057767) and Arabidopsis thaliana (genbank vLJK-Rotation, unmasked) maker-gi|2-exonerate_est2genome-gene-80.8-mRNA-1 (chr: 2 7955285-8057767):

Out of 24 genes in the original, 12 appear to be close matches, 1 appears to be a partial match, and 11 have no matches. Only 13 genes are annotated in the MAKER-annotated genome.

For the regions: Arabidopsis thaliana Col-0 (thale cress) (TAIR v10.02, unmasked) AT1G62340.1 (chr: 1 23001123-23105656) and Arabidopsis thaliana (genbank vLJK-Rotation, unmasked) maker-gi|1-exonerate_est2genome-gene-230.10-mRNA-1 (chr: 1 23001123-23105656):

Out of 21 genes in the original, 6 appear to have close matches, 6 appear to have partial matches that are almost close matches, 1 appears to have a partial match that is not a close match, and 8 appear to have no matching annotations. There are only 12 genes in the MAKER-annotated genome.

<UNDER CONSTRUCTION>

== Summary Percentages ==

1312/2362 = 56% of genes annotated

1006/1312 = 77% had same CDS

1204/1312 = 92% had similar or better CDS

852/2362 = 36% of genes were not annotated

Note: 56% + 36% = 92%, so where is the missing 8%? The answer is that many genes had multiple syntenic matches, and these are accounted for in the "partial matches" column and the "joined" and "split" columns. If these were to be separated out, then the total missed genes would be (2362 * 0.08) + 852 = 1041.

Latest revision as of 21:45, 17 March 2015

MAKER is a genome annotation pipeline[1]. It lets a researcher or group of researchers take a genome and some amount of evidence (for example, an EST file and a protein file, both in FASTA format, plus a repeat file and potentially more) and create structural annotations for that genome. It can also train HMM files to provide better annotations for a genome with little evidence, although this takes many runs. This page is an attempt to document the work being done to add MAKER into CoGe.

How to Download and Install MAKER

MAKER may be downloaded from the Yandell lab, here: http://www.yandell-lab.org/software/maker.html. The full installation instructions may be found here: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial. The instructions here will just serve as a brief overview for getting MAKER running on the command line in UNIX.

1. Register and download MAKER from the Yandell lab MAKER software page.

2. Unpack MAKER in whichever folder it will be run from.

3. Download and install prerequisites if they are not installed. The minimum prerequisites are:

    a. BioPerl and various other Perl modules, listed below (see the MAKER documentation for a complete list[2]).
         i. BioPerl
         ii. DBI
         iii. Error
         iv. Error::Simple
         v. File::NFSLock
         vi. File::Which
         vii. Inline
         viii. Perl::Unsafe::Signals
         ix. Proc::Signal
         x. URI::Escape
         xi. Bit::Vector
         xii. Inline::C
         xiii. PerlIO::gzip
         xiv. forks
         xv. forks::shared
    b. SNAP
    c. Exonerate
    d. RepeatMasker
    e. NCBI BLAST

4. Add MAKER and its prerequisites to $PATH. For example, the paths might look something like:

    a. MAKER: export PATH="/home/user/maker/bin:$PATH"
    b. RepeatMasker: export PATH="/home/user/RepeatMasker:$PATH"
    c. Exonerate: export PATH="/home/user/exonerate-2.2.0-x86_64/bin:$PATH"
    d. SNAP: export PATH="/home/user/snap:$PATH"
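
Once the paths in step 4 are exported, a quick check can confirm that the tools are actually reachable. The following is a minimal Python sketch, not part of MAKER itself. The executable names in REQUIRED_TOOLS are assumptions and may differ between installs (for example, NCBI BLAST+ is represented here only by "blastn").

```python
import shutil

# Executable names are assumptions; adjust them to match the local installs.
REQUIRED_TOOLS = ["maker", "RepeatMasker", "exonerate", "snap", "blastn"]


def find_missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

Any tool reported as missing most likely needs its own export PATH line like the ones shown in step 4.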

Running MAKER from the UNIX command line

1. Generate the MAKER control files by typing "maker -CTL", then set the following options in the generated options file ("maker_opts.ctl"):

    a. To set the genome sequence, enter the path to the genome FASTA file after "genome=". So, this might look like "genome=dpp_contig.fasta".
       Leave a space between this and the commented description (which starts with a "#" symbol).
    b. Set the EST or mRNA data by typing the path to the desired EST or mRNA fasta file after "est=".
    c. To have MAKER generate structural annotations directly from EST or mRNA data, change "est2genome=0" to "est2genome=1".
    d. Change any other desired settings. For more details on these settings and how to set them, see the full MAKER tutorial[3].

2. Open the MAKER boot options file ("maker_bopts.ctl") and ensure that the correct BLAST search type is selected.

    a. For example, to use NCBI-BLAST, set "blast_type=ncbi+".

3. Edit the MAKER options file ("maker_opts.ctl") to the desired settings.

4. Run MAKER by typing "maker" on the command line.

    a. To prevent MAKER from eating up too many system resources, use the "nice" command.
    b. To avoid being spammed with constant messages, redirect both stdout and stderr to a file with "&> somefile.txt". To also run MAKER in the background, add "&" to the end of the command.
    c. Putting both "a" and "b" together, the command might look something like, "nice maker &> log.txt &".

5. MAKER may take a day or more to run, depending on the size of the genome it is attempting to annotate and the evidence it is given. Once finished, it should place all files into a "genomename.maker.output" folder, with the data split into separate folders by contig. Each of these folders will have several sub-folders, one of which should contain a "chromosomename.gff" file. This file contains the structural annotations and repeats, and always ends with the contig sequence.
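
The control-file edits in steps 1 to 3 all have the same shape: a "key=value" setting followed by a "#" comment. They could be scripted along the following lines. This is a minimal Python sketch under stated assumptions: the helper name set_ctl_option is hypothetical, the sample text only imitates the layout of a control file rather than reproducing a real one, and "ests.fasta" is a placeholder file name.

```python
def set_ctl_option(text, key, value):
    """Set key=value in control-file text, keeping any trailing '#' comment."""
    out = []
    found = False
    for line in text.splitlines():
        setting, sep, comment = line.partition("#")
        name, eq, _old_value = setting.partition("=")
        if eq and name.strip() == key:
            # Keep a space between the new value and the comment.
            line = (f"{key}={value} " + (sep + comment if sep else "")).rstrip()
            found = True
        out.append(line)
    if not found:
        raise KeyError(f"option {key!r} not found")
    return "\n".join(out) + "\n"


# Illustrative text only; a real maker_opts.ctl has many more lines.
sample = (
    "#-----Genome\n"
    "genome= #genome sequence\n"
    "est= #EST or mRNA sequences\n"
    "est2genome=0 #infer annotations directly from ESTs\n"
)

edited = set_ctl_option(sample, "genome", "dpp_contig.fasta")
edited = set_ctl_option(edited, "est", "ests.fasta")
edited = set_ctl_option(edited, "est2genome", "1")
print(edited)
```

Lines that begin with "#" are left untouched, so section headers and commented-out settings survive the edit.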

Testing MAKER annotation accuracy with minimal data

The point of this exercise is to see how little data can be used with MAKER and still get reasonably accurate annotations.

The Arabidopsis thaliana genome was annotated using MAKER with the Arabidopsis lyrata mRNA data as evidence. Both the genome and the mRNA annotations were downloaded from GenBank. The A. thaliana genome with the annotations from MAKER using the A. lyrata mRNA evidence was uploaded to CoGe (genome id 25440) and compared to the existing A. thaliana annotations (genome id 16911). The analysis can be recreated here: https://genomevolution.org/r/f6kd.

A fair amount of synteny exists between the two genomes, although noise is present. The actual gene regions are currently being compared to assess accuracy directly. The SynMap listed above is used, and then BLAST is set to "BlastN: Small Regions" in GEvo. Also, under "Sequence Options," "Mask Sequence" is set to "Non-CDS" for both sequences.

Table explanation:

    Sample Number: Numerically tracks each trial.
    Coords id16911: The coordinates in the original genome.
    Coords id39871: The coordinates in the MAKER-annotated genome.
    Total Genes id16911: The total number of genes annotated in the original genome. Does not include genes only partially visible in the genome-viewer.
    Total Genes id39871: The total number of genes annotated in the MAKER-annotated genome. Does not include genes only partially visible in the genome-viewer.
    Close Match: The CDS regions of the gene look the same by crude visual analysis. Numbers in parentheses represent the number of matches where a small piece of the gene matched to a different gene.
    Similar Match: A few small differences in the CDS regions can be seen. Numbers in parentheses represent the number of matches where a small piece of the gene matched to a different gene.
    Partial Match: Only a relatively small piece of the gene matches. Numbers in parentheses represent the number of matches where a small piece of the gene matched to a different gene.
    No Match: The original gene is not represented in the new annotations.
    Pseudogene Misannotated: A pseudogene has been annotated as a functional gene in the MAKER annotations.
    New: A gene in the MAKER annotations is not found in the original genome's annotations.
    Multiple Match (Original:MAKER): Genes that match multiple regions. Each entry gives the number of such matches, followed by a ratio of the number of matching places in the original genome annotations to the number of matching places in the MAKER-annotated genome. For example, "1 2:1" means that only one such multiple match was found, with two matching places in the original annotations and one in the MAKER annotations.
    Split (One gene to two+): Matches where only one gene in the original is present and multiple genes in the MAKER-annotated genome are found. This is different from a multiple match because each region is only matching one place.
    Joined (Two+ genes to one): Matches where multiple genes are present in the original and only one gene is present in the MAKER-annotated genome. This is different from a multiple match because each region is only matching one place.
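
The cell notation described above can be read mechanically. The following is a minimal Python sketch, assuming cells are written exactly as in the explanation ("15(5)" for a count with a parenthesised sub-count, and "4 3:1, 2 2:1" for multiple matches). The function names are hypothetical.

```python
import re


def parse_count_cell(cell):
    """Parse '15(5)' into (15, 5); a bare '12' becomes (12, 0)."""
    match = re.fullmatch(r"\s*(\d+)(?:\((\d+)\))?\s*", cell)
    if match is None:
        raise ValueError(f"unrecognised count cell: {cell!r}")
    return int(match.group(1)), int(match.group(2) or 0)


def parse_multiple_match_cell(cell):
    """Parse '4 3:1, 2 2:1' into [(4, 3, 1), (2, 2, 1)]; '0' becomes []."""
    cell = cell.strip()
    if cell == "0":
        return []
    entries = []
    for part in cell.split(","):
        # Each part is "<quantity> <original>:<MAKER>".
        quantity, ratio = part.split()
        original, maker = ratio.split(":")
        entries.append((int(quantity), int(original), int(maker)))
    return entries


print(parse_count_cell("15(5)"))
print(parse_multiple_match_cell("4 3:1, 2 2:1"))
```

Each tuple from parse_multiple_match_cell holds the quantity, the number of places in the original annotations, and the number of places in the MAKER annotations, in that order.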
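<table>
<tr><th>Sample Number</th><th>Coords id16911</th><th>Coords id39871</th><th>Total Genes id16911</th><th>Total Genes id39871</th><th>Close Match</th><th>Similar Match</th><th>Partial Match</th><th>No Match</th><th>Pseudogene Misannotated</th><th>New</th><th>Multiple Match (Original:MAKER)</th><th>Split (One gene to two+)</th><th>Joined (Two+ genes to one)</th></tr>
<tr><td>1</td><td>1 544697-648473</td><td>1 544697-648473</td><td>30</td><td>16</td><td>10</td><td>4</td><td>1</td><td>12</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>2</td><td>1 71582-180099</td><td>1 71582-180099</td><td>36</td><td>21</td><td>15(5)</td><td>2</td><td>1</td><td>11</td><td>0</td><td>0</td><td>1 2:1</td><td>0</td><td>1</td></tr>
<tr><td>3</td><td>1 1202212-1307510</td><td>1 1202212-1307510</td><td>26</td><td>17</td><td>12</td><td>3</td><td>2</td><td>7</td><td>0</td><td>0</td><td>1 2:1</td><td>0</td><td>0</td></tr>
<tr><td>4</td><td>1 1455642-1559002</td><td>1 1455642-1559002</td><td>29</td><td>16</td><td>16(9)</td><td>0</td><td>5</td><td>8</td><td>0</td><td>0</td><td>1 2:1</td><td>0</td><td>1</td></tr>
<tr><td>5</td><td>1 1928762-2039295</td><td>1 1928762-2039046</td><td>33</td><td>21</td><td>19(2)</td><td>1</td><td>0</td><td>13</td><td>1</td><td>0</td><td>3 2:1</td><td>0</td><td>0</td></tr>
<tr><td>6</td><td>1 2540944-2656892</td><td>1 2542248-2656892</td><td>31</td><td>19</td><td>15(6)</td><td>1</td><td>6</td><td>9</td><td>0</td><td>0</td><td>4 3:1, 2 2:1</td><td>0</td><td>1</td></tr>
<tr><td>7</td><td>1 2879268-2981804</td><td>1 2879268-2981656</td><td>30</td><td>18</td><td>14(2)</td><td>3</td><td>8</td><td>9</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td></tr>
<tr><td>8</td><td>1 3382313-3487979</td><td>1 3382313-3487979</td><td>27</td><td>18</td><td>14(2)</td><td>1</td><td>2</td><td>9</td><td>0</td><td>0</td><td>2 2:1</td><td>0</td><td>0</td></tr>
<tr><td>9</td><td>1 3901597-4006840</td><td>1 3903894-4006840</td><td>32</td><td>15</td><td>10(1)</td><td>3</td><td>2</td><td>14</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td></tr>
<tr><td>10</td><td>1 4197703-4300444</td><td>1 4197703-4300444</td><td>30</td><td>15</td><td>15(3)</td><td>0</td><td>2</td><td>13</td><td>0</td><td>0</td><td>1 1:2</td><td>0</td><td>0</td></tr>
<tr><td>11</td><td>1 4958502-5064486</td><td>1 4958502-5064486</td><td>27</td><td>17</td><td>15(3)</td><td>2</td><td>2</td><td>7</td><td>0</td><td>0</td><td>1 2:1</td><td>0</td><td>0</td></tr>
<tr><td>12</td><td>1 5380446-5485921</td><td>1 5380446-5485921</td><td>35</td><td>17</td><td>13(1)</td><td>2(1)</td><td>4</td><td>15</td><td>0</td><td>0</td><td>2 1:2</td><td>0</td><td>0</td></tr>
<tr><td>13</td><td>1 7384303-7486702</td><td>1 7379980-7486702</td><td>28</td><td>13</td><td>6</td><td>0</td><td>2</td><td>8</td><td>0</td><td>0</td><td>4 1:3, 5 1:2, 1 2:1</td><td>0</td><td>1</td></tr>
<tr><td>14</td><td>1 8513858-8619927</td><td>1 8513858-8618644</td><td>30</td><td>15</td><td>9(1)</td><td>5(2)</td><td>2</td><td>12</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td></tr>
<tr><td>15</td><td>1 9389859-9495818</td><td>1 9384718-9495818</td><td>31</td><td>14</td><td>10(2)</td><td>1(1)</td><td>5</td><td>11</td><td>0</td><td>0</td><td>4 1:2</td><td>0</td><td>2</td></tr>
<tr><td>16</td><td>1 9389859-9495818</td><td>1 9384718-9495818</td><td>26</td><td>15</td><td>8</td><td>5</td><td>2</td><td>8</td><td>0</td><td>0</td><td>3 1:2</td><td>0</td><td>0</td></tr>
<tr><td>17</td><td>1 12158853-12263571</td><td>1 12158853-12263571</td><td>22</td><td>5</td><td>4</td><td>0</td><td>1</td><td>17</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>18</td><td>1 12859293-12963916</td><td>1 12859293-12963916</td><td>18</td><td>7</td><td>3(1)</td><td>3(3)</td><td>0</td><td>10</td><td>0</td><td>0</td><td>1 2:1, 2 1:2</td><td>0</td><td>0</td></tr>
<tr><td>19</td><td>1 14108617-14211652</td><td>1 14108617-14211652</td><td>5</td><td>2</td><td>1</td><td>1</td><td>0</td><td>3</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>20</td><td>1 14108617-14211652</td><td>1 14108617-14211652</td><td>5</td><td>2</td><td>1</td><td>1</td><td>0</td><td>3</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>21</td><td>1 15553892-15657802</td><td>1 15553892-15657802</td><td>4</td><td>2</td><td>1</td><td>1</td><td>0</td><td>2</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>22</td><td>1 17682582-17808194</td><td>1 17682582-17805715</td><td>29</td><td>19</td><td>11(1)</td><td>6(1)</td><td>2</td><td>8</td><td>0</td><td>0</td><td>2 1:2</td><td>0</td><td>1</td></tr>
<tr><td>23</td><td>1 20236917-20340245</td><td>1 20236944-20340245</td><td>28</td><td>20</td><td>12(1)</td><td>5(2)</td><td>4</td><td>7</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>24</td><td>1 21984661-22089844</td><td>1 21984661-22089844</td><td>24</td><td>13</td><td>10(1)</td><td>1(1)</td><td>1</td><td>9</td><td>0</td><td>0</td><td>1 2:1, 2 1:2</td><td>0</td><td>1</td></tr>
<tr><td>25</td><td>1 26732839-26838712</td><td>1 26732839-26838712</td><td>33</td><td>16</td><td>15(2)</td><td>1</td><td>0</td><td>17</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>26</td><td>5 185117-290911</td><td>5 185117-290911</td><td>30</td><td>19</td><td>12(2)</td><td>5(1)</td><td>4(1)</td><td>8</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>27</td><td>5 2484720-2590086</td><td>5 2484720-2590086</td><td>29</td><td>18</td><td>14(6)</td><td>2</td><td>2(2)</td><td>6</td><td>0</td><td>0</td><td>2 3:1</td><td>0</td><td>0</td></tr>
<tr><td>28</td><td>5 5240999-5347779</td><td>5 5240999-5347779</td><td>30</td><td>20</td><td>18(4)</td><td>2</td><td>1</td><td>8</td><td>0</td><td>0</td><td>1 2:1</td><td>0</td><td>0</td></tr>
<tr><td>29</td><td>5 7948309-8052594</td><td>5 7948309-8052594</td><td>29</td><td>16</td><td>12(4)</td><td>2(1)</td><td>2</td><td>6</td><td>0</td><td>0</td><td>0</td><td>0</td><td>2</td></tr>
<tr><td>30</td><td>5 10746648-10848147</td><td>5 10746648-10848147</td><td>12</td><td>1</td><td>1</td><td>0</td><td>0</td><td>11</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>31</td><td>5 13424499-13532238</td><td>5 13434606-13537546</td><td>10</td><td>4</td><td>4(1)</td><td>0</td><td>0</td><td>5</td><td>0</td><td>0</td><td>1 2:1</td><td>0</td><td>0</td></tr>
<tr><td>32</td><td>5 16020402-16123101</td><td>5 16020402-16123101</td><td>29</td><td>9</td><td>6(3)</td><td>1(1)</td><td>4</td><td>17</td><td>0</td><td>1</td><td>1 2:1, 2 1:2</td><td>0</td><td>0</td></tr>
<tr><td>33</td><td>5 18741802-18845407</td><td>5 18741802-18845407</td><td>25</td><td>17</td><td>12(2)</td><td>6(3)</td><td>1</td><td>3</td><td>0</td><td>0</td><td>1 2:1, 2 1:2</td><td>0</td><td>0</td></tr>
<tr><td>34</td><td>5 21699561-21801099</td><td>5 21384155-21488362</td><td>26</td><td>15</td><td>0</td><td>0</td><td>3(1)</td><td>23</td><td>0</td><td>0</td><td>1 1:2</td><td>0</td><td>0</td></tr>
<tr><td>35</td><td>5 24375824-24480269</td><td>5 24375824-24480269</td><td>19</td><td>13</td><td>8(5)</td><td>5(1)</td><td>1</td><td>5</td><td>0</td><td>0</td><td>1 1:2</td><td>0</td><td>0</td></tr>
<tr><td>36</td><td>5 26726994-26835104</td><td>5 26726994-26835104</td><td>34</td><td>17</td><td>14(5)</td><td>3</td><td>2</td><td>12</td><td>0</td><td>0</td><td>1 2:1</td><td>0</td><td>0</td></tr>
<tr><td>37</td><td>5 1395509-1499568</td><td>5 1395509-1499568</td><td>28</td><td>19</td><td>17(2)</td><td>2</td><td>1(1)</td><td>8</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>38</td><td>5 3963813-4071018</td><td>5 3963813-4070909</td><td>29</td><td>21</td><td>20(2)</td><td>1</td><td>1</td><td>7</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>39</td><td>5 6645731-6751247</td><td>5 6645731-6751247</td><td>35</td><td>21</td><td>19(4)</td><td>2</td><td>2</td><td>12</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>40</td><td>5 9395950-9500584</td><td>5 9395950-9500584</td><td>28</td><td>16</td><td>12</td><td>3</td><td>1</td><td>12</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>41</td><td>5 12035328-12138569</td><td>5 12035328-12138569</td><td>4</td><td>4</td><td>3</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>42</td><td>5 14198202-14303272</td><td>5 14198202-14303110</td><td>17</td><td>8</td><td>6</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td></tr>
<tr><td>43</td><td>5 17439327-17544830</td><td>5 17439468-17544830</td><td>28</td><td>14</td><td>8(1)</td><td>5(1)</td><td>3(1)</td><td>11</td><td>0</td><td>0</td><td>1 2:1</td><td>0</td><td>0</td></tr>
<tr><td>44</td><td>5 20126385-20238307</td><td>5 20126385-20238307</td><td>21</td><td>14</td><td>11(7)</td><td>2(1)</td><td>1</td><td>5</td><td>0</td><td>0</td><td>1 1:2, 4 1:4</td><td>0</td><td>1</td></tr>
<tr><td>45</td><td>5 22960801-23065559</td><td>5 22960764-23065559</td><td>26</td><td>14</td><td>10(5)</td><td>4(2)</td><td>2</td><td>8</td><td>0</td><td>0</td><td>2 2:1</td><td>0</td><td>0</td></tr>
<tr><td>46</td><td>5 25587492-25693902</td><td>5 25587492-25693902</td><td>30</td><td>23</td><td>19(7)</td><td>3</td><td>3(1)</td><td>5</td><td>0</td><td>0</td><td>7 3:1</td><td>0</td><td>0</td></tr>
<tr><td>47</td><td>3 1096943-1203341</td><td>3 1096943-1203341</td><td>32</td><td>19</td><td>17(5)</td><td>0</td><td>2</td><td>8</td><td>0</td><td>0</td><td>3 1:2, 1 2:1</td><td>0</td><td>0</td></tr>
<tr><td>48</td><td>3 2215142-2329383</td><td>3 2215142-2329383</td><td>29</td><td>21</td><td>15(4)</td><td>3(2)</td><td>3(1)</td><td>7</td><td>0</td><td>0</td><td>3 1:2</td><td>0</td><td>0</td></tr>
<tr><td>49</td><td>3 3432575-3541667</td><td>3 3432575-3541908</td><td>36</td><td>21</td><td>15(4)</td><td>6(2)</td><td>4</td><td>11</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>50</td><td>3 4704624-4811185</td><td>3 4704624-4811185</td><td>25</td><td>15</td><td>15(5)</td><td>0</td><td>3</td><td>7</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>51</td><td>3 5886108-5996205</td><td>3 5886108-5996205</td><td>30</td><td>15</td><td>11</td><td>4</td><td>0</td><td>15</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>52</td><td>3 7009098-7112660</td><td>3 7009100-7112660</td><td>31</td><td>16</td><td>12(1)</td><td>3(1)</td><td>1</td><td>8</td><td>0</td><td>0</td><td>1 6:1, 1 2:1</td><td>0</td><td>0</td></tr>
<tr><td>53</td><td>3 8153548-8257243</td><td>3 8153548-8257243</td><td>18</td><td>6</td><td>4(3)</td><td>2</td><td>3</td><td>9</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>54</td><td>3 9258942-9363353</td><td>3 9258942-9363353</td><td>25</td><td>12</td><td>9(2)</td><td>3</td><td>2</td><td>10</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>55</td><td>3 10561071-10666301</td><td>3 10543921-10648775</td><td>16</td><td>5</td><td>1</td><td>4</td><td>0</td><td>9</td><td>0</td><td>0</td><td>5 1:3</td><td>0</td><td>0</td></tr>
<tr><td>56</td><td>3 11760867-11863509</td><td>3 11760867-11863509</td><td>3</td><td>1</td><td>1</td><td>0</td><td>0</td><td>2</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>57</td><td>3 12982458-13085348</td><td>3 12983120-13085348</td><td>7</td><td>1</td><td>0</td><td>1</td><td>0</td><td>6</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>58</td><td>3 14035354-14143449</td><td>3 14035354-14143449</td><td>5</td><td>3</td><td>2</td><td>1</td><td>0</td><td>2</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>59</td><td>3 15184235-15295034</td><td>3 15184235-15290697</td><td>12</td><td>6</td><td>5(2)</td><td>1</td><td>0</td><td>6</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>60</td><td>3 16347038-16449725</td><td>3 16347038-16449725</td><td>20</td><td>13</td><td>9(6)</td><td>2(1)</td><td>0</td><td>8</td><td>0</td><td>0</td><td>2 1:2, 3 1:3</td><td>1</td><td>0</td></tr>
<tr><td>61</td><td>3 17415629-17517888</td><td>3 17415629-17517888</td><td>23</td><td>13</td><td>11(2)</td><td>2(1)</td><td>0</td><td>10</td><td>0</td><td>0</td><td>2 1:2</td><td>0</td><td>0</td></tr>
<tr><td>62</td><td>3 18636527-18750533</td><td>3 18636527-18750533</td><td>26</td><td>12</td><td>8(3)</td><td>3(1)</td><td>1</td><td>11</td><td>0</td><td>0</td><td>1 4:1, 1 1:2, 3 1:3</td><td>0</td><td>0</td></tr>
<tr><td>63</td><td>3 19855826-19960027</td><td>3 19855826-19960027</td><td>36</td><td>25</td><td>22(2)</td><td>3(1)</td><td>1</td><td>10</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>64</td><td>3 21065078-21171051</td><td>3 21065078-21171051</td><td>32</td><td>19</td><td>15(3)</td><td>4(2)</td><td>6</td><td>6</td><td>0</td><td>0</td><td>2 2:1, 1 3:1</td><td>0</td><td>0</td></tr>
<tr><td>65</td><td>3 22173829-22279195</td><td>3 22173829-22279195</td><td>23</td><td>14</td><td>12(4)</td><td>1</td><td>0</td><td>10</td><td>1</td><td>0</td><td>1 3:1</td><td>0</td><td>0</td></tr>
<tr><td>66</td><td>3 23140428-23245727</td><td>3 23140428-23245727</td><td>31</td><td>15</td><td>11(2)</td><td>3</td><td>1</td><td>13</td><td>0</td><td>0</td><td>2 1:2</td><td>0</td><td>1</td></tr>
<tr><td>67</td><td>2 902313-1009004</td><td>2 902313-1007874</td><td>23</td><td>9</td><td>7</td><td>1</td><td>2</td><td>12</td><td>0</td><td>0</td><td>1 2:1</td><td>0</td><td>0</td></tr>
<tr><td>68</td><td>2 1933901-2035341</td><td>2 1925574-2026339</td><td>17</td><td>7</td><td>5(4)</td><td>1</td><td>0</td><td>6</td><td>0</td><td>0</td><td>1 2:1, 5 1:3</td><td>0</td><td>0</td></tr>
<tr><td>69</td><td>2 2535215-2637601</td><td>2 2535215-2637601</td><td>8</td><td>2</td><td>2</td><td>0</td><td>0</td><td>4</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>70</td><td>2 2997623-3107099</td><td>2 2997623-3107099</td><td>9</td><td>1</td><td>1</td><td>0</td><td>0</td><td>7</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>71</td><td>2 4569145-4671448</td><td>2 4569145-4671448</td><td>7</td><td>1</td><td>1</td><td>0</td><td>0</td><td>5</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>72</td><td>2 5064881-5168486</td><td>2 5064881-5168486</td><td>4</td><td>3</td><td>3</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>73</td><td>2 5645124-5756134</td><td>2 5645124-5756134</td><td>18</td><td>6</td><td>6(1)</td><td>0</td><td>0</td><td>12</td><td>0</td><td>0</td><td>2 1:2</td><td>0</td><td>0</td></tr>
<tr><td>74</td><td>2 6724150-6827899</td><td>2 6724150-6827068</td><td>18</td><td>11</td><td>6(3)</td><td>2</td><td>0</td><td>5</td><td>0</td><td>0</td><td>2 1:2, 3 1:3</td><td>1</td><td>1</td></tr>
<tr><td>75</td><td>2 7734455-7852230</td><td>2 7734455-7852230</td><td>28</td><td>18</td><td>14(1)</td><td>3</td><td>2</td><td>9</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>76</td><td>2 8809211-8914699</td><td>2 8809211-8914699</td><td>37</td><td>18</td><td>16(2)</td><td>2(1)</td><td>2</td><td>16</td><td>0</td><td>0</td><td>1 2:1</td><td>0</td><td>0</td></tr>
<tr><td>77</td><td>2 9716127-9819766</td><td>2 9716127-9819766</td><td>28</td><td>13</td><td>10(1)</td><td>1(1)</td><td>1</td><td>10</td><td>0</td><td>0</td><td>7 1:7</td><td>0</td><td>2</td></tr>
<tr><td>78</td><td>2 10817070-10922077</td><td>2 10817070-10921395</td><td>26</td><td>13</td><td>8(2)</td><td>2</td><td>1</td><td>12</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>79</td><td>2 11690670-11794867</td><td>2 11690670-11794867</td><td>33</td><td>17</td><td>15(3)</td><td>1</td><td>1</td><td>15</td><td>0</td><td>0</td><td>1 2:1, 2 1:2</td><td>0</td><td>0</td></tr>
<tr><td>80</td><td>2 12802632-12907369</td><td>2 12802632-12907608</td><td>25</td><td>16</td><td>14(1)</td><td>1</td><td>1</td><td>8</td><td>0</td><td>0</td><td>1 2:1</td><td>0</td><td>0</td></tr>
<tr><td>81</td><td>2 13702665-13806233</td><td>2 13702665-13806233</td><td>26</td><td>18</td><td>14(5)</td><td>2</td><td>1</td><td>9</td><td>0</td><td>0</td><td>2 1:2</td><td>0</td><td>0</td></tr>
<tr><td>82</td><td>2 14604611-14707443</td><td>2 14604644-14707443</td><td>19</td><td>13</td><td>9(4)</td><td>2</td><td>3</td><td>4</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>83</td><td>2 15650550-15755165</td><td>2 15650550-15755165</td><td>25</td><td>12</td><td>10(2)</td><td>2</td><td>1</td><td>11</td><td>0</td><td>0</td><td>1 2:1</td><td>0</td><td>0</td></tr>
<tr><td>84</td><td>2 16665089-16773406</td><td>2 16665089-16773406</td><td>26</td><td>15</td><td>13(3)</td><td>0</td><td>3</td><td>9</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>85</td><td>2 17684541-17788679</td><td>2 17684541-17788679</td><td>24</td><td>15</td><td>14(3)</td><td>1</td><td>3</td><td>5</td><td>0</td><td>0</td><td>1 2:1</td><td>0</td><td>0</td></tr>
<tr><td>86</td><td>2 18707881-18822229</td><td>2 18707881-18822229</td><td>32</td><td>18</td><td>15(6)</td><td>3(1)</td><td>5</td><td>8</td><td>0</td><td>0</td><td>4 1:3</td><td>0</td><td>0</td></tr>
<tr><td>87</td><td>4 751713-858018</td><td>4 751713-858018</td><td>31</td><td>16</td><td>10(1)</td><td>3</td><td>0</td><td>12</td><td>0</td><td>0</td><td>1 2:1</td><td>0</td><td>2</td></tr>
<tr><td>88</td><td>4 1722114-1824380</td><td>4 1722169-1824380</td><td>3</td><td>2</td><td>0</td><td>1</td><td>0</td><td>2</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>89</td><td>4 17593684-17695729</td><td>4 17593684-17695729</td><td>23</td><td>13</td><td>10</td><td>3(1)</td><td>0</td><td>9</td><td>0</td><td>0</td><td>1 2:1</td><td>0</td><td>0</td></tr>
<tr><td>90</td><td>4 16618075-16721974</td><td>4 16618075-16721974</td><td>25</td><td>15</td><td>11(3)</td><td>4(2)</td><td>2</td><td>8</td><td>0</td><td>0</td><td>1 2:1</td><td>0</td><td>0</td></tr>
<tr><td>91</td><td>4 2696288-2802663</td><td>4 2696288-2802663</td><td>21</td><td>11</td><td>9(3)</td><td>2</td><td>0</td><td>6</td><td>0</td><td>0</td><td>1 2:1, 5 1:2</td><td>0</td><td>0</td></tr>
<tr><td>92</td><td>4 3714499-3816439</td><td>4 3714499-3816439</td><td>1</td><td>1</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>93</td><td>4 4837112-4938939</td><td>4 4837112-4938939</td><td>4</td><td>2</td><td>2</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>94</td><td>4 6473657-6578058</td><td>4 6473696-6578058</td><td>22</td><td>11</td><td>8(5)</td><td>2</td><td>1</td><td>10</td><td>0</td><td>0</td><td>9 1:2</td><td>0</td><td>1</td></tr>
<tr><td>95</td><td>4 8312586-8416525</td><td>4 8312586-8416525</td><td>23</td><td>12</td><td>7(3)</td><td>3</td><td>2(1)</td><td>11</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>96</td><td>4 10189947-10297372</td><td>4 10189947-10297372</td><td>30</td><td>15</td><td>13(3)</td><td>1</td><td>4</td><td>11</td><td>0</td><td>0</td><td>1 3:1</td><td>0</td><td>0</td></tr>
<tr><td>97</td><td>4 11983703-12093572</td><td>4 11983703-12093572</td><td>26</td><td>16</td><td>13(1)</td><td>3</td><td>2</td><td>8</td><td>0</td><td>0</td><td>1 2:1</td><td>0</td><td>0</td></tr>
<tr><td>98</td><td>4 13898993-14007840</td><td>4 13898993-14007840</td><td>38</td><td>23</td><td>22(2)</td><td>0</td><td>2</td><td>14</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>99</td><td>4 14876974-14979681</td><td>4 14876974-14979681</td><td>32</td><td>22</td><td>21(1)</td><td>1(1)</td><td>2</td><td>8</td><td>0</td><td>0</td><td>2 1:2, 3 1:3</td><td>0</td><td>0</td></tr>
<tr><td>100</td><td>4 15741039-15845643</td><td>4 15741039-15845643</td><td>22</td><td>19</td><td>16(5)</td><td>3(2)</td><td>0</td><td>3</td><td>0</td><td>0</td><td>2 1:2</td><td>0</td><td>0</td></tr>
<tr><td>Column Totals</td><td>N/A</td><td>N/A</td><td>2362</td><td>1312</td><td>1006(219)</td><td>198(42)</td><td>156(9)</td><td>852</td><td>3</td><td>1</td><td>N/A</td><td>2</td><td>22</td></tr>
</table>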

Summary Percentages

1312/2362 = 56% of genes annotated

1006/1312 = 77% had same CDS

1204/1312 = 92% had similar or better CDS

852/2362 = 36% of genes were not annotated


Note: 56% + 36% = 92%, so where is the missing 8%? The answer is that many genes had multiple syntenic matches, and these are accounted for in the "partial matches" column and the "joined" and "split" columns. If these were to be separated out, then the total missed genes would be (2362 * 0.08) + 852 = 1041.
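
The percentages and the estimate in the note can be rechecked from the column totals. The following is a minimal Python sketch. The dictionary keys and the pct helper are hypothetical names, and rounding to whole percentages is assumed because that is how the figures above are reported.

```python
# Column totals copied from the last row of the table.
totals = {
    "genes_original": 2362,
    "genes_maker": 1312,
    "close_match": 1006,
    "similar_match": 198,
    "no_match": 852,
}


def pct(numerator, denominator):
    """Return a percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


annotated = pct(totals["genes_maker"], totals["genes_original"])
same_cds = pct(totals["close_match"], totals["genes_maker"])
similar_or_better = pct(
    totals["close_match"] + totals["similar_match"], totals["genes_maker"]
)
not_annotated = pct(totals["no_match"], totals["genes_original"])

# The share that neither percentage covers, and the estimate from the note.
unaccounted = 100 - annotated - not_annotated
missed_estimate = round(
    totals["genes_original"] * unaccounted / 100 + totals["no_match"]
)

print(annotated, same_cds, similar_or_better, not_annotated)
print(unaccounted, missed_estimate)
```

The 1204 used in the "similar or better" figure is the sum of the Close Match and Similar Match totals (1006 + 198).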