MAP

Download

Version 2009-12-31: map-2009-12-31.tar.gz
The individual source code files are available here, in case you would like to take a look at the code before downloading.
README file

This tarball, map-2009-12-31.tar.gz, contains a version of the source
code for the sequence alignment method described in the following
paper:

    Tamer Kahveci, Vebjorn Ljosa, and Ambuj K. Singh: “Speeding up
    whole-genome alignment by indexing frequency vectors,”
    Bioinformatics, 20(13), pp. 2122–2134, 2004,
    doi:10.1093/bioinformatics/bth212.

I write "a version" because there is some confusion about which
version of the code is the right one.  We submitted the paper on
2003-06-21.  In our eagerness to start our next project (a sequence
assembly approach that used some of the same ideas as MAP and built
on its codebase), we did not properly use tags and branches to
segregate the projects in CVS.  What is packaged up here is
functionally identical to what was current in CVS on 2003-07-16.  It
appears that I gave this version to someone who asked for it, so I
presume that it works.

The code is unchanged since then, but I have modified the build system
slightly in order to make the code build with current versions of
automake, autoconf, et al.  You should be able to compile it as
follows:

    ./configure
    make
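
To install the programs into a private directory instead of running
them from the build tree, the usual autoconf/automake conventions
should apply.  The sketch below assumes that the generated `configure`
accepts `--prefix` and that `make install` works for this package;
neither is stated in this README.  The function name `build_map` and
its arguments are hypothetical.

```shell
#!/bin/sh
# Sketch: configure, build, and install into a private prefix.
# Assumption: --prefix and "make install" behave as in any standard
# autoconf/automake package; this README only documents ./configure and make.

# build_map SRCDIR PREFIX   (hypothetical helper)
build_map() {
    if [ "$#" -ne 2 ]; then
        echo "usage: build_map SRCDIR PREFIX" >&2
        return 2
    fi
    # Run in a subshell so the caller's working directory is unchanged.
    (
        cd "$1" || exit 1
        ./configure --prefix="$2" && make && make install
    )
}
```

For example, `build_map . "$HOME/tmp/mapinst"`, run from the unpacked
source directory, should place the programs under
`$HOME/tmp/mapinst/bin` if these assumptions hold.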

The program `mkmt` creates a match table, and the program `findex`
creates an F-index.  Although there is not much documentation, the `-h`
options to `mkmt` and `findex` are decent starting points:

map$ ~/tmp/mapinst/bin/mkmt -h
Usage: /Users/ljosa/tmp/mapinst/bin/mkmt [OPTION]... INDEX
Construct the match table for an INDEX and QUERY

      -h            Display this help.
      -V            Display version number.
      -a            Output match table in ASCII format.
      -b            Output match table in binary format.
      -B            Use boxes for both strings.
      -t            Use B+-trees.  (Cannot be used with -n.)
      -n            Nested-loop join.  (Cannot be used with -t.)
      -p NUMBER     Partition the vector space NUMBER ways in each dimension.
      -e NUMBER     Use error rate NUMBER (default: 0.010000).
      -i FILENAME   Read query string from FILENAME instead of standard input.
      -o FILENAME   Write matchtable to FILENAME instead of standard output.
      -q            Be quiet, don't print statistics to standard output.

map$ ~/tmp/mapinst/bin/findex -h
Usage: /Users/ljosa/tmp/mapinst/bin/findex [OPTION]...
Construct an F-index

  -h            Display this help.
  -V            Display version number.
  -w INTEGER    Window size (default: 4096).
  -c ARGUMENT   Box capacity (default: 1000).
                A numerical argument implies static box capacity.  The following
                arguments specify that adaptive techniques should be used:
                  volume  - Fixed volume.  Requires -v.
                  density - Fixed density.  Requires -d, can use -l.
                  mhistv  - Minimal total volume.  Requires -n.
                  mhistd  - Maximal total density.  Requires -n.
  -v NUMBER     Maximum volume for boxes.  For use with -c volume.
  -d NUMBER     Minimum density for boxes.  For use with -c density.
  -l            Use lookahead.  For use with -c density.
  -n INTEGER    Desired number of boxes.  For use with -c mhistv and -c mhistd.
  -i FILENAME   Read input from FILENAME instead of standard input.
  -o FILENAME   Write index to FILENAME instead of standard output.
  -q            Be quiet, don't print statistics to standard output.
  -D            Debug; turn on verbose output.
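
Putting the two help texts together, a typical run might first build an
F-index with `findex` and then pass that index to `mkmt` along with a
query.  The sketch below uses only options listed above.  The function
name `map_match_table`, the `FINDEX` and `MKMT` variables, and all file
names are hypothetical, and the expected format of the input files is
not documented here.

```shell
#!/bin/sh
# Sketch: build an F-index for one sequence, then a match table for a query.
# Uses only options shown in the -h output; everything else is an assumption.

# Override these if the programs are not on your PATH,
# e.g. FINDEX=$HOME/tmp/mapinst/bin/findex.
: "${FINDEX:=findex}"
: "${MKMT:=mkmt}"

# map_match_table SEQUENCE QUERY INDEX_OUT TABLE_OUT   (hypothetical helper)
map_match_table() {
    if [ "$#" -ne 4 ]; then
        echo "usage: map_match_table SEQUENCE QUERY INDEX_OUT TABLE_OUT" >&2
        return 2
    fi
    # Window size and box capacity are spelled out at their documented defaults.
    "$FINDEX" -w 4096 -c 1000 -i "$1" -o "$3" || return 1
    # -a requests an ASCII match table; the index is the positional argument.
    "$MKMT" -a -i "$2" -o "$4" "$3"
}
```

A call such as `map_match_table seq.txt query.txt seq.idx matches.txt`
would then leave the index in `seq.idx` and the match table in
`matches.txt`.  Add `-q` to either command to suppress the statistics.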

The code to partition a match table and order the MBRs for interactive
search lives inside the older, monolithic program `map`.  Run it
without arguments to see which arguments it expects.  Be warned that
`map` is a lot less polished than `mkmt` and `findex`.  For instance,
it will usually just crash if its arguments or input files are not as
expected.

Please let me know whether the programs work for you.  The code is
released under the BSD license; see the file `LICENSE` for details.


2009-12-31

Vebjorn Ljosa
Broad Institute of MIT and Harvard
ljosa@broad.mit.edu
vebjorn@ljosa.com
http://www.ljosa.com/~ljosa/
Last updated: 2009-12-31