Table< GENOME_TYPE > Class Template Reference

#include <table.h>


Public Member Functions

 Table ()
 Table (ifstream &fd, long numOfRows, long cleanMinLen, double dupPerc, ifstream &orag1DataFile, ifstream &org2DataFile)
 Table (const Table< GENOME_TYPE > &t)
 ~Table ()
void CleanDup ()
void CleanShortEntries ()
void CleanDeleted ()
void CleanProphages (ifstream &fd, int orgNum, double perc)
void CleanTE (ifstream &fd, int orgNum, double perc)
void PrintTable (ofstream &ofs, int org)
void Clustering ()
void RemoveOverlaps ()
void IterativeRemoveOverlaps ()
void RemoveNuisances (double orthPerc)
void RemoveParalogs (double orthPerc)
void Loop (double orthPerc, const char *deletedParFile1, const char *deletedParFile2)
void RemoveFinalConflicts ()
void RemoveFinalOverlaping ()

Private Types

typedef std::set< Entry< GENOME_TYPE >, compare_entries_1< GENOME_TYPE > >::iterator entryIter

Private Member Functions

void InsertEntries (ifstream &fd)
void SwapEntries (unsigned long i, unsigned long j)
void DeleteEntry (unsigned long i)
void SortBy (int org)
double Intersec (unsigned long entry, GENOME_TYPE S, GENOME_TYPE E, int orgNum) const
GENOME_TYPE InersecLen (GENOME_TYPE S, GENOME_TYPE E, GENOME_TYPE tmpS, GENOME_TYPE tmpE, int orgNum) const
GENOME_TYPE GetLength (GENOME_TYPE S, GENOME_TYPE E, int orgNum) const
void ManipulateStartsAndEnds (GENOME_TYPE &S1, GENOME_TYPE &E1, GENOME_TYPE &S2, GENOME_TYPE &E2, int orgNum) const
bool IsDuplication (unsigned long entry, unsigned ofEntry, int orgNum) const
set< Entry< GENOME_TYPE >, compare_entries_1< GENOME_TYPE > > GetDupGroup (unsigned long entry, int orgNum, GENOME_TYPE *endArr)
void FillEnds (GENOME_TYPE *arr, int orgNum) const
unsigned long BinarySearch (GENOME_TYPE S1, GENOME_TYPE E1, GENOME_TYPE S2, GENOME_TYPE E2, int orgNum) const
void Join (unsigned long *arr)
void FilterSet (set< Entry< GENOME_TYPE >, compare_entries_1< GENOME_TYPE > > &s, unsigned long j, int org, bool flag, bool sign=false)
void UpDateCircInfo ()
bool JoinForOverlap (unsigned long i, unsigned long j)
void ReadFromProbFile (ifstream &fileName, string &genumName, GENOME_TYPE &genumLen, bool &circ)
void PrintDeletedParalogs (ofstream &file1, ofstream &file2)
bool JoinForDeletedParalogs (unsigned long i, unsigned long j, int orgNum)

Private Attributes

Entry< GENOME_TYPE > * entries
unsigned long numOfEntries
unsigned long tableSize
unsigned long firstCircOrg1
unsigned long firstCircOrg2
unsigned long lastCircOrg1
unsigned long lastCircOrg2
Genome< GENOME_TYPE > organism1
Genome< GENOME_TYPE > organism2
GENOME_TYPE minLen
Entry< GENOME_TYPE > * deletedParalog1
Entry< GENOME_TYPE > * deletedParalog2
unsigned long deletedParalogCount1
unsigned long deletedParalogCount2
double dupPerc


Detailed Description

template<class GENOME_TYPE>
class Table< GENOME_TYPE >

Class name: Table

This class holds the information of the table and all the operations that are performed on it. The information includes: 1) an array of entries; 2) the number of entries in the table; 3) information about the first genome (of type Genome); 4) information about the second genome (of type Genome).

Author:
Rani Zand


Member Typedef Documentation

template<class GENOME_TYPE>
typedef std::set< Entry<GENOME_TYPE> , compare_entries_1<GENOME_TYPE> >::iterator Table< GENOME_TYPE >::entryIter [private]
 

Used to scan a set.


Constructor & Destructor Documentation

template<class GENOME_TYPE>
Table< GENOME_TYPE >::Table ( ) [inline]
 

Constructor.

template<class GENOME_TYPE>
Table< GENOME_TYPE >::Table ( ifstream & fd,
long numOfRows,
long cleanMinLen,
double dupPerc,
ifstream & orag1DataFile,
ifstream & org2DataFile )
 

Constructor.

Parameters:
fd the file that contains the entries of the table.
numOfRows number of rows in the table.
cleanMinLen a parameter used in 'CleanShortEntries' method.
dupPerc a parameter used in 'IsDuplication' method.
orag1DataFile the stream opened on "org1".prob
org2DataFile the stream opened on "org2".prob

template<class GENOME_TYPE>
Table< GENOME_TYPE >::Table ( const Table< GENOME_TYPE > & t )
 

Copy constructor.

Parameters:
t the table to copy from.

template<class GENOME_TYPE>
Table< GENOME_TYPE >::~Table ( ) [inline]
 

Destructor.


Member Function Documentation

template<class GENOME_TYPE>
unsigned long Table< GENOME_TYPE >::BinarySearch ( GENOME_TYPE S1,
GENOME_TYPE E1,
GENOME_TYPE S2,
GENOME_TYPE E2,
int orgNum
) const [private]


Search for the entry (S1,E1,S2,E2) and return its index, assuming the table is sorted by 'orgNum'.

Parameters:
S1 the start of the first segment.
E1 the end of the first segment.
S2 the start of the second segment.
E2 the end of the second segment.
orgNum the number of the organism.
Returns:
the index of the entry.
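
The reference does not show how the lookup is implemented. As a minimal sketch, assuming the table is sorted by the start coordinate in organism 1, the search could be built on std::lower_bound. The SimpleEntry struct, the FindEntry name and the "return the table size when nothing matches" convention are all hypothetical stand-ins, not part of the documented class.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Entry<GENOME_TYPE>: one aligned segment pair.
struct SimpleEntry {
    long s1, e1;  // start/end in organism 1
    long s2, e2;  // start/end in organism 2
};

// Find the index of (s1,e1,s2,e2) in a table sorted by s1.
// Returns table.size() when no entry matches (an assumed convention).
std::size_t FindEntry(const std::vector<SimpleEntry>& table,
                      long s1, long e1, long s2, long e2) {
    // Binary search for the first entry whose s1 is not less than the key.
    auto it = std::lower_bound(
        table.begin(), table.end(), s1,
        [](const SimpleEntry& e, long key) { return e.s1 < key; });
    // Several entries may share the same s1, so scan that run linearly.
    for (; it != table.end() && it->s1 == s1; ++it) {
        if (it->e1 == e1 && it->s2 == s2 && it->e2 == e2) {
            return static_cast<std::size_t>(it - table.begin());
        }
    }
    return table.size();
}
```

The binary search only narrows the range by start coordinate; the remaining three coordinates are compared in the short linear scan that follows.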

template<class GENOME_TYPE>
void Table< GENOME_TYPE >::CleanDeleted ( )
 

Remove the unused entries from the table.

template<class GENOME_TYPE>
void Table< GENOME_TYPE >::CleanDup ( )
 

Clean duplicated entries (same S,E in both organisms) from the table.
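
One common way to drop exact duplicates is to sort and then remove adjacent equal records. The sketch below assumes that approach; the Rec struct and the DropExactDuplicates name are hypothetical, and the documented method may work differently (for example, without reordering the table).

```cpp
#include <algorithm>
#include <tuple>
#include <vector>

// Hypothetical record: S and E in organism 1, then S and E in organism 2.
struct Rec {
    long s1, e1, s2, e2;
};

// Remove records whose four coordinates are all identical.
// Note: this sketch reorders the records as a side effect of sorting.
void DropExactDuplicates(std::vector<Rec>& v) {
    auto key = [](const Rec& r) { return std::tie(r.s1, r.e1, r.s2, r.e2); };
    std::sort(v.begin(), v.end(),
              [&key](const Rec& a, const Rec& b) { return key(a) < key(b); });
    // After sorting, duplicates are adjacent, so std::unique can drop them.
    v.erase(std::unique(v.begin(), v.end(),
                        [&key](const Rec& a, const Rec& b) {
                            return key(a) == key(b);
                        }),
            v.end());
}
```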

template<class GENOME_TYPE>
void Table< GENOME_TYPE >::CleanProphages ( ifstream & fd,
int orgNum,
double perc )


This method reads a file 'fd' that contains the start (S) and end (E) coordinates of prophages, and deletes every entry whose percentage of intersection with these segments (summed over all prophages) in organism 'orgNum' is bigger than 'perc'.

Parameters:
fd the prophages file.
orgNum the number of the organism to work on.
perc the percentage of intersection above which entries are deleted.
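
The deletion rule can be sketched as follows. This is an illustration under stated assumptions, not the class's code: coordinates are taken to be inclusive and linear, the annotated regions are taken not to overlap each other, and 'perc' is treated as a fraction between 0 and 1. The names CoveredFraction and ShouldDelete are hypothetical. The same pattern would apply to CleanTE.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Fraction of the inclusive segment [s,e] (s <= e) covered by the regions.
// Assumes the regions do not overlap each other, so summing is valid.
double CoveredFraction(long s, long e,
                       const std::vector<std::pair<long, long>>& regions) {
    long covered = 0;
    for (const auto& r : regions) {
        long lo = std::max(s, r.first);
        long hi = std::min(e, r.second);
        if (lo <= hi) covered += hi - lo + 1;  // overlap with this region
    }
    return static_cast<double>(covered) / static_cast<double>(e - s + 1);
}

// An entry is dropped when the summed coverage exceeds the threshold.
bool ShouldDelete(long s, long e,
                  const std::vector<std::pair<long, long>>& regions,
                  double perc) {
    return CoveredFraction(s, e, regions) > perc;
}
```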

template<class GENOME_TYPE>
void Table< GENOME_TYPE >::CleanShortEntries ( )


Remove entries whose length in at least one of the organisms is smaller than 'minLen'.
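
A minimal sketch of this filter, assuming inclusive linear coordinates and a std::vector instead of the class's raw array. The Row struct and the DropShortRows name are hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical row: S and E in organism 1, then S and E in organism 2.
struct Row {
    long s1, e1, s2, e2;
};

// Drop every row that is shorter than minLen in either organism.
void DropShortRows(std::vector<Row>& rows, long minLen) {
    auto tooShort = [minLen](const Row& r) {
        return (r.e1 - r.s1 + 1) < minLen || (r.e2 - r.s2 + 1) < minLen;
    };
    // Erase-remove idiom: keeps the surviving rows in their original order.
    rows.erase(std::remove_if(rows.begin(), rows.end(), tooShort), rows.end());
}
```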

template<class GENOME_TYPE>
void Table< GENOME_TYPE >::CleanTE ( ifstream & fd,
int orgNum,
double perc )


This method reads a file 'fd' that contains the start (S) and end (E) coordinates of transposable elements, and deletes every entry whose percentage of intersection with these segments (summed over all transposable elements) in organism 'orgNum' is bigger than 'perc'.

Parameters:
fd the transposable elements file.
orgNum the number of the organism to work on.
perc the percentage of intersection above which entries are deleted.

template<class GENOME_TYPE>
void Table< GENOME_TYPE >::Clustering ( )
 

This method performs the Clustering step that is explained........

template<class GENOME_TYPE>
void Table< GENOME_TYPE >::DeleteEntry ( unsigned long i ) [private]


Delete entry 'i' from the table.

Parameters:
i the entry to delete.

template<class GENOME_TYPE>
void Table< GENOME_TYPE >::FillEnds ( GENOME_TYPE * arr,
int orgNum
) const [private]


This method updates the values of 'arr', which is an array used to find the duplicates of an entry (more details are found in function 'GetDupGroup').

Parameters:
arr the array that is updated.
orgNum the number of the organism.

template<class GENOME_TYPE>
void Table< GENOME_TYPE >::FilterSet ( set< Entry< GENOME_TYPE >, compare_entries_1< GENOME_TYPE > > & s,
unsigned long j,
int org,
bool flag,
bool sign = false
) [private]


This method filters the entries of the set 's' that are duplicates of entry 'j' in organism 'org'. If 'flag' is true, it also removes every entry of 's' whose sign isn't 'sign'.

Parameters:
s the Set to filter.
j the entry according to which 's' is filtered.
org the number of the organism.
flag if true, filter according to 'sign' as well; if false, do not.
sign the sign to filter by when 'flag' is true.

template<class GENOME_TYPE>
set< Entry< GENOME_TYPE >, compare_entries_1< GENOME_TYPE > > Table< GENOME_TYPE >::GetDupGroup ( unsigned long entry,
int orgNum,
GENOME_TYPE * endArr
) [private]


Return a set of the entries that are duplicates of entry 'entry'. To find these duplications, the method uses an array 'endArr' whose size is the same as the size of the table. The value stored for each entry in 'endArr' is the biggest E in organism 'orgNum' among that entry and all smaller entries in the table. This assumes the table is sorted according to 'orgNum'.

Parameters:
entry the entry whose duplications are returned.
orgNum the number of the organism.
endArr the array used to find the duplications.
Returns:
a set of the duplication entries.

template<class GENOME_TYPE>
GENOME_TYPE Table< GENOME_TYPE >::GetLength ( GENOME_TYPE S,
GENOME_TYPE E,
int orgNum
) const [private]


Return the length of segment [S,E] in organism 'orgNum'.

Parameters:
S the start of the segment.
E the end of the segment.
orgNum the number of the organism.
Returns:
the length of [S,E].
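
Because a genome may be circular, the length of a segment is not always E - S + 1. A minimal sketch, assuming 1-based inclusive coordinates and assuming that S > E marks a segment that wraps past the origin of a circular genome. The SegmentLength name and the explicit genomeLen parameter are hypothetical; the documented method takes the organism number instead.

```cpp
// Length of the inclusive segment [s,e] on a genome of length genomeLen.
// Assumption: s > e means the segment wraps around the origin.
long SegmentLength(long s, long e, long genomeLen) {
    if (s <= e) return e - s + 1;   // ordinary linear segment
    return (genomeLen - s + 1) + e; // tail of the genome plus its head
}
```

For example, on a genome of length 100, the segment [95,5] covers positions 95..100 and 1..5.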

template<class GENOME_TYPE>
GENOME_TYPE Table< GENOME_TYPE >::InersecLen ( GENOME_TYPE S,
GENOME_TYPE E,
GENOME_TYPE tmpS,
GENOME_TYPE tmpE,
int orgNum
) const [private]


Return the intersection length between segment [S,E] and segment [tmpS,tmpE] in organism 'orgNum'.

Parameters:
S the start of the first segment.
E the end of the first segment.
tmpS the start of the second segment.
tmpE the end of the second segment.
orgNum the number of the organism.
Returns:
the intersection length

template<class GENOME_TYPE>
void Table< GENOME_TYPE >::InsertEntries ( ifstream & fd ) [private]
 

This method gets an entries file and inserts the entries in the table.

Parameters:
fd the file that contains the entries.

template<class GENOME_TYPE>
double Table< GENOME_TYPE >::Intersec ( unsigned long entry,
GENOME_TYPE S,
GENOME_TYPE E,
int orgNum
) const [private]


Return the percentage of the intersection between 'entry' and segment [S,E] in organism 'orgNum'. If they don't intersect, 0 is returned.

Parameters:
entry the entry working on.
S the start of the segment.
E the end of the segment.
orgNum the number of the organism.
Returns:
the percentage of intersection.

template<class GENOME_TYPE>
bool Table< GENOME_TYPE >::IsDuplication ( unsigned long entry,
unsigned ofEntry,
int orgNum
) const [private]
 

This method returns true if 'entry' is a duplicate of 'ofEntry' in 'orgNum', false if not.

Parameters:
entry the first entry.
ofEntry the second entry.
orgNum the number of the organism.
Returns:
true if 'entry' is a duplicate of 'ofEntry' in 'orgNum', false if not.

template<class GENOME_TYPE>
void Table< GENOME_TYPE >::IterativeRemoveOverlaps ( )


This method keeps running the 'RemoveOverlaps' method until there are no more entries to join.
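
Repetition is needed because a join can create new overlaps: two entries that did not intersect at first may both intersect the merged entry. The sketch below illustrates this fixed-point loop under simplifying assumptions (linear inclusive coordinates, no strand sign, a joined entry spanning both originals). Pair2, Overlaps, MergeOnce and MergeUntilStable are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical entry: S and E in organism 1, then S and E in organism 2.
struct Pair2 {
    long s1, e1, s2, e2;
};

// True if the inclusive segments [a,b] and [c,d] share a position.
bool Overlaps(long a, long b, long c, long d) {
    return std::max(a, c) <= std::min(b, d);
}

// Join the first pair that intersects in both organisms.
// Returns true if a join happened.
bool MergeOnce(std::vector<Pair2>& t) {
    for (std::size_t i = 0; i < t.size(); ++i) {
        for (std::size_t j = i + 1; j < t.size(); ++j) {
            if (Overlaps(t[i].s1, t[i].e1, t[j].s1, t[j].e1) &&
                Overlaps(t[i].s2, t[i].e2, t[j].s2, t[j].e2)) {
                // The joined entry spans both originals in each organism.
                t[i].s1 = std::min(t[i].s1, t[j].s1);
                t[i].e1 = std::max(t[i].e1, t[j].e1);
                t[i].s2 = std::min(t[i].s2, t[j].s2);
                t[i].e2 = std::max(t[i].e2, t[j].e2);
                t.erase(t.begin() + static_cast<std::ptrdiff_t>(j));
                return true;
            }
        }
    }
    return false;
}

// Repeat until a full pass finds nothing to join; returns the join count.
std::size_t MergeUntilStable(std::vector<Pair2>& t) {
    std::size_t joins = 0;
    while (MergeOnce(t)) ++joins;
    return joins;
}
```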

template<class GENOME_TYPE>
void Table< GENOME_TYPE >::Join ( unsigned long * arr ) [private]


This method gets an array 'arr' that is as big as the table. The array contains entry numbers: arr[i] = j means that entry 'i' will be joined with entry 'j'.

Parameters:
arr the array of indexes
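
The index-array convention can be illustrated as follows. This is a sketch under several assumptions that the reference does not confirm: arr[i] == i is taken to mean "leave entry i alone", a join target always has a larger index than its source, joined entries span both originals, and the source entry is only flagged as deleted. JEntry and ApplyJoins are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical entry with a flag marking it as unused.
struct JEntry {
    long s1, e1, s2, e2;
    bool deleted;
};

// Apply a join plan: plan[i] == j (j != i) folds entry i into entry j.
// plan[i] == i means "no join" (an assumed sentinel).
void ApplyJoins(std::vector<JEntry>& t, const std::vector<std::size_t>& plan) {
    for (std::size_t i = 0; i < t.size() && i < plan.size(); ++i) {
        std::size_t j = plan[i];
        if (j == i || j >= t.size() || t[i].deleted) continue;
        // Grow entry j so that it covers entry i in both organisms.
        t[j].s1 = std::min(t[j].s1, t[i].s1);
        t[j].e1 = std::max(t[j].e1, t[i].e1);
        t[j].s2 = std::min(t[j].s2, t[i].s2);
        t[j].e2 = std::max(t[j].e2, t[i].e2);
        t[i].deleted = true;  // entry i is now unused
    }
}
```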

template<class GENOME_TYPE>
bool Table< GENOME_TYPE >::JoinForDeletedParalogs ( unsigned long i,
unsigned long j,
int orgNum
) [private]


If entry 'i' and entry 'j' intersect in organism 'orgNum' with a percentage greater than 0.7, these entries are joined into one entry and true is returned; otherwise nothing is done and false is returned.

Parameters:
i the first entry.
j the second entry.
orgNum the number of the organism.
Returns:
true if joined, false if not.

template<class GENOME_TYPE>
bool Table< GENOME_TYPE >::JoinForOverlap ( unsigned long i,
unsigned long j
) [private]


This method is used in the 'RemoveOverlaps' method. It gets two entries, 'i' and 'j', and checks whether they intersect in both organisms. If so, it joins them into one entry and returns true; otherwise it returns false.

Parameters:
i the first entry.
j the second entry.
Returns:
true if joined, false if not.

template<class GENOME_TYPE>
void Table< GENOME_TYPE >::Loop ( double orthPerc,
const char * deletedParFile1,
const char * deletedParFile2 )
 

This method keeps on running 'Clustering', 'RemoveNuisances' and 'RemoveParalogs' methods until convergence of the table.

Parameters:
orthPerc the percentage for the Paralog step.
deletedParFile1 the file that will contain the deleted entries of the paralog step in organism 1.
deletedParFile2 the file that will contain the deleted entries of the paralog step in organism 2.

template<class GENOME_TYPE>
void Table< GENOME_TYPE >::ManipulateStartsAndEnds ( GENOME_TYPE & S1,
GENOME_TYPE & E1,
GENOME_TYPE & S2,
GENOME_TYPE & E2,
int orgNum
) const [private]


Used in method Intersec. If one of the segments [S1,E1] or [S2,E2] is circular, the organism's length is added to some of S1, E1, S2 and E2, according to how the two segments intersect, so that method Intersec can calculate the percentage of intersection.

Parameters:
S1 the start of the first segment.
E1 the end of the first segment.
S2 the start of the second segment.
E2 the end of the second segment.
orgNum the number of the organism.
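
The idea of adding the genome length to some coordinates can be sketched as follows. This is not the method's own logic, which adjusts the four values in place; it is an alternative illustration that unwraps each segment and then tries the second one at three offsets. It assumes 1-based inclusive coordinates, that E < S marks a wrapping segment, and that neither segment is longer than the genome. Unwrap and CircularOverlap are hypothetical names.

```cpp
#include <algorithm>
#include <initializer_list>

// If [s,e] wraps around the origin (e < s), extend e past the genome end
// so that the segment becomes an ordinary interval with s <= e.
void Unwrap(long& s, long& e, long len) {
    if (e < s) e += len;
}

// Number of positions shared by two segments on a circular genome.
long CircularOverlap(long s1, long e1, long s2, long e2, long len) {
    Unwrap(s1, e1, len);
    Unwrap(s2, e2, len);
    long total = 0;
    // Compare segment 1 with segment 2 as is and shifted by one genome
    // length in each direction; the shifted copies cannot overlap each
    // other, so their contributions can be summed.
    for (long shift : {-len, 0L, len}) {
        long lo = std::max(s1, s2 + shift);
        long hi = std::min(e1, e2 + shift);
        if (lo <= hi) total += hi - lo + 1;
    }
    return total;
}
```

On a genome of length 100, the wrapping segment [90,10] and the segment [5,95] share positions 90..95 and 5..10, which is why the contributions of two offsets are added rather than taking the larger one.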

template<class GENOME_TYPE>
void Table< GENOME_TYPE >::PrintDeletedParalogs ( ofstream & file1,
ofstream & file2
) [private]


Print the deleted paralogs from both organisms to 'file1' and 'file2'. Note: deleted paralogs that intersect with a percentage greater than 0.7 are joined.

Parameters:
file1 the first file.
file2 the second file.

template<class GENOME_TYPE>
void Table< GENOME_TYPE >::PrintTable ( ofstream & ofs,
int org )
 

Prints the table to file 'ofs' sorted according to 'org'.

Parameters:
ofs the file to print the table to.
org the number of the organism by which the table is sorted.

template<class GENOME_TYPE>
void Table< GENOME_TYPE >::ReadFromProbFile ( ifstream & fileName,
string & genumName,
GENOME_TYPE & genumLen,
bool & circ
) [private]


This method reads a .prob file 'fileName' and gets from it the genome's length and circularity.

Parameters:
fileName the .prob file.
genumName reference to the genome's name.
genumLen reference to the genome's length.
circ reference to the genome's circularity.

template<class GENOME_TYPE>
void Table< GENOME_TYPE >::RemoveFinalConflicts ( )


For every two entries such that one is a duplication of the other in one of the organisms, both are deleted from the table.

template<class GENOME_TYPE>
void Table< GENOME_TYPE >::RemoveFinalOverlaping ( )


For each entry 'i', if there is an entry 'j' such that both entries intersect in one of the organisms, one of the entries is made smaller in both organisms (by the same percentage) so that 'i' and 'j' no longer intersect.

template<class GENOME_TYPE>
void Table< GENOME_TYPE >::RemoveNuisances ( double orthPerc )


For each entry 'i' we get its duplications in organism 1, s1, and its duplications in organism 2, s2. If there is an entry 'j' in s1 such that length(i)/length(j) <= 'orthPerc' and there is an entry 'k' in s2 such that length(i)/length(k) <= 'orthPerc', then entry 'i' is deleted from the table.

template<class GENOME_TYPE>
void Table< GENOME_TYPE >::RemoveOverlaps ( )


For each entry 'i' in the table, check whether there is an entry 'j' such that 'i' and 'j' intersect in both organisms. If so, 'i' and 'j' are joined into one entry.

template<class GENOME_TYPE>
void Table< GENOME_TYPE >::RemoveParalogs ( double orthPerc )


For each entry 'i' we get its duplications in organism 1, s1, and its duplications in organism 2, s2. If there is an entry 'j' in s1 such that length(i)/length(j) <= 'orthPerc' or there is an entry 'k' in s2 such that length(i)/length(k) <= 'orthPerc', then entry 'i' is deleted from the table.
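
'RemoveNuisances' and 'RemoveParalogs' apply the same length-ratio test and differ only in how the results for the two organisms are combined: 'and' for nuisances, 'or' for paralogs. The sketch below shows that difference, assuming the lengths of entry 'i' and of its duplications have already been computed. The reference does not say in which organism each length is measured, so the sketch simply takes lengths as inputs. Dominated, IsNuisance and IsParalog are hypothetical names.

```cpp
#include <vector>

// True if some duplication is long enough that lenI / lenDup <= orthPerc.
bool Dominated(double lenI, const std::vector<double>& dupLens,
               double orthPerc) {
    for (double d : dupLens) {
        if (d > 0 && lenI / d <= orthPerc) return true;
    }
    return false;
}

// Nuisance rule: the ratio test must succeed in organism 1 AND organism 2.
bool IsNuisance(double lenI, const std::vector<double>& s1Lens,
                const std::vector<double>& s2Lens, double orthPerc) {
    return Dominated(lenI, s1Lens, orthPerc) &&
           Dominated(lenI, s2Lens, orthPerc);
}

// Paralog rule: the ratio test must succeed in organism 1 OR organism 2.
bool IsParalog(double lenI, const std::vector<double>& s1Lens,
               const std::vector<double>& s2Lens, double orthPerc) {
    return Dominated(lenI, s1Lens, orthPerc) ||
           Dominated(lenI, s2Lens, orthPerc);
}
```

Every entry that satisfies the nuisance rule also satisfies the paralog rule, so the paralog step is the stricter filter of the two.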

template<class GENOME_TYPE>
void Table< GENOME_TYPE >::SortBy ( int org ) [private]
 

Sort the table according to 'org'.

Parameters:
org the number of the organism to sort according to

template<class GENOME_TYPE>
void Table< GENOME_TYPE >::SwapEntries ( unsigned long i,
unsigned long j
) [private]
 

Swap entry i with entry j.

Parameters:
i number of first entry to swap.
j number of second entry to swap.

template<class GENOME_TYPE>
void Table< GENOME_TYPE >::UpDateCircInfo ( ) [private]


If one of the organisms is circular, this method updates the values of 'firstCircOrg1', 'firstCircOrg2', 'lastCircOrg1' and 'lastCircOrg2'.


Member Data Documentation

template<class GENOME_TYPE>
Entry<GENOME_TYPE>* Table< GENOME_TYPE >::deletedParalog1 [private]
 

Used to print the deleted entries in organism 1 from the RemoveParalog step.

template<class GENOME_TYPE>
Entry<GENOME_TYPE>* Table< GENOME_TYPE >::deletedParalog2 [private]
 

Used to print the deleted entries in organism 2 from the RemoveParalog step.

template<class GENOME_TYPE>
unsigned long Table< GENOME_TYPE >::deletedParalogCount1 [private]
 

Counter of the number of entries that were deleted in the RemoveParalog step in organism 1.

template<class GENOME_TYPE>
unsigned long Table< GENOME_TYPE >::deletedParalogCount2 [private]
 

Counter of the number of entries that were deleted in the RemoveParalog step in organism 2.

template<class GENOME_TYPE>
double Table< GENOME_TYPE >::dupPerc [private]
 

A parameter used to find duplicated entries.

template<class GENOME_TYPE>
Entry<GENOME_TYPE>* Table< GENOME_TYPE >::entries [private]
 

The entries of the table.

template<class GENOME_TYPE>
unsigned long Table< GENOME_TYPE >::firstCircOrg1 [private]
 

The number of the first entry that is circular in organism 1.

template<class GENOME_TYPE>
unsigned long Table< GENOME_TYPE >::firstCircOrg2 [private]
 

The number of the first entry that is circular in organism 2.

template<class GENOME_TYPE>
unsigned long Table< GENOME_TYPE >::lastCircOrg1 [private]
 

The number of the last entry that is circular in organism 1.

template<class GENOME_TYPE>
unsigned long Table< GENOME_TYPE >::lastCircOrg2 [private]
 

The number of the last entry that is circular in organism 2.

template<class GENOME_TYPE>
GENOME_TYPE Table< GENOME_TYPE >::minLen [private]
 

A parameter used to delete all entries whose length is smaller than minLen.

template<class GENOME_TYPE>
unsigned long Table< GENOME_TYPE >::numOfEntries [private]
 

Number of entries that are in use in the table.

template<class GENOME_TYPE>
Genome<GENOME_TYPE> Table< GENOME_TYPE >::organism1 [private]
 

Information about the first organism.

template<class GENOME_TYPE>
Genome<GENOME_TYPE> Table< GENOME_TYPE >::organism2 [private]
 

Information about the second organism.

template<class GENOME_TYPE>
unsigned long Table< GENOME_TYPE >::tableSize [private]
 

Number of entries in the table (including unused entries).


The documentation for this class was generated from the following file:
Generated on Sat May 6 13:40:42 2006 for MAGIC by  doxygen 1.4.6-NO