#include <table.h>
Public Member Functions
  Table ()
  Table (ifstream &fd, long numOfRows, long cleanMinLen, double dupPerc, ifstream &orag1DataFile, ifstream &org2DataFile)
  Table (const Table< GENOME_TYPE > &t)
  ~Table ()
  void CleanDup ()
  void CleanShortEntries ()
  void CleanDeleted ()
  void CleanProphages (ifstream &fd, int orgNum, double perc)
  void CleanTE (ifstream &fd, int orgNum, double perc)
  void PrintTable (ofstream &ofs, int org)
  void Clustering ()
  void RemoveOverlaps ()
  void IterativeRemoveOverlaps ()
  void RemoveNuisances (double orthPerc)
  void RemoveParalogs (double orthPerc)
  void Loop (double orthPerc, const char *deletedParFile1, const char *deletedParFile2)
  void RemoveFinalConflicts ()
  void RemoveFinalOverlaping ()
Private Types
  typedef std::set< Entry< GENOME_TYPE >, compare_entries_1< GENOME_TYPE > >::iterator entryIter
Private Member Functions
  void InsertEntries (ifstream &fd)
  void SwapEntries (unsigned long i, unsigned long j)
  void DeleteEntry (unsigned long i)
  void SortBy (int org)
  double Intersec (unsigned long entry, GENOME_TYPE S, GENOME_TYPE E, int orgNum) const
  GENOME_TYPE InersecLen (GENOME_TYPE S, GENOME_TYPE E, GENOME_TYPE tmpS, GENOME_TYPE tmpE, int orgNum) const
  GENOME_TYPE GetLength (GENOME_TYPE S, GENOME_TYPE E, int orgNum) const
  void ManipulateStartsAndEnds (GENOME_TYPE &S1, GENOME_TYPE &E1, GENOME_TYPE &S2, GENOME_TYPE &E2, int orgNum) const
  bool IsDuplication (unsigned long entry, unsigned ofEntry, int orgNum) const
  set< Entry< GENOME_TYPE >, compare_entries_1< GENOME_TYPE > > GetDupGroup (unsigned long entry, int orgNum, GENOME_TYPE *endArr)
  void FillEnds (GENOME_TYPE *arr, int orgNum) const
  unsigned long BinarySearch (GENOME_TYPE S1, GENOME_TYPE E1, GENOME_TYPE S2, GENOME_TYPE E2, int orgNum) const
  void Join (unsigned long *arr)
  void FilterSet (set< Entry< GENOME_TYPE >, compare_entries_1< GENOME_TYPE > > &s, unsigned long j, int org, bool flag, bool sign=false)
  void UpDateCircInfo ()
  bool JoinForOverlap (unsigned long i, unsigned long j)
  void ReadFromProbFile (ifstream &fileName, string &genumName, GENOME_TYPE &genumLen, bool &circ)
  void PrintDeletedParalogs (ofstream &file1, ofstream &file2)
  bool JoinForDeletedParalogs (unsigned long i, unsigned long j, int orgNum)
Private Attributes
  Entry< GENOME_TYPE > * entries
  unsigned long numOfEntries
  unsigned long tableSize
  unsigned long firstCircOrg1
  unsigned long firstCircOrg2
  unsigned long lastCircOrg1
  unsigned long lastCircOrg2
  Genome< GENOME_TYPE > organism1
  Genome< GENOME_TYPE > organism2
  GENOME_TYPE minLen
  Entry< GENOME_TYPE > * deletedParalog1
  Entry< GENOME_TYPE > * deletedParalog2
  unsigned long deletedParalogCount1
  unsigned long deletedParalogCount2
  double dupPerc
This class holds the information of the table and all the operations that are performed on the table. The information includes: 1) an array of entries; 2) the number of entries in the table; 3) information about the first genome (of type Genome); 4) information about the second genome (of type Genome).

Used to scan a set.

Constructor.

Constructor.
|
Copy constructor.
|
Destructor.

Search for the entry S1,E1,S2,E2 and return its index, assuming the table is sorted by 'orgNum'.
|
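The lookup can be sketched with std::lower_bound. This is a minimal sketch, not the class's implementation: it assumes linear coordinates and that "sorted by 'orgNum'" means lexicographic order on (S, E) in that organism, with the other organism's (S, E) as tie-breakers. `Entry`, `Key` and `FindEntry` are hypothetical names, and returning `table.size()` for a missing entry is an assumed convention.

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <vector>

using Pos = long;  // hypothetical stand-in for GENOME_TYPE

// Hypothetical entry: segment [s1,e1] in organism 1 and [s2,e2] in organism 2.
struct Entry {
    Pos s1, e1, s2, e2;
};

// Sort key for a table sorted by organism `org` (1 or 2): that organism's
// coordinates first, the other organism's coordinates as tie-breakers.
static std::tuple<Pos, Pos, Pos, Pos> Key(const Entry& x, int org) {
    return org == 1 ? std::make_tuple(x.s1, x.e1, x.s2, x.e2)
                    : std::make_tuple(x.s2, x.e2, x.s1, x.e1);
}

// Binary search for the entry (S1,E1,S2,E2); assumes `table` is sorted by Key(., org).
// Returns its index, or table.size() if it is absent.
std::size_t FindEntry(const std::vector<Entry>& table, Pos S1, Pos E1, Pos S2, Pos E2, int org) {
    const Entry target{S1, E1, S2, E2};
    auto it = std::lower_bound(table.begin(), table.end(), target,
        [org](const Entry& a, const Entry& b) { return Key(a, org) < Key(b, org); });
    if (it != table.end() && Key(*it, org) == Key(target, org))
        return static_cast<std::size_t>(it - table.begin());
    return table.size();
}
```

Sorting once with the same key and then calling FindEntry repeatedly gives O(log n) lookups per query.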
Remove the unused entries from the table.

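One way to remove unused entries is the erase-remove idiom. This sketch assumes each entry carries a hypothetical `deleted` flag and uses `std::vector` in place of the class's array (where, judging by the attributes, 'numOfEntries' counts the entries in use and 'tableSize' all entries). `CompactTable` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using Pos = long;  // hypothetical stand-in for GENOME_TYPE

// Hypothetical entry with a "deleted" mark; the real Entry class may track this differently.
struct Entry {
    Pos s1, e1, s2, e2;
    bool deleted;
};

// Compact the table in place: keep the entries still in use, in their original order,
// and drop the ones marked as deleted. Returns the new number of entries in use.
std::size_t CompactTable(std::vector<Entry>& table) {
    auto firstUnused = std::remove_if(table.begin(), table.end(),
                                      [](const Entry& x) { return x.deleted; });
    table.erase(firstUnused, table.end());
    return table.size();
}
```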
Clean duplicated entries (entries with the same S and E in both organisms) from the table.

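A minimal sketch of exact-duplicate removal with sort and `std::unique`, assuming "clean" means keeping one copy of each group of entries with identical S and E in both organisms (the text does not say whether one copy survives). `Entry`, `Coords` and `RemoveExactDuplicates` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <vector>

using Pos = long;  // hypothetical stand-in for GENOME_TYPE
struct Entry { Pos s1, e1, s2, e2; };

// All four coordinates as one comparable key.
static std::tuple<Pos, Pos, Pos, Pos> Coords(const Entry& x) {
    return std::make_tuple(x.s1, x.e1, x.s2, x.e2);
}

// Keep one copy of each group of entries that share the same S and E in both organisms.
void RemoveExactDuplicates(std::vector<Entry>& table) {
    std::sort(table.begin(), table.end(),
              [](const Entry& a, const Entry& b) { return Coords(a) < Coords(b); });
    auto last = std::unique(table.begin(), table.end(),
              [](const Entry& a, const Entry& b) { return Coords(a) == Coords(b); });
    table.erase(last, table.end());
}
```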
This method gets a file 'fd' that contains the S and E values of prophages, and deletes every entry whose percentage of intersection with these segments (summed over all prophages) in organism 'orgNum' is bigger than 'perc'.
|
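The summed-intersection test can be sketched as follows for the entries' segments in one organism. Assumptions: closed, non-wrapping intervals; the percentage is the summed overlap length divided by the entry's length; and 'perc' is a fraction in [0,1]. Because overlaps are summed per prophage, as the text says, prophage segments that overlap each other are counted twice. `Segment`, `CoveredFraction` and `DropCoveredEntries` are hypothetical names; the same sketch applies to CleanTE with transposable-element segments.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using Pos = long;  // hypothetical stand-in for GENOME_TYPE
struct Segment { Pos s, e; };  // closed interval [s,e], s <= e (no wrap-around assumed)

// Length of the overlap of two closed intervals, 0 if they are disjoint.
static Pos OverlapLen(Segment a, Segment b) {
    const Pos lo = std::max(a.s, b.s), hi = std::min(a.e, b.e);
    return hi >= lo ? hi - lo + 1 : 0;
}

// Fraction of `entry` covered by `regions`, summed over all regions.
double CoveredFraction(Segment entry, const std::vector<Segment>& regions) {
    Pos covered = 0;
    for (const Segment& r : regions) covered += OverlapLen(entry, r);
    return static_cast<double>(covered) / (entry.e - entry.s + 1);
}

// Drop every entry whose covered fraction is bigger than `perc`.
void DropCoveredEntries(std::vector<Segment>& entries, const std::vector<Segment>& regions,
                        double perc) {
    entries.erase(std::remove_if(entries.begin(), entries.end(),
                      [&](const Segment& x) { return CoveredFraction(x, regions) > perc; }),
                  entries.end());
}
```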
Clean the entries whose length in at least one of the organisms is smaller than 'minLen'.

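A sketch of the length filter, assuming closed intervals whose length is E - S + 1 and ignoring circular wrap-around; `Entry`, `Len` and `DropShortEntries` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using Pos = long;  // hypothetical stand-in for GENOME_TYPE
struct Entry { Pos s1, e1, s2, e2; };

// Linear length of a closed interval (wrap-around ignored in this sketch).
static Pos Len(Pos s, Pos e) { return e - s + 1; }

// Remove entries that are shorter than minLen in at least one of the two organisms.
void DropShortEntries(std::vector<Entry>& table, Pos minLen) {
    table.erase(std::remove_if(table.begin(), table.end(), [minLen](const Entry& x) {
                    return Len(x.s1, x.e1) < minLen || Len(x.s2, x.e2) < minLen;
                }),
                table.end());
}
```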
This method gets a file 'fd' that contains the S and E values of transposable elements, and deletes every entry whose percentage of intersection with these segments (summed over all transposable elements) in organism 'orgNum' is bigger than 'perc'.
|
This method performs the Clustering step that is explained ...

Delete entry 'i' from the table.
|
This method updates the values of 'arr', an array used to find the duplicates of an entry (more details are found in function 'GetDupGroup').
|
This method filters all entries of the set 's' that are duplicates of entry 'j' in organism 'org'. If 'flag' is true, it removes all the entries of 's' whose sign isn't 'sign'.
|
Return a set of the entries that are duplicates of entry 'entry'. To find the duplicates of 'entry' we use an array 'endArr' whose size is the same as the size of the table. Assuming the table is sorted according to 'orgNum', the value of each cell in 'endArr' is the biggest E, in organism 'orgNum', among that entry and all smaller entries in the table.
|
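'endArr' as described is a prefix maximum: since the table is sorted by S, once endArr[k] < S_i no entry at or before k can reach entry i, so a backward scan can stop there. The sketch below shows that idea with plain interval overlap standing in for the class's duplicate test, whose exact criterion is not documented here; it scans earlier entries only. `Segment`, `FillPrefixMaxEnds` and `EarlierOverlaps` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using Pos = long;  // hypothetical stand-in for GENOME_TYPE
struct Segment { Pos s, e; };

// endArr[i] = largest E among entries 0..i (table sorted by S in the chosen organism).
std::vector<Pos> FillPrefixMaxEnds(const std::vector<Segment>& sorted) {
    std::vector<Pos> endArr(sorted.size());
    for (std::size_t i = 0; i < sorted.size(); ++i)
        endArr[i] = (i == 0) ? sorted[i].e : std::max(endArr[i - 1], sorted[i].e);
    return endArr;
}

// Indices of earlier entries that overlap entry i. The backward scan stops as soon
// as endArr[k] < S_i, because then no entry at or before k can reach S_i.
std::vector<std::size_t> EarlierOverlaps(const std::vector<Segment>& sorted,
                                         const std::vector<Pos>& endArr, std::size_t i) {
    std::vector<std::size_t> out;
    for (std::size_t k = i; k-- > 0 && endArr[k] >= sorted[i].s;)
        if (sorted[k].e >= sorted[i].s) out.push_back(k);  // S_k <= S_i since sorted
    return out;
}
```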
Return the length of segment [S,E] in organism 'orgNum'.
|
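A sketch of a circular-aware length, assuming 1-based closed coordinates and that S > E marks a segment wrapping past the origin of a circular genome; neither convention is stated above, and `SegmentLength` is a hypothetical name.

```cpp
#include <cassert>

using Pos = long;  // hypothetical stand-in for GENOME_TYPE

// Length of the closed segment [S,E] on a genome of length genomeLen (1-based coordinates).
// If S > E on a circular genome, the segment is assumed to wrap past the origin.
Pos SegmentLength(Pos S, Pos E, Pos genomeLen, bool circular) {
    if (S <= E) return E - S + 1;
    assert(circular && "S > E is only meaningful on a circular genome");
    return (genomeLen - S + 1) + E;  // [S, genomeLen] plus [1, E]
}
```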
Return the intersection length between segment [S,E] and segment [tmpS,tmpE] in organism 'orgNum'.
|
This method gets an entries file and inserts its entries into the table.
|
Return the percentage of intersection between 'entry' and segment [S,E] in organism 'orgNum'. If they don't intersect, 0 is returned.
|
This method returns true if 'entry' is a duplicate of 'ofEntry' in 'orgNum', and false otherwise.
|
This method keeps running the 'RemoveOverlaps' method until there are no more entries to join.

This method gets an array 'arr' that is as big as the table. The array contains entry numbers: arr[i] = j means that entry 'i' will be joined with entry 'j'.
|
If entries 'i' and 'j' intersect in organism 'orgNum' with a percentage of more than 0.7, they are joined into one entry and true is returned; otherwise nothing is done and false is returned.
|
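A sketch of the 0.7-threshold join in one organism. The text does not say what the percentage is relative to; this sketch assumes the overlap divided by the length of the shorter segment, and that the joined entry spans both originals. `Segment`, `OverlapFraction` and `JoinIfMostlyOverlapping` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>

using Pos = long;  // hypothetical stand-in for GENOME_TYPE
struct Segment { Pos s, e; };

// Overlap of two closed segments divided by the length of the shorter one
// (the denominator is an assumption; the documentation does not say).
double OverlapFraction(Segment a, Segment b) {
    const Pos lo = std::max(a.s, b.s), hi = std::min(a.e, b.e);
    if (hi < lo) return 0.0;
    const Pos shorter = std::min(a.e - a.s + 1, b.e - b.s + 1);
    return static_cast<double>(hi - lo + 1) / shorter;
}

// Join b into a when they overlap by more than `threshold` (0.7 in the text);
// returns true if the join happened, false (leaving a unchanged) otherwise.
bool JoinIfMostlyOverlapping(Segment& a, const Segment& b, double threshold = 0.7) {
    if (OverlapFraction(a, b) <= threshold) return false;
    a.s = std::min(a.s, b.s);
    a.e = std::max(a.e, b.e);
    return true;
}
```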
This method is used by the 'RemoveOverlaps' method. It gets two entries, 'i' and 'j', and checks whether they intersect in both organisms; if so, it joins them into one entry and returns true, otherwise it returns false.
|
This method keeps running the 'Clustering', 'RemoveNuisances' and 'RemoveParalogs' methods until the table converges.
|
Used in method Intersec. If one of the segments [S1,E1] or [S2,E2] is circular, the organism's length is added to some of S1, E1, S2, E2, according to how the two segments intersect, so that method Intersec can calculate the percentage of intersection.
|
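The circular case can also be handled without shifting coordinates: split a wrapping segment into two linear pieces and sum the pairwise overlaps. This is an alternative technique to the coordinate shift described above, not the class's method, shown under the assumptions of 1-based closed coordinates and S > E meaning that the segment wraps the origin. `Pieces` and `CircularIntersection` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

using Pos = long;  // hypothetical stand-in for GENOME_TYPE

// Split the closed segment [S,E] on a circular genome of length L (1-based) into linear
// pieces: one piece if S <= E, otherwise [S,L] and [1,E] (the segment wraps the origin).
static std::vector<std::pair<Pos, Pos>> Pieces(Pos S, Pos E, Pos L) {
    if (S <= E) return {{S, E}};
    return {{S, L}, {1, E}};
}

// Intersection length of two segments on a circular genome. The pieces of each
// segment are disjoint, so summing the overlaps of every pair of pieces is exact.
Pos CircularIntersection(Pos S1, Pos E1, Pos S2, Pos E2, Pos L) {
    Pos total = 0;
    for (const auto& a : Pieces(S1, E1, L))
        for (const auto& b : Pieces(S2, E2, L)) {
            const Pos lo = std::max(a.first, b.first), hi = std::min(a.second, b.second);
            if (hi >= lo) total += hi - lo + 1;
        }
    return total;
}
```

For example, on a genome of length 100 the segments [50,20] and [80,60] intersect in two separate pieces (80..100 together with 1..20, and 50..60), 52 positions in total, which the pairwise sum counts.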
Print the deleted paralogs from both organisms to 'file1' and 'file2'. Note: deleted paralogs that intersect with a percentage of more than 0.7 are joined.
|
Prints the table to file 'ofs' sorted according to 'org'.
|
This method reads a .prop file 'fileName' and gets from it the genome's name, length and circularity.
|
For every two entries where one is a duplicate of the other in one of the organisms, both entries are deleted from the table.

For each entry 'i', if there is an entry 'j' such that both entries intersect in one of the organisms, one of the entries is made smaller in both organisms (by the same percentage) so that 'i' and 'j' no longer intersect.

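A sketch of proportional shrinking for one case: entry 'b' starts inside entry 'a' in organism 1. The overlapping prefix is cut off 'b' in organism 1, and the same fraction of b's length is cut off its start in organism 2, rounded up with integer arithmetic. Which entry is shrunk, which end is cut, and the rounding are assumptions (for an inverted match the organism-2 cut would belong at the other end); `Entry` and `TrimProportionally` are hypothetical names.

```cpp
#include <cassert>

using Pos = long long;  // hypothetical stand-in for GENOME_TYPE (64-bit to avoid overflow below)
struct Entry { Pos s1, e1, s2, e2; };  // same orientation in both organisms (assumption)

// If b starts inside a in organism 1, cut the overlapping prefix off b there and cut the
// same fraction of b's length off its start in organism 2. Returns false if nothing is
// trimmed (no such overlap, or b would vanish).
bool TrimProportionally(const Entry& a, Entry& b) {
    if (b.s1 < a.s1 || b.s1 > a.e1) return false;      // b does not start inside a
    const Pos len1 = b.e1 - b.s1 + 1, len2 = b.e2 - b.s2 + 1;
    const Pos cut1 = a.e1 - b.s1 + 1;                    // overlapping prefix of b in org 1
    if (cut1 >= len1) return false;                      // b lies entirely inside a
    const Pos cut2 = (cut1 * len2 + len1 - 1) / len1;    // same fraction in org 2, rounded up
    if (cut2 >= len2) return false;
    b.s1 += cut1;
    b.s2 += cut2;
    return true;
}
```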
For each entry 'i' we get its duplicates in organism 1, s1, and its duplicates in organism 2, s2. If there is an entry 'j' in s1 such that length(i)/length(j) <= 'orthPerc' and there is an entry 'k' in s2 such that length(i)/length(k) <= 'orthPerc', then entry 'i' is deleted from the table.

Check for each entry 'i' in the table whether there is an entry 'j' such that 'i' and 'j' intersect in both organisms; if so, 'i' and 'j' are joined into one entry.

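The pairwise join and its repetition until nothing changes (the IterativeRemoveOverlaps idea) can be sketched as follows. A single pass is not enough in general: after 'j' is joined into 'i', the enlarged entry may intersect an entry the pass already compared. The join rule (the union span in each organism) and the linear coordinates are assumptions; `Entry`, `JoinIfOverlapping`, `JoinPass` and `JoinUntilStable` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using Pos = long;  // hypothetical stand-in for GENOME_TYPE
// Hypothetical entry: [s1,e1] in organism 1 and [s2,e2] in organism 2 (linear coordinates).
struct Entry { Pos s1, e1, s2, e2; };

static bool Intersect(Pos a, Pos b, Pos c, Pos d) { return std::max(a, c) <= std::min(b, d); }

// Join b into a if they intersect in both organisms; the joined entry spans both
// originals in each organism.
static bool JoinIfOverlapping(Entry& a, const Entry& b) {
    if (!Intersect(a.s1, a.e1, b.s1, b.e1) || !Intersect(a.s2, a.e2, b.s2, b.e2)) return false;
    a = Entry{std::min(a.s1, b.s1), std::max(a.e1, b.e1),
              std::min(a.s2, b.s2), std::max(a.e2, b.e2)};
    return true;
}

// One pass over all pairs; returns true if anything was joined.
static bool JoinPass(std::vector<Entry>& t) {
    bool joined = false;
    for (std::size_t i = 0; i < t.size(); ++i)
        for (std::size_t j = i + 1; j < t.size();) {
            if (JoinIfOverlapping(t[i], t[j])) { t.erase(t.begin() + j); joined = true; }
            else ++j;
        }
    return joined;
}

// Repeat passes until no more entries can be joined.
void JoinUntilStable(std::vector<Entry>& t) {
    while (JoinPass(t)) {}
}
```

For the entries [1,10]/[1,10], [20,30]/[20,30] and [5,25]/[5,25], the first pass joins the first and third entries, and only the second pass can join the result with the middle one.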
For each entry 'i' we get its duplicates in organism 1, s1, and its duplicates in organism 2, s2. If there is an entry 'j' in s1 such that length(i)/length(j) <= 'orthPerc' or there is an entry 'k' in s2 such that length(i)/length(k) <= 'orthPerc', then entry 'i' is deleted from the table.

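The nuisance rule and the paralog rule differ only in how the per-organism tests are combined: RemoveNuisances requires a much longer duplicate in both organisms, RemoveParalogs in either one. A sketch of the two predicates, assuming length(i) is taken in the same organism as the duplicate (the text does not say) and that the duplicate lengths have already been collected; `HasLongerDup`, `IsNuisance` and `IsParalog` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

using Pos = long;  // hypothetical stand-in for GENOME_TYPE

// True if some duplicate length d satisfies lenI / d <= orthPerc,
// i.e. the duplicate is at least lenI / orthPerc long.
static bool HasLongerDup(Pos lenI, const std::vector<Pos>& dupLens, double orthPerc) {
    for (Pos d : dupLens)
        if (static_cast<double>(lenI) / d <= orthPerc) return true;
    return false;
}

// RemoveNuisances rule: delete i when such a duplicate exists in BOTH organisms.
bool IsNuisance(Pos len1, const std::vector<Pos>& dups1,
                Pos len2, const std::vector<Pos>& dups2, double orthPerc) {
    return HasLongerDup(len1, dups1, orthPerc) && HasLongerDup(len2, dups2, orthPerc);
}

// RemoveParalogs rule: delete i when such a duplicate exists in EITHER organism.
bool IsParalog(Pos len1, const std::vector<Pos>& dups1,
               Pos len2, const std::vector<Pos>& dups2, double orthPerc) {
    return HasLongerDup(len1, dups1, orthPerc) || HasLongerDup(len2, dups2, orthPerc);
}
```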
Sort the table according to 'org'.
|
Swap entry i with entry j.
|
If one of the organisms is circular, this method updates the values of 'firstCircOrg1', 'firstCircOrg2', 'lastCircOrg1' and 'lastCircOrg2'.

Used to print the deleted entries in organism 1 from the RemoveParalogs step.

Used to print the deleted entries in organism 2 from the RemoveParalogs step.

Counter of the number of entries that were deleted in organism 1 during the RemoveParalogs step.

Counter of the number of entries that were deleted in organism 2 during the RemoveParalogs step.

A parameter used to find duplicated entries.

The entries of the table.

The number of the first entry that is circular in organism 1.

The number of the first entry that is circular in organism 2.

The number of the last entry that is circular in organism 1.

The number of the last entry that is circular in organism 2.

A parameter used to delete all entries whose length is smaller than minLen.

Number of entries that are in use in the table.

Information about the first organism.

Information about the second organism.

Number of entries in the table (including unused entries).