Tuesday, December 31, 2013

[FWD] Getting started with Python for data science


Python version: 32-bit Python 3.2.

Train and test data: open the files in 'Ub' mode, e.g. open('Data/train.csv','Ub').
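Below is a minimal sketch of loading such a file with Python's standard csv module. One caveat comes from the Python 3 csv documentation, not from the note above: in Python 3, csv.reader expects lines of text rather than bytes, and the documentation recommends opening the file in text mode with newline=''. The helpers parse_rows and load_rows are hypothetical names.

```python
import csv


def parse_rows(lines):
    """Parse an iterable of CSV text lines into a list of rows (lists of strings)."""
    return list(csv.reader(lines))


def load_rows(path):
    """Read a CSV file such as 'Data/train.csv' into a list of rows, header included."""
    # Text mode with newline='' lets the csv module handle line endings
    # (and newlines inside quoted fields) itself.
    with open(path, newline='') as f:
        return parse_rows(f)
```

For example, rows = load_rows('Data/train.csv') gives the header as rows[0] and the data rows as rows[1:].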

TODO: Study random forests.

Monday, December 16, 2013

[ROSALIND] Alignment

Global Alignment with Scoring Matrix

Local Alignment with Scoring Matrix
Counting Optimal Alignments

Semiglobal Alignment
Overlap Alignment
Fitting Alignment

Global Alignment with Constant Gap Penalty
Maximizing the Gap Symbols of an Optimal Alignment
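Several of these problems share a single dynamic-programming core. The following is a minimal sketch that assumes a linear gap penalty and a caller-supplied substitution score (for the Rosalind problems this would be a lookup into a matrix such as BLOSUM62 or PAM250). The function name alignment_score is hypothetical. Global alignment seeds the first row and column with accumulated gap penalties. Local alignment clamps every cell at 0 and returns the best cell anywhere. The semiglobal, overlap and fitting variants differ mainly in which end gaps are free, and the constant-gap-penalty variant needs extra state to track runs of gaps. None of those variants is shown here.

```python
from typing import Callable, List


def alignment_score(s: str, t: str, score: Callable[[str, str], int],
                    gap: int, local: bool = False) -> int:
    """Best alignment score of s and t under a linear gap penalty.

    score(a, b) is the substitution score; every gap symbol costs `gap` (gap > 0).
    local=False gives global (Needleman-Wunsch style) alignment,
    local=True gives local (Smith-Waterman style) alignment.
    """
    n, m = len(s), len(t)
    dp: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:
        # Global: aligning a prefix against nothing costs one gap per symbol.
        for i in range(1, n + 1):
            dp[i][0] = -gap * i
        for j in range(1, m + 1):
            dp[0][j] = -gap * j
    best = 0  # a local alignment may be empty, which scores 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cell = max(dp[i - 1][j - 1] + score(s[i - 1], t[j - 1]),  # (mis)match
                       dp[i - 1][j] - gap,                           # gap in t
                       dp[i][j - 1] - gap)                           # gap in s
            if local:
                cell = max(cell, 0)  # start a fresh alignment at this cell
                best = max(best, cell)
            dp[i][j] = cell
    return best if local else dp[n][m]
```

With match +1, mismatch -1 and gap 1, the global score of "AT" against "T" is 0: A is aligned to a gap and T matches T.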

Wednesday, December 11, 2013

[Programming Pearls] A problem

Given a dictionary of English words (one word per input line in lower case letters), we must find all anagram classes.

This immediately brings to mind a hash function h:

h(A) = 2, h(B) = 3, h(C) = 5, h(D) = 7, h(E) = 11, ..., h(k-th letter of the alphabet) = the k-th prime number.

Extend h to words multiplicatively: for example, h(WORD) = h(W) h(O) h(R) h(D). Because prime factorizations are unique, two words x and y are in the same anagram class if and only if h(x) = h(y).
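Here is a minimal Python sketch of the prime-product hash. It assumes the words contain only the lowercase letters a to z, as the problem states. The names PRIMES, prime_signature and anagram_classes_by_prime are hypothetical. Python integers have arbitrary precision, so the product cannot overflow. In a language with fixed-width integers, long words would overflow, and the signature would need a big-integer type.

```python
from collections import defaultdict
from typing import Dict, List

# h('a') = 2, h('b') = 3, ..., h('z') = 101: the first 26 primes.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]


def prime_signature(word: str) -> int:
    """Multiply the primes of the word's letters; anagrams get equal products."""
    h = 1
    for ch in word:
        h *= PRIMES[ord(ch) - ord('a')]
    return h


def anagram_classes_by_prime(words: List[str]) -> List[List[str]]:
    """Group words by prime signature, in order of each class's first word."""
    classes: Dict[int, List[str]] = defaultdict(list)
    for w in words:
        classes[prime_signature(w)].append(w)
    return list(classes.values())
```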

Alternatively, let h be the word's letters in sorted order: h(letter[]) = sort(letter[]).
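The sorted-letters signature can be sketched the same way. This version sorts (signature, word) pairs so that anagrams become adjacent, then collapses each run of equal signatures into one class, roughly in the spirit of the sign/sort/squash pipeline in Programming Pearls. The function names are hypothetical.

```python
from itertools import groupby
from typing import List


def sorted_signature(word: str) -> str:
    """Sorting the letters gives every anagram the same key, e.g. 'stop' -> 'opst'."""
    return ''.join(sorted(word))


def anagram_classes_by_sort(words: List[str]) -> List[List[str]]:
    # Sign: pair each word with its signature; sort: anagrams become adjacent.
    signed = sorted((sorted_signature(w), w) for w in words)
    # Squash: collapse each run of equal signatures into one class.
    return [[w for _, w in run] for _, run in groupby(signed, key=lambda p: p[0])]
```

Unlike the prime product, this key does not depend on the alphabet size, and it never needs big integers.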

Monday, December 9, 2013

[ROSALIND] Suffix Tree

Problem: http://rosalind.info/problems/lrep/

Introduction to suffix trees: https://cs.uwaterloo.ca/~binma/cs482/06_suffix-tree-array.pdf

Suffix trees can perform, in linear time, many string operations that might seem very hard.

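Key observation: every substring of S is a prefix of some suffix of S. So a substring occurs k times in S exactly when it is a prefix of k different suffixes, i.e. when at least k leaves lie below the end of its path in the suffix tree.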

With this, the Rosalind problem can be solved.
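Here is a minimal sketch of the observation above. It assumes LREP asks for the longest substring that occurs at least k times (overlaps allowed), and it takes a plain string as input rather than the edge list the problem supplies. Instead of a linear-time suffix tree, it inserts every suffix into an uncompressed suffix trie. Each node's count is then the number of suffixes that start with that node's string, which is the number of occurrences of that string. The trie can have quadratically many nodes, so this is only suitable for small inputs. TrieNode, build_suffix_trie and longest_k_repeat are hypothetical names.

```python
from typing import Dict, List, Tuple


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.count = 0  # number of suffixes passing through this node


def build_suffix_trie(s: str) -> TrieNode:
    """Insert every suffix of s; O(len(s)^2) time and space."""
    root = TrieNode()
    for i in range(len(s)):
        node = root
        for ch in s[i:]:
            node = node.children.setdefault(ch, TrieNode())
            node.count += 1
    return root


def longest_k_repeat(s: str, k: int) -> str:
    """Longest substring of s occurring at least k times (overlaps allowed)."""
    root = build_suffix_trie(s)
    best = ""
    stack: List[Tuple[TrieNode, str]] = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        if len(prefix) > len(best):
            best = prefix
        for ch, child in node.children.items():
            # Counts never increase going down, so prune below-threshold subtrees.
            if child.count >= k:
                stack.append((child, prefix + ch))
    return best
```

Because counts can only shrink along a root-to-leaf path, the search stops descending as soon as a count drops below k.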

PS. TODO: implement a suffix tree to get a better representation.