Tuesday, December 31, 2013

[FWD] Getting started with Python for data science


https://www.kaggle.com/wiki/GettingStartedWithPythonForDataScience

Python version used: 32-bit Python 3.2.

Open the train or test data in 'Ub' mode: open('Data/train.csv','Ub').



TODO: Do some study on random forests.

Monday, December 16, 2013

[ROSALIND] Alignment

Basic problems:
Global Alignment with Scoring Matrix

Basic variations:
Local Alignment with Scoring Matrix
Counting Optimal Alignments

Tricky problems:
Semiglobal Alignment
Overlap Alignment
Fitting Alignment

Very tricky problems:
Global Alignment with Constant Gap Penalty
Maximizing the Gap Symbols of an Optimal Alignment

Wednesday, December 11, 2013

[Programming Pearls] A problem


Given a dictionary of English words (one word per input line in lower case letters), we must find all anagram classes.

This naturally brings to mind a hash function h:

h(A) = 2, h(B) = 3, h(C) = 5, h(D) = 7, h(E) = 11, ..., h(k-th letter of the alphabet) = k-th prime number.

Extend h multiplicatively over the letters of a word. For example, h(WORD) = h(W) h(O) h(R) h(D). By unique prime factorization, x and y are in the same anagram class if and only if h(x) = h(y).

Alternatively, define h(letter[]) = sort(letter[]), so that each word is keyed by its letters in sorted order.
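Both choices of h can be sketched in a few lines of Python. This is only a minimal illustration, assuming lowercase a-z input as in the problem statement; the names prime_hash, sorted_hash and anagram_classes are made up for this note. Python integers do not overflow, so the prime product stays exact even for long words.

```python
from collections import defaultdict
from string import ascii_lowercase

# The first 26 primes, one per lowercase letter (a -> 2, b -> 3, ..., z -> 101).
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]
PRIME_OF = dict(zip(ascii_lowercase, PRIMES))


def prime_hash(word):
    """Product of the primes assigned to each letter of the word."""
    h = 1
    for ch in word:
        h *= PRIME_OF[ch]
    return h


def sorted_hash(word):
    """The letters of the word in sorted order, joined into one string."""
    return "".join(sorted(word))


def anagram_classes(words, key=prime_hash):
    """Group words by key; each returned list is one anagram class."""
    classes = defaultdict(list)
    for w in words:
        classes[key(w)].append(w)
    return list(classes.values())


if __name__ == "__main__":
    for group in anagram_classes(["pots", "stop", "word", "tops"]):
        print(group)
```

The sorted key costs a sort per word but is easy to read; the prime product needs only one pass per word but produces large integers.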

Monday, December 9, 2013

[ROSALIND] Suffix Tree


Problem: http://rosalind.info/problems/lrep/

Introduction to suffix tree: https://cs.uwaterloo.ca/~binma/cs482/06_suffix-tree-array.pdf



Suffix trees support, in linear time, many string operations that might seem very hard.

Key observation for applications: any substring of S is a prefix of some suffix of S.



Now we can solve this Rosalind problem.
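To make the idea concrete, here is a rough Python sketch that finds the longest substring occurring at least k times. It uses a naive, uncompressed suffix trie, which takes quadratic time and space, not the linear-time suffix tree from the slides. The names build_suffix_trie, count_occurrences and longest_repeat are hypothetical helpers for this note. The problem's actual input format is not handled here.

```python
def build_suffix_trie(s):
    """Insert every suffix of s into a trie.

    Each node stores how many suffixes pass through it, which equals the
    number of occurrences in s of the substring spelled by the path to it.
    """
    root = {"count": 0, "children": {}}
    for i in range(len(s)):
        node = root
        for ch in s[i:]:
            node = node["children"].setdefault(ch, {"count": 0, "children": {}})
            node["count"] += 1
    return root


def count_occurrences(root, pattern):
    """Occurrences of a non-empty pattern: walk it down as a prefix of suffixes."""
    node = root
    for ch in pattern:
        node = node["children"].get(ch)
        if node is None:
            return 0
    return node["count"]


def longest_repeat(s, k):
    """One longest substring of s occurring at least k times (overlaps allowed)."""
    root = build_suffix_trie(s)
    best = ""
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        if len(prefix) > len(best):
            best = prefix
        # Only descend into children that still occur at least k times.
        for ch, child in node["children"].items():
            if child["count"] >= k:
                stack.append((child, prefix + ch))
    return best


if __name__ == "__main__":
    print(longest_repeat("banana", 2))
```

If several substrings tie for the maximum length, this sketch returns only one of them.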



PS. Implement a suffix tree for a better representation.