Monday, September 24, 2012

If You're Hungry, Hurry Up and Go to Sleep


I've had no drive lately: my 640GB portable hard drive vanished without a trace, and work isn't going smoothly either. Yesterday coworker R told two supervisors to their faces that "Hsiao-ken's workload is too heavy." Well done, Red Ant; my thanks for a colleague's kindness. In any case, when something planned to take a day gets finished in under two hours, that isn't efficiency at work; it's a trade-off forced by time pressure. Imagine you're building a house. Days of rain have left the air damp and the concrete is curing slowly. Would you:

1. Spend three days waiting for the concrete to cure fully, even though the delay will cost you a penalty.
2. Move straight on to the next step and keep the project on schedule.


Neither choice is right or wrong. But what if it were your own house? Even a pig-brain would pick option 1; that's the selfish side of human nature. I'm selfish too: I want the job done in the shortest time possible, because it isn't my house, and even if it were my house, damn it, I wouldn't have time to build it; yours truly still has a bridge to build. That isn't efficiency; it's an inefficient allocation of resources. Would the engineer on the site willingly build a jerry-built house? But what can he do? You and I both gave the jerry-built house a push, and a jerry-built house can still be lived in.



A single mother, Ms. Lai, who feeds a family of three on a salary of just over NT$10,000 a month, can only tell her children when they complain of hunger: "If you're hungry, hurry up and go to sleep."

That is the problem.

Nearly 1.5 million bottom-rung workers earning the basic wage may see no raise next year. Last month the Council of Labor Affairs convened its basic-wage deliberation committee and settled on raising the basic monthly wage by NT$267 and the hourly wage by NT$6 starting next year. But at yesterday's review meeting the Executive Yuan resolved to raise the hourly wage first and defer the monthly raise with no set timetable; the decision may be finalized within a day or two once Premier Sean Chen signs off. CLA Minister Wang Ju-hsuan said yesterday she would resign in protest if it is finalized.

And that is the solution. The government backs business, business backs labor, so the costs businesses save will supposedly flow back to the workers. What utter bullshit; this whole government was raised on shit. There's really no point expecting it to solve anything, because a policy thought up by shit is still shit. There's an old aria that bears witness:

You enjoy wealth and rank by imperial grace, risen from cold obscurity to high office, holding the scales of power, your authority shaking the capital. Yet you lean on favor and flatter the throne, with never a thought for benevolence or righteousness.
[To the tune "Gun Xiu Qiu"] You raise official mansions and dig ornamental ponds, buying land and estates for your sons and grandsons; every scheme and plot serves one body alone. Indulging treachery and greed, what do you care if Yue starves so long as Wu grows fat? Those who fawn on you rise to glory; those who cross you are surely doomed. Jealous of talent, delighting in petty men, you think only of settling private scores while the public good goes bankrupt. You keep the Son of Heaven deep in his ninefold palace in the dark, driving the people of the four seas into chaos and exile, never reckoning that Heaven's net is vast!
[To the tune "Tang Xiu Cai"] With clever words you win the sovereign a moment's laughter; when would you ever serve like the loyal and good and bring harmony to the ten thousand states? You would rather turn heroes upside down and lead the age astray, scratching an itch through the boot, doctoring a chronic illness too late, snuffing out the principles of Heaven!
[To the tune "Gun Xiu Qiu"] You harbor the Qin-and-Zhao design of pointing at a deer and calling it a horse, the scheming of Tu'an Gu loosing his hound; you would study Wang Mang of Han in his unministerly ambitions, and Dong Zhuo, deceiver of his lord, who burned at the navel. Wherever you move, pipes and strings attend you; whenever you go out, armed guards surround you; when you enter the court, the hundred officials tremble. Borrowing one man's favor you bare a tiger's fangs; hangers-on chase your traitorous clique in clouds of dust, yet no one can borrow a sword to cut down the villain at court. You run wild without restraint!
[Coda] Beneath the golden bowl of state your name appears nowhere, but in the green annals of history right and wrong are written. What do you know of ordering yin and yang and harmonizing the primal breath? You only know how to sell out the rivers and mountains and conspire with foreign tribes. In vain you disgrace the jade belt, the gold-fish pouch, the python robe, drawing a salary without merit, shamed at board and bed. While power is in your hand all men fear you; when disaster arrives, regret comes too late. All the bamboo of the southern hills could not record your crimes; were the eastern sea drained dry, your stench would still remain. For ten thousand ages your story will pass down, and men will spit on your name!

That's all. Also: pork-knuckle rice for dinner last night, and here I am hobbling around again. Enough of this nonsense; talking about it too much is bad for the health.



Addendum, 2012-09-26:

The Council of Labor Affairs responded yesterday that when enterprises pursue profit, they should also shoulder social responsibility toward their employees and the environment, and improve labor conditions.

A complete non sequitur; I have no idea what they mean. Also, Dad went over the same old things again yesterday. It's simple: either propose a solution, or watch quietly from the sidelines. Personally I think Yang Yi-mei does it well.

Refactoring - Economicstotal

Project Name: Economicstotal


Source code preview:


1. Sourcing manager
sourcing_manager.py

import logging
import sys

import src.common.logger as logger

def source_capacity_utilization():
    import src.capacity_utilization.sourcing as sourcing
    s = sourcing.Sourcing()
    s.source()

def source_industrial_capacity():
    import src.industrial_capacity.sourcing as sourcing
    s = sourcing.Sourcing()
    s.source()

def source_ism_report_on_business():
    import src.ism_report_on_business.sourcing as sourcing
    s = sourcing.Sourcing()
    s.source()
   
def source_wholesale_trade():
    import src.wholesale_trade.sourcing as sourcing
    s = sourcing.Sourcing()
    s.source()

def source_cpi_us():
    import src.cpi_us.sourcing as sourcing
    s = sourcing.Sourcing()
    s.source()

def source_sales_retail():
    import src.sales_retail.sourcing as sourcing
    s = sourcing.Sourcing()
    s.source()

def source_twse_market_capitalization():
    import src.twse_market_capitalization.sourcing as sourcing
    s = sourcing.Sourcing()
    s.source()
   
def main():
    logger.config_root(level=logging.DEBUG)
    #source_capacity_utilization()
    #source_industrial_capacity()
    #source_ism_report_on_business()
    #source_wholesale_trade()
    #source_cpi_us()
    #source_sales_retail()
    source_twse_market_capitalization()
   
if __name__ == '__main__':
    sys.exit(main())


2. Put all Python source code in the ./src subdirectory
src/__init__.py ((None))


3. Common module
src/common/__init__.py ((None))
src/common/logger.py
import logging
import sys

FORMAT = "%(asctime)s %(filename)s [%(levelname)s] %(message)s"
DATEFMT = "%H:%M:%S"

def config_root(level=logging.INFO,
                threshold=logging.WARNING,
                format=FORMAT,
                datefmt=DATEFMT):
    root = logging.getLogger()
    root.setLevel(level)
    formatter = logging.Formatter(format, datefmt)

    stdout_handler = logging.StreamHandler(sys.stdout)
    stdout_handler.setLevel(level)
    stdout_handler.setFormatter(formatter)
    root.addHandler(stdout_handler)
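The `threshold` parameter of `config_root` above is accepted but never used. One plausible reading of the intent (an assumption on my part, not something the original states) is to keep routine records on stdout while routing records at or above `threshold` to stderr; a minimal sketch:

```python
import logging
import sys

FORMAT = "%(asctime)s %(filename)s [%(levelname)s] %(message)s"
DATEFMT = "%H:%M:%S"

class _BelowThreshold(logging.Filter):
    """Pass only records strictly below the given level."""
    def __init__(self, threshold):
        super().__init__()
        self.threshold = threshold

    def filter(self, record):
        return record.levelno < self.threshold

def config_root(level=logging.INFO, threshold=logging.WARNING,
                fmt=FORMAT, datefmt=DATEFMT):
    root = logging.getLogger()
    root.setLevel(level)
    formatter = logging.Formatter(fmt, datefmt)

    # Routine output: stdout, but only below the threshold.
    stdout_handler = logging.StreamHandler(sys.stdout)
    stdout_handler.setLevel(level)
    stdout_handler.addFilter(_BelowThreshold(threshold))
    stdout_handler.setFormatter(formatter)
    root.addHandler(stdout_handler)

    # Warnings and errors: stderr.
    stderr_handler = logging.StreamHandler(sys.stderr)
    stderr_handler.setLevel(threshold)
    stderr_handler.setFormatter(formatter)
    root.addHandler(stderr_handler)
```

This keeps the single-handler behavior of the original for anything below the threshold, so callers of `config_root` need no changes.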

xlrd_util.py
import datetime
import xlrd

def cell_value(cell, datemode):
    if cell.ctype == xlrd.XL_CELL_DATE:
        t = xlrd.xldate_as_tuple(cell.value, datemode)
        if t[3:] == (0, 0, 0):
            return datetime.date(t[0], t[1], t[2])
        return datetime.datetime(t[0], t[1], t[2], t[3], t[4], t[5])
    if cell.ctype == xlrd.XL_CELL_EMPTY:
        return None
    if cell.ctype == xlrd.XL_CELL_BOOLEAN:
        return cell.value == 1
    return cell.value    
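The date branch above returns a `datetime.date` when the time component is zero and a `datetime.datetime` otherwise. The tuple-handling logic can be exercised without xlrd or an .xls file (the helper name here is mine, standing in for the tuple that `xlrd.xldate_as_tuple` produces):

```python
import datetime

def tuple_to_value(t):
    # t is a (year, month, day, hour, minute, second) tuple.
    # A date-only cell has a zero time component.
    if t[3:] == (0, 0, 0):
        return datetime.date(t[0], t[1], t[2])
    return datetime.datetime(*t)

tuple_to_value((2012, 9, 24, 0, 0, 0))    # a date
tuple_to_value((2012, 9, 24, 13, 30, 5))  # a datetime
```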

sourcing_template.py
import os
import re
import shutil
import sys

class SourcingTemplate():

    def __init__(self):
        self.URLS = []
        self.LOCAL_DIR = './'
        self.DEFLATED_DIR = './'

    def source(self):
        self.source_url_to_local(self.LOCAL_DIR)       
        for url in self.URLS:
            src_file = os.path.join(self.LOCAL_DIR, re.compile('https?://|ftp://').sub('', url))
            src_dir = os.path.dirname(src_file)
            self.source_local_to_deflated_batch(src_dir, self.DEFLATED_DIR)
            
    def source_url_to_local(self, dest_dir):
        if not os.path.exists(dest_dir):
            os.makedirs(dest_dir)
        for url in self.URLS:
            self.__wget(url, dest_dir)

    def source_local_to_deflated_batch(self, src_dir, dest_dir):
        assert os.path.isdir(src_dir)
        if not os.path.exists(dest_dir):
            os.makedirs(dest_dir)

        for file in os.listdir(src_dir):
            shutil.copy(os.path.join(src_dir, file), os.path.join(dest_dir, file))    
           
    def __wget(self, url, dest_dir):
        dest_file = os.path.join(dest_dir, re.compile('https?://|ftp://').sub('', url))
        dest_file_dir = os.path.dirname(dest_file)
        if not os.path.exists(dest_file_dir):
            os.makedirs(dest_file_dir)

        wget = os.path.abspath('./src/thirdparty/wget/wget.exe')
        assert os.path.isfile(wget)            
        wget_cmdline = '''%s -N \"%s\" --waitretry=3 -P %s''' % (wget, url, dest_file_dir)
        os.system(wget_cmdline)    
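Both `source()` and `__wget()` map a URL to a local path by stripping the scheme and joining the remainder under a base directory, which mirrors wget's `-P` output layout. Isolated (the helper name is mine):

```python
import os
import re

# Same pattern the template uses to strip the scheme from a URL.
STRIP_SCHEME = re.compile('https?://|ftp://')

def local_path(local_dir, url):
    # 'ftp://host/a/b.txt' under './local/' becomes './local/host/a/b.txt'.
    return os.path.join(local_dir, STRIP_SCHEME.sub('', url))

local_path('./dataset/cpi_us/local/',
           'ftp://ftp.bls.gov/pub/time.series/cu/cu.data.1.AllItems')
```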


4. Sourcing capacity utilization. 
src/capacity_utilization/__init__.py ((None))
src/capacity_utilization/sourcing.py
"""
Web Content
===========
Data from January 1986 to present 
Capacity Utilization: Manufacturing, Mining, and Utilities
<http://www.federalreserve.gov/releases/g17/ipdisk/utl_sa.txt>

Data through 1985 
Capacity Utilization: Manufacturing, Mining, and Utilities
<http://www.federalreserve.gov/releases/g17/iphist/utlhist_sa.txt>

Record format:
<http://www.federalreserve.gov/releases/g17/download.htm>

The format for each line in the files consists of an industry code,
a year, and 12 months of data (to four decimal places) when available.
The data have various start dates; the earliest is 1919. 
"""
import logging
import os
import sqlite3

from ..common import logger
from ..common import sourcing_template

class Sourcing(sourcing_template.SourcingTemplate):

    def __init__(self):
        sourcing_template.SourcingTemplate.__init__(self)
        self.__logger = logging.getLogger()
        self.URLS = [
            'http://www.federalreserve.gov/releases/g17/ipdisk/utl_sa.txt',
            'http://www.federalreserve.gov/releases/g17/iphist/utlhist_sa.txt',
        ]
        self.LOCAL_DIR = './dataset/capacity_utilization/local/'
        self.DEFLATED_DIR = './dataset/capacity_utilization/deflated/'
        self.DB_FILE = './db/economicstotal.db'
        self.ITEM_PREFIX = '''\"B50001\"'''
        self.SQL_INSERT = '''replace into
            CapacityUtilization(period, number) values('%s', '%s')'''

    def source(self):
        sourcing_template.SourcingTemplate.source(self)
        self.source_deflated_to_sqlite_batch(self.DEFLATED_DIR, self.DB_FILE)

    def source_deflated_to_sqlite_batch(self, src_dir, dest_db):
        assert os.path.isdir(src_dir)
        for file in os.listdir(src_dir):
            src_file = os.path.join(src_dir, file)
            self.source_deflated_to_sqlite(src_file, dest_db)
    
    def source_deflated_to_sqlite(self, src_file, dest_db):
        self.__logger.debug('''%s => %s''' % (src_file, dest_db))
        records = self.__build_db_records(src_file)
        self.__write_db(records, dest_db)
        
    def __build_db_records(self, src_file):
        src_file_fd = open(src_file, 'rb')
        content = src_file_fd.read()
        src_file_fd.close()

        records = []
        lines = content.decode('utf-8').split('\n')
        for line in lines:
            if line.startswith(self.ITEM_PREFIX):
                tokens = line.split()
                assert len(tokens) > 1
                year = tokens[1]
                for i in range(2, len(tokens)):
                    r = [
                        '''%s-%02d-01''' % (year, i - 1),
                        tokens[i]
                    ]
                    records.append(r)
        return records     

    def __write_db(self, records, dest_db):
        conn = sqlite3.connect(dest_db)
        cursor = conn.cursor()
      
        for r in records:
            sql_cmd = self.SQL_INSERT % (r[0], r[1])
            cursor.execute(sql_cmd)
            self.__logger.debug('''(period, number): (%s, %s) => %s'''
                                % (r[0], r[1], dest_db))
        conn.commit()
        cursor.close()
        conn.close()    
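The parsing loop in `__build_db_records` can be seen in isolation. The line below is a made-up sample in the G.17 layout described in the docstring (a quoted industry code, a year, then monthly values), not real data:

```python
# Hypothetical G.17-style line: industry code, year, three monthly values.
line = '"B50001" 1986  70.2837  71.0231  70.8976'

ITEM_PREFIX = '"B50001"'
records = []
if line.startswith(ITEM_PREFIX):
    tokens = line.split()
    year = tokens[1]
    # Token i holds the value for month i-1; periods snap to the 1st.
    for i in range(2, len(tokens)):
        records.append(('%s-%02d-01' % (year, i - 1), tokens[i]))
```

Each record pairs an ISO-style first-of-month period with the raw value string, matching what `__write_db` expects.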

src/capacity_utilization/schema.sql
drop table if exists CapacityUtilization;

create table if not exists CapacityUtilization 
(
    creation_dt datetime default current_timestamp,
    period datetime not null,
    number text,
    revision int default 0,
    unique (period, revision) on conflict ignore
);
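`SQL_INSERT` interpolates values with `%`, so a value containing a quote would break the statement. sqlite3's parameter binding sidesteps this without changing the schema; a minimal in-memory sketch against the `CapacityUtilization` table above (the sample values are illustrative):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('''create table if not exists CapacityUtilization
(
    creation_dt datetime default current_timestamp,
    period datetime not null,
    number text,
    revision int default 0,
    unique (period, revision) on conflict ignore
)''')

records = [('1986-01-01', '70.2837'), ('1986-02-01', '71.0231')]
# '?' placeholders let sqlite3 quote the values itself.
conn.executemany(
    "replace into CapacityUtilization(period, number) values (?, ?)", records)
conn.commit()
```

`executemany` also replaces the per-record `cursor.execute` loop, which is both shorter and faster for bulk loads.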


5.  Sourcing CPI US
src/cpi_us/__init__.py
src/cpi_us/sourcing.py
"""
<http://www.bls.gov/cpi/>
<ftp://ftp.bls.gov/pub/time.series/cu/>

U.S. All items, 1982-84=100 - CUUR0000SA0

Record example:
series_id year period value footnote_codes
CUUR0000SA0       2012 M06     229.478          
CUUR0000SA0       2012 M07     229.104          
CUUR0000SA0       2012 M08     230.379
"""
import logging
import os
import sqlite3

from ..common import logger
from ..common import sourcing_template

class Sourcing(sourcing_template.SourcingTemplate):

    def __init__(self):
        sourcing_template.SourcingTemplate.__init__(self)
        self.__logger = logging.getLogger()
        self.URLS = ['ftp://ftp.bls.gov/pub/time.series/cu/cu.data.1.AllItems']
        self.LOCAL_DIR = './dataset/cpi_us/local/'
        self.DEFLATED_DIR = './dataset/cpi_us/deflated/'
        self.DB_FILE = './db/economicstotal.db'    
        self.ITEM_PREFIX = 'CUUR0000SA0'
        self.SQL_INSERT = '''replace into
            CpiUs(period, number) values('%s', '%s')
            '''

    def source(self):
        sourcing_template.SourcingTemplate.source(self)
        self.source_deflated_to_sqlite_batch(self.DEFLATED_DIR, self.DB_FILE)

    def source_deflated_to_sqlite_batch(self, src_dir, dest_db):
        assert os.path.isdir(src_dir)
        for file in os.listdir(src_dir):
            src_file = os.path.join(src_dir, file)
            self.source_deflated_to_sqlite(src_file, dest_db)            

    def source_deflated_to_sqlite(self, src_file, dest_db):
        self.__logger.debug('''%s => %s''' % (src_file, dest_db))
        records = self.__build_db_records(src_file)
        self.__write_db(records, dest_db)            

    def __build_db_records(self, src_file):
        src_file_fd = open(src_file, 'rb')
        content = src_file_fd.read()
        src_file_fd.close()

        records = []
        lines = content.decode('utf-8').split('\n')
        for line in lines:
            if line.startswith(self.ITEM_PREFIX):
                tokens = line.split()
                assert len(tokens) == 4
                
                # M13 indicates annual averages => Ignore
                if tokens[2] == 'M13':
                    continue
                
                r = [ '''%s-%s-01''' % (tokens[1], tokens[2][1:]), tokens[3] ]
                records.append(r)
        return records     

    def __write_db(self, records, dest_db):
        conn = sqlite3.connect(dest_db)
        cursor = conn.cursor()
      
        for r in records:
            sql_cmd = self.SQL_INSERT % (r[0], r[1])
            cursor.execute(sql_cmd)
            self.__logger.debug('''(period, number): (%s, %s) => %s'''
                                % (r[0], r[1], dest_db))
        
        conn.commit()
        cursor.close()
        conn.close()

src/cpi_us/schema.sql
drop table if exists CpiUs;

create table if not exists CpiUs 
(
    creation_dt datetime default current_timestamp,
    period datetime not null,
    number text,
    revision int default 0,
    unique (period, revision) on conflict ignore
);
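The record parsing in `__build_db_records` reduces to a few lines; using the example record from the module docstring:

```python
# One record line from the BLS file, as shown in the docstring above.
line = 'CUUR0000SA0       2012 M06     229.478'

tokens = line.split()
assert len(tokens) == 4
# tokens[2] is 'Mnn'; strip the 'M' to get the month. 'M13' (annual
# average) is filtered out before reaching this point.
period = '%s-%s-01' % (tokens[1], tokens[2][1:])
value = tokens[3]
```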


6. Sourcing industrial capacity
src/industrial_capacity/__init__.py
src/industrial_capacity/sourcing.py
"""
Web Content
===========
Data from January 1986 to present 
Industrial Capacity: Manufacturing, Mining, and Utilities
<http://www.federalreserve.gov/releases/g17/ipdisk/cap_sa.txt>

Data through 1985 
Industrial Capacity: Manufacturing, Mining, and Utilities
<http://www.federalreserve.gov/releases/g17/iphist/caphist_sa.txt>

Record format:
<http://www.federalreserve.gov/releases/g17/download.htm>

The format for each line in the files consists of an industry code,
a year, and 12 months of data (to four decimal places) when available.
The data have various start dates; the earliest is 1919. 
"""
import logging
import os
import sqlite3

from ..common import logger
from ..common import sourcing_template

class Sourcing(sourcing_template.SourcingTemplate):

    def __init__(self):
        sourcing_template.SourcingTemplate.__init__(self)
        self.__logger = logging.getLogger()
        self.URLS = [
            'http://www.federalreserve.gov/releases/g17/ipdisk/cap_sa.txt',
            'http://www.federalreserve.gov/releases/g17/iphist/caphist_sa.txt',
        ]
        self.LOCAL_DIR = './dataset/industrial_capacity/local/'
        self.DEFLATED_DIR = './dataset/industrial_capacity/deflated/'
        self.DB_FILE = './db/economicstotal.db'      
        self.ITEM_PREFIX = '''\"B50001\"'''
        self.SQL_INSERT = '''replace into
            IndustrialCapacity(period, number) values('%s', '%s')'''

    def source(self):
        sourcing_template.SourcingTemplate.source(self)
        self.source_deflated_to_sqlite_batch(self.DEFLATED_DIR, self.DB_FILE)
            
    def source_deflated_to_sqlite_batch(self, src_dir, dest_db):
        assert os.path.isdir(src_dir)
        for file in os.listdir(src_dir):
            src_file = os.path.join(src_dir, file)
            self.source_deflated_to_sqlite(src_file, dest_db)
    
    def source_deflated_to_sqlite(self, src_file, dest_db):
        self.__logger.debug('''%s => %s''' % (src_file, dest_db))
        records = self.__build_db_records(src_file)
        self.__write_db(records, dest_db)
       
    def __build_db_records(self, src_file):
        src_file_fd = open(src_file, 'rb')
        content = src_file_fd.read()
        src_file_fd.close()

        records = []
        lines = content.decode('utf-8').split('\n')
        for line in lines:
            if line.startswith(self.ITEM_PREFIX):
                tokens = line.split()
                assert len(tokens) > 1
                year = tokens[1]
                for i in range(2, len(tokens)):
                    r = [
                        '''%s-%02d-01''' % (year, i - 1),
                        tokens[i]
                    ]
                    records.append(r)
        return records     

    def __write_db(self, records, dest_db):
        conn = sqlite3.connect(dest_db)
        cursor = conn.cursor()
      
        for r in records:
            sql_cmd = self.SQL_INSERT % (r[0], r[1])
            cursor.execute(sql_cmd)
            self.__logger.debug('''(period, number): (%s, %s) => %s'''
                                % (r[0], r[1], dest_db))
        conn.commit()
        cursor.close()
        conn.close()    

src/industrial_capacity/schema.sql
drop table if exists IndustrialCapacity;

create table if not exists IndustrialCapacity 
(
    creation_dt datetime default current_timestamp,
    period datetime not null,
    number text,
    revision int default 0,
    unique (period, revision) on conflict ignore
);

7. Sourcing ISM report on business
src/ism_report_on_business/__init__.py
src/ism_report_on_business/sourcing.py
"""
ISM Manufacturing Report On Business Historical Information
<http://www.ism.ws/ISMReport/content.cfm?ItemNumber=13339&navItemNumber=12958>

All Manufacturing Indexes in one file
<http://www.ism.ws/files/ISMReport/MfgAllIndexes12.xls>
"""
import datetime
import logging
import os
import sqlite3
import xlrd

from ..common import logger
from ..common import sourcing_template
from ..common import xlrd_util

class Sourcing(sourcing_template.SourcingTemplate):

    def __init__(self):
        sourcing_template.SourcingTemplate.__init__(self)
        self.__logger = logging.getLogger()
        self.URLS = ['http://www.ism.ws/files/ISMReport/MfgAllIndexes12.xls']
        self.LOCAL_DIR = './dataset/ism_report_on_business/local/'
        self.DEFLATED_DIR = './dataset/ism_report_on_business/deflated/'
        self.DB_FILE = './db/economicstotal.db'     
        self.SQL_INSERT = '''replace into
            IsmReportOnBusiness(period, item, number) values('%s', '%s', '%s')
            '''

    def source(self):
        sourcing_template.SourcingTemplate.source(self)
        self.source_deflated_to_sqlite_batch(self.DEFLATED_DIR, self.DB_FILE)

    def source_deflated_to_sqlite_batch(self, src_dir, dest_db):
        assert os.path.isdir(src_dir)
        for file in os.listdir(src_dir):
            src_file = os.path.join(src_dir, file)
            self.source_deflated_to_sqlite(src_file, dest_db)            

    def source_deflated_to_sqlite(self, src_file, dest_db):
        self.__logger.debug('''%s => %s''' % (src_file, dest_db))
        records = self.__build_db_records(src_file)
        self.__write_db(records, dest_db)            

    def __build_db_records(self, src_file):
        book = xlrd.open_workbook(src_file)
        
        records = []
        for sheet_name in book.sheet_names():
            sheet = book.sheet_by_name(sheet_name)
            for i in range(sheet.nrows):
                cell = sheet.cell(i, 0)
                if cell.ctype == xlrd.XL_CELL_DATE:
                    cell_period = xlrd_util.cell_value(cell, book.datemode)
                    period = self.__align_date(cell_period)
                    item = sheet_name.replace('\'', '')
                    # PMI index on column 1. Others' index on column 5
                    number = sheet.cell_value(i, 1 if item == 'PMI' else 5)

                    if number != '':  # skip empty cells
                        r = [str(period), item, number]
                        records.append(r)
        return records
    
    def __write_db(self, records, dest_db):
        conn = sqlite3.connect(dest_db)
        cursor = conn.cursor()
      
        for r in records:
            sql_cmd = self.SQL_INSERT % (r[0], r[1], r[2])
            cursor.execute(sql_cmd)
            self.__logger.debug('''(%s, %s, %s) => %s''' % (r[0], r[1], r[2], dest_db))

        conn.commit()
        cursor.close()
        conn.close()

    # ISM Report on Business Excel has some non-aligned date
    def __align_date(self, date):
        return datetime.date(date.year, date.month, 1)

src/ism_report_on_business/schema.sql
drop table if exists IsmReportOnBusiness;

create table if not exists IsmReportOnBusiness 
(
    creation_dt datetime default current_timestamp,
    period datetime not null,
    item text,
    number text,
    revision int default 0,
    unique (period, item, revision) on conflict ignore
);
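`__align_date` above snaps the occasionally mid-month dates in the ISM workbook to the first of the month, so that `unique (period, item, revision)` can deduplicate across runs. A standalone restatement (the function name here is mine):

```python
import datetime

def align_date(d):
    # Normalize any day-of-month to the 1st; year and month are kept.
    return datetime.date(d.year, d.month, 1)

align_date(datetime.date(2012, 9, 24))  # first of September 2012
```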

8. Sourcing Advance Monthly Sales for Retail and Food Services
src/sales_retail/__init__.py
src/sales_retail/sourcing.py
"""
<http://www.census.gov/econ/currentdata/export/csv?programCode=MARTS&timeSlotType=12&startYear=1992&endYear=2012&categoryCode=44X72&dataTypeCode=SM&geoLevelCode=US&adjusted=yes&errorData=no&internal=false>

CSV Content
===========
"U.S. Census Bureau"
"Source: Advance Monthly Sales for Retail and Food Services"
"44X72: Retail Trade and Food Services: U.S. Total"
"Seasonally Adjusted Sales - Monthly [Millions of Dollars]"
"Period: 1992 to 2012"
"Data Extracted on: September 17, 2012 (9:52 pm)"

Period,Value
Jan-1992,164083
Feb-1992,164260
Mar-1992,163747
Apr-1992,164759
May-1992,165610
Jun-1992,166089
"""
import csv
import datetime
import logging
import os
import sqlite3

from ..common import logger
from ..common import sourcing_template

class Sourcing(sourcing_template.SourcingTemplate):

    def __init__(self):
        sourcing_template.SourcingTemplate.__init__(self)
        self.__logger = logging.getLogger()
        self.URL_TEMPLATE = '''http://www.census.gov/econ/currentdata/export/csv?programCode=MARTS&timeSlotType=12&startYear=%d&endYear=%d&categoryCode=44X72&dataTypeCode=SM&geoLevelCode=US&adjusted=yes&errorData=no&internal=false'''
        self.LOCAL_DIR = './dataset/sales_retail/local/'
        self.DEFLATED_DIR = './dataset/sales_retail/deflated/'
        self.DB_FILE = './db/economicstotal.db'     
        self.SQL_INSERT = '''insert or ignore into
            AdvanceMonthlyRetailTradeReport(period, number) values('%s', '%s')
            '''
        self.__init_url()

    def source(self):
        sourcing_template.SourcingTemplate.source(self)
        self.source_deflated_to_sqlite_batch(self.DEFLATED_DIR, self.DB_FILE)

    def source_deflated_to_sqlite_batch(self, src_dir, dest_db):
        assert os.path.isdir(src_dir)
        for file in os.listdir(src_dir):
            src_file = os.path.join(src_dir, file)
            self.source_deflated_to_sqlite(src_file, dest_db)            

    def source_deflated_to_sqlite(self, src_file, dest_db):
        self.__logger.debug('''%s => %s''' % (src_file, dest_db))
        assert os.path.isfile(src_file)
        assert os.path.isfile(dest_db)
        
        conn = sqlite3.connect(dest_db)
        cursor = conn.cursor()
        is_began = False
        csv_reader = csv.reader(open(src_file, 'r'))
        for row in csv_reader:
            if not is_began:
                if len(row) == 2:
                    is_began = True
                    continue
            elif row[1] != 'NA':
                period = self.__period_str(row[0])
                number = row[1]
                sql_cmd = self.SQL_INSERT % (period, number)
                cursor.execute(sql_cmd)

                self.__logger.debug('''(period, value): (%s, %s) => %s'''
                                    % (period, number, dest_db))
        conn.commit()
        cursor.close()
        conn.close()

    def __init_url(self):
        start_year = 1992
        end_year = datetime.datetime.now().year
        url = self.URL_TEMPLATE % (start_year, end_year)
        self.URLS.append(url)

    def __period_str(self, period):
        return datetime.datetime.strptime(period, '%b-%Y').strftime('%Y-%m-%d')

src/sales_retail/schema.sql
drop table if exists AdvanceMonthlyRetailTradeReport;

create table if not exists AdvanceMonthlyRetailTradeReport 
(
    creation_dt datetime default current_timestamp,
    period datetime not null,
    number text,
    revision int default 0,
    unique (period, revision) on conflict ignore
);
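`__period_str` converts the Census CSV's `Mon-YYYY` periods into the first-of-month ISO dates the other tables use. In isolation:

```python
import datetime

def period_str(period):
    # 'Jan-1992' -> '1992-01-01'. Note that %b is locale-dependent,
    # so this assumes an English (C) locale.
    return datetime.datetime.strptime(period, '%b-%Y').strftime('%Y-%m-%d')

period_str('Jan-1992')
```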

9. Sourcing TWSE market capitalization
src/twse_market_capitalization/__init__.py
src/twse_market_capitalization/sourcing.py
"""
<http://www.twse.com.tw/ch/statistics/statistics_week.php>
<http://www.twse.com.tw/ch/statistics/download/week.zip>
"""
import logging
import os
import sqlite3
import xlrd

from ..common import logger
from ..common import sourcing_template
from ..common import xlrd_util

class Sourcing(sourcing_template.SourcingTemplate):

    def __init__(self):
        sourcing_template.SourcingTemplate.__init__(self)
        self.__logger = logging.getLogger()
        self.URLS = ['http://www.twse.com.tw/ch/statistics/download/week.zip']
        self.LOCAL_DIR = './dataset/twse_market_capitalization/local/'
        self.DEFLATED_DIR = './dataset/twse_market_capitalization/deflated/'
        self.UNZIPPED_DIR = './dataset/twse_market_capitalization/unzipped/'
        self.DB_FILE = './db/economicstotal.db'     
        self.SQL_INSERT = '''replace into
            TwseMarketCapitalization(period, number) values('%s', '%s')
            '''

    def source(self):
        sourcing_template.SourcingTemplate.source(self)
        self.source_deflated_to_unzipped_batch(self.DEFLATED_DIR, self.UNZIPPED_DIR)
        self.source_unzipped_to_sqlite_batch(self.UNZIPPED_DIR, self.DB_FILE)

    def source_deflated_to_unzipped_batch(self, src_dir, dest_dir):
        assert os.path.isdir(src_dir)
        for file in os.listdir(src_dir):
            src_file = os.path.join(src_dir, file)
            self.source_deflated_to_unzipped(src_file, dest_dir)

    def source_deflated_to_unzipped(self, src_file, dest_dir):
        if not os.path.exists(dest_dir):
            os.makedirs(dest_dir)
        self.__unzip(src_file, dest_dir)
        
    def source_unzipped_to_sqlite_batch(self, src_dir, dest_db):
        assert os.path.isdir(src_dir)
        for file in os.listdir(src_dir):
            src_file = os.path.join(src_dir, file)
            self.source_unzipped_to_sqlite(src_file, dest_db)            

    def source_unzipped_to_sqlite(self, src_file, dest_db):
        self.__logger.debug('''%s => %s''' % (src_file, dest_db))
        records = self.__build_db_records(src_file)
        self.__write_db(records, dest_db)            
        
    def __unzip(self, src_zip, dest_dir):
        sevenzip = os.path.abspath('./src/thirdparty/sevenzip/7z.exe')
        assert os.path.isfile(sevenzip)
        sevenzip_cmdline = '''%s e %s -aoa -o%s''' % \
                           (sevenzip, src_zip, dest_dir)
        os.system(sevenzip_cmdline)
        
    def __build_db_records(self, src_file):
        book = xlrd.open_workbook(src_file)
        sheet = book.sheet_by_index(0)

        records = []
        for i in range(2, sheet.nrows):
            r = [
                xlrd_util.cell_value(sheet.cell(i, 0), book.datemode),
                sheet.cell_value(i, 1)
            ]
            records.append(r)
        return records   
    
    def __write_db(self, records, dest_db):
        conn = sqlite3.connect(dest_db)
        cursor = conn.cursor()
      
        for r in records:
            sql_cmd = self.SQL_INSERT % (r[0], r[1])
            cursor.execute(sql_cmd)
            self.__logger.debug('''(period, number): (%s, %s) => %s'''
                                % (r[0], r[1], dest_db))
                                
        conn.commit()
        cursor.close()
        conn.close()

src/twse_market_capitalization/schema.sql
drop table if exists TwseMarketCapitalization;

create table if not exists TwseMarketCapitalization 
(
    creation_dt datetime default current_timestamp,
    period datetime not null,
    number text,
    revision int default 0,
    unique (period, revision) on conflict ignore
);
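`__unzip` shells out to a bundled `7z.exe`, which ties the module to Windows. The same flat extraction (what `7z e` with `-aoa` does: all files dumped into one directory, overwriting on collision) can be done with the stdlib; a sketch, not a drop-in tested replacement for every archive TWSE serves:

```python
import os
import zipfile

def unzip(src_zip, dest_dir):
    # Extract every file entry directly into dest_dir, dropping any
    # internal directory structure, overwriting existing files.
    os.makedirs(dest_dir, exist_ok=True)
    with zipfile.ZipFile(src_zip) as zf:
        for info in zf.infolist():
            if info.filename.endswith('/'):
                continue  # directory entry
            target = os.path.join(dest_dir, os.path.basename(info.filename))
            with open(target, 'wb') as f:
                f.write(zf.read(info.filename))
```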


10. Sourcing Monthly Wholesale Trade Report
src/wholesale_trade/__init__.py
src/wholesale_trade/sourcing.py
"""
<http://www.census.gov/wholesale/>
<http://www2.census.gov/wholesale/xls/mwts/historic1.xls>

Record format: (period_date, item, number)
Reference: Package xlrd - https://github.com/takluyver/xlrd/zipball/py3
"""
import logging
import os
import sqlite3
import xlrd

from ..common import logger
from ..common import sourcing_template

class Sourcing(sourcing_template.SourcingTemplate):

    def __init__(self):
        sourcing_template.SourcingTemplate.__init__(self)
        self.__logger = logging.getLogger()
        self.URLS = ['http://www2.census.gov/wholesale/xls/mwts/historic1.xls']
        self.LOCAL_DIR = './dataset/wholesale_trade/local/'
        self.DEFLATED_DIR = './dataset/wholesale_trade/deflated/'
        self.DB_FILE = './db/economicstotal.db'     
        self.ITEMS = [
            'sales',
            'inventories',
            'inventories_sales_ratios',
        ]
        self.SQL_INSERT = '''replace into
            MonthlyWholesaleTradeReport(period, item, number)
            values('%s', '%s', '%s')
            '''

    def source(self):
        sourcing_template.SourcingTemplate.source(self)
        self.source_deflated_to_sqlite_batch(self.DEFLATED_DIR, self.DB_FILE)

    def source_deflated_to_sqlite_batch(self, src_dir, dest_db):
        assert os.path.isdir(src_dir)
        for file in os.listdir(src_dir):
            src_file = os.path.join(src_dir, file)
            self.source_deflated_to_sqlite(src_file, dest_db)            

    def source_deflated_to_sqlite(self, src_file, dest_db):
        self.__logger.debug('''%s => %s''' % (src_file, dest_db))
        records = self.__build_db_records(src_file)
        self.__write_db(records, dest_db)            

    def __build_db_records(self, src_file):
        book = xlrd.open_workbook(src_file)
        sheet = book.sheet_by_index(0)

        data_rows = {}
        for i in range(sheet.nrows):
            if sheet.cell_value(i, 0) == 'NAICS Code':
                data_rows[i] = 'P' # Period
            elif sheet.cell_value(i, 0) == 42:
                data_rows[i] = 'N' # Number

        records = []
        curr_year = None
        curr_item = -1        
        for key in sorted(data_rows):
            if data_rows[key] == 'P':
                curr_year = sheet.cell_value(key, 3)
            if data_rows[key] == 'N':
                curr_item = (curr_item + 1) % 3
                for month in range(12):
                    r = [
                        '''%s-%02d-01''' % (curr_year, month + 1),
                        self.ITEMS[curr_item],
                        sheet.cell_value(key, month + 3),
                    ]
                    records.append(r)
        return records   
    
    def __write_db(self, records, dest_db):
        conn = sqlite3.connect(dest_db)
        cursor = conn.cursor()
      
        for r in records:
            if r[2] != 'NA':
                sql_cmd = self.SQL_INSERT % (r[0], r[1], r[2])
                cursor.execute(sql_cmd)
                self.__logger.debug('''(period, item, number): (%s, %s, %s) => %s'''
                                    % (r[0], r[1], r[2], dest_db))
        
        conn.commit()
        cursor.close()
        conn.close()

src/wholesale_trade/schema.sql
drop table if exists MonthlyWholesaleTradeReport;

create table if not exists MonthlyWholesaleTradeReport 
(
    creation_dt datetime default current_timestamp,
    period datetime not null,
    item text,
    number text,
    revision int default 0,
    unique (period, item, revision) on conflict ignore
);
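The sheet scan in `__build_db_records` tags rows as `'P'` (a period header) or `'N'` (a numbers row), and the three consecutive `'N'` rows after each header always arrive in the order sales, inventories, inventories/sales ratios, so cycling an index with modulo keeps the labels aligned. Isolated (the function name and flag sequence are mine):

```python
ITEMS = [
    'sales',
    'inventories',
    'inventories_sales_ratios',
]

def label_rows(flags):
    # flags is the row classification in sheet order: 'P' or 'N'.
    curr_item = -1
    labels = []
    for flag in flags:
        if flag == 'N':
            curr_item = (curr_item + 1) % 3
            labels.append(ITEMS[curr_item])
    return labels

label_rows(['P', 'N', 'N', 'N', 'P', 'N'])
```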


11. Third party tools
src/thirdparty/sevenzip/7z.exe
src/thirdparty/wget/wget.exe

Friday, September 21, 2012

Three Visits to the Thatched Cottage, and Still No Chu Ko-liang


2012-09-18 => Couldn't apply for the 2012-10-20 itinerary. Got up early for nothing.

2012-09-20 => Couldn't apply for the 2012-10-21 itinerary; I stupidly misread the time. Got up early for nothing.

2012-09-21 => Couldn't apply for the 2012-10-21 itinerary.

Subject
The online park-entry application system has been repaired
Date posted
2012-09-21
Issuing unit
Conservation Research Section
Content
The system has been repaired and is back in normal service. If you have any other questions about park entry, please leave a message under the conservation-research category on our message board, and we will get to you as soon as we can. Thank you, climbers, for your understanding!


What rotten luck. On the fourth try the application finally went through ((a make-up submission, I suppose)).


Got lectured again this afternoon, to the effect of "the system ran four months before anyone noticed the problem; a student worker could have handled that." Setting aside the rights and wrongs of how it happened, and leaving out the opportunity cost of a working holiday in Australia, all I'll say is that hindsight is cheap. I'm not bothered, and not angry, though the atmosphere did freeze solid. Hsiao-ken doesn't care; Hsiao-ken is selfish. Compared with work, Hsiao-ken worries more about his injured foot.


Getting lectured is fine, though; at least it's fuel for improvement. "If you don't like it, don't do it" isn't a sulk; it's a goal to work toward.

Wednesday, September 19, 2012

US CPI



A CPI out of control, and you can imagine it spinning further out of control once QE3 kicks in.


The figures here are not seasonally adjusted. For background on seasonal adjustment, see the following explanation:

Because price data are used for different purposes by different groups, the Bureau of Labor Statistics publishes seasonally adjusted as well as unadjusted changes each month.

For analyzing general price trends in the economy, seasonally adjusted changes are usually preferred since they eliminate the effect of changes that normally occur at the same time and in about the same magnitude every year--such as price movements resulting from changing climatic conditions, production cycles, model changeovers, holidays, and sales. 

The unadjusted data are of primary interest to consumers concerned about the prices they actually pay. Unadjusted data also are used extensively for escalation purposes. Many collective bargaining contract agreements and pension plans, for example, tie compensation changes to the Consumer Price Index before adjustment for seasonal variation. ((Maybe economists are too innocent: how can you just set climate, product cycles, and the rest aside?))

The above was produced with Python + SQLite + Excel; the details are briefly described below.


Sourcing Phase 1: Download


"""
<http://www.bls.gov/cpi/>
<ftp://ftp.bls.gov/pub/time.series/cu/>
"""
import os
import re
import sys

class MigrateWebToContent():

    def __init__(self):
        self.url = 'ftp://ftp.bls.gov/pub/time.series/cu/cu.data.1.AllItems'
        self.wget = os.path.abspath('../thirdparty/wget/wget.exe')
        assert os.path.isfile(self.wget)

    def migrate(self, dest_dir):
        if not os.path.exists(dest_dir):
            os.makedirs(dest_dir)
        self.__wget(self.url, dest_dir)

    def __wget(self, url, dest_dir):
        dest_file = os.path.join(dest_dir, re.compile('ftp://').sub('', url))
        dest_file_dir = os.path.dirname(dest_file)
        if not os.path.exists(dest_file_dir):
            os.makedirs(dest_file_dir)
       
        wget_cmdline = '''%s \"%s\" --waitretry=3 -P %s''' % (self.wget, url, dest_file_dir)
        os.system(wget_cmdline)

def main():
    m = MigrateWebToContent()
    m.migrate('./content/')

if __name__ == '__main__':
    sys.exit(main())
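One caveat about `__wget` above: it builds the command with `%` formatting and `os.system`, which misquotes any path containing a space. A sketch of an alternative using an argument list (this is not what the script does; the wget path and directories below are placeholders):

```python
import subprocess

# Building the wget invocation as an argument list sidesteps shell quoting;
# same flags as the script (--waitretry=3, -P dest_dir).
def wget_cmdline(wget, url, dest_dir):
    return [wget, url, '--waitretry=3', '-P', dest_dir]

cmd = wget_cmdline('wget.exe',
                   'ftp://ftp.bls.gov/pub/time.series/cu/cu.data.1.AllItems',
                   './content/ftp.bls.gov/pub/time.series/cu')
print(cmd[:2])
# subprocess.call(cmd) would then execute it, spaces in paths and all.
```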


Sourcing Phase 2: Rename

import logging
import os
import shutil
import sys

import logger

class MigrateContentToRenamed():

    def __init__(self):
        self.__logger = logging.getLogger()
    
    def migrate_batch(self, src_dir, dest_dir):
        assert os.path.isdir(src_dir)
        if not os.path.exists(dest_dir):
            os.makedirs(dest_dir)

        for file in os.listdir(src_dir):
            shutil.copy(os.path.join(src_dir, file), os.path.join(dest_dir, file))

def main():
    logger.config_root(level=logging.INFO)
    m = MigrateContentToRenamed()
    m.migrate_batch('./content/ftp.bls.gov/pub/time.series/cu/',
                    './renamed/')
 
if __name__ == '__main__':
    sys.exit(main())


Sourcing Phase 3: Write to SQLite

"""
SQLite Schema:
drop table if exists CpiUs;

create table if not exists CpiUs 
(
    creation_dt datetime default current_timestamp,
    period datetime not null,
    number text,
    revision int default 0,
    unique (period, revision) on conflict ignore
);


U.S. All items, 1982-84=100 - CUUR0000SA0

Record example:
series_id year period value footnote_codes
CUUR0000SA0       2012 M06     229.478          
CUUR0000SA0       2012 M07     229.104          
CUUR0000SA0       2012 M08     230.379
"""
import logging
import os
import sqlite3
import sys

import logger

class MigrateRenamedToSqlite():

    def __init__(self):
        self.__logger = logging.getLogger()
        self.item_prefix = 'CUUR0000SA0'
        self.sql_insert = '''replace into
            CpiUs(period, number) values('%s', '%s')
            '''
        
    def migrate_batch(self, src_dir, dest_db):
        assert os.path.isdir(src_dir)
        for file in os.listdir(src_dir):
            src_txt = os.path.join(src_dir, file)
            self.migrate(src_txt, dest_db)

    def migrate(self, src_txt, dest_db):
        self.__logger.debug('''%s => %s''' % (src_txt, dest_db))
        records = self.__fetch_records(src_txt)
        self.__write_db(records, dest_db)

    def __fetch_records(self, src_txt):
        src_file_fd = open(src_txt, 'rb')
        content = src_file_fd.read()
        src_file_fd.close()

        records = []
        lines = content.decode('utf-8').split('\n')
        for line in lines:
            if line.startswith(self.item_prefix):
                tokens = line.split()
                assert len(tokens) == 4
                
                # M13 indicates annual averages => Ignore
                if tokens[2] == 'M13':
                    continue
                
                r = [ '''%s-%s-01''' % (tokens[1], tokens[2][1:]), tokens[3] ]
                records.append(r)
        return records     

    def __write_db(self, records, dest_db):
        conn = sqlite3.connect(dest_db)
        cursor = conn.cursor()
      
        for r in records:
            sql_cmd = self.sql_insert % (r[0], r[1])
            cursor.execute(sql_cmd)
            self.__logger.debug('''(period, number): (%s, %s) => %s'''
                                % (r[0], r[1], dest_db))
        
        conn.commit()
        cursor.close()
        conn.close()

def main():
    logger.config_root(level=logging.DEBUG)
    m = MigrateRenamedToSqlite()
    m.migrate_batch('./renamed', '../db/economicstotal.db')
 
if __name__ == '__main__':
    sys.exit(main())
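The parsing inside `__fetch_records` can be exercised on its own against the sample lines from the docstring, with one M13 line added to show the annual-average skip (its value is for illustration):

```python
# Standalone run of the CUUR0000SA0 line parsing: the M13 row is the annual
# average and is skipped; the rest map to first-of-month ISO dates.
sample = '''series_id year period value footnote_codes
CUUR0000SA0       2011 M13     224.939
CUUR0000SA0       2012 M06     229.478
CUUR0000SA0       2012 M07     229.104
'''
records = []
for line in sample.split('\n'):
    if line.startswith('CUUR0000SA0'):
        tokens = line.split()
        if tokens[2] == 'M13':  # annual average: ignore
            continue
        records.append(['%s-%s-01' % (tokens[1], tokens[2][1:]), tokens[3]])
print(records)  # [['2012-06-01', '229.478'], ['2012-07-01', '229.104']]
```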


Reporting: Excel

Easy. 

Tuesday, September 18, 2012

Capacity Utilization: Manufacturing, Mining, and Utilities

Link
http://www.federalreserve.gov/releases/g17/caputl.htm



Sourcing Phase 1: Download

"""
Data from January 1986 to present
Capacity Utilization: Manufacturing, Mining, and Utilities
<http://www.federalreserve.gov/releases/g17/ipdisk/utl_sa.txt>

Data through 1985
Capacity Utilization: Manufacturing, Mining, and Utilities
<http://www.federalreserve.gov/releases/g17/iphist/utlhist_sa.txt>
"""
import os
import re
import sys

class MigrateWebToContent():

    def __init__(self):
        self.url_utl_sa = 'http://www.federalreserve.gov/releases/g17/ipdisk/utl_sa.txt'
        self.url_utlhist_sa = 'http://www.federalreserve.gov/releases/g17/iphist/utlhist_sa.txt'
        self.wget = os.path.abspath('../thirdparty/wget/wget.exe')
        assert os.path.isfile(self.wget)

    def migrate(self, dest_dir):
        if not os.path.exists(dest_dir):
            os.makedirs(dest_dir)
        self.__wget(self.url_utl_sa, dest_dir)
        self.__wget(self.url_utlhist_sa, dest_dir)

    def __wget(self, url, dest_dir):
        dest_file = os.path.join(dest_dir, re.compile('https?://').sub('', url))
        dest_file_dir = os.path.dirname(dest_file)
        if not os.path.exists(dest_file_dir):
            os.makedirs(dest_file_dir)
     
        wget_cmdline = '''%s \"%s\" --waitretry=3 -P %s''' % (self.wget, url, dest_file_dir)
        os.system(wget_cmdline)

def main():
    m = MigrateWebToContent()
    m.migrate('./content/')

if __name__ == '__main__':
    sys.exit(main())




Sourcing Phase 2: Rename

import logging
import os
import shutil
import sys

import logger

class MigrateContentToRenamed():

    def __init__(self):
        self.__logger = logging.getLogger()
 
    def migrate_batch(self, src_dir, dest_dir):
        assert os.path.isdir(src_dir)
        if not os.path.exists(dest_dir):
            os.makedirs(dest_dir)

        for file in os.listdir(src_dir):
            shutil.copy(os.path.join(src_dir, file), os.path.join(dest_dir, file))

def main():
    logger.config_root(level=logging.INFO)
    m = MigrateContentToRenamed()
    m.migrate_batch('./content/www.federalreserve.gov/releases/g17/ipdisk',
                    './renamed/')
    m.migrate_batch('./content/www.federalreserve.gov/releases/g17/iphist',
                    './renamed/')

if __name__ == '__main__':
    sys.exit(main())




Sourcing Phase 3: Write to SQLite

"""
Schema.sql:

drop table if exists CapacityUtilization;

create table if not exists CapacityUtilization
(
    creation_dt datetime default current_timestamp,
    period_date datetime not null,
    number text,
    revision int default 0,
    unique (period_date, number) on conflict ignore
);

Sneaking words into the comment.

Autumn came on suddenly; my stomach is unsettled and my mood a little gloomy. Probably thinking of her, and trying not to. At least there is Jin Ping Mei to read. Such is this damned life at thirty, a third of the way into the coffin; nothing much worth saying.

Let me borrow a passage to describe Taiwan's predicament.

Its scale was grand, as if the Jetavana garden were paved with gold; its carvings exquisite, like cloisters with steps of white jade. High pavilions scraped the sky, sandalwood incense rising straight past the clouds of the ninth heaven; broad foundations spanned the earth, the great hall able to seat a thousand meditating monks. The two wings towered, all jeweled halls and azure eaves; the cloisters were spotless, truly a blessed realm. In those days bells and drums resounded, and all called it a Buddha-land on earth; the monks were trim and dignified, a heaven of men amid the dust. Who knew that as the years wore on, times would shift and matters change in a blink. A brutish monk drank and ran wild, first breaking the rules of purity; a dull acolyte, lazy and drowsy, left off the sweeping. Gradually it fell silent, and disciples ceased to come; desolation followed, and worshippers grew rare. Crows and rats bored and gnawed; how could it withstand the wind and rain? Beams and rafters crumbled, one, then two, then three, with no one to shore them up; walls and fences collapsed, day after day, year after year, with no one to raise them again. Vermilion lattices were gathered up to warm wine and tea; pillars an armspan around were carted off to trade for salt and rice. Wind stripped the gold from the arhats; rain beat the Amitabha into dust. Alas, what once blazed gold and azure became, in a morning, a waste of brambles and weeds.

That is all. Now to continue: this scrap prose sits inside the docstring, and the program runs just fine.

Record format:

<http://www.federalreserve.gov/releases/g17/download.htm>

The format for each line in the files consists of an industry code,
a year, and 12 months of data (to four decimal places) when available.
The data have various start dates; the earliest is 1919.
"""
import logging
import os
import sqlite3
import sys

import logger

class MigrateRenamedToSqlite():

    def __init__(self):
        self.__logger = logging.getLogger()
        self.item_prefix = '''\"B50001\"'''
        self.sql_insert = '''replace into
            CapacityUtilization(period_date, number) values('%s', '%s')
            '''
     
    def migrate_batch(self, src_dir, dest_db):
        assert os.path.isdir(src_dir)
        for file in os.listdir(src_dir):
            src_txt = os.path.join(src_dir, file)
            self.migrate(src_txt, dest_db)

    def migrate(self, src_txt, dest_db):
        self.__logger.debug('''%s => %s''' % (src_txt, dest_db))
        records = self.__fetch_records(src_txt)
        self.__write_db(records, dest_db)

    def __fetch_records(self, src_txt):
        src_file_fd = open(src_txt, 'rb')
        content = src_file_fd.read()
        src_file_fd.close()

        records = []
        lines = content.decode('utf-8').split('\n')
        for line in lines:
            if line.startswith(self.item_prefix):
                tokens = line.split()
                assert len(tokens) > 1
                year = tokens[1]
                for i in range(2, len(tokens)):
                    r = [
                        '''%s-%02d-01''' % (year, i - 1),
                        tokens[i]
                    ]
                    records.append(r)
        return records  

    def __write_db(self, records, dest_db):
        conn = sqlite3.connect(dest_db)
        cursor = conn.cursor()
   
        for r in records:
            sql_cmd = self.sql_insert % (r[0], r[1])
            cursor.execute(sql_cmd)
            self.__logger.debug('''(period, number): (%s, %s) => %s'''
                                % (r[0], r[1], dest_db))
     
        conn.commit()
        cursor.close()
        conn.close()

def main():
    logger.config_root(level=logging.DEBUG)
    m = MigrateRenamedToSqlite()
    m.migrate_batch('./renamed', '../db/economicstotal.db')

if __name__ == '__main__':
    sys.exit(main())
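To make the `"B50001"` record format concrete, here is the same year-plus-twelve-months expansion run on a single made-up G.17-style row (the utilization values are invented, not real data):

```python
# One fabricated "B50001" row: industry code, year, then 12 monthly values.
line = ('"B50001" 2011 76.1 76.3 76.8 76.5 76.6 76.9 '
        '77.2 77.4 77.6 77.9 78.1 78.4')
tokens = line.split()
year = tokens[1]
# Token i (from index 2 on) is month i-1; format as YYYY-MM-01.
records = [['%s-%02d-01' % (year, i - 1), tokens[i]]
           for i in range(2, len(tokens))]
print(records[0], records[-1])  # ['2011-01-01', '76.1'] ['2011-12-01', '78.4']
```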

Monthly Wholesale Trade Report

Link: http://www.census.gov/wholesale/



Sourcing Phase 1: Download


"""
Example:
    http://www2.census.gov/wholesale/xls/mwts/historic1.xls
"""
import os
import re
import sys

class MigrateWebToContent():

    def __init__(self):
        self.url = '''http://www2.census.gov/wholesale/xls/mwts/historic1.xls'''
        self.wget = os.path.abspath('../thirdparty/wget/wget.exe')
        assert os.path.isfile(self.wget)

    def migrate(self, dest_dir):
        if not os.path.exists(dest_dir):
            os.makedirs(dest_dir)
        self.__wget(self.url, dest_dir)

    def __wget(self, url, dest_dir):
        dest_file = os.path.join(dest_dir, re.compile('https?://').sub('', url))
        dest_file_dir = os.path.dirname(dest_file)
        if not os.path.exists(dest_file_dir):
            os.makedirs(dest_file_dir)
     
        wget_cmdline = '''%s \"%s\" --waitretry=3 -P %s''' % (self.wget, url, dest_file_dir)
        os.system(wget_cmdline)

def main():
    m = MigrateWebToContent()
    m.migrate('./content/')

if __name__ == '__main__':
    sys.exit(main())



Sourcing Phase 2: Rename

import logging
import os
import shutil
import sys

import logger

class MigrateContentToRenamed():

    def __init__(self):
        self.__logger = logging.getLogger()
    
    def migrate_batch(self, src_dir, dest_dir):
        assert os.path.isdir(src_dir)
        if not os.path.exists(dest_dir):
            os.makedirs(dest_dir)

        for file in os.listdir(src_dir):
            shutil.copy(os.path.join(src_dir, file), os.path.join(dest_dir, file))

def main():
    logger.config_root(level=logging.INFO)
    m = MigrateContentToRenamed()
    m.migrate_batch('./content/www2.census.gov/wholesale/xls/mwts',
                    './renamed/')
if __name__ == '__main__':
    sys.exit(main())



Sourcing Phase 3: Write to SQLite

"""
Schema:

drop table if exists MonthlyWholesaleTradeReport;

create table if not exists MonthlyWholesaleTradeReport 
(
    creation_dt datetime default current_timestamp,
    period datetime not null,
    item text,
    number text,
    revision int default 0,
    unique (period, item, number, revision) on conflict ignore
);


Record format: (period_date, item, number)
Reference: Package xlrd - https://github.com/takluyver/xlrd/zipball/py3
"""
import logging
import os
import sqlite3
import sys
import xlrd

import logger

class MigrateRenamedToSqlite():

    def __init__(self):
        self.__logger = logging.getLogger()
        self.items = [
            'sales',
            'inventories',
            'inventories_sales_ratios',
        ]
        self.sql_insert = '''replace into
            MonthlyWholesaleTradeReport(period, item, number)
            values('%s', '%s', '%s')
            '''
        
    def migrate_batch(self, src_dir, dest_db):
        assert os.path.isdir(src_dir)
        for file in os.listdir(src_dir):
            src_xls = os.path.join(src_dir, file)
            self.migrate(src_xls, dest_db)

    def migrate(self, src_xls, dest_db):
        self.__logger.debug('''%s => %s''' % (src_xls, dest_db))
        records = self.__fetch_records(src_xls)
        self.__write_db(records, dest_db)

    def __fetch_records(self, src_xls):
        book = xlrd.open_workbook(src_xls)
        sheet = book.sheet_by_index(0)

        data_rows = {}
        for i in range(sheet.nrows):
            if sheet.cell_value(i, 0) == 'NAICS Code':
                data_rows[i] = 'P' # Period
            elif sheet.cell_value(i, 0) == 42:
                data_rows[i] = 'N' # Number

        records = []
        curr_year = None
        curr_item = -1        
        for key in sorted(data_rows):
            if data_rows[key] == 'P':
                curr_year = sheet.cell_value(key, 3)
            if data_rows[key] == 'N':
                curr_item = (curr_item + 1) % 3
                for month in range(12):
                    r = [
                        '''%s-%02d-01''' % (curr_year, month + 1),
                        self.items[curr_item],
                        sheet.cell_value(key, month + 3),
                    ]
                    records.append(r)
        return records     

    def __write_db(self, records, dest_db):
        conn = sqlite3.connect(dest_db)
        cursor = conn.cursor()
      
        for r in records:
            if r[2] != 'NA':
                sql_cmd = self.sql_insert % (r[0], r[1], r[2])
                cursor.execute(sql_cmd)
                #print(sql_cmd)
                self.__logger.debug('''(period, item, number): (%s, %s, %s) => %s'''
                                    % (r[0], r[1], r[2], dest_db))
        
        conn.commit()
        cursor.close()
        conn.close()

def main():
    logger.config_root(level=logging.DEBUG)
    m = MigrateRenamedToSqlite()
    m.migrate_batch('./renamed', '../db/economicstotal.db')
if __name__ == '__main__':
    sys.exit(main())
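The `(curr_item + 1) % 3` rotation above leans on an assumption: each year block in the spreadsheet lists its three NAICS-42 data rows in the same fixed order. A sketch of just that rotation on a hypothetical scan result (`'P'` marks a period row, `'N'` a number row):

```python
# Label consecutive 'N' rows by cycling through the three report items.
items = ['sales', 'inventories', 'inventories_sales_ratios']
row_tags = ['P', 'N', 'N', 'N', 'P', 'N', 'N', 'N']  # hypothetical scan
curr_item = -1
labels = []
for tag in row_tags:
    if tag == 'N':
        curr_item = (curr_item + 1) % 3
        labels.append(items[curr_item])
print(labels[:3])  # ['sales', 'inventories', 'inventories_sales_ratios']
```

If the census ever reorders those rows, the labels silently come out wrong, which is worth keeping in mind for a scraper like this.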

Monday, September 17, 2012

Advance Monthly Retail Trade Report

Link

http://www.census.gov/retail/



Sourcing Phase 1: Download


"""
Example:
    http://www.census.gov/econ/currentdata/export/csv?programCode=MARTS&timeSlotType=12&startYear=1992&endYear=2012&categoryCode=44X72&dataTypeCode=SM&geoLevelCode=US&adjusted=yes&errorData=no&internal=false

"U.S. Census Bureau"
"Source: Advance Monthly Sales for Retail and Food Services"
"44X72: Retail Trade and Food Services: U.S. Total"
"Seasonally Adjusted Sales - Monthly [Millions of Dollars]"
"""
import datetime
import os
import re
import sys

class MigrateWebToContent():

    def __init__(self):
        self.url_template = \
            '''http://www.census.gov/econ/currentdata/export/csv?programCode=MARTS&timeSlotType=12&startYear=%d&endYear=%d&categoryCode=44X72&dataTypeCode=SM&geoLevelCode=US&adjusted=yes&errorData=no&internal=false'''
        self.wget = os.path.abspath('../thirdparty/wget/wget.exe')
        assert os.path.isfile(self.wget)

    """
    query
        'startYear'
        'endYear'
    """
    def migrate(self, query, dest_dir):
        if not os.path.exists(dest_dir):
            os.makedirs(dest_dir)
        url = self.__make_url(query)
        self.__wget(url, dest_dir)

    def __make_url(self, query):
        return self.url_template % (query['startYear'], query['endYear'])

    def __wget(self, url, dest_dir):
        dest_file = os.path.join(dest_dir, re.compile('https?://').sub('', url))
        dest_file_dir = os.path.dirname(dest_file)
        if not os.path.exists(dest_file_dir):
            os.makedirs(dest_file_dir)
     
        wget_cmdline = '''%s \"%s\" --waitretry=3 -P %s''' % (self.wget, url, dest_file_dir)
        os.system(wget_cmdline)

def main():
    q = {
        'startYear' : 1992,
        'endYear' : datetime.datetime.now().year,
    }
    m = MigrateWebToContent()
    m.migrate(q, '''./content/''')

if __name__ == '__main__':
    sys.exit(main())



Sourcing Phase 2: Rename

import logging
import os
import shutil
import sys

import logger

class MigrateContentToRenamed():

    def __init__(self):
        self.__logger = logging.getLogger()
        self.category_code = [
            '44X72', # Retail Trade and Food Services: U.S. Total
        ]
    
    def migrate_batch(self, src_dir, dest_dir):
        assert os.path.isdir(src_dir)
        if not os.path.exists(dest_dir):
            os.makedirs(dest_dir)

        for file in os.listdir(src_dir):
            args = self.__parse_args(file)
            category_code = args['categoryCode']
            if category_code not in self.category_code:
                self.__logger.error('''No such category_code. File => %s''' % file)
                continue
            adjusted_str = self.__adjusted_str(args['adjusted'])
            csv_file = '''%s_%s_%s_%s.csv''' % (
                args['categoryCode'], args['startYear'], args['endYear'], adjusted_str
            )
            shutil.copy(os.path.join(src_dir, file), os.path.join(dest_dir, csv_file))

    def __parse_args(self, args_line):
        args = {}
        for kvs in args_line.split('&'):
            kv = kvs.split('=')
            args[kv[0]] = kv[1]
        return args

    def __adjusted_str(self, adjusted):
        if adjusted == 'yes':
            return 'adjusted'
        else:
            return 'nonadjusted'


        
def main():
    logger.config_root(level=logging.INFO)
    m = MigrateContentToRenamed()
    m.migrate_batch('./content/www.census.gov/econ/currentdata/export',
                    './renamed/')
if __name__ == '__main__':
    sys.exit(main())



Sourcing Phase 3: Write to SQLite

"""
Schema:
drop table if exists AdvanceMonthlyRetailTradeReport;

create table if not exists AdvanceMonthlyRetailTradeReport 
(
    creation_dt datetime default current_timestamp,
    period datetime not null,
    number text,
    revision int default 0,
    unique (period, number, revision) on conflict ignore
);

CSV content:

"U.S. Census Bureau"
"Source: Advance Monthly Sales for Retail and Food Services"
"44X72: Retail Trade and Food Services: U.S. Total"
"Seasonally Adjusted Sales - Monthly [Millions of Dollars]"
"Period: 1992 to 2012"
"Data Extracted on: September 17, 2012 (9:52 pm)"

Period,Value
Jan-1992,164083
Feb-1992,164260
Mar-1992,163747
Apr-1992,164759
May-1992,165610
Jun-1992,166089
"""
import csv
import datetime
import logging
import os
import sqlite3
import sys

import logger

class MigrateRenamedToSqlite():

    def __init__(self):
        self.__logger = logging.getLogger()
        self.sql_insert = '''insert or ignore into
            AdvanceMonthlyRetailTradeReport(period, number) values('%s', '%s')
            '''

    def migrate_batch(self, src_dir, dest_db):
        assert os.path.isdir(src_dir)

        for file in os.listdir(src_dir):
            self.migrate(os.path.join(src_dir, file), dest_db)
            
    def migrate(self, src_csv, dest_db):
        self.__logger.debug('''%s => %s''' % (src_csv, dest_db))
        assert os.path.isfile(src_csv)
        assert os.path.isfile(dest_db)
        
        conn = sqlite3.connect(dest_db)
        cursor = conn.cursor()
        is_began = False
        csv_reader = csv.reader(open(src_csv, 'r'))
        for row in csv_reader:
            if not is_began:
                if len(row) == 2:
                    is_began = True
                    continue
            elif row[1] != 'NA':
                period = self.__period_str(row[0])
                number = row[1]
                sql_cmd = self.sql_insert % (period, number)
                cursor.execute(sql_cmd)

                self.__logger.debug('''(period, value): (%s, %s) => %s'''
                                    % (period, number, dest_db))
        conn.commit()
        cursor.close()
        conn.close()
    
    def __period_str(self, period):
        return datetime.datetime.strptime(period, '%b-%Y').strftime('%Y-%m-%d')

       

def main():
    logger.config_root(level=logging.DEBUG)
    m = MigrateRenamedToSqlite()
    m.migrate_batch('./renamed/', '../db/economicstotal.db')
if __name__ == '__main__':
    sys.exit(main())
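The `__period_str` helper is the only non-obvious step: `strptime` with `%b-%Y` parses the census `Mon-YYYY` label, and `strftime` pins the day to the first of the month. (`%b` is locale-dependent; the English month abbreviations match the census labels under the default C locale.)

```python
import datetime

# 'Jan-1992' -> datetime(1992, 1, 1) -> '1992-01-01'
period = datetime.datetime.strptime('Jan-1992', '%b-%Y').strftime('%Y-%m-%d')
print(period)  # 1992-01-01
```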



Analyzing: Excel

Step 1: Open SQLite2009 Pro => Execute SQL command:

SELECT creation_dt
     , period
     , number
FROM [AdvanceMonthlyRetailTradeReport]
ORDER BY period


Step 2: In Result tab, right click => Export to Excel.

Step 3: Calculate YOY.

Step 4: Select period, number, yoy columns => Insert => Line Charts => Format YOY Data Series => Series Options => Plot Series On => Secondary Axis
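The YOY of Step 3 can also be computed in SQLite itself before exporting, via a self-join on periods twelve months apart. A sketch on two made-up rows (not real retail figures):

```python
import sqlite3

# Self-join each period against the row twelve months earlier and derive
# the year-over-year percentage change in SQL.
conn = sqlite3.connect(':memory:')
conn.execute('''create table AdvanceMonthlyRetailTradeReport
                (period datetime, number text)''')
conn.executemany('insert into AdvanceMonthlyRetailTradeReport values (?, ?)',
                 [('1992-01-01', '164083'), ('1993-01-01', '170000')])
row = conn.execute('''
    select curr.period,
           round(curr.number * 100.0 / prev.number - 100, 2) as yoy
    from AdvanceMonthlyRetailTradeReport curr
    join AdvanceMonthlyRetailTradeReport prev
      on date(curr.period, '-12 months') = date(prev.period)
''').fetchone()
print(row)  # ('1993-01-01', 3.61)
conn.close()
```

Doing it in SQL keeps the Excel side down to charting only, though the spreadsheet formula is just as serviceable here.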