wordpress 数据库postgresql迁移至mysql

环境: Ubuntu 14.04, postgresql 9.3, mysql 5.5, wordpress 4.7.1

之前用wordpress 3.4.2和postgresql, pg4wp搭建一个个人博客网站,后来想升级时发现pg4wp从1.3.1之后再也没升级过,这个比较坑了,后续的wordpress都没法再升级上去。没办法,硬着头皮把postgresql替换成mysql。

#postgresql operations,从postgresql上导出数据
\copy wp_postmeta to ‘/home/jerry/postmeta.txt’;
\copy wp_posts ‘/home/jerry/posts.txt’;
\copy wp_links ‘/home/jerry/links.txt’;
\copy wp_terms ‘/home/jerry/terms.txt’;
\copy wp_term_relationships ‘/home/jerry/terms_relationships.txt’;
\copy wp_term_taxonomy ‘/home/jerry/terms_taxonomy.txt’;

 

#mysql operations,新建数据库wordpress, 并导入之前的数据
CREATE DATABASE wordpress DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;

delete from wp_postmeta;
delete from wp_posts;
delete from wp_links;
delete from wp_terms;
delete from wp_term_relationships;
delete from wp_term_taxonomy;

load data infile ‘/home/jerry/postmeta.txt’ into table wp_postmeta;
load data infile ‘/home/jerry/posts.txt’ into table wp_posts;
load data infile ‘/home/jerry/links.txt’ into table wp_links;
load data infile ‘/home/jerry/terms.txt’ into table wp_terms;
load data infile ‘/home/jerry/terms_relationships.txt’ into table wp_term_relationships;
load data infile ‘/home/jerry/terms_taxonomy.txt’ into table wp_term_taxonomy;

mysql 5.5 load data问题

环境: Ubuntu 14.04,  mysql 5.5

使用mysql导入数据

load data infile ‘/home/jerry/aa.txt’ into table t1;

发现有两个问题:

  1. mysql环境变量secure_file_priv, 只有这个目录才能存放数据并导入到mysql内。查看路径  show variables like ‘%secure_file_priv%’; (但仍然无效)。设置其它路径, 在/etc/mysql/my.conf上[mysqld]下面添加一行了secure-file-priv = “”,重启mysql,仍无效。
  2. /etc/apparmor.d/usr.sbin.mysqld, 这个mysql是用来设置文件的读写权限,在这个文件的底部添加如下两行。

/home/jerry/ r,
/home/jerry/* rw,

sudo /etc/init.d/apparmor reload 重新加载下

 

目前看来mysql的文件的权限设置这块非常严格。

 

Pentaho BI Server实现同比和环比

环境:  Pentaho 5.3, postgresql 9.3

最近在看pentaho report designer和CDE有没有实现类似同比和环比的功能,可惜的是没有找到。那只好从数据库的角度来解决这个问题。

假如有两表test, dim_date

test:

20140818;4
20150817;40
20150818;10
20150819;55
20160817;30
20160818;50

dim_date:

20150104;”2015年01月04日”;”2015年”;”第01月”;”2015-01-04″;”第1周”
20150103;”2015年01月03日”;”2015年”;”第01月”;”2015-01-03″;”第1周”
20150102;”2015年01月02日”;”2015年”;”第01月”;”2015-01-02″;”第1周”

 

通过olap窗口函数lag,语句实现如下:

select * from (
select date_id, volume, lag(volume, 1) over (order by date_fmt) pre_volume, lag(volume, 2) over (order by date_fmt) pre_365_volume from (
select b.date_id, b.date_fmt, a.volume from test a, dim_date b where a.date_id = b.date_id and b.date_fmt in(to_date(‘2016-08-18’, ‘yyyy-mm-dd’) – interval ‘1 day’, ‘2016-08-18’, to_date(‘2016-08-18’, ‘yyyy-mm-dd’) – interval ‘1 year’)
) m
) n where n.date_id = 20160818

可查实现查询日期的前一天和前一年同一日期的数据,这样就可以实现同比和环比的功能。

提取postgresql备份数据库内的数据

环境:CentOS 6.4, Postgresql9.4

今天一不小心把一张表的数据直接truncate掉了,头脑顿时发黑。 庆幸的是还有一份前几天的数据备份。

步骤如下:

找出对应表装载数据对应的行

cat alldb20150717.sql | grep -C number “COPY tab_stats”

提取出该行往后3000行的数据

cat alldb20150717.sql | grep -A3000 “COPY tab_stats”

 

postgresql压力测试pgbench

环境:CentOS 6.5, postgresql 9.4.2

pgbench是一款oltp压力测试软件, 使用TPC-B来模拟压力测试。

使用如下:

./pgbench –help
pgbench is a benchmarking tool for PostgreSQL.

Usage:
pgbench [OPTION]… [DBNAME]

Initialization options:
-i, –initialize invokes initialization mode
-F, –fillfactor=NUM set fill factor
-n, –no-vacuum do not run VACUUM after initialization
-q, –quiet quiet logging (one message each 5 seconds)
-s, –scale=NUM scaling factor
–foreign-keys create foreign key constraints between tables
–index-tablespace=TABLESPACE
create indexes in the specified tablespace
–tablespace=TABLESPACE create tables in the specified tablespace
–unlogged-tables create tables as unlogged tables

Benchmarking options:
-c, –client=NUM number of concurrent database clients (default: 1)
-C, –connect establish new connection for each transaction
-D, –define=VARNAME=VALUE
define variable for use by custom script
-f, –file=FILENAME read transaction script from FILENAME
-j, –jobs=NUM number of threads (default: 1)
-l, –log write transaction times to log file
-M, –protocol=simple|extended|prepared
protocol for submitting queries (default: simple)
-n, –no-vacuum do not run VACUUM before tests
-N, –skip-some-updates skip updates of pgbench_tellers and pgbench_branches
-P, –progress=NUM show thread progress report every NUM seconds
-r, –report-latencies report average latency per command
-R, –rate=NUM target rate in transactions per second
-s, –scale=NUM report this scale factor in output
-S, –select-only perform SELECT-only transactions
-t, –transactions=NUM number of transactions each client runs (default: 10)
-T, –time=NUM duration of benchmark test in seconds
-v, –vacuum-all vacuum all four standard tables before tests
–aggregate-interval=NUM aggregate data over NUM seconds
–sampling-rate=NUM fraction of transactions to log (e.g. 0.01 for 1%)

Common options:
-d, –debug print debugging output
-h, –host=HOSTNAME database server host or socket directory
-p, –port=PORT database server port number
-U, –username=USERNAME connect as specified database user
-V, –version output version information, then exit
-?, –help show this help, then exit

Report bugs to <pgsql-bugs@postgresql.org>.

 

先创建测试数据库benchdb,

./pgbench -i benchdb

./pgbench -c 50 -t 10 -r benchdb

starting vacuum…end.
transaction type: TPC-B (sort of)
scaling factor: 1
query mode: simple
number of clients: 50
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 500/500
latency average: 0.000 ms
tps = 119.220347 (including connections establishing)
tps = 124.124611 (excluding connections establishing)
statement latencies in milliseconds:
0.007204 \\set nbranches 1 * :scale
0.001772 \\set ntellers 10 * :scale
0.001308 \\set naccounts 100000 * :scale
0.001818 \\setrandom aid 1 :naccounts
0.001568 \\setrandom bid 1 :nbranches
0.001390 \\setrandom tid 1 :ntellers
0.001702 \\setrandom delta -5000 5000
1.218338 BEGIN;
1.219992 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
0.406070 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
257.778830 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
53.106822 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
0.349952 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
7.491730 END;

Oracle 表连接 筛选字段执行计划不正确

问题: 表SUMM_ADV_CONSUME是分区表,发出一个查询如下
select count(*) from  dates T434858, SUMM_ADV_CONSUME T434932 where  “T434858.DATE_ID” = T434932.DATE_ID and T434858.DATE_NAME = ‘20131202’

产生的执行计划扫描很多分区表,正常情况是一个分区表。使用同样的逻辑查询,用另个筛选条件date_name2

select count(*) from  dates T434858, SUMM_ADV_CONSUME T434932 where  “T434858.DATE_ID” = T434932.DATE_ID and T434858.DATE_NAME2 = to_date(‘20131105’, ‘yyyymmdd’)
只扫描一个分区表,执行计划正确。 分析两个字段发现date_name2上有建唯一索引。 对date_name创建唯一索引也能得出正确的执行计划。

数据仓库中从mysql导数据到oracle

在数据仓库etl过程会有许多不同的数据源从dw导数据, 以mysql数据源为例分几种方法来导入:

1. 借助etl工具本身来导入

优点: 开发效率高,直接表映射
缺点:etl工具本身license, 加载数据的效率低

2. 借助oracle gateway拉取mysql内的数据

优点:开发效率高,只需配置
缺点:如果数据量比较多的话拉取有瓶颈,不会使用谓语下推操作

3. 借助NFS将mysql数据导入,然后通过sqlldr加载NFS上的数据文件

优点:加载数据快, 直接使用原生态的导出和加载
缺点:配置麻烦

Oracle expdp导出多表或表中的部分数据

环境:Oracle database 11gR2,  RHEL 5.4
导出表中的部分(query):
expdp hmt/123456 dumpfile=hmtdb20130301_hmt_log.dmp directory=hmt_dir tables = fct_hmt_log query=fct_hmt_log:'”where show_date <= 20121010″‘
注意:有单引号包含双引号
导出多个表(tables):
expdp hmt/123456 dumpfile=hmtdb20130301.dmp directory=hmt_dir tables = DATES, DIM_CITY_PROVINCE, DIM_DOMAIN_TYPE, DIM_DOMAIN_TYPE_R, DURATIONS,HOURS