2006-08-08 @ 07:42:42 · 作者 volcano · 归类于 MYSQL 你可能会感兴趣的内容 混合使用SQL和shell命令 在命令行里执行mysql的sql PDO_MYSQL的一些预定义常量 Sun今日宣布收购Mysql MySQL Proxy的Alpha版本发布
mysql 5.1已经到了beta版,官方网站上也陆续有一些文章介绍,比如上次看到的Improving Database Performance with Partitioning。在使用分区的前提下,可以用mysql实现非常大的数据量存储。今天在mysql的站上又看到一篇进阶的文章 —— 按日期分区存储。如果能够实现按日期分区,这对某些时效性很强的数据存储是相当实用的功能。下面是从这篇文章中摘录的一些内容。 错误的按日期分区例子
最直观的方法,就是直接用年月日这种日期格式来进行常规的分区: PLAIN TEXT CODE: mysql> create table rms (d date) -> partition by range (d) -> (partition p0 values less than ('1995-01-01'), -> partition p1 VALUES LESS THAN ('2010-01-01'));
ERROR 1064 (42000): VALUES value must be of same type as partition function near '), partition p1 VALUES LESS THAN ('2010-01-01'))' at line 3
上述分区方式没有成功,而且明显的不经济,老练的DBA会用整型数值来进行分区: PLAIN TEXT CODE: mysql> CREATE TABLE part_date1 -> ( c1 int default NULL, -> c2 varchar(30) default NULL, -> c3 date default NULL) engine=myisam -> partition by range (cast(date_format(c3,'%Y%m%d') as signed)) -> (PARTITION p0 VALUES LESS THAN (19950101), -> PARTITION p1 VALUES LESS THAN (19960101) , -> PARTITION p2 VALUES LESS THAN (19970101) , -> PARTITION p3 VALUES LESS THAN (19980101) , -> PARTITION p4 VALUES LESS THAN (19990101) , -> PARTITION p5 VALUES LESS THAN (20000101) , -> PARTITION p6 VALUES LESS THAN (20010101) , -> PARTITION p7 VALUES LESS THAN (20020101) , -> PARTITION p8 VALUES LESS THAN (20030101) , -> PARTITION p9 VALUES LESS THAN (20040101) , -> PARTITION p10 VALUES LESS THAN (20100101), -> PARTITION p11 VALUES LESS THAN MAXVALUE ); Query OK, 0 rows affected (0.01 sec)
搞定?接着往下分析
PLAIN TEXT CODE: mysql> explain partitions -> select count(*) from part_date1 where -> c3> date '1995-01-01' and c3 *************************** 1. row *************************** id: 1 select_type: SIMPLE table: part_date1 partitions: p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11 type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 8100000 Extra: Using where 1 row in set (0.00 sec)
看个例子: PLAIN TEXT CODE: mysql> CREATE TABLE part_date3 -> ( c1 int default NULL, -> c2 varchar(30) default NULL, -> c3 date default NULL) engine=myisam -> partition by range (to_days(c3)) -> (PARTITION p0 VALUES LESS THAN (to_days('1995-01-01')), -> PARTITION p1 VALUES LESS THAN (to_days('1996-01-01')) , -> PARTITION p2 VALUES LESS THAN (to_days('1997-01-01')) , -> PARTITION p3 VALUES LESS THAN (to_days('1998-01-01')) , -> PARTITION p4 VALUES LESS THAN (to_days('1999-01-01')) , -> PARTITION p5 VALUES LESS THAN (to_days('2000-01-01')) , -> PARTITION p6 VALUES LESS THAN (to_days('2001-01-01')) , -> PARTITION p7 VALUES LESS THAN (to_days('2002-01-01')) , -> PARTITION p8 VALUES LESS THAN (to_days('2003-01-01')) , -> PARTITION p9 VALUES LESS THAN (to_days('2004-01-01')) , -> PARTITION p10 VALUES LESS THAN (to_days('2010-01-01')), -> PARTITION p11 VALUES LESS THAN MAXVALUE ); Query OK, 0 rows affected (0.00 sec)
以to_days()函数分区成功,我们分析一下看看: PLAIN TEXT CODE: mysql> explain partitions -> select count(*) from part_date3 where -> c3> date '1995-01-01' and c3 *************************** 1. row *************************** id: 1 select_type: SIMPLE table: part_date3 partitions: p1 type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 808431 Extra: Using where 1 row in set (0.00 sec)
可以看到,mysql优化器这次不负众望,仅仅在p1分区进行查询。在这种情况下查询,真的能够带来提升查询效率么?下面分别对这次建立的part_date3和之前分区失败的part_date1做一个查询对比: PLAIN TEXT CODE: mysql> select count(*) from part_date3 where -> c3> date '1995-01-01' and c3 +----------+ | count(*) | +----------+ | 805114 | +----------+ 1 row in set (4.11 sec)
mysql> select count(*) from part_date1 where -> c3> date '1995-01-01' and c3 +----------+ | count(*) | +----------+ | 805114 | +----------+ 1 row in set (40.33 sec)
CREATE TABLE measurement_yy04mm02 ( CHECK ( logdate >= DATE ‘2004-02-01′ AND logdate < DATE ‘2004-03-01′ ) )
INHERITS (measurement);
CREATE TABLE measurement_yy04mm03 ( CHECK ( logdate >= DATE ‘2004-03-01′ AND logdate < DATE ‘2004-04-01′ ) )
INHERITS (measurement);
...
CREATE TABLE measurement_yy05mm11 ( CHECK ( logdate >= DATE ‘2005-11-01′ AND logdate < DATE ‘2005-12-01′ ) )
INHERITS (measurement);
CREATE TABLE measurement_yy05mm12 ( CHECK ( logdate >= DATE ‘2005-12-01′ AND logdate < DATE ‘2006-01-01′ ) )
INHERITS (measurement);
CREATE TABLE measurement_yy06mm01 ( CHECK ( logdate >= DATE ‘2006-01-01′ AND logdate < DATE ‘2006-02-01′ ) )
INHERITS (measurement); 我们可能还需要在键字字段上有索引:
CREATE INDEX measurement_yy04mm02_logdate ON measurement_yy04mm02 (logdate);
CREATE INDEX measurement_yy04mm03_logdate ON measurement_yy04mm03 (logdate);
...
CREATE INDEX measurement_yy05mm11_logdate ON measurement_yy05mm11 (logdate);
CREATE INDEX measurement_yy05mm12_logdate ON measurement_yy05mm12 (logdate);
CREATE INDEX measurement_yy06mm01_logdate ON measurement_yy06mm01 (logdate); 我们选择先不建立更多的索引。 如果数据只进入最新的分区,我们可以设置一个非常简单的规则来插入数据。 我们必须每个月都重新定义这个规则,这样它总是指向当前分区。
CREATE OR REPLACE RULE measurement_current_partition AS ON INSERT TO measurement DO INSTEAD INSERT INTO