Wednesday, 2019-04-24 05:41 - adminshiping1

mysql: distinct vs. group by efficiency comparison

2017-08-25 11:23:34, 江流川, 609 views

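Querying a table with 100,000 records:

distinct took: 0.078ms

group by took: 0.031ms

After adding an index on the queried column:
distinct took: 0.00072550ms
group by took: 0.00071650ms

So either way, group by came out faster than distinct in this test.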


Source: https://blog.csdn.net/fgx_123456/article/details/77568755

MySQL: performance comparison of distinct and group by [repost]

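I had seen some tests of this online and did not find them very accurate, so today I ran my own. The conclusions follow (tested only on a personal computer, so they may be incomplete and are for reference only).

Test procedure:

Prepare a test table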

CREATE TABLE `test_test` (
  `id` int(11) NOT NULL auto_increment,
  `num` int(11) NOT NULL default '0',
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;

Create a stored procedure that inserts 100,000 rows into the table

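-- In the mysql command-line client, change the delimiter first so the
-- semicolons inside the body do not end the CREATE PROCEDURE statement.
DELIMITER //
create procedure p_test(pa int(11))
begin
  declare max_num int(11) default 100000;
  declare i int default 0;
  declare rand_num int;

  -- current row count; read once, before the loop
  select count(id) into max_num from test_test;

  while i < pa do
    -- insert only if the table held fewer than 100000 rows at the start
    if max_num < 100000 then
      select cast(rand()*100 as unsigned) into rand_num;
      insert into test_test(num) values(rand_num);
    end if;
    set i = i + 1;
  end while;
end //
DELIMITER ;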

Call the stored procedure to insert the data

call p_test(100000);

1. Run the test (no index)

select distinct num from test_test;
select num from test_test group by num;

[SQL] select distinct num from test_test;
Affected rows: 0
Time: 0.078ms

[SQL]
select num from test_test group by num;
Affected rows: 0
Time: 0.031ms

2. Create an index on the num column

ALTER TABLE `test_test` ADD INDEX `num_index` (`num`) ;

Query again

select distinct num from test_test;
select num from test_test group by num;
[SQL] select distinct num from test_test;
Affected rows: 0
Time: 0.000ms

[SQL]
select num from test_test group by num;
Affected rows: 0
Time: 0.000ms

At this point the times are too small: a display of 0.000 seconds is no longer precise enough.

So we switch to the command line to test

mysql> set profiling=1;
mysql> select distinct(num) from test_test;
mysql> select num from test_test group by num;
mysql> show profiles;
+----------+------------+----------------------------------------+
| Query_ID | Duration   | Query                                  |
+----------+------------+----------------------------------------+
|        1 | 0.00072550 | select distinct(num) from test_test    |
|        2 | 0.00071650 | select num from test_test group by num |
+----------+------------+----------------------------------------+

With the index, distinct is about 107 times faster than distinct without the index.

With the index, group by is about 43 times faster than group by without the index.
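
The two speedup figures can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes the GUI timings and the show profiles durations are expressed in the same unit, which is what the comparison above implies, and the variable names are made up for illustration.

```python
# Timings copied from the test output above (assumed to share one unit).
distinct_no_index = 0.078
group_by_no_index = 0.031
distinct_indexed = 0.00072550
group_by_indexed = 0.00071650

# Speedup = time without index / time with index.
distinct_speedup = distinct_no_index / distinct_indexed
group_by_speedup = group_by_no_index / group_by_indexed

print(f"distinct: {distinct_speedup:.1f}x")
print(f"group by: {group_by_speedup:.1f}x")
```

Truncating the two ratios gives the 107 and 43 quoted above.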

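Now compare distinct and group by directly.

In this test group by was faster than distinct with or without the index, so the author recommends group by. Note that with the index the two timings differ by only about 1%.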

Source: https://www.cnblogs.com/zox2011/archive/2012/09/12/2681797.html

group by vs. distinct: efficiency analysis and optimization measures

2016-08-12 12:52:58, DBTomato, 36923 views, tag: mysql

Category: SQL

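How to use group by for deduplication

MySQL's distinct applies to the whole result row: a row is dropped only when every selected column matches another row.

So when we need to deduplicate on one particular column, we can use a group by clause to group and deduplicate.

select _auto_id from account_login group by _auto_id; This statement deduplicates the _auto_id column.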
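
As a rough illustration that grouping on a single column yields the same set of values as distinct on that column, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the example runs without a server); the column types and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: only the table and column names come from the article.
conn.execute("CREATE TABLE account_login (_auto_id INTEGER, sid INTEGER)")
conn.executemany(
    "INSERT INTO account_login VALUES (?, ?)",
    [(1, 10), (1, 11), (2, 10), (3, 12), (3, 12)],
)

by_group = conn.execute(
    "SELECT _auto_id FROM account_login GROUP BY _auto_id"
).fetchall()
by_distinct = conn.execute(
    "SELECT DISTINCT _auto_id FROM account_login"
).fetchall()

# Neither query promises an order here, so compare sorted copies.
print(sorted(by_group), sorted(by_distinct))
```

Both queries return one row per _auto_id value; any performance difference depends on the engine and the available indexes.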

Efficiency analysis of deduplication with group by

Without an index

Time: 0.23s

mysql> explain select _auto_id from account_login group by _auto_id;
+----+-------------+---------------+------+---------------+------+---------+------+--------+---------------------------------+
| id | select_type | table         | type | possible_keys | key  | key_len | ref  | rows   | Extra                           |
+----+-------------+---------------+------+---------------+------+---------+------+--------+---------------------------------+
|  1 | SIMPLE      | account_login | ALL  | NULL          | NULL | NULL    | NULL | 133257 | Using temporary; Using filesort |
+----+-------------+---------------+------+---------------+------+---------+------+--------+---------------------------------+

mysql> show profile;
+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000154 |
| checking permissions | 0.000012 |
| Opening tables       | 0.000029 |
| init                 | 0.000029 |
| System lock          | 0.000014 |
| optimizing           | 0.000010 |
| statistics           | 0.000021 |
| preparing            | 0.000020 |
| Creating tmp table   | 0.000036 |
| Sorting result       | 0.000007 |
| executing            | 0.000005 |
| Sending data         | 0.207841 |
| Creating sort index  | 0.021024 |
| end                  | 0.000010 |
| removing tmp table   | 0.000130 |
| end                  | 0.000010 |
| query end            | 0.000016 |
| closing tables       | 0.000019 |
| freeing items        | 0.000035 |
| cleaning up          | 0.000039 |
+----------------------+----------+
20 rows in set, 1 warning (0.00 sec)

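Here a sort index was created for sorting, which indicates that MySQL used an in-memory temporary table. The sort that follows group by is done through the sort index, and the size of the in-memory temporary table is limited by max_heap_table_size (together with tmp_table_size).

The time shown for Sending data = Time(Sending data) + Time(Sorting result), so it should really be read as the time spent sorting.

Because group by sorts its result automatically, if we only want to deduplicate and do not need the ordering, we can use order by null: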

mysql> explain select _auto_id from account_login group by _auto_id order by null;
+----+-------------+---------------+------+---------------+------+---------+------+--------+-----------------+
| id | select_type | table         | type | possible_keys | key  | key_len | ref  | rows   | Extra           |
+----+-------------+---------------+------+---------------+------+---------+------+--------+-----------------+
|  1 | SIMPLE      | account_login | ALL  | NULL          | NULL | NULL    | NULL | 133257 | Using temporary |
+----+-------------+---------------+------+---------------+------+---------+------+--------+-----------------+

+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000155 |
| checking permissions | 0.000012 |
| Opening tables       | 0.000029 |
| init                 | 0.000029 |
| System lock          | 0.000014 |
| optimizing           | 0.000009 |
| statistics           | 0.000022 |
| preparing            | 0.000020 |
| Creating tmp table   | 0.000042 |
| executing            | 0.000006 |
| Sending data         | 0.219640 |
| end                  | 0.000021 |
| removing tmp table   | 0.000014 |
| end                  | 0.000008 |
| query end            | 0.000014 |
| closing tables       | 0.000020 |
| freeing items        | 0.000033 |
| cleaning up          | 0.000020 |
+----------------------+----------+

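We can see that after adding the order by null clause, MySQL no longer creates a sort index for sorting (in-memory sorting is very fast, so the gain is not significant, and this stage only sorts individual data blocks). However, when multiple columns follow group by and it is not possible to ...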

With an index

mysql> explain select _auto_id from account_login group by _auto_id;

Time: 0.11s

Execution plan

+----+-------------+---------------+-------+---------------+---------+---------+------+--------+-------------+
| id | select_type | table         | type  | possible_keys | key     | key_len | ref  | rows   | Extra       |
+----+-------------+---------------+-------+---------------+---------+---------+------+--------+-------------+
|  1 | SIMPLE      | account_login | index | idx_acc       | idx_acc | 4       | NULL | 133257 | Using index |
+----+-------------+---------------+-------+---------------+---------+---------+------+--------+-------------+

profile

+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000140 |
| checking permissions | 0.000011 |
| Opening tables       | 0.000027 |
| init                 | 0.000028 |
| System lock          | 0.000014 |
| optimizing           | 0.000009 |
| statistics           | 0.000035 |
| preparing            | 0.000028 |
| Sorting result       | 0.000006 |
| executing            | 0.000005 |
| Sending data         | 0.105595 |
| end                  | 0.000012 |
| query end            | 0.000013 |
| closing tables       | 0.000015 |
| freeing items        | 0.000026 |
| cleaning up          | 0.000034 |
+----------------------+----------+

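explain select _auto_id from account_login group by _auto_id: 0.11s

explain select _auto_id from account_login group by _auto_id order by null: 0.11s

With the index in use, MySQL relies on the index's own ordering, so it does not need to create a temporary table or a sort index (Creating sort index) and can output ordered results directly. Both statements take the same time.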

Efficiency analysis of a typical use case

mysql> explain select _auto_id,max(date) from account_login group by _auto_id;

Without an index

Time: 3.16s

+----+-------------+---------------+------+---------------+------+---------+------+--------+---------------------------------+
| id | select_type | table         | type | possible_keys | key  | key_len | ref  | rows   | Extra                           |
+----+-------------+---------------+------+---------------+------+---------+------+--------+---------------------------------+
|  1 | SIMPLE      | account_login | ALL  | NULL          | NULL | NULL    | NULL | 133257 | Using temporary; Using filesort |
+----+-------------+---------------+------+---------------+------+---------+------+--------+---------------------------------+

mysql> show profile;

+---------------------------+----------+
| Status                    | Duration |
+---------------------------+----------+
| starting                  | 0.000111 |
| checking permissions      | 0.000010 |
| Opening tables            | 0.000018 |
| init                      | 0.000030 |
| System lock               | 0.000011 |
| optimizing                | 0.000007 |
| statistics                | 0.000014 |
| preparing                 | 0.000013 |
| Creating tmp table        | 0.000037 |
| Sorting result            | 0.000007 |
| executing                 | 0.000005 |
| Sending data              | 0.545211 |
| converting HEAP to MyISAM | 1.307225 |
| Sending data              | 0.738511 |
| Creating sort index       | 0.573640 |
| end                       | 0.000020 |
| removing tmp table        | 0.001682 |
| end                       | 0.000009 |
| query end                 | 0.000012 |
| closing tables            | 0.000016 |
| freeing items             | 0.000030 |
| logging slow query        | 0.000051 |
| cleaning up               | 0.000018 |
+---------------------------+----------+

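During group by, MySQL first processes the group by clause with a sort index, then creates a temporary table, then converts it to an on-disk temporary table and uses filesort to extract max(date).

If too many columns follow group by (even without sorting), converting HEAP to MyISAM also occurs.

converting HEAP to MyISAM indicates that during execution the in-memory temporary table was converted to an on-disk temporary table. tmp_table_size and max_heap_table_size can be used to change the maximum size of the in-memory temporary table, but for this SQL statement, because filesort is needed, the conversion from in-memory to on-disk temporary table happens no matter how large that limit is set.

With an index

Time: 0.31s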

mysql> explain select _auto_id,max(date) from account_login group by _auto_id;

+----+-------------+---------------+-------+---------------+---------+---------+------+--------+-------+
| id | select_type | table         | type  | possible_keys | key     | key_len | ref  | rows   | Extra |
+----+-------------+---------------+-------+---------------+---------+---------+------+--------+-------+
|  1 | SIMPLE      | account_login | index | idx_acc       | idx_acc | 4       | NULL | 133257 | NULL  |
+----+-------------+---------------+-------+---------------+---------+---------+------+--------+-------+

profile

+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000109 |
| checking permissions | 0.000010 |
| Opening tables       | 0.000022 |
| init                 | 0.000031 |
| System lock          | 0.000012 |
| optimizing           | 0.000007 |
| statistics           | 0.000021 |
| preparing            | 0.000022 |
| Sorting result       | 0.000006 |
| executing            | 0.000005 |
| Sending data         | 0.314817 |
| end                  | 0.000024 |
| query end            | 0.000015 |
| closing tables       | 0.000032 |
| freeing items        | 0.000042 |
| cleaning up          | 0.000023 |
+----------------------+----------+

With an index, the index alone satisfies the whole query.

Analysis of distinct

explain select distinct(_auto_id) from account_login;

+----+-------------+---------------+------+---------------+------+---------+------+--------+-----------------+
| id | select_type | table         | type | possible_keys | key  | key_len | ref  | rows   | Extra           |
+----+-------------+---------------+------+---------------+------+---------+------+--------+-----------------+
|  1 | SIMPLE      | account_login | ALL  | NULL          | NULL | NULL    | NULL | 133257 | Using temporary |
+----+-------------+---------------+------+---------------+------+---------+------+--------+-----------------+

mysql> show profile;

+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000087 |
| checking permissions | 0.000009 |
| Opening tables       | 0.000016 |
| init                 | 0.000016 |
| System lock          | 0.000011 |
| optimizing           | 0.000007 |
| statistics           | 0.000013 |
| preparing            | 0.000014 |
| Creating tmp table   | 0.000026 |
| executing            | 0.000006 |
| Sending data         | 0.221214 |
| end                  | 0.000024 |
| removing tmp table   | 0.000190 |
| end                  | 0.000011 |
| query end            | 0.000014 |
| closing tables       | 0.000019 |
| freeing items        | 0.000036 |
| cleaning up          | 0.000024 |
+----------------------+----------+

select distinct _auto_id,sid,uid from account_login;
+---------------------------+----------+
| Status                    | Duration |
+---------------------------+----------+
| starting                  | 0.000095 |
| checking permissions      | 0.000010 |
| Opening tables            | 0.000019 |
| init                      | 0.000019 |
| System lock               | 0.000010 |
| optimizing                | 0.000006 |
| statistics                | 0.000015 |
| preparing                 | 0.000016 |
| Creating tmp table        | 0.000030 |
| executing                 | 0.000006 |
| Sending data              | 0.529466 |
| converting HEAP to MyISAM | 1.928813 |
| Sending data              | 0.157253 |
| end                       | 0.000020 |
| removing tmp table        | 0.002778 |
| end                       | 0.000009 |
| query end                 | 0.000012 |
| closing tables            | 0.000016 |
| freeing items             | 0.000031 |
| logging slow query        | 0.000062 |
| cleaning up               | 0.000033 |
+---------------------------+----------+

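distinct turns out to behave almost exactly like group by without sorting, and multi-column deduplication also goes through converting HEAP to MyISAM to aggregate.

Summary:

Creating sort index: sorts in chunks using an in-memory temporary table; after the chunks are sorted, a merge sort is done on disk.

converting HEAP to MyISAM: aggregation and sorting move to disk. If group by covers too many columns, an on-disk temporary table is needed to aggregate the data even without sorting.

The main cost of group by is in the temporary-table sorting stage, not the grouping stage.

So what limits group by performance is the temporary table plus sorting. Reducing on-disk sorting and the creation of on-disk temporary tables is the most useful approach.

The best approach is to add an index or composite index on the columns in the group by clause, so that MySQL can use the index to do the sorting and grouping.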
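
To see the effect of such an index on a query plan without a MySQL server, here is a hedged sketch using Python's built-in sqlite3 module. SQLite is only a stand-in: its planner and its EXPLAIN QUERY PLAN output differ from MySQL's EXPLAIN, and the column types and generated rows are assumptions. Only the table, column and index names come from the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data for illustration.
conn.execute("CREATE TABLE account_login (_auto_id INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO account_login VALUES (?, ?)",
    [(i % 50, "2016-08-12") for i in range(1000)],
)

query = "SELECT _auto_id FROM account_login GROUP BY _auto_id"


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable detail string.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_before = plan(query)
conn.execute("CREATE INDEX idx_acc ON account_login (_auto_id)")
plan_after = plan(query)

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan text varies between SQLite versions. The point is only that the second plan mentions idx_acc, while the first has to group without an index.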

Source: https://blog.csdn.net/u013983450/article/details/52190699

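mysql distinct deduplication and group by usage explained (detailed)

2016-10-23 21:29:45, 追梦的小猴子, 31112 views

mysql distinct deduplication

When using mysql, you sometimes need to query the records that are unique on some column. mysql provides the distinct keyword to filter out redundant duplicates and keep one of each, but it is often used only to return the count of distinct records rather than the full rows. The reason is that distinct can only return its target columns and cannot return other columns. This puzzled me for a long time: if distinct could not solve it, I would have to fall back on a nested loop of queries, which for a site with a very large amount of data would undoubtedly hurt efficiency. So I spent a lot of time studying the problem, and could not find a solution online.

Let's look at an example first: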

     table
   id name
   1 a
   2 b
   3 c
   4 c
   5 b

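The table structure is roughly like this. It is only a simple example; real cases are much more complicated.

Suppose I want one statement that returns all data with unique name values; then distinct is needed to drop the redundant duplicates.

select distinct name from table
The result is: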

   name
   a
   b
   c

That seems to work, but what if I also want the id values? Let's change the query:

select distinct name, id from table

The result will be:

   id name
   1 a
   2 b
   3 c
   4 c
   5 b

Why did distinct not work? It did work, but it applied to both columns at once: a row is excluded only if both id and name are the same.
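
The behaviour is easy to reproduce. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; DISTINCT works on whole rows in both) and names the table t, since table is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same five rows as the example table; the name "t" is a placeholder.
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c"), (4, "c"), (5, "b")],
)

names = conn.execute("SELECT DISTINCT name FROM t ORDER BY name").fetchall()
pairs = conn.execute("SELECT DISTINCT name, id FROM t ORDER BY id").fetchall()

print(names)       # one row per distinct name
print(len(pairs))  # every (name, id) pair is already unique
```

Because id is unique, every (name, id) pair is distinct, so the second query returns all five rows.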

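Let's change the query again:

select id, distinct name from table

Unfortunately, all you get is an error message: distinct must come at the start of the select list. Can't distinct go in the where clause instead? You can try, and it errors just the same.

Troublesome, isn't it? Indeed, no amount of effort solved the problem this way.

Then I tried the count function, and it worked. After all that effort, it turned out to be this simple.

Here is the full statement: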

select *, count(distinct name) from table group by name

Result:

   id name count(distinct name)
   1 a 1
   2 b 1
   3 c 1

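The last column is redundant and can simply be ignored; the goal is achieved.

It turns out mysql is easy to fool with a small trick. Maybe I was the only one stuck on this (well, along with 容容), but I am posting it in the hope that others will not be tormented by the same problem.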
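
One caveat about the statement above: it selects columns that are neither grouped nor aggregated, which MySQL rejects when the ONLY_FULL_GROUP_BY SQL mode is enabled (the default since MySQL 5.7.5). A more portable way to get one id per name is to aggregate the id explicitly. The sketch below is an illustration, not the author's method; it runs on Python's built-in sqlite3 module as a stand-in for MySQL, and the table name t and alias first_id are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c"), (4, "c"), (5, "b")],
)

# One row per name; MIN(id) picks the lowest id within each group.
rows = conn.execute(
    "SELECT MIN(id) AS first_id, name FROM t GROUP BY name ORDER BY first_id"
).fetchall()

print(rows)
```

This returns the same three rows as the result shown above, without the extra count column.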

Oh, and one more thing, about group by:

group by usage explained

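The group by clause groups the query result by each distinct value of the given column(s) and computes aggregates per group, producing a grouped summary table.
Every column name in the SELECT clause must be either a grouping column or a column function. A column function returns one result for each group defined by the GROUP BY clause.
An employee table has the following structure and data:
id  name  dept         salary  edlevel  hiredate
1   张三  Development  2000    3        2009-10-11
2   李四  Development  2500    3        2009-10-01
3   王五  Design       2600    5        2010-10-02
4   王六  Design       2300    4        2010-10-03
5   马七  Design       2100    4        2010-10-06
6   赵八  Sales        3000    5        2010-10-05
7   钱九  Sales        3100    7        2010-10-07
8   孙十  Sales        3500    7        2010-10-06
For example, to list the highest salary in each department, the sql statement is:
SELECT DEPT, MAX(SALARY) AS MAXIMUM
FROM staff
GROUP BY DEPT
The query result is:
DEPT         MAXIMUM
Development  2500
Design       2600
Sales        3500
To explain this result:
1. It satisfies "every column name in the SELECT clause must be a grouping column or a column function": the SELECT list contains DEPT, which appears in GROUP BY DEPT, and MAX(SALARY), which is a column function.
2. "A column function returns one result for each group defined by the GROUP BY clause": the rows are grouped by department and one result is returned per department, namely that department's highest salary.
Note: MAX(SALARY) is computed for each department (the groups defined by the GROUP BY clause), not for the whole company.
For example, to query the total salary of each department:
SELECT DEPT, sum( SALARY ) AS total
FROM staff
GROUP BY DEPT
The query result is:
DEPT         total
Development  4500
Design       7000
Sales        9600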
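
The two aggregates can be checked against the sample data with a short script. This is a sketch under stated assumptions: it uses Python's built-in sqlite3 module instead of MySQL, the column types are guesses, and the department names are rendered in English.

```python
import sqlite3

# The eight employees from the sample table.
STAFF = [
    (1, "张三", "Development", 2000, 3, "2009-10-11"),
    (2, "李四", "Development", 2500, 3, "2009-10-01"),
    (3, "王五", "Design", 2600, 5, "2010-10-02"),
    (4, "王六", "Design", 2300, 4, "2010-10-03"),
    (5, "马七", "Design", 2100, 4, "2010-10-06"),
    (6, "赵八", "Sales", 3000, 5, "2010-10-05"),
    (7, "钱九", "Sales", 3100, 7, "2010-10-07"),
    (8, "孙十", "Sales", 3500, 7, "2010-10-06"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, dept TEXT,"
    " salary INTEGER, edlevel INTEGER, hiredate TEXT)"
)
conn.executemany("INSERT INTO staff VALUES (?, ?, ?, ?, ?, ?)", STAFF)

# Each query yields (dept, value) pairs, which dict() turns into a mapping.
maximum = dict(conn.execute("SELECT dept, MAX(salary) FROM staff GROUP BY dept"))
total = dict(conn.execute("SELECT dept, SUM(salary) FROM staff GROUP BY dept"))

print(maximum)
print(total)
```

Each dictionary has one entry per department, matching the two result tables above.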
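Using a WHERE clause together with a GROUP BY clause
A grouped query can have a standard WHERE clause that eliminates non-qualifying rows before the groups are formed and the column functions are computed. The WHERE clause must be specified before the GROUP BY clause.
For example, to query the highest salary at each level (EDLEVEL) in each department among employees hired in 2010:
SELECT DEPT, EDLEVEL, MAX( SALARY ) AS MAXIMUM
FROM staff
WHERE HIREDATE >= '2010-01-01' AND HIREDATE < '2011-01-01'
GROUP BY DEPT, EDLEVEL
ORDER BY DEPT, EDLEVEL
The query result is:
DEPT    EDLEVEL  MAXIMUM
Design  4        2300
Design  5        2600
Sales   5        3000
Sales   7        3500
Note: every column name specified in the SELECT statement is also mentioned in the GROUP BY clause. A column name that is not mentioned in both places produces an error.
The GROUP BY clause returns one row for each unique combination of DEPT and EDLEVEL.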
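
A small script can confirm that the WHERE filter runs before grouping. As before this is only a sketch: Python's built-in sqlite3 module stands in for MySQL, the column types are assumed, department names are rendered in English, and the date range is written as an explicit half-open interval for the year 2010.

```python
import sqlite3

STAFF = [
    (1, "张三", "Development", 2000, 3, "2009-10-11"),
    (2, "李四", "Development", 2500, 3, "2009-10-01"),
    (3, "王五", "Design", 2600, 5, "2010-10-02"),
    (4, "王六", "Design", 2300, 4, "2010-10-03"),
    (5, "马七", "Design", 2100, 4, "2010-10-06"),
    (6, "赵八", "Sales", 3000, 5, "2010-10-05"),
    (7, "钱九", "Sales", 3100, 7, "2010-10-07"),
    (8, "孙十", "Sales", 3500, 7, "2010-10-06"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, dept TEXT,"
    " salary INTEGER, edlevel INTEGER, hiredate TEXT)"
)
conn.executemany("INSERT INTO staff VALUES (?, ?, ?, ?, ?, ?)", STAFF)

# ISO-formatted date strings compare correctly as text.
rows = conn.execute(
    "SELECT dept, edlevel, MAX(salary) FROM staff"
    " WHERE hiredate >= '2010-01-01' AND hiredate < '2011-01-01'"
    " GROUP BY dept, edlevel ORDER BY dept, edlevel"
).fetchall()

for row in rows:
    print(row)
```

The Development department drops out entirely because both of its employees were hired in 2009.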
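Using a HAVING clause after the GROUP BY clause
You can apply a qualifying condition to groups so that the system returns a result only for the groups that satisfy it. To do this, include a HAVING clause after the GROUP BY clause. A HAVING clause can contain one or more predicates connected by AND and OR. Each predicate compares a property of the group (such as AVG(SALARY)) with another value.
For example, to find the highest and lowest salary of departments with more than 2 employees:
SELECT DEPT, MAX( SALARY ) AS MAXIMUM, MIN( SALARY ) AS MINIMUM
FROM staff
GROUP BY DEPT
HAVING COUNT( * ) >2
ORDER BY DEPT
The query result is:
DEPT    MAXIMUM  MINIMUM
Design  2600     2100
Sales   3500     3000
For example, to find the highest and lowest salary of departments whose average salary is greater than 3000:
SELECT DEPT, MAX( SALARY ) AS MAXIMUM, MIN( SALARY ) AS MINIMUM
FROM staff
GROUP BY DEPT
HAVING AVG( SALARY ) >3000
ORDER BY DEPT
The query result is:
DEPT    MAXIMUM  MINIMUM
Sales   3500     3000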
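
Both HAVING queries can be verified the same way. Again a sketch with assumptions: Python's built-in sqlite3 module stands in for MySQL, the column types are guesses, and department names are rendered in English.

```python
import sqlite3

STAFF = [
    (1, "张三", "Development", 2000, 3, "2009-10-11"),
    (2, "李四", "Development", 2500, 3, "2009-10-01"),
    (3, "王五", "Design", 2600, 5, "2010-10-02"),
    (4, "王六", "Design", 2300, 4, "2010-10-03"),
    (5, "马七", "Design", 2100, 4, "2010-10-06"),
    (6, "赵八", "Sales", 3000, 5, "2010-10-05"),
    (7, "钱九", "Sales", 3100, 7, "2010-10-07"),
    (8, "孙十", "Sales", 3500, 7, "2010-10-06"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, dept TEXT,"
    " salary INTEGER, edlevel INTEGER, hiredate TEXT)"
)
conn.executemany("INSERT INTO staff VALUES (?, ?, ?, ?, ?, ?)", STAFF)

base = "SELECT dept, MAX(salary), MIN(salary) FROM staff GROUP BY dept "

# HAVING filters whole groups after aggregation.
big_depts = conn.execute(base + "HAVING COUNT(*) > 2 ORDER BY dept").fetchall()
well_paid = conn.execute(base + "HAVING AVG(salary) > 3000 ORDER BY dept").fetchall()

print(big_depts)
print(well_paid)
```

Only Sales passes the second filter: its average salary is 9600 / 3 = 3200, while the other two departments average below 3000.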

Reposted from: http://hanxiujun9.blog.163.com/blog/static/9193231201311923227670/

Source: https://blog.csdn.net/zhangqun23/article/details/52904707
