
马哥 48_03 CPU load observation and tuning methods (very useful)

Kernel scheduling classes

    Three schedulers, used to schedule processes of different priorities, different classes of processes

        Real-time processes: two schedulers

                SCHED_FIFO:  scheduler, first in first out. A process runs until it finishes before other processes get to run; a fairly crude scheduling method

                SCHED_RR:  scheduler, Round Robin. Scheduling rotates: each process gets a time slice (yes, even though they are real-time). Among processes of equal priority, when one's time slice runs out the next process of the same level runs

      SCHED_OTHER:  (on Linux) the scheduler for everything else: the processes in the 100-139 priority range, i.e. the user-space class (on Unix this is SCHED_NORMAL). Not everything it schedules is a user-space thread; it schedules whatever has priority 100-139, by priority


SCHED_BATCH    # on RHEL 6; scheduling class for batch (non-interactive, CPU-bound) processes

SCHED_IDLE   # on RHEL 6; scheduling class for very-low-priority idle jobs
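
A hedged sketch of how these classes are set from the command line, assuming util-linux's chrt is available (PID 1234 is a made-up example):

# chrt -p 1234          # show the scheduling class and priority of PID 1234
# chrt -f -p 10 1234    # move PID 1234 into SCHED_FIFO, real-time priority 10
# chrt -r -p 10 1234    # move PID 1234 into SCHED_RR, real-time priority 10
# chrt -o -p 0 1234     # back to SCHED_OTHER (its priority argument must be 0)
# chrt -b -p 0 1234     # SCHED_BATCH (where supported)
# chrt -i -p 0 1234     # SCHED_IDLE (where supported)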


As shown in the figure below

Linux supports process preemption. tick: preemption can only happen on a timer interrupt; the kernel runs at a fixed timer frequency (100Hz, 250Hz, 1000Hz, etc.). The higher the Hz, the finer the clock resolution and the more precise timekeeping gets. But a higher frequency also means more chances to preempt and therefore more process switches. If the CPU is fast enough a fast tick costs little, yet experience shows a faster tick does not necessarily mean better performance: switching happens so often that a process barely starts running before it is scheduled off again. (Red Hat kernels used 100Hz early on, later 1000Hz, and have since gone back to 100Hz.)

RHEL 6.4 implements a tickless system. With ticks, a 100Hz kernel takes 100 timer interrupts per second even when idle, so an idle system still burns resources (the CPU spins for nothing).

Tickless lets the CPU sleep deeply; the kernel becomes interrupt-driven instead. The system saves power, and an idle machine no longer heats up for no reason.

    tickless

    interrupt-driven

        hard interrupts: hardirq

        soft interrupts: softirq (a system call, which switches from user mode to kernel mode, can be thought of as a soft interrupt)

image.png
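
A quick way to check the tick configuration of the running kernel (a hedged example; the /boot/config-* file is a RHEL convention):

# grep -E 'CONFIG_HZ=|CONFIG_NO_HZ' /boot/config-$(uname -r)    # CONFIG_HZ=1000 etc.; CONFIG_NO_HZ=y means tickless
# head -5 /proc/interrupts                                      # per-CPU interrupt counts, timer included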



As shown in the figure below

CPUs today are generally multi-core; each core has a level-1 cache and a level-2 cache, and there is a level-3 cache

I1: instructions 1, the level-1 instruction cache

D1: Data 1, the level-1 data cache

L2: the level-2 cache (L for "level"). These notes guess it is sometimes written D2, Data 2, as if the level-2 cache held only data and no instructions; in practice L2 is usually a unified cache holding both

The L1 and L2 caches are private to each core; the L3 cache is shared among the cores

image.png
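
A minimal check of this layout on a live system (hedged; assumes the sysfs cache interface, which RHEL 6 has):

# ls /sys/devices/system/cpu/cpu0/cache/                           # index0 index1 index2 index3
# cat /sys/devices/system/cpu/cpu0/cache/index0/{level,type,size}  # e.g. 1 / Data / 32K
# cat /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list    # which logical CPUs share the L3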


SMP: Symmetrical Multi-Processing. Multiple CPUs, though not necessarily multiple cores per CPU

As shown in the figure below

A motherboard with four CPU sockets (each slot is called a socket). If each socket holds a quad-core CPU, the four CPUs all access the same physical memory, which creates resource contention: memory is a critical section, so an arbitration mechanism is needed. A memory access takes three clock phases: first, the CPU sends an addressing request to the memory controller, and the controller returns the location to the CPU; second, the CPU reaches the storage cell and accesses memory, applying a lock (a request mechanism) for either read or write mode; third, the CPU completes the read or write. While the first CPU is talking to the memory controller, a second CPU cannot, because the memory controller is a critical resource

In the symmetric multiprocessing model there is only one memory node (a bank of memory is called a node), so the performance gain is limited. As the number of CPUs grows, performance follows a parabola-like curve: with too many CPUs it drops, because of contention for memory, and possibly because the memory bus bandwidth becomes the bottleneck

image.png




As shown in the figure below,

With two cores (or more), the level-3 cache is also a contended resource, but it is much faster than main memory, and it sits on the same socket

image.png


As shown in the figure below

With two CPUs, each has its own preferred memory region. The first CPU contains cpu0 and cpu1, the second contains cpu2 and cpu3. cpu0 and cpu1 reach the L3 cache much faster than memory, and precisely because it is so fast, contention between cpu0 and cpu1 over the L3 cache is comparatively small

image.png



As shown in the figure below

Assume single-core CPUs and ignore the L3 cache. The first CPU (cpu0) has its own memory controller and dedicated memory, and so does the second CPU (cpu1). Because memory is a system-wide resource, the kernel may load data anywhere in it, so the data cpu0 accesses may be in its L1/L2 cache or in its local memory, or it may sit in the other CPU's memory

image.png

As shown in the figure below

With two CPUs, process scheduling is no longer one queue: each CPU has its own run queue, and the queues are continually balanced by the kernel. Suppose there are 200 processes, 100 running on each CPU. After a while, 90 of cpu0's processes have finished but only 10 of cpu1's have, leaving 10 processes on cpu0 and 90 on cpu1. To keep one CPU from being busy while the other idles, the kernel rebalances (rebalancing), say moving 40 processes from cpu1 over to cpu0, so each CPU is left with 50. But the data of those 40 migrated processes lives in the memory segment on cpu1's side, so cpu0 must now go to that segment for their data, i.e. access the other CPU's dedicated memory

image.png


A CPU's dedicated memory is not off-limits to other CPUs; it is simply that CPU's primary access region, and a secondary access region for the others.

As shown in the figure below,

The first CPU (cpu0) reaching its own memory controller is a shorter and faster path than reaching the other CPU's controller.

When every CPU has its own memory controller, we call it non-uniform memory access, NUMA.

NUMA: Non-Uniform Memory Access, also expanded as Non-Uniform Memory Architecture. NUMA was created to solve the scalability problems of traditional symmetric multiprocessing (SMP, Symmetric Multi-Processor) systems.

image.png


Under SMP, memory is shared.

As shown in the figure below, under NUMA each CPU has its own memory, each memory has its own memory controller (quite possibly inside the CPU package), and the memory sits very close to its CPU, so memory is strongly tied to the socket and the bus from a CPU to its own memory is very fast. If a CPU wants the other CPU's memory controller, it has to cross the CPU sockets over an external bus, so the two paths differ a lot in time: a CPU accessing its own memory takes about 3 clock cycles, while accessing the other memory takes at least 6 (the CPU goes to its own controller first, which then asks the other controller).

The first CPU (cpu0) accessing its own memory takes three steps, 3 clock cycles.

The first CPU (cpu0) accessing the other memory takes four steps; step 1a alone costs 3 clock cycles, 6 clock cycles in total.

We can call the left-hand CPU + memory controller + memory a node; the right-hand side is another node.

On the left, cpu0-3 is a quad-core CPU (cpu0, cpu1, cpu2, cpu3); with hyperthreading it would show up as 8 cores.


image.png


On a NUMA architecture, each CPU should access only its own memory as far as possible. But because the kernel tries to keep processes balanced, they frequently move between the two CPUs, which produces cross-node memory access, so a performance drop is unavoidable; hence we may want to forbid the kernel from rebalancing them.


For very busy processes we use cpu_affinity, CPU binding (the "affinity", or kinship, of a CPU): once a frequently-running batch or service process has started, bind it directly to a CPU (or to one core of a CPU), so it can only run there, is never scheduled onto another CPU, and never does cross-node memory access.

Sometimes rebalancing is still necessary: if the load stays unbalanced, one CPU is busy while another idles, and performance suffers anyway.


On a NUMA architecture, when the local-memory hit count is very low, should we rebalance? Should we bind?


[root@localhost ~]# numa        # press Tab; there are four related commands

numactl   numad     numademo  numastat

[root@localhost ~]# 


numactl   the control command

numastat  the statistics command

numad     the service daemon

numademo  a demo



[root@localhost ~]# numastat  # apparently fails on 32-bit and needs 64-bit; in any case it works on 64-bit RHEL 6

sysfs not mounted or system not NUMA aware: No such file or directory

[root@localhost ~]#


Install a 64-bit RHEL 6; the operations below are done on 64-bit RHEL 6



[root@localhost ~]# numastat    # this box is SMP

                           node0        # on a NUMA machine with multiple nodes, there would be n node columns here

numa_hit                 1327673    # count of memory allocations that hit the intended (local) node

numa_miss                      0   # under NUMA, count of allocations that missed the intended node. When do you need to bind a process to a CPU? When numa_miss is high: a large numa_miss means lookups for local data keep failing, probably because the kernel rebalances processes too often. In that case, observe which service process keeps being rebalanced and bind it to a specific CPU (on an nginx server, tie certain nginx processes directly to CPUs so the local hit rate becomes high)

numa_foreign                   0

interleave_hit             21013

local_node               1327673

other_node                     0

[root@localhost ~]#

[root@localhost ~]# man numastat

Cannot open the message catalog "man" for locale "zh_CN.UTF-8"

(NLSPATH="/usr/share/locale/%l/LC_MESSAGES/%N")


Formatting page, please wait...

numastat(8)                     Administration                     numastat(8)


numastat

       numastat  -  Show per-NUMA-node memory statistics for processes and the

       operating system


SYNTAX

       numastat


       numastat [-V]


       numastat [<PID>|<pattern>...]


       numastat [-c] [-m] [-n] [-p  <PID>|<pattern>]  [-s[<node>]]  [-v]  [-z]

       [<PID>|<pattern>...]


DESCRIPTION

       numastat with no command options or arguments at all, displays per-node

       NUMA hit and miss system statistics from the kernel  memory  allocator.

       This default numastat behavior is strictly compatible with the previous

       long-standing numastat perl script, written by Andi Kleen.  The default

       numastat  statistics  shows per-node numbers (in units of pages of mem-

       ory) in these categories:


       numa_hit is memory successfully allocated on this node as intended.    # hits


       numa_miss is memory allocated on this node despite the process  prefer-    # misses

       ring  some different node. Each numa_miss has a numa_foreign on another

       node.


       numa_foreign is memory intended for this node, but  actually  allocated

       on  some  different node.  Each numa_foreign has a numa_miss on another

       node.    # memory that was intended for this node but ended up allocated on a different node


       interleave_hit is interleaved memory  successfully  allocated  on  this

       node as intended.    # can be ignored


       local_node is memory allocated on this node while a process was running

       on it.


       other_node is memory allocated on this node while a process was running

       on some other node.


       Any  supplied  options or arguments with the numastat command will sig-

       nificantly change both the content  and  the  format  of  the  display.

       Specified  options  will  cause display units to change to megabytes of

       memory, and  will  change  other  specific  behaviors  of  numastat  as

       described below.

OPTIONS

       -c     Minimize  table  display  width  by dynamically shrinking column

              widths based on data contents.  With  this  option,  amounts  of

              memory  will be rounded to the nearest megabyte (rather than the

              usual display with two decimal places).  Column width and inter-

              column  spacing will be somewhat unpredictable with this option,

              but the more dense display will be very useful on  systems  with

              many NUMA nodes.


       -m     Show  the  meminfo-like  system-wide  memory  usage information.

              This option produces a per-node breakdown of memory usage infor-

              mation similar to that found in /proc/meminfo.


       -n     Show  the original numastat statistics info.  This will show the

              same information as the default numastat behavior but the  units

              will  be megabytes of memory, and there will be other formatting

              and layout changes versus the original numastat behavior.


       -p <PID> or <pattern>    # view the per-node memory allocation of one process; if a process's memory spans several nodes, that suggests we should bind it

              Show per-node memory allocation information  for  the  specified

              PID  or  pattern.   If  the  -p  argument  is only digits, it is

              assumed to be a numerical PID.  If the argument  characters  are

              not  only digits, it is assumed to be a text fragment pattern to

              search for in process command lines.  For example,  numastat  -p

              qemu  will  attempt  to  find and show information for processes

              with "qemu" in the command line.   Any  command  line  arguments

              remaining  after  numastat  option flag processing is completed,

              are assumed to be additional <PID> or <pattern>  process  speci-

              fiers.   In this sense, the -p option flag is optional: numastat

              qemu is equivalent to numastat -p qemu


       -s[<node>]    # show a given node; also sorts the output

       Sort the table data in descending order before displaying it, so

              the  biggest  memory consumers are listed first.  With no speci-

              fied <node>, the table will be sorted by the total  column.   If

              the  optional  <node>  argument  is  supplied,  the data will be

              sorted by the <node> column.  Note that <node> must  follow  the

              -s  immediately with no intermediate white space (e.g., numastat

              -s2). Because -s can allow an optional argument, it must  always

              be  the  last  option  character  in a compound option character

              string. For example, instead of numastat  -msc  (which  probably

              will not work as you expect), use numastat -mcs


       -v     Make some reports more verbose.  In particular, process informa-

              tion for multiple processes will  display  detailed  information

              for each process.  Normally when per-node information for multi-

              ple processes is displayed, only the total lines are shown.


       -V     Display numastat version information and exit.


       -z     Skip display of table rows and columns  of  only  zero  values.

              This  can  be used to greatly reduce the amount of uninteresting

              zero data on systems with many NUMA nodes.  Note that when  rows

              or  columns  of zeros are still displayed with this option, that

              probably means there is at least one value in the row or  column

              that is actually non-zero, but rounded to zero for display.





[root@localhost ~]# numastat -s    # show all nodes


Per-node numastat info (in MBs):

                          Node 0           Total

                 --------------- ---------------

Numa_Hit                 5229.84         5229.84

Local_Node               5229.84         5229.84

Interleave_Hit             82.08           82.08

Numa_Foreign                0.00            0.00

Numa_Miss                   0.00            0.00

Other_Node                  0.00            0.00

[root@localhost ~]#

[root@localhost ~]# numastat -s node0    # meant to show only node0 and Total, but with a space "node0" is taken as a process pattern; per the man page the node must follow -s with no space, e.g. numastat -s0

Found no processes containing pattern: "node0"


Per-node numastat info (in MBs):

                          Node 0           Total

                 --------------- ---------------

Numa_Hit                 5232.39         5232.39

Local_Node               5232.39         5232.39

Interleave_Hit             82.08           82.08

Numa_Foreign                0.00            0.00

Numa_Miss                   0.00            0.00

Other_Node                  0.00            0.00

[root@localhost ~]#
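
A hedged example of the per-process view described in the man page ("nginx" is just an illustrative pattern; any PID or command-line fragment works):

# numastat -p nginx      # per-node memory of every process whose command line contains "nginx"
# numastat -m            # meminfo-like per-node breakdown, in MBs
# numastat -cs           # compact layout, sorted by the Total column (-s last, as the man page requires)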



[root@localhost ~]# man numactl

Cannot open the message catalog "man" for locale "zh_CN.UTF-8"

(NLSPATH="/usr/share/locale/%l/LC_MESSAGES/%N")


Formatting page, please wait...

NUMACTL(8)               Linux Administrator’s Manual               NUMACTL(8)


NAME

       numactl - Control NUMA policy for processes or shared memory    # implements NUMA policy control


SYNOPSIS

       numactl  [ --all ] [ --interleave nodes ] [ --preferred node ] [ --mem-

       bind nodes  ]  [  --cpunodebind  nodes  ]  [  --physcpubind  cpus  ]  [

       --localalloc ] [--] command {arguments ...}

       numactl --show

       numactl --hardware

       numactl [ --huge ] [ --offset offset ] [ --shmmode shmmode ] [ --length

       length ] [ --strict ]

       [ --shmid id ] --shm shmkeyfile | --file tmpfsfile

       [ --touch ] [ --dump ] [ --dump-nodes ] memory policy


DESCRIPTION

       numactl runs processes with a specific NUMA scheduling or memory place-

       ment policy.  The policy is set for command and inherited by all of its

       children.  In addition it can set persistent policy for  shared  memory

       segments or files.


       Use  --  before command if using command options that could be confused

 with numactl options.


       nodes may be specified as N,N,N or  N-N or N,N-N  or   N-N,N-N  and  so

       forth.  Relative nodes may be specifed as +N,N,N or  +N-N or +N,N-N and

       so forth. The + indicates that the node numbers  are  relative  to  the

       process’  set  of allowed nodes in its current cpuset.  A !N-N notation

       indicates the inverse of N-N, in other words all nodes except N-N.   If

       used  with + notation, specify !+N-N. When same is specified the previ-

       ous nodemask specified on the command line  is  used.   all  means  all

       nodes in the current cpuset.


       Instead of a number a node can also be:


       netdev:DEV                 The node connected to network device DEV.

       file:PATH                  The node the block device of PATH.

       ip:HOST                    The node of the network device of HOST

       block:PATH                 The node of block device PATH

       pci:[seg:]bus:dev[:func]   The node of a PCI device.


       Note  that  block  resolves the kernel block device names only for udev

       names in /dev use file:


       Policy settings are:


       --all, -a

              Unset default cpuset awareness, so user  can  use  all  possible

              CPUs/nodes for following policy settings.


       --interleave=nodes, -i nodes

              Set  a  memory interleave policy. Memory will be allocated using

              round robin on nodes.  When memory cannot be  allocated  on  the

              current  interleave  target  fall back to other nodes.  Multiple

              nodes may be specified on --interleave, --membind and --cpunode-

              bind.


       --membind=nodes, -m nodes

              Only  allocate  memory  from  nodes.   Allocation will fail when

              there is not enough memory available on these nodes.  nodes  may

              be specified as noted above.


       --cpunodebind=nodes, -N nodes    # cpu node bind: run the command only on the CPUs belonging to the given nodes, i.e. tie the CPU to its node so it no longer touches other nodes; while the process runs, the CPU must use its own node. What's the drawback of never touching other nodes? In these notes' reading, if the CPU can't find the data in its own memory it won't go looking in the other node's memory but loads straight from disk, so data sitting in another node's memory goes unused; over time this guarantees each process's data lives only in the memory of the node its CPU belongs to

              Only  execute command on the CPUs of nodes.  Note that nodes may

              consist of multiple CPUs.   nodes  may  be  specified  as  noted

              above.


      --physcpubind=cpus, -C cpus         # physical cpu bind: bind the process to specific CPUs

              Only execute process on cpus.  This accepts cpu numbers as shown

              in the processor fields of /proc/cpuinfo, or relative cpus as in

              relative  to  the  current cpuset.  You may specify "all", which

              means all cpus in the current  cpuset.   Physical  cpus  may  be

              specified  as  N,N,N  or  N-N or N,N-N or  N-N,N-N and so forth.

              Relative cpus may be specifed as +N,N,N or  +N-N or  +N,N-N  and

              so  forth.  The + indicates that the cpu numbers are relative to

              the process’ set of allowed cpus in its current cpuset.  A  !N-N

              notation  indicates  the inverse of N-N, in other words all cpus

              except N-N.  If used with + notation, specify !+N-N.


       --localalloc, -l

              Falls back to the system default which is  local  allocation  by

              using MPOL_DEFAULT policy. See mbind(2) for details.


       --preferred=node

              Preferably  allocate  memory  on  node,  but if memory cannot be

              allocated there fall back to other  nodes.   This  option  takes

              only a single node number.  Relative notation may be used.


       --show, -s        # can also display the current process's policy settings

              Show NUMA policy settings of the current process.


       --hardware, -H

              Show inventory of available nodes on the system.


       Numactl can set up policy for a SYSV shared memory segment or a file in

       shmfs/hugetlbfs.

 This  policy  is  persistent and will be used by all mappings from that

       shared memory. The order of options matters  here.   The  specification

       must  at  least include either of --shm, --shmid, --file to specify the

       shared memory segment or file and a memory policy like described  above

       ( --interleave, --localalloc, --preferred, --membind ).


       --huge

       When  creating a SYSV shared memory segment use huge pages.  Only valid

       before --shmid or --shm


       --offset

       Specify offset into the shared memory segment. Default 0.  Valid  units

       are m (for MB), g (for GB), k (for KB), otherwise it specifies bytes.


       --strict

       Give  an  error  when  a page in the policied area in the shared memory

       segment already was faulted in with a conflicting policy. Default is to

       silently ignore this.


       --shmmode shmmode

       Only  valid  before --shmid or --shm When creating a shared memory seg-

       ment set it to numeric mode shmmode.


       --length length

       Apply policy to length range in the shared memory segment or  make  the

       segment  length  long  Default  is to use the remaining length Required

       when a shared memory segment is created and specifies the length of the

       new  segment  then. Valid units are m (for MB), g (for GB), k (for KB),

       otherwise it specifies bytes.


       --shmid id

       Create or use an shared memory segment with numeric ID id


       --shm shmkeyfile

       Create or use an shared memory segment, with  the  ID  generated  using

       ftok(3) from shmkeyfile


       --file tmpfsfile

       Set policy for a file in tmpfs or hugetlbfs


       --touch

       Touch  pages to enforce policy early. Default is to not touch them, the

       policy is applied when an applications maps and accesses a page.


       --dump

       Dump policy in the specified range.


       --dump-nodes

       Dump all nodes of the specific range (very verbose!)


       Valid node specifiers


       all                 All nodes

       number              Node number

       number1{,number2}   Node number1 and Node number2

       number1-number2     Nodes from number1 to number2

       ! nodes             Invert selection of the following specification.



[root@localhost ~]# numactl --show    # can also display the policy

policy: default

preferred node: current

physcpubind: 0 1 2 3 4 5 6 7        # processes may run on all eight CPU cores, which effectively means no binding at all

cpubind: 0        # the nodes whose CPUs may be used: node 0 (with a single node this is no real restriction)

nodebind: 0         # bound to node 0

membind: 0        # memory allocated from node 0

[root@localhost ~]#
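
A sketch of using the policy flags from the man page above (./myapp is a hypothetical command; the node numbers assume a machine with at least two nodes, unlike this single-node test box):

# numactl --hardware                              # inventory of nodes, their CPUs and memory
# numactl --cpunodebind=0 --membind=0 ./myapp     # run myapp only on node 0's CPUs, allocating only node 0's memory
# numactl --physcpubind=0,1 ./myapp               # run myapp only on cpu0 and cpu1
# numactl --interleave=all ./myapp                # spread its allocations round-robin across all nodes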

[root@localhost ~]# cat /proc/cpuinfo | grep processor  # eight cores

processor       : 0

processor       : 1

processor       : 2

processor       : 3

processor       : 4

processor       : 5

processor       : 6

processor       : 7

[root@localhost ~]#

[root@localhost ~]# cat /proc/cpuinfo | grep "physical id"        # two physical CPUs

physical id     : 0

physical id     : 0

physical id     : 0

physical id     : 0

physical id     : 1

physical id     : 1

physical id     : 1

physical id     : 1

[root@localhost ~]#



After tuning NUMA bindings, a reboot wipes them out, because the processes were bound by hand with commands.

numad is a service: if the service starts again when the machine reboots, the effect persists.


[root@localhost ~]# man numad

Cannot open the message catalog "man" for locale "zh_CN.UTF-8"

(NLSPATH="/usr/share/locale/%l/LC_MESSAGES/%N")


Formatting page, please wait...

numad(8)                        Administration                        numad(8)


NAME

       numad  - A user-level daemon that provides placement advice and process

       management for efficient use of CPUs and memory on  systems  with  NUMA

       topology.    # a user-level daemon with a self-adjusting heuristic policy: by watching how every process runs on every CPU, it automatically associates a process with a CPU and a CPU with a node; it monitors, optimizes and manages things on its own. On a NUMA architecture you can start the numad service. The daemon can be told at startup to watch only certain processes rather than all of them, can be stopped when necessary, and accepts how often to scan (a minimum and maximum interval) plus an observation level and an adjustment level. Reportedly, on a very busy server that genuinely needs rebalancing, starting numad can raise performance by around 50%, provided you first check with # numastat that the miss counts are high. Quite likely to be needed in the future


SYNOPSIS

       numad [-dhvV]


       numad  [-C 0|1]


       numad  [-H THP_hugepage_scan_sleep_ms]


       numad  [-i [min_interval:]max_interval]


       numad  [-K 0|1]


       numad  [-l log_level]


       numad  [-m target_memory_locality]


       numad  [-p PID]



The three commands

numastat

numactl

numad    # can be managed with service numad start etc., and enabled at boot with chkconfig numad on


numad only ties a process (or processes) to our CPUs and nodes at the NUMA level. To truly establish cpu affinity, binding a process to a CPU, there is a dedicated tool that works even on non-NUMA architectures; the command is taskset


taskset: its main job is binding a process to a CPU,

    it refers to CPUs with a bitmask, written in hexadecimal

    0x00000001     binary 0001: the bit over cpu0 is 1, meaning cpu0

    0x00000003     binary 0011: the bits over cpu0 and cpu1 are 1, meaning those two CPUs

    0x00000005     binary 0101: the bits over cpu0 and cpu2 are 1, meaning those two CPUs

    0x00000007     binary 0111: the bits over cpu0, cpu1 and cpu2 (cpus 0-2) are 1, meaning those three CPUs
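
A tiny bash sanity check for building such masks (plain shell arithmetic, nothing taskset-specific): set bit N for cpuN and print the result in hex:

# printf '0x%08x\n' $(( (1<<0) | (1<<2) ))            # cpu0 + cpu2 -> 0x00000005
# printf '0x%08x\n' $(( (1<<0) | (1<<1) | (1<<2) ))   # cpus 0-2   -> 0x00000007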



# taskset -p mask pid     # bind a process to CPUs by mask

# taskset -p 0x00000004 101     # bind PID 101 to cpu2 (0x00000004 is binary 0100, i.e. the bit over cpu2)

# taskset -p 0x00000003 101     # bind PID 101 to cpu0 and cpu1 (0x00000003 is binary 0011, i.e. cpu0 and cpu1)

# taskset -p -c 3  101      # bind PID 101 to cpu3; no mask conversion needed

# taskset -p -c 0,1  101        # bind PID 101 to cpu0 and cpu1

# taskset -p -c 0-2  101       # bind PID 101 to cpu0, cpu1 and cpu2

# taskset -p -c 0-2,7 101       # bind PID 101 to cpu0, cpu1, cpu2 and cpu7


Servers rarely shut down, so binding by hand once is fine, but after a reboot you have to bind again. The commands can be written into a script, but on the next boot the PIDs will be different. That is why nginx binds in its configuration file instead: nginx's worker cpu affinity states explicitly which CPU each worker binds to, with a CPU notation similar to taskset's 0x masks
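
A hedged nginx sketch of that (four workers on four cores; each mask is one worker's CPU set, lowest bit = cpu0):

worker_processes     4;
worker_cpu_affinity  0001 0010 0100 1000;    # worker 1 -> cpu0, worker 2 -> cpu1, worker 3 -> cpu2, worker 4 -> cpu3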









[root@localhost ~]# man taskset

Cannot open the message catalog "man" for locale "zh_CN.UTF-8"

(NLSPATH="/usr/share/locale/%l/LC_MESSAGES/%N")


Formatting page, please wait...

TASKSET(1)                    Linux User’s Manual                   TASKSET(1)


NAME

       taskset - retrieve or set a process’s CPU affinity


SYNOPSIS

       taskset [options] mask command [arg]...

       taskset [options] -p [mask] pid


DESCRIPTION

       taskset  is  used to set or retrieve the CPU affinity of a running pro-

       cess given its PID or to launch a new COMMAND with a given  CPU  affin-

       ity.   CPU affinity is a scheduler property that "bonds" a process to a

       given set of CPUs on the system.  The Linux scheduler  will  honor  the

       given  CPU  affinity  and  the  process will not run on any other CPUs.

       Note that the Linux scheduler also supports natural CPU  affinity:  the

       scheduler attempts to keep processes on the same CPU as long as practi-

       cal for performance reasons.  Therefore, forcing a specific CPU  affin-

       ity is useful only in certain applications.


       The CPU affinity is represented as a bitmask, with the lowest order bit

       corresponding to the first logical CPU and the highest order bit corre-

       sponding  to  the  last logical CPU.  Not all CPUs may exist on a given

       system but a mask may specify more CPUs than are present.  A  retrieved

       mask  will  reflect only the bits that correspond to CPUs physically on

       the system.  If an invalid mask is given (i.e., one that corresponds to

       no  valid  CPUs on the current system) an error is returned.  The masks

       are typically given in hexadecimal.  For example,


       0x00000001            # hexadecimal; this is cpu0

              is processor #0


       0x00000003            # hexadecimal; in binary, 00000011

              is processors #0 and #1


       0xFFFFFFFF

              is all processors (#0 through #31)


       When taskset returns, it is guaranteed that the given program has  been

       scheduled to a legal CPU.


OPTIONS

       -p, --pid

              operate on an existing PID and not launch a new task


       -c, --cpu-list    # name CPUs by number directly

              specify  a  numerical  list  of processors instead of a bitmask.

              The list may contain multiple items,  separated  by  comma,  and

              ranges.  For example, 0,5,7,9-11.


       -h, --help

              display usage information and exit


       -V, --version

              output version information and exit


As shown in the figure below

Even after taskset has bound a process to some CPU, that CPU still runs other processes; other tasks can still be scheduled onto it, so switching remains possible. Suppose the host has 16 cores: leave two for the system and dedicate the remaining 14 cores to particular processes, never to be switched again. For that, those cores must be isolated from all processes: at boot, pass a parameter in /etc/grub.conf, isolcpus (isolate cpus), which removes those CPUs from the operating system. Once the system is up, the kernel will not put already-started processes onto those CPUs; then use the taskset command to pin a task (or tasks) onto them.

Even this does not guarantee the CPU serves only that process, because the server still has to service interrupts (while the process is running on this CPU, some hardware suddenly raises an interrupt, and the CPU must stop and switch to kernel mode to handle it). We can go further and move all interrupt handlers off this CPU, isolating it from interrupts so it no longer handles any. From then on the CPU runs only this process, plus kernel code on its behalf: when the process needs to interact with the kernel, the kernel could be scheduled onto another CPU, but generally it still runs on the current CPU, because a mode switch is normally completed on the same CPU

image.png
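
A sketch of the grub side of this (the kernel line below is illustrative: root=... and the other arguments are placeholders, only isolcpus matters here; a plain comma list like isolcpus=2,3,...,15 also works):

# vim /etc/grub.conf
kernel /vmlinuz-2.6.32-754.el6.x86_64 ro root=... isolcpus=2-15    # reserve cores 0-1 for the system

After a reboot the kernel schedules nothing onto cores 2-15; they run only what is pinned there explicitly:

# taskset -c 2 ./busy_service    # ./busy_service is a hypothetical program; it now has core 2 to itself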


As shown in the figure below

Here is how to set up SMP affinity, that is, bind a CPU mask to a given interrupt

<irq_num>: the interrupt number, the interrupt line

image.png



[root@localhost ~]# cat /proc/irq/    # press Tab

0/                    29/                   45/

1/                    3/                    46/

10/                   30/                   47/

11/                   31/                   48/

12/                   32/                   49/

13/                   33/                   5/

14/                   34/                   50/

15/                   35/                   51/

16/                   36/                   52/

17/                   37/                   53/

18/                   38/                   54/

19/                   39/                   55/

2/                    4/                    6/

24/                   40/                   7/

25/                   41/                   8/

26/                   42/                   9/

27/                   43/                   default_smp_affinity

28/                   44/

[root@localhost ~]# cat /proc/irq/0/smp_affinity        # a string of f's: interrupt line 0 may run on any CPU; ffffffff means any CPU. We can bind this interrupt line to a specific CPU

ffffffff


马哥's output below has one field more than mine: ,ffffffff

image.png

[root@localhost ~]#


As shown in the figure below,

We have 16 cores in total and want the system to run only on cores 0 and 1; the remaining 14 cores (2-15) are all to be isolated, so we bind interrupts only to 0 and 1 and cores 2-15 handle no interrupts. echo the cpu_mask (e.g. 0x00000001 for cpu0, 0x00000002 for cpu1) into each IRQ: bind interrupt 0 (and 1, 2, 3) to cpu0, bind interrupt 4 (and 5, 6, 7) to cpu1, and so on, until every interrupt is bound to these two particular CPUs and CPUs 2-15 no longer have to handle interrupts. This is how a CPU (or several) gets isolated from interrupt handling: the interrupts are bound onto the CPUs we do not plan to isolate

image.png


Interrupts should be bound to the non-isolated CPUs, so the isolated CPUs never run interrupt handlers.

<irq number> is an interrupt number

echo CPU_MASK > /proc/irq/<irq number>/smp_affinity
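
For instance (IRQ 45 is just one of the lines listed above; the mask is hex, lowest bit = cpu0):

# echo 1 > /proc/irq/45/smp_affinity    # handle IRQ 45 only on cpu0 (mask 0x1)
# echo 2 > /proc/irq/45/smp_affinity    # only on cpu1 (mask 0x2)
# echo 3 > /proc/irq/45/smp_affinity    # on cpu0 or cpu1 (mask 0x3)
# cat /proc/irq/45/smp_affinity         # verify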



When do you need to bind interrupts, and when do you need to bind processes to CPUs? In a NUMA scenario: when the hit rate is very low, bind. In a non-NUMA scenario:

if you don't bind, processes keep switching back and forth among the CPUs; when the switch rate is too high, a very important service keeps getting scheduled out and user response becomes slow. When there are many CPU cores and one service is very busy, and we want it permanently running on one particular CPU, we need binding. How do you know which process is very busy, which process context-switches a lot? See the commands in the figure below

As shown in the figure below

Commands for viewing CPU activity; mainly watch the TASK_RUNNING and TASK_UNINTERRUPTIBLE processes

Load average: shows the average CPU load

sar -q: also shows the load averages

top: also shows CPU usage

w

uptime

vmstat 1 5

image.png



[root@localhost ~]# rpm -qf `which sar`        # see which package installed sar

sysstat-9.0.4-33.el6_9.1.x86_64            # this package

[root@localhost ~]#

[root@localhost ~]# rpm -ql sysstat

/etc/cron.d/sysstat

/etc/rc.d/init.d/sysstat

/etc/sysconfig/sysstat

/etc/sysconfig/sysstat.ioconf

/usr/bin/cifsiostat

/usr/bin/iostat         # a command we will use

/usr/bin/mpstat         # a command we will use

/usr/bin/pidstat         # a command we will use

/usr/bin/sadf         # a command we will use

/usr/bin/sar         # a command we will use

/usr/lib64/sa         # used below

/usr/lib64/sa/sa1         # used below; generates the data files for analyzing past activity

/usr/lib64/sa/sa2         # used below; generates the data files for analyzing past activity

/usr/lib64/sa/sadc         # used below

/usr/share/doc/sysstat-9.0.4

/usr/share/doc/sysstat-9.0.4/CHANGES

/usr/share/doc/sysstat-9.0.4/COPYING

/usr/share/doc/sysstat-9.0.4/CREDITS

/usr/share/doc/sysstat-9.0.4/FAQ

/usr/share/doc/sysstat-9.0.4/README

/usr/share/doc/sysstat-9.0.4/TODO

/usr/share/locale/af/LC_MESSAGES/sysstat.mo

/usr/share/locale/da/LC_MESSAGES/sysstat.mo

/usr/share/locale/de/LC_MESSAGES/sysstat.mo

/usr/share/locale/es/LC_MESSAGES/sysstat.mo

/usr/share/locale/fi/LC_MESSAGES/sysstat.mo

/usr/share/locale/fr/LC_MESSAGES/sysstat.mo

/usr/share/locale/id/LC_MESSAGES/sysstat.mo

/usr/share/locale/it/LC_MESSAGES/sysstat.mo

/usr/share/locale/ja/LC_MESSAGES/sysstat.mo

/usr/share/locale/ky/LC_MESSAGES/sysstat.mo

/usr/share/locale/lv/LC_MESSAGES/sysstat.mo

/usr/share/locale/mt/LC_MESSAGES/sysstat.mo

/usr/share/locale/nb/LC_MESSAGES/sysstat.mo

/usr/share/locale/nl/LC_MESSAGES/sysstat.mo

/usr/share/locale/nn/LC_MESSAGES/sysstat.mo

/usr/share/locale/pl/LC_MESSAGES/sysstat.mo

/usr/share/locale/pt/LC_MESSAGES/sysstat.mo

/usr/share/locale/pt_BR/LC_MESSAGES/sysstat.mo

/usr/share/locale/ro/LC_MESSAGES/sysstat.mo

/usr/share/locale/ru/LC_MESSAGES/sysstat.mo

/usr/share/locale/sk/LC_MESSAGES/sysstat.mo

/usr/share/locale/sv/LC_MESSAGES/sysstat.mo

/usr/share/locale/vi/LC_MESSAGES/sysstat.mo

/usr/share/locale/zh_CN/LC_MESSAGES/sysstat.mo

/usr/share/locale/zh_TW/LC_MESSAGES/sysstat.mo

/usr/share/man/man1/cifsiostat.1.gz

/usr/share/man/man1/iostat.1.gz

/usr/share/man/man1/mpstat.1.gz

/usr/share/man/man1/pidstat.1.gz

/usr/share/man/man1/sadf.1.gz

/usr/share/man/man1/sar.1.gz

/usr/share/man/man5/sysstat.5.gz

/usr/share/man/man8/sa1.8.gz

/usr/share/man/man8/sa2.8.gz

/usr/share/man/man8/sadc.8.gz

/var/log/sa

[root@localhost ~]#


[root@localhost ~]# vmstat 1 7        # samples the CPU right now: once per second, seven times in total

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----

 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st

 2  0      0 1656780  31224  75212    0    0     2     0    4    3  0  0 100  0  0

 0  0      0 1656772  31224  75212    0    0     0     0   44   29  0  0 100  0  0

 0  0      0 1656772  31224  75212    0    0     0     0   14   16  0  0 100  0  0

 0  0      0 1656748  31224  75212    0    0     0     0   21   27  0  0 100  0  0

 0  0      0 1656748  31224  75212    0    0     0     0   21   33  0  0 100  0  0

 0  0      0 1656756  31224  75212    0    0     0     0   27   27  0  0 100  0  0

 0  0      0 1656756  31224  75212    0    0     0     0   13   16  0  0 100  0  0

[root@localhost ~]#

How do you analyze the past day's CPU usage?

sar can sample the CPU, or any resource's usage, on a schedule throughout the day (at some fixed frequency) and save the samples into a file (under /var/log/sa, per the man page below); the file can then be analyzed with the sar command. sar itself also supports sampling and analyzing live
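
A sketch of both modes (the /tmp path is arbitrary; /var/log/sa/sadd are the files the sysstat cron job writes daily):

# sar -o /tmp/cpu.sa 2 30 > /dev/null 2>&1 &    # sample every 2s, 30 times, into a binary data file
# sar -q -f /tmp/cpu.sa                         # read the load-average records back out of it
# sar -q -f /var/log/sa/sa15                    # or analyze the auto-collected file for the 15th of the month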


[root@localhost ~]# sar -q        # also shows the system's load

# including its history, one sample every 10 minutes

Linux 2.6.32-754.el6.x86_64 (localhost.localdomain)     2021年07月15日  _x86_64_                             (8 CPU)


00时00分01秒   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15

00时10分01秒         0       316      0.00      0.00      0.00

00时20分01秒         0       316      0.00      0.00      0.00

00时30分01秒         0       316      0.00      0.00      0.00

00时40分01秒         0       316      0.00      0.00      0.00

00时50分01秒         0       316      0.00      0.00      0.00

01时00分01秒         0       316      0.00      0.00      0.00

01时10分01秒         0       316      0.00      0.00      0.00

01时20分01秒         0       316      0.00      0.00      0.00

01时30分01秒         0       316      0.00      0.00      0.00

01时40分01秒         0       316      0.00      0.00      0.00

01时50分01秒         0       316      0.00      0.00      0.00

02时00分01秒         0       316      0.00      0.00      0.00

02时10分01秒         0       317      0.00      0.00      0.00

02时20分01秒         0       317      0.00      0.00      0.00

02时30分01秒         0       317      0.08      0.02      0.01

02时40分01秒         0       317      0.00      0.00      0.00

02时50分01秒         0       317      0.00      0.00      0.00

03时00分01秒         0       317      0.00      0.00      0.00

03时10分01秒         0       317      0.00      0.00      0.00

03时20分01秒         0       316      0.00      0.00      0.00

平均时间:         0       316      0.00      0.00      0.00


16时16分02秒       LINUX RESTART


16时20分01秒   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15

16时30分01秒         0       248      0.00      0.00      0.00

16时40分01秒         0       248      0.00      0.00      0.00

16时50分01秒         0       248      0.00      0.00      0.00

17时00分01秒         0       248      0.00      0.00      0.00

17时10分01秒         0       249      0.00      0.00      0.00

17时20分01秒         0       249      0.00      0.00      0.00

17时30分01秒         0       249      0.00      0.00      0.00

17时40分01秒         0       252      0.00      0.00      0.00

17时50分01秒         0       249      0.00      0.02      0.01

18时00分01秒         0       248      0.00      0.00      0.00

18时10分01秒         0       248      0.00      0.00      0.00

18时20分01秒         0       248      0.00      0.00      0.00

18时30分02秒         0       248      0.00      0.00      0.00

平均时间:         0       249      0.00      0.00      0.00


20时45分48秒       LINUX RESTART


20时50分01秒   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15

21时00分01秒         0       246      0.00      0.02      0.04

21时10分01秒         0       248      0.00      0.00      0.00

21时20分01秒         0       248      0.00      0.00      0.00

21时30分01秒         0       248      0.00      0.00      0.00

21时40分01秒         0       248      0.00      0.00      0.00

21时50分01秒         0       248      0.00      0.00      0.00

22时00分01秒         0       248      0.00      0.00      0.00

22时10分01秒         0       248      0.00      0.00      0.00

22时20分01秒         0       248      0.00      0.00      0.00

22时30分01秒         0       248      0.00      0.00      0.00

22时40分01秒         0       252      0.00      0.00      0.00

平均时间:         0       248      0.00      0.00      0.00

[root@localhost ~]#


[root@localhost ~]# sar -q 1        # view the present: live sampling every second; this command also exists on RHEL 5

image.png


[root@localhost ~]# man sar

Cannot open the message catalog "man" for locale "zh_CN.UTF-8"

(NLSPATH="/usr/share/locale/%l/LC_MESSAGES/%N")


Formatting page, please wait...

SAR(1)                        Linux User’s Manual                       SAR(1)


NAME

       sar - Collect, report, or save system activity information.


SYNOPSIS

       sar  [ -A ] [ -b ] [ -B ] [ -C ] [ -d ] [ -h ] [ -i interval ] [ -m ] [

       -p ] [ -q ] [ -r ] [ -R ] [ -S ] [ -t ] [ -u [ ALL ] ] [ -v ] [ -V ]  [

       -w  ]  [  -W  ] [ -y ] [ -j { ID | LABEL | PATH | UUID | ... } ] [ -n {

       keyword [,...] | ALL } ] [ -I { int [,...] | SUM | ALL | XALL } ] [  -P

       { cpu [,...] | ALL } ] [ -o [ filename ] | -f [ filename ] ] [ --legacy

       ] [ -s [ hh:mm:ss ] ] [ -e [ hh:mm:ss ] ] [ interval [ count ] ]


DESCRIPTION

       The sar command writes to standard  output  the  contents  of  selected

       cumulative  activity  counters  in the operating system. The accounting

       system, based on the values  in  the  count  and  interval  parameters,

       writes  information  the specified number of times spaced at the speci-

       fied intervals in seconds.  If the interval parameter is set  to  zero,

       the  sar command displays the average statistics for the time since the

       system was started. If the interval parameter is specified without  the

       count  parameter,  then  reports  are generated continuously.  The col-

       lected data can also be saved in the file specified by the -o  filename

       flag,  in  addition  to being displayed onto the screen. If filename is

       omitted, sar uses the standard system activity  daily  data  file,  the

       /var/log/sa/sadd  file,  where  the  dd parameter indicates the current

       day.  By default all the data available from the kernel  are  saved  in

       the data file.


       The  sar  command extracts and writes to standard output records previ-

       ously saved in a file. This file can be either the one specified by the

       -f flag or, by default, the standard system activity daily data file.


       Without  the -P flag, the sar command reports system-wide (global among

       all processors) statistics, which are calculated as averages for values

       expressed  as  percentages,  and  as  sums otherwise. If the -P flag is

       given, the sar command reports activity which relates to the  specified

       processor  or  processors.  If -P ALL is given, the sar command reports

       statistics for each individual processor and  global  statistics  among

       all processors.


       You  can  select  information  about  specific  system activities using

       flags. Not specifying any flags selects only CPU activity.   Specifying

       the  -A flag is equivalent to specifying -bBdqrRSvwWy -I SUM -I XALL -n

       ALL -u ALL -P ALL.


       The default version of the sar command (CPU utilization  report)  might

       be  one  of the first facilities the user runs to begin system activity

       investigation, because it monitors major system resources. If CPU  uti-

       lization  is near 100 percent (user + nice + system), the workload sam-

       pled is CPU-bound.


       If multiple samples and multiple reports are desired, it is  convenient

       to  specify an output file for the sar command.  Run the sar command as

       a background process. The syntax for this is:


       sar -o datafile interval count >/dev/null 2>&1 &


       All data is captured in binary form and saved  to  a  file  (datafile).

       The  data  can then be selectively displayed with the sar command using

       the -f option. Set the interval and count parameters  to  select  count

       records  at  interval  second  intervals. If the count parameter is not

       set, all the records saved in the file will be selected.  Collection of

       data  in  this  manner  is  useful  to characterize system usage over a

       period of time and determine peak usage hours.


       Note:     The sar command only reports on local activities.


OPTIONS

       -A     This is equivalent to specifying -bBdqrRSuvwWy -I SUM -I XALL -n

              ALL -u ALL -P ALL.


       -b     Report  I/O  and transfer rate statistics.  The following values

               are displayed:    # I/O related


              tps

                     Total number of transfers per second that were issued  to

                     physical  devices.   A  transfer  is  an I/O request to a

                     physical device. Multiple logical requests  can  be  com-

                     bined  into a single I/O request to the device.  A trans-

                     fer is of indeterminate size.


              rtps

                     Total number of read requests per second issued to physi-

                     cal devices.


              wtps

                     Total number of write requests per second issued to phys-

                     ical devices.


              bread/s

                     Total amount of data read from the devices in blocks  per

                     second.   Blocks  are equivalent to sectors with 2.4 ker-

                     nels and newer and therefore have a size  of  512  bytes.

                     With older kernels, a block is of indeterminate size.


              bwrtn/s

                     Total  amount  of  data  written to devices in blocks per

                     second.


       -B     Report paging statistics. Some of the metrics below  are  avail-

              able  only  with post 2.5 kernels. The following values are dis-

               played:    # for viewing memory paging activity


              pgpgin/s

                     Total number of kilobytes the system paged in  from  disk

                     per second.  Note: With old kernels (2.2.x) this value is

                     a number of blocks per second (and not kilobytes).


              pgpgout/s

                     Total number of kilobytes the system paged  out  to  disk

                     per second.  Note: With old kernels (2.2.x) this value is

                     a number of blocks per second (and not kilobytes).


              fault/s

                     Number of page faults (major + minor) made by the  system

                     per second.  This is not a count of page faults that gen-

                     erate I/O, because some page faults can be resolved with-

                     out I/O.


              majflt/s

                     Number  of  major  faults the system has made per second,

                     those which have required  loading  a  memory  page  from

                     disk.


              pgfree/s

                     Number of pages placed on the free list by the system per

                     second.


              pgscank/s

                     Number of pages scanned by the kswapd daemon per  second.


               pgscand/s
                      Number of pages scanned directly per second.


               pgsteal/s

                     Number  of  pages  the  system  has  reclaimed from cache

                     (pagecache and swapcache) per second to satisfy its  mem-

                     ory demands.


              %vmeff

                     Calculated  as  pgsteal / pgscan, this is a metric of the

                     efficiency of page reclaim.  If  it  is  near  100%  then

                     almost  every  page  coming  off the tail of the inactive

                     list is being reaped. If it gets too low (e.g. less  than

                     30%)  then  the virtual memory is having some difficulty.

                     This field is displayed as zero if  no  pages  have  been

                     scanned during the interval of time.


       -C     When reading data from a file, tell sar to display comments that

              have been inserted by sadc.


       -d     Report activity for each block device  (kernels  2.4  and  newer

              only).  When data is displayed, the device specification dev m-n

              is generally used ( DEV column).  m is the major number  of  the

              device.   With  recent kernels (post 2.5), n is the minor number

              of the device, but is only a sequence number with pre  2.5  ker-

              nels.  Device  names  may also be pretty-printed if option -p is

              used or persistent device names can be printed if option  -j  is

              used  (see  below). Values for fields avgqu-sz, await, svctm and

              %util may be unavailable and displayed as  0.00  with  some  2.4

              kernels.   Note  that  disk activity depends on sadc options "-S

              DISK" and "-S XDISK" to be collected. The following  values  are

              displayed:  


               tps        # -d: tps is transfers per second

                     Indicate  the  number  of  transfers per second that were

                     issued to the device.  Multiple logical requests  can  be

                     combined  into  a  single  I/O  request  to the device. A

                     transfer is of indeterminate size.


              rd_sec/s

                     Number of sectors read from the device.  The  size  of  a

                     sector is 512 bytes.


              wr_sec/s

                     Number  of  sectors  written to the device. The size of a

                     sector is 512 bytes.

 avgrq-sz

                     The average size (in sectors) of the requests  that  were

                     issued to the device.


              avgqu-sz

                     The average queue length of the requests that were issued

                     to the device.


              await

                     The average  time  (in  milliseconds)  for  I/O  requests

                     issued to the device to be served. This includes the time

                     spent by the requests in queue and the time spent servic-

                     ing them.


              svctm

                     The  average  service  time  (in  milliseconds)  for  I/O

                     requests that were issued to the device.


              %util

                     Percentage of elapsed time during which I/O requests were

                     issued  to  the  device  (bandwidth  utilization  for the

                     device). Device saturation  occurs  when  this  value  is

                     close to 100%.


       -e [ hh:mm:ss ]

              Set  the  ending  time of the report. The default ending time is

              18:00:00. Hours must be given in 24-hour  format.   This  option

              can  be  used  when  data  are  read  from  or written to a file

              (options -f or -o ).


       -f [ filename ]

              Extract records from filename (created by the -o filename flag).

              The default value of the filename parameter is the current daily

              data file, the /var/log/sa/sadd file. The -f option is exclusive

              of the -o option.


       -h     Display a short help message then exit.


       -i interval

              Select  data records at seconds as close as possible to the num-

              ber specified by the interval parameter.


       -I { int [,...] | SUM | ALL | XALL }

              Report statistics for a given interrupt.  int is  the  interrupt

              number.  Specifying  multiple  -I  int parameters on the command

              line will look at multiple independent interrupts.  The SUM key-

              word  indicates that the total number of interrupts received per

              second is to  be  displayed.  The  ALL  keyword  indicates  that

              statistics  from  the  first  16  interrupts are to be reported,

              whereas the XALL keyword  indicates  that  statistics  from  all

              interrupts,  including  potential APIC interrupt sources, are to

              be reported.  Note that  interrupt  statistics  depend  on  sadc

              option "-S INT" to be collected.


       -j { ID | LABEL | PATH | UUID | ... }

              Display  persistent device names. Use this option in conjunction

              with option -d.  Options ID, LABEL, etc. specify the type of the

              persistent  name.  These options are not limited, only prerequi-

              site is that directory with required persistent names is present

              in  /dev/disk.   If persistent name is not found for the device,

              the device name is pretty-printed (see option -p below).


       --legacy

              Enable reading older /var/log/sa/sadd data files.   In  Red  Hat

              Enterprise Linux 6.3, the sysstat package was updated to version

              9.0.4-20. This update changed  the  format  of  /var/log/sa/sadd

              data  files,  but  unfortunately,  the  format  version  was not

              updated. Because of this, sysstat did not  restrict  reading  of

              data  files in old format and while interpreting them, some dis-

              played values could have been  incorrect.  The  updated  sysstat

              package  in  Red  Hat Enterprise Linux 6.5 contains fixed format

              version of data files and prevents reading data files created by

              older sysstat packages.  However, data files created by the sys-

              stat packages from Red Hat Enterprise  Linux  6.3  and  6.4  are

              fully  compatible  with  the sysstat package from Red Hat Enter-

              prise Linux 6.5. To enable latest sysstat  to  read  older  data

              files, use this option. Note that this option allows you to read

              also data files created on Red Hat Enterprise Linux 6.2 and ear-

              lier,  however,  these  files are not compatible with the latest

              sysstat package.


       -m     Report power management statistics.  Note that these  statistics

              depend on sadc option "-S POWER" to be collected.  The following

              value is displayed:


              MHz

                     CPU clock frequency in MHz.


       -n { keyword [,...] | ALL }

              Report network statistics.


              Possible keywords are DEV, EDEV, NFS, NFSD, SOCK, IP, EIP, ICMP,

              EICMP, TCP, ETCP, UDP, SOCK6, IP6, EIP6, ICMP6, EICMP6 and UDP6.


              With the DEV keyword, statistics from the  network  devices  are

              reported.  The following values are displayed:


              IFACE

                     Name  of  the  network interface for which statistics are

                     reported.


              rxpck/s

                     Total number of packets received per second.


              txpck/s

                     Total number of packets transmitted per second.


              rxkB/s

                     Total number of kilobytes received per second.


              txkB/s

                     Total number of kilobytes transmitted per second.



              rxcmp/s

                     Number of compressed packets  received  per  second  (for

                     cslip etc.).


              txcmp/s

                     Number of compressed packets transmitted per second.


              rxmcst/s

                     Number of multicast packets received per second.


              With  the EDEV keyword, statistics on failures (errors) from the

              network devices are reported.  The  following  values  are  dis-

              played:


              IFACE

                     Name  of  the  network interface for which statistics are

                     reported.


              rxerr/s

                     Total number of bad packets received per second.


              txerr/s

                     Total number of errors that  happened  per  second  while

                     transmitting packets.


              coll/s

                     Number  of  collisions  that  happened  per  second while

                     transmitting packets.

  rxdrop/s

                     Number of received packets dropped per second because  of

                     a lack of space in linux buffers.


              txdrop/s

                     Number  of transmitted packets dropped per second because

                     of a lack of space in linux buffers.


              txcarr/s

                     Number of carrier-errors that happened per  second  while

                     transmitting packets.


              rxfram/s

                     Number of frame alignment errors that happened per second

                     on received packets.


              rxfifo/s

                     Number of FIFO overrun errors that happened per second on

                     received packets.


              txfifo/s

                     Number of FIFO overrun errors that happened per second on

                     transmitted packets.


              With the NFS keyword, statistics about NFS client  activity  are

              reported.  The following values are displayed:

....................................................................................................


       -o [ filename ]

              Save the readings in the file in binary form. Each reading is in

              a separate record. The default value of the  filename  parameter

              is  the  current daily data file, the /var/log/sa/sadd file. The

              -o option is exclusive of the -f option.  All the data available

              from  the  kernel  are saved in the file (in fact, sar calls its

              data collector sadc with the option "-S ALL". See sadc(8) manual

              page).


       -P { cpu [,...] | ALL }

              Report  per-processor  statistics for the specified processor or

              processors.  Specifying the ALL keyword reports  statistics  for

              each  individual  processor,  and  globally  for all processors.

              Note that processor 0 is the first processor.


       -p     Pretty-print device names. Use this option in  conjunction  with

              option  -d.  By default names are printed as dev m-n where m and

              n are the major and minor numbers for the device.  Use  of  this

              option displays the names of the devices as they (should) appear

              in /dev. Name mappings  are  controlled  by  /etc/sysconfig/sys-

              stat.ioconf.


       -q     Report  queue length and load averages. The following values are

               displayed:    # report queue length and load averages


               runq-sz     # run queue size, the length of the run queue

                     Run queue length (number of tasks waiting for run  time).


               plist-sz          # process list size, the number of tasks currently on the system

                     Number of tasks in the task list.


              ldavg-1

                     System  load average for the last minute.  The load aver-

                     age is calculated as the average number  of  runnable  or

                     running tasks (R state), and the number of tasks in unin-

                     terruptible sleep (D state) over the specified  interval.


              ldavg-5

                     System load average for the past 5 minutes.


              ldavg-15

                     System load average for the past 15 minutes.


       -r     Report  memory utilization statistics.  The following values are

              displayed:


              kbmemfree

                     Amount of free memory available in kilobytes.


              kbmemused

                     Amount of used memory in kilobytes. This  does  not  take

                     into account memory used by the kernel itself.


              %memused

                     Percentage of used memory.


              kbbuffers

                     Amount  of  memory used as buffers by the kernel in kilo-

                     bytes.


              kbcached

                     Amount of memory used to cache  data  by  the  kernel  in

                     kilobytes.


              kbcommit

                     Amount  of  memory  in kilobytes needed for current work-

                     load. This is an estimate of how much RAM/swap is  needed

                     to guarantee that there never is out of memory.


              %commit

                     Percentage of memory needed for current workload in rela-

                     tion to the total amount of memory (RAM+swap).  This num-

                     ber  may  be greater than 100% because the kernel usually

                     overcommits memory.


       -R     Report memory statistics. The following values are displayed:


              frmpg/s

                     Number of memory pages freed by the system per second.  A

                     negative  value represents a number of pages allocated by

                     the system.  Note that a page has a size of 4 kB or 8  kB

                     according to the machine architecture.


              bufpg/s

                     Number  of additional memory pages used as buffers by the

                     system per second.  A negative value  means  fewer  pages

                     used as buffers by the system.


              campg/s

                     Number  of  additional  memory pages cached by the system

                     per second.  A negative value means fewer  pages  in  the

                     cache.


       -s [ hh:mm:ss ]

              Set  the  starting  time of the data, causing the sar command to

              extract records time-tagged at, or following,  the  time  speci-

              fied.  The  default  starting  time  is 08:00:00.  Hours must be

              given in 24-hour format. This option can be used only when  data

              are read from a file (option -f ).


       -S     Report  swap space utilization statistics.  The following values

              are displayed:


              kbswpfree

                     Amount of free swap space in kilobytes.


              kbswpused

                     Amount of used swap space in kilobytes.


              %swpused

                     Percentage of used swap space.


              kbswpcad

                     Amount of cached swap memory in kilobytes.  This is  mem-

                     ory  that  once  was  swapped out, is swapped back in but

                     still also is in the swap area (if memory  is  needed  it

                     doesn’t  need  to  be  swapped  out  again  because it is

                     already in the swap area. This saves I/O).


              %swpcad

                     Percentage of cached  swap  memory  in  relation  to  the

                     amount of used swap space.

.........................................................................................




runq-sz: run queue size — the length of the run queue, i.e. the number of tasks in the TASK_RUNNING state (the R state in ps) waiting to run. A value of 20, for example, means 20 tasks are queued up waiting to be scheduled. If this stays above about 3 per CPU for a long time, the CPU is due for an upgrade; on a multi-CPU (or multi-core) box, average the value across the CPUs first. If, say, 3 processes sit in that queue indefinitely, they never get the CPU at all — CPU performance is already poor, and a look at the CPU utilization will probably show it is very high.

plist-sz: process list size — the number of tasks currently on the system, counting both processes and threads.

ldavg-1: the system load average over the last 1 minute

ldavg-5: the system load average over the last 5 minutes

ldavg-15: the system load average over the last 15 minutes
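
As a rule of thumb, judge the load averages against the number of logical CPUs. A minimal sketch (commands only, outputs omitted):

[root@localhost ~]# grep -c '^processor' /proc/cpuinfo        #count the logical CPUs (8 on this box)

[root@localhost ~]# uptime        #prints the same 1/5/15-minute load averages as ldavg-1/5/15

If ldavg-1 divided by the CPU count stays near or above 1 for long, the CPUs are saturated.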

[root@localhost ~]# sar -q 1

Linux 2.6.32-754.el6.x86_64 (localhost.localdomain)     2021年07月15日  _x86_64_                             (8 CPU)


15时22分51秒   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15

15时22分52秒         0       248      0.00      0.00      0.00

15时22分53秒         0       248      0.00      0.00      0.00

15时22分54秒         0       248      0.00      0.00      0.00

15时22分55秒         0       248      0.00      0.00      0.00

15时22分56秒         0       248      0.00      0.00      0.00

15时22分57秒         0       248      0.00      0.00      0.00

15时22分58秒         0       248      0.00      0.00      0.00

15时22分59秒         0       248      0.00      0.00      0.00

15时23分00秒         0       248      0.00      0.00      0.00

15时23分01秒         0       248      0.00      0.00      0.00

15时23分02秒         0       248      0.00      0.00      0.00

15时23分03秒         0       248      0.00      0.00      0.00

15时23分04秒         0       248      0.00      0.00      0.00

15时23分05秒         0       248      0.00      0.00      0.00

15时23分06秒         0       248      0.00      0.00      0.00

15时23分07秒         0       248      0.00      0.00      0.00

15时23分08秒         0       248      0.00      0.00      0.00


Open another putty window

[root@localhost ~]# ab -n 100000 -c 300 http://127.0.0.1/index.php    #run a stress test (-n total requests, -c concurrency)

image.png


Back in the original putty window

[root@localhost yum.repos.d]# sar -q 1

runq-sz, ldavg-1, ldavg-5 and ldavg-15 have all risen

image.png



mpstat 1 2     # shows average per-CPU utilization on SMP machines; also provided by the sysstat package

sar -P ALL 1 2     # -P (processor): per-CPU statistics

iostat -c 1 2 

/proc/stat



[root@localhost yum.repos.d]# man mpstat

Cannot open the message catalog "man" for locale "zh_CN.UTF-8"

(NLSPATH="/usr/share/locale/%l/LC_MESSAGES/%N")


Formatting page, please wait...

MPSTAT(1)                     Linux User’s Manual                    MPSTAT(1)


NAME

       mpstat - Report processors related statistics.


SYNOPSIS

       mpstat  [ -A ] [ -I { SUM | CPU | ALL } ] [ -u ] [ -P { cpu [,...] | ON

       | ALL } ] [ -V ] [ interval [ count ] ]


DESCRIPTION

       The mpstat command writes to standard output activities for each avail-

       able processor, processor 0 being the first one.  Global average activ-

       ities among all processors are also reported.  The mpstat  command  can

       be  used  both  on  SMP and UP machines, but in the latter, only global

       average activities will be printed. If no activity has  been  selected,

       then the default report is the CPU utilization report.


       The  interval parameter specifies the amount of time in seconds between

       each report.  A value of 0 (or no parameters  at  all)  indicates  that

       processors  statistics  are  to  be  reported for the time since system

       startup (boot).  The count parameter can be  specified  in  conjunction

       with  the  interval parameter if this one is not set to zero. The value

       of count determines the number of reports generated at interval seconds

       apart. If the interval parameter is specified without the count parame-

       ter, the mpstat command generates reports continuously.

OPTIONS

       -A     This option is equivalent to specifying -I ALL -u -P ALL


       -I { SUM | CPU | ALL }        #interrupts: the number of interrupts handled on each CPU

              Report interrupts statistics.


              With the SUM keyword, the mpstat command reports the total  num-

              ber  of interrupts per processor.  The following values are dis-

              played:


              CPU

                     Processor number. The keyword all indicates that  statis-

                     tics are calculated as averages among all processors.


              intr/s

                     Show  the  total number of interrupts received per second

                     by the CPU or CPUs.


              With the CPU keyword, the number of  each  individual  interrupt

              received per second by the CPU or CPUs is displayed.


              The  ALL  keyword  is  equivalent to specifying all the keywords

              above and therefore all the interrupts statistics are displayed.


       -P { cpu [,...] | ON | ALL }        #pick which CPU(s) to report; without -P only the global average is shown

              Indicate  the  processor  number  for which statistics are to be

              reported.  cpu is the processor number. Note that processor 0 is

              the  first  processor.  The ON keyword indicates that statistics

              are to be reported for every online processor, whereas  the  ALL

              keyword  indicates  that  statistics  are to be reported for all

              processors.


       -u     Report CPU utilization. The following values are displayed:


              CPU

                     Processor number. The keyword all indicates that  statis-

                     tics are calculated as averages among all processors.


              %usr

                     Show  the  percentage  of  CPU  utilization that occurred

                     while executing at the user level (application).


              %nice

                     Show the percentage  of  CPU  utilization  that  occurred

                     while executing at the user level with nice priority.


              %sys

                     Show  the  percentage  of  CPU  utilization that occurred

                     while executing at the system level (kernel).  Note  that

                     this  does  not include time spent servicing hardware and

                     software interrupts.


              %iowait

                     Show the percentage of time that the  CPU  or  CPUs  were

                     idle  during which the system had an outstanding disk I/O

                     request.

              %irq    #hard interrupts

                     Show the percentage of time spent by the CPU or  CPUs  to

                     service hardware interrupts.


              %soft   #soft interrupts

                     Show  the  percentage of time spent by the CPU or CPUs to

                     service software interrupts.


              %steal

                     Show the percentage of time spent in involuntary wait  by

                     the  virtual CPU or CPUs while the hypervisor was servic-

                     ing another virtual processor.


              %guest        #guest: time spent running virtual machines

                     Show the percentage of time spent by the CPU or  CPUs  to

                     run a virtual processor.


              %idle

                     Show  the  percentage  of  time that the CPU or CPUs were

                     idle and the system did not have an outstanding disk  I/O

                     request.


              Note:  On SMP machines a processor that does not have any activ-

              ity at all is a disabled (offline) processor.

     -V     Print version number then exit.


ENVIRONMENT

       The mpstat command takes into account the following  environment  vari-

       able:


       S_TIME_FORMAT

              If  this  variable  exists and its value is ISO then the current

              locale will be ignored when printing  the  date  in  the  report

              header.   The mpstat command will use the ISO 8601 format (YYYY-

              MM-DD) instead.


EXAMPLES

       mpstat 2 5

              Display five reports of global statistics among  all  processors

              at two second intervals.


       mpstat -P ALL 2 5

              Display  five  reports  of  statistics for all processors at two

              second intervals.


BUGS




[root@localhost yum.repos.d]# mpstat        #a single report (averages since boot)

Linux 2.6.32-754.el6.x86_64 (localhost.localdomain)     2021年07月15日  _x86_64_        (8 CPU)


16时38分47秒  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle

16时38分47秒  all    0.01    0.00    0.03    0.02    0.00    0.01    0.00    0.00   99.94

[root@localhost yum.repos.d]#




%usr — time spent in user space

%nice — time spent at the user level with adjusted nice priority

%sys — time spent in kernel space

%iowait — time spent waiting on I/O

%irq — time spent servicing hard interrupts

%soft — time spent servicing soft interrupts

%steal — time stolen by the hypervisor while it services other virtual CPUs

%guest — time spent running guest (virtual machine) processors

%idle — idle time
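
On a busy network server it is often the interrupt totals that matter; the -I SUM form documented above can be combined with an interval — a quick sketch:

[root@localhost yum.repos.d]# mpstat -I SUM 1 2        #total interrupts per second (intr/s), one report per second, two reports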

[root@localhost yum.repos.d]# mpstat -P 0 1        # -P 0 means CPU 0; the trailing 1 means one report per second

Linux 2.6.32-754.el6.x86_64 (localhost.localdomain)     2021年07月15日  _x86_64_        (8 CPU)


16时39分25秒  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle

16时39分26秒    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

16时39分27秒    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

16时39分28秒    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

16时39分29秒    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

16时39分30秒    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

16时39分31秒    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

16时39分32秒    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

16时39分33秒    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

16时39分34秒    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

16时39分35秒    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00




[root@localhost yum.repos.d]# mpstat -I CPU 1        #per-interrupt counts on each CPU; -I (interrupts)

image.png




[root@localhost yum.repos.d]# sar -P 0 1           #CPU 0's utilization, one report per second

#similar to mpstat's output, just not as detailed

Linux 2.6.32-754.el6.x86_64 (localhost.localdomain)     2021年07月15日  _x86_64_(8 CPU)


16时58分23秒     CPU     %user     %nice   %system   %iowait    %steal     %idle

16时58分24秒       0      0.00      0.00      0.00      0.00      0.00    100.00

16时58分25秒       0      0.00      0.00      0.00      0.00      0.00    100.00

16时58分26秒       0      0.00      0.00      0.00      0.00      0.00    100.00

16时58分27秒       0      0.00      0.00      0.00      0.00      0.00    100.00

16时58分28秒       0      0.00      0.00      0.00      0.00      0.00    100.00



[root@localhost yum.repos.d]# man iostat        #the I/O statistics command; it can report CPU utilization too

Cannot open the message catalog "man" for locale "zh_CN.UTF-8"

(NLSPATH="/usr/share/locale/%l/LC_MESSAGES/%N")


Formatting page, please wait...

IOSTAT(1)                     Linux User’s Manual                    IOSTAT(1)


NAME

       iostat - Report Central Processing Unit (CPU) statistics and input/out-

       put statistics for devices, partitions and network filesystems (NFS).

#reports CPU statistics and I/O statistics for devices, partitions and network filesystems (NFS)


SYNOPSIS

       iostat [ -c ] [ -d ] [ -N ] [ -n ] [ -h ] [ -k | -m ] [ -t ] [ -V  ]  [

       -x  ]  [  -y  ] [ -z ] [ -j { ID | LABEL | PATH | UUID | ... } [ device

       [...] | ALL ] ] [ device [...] | ALL ] [ -p [ device [,...] | ALL ] ] [

       interval [ count ] ]


DESCRIPTION

       The  iostat  command  is used for monitoring system input/output device

       loading by observing the time the devices are  active  in  relation  to

       their average transfer rates. The iostat command generates reports that

       can be used to  change  system  configuration  to  better  balance  the

       input/output load between physical disks.


       The  first  report  generated by the iostat command provides statistics

       concerning the time since the system was booted, unless the  -y  option

       is used, when this first report is omitted. Each subsequent report cov-

       ers the time since the previous report.  All  statistics  are  reported

       each  time  the  iostat  command  is  run. The report consists of a CPU

       header row followed by a row of CPU statistics. On multiprocessor  sys-

       tems,  CPU  statistics are calculated system-wide as averages among all

       processors. A device header row is displayed  followed  by  a  line  of

       statistics for each device that is configured.  When option -n is used,

       an NFS header row is displayed followed by a  line  of  statistics  for

       each network filesystem that is mounted.


       The  interval parameter specifies the amount of time in seconds between

       each report. The first report contains statistics for  the  time  since

       system  startup  (boot), unless the -y option is used, when this report

       is omitted.  Each subsequent report contains statistics collected  dur-

       ing  the interval since the previous report. The count parameter can be

       specified in conjunction with the  interval  parameter.  If  the  count

       parameter  is  specified,  the  value of count determines the number of

       reports generated at interval seconds apart. If the interval  parameter

       is  specified without the count parameter, the iostat command generates

       reports continuously.


REPORTS

       The iostat command generates three types of reports, the  CPU  Utiliza-

       tion  report,  the Device Utilization report and the Network Filesystem

       report.


       CPU Utilization Report        #the CPU utilization report

 The first report generated by the iostat command is the CPU Uti-

              lization  Report. For multiprocessor systems, the CPU values are

              global averages among all processors.  The report has  the  fol-

              lowing format:


              %user

                     Show  the  percentage  of  CPU  utilization that occurred

                     while executing at the user level (application).


              %nice

                     Show the percentage  of  CPU  utilization  that  occurred

                     while executing at the user level with nice priority.


              %system

                     Show  the  percentage  of  CPU  utilization that occurred

                     while executing at the system level (kernel).


              %iowait

                     Show the percentage of time that the  CPU  or  CPUs  were

                     idle  during which the system had an outstanding disk I/O

                     request.


              %steal

                     Show the percentage of time spent in involuntary wait  by

                     the  virtual CPU or CPUs while the hypervisor was servic-

                     ing another virtual processor.


              %idle

                     Show the percentage of time that the  CPU  or  CPUs  were

                     idle  and the system did not have an outstanding disk I/O

                     request.

       Device Utilization Report        #the device utilization report

              The second report generated by the iostat command is the  Device

              Utilization  Report.  The device report provides statistics on a

              per physical device or partition basis. Block devices for  which

              statistics  are  to  be  displayed may be entered on the command

              line. Partitions may also be entered on the command line provid-

              ing  that  option -x is not used.  If no device nor partition is

              entered, then statistics are displayed for every device used  by

              the  system,  and providing that the kernel maintains statistics

              for it.  If the ALL keyword is given on the command  line,  then

              statistics are displayed for every device defined by the system,

              including those that have never been used.  The report may  show

              the following fields, depending on the flags used:


              Device:

                     This  column  gives the device (or partition) name, which

                     is displayed as hdiskn with  2.2  kernels,  for  the  nth

                     device. It is displayed as devm-n with 2.4 kernels, where

                     m is the major number of the device, and n a  distinctive

                     number.  With newer kernels, the device name as listed in

                     the /dev directory is displayed.


              tps

                     Indicate the number of transfers  per  second  that  were

                     issued to the device. A transfer is an I/O request to the

                     device. Multiple logical requests can be combined into  a

                     single  I/O request to the device. A transfer is of inde-

                     terminate size.


              Blk_read/s

                     Indicate  the  amount  of  data  read  from  the   device

                     expressed  in  a  number of blocks per second. Blocks are

                     equivalent to sectors with  kernels  2.4  and  later  and

                     therefore have a size of 512 bytes. With older kernels, a

                     block is of indeterminate size.


              Blk_wrtn/s

                     Indicate  the  amount  of  data  written  to  the  device

                     expressed in a number of blocks per second.


              Blk_read

                     The total number of blocks read.


              Blk_wrtn

                     The total number of blocks written.


              kB_read/s

                     Indicate   the  amount  of  data  read  from  the  device

                     expressed in kilobytes per second.


              kB_wrtn/s

                     Indicate  the  amount  of  data  written  to  the  device

                     expressed in kilobytes per second.


              kB_read

                     The total number of kilobytes read.


              kB_wrtn

                     The total number of kilobytes written.


              MB_read/s

                     Indicate   the  amount  of  data  read  from  the  device

                     expressed in megabytes per second.


              MB_wrtn/s

                     Indicate  the  amount  of  data  written  to  the  device

                     expressed in megabytes per second.


              MB_read

                     The total number of megabytes read.


              MB_wrtn

                     The total number of megabytes written.


              rrqm/s

                     The  number  of read requests merged per second that were

                     queued to the device.


              wrqm/s

                     The number of write requests merged per second that  were

                     queued to the device.


              r/s

                     The  number  of  read  requests  that  were issued to the

                     device per second.


              w/s

                     The number of write requests  that  were  issued  to  the

                     device per second.


              rsec/s

                     The number of sectors read from the device per second.


              wsec/s

                     The number of sectors written to the device per second.

              rkB/s

                     The  number of kilobytes read from the device per second.


              wkB/s

                     The number of kilobytes written to the device per second.


              rMB/s

                     The  number of megabytes read from the device per second.


              wMB/s

                     The number of megabytes written to the device per second.


              avgrq-sz

                     The  average  size (in sectors) of the requests that were

                     issued to the device.


              avgqu-sz

                     The average queue length of the requests that were issued

                     to the device.


             

              await

                     The  average  time  (in  milliseconds)  for  I/O requests

                     issued to the device to be served. This includes the time

                     spent by the requests in queue and the time spent servic-

                     ing them.


              svctm

                     The  average  service  time  (in  milliseconds)  for  I/O

                     requests  that were issued to the device. Warning! Do not

                     trust this field any more. This field will be removed  in

                     a future sysstat version.


              %util

                     Percentage of elapsed time during which I/O requests were

                     issued to  the  device  (bandwidth  utilization  for  the

                     device).  Device  saturation  occurs  when  this value is

                     close to 100%.

       Network Filesystem report  #the NFS utilization report

              The Network Filesystem (NFS) report provides statistics for each

              mounted  network  filesystem.   The  report  shows the following

              fields:


              Filesystem:

                     This columns shows the hostname of the  NFS  server  fol-

                     lowed by a colon and by the directory name where the net-

                     work filesystem is mounted.


              rBlk_nor/s

                     Indicate the number of blocks read  by  applications  via

                     the  read(2) system call interface. A block has a size of

                     512 bytes.


              wBlk_nor/s

                     Indicate the number of blocks written by applications via

                     the write(2) system call interface.


              rBlk_dir/s

                     Indicate the number of blocks read from files opened with

                     the O_DIRECT flag.

              wBlk_dir/s

                     Indicate the number of blocks  written  to  files  opened

                     with the O_DIRECT flag.


              rBlk_svr/s

                     Indicate the number of blocks read from the server by the

                     NFS client via an NFS READ request.


              wBlk_svr/s

                     Indicate the number of blocks written to  the  server  by

                     the NFS client via an NFS WRITE request.


              rkB_nor/s

                     Indicate the number of kilobytes read by applications via

                     the read(2) system call interface.


              wkB_nor/s

                     Indicate the number of kilobytes written by  applications

                     via the write(2) system call interface.


              rkB_dir/s

                     Indicate  the  number of kilobytes read from files opened

                     with the O_DIRECT flag.


              wkB_dir/s

                     Indicate the number of kilobytes written to files  opened

                     with the O_DIRECT flag.

...................................................................

OPTIONS

       -c     Display the CPU utilization report.        #CPU report


       -d     Display the device utilization report.        #device report


       -h     Make the NFS report displayed by option -n easier to read  by  a

              human.


       -j { ID | LABEL | PATH | UUID | ... } [ device [...] | ALL ]

              Display persistent device names. Options ID, LABEL, etc. specify

              the type of the persistent name. These options are not  limited,

              only  prerequisite  is  that  directory with required persistent

              names is present in /dev/disk.  Optionally, multiple devices can

              be specified in the chosen persistent name type.


       -k     Display statistics in kilobytes per second instead of blocks per

              second.  Data displayed are valid  only  with  kernels  2.4  and

              later.


       -m     Display  statistics in megabytes per second instead of blocks or

              kilobytes per second.  Data displayed are valid only  with  ker-

              nels 2.4 and later.


       -N     Display the registered device mapper names for any device mapper

              devices.  Useful for viewing LVM2 statistics.


       -n     Display the network filesystem (NFS) report. This  option  works

               only with kernel 2.6.17 and later.    #NFS report


       -p [ { device [,...] | ALL } ]

              The  -p  option  displays  statistics  for block devices and all

              their partitions that are used by the system.  If a device  name

              is  entered  on the command line, then statistics for it and all

              its partitions are displayed. Last, the  ALL  keyword  indicates

              that  statistics  have to be displayed for all the block devices

              and partitions defined by the system, including those that  have

              never  been  used.  If  option -j is defined before this option,

              devices entered on the command line can be  specified  with  the

              chosen  persistent  name type.  Note that this option works only

              with post 2.5 kernels.


       -t     Print the time for each report displayed. The  timestamp  format

              may  depend  on the value of the S_TIME_FORMAT environment vari-

              able (see below).


       -V     Print version number then exit.


       -x     Display extended statistics.  This option works  with  post  2.5

              kernels  since  it needs /proc/diskstats file or a mounted sysfs

              to get the statistics. This option may also work with older ker-

              nels  (e.g.  2.4)  only  if extended statistics are available in

              /proc/partitions (the kernel needs to be patched for that).


       -y     Omit first report with statistics since the system boot, if dis-

              playing multiple records in given interval.


       -z     Tell  iostat  to omit output for any devices for which there was

              no activity during the sample period.


ENVIRONMENT




[root@localhost yum.repos.d]# iostat -c        #CPU utilization

Linux 2.6.32-754.el6.x86_64 (localhost.localdomain)     2021年07月15日  _x86_64_        (8 CPU)


avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.01    0.00    0.03    0.01    0.00   99.95


[root@localhost yum.repos.d]#


[root@localhost yum.repos.d]# iostat -c 1        #CPU utilization, one report per second

Linux 2.6.32-754.el6.x86_64 (localhost.localdomain)     2021年07月15日  _x86_64_        (8 CPU)


avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.01    0.00    0.03    0.01    0.00   99.95


avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.00    0.00    0.12    0.00    0.00   99.88


avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.00    0.00    0.00    0.00    0.00  100.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.00    0.00    0.00    0.00    0.00  100.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.00    0.00    0.12    0.00    0.00   99.88




iostat takes interval and count arguments just like vmstat

[root@localhost yum.repos.d]# iostat -c 1 6   #CPU utilization once per second, 6 samples in total

#a persistently high %iowait means it is time to examine the I/O path — the disks may be the bottleneck (see the extended-report sketch after the output below)

#a disproportionately high %system means the kernel is eating too much time and too little is left for real service; find out which process is doing what and why the kernel is kept that busy

#on CPUs with hardware virtualization that are running virtual machines, %steal can be sizeable too


Linux 2.6.32-754.el6.x86_64 (localhost.localdomain)     2021年07月15日  _x86_64_(8 CPU)


avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.01    0.00    0.03    0.01    0.00   99.95


avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.00    0.00    0.00    0.00    0.00  100.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.00    0.00    0.00    0.00    0.00  100.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.00    0.00    0.00    0.00    0.00  100.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.00    0.00    0.00    0.00    0.00  100.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.00    0.00    0.00    0.00    0.00  100.00


[root@localhost yum.repos.d]#
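
When %iowait is the number that stands out, the natural follow-up is the extended device report described above — a sketch (sda is an assumed device name here):

[root@localhost yum.repos.d]# iostat -dxk sda 1 3        #extended per-device stats in kB/s; watch await, avgqu-sz and %util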



[root@localhost yum.repos.d]# cat /proc/stat        #raw CPU counters; the commands above basically all pull their data from here and present it in a friendlier form

cpu  728 0 3062 14251890 1758 5 1251 0 0

cpu0 93 0 393 1781419 199 2 180 0 0

cpu1 72 0 331 1781152 150 0 414 0 0

cpu2 81 0 256 1781801 147 0 118 0 0

cpu3 138 0 770 1781034 367 0 86 0 0

cpu4 88 0 421 1781457 309 0 166 0 0

cpu5 82 0 356 1781459 180 2 139 0 0

cpu6 94 0 292 1781680 219 0 93 0 0

cpu7 75 0 240 1781884 182 0 52 0 0

intr 657463 205 8 0 1 1 0 0 0 60 0 0 0 110 0 0 245 0 8377 63 37271 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

ctxt 728376

btime 1626324304

processes 3485

procs_running 1

procs_blocked 0

softirq 1947767 0 441372 3562 751530 8542 0 2 240769 340 501650

[root@localhost yum.repos.d]#
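
The per-CPU lines above are cumulative counters in jiffies (user, nice, system, idle, iowait, irq, softirq, steal, ...). A minimal sketch of how the tools above turn them into a utilization percentage — sample the aggregate cpu line twice, one second apart:

[root@localhost yum.repos.d]# { head -1 /proc/stat; sleep 1; head -1 /proc/stat; } | \
awk 'NR==1 {for (i=2; i<=NF; i++) prev[i]=$i}
     NR==2 {total=0
            for (i=2; i<=NF; i++) {delta[i]=$i-prev[i]; total+=delta[i]}
            idle=delta[5]+delta[6]              # field 5 = idle, field 6 = iowait
            printf "busy: %.1f%%\n", 100*(total-idle)/total}'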



The dstat command (the name presumably from "data statistics"): if it is missing, install the dstat package — RHEL 5 did not ship it by default, RHEL 6 does.



[root@localhost yum.repos.d]# man dstat

Cannot open the message catalog "man" for locale "zh_CN.UTF-8"

(NLSPATH="/usr/share/locale/%l/LC_MESSAGES/%N")


Formatting page, please wait...

DSTAT(1)                                                              DSTAT(1)


NAME

       dstat - versatile tool for generating system resource statistics        #yet another system-resource statistics tool


SYNOPSIS

       dstat [-afv] [options..] [delay [count]]


DESCRIPTION

       Dstat is a versatile replacement for vmstat, iostat and ifstat. Dstat

       overcomes some of the limitations and adds some extra features.


       Dstat allows you to view all of your system resources instantly, you

       can eg. compare disk usage in combination with interrupts from your IDE

       controller, or compare the network bandwidth numbers directly with the

       disk throughput (in the same interval).


       Dstat also cleverly gives you the most detailed information in columns

       and clearly indicates in what magnitude and unit the output is

       displayed. Less confusion, less mistakes, more efficient.


       Dstat is unique in letting you aggregate block device throughput for a

       certain diskset or network bandwidth for a group of interfaces, ie. you

       can see the throughput for all the block devices that make up a single

       filesystem or storage system.


       Dstat allows its data to be directly written to a CSV file to be

       imported and used by OpenOffice, Gnumeric or Excel to create graphs.


       Note

       Users of Sleuthkit might find Sleuthkit’s dstat being renamed to

       datastat to avoid a name conflict. See Debian bug #283709 for more

       information.


OPTIONS

       -c, --cpu        #CPU stats

              enable cpu stats (system, user, idle, wait, hardware interrupt,

              software interrupt)


       -C 0,3,total    #pick specific CPUs

              include cpu0, cpu3 and total


       -d, --disk   #disk stats

              enable disk stats (read, write)


       -D total,hda  #pick specific disks

              include hda and total


       -g, --page

              enable page stats (page in, page out)


       -i, --int

              enable interrupt stats


       -I 5,10

              include interrupt 5 and 10

  -l, --load

              enable load average stats (1 min, 5 mins, 15mins)


       -m, --mem     #memory stats

              enable memory stats (used, buffers, cache, free)


       -n, --net     #network stats

              enable network stats (receive, send)


       -N eth1,total      #pick specific NICs

              include eth1 and total


       -p, --proc    #process stats

              enable process stats (runnable, uninterruptible, new)


       -r, --io   #I/O request stats

              enable I/O request stats (read, write requests)


       -s, --swap

              enable swap stats (used, free)


       -S swap1,total

              include swap1 and total


       -t, --time #timestamp output

              enable time/date output


       -T, --epoch

              enable time counter (seconds since epoch)


       -y, --sys

              enable system stats (interrupts, context switches)


       --aio  enable aio stats (asynchronous I/O)        #async I/O stats


       --fs   enable filesystem stats (open files, inodes)     #filesystem stats


       --ipc  enable ipc stats (message queue, semaphores, shared memory)     #IPC stats


       --lock enable file lock stats (posix, flock, read, write)    #file lock stats


       --raw  enable raw stats (raw sockets)


       --socket   #socket stats

              enable socket stats (total, tcp, udp, raw, ip-fragments)


       --tcp  enable tcp stats (listen, established, syn, time_wait, close)  #TCP stats

       --udp  enable udp stats (listen, active) #UDP stats


       --unix enable unix stats (datagram, stream, listen, active) #unix socket stats


       --vm   enable vm stats (hard pagefaults, soft pagefaults, allocated,        #vm stats

              free)


       --stat1 --stat2 #external plugins by name

              enable (external) plugins by plugin name, see PLUGINS for

              options


       Possible internal stats are

              aio, cpu, cpu24, disk, disk24, disk24old, epoch, fs, int, int24,

              io, ipc, load, lock, mem, net, page, page24, proc, raw, socket,

              swap, swapold, sys, tcp, time, udp, unix, vm


       --list list the internal and external plugin names


       -a, --all    #all stats: equals -cdngy

              equals -cdngy (default)


       -f, --full

              expand -C, -D, -I, -N and -S discovery lists


       -v, --vmstat

              equals -pmgdsc -D total


       --bw, --blackonwhite

              change colors for white background terminal


       --float

              force float values on screen (mutual exclusive with --integer)


       --integer

              force integer values on screen (mutual exclusive with --float)


       --nocolor

              disable colors (implies --noupdate)


       --noheaders

              disable repetitive headers


       --noupdate

              disable intermediate updates when delay > 1


       --output file

              write CSV output to file

PLUGINS        #plugins make it much more capable

       While anyone can create their own dstat plugins (and contribute them)

       dstat ships with a number of plugins already that extend its

       capabilities greatly. Here is an overview of the plugins dstat ships

       with:


       --battery    #battery

              battery in percentage (needs ACPI)


       --battery-remain   #battery remaining

              battery remaining in hours, minutes (needs ACPI)


       --cpufreq        #CPU frequency

              CPU frequency in percentage (needs ACPI)


       --dbus number of dbus connections (needs python-dbus)


       --disk-util    #per-disk utilization

              per disk utilization in percentage


       --fan  fan speed (needs ACPI)


       --freespace

              per filesystem disk usage


       --gpfs GPFS read/write I/O (needs mmpmon)    #GPFS filesystem I/O


       --gpfs-ops

              GPFS filesystem operations (needs mmpmon)


       --helloworld

              Hello world example dstat plugin


       --innodb-buffer        #MySQL InnoDB storage-engine stats

              show innodb buffer stats


       --innodb-io

              show innodb I/O stats


       --innodb-ops

              show innodb operations counters


       --lustre        #Lustre, a distributed filesystem

              show lustre I/O throughput


       --memcache-hits    #memcache hits and misses

              show the number of hits and misses from memcache


       --mysql5-cmds    #MySQL command stats

              show the MySQL5 command stats


       --mysql5-conn

              show the MySQL5 connection stats


       --mysql5-io


       --mysql5-keys

              show the MySQL5 keys stats


       --mysql-io

              show the MySQL I/O stats


       --mysql-keys

              show the MySQL keys stats


       --net-packets

              show the number of packets received and transmitted


       --nfs3 show NFS v3 client operations


       --nfs3-ops

              show extended NFS v3 client operations


       --nfsd3

              show NFS v3 server operations


       --nfsd3-ops

              show extended NFS v3 server operations


       --ntp  show NTP time from an NTP server


       

 --postfix

              show postfix queue sizes (needs postfix)


       --power

              show power usage


       --proc-count

              show total number of processes


       --rpc  show RPC client calls stats


       --rpcd show RPC server calls stats


       --sendmail

              show sendmail queue size (needs sendmail)


       --snooze

              show number of ticks per second


       --test show test plugin output

      --thermal

              system temperature sensors


       --top-bio

              show most expensive block I/O process


       --top-cpu  #very useful: which process consumes the most CPU

              show most expensive CPU process


       --top-cputime    #which process uses the most CPU time (in ms)

              show process using the most CPU time (in ms)


       --top-cputime-avg        #which process has the highest average timeslice (in ms)

              show process with the highest average timeslice (in ms)


       --top-io    #very useful: which process does the most I/O

              show most expensive I/O process


       

       --top-latency        #which process has the highest total latency (often I/O-related)

              show process with highest total latency (in ms)


       --top-latency-avg

              show process with the highest average latency (in ms)


       --top-mem         #which process uses the most memory

              show process using the most memory


       --top-oom

              show process that will be killed by OOM the first


       --utmp show number of utmp connections (needs python-utmp)


       --vmk-hba

              show VMware ESX kernel vmhba stats


       --vmk-int

              show VMware ESX kernel interrupt stats


       --vmk-nic

              show VMware ESX kernel port stats


       --vm-memctl

              show ballooning status inside VMware guests


       --vz-io

              show CPU usage per OpenVZ guest

         --vz-ubc

              show OpenVZ user beancounters


       --wifi wireless link quality and signal to noise ratio





[root@localhost ~]# dstat --top-cpu    #continuously shows the process consuming the most CPU

image.png




[root@localhost ~]# dstat --top-mem        #continuously shows the process consuming the most memory

image.png



[root@localhost ~]# dstat --top-mem --top-cpu        #--top-mem and --top-cpu combined

image.png




[root@localhost ~]# dstat --top-mem --top-cpu --top-io    #all three combined

image.png


[root@localhost ~]# dstat -c        #CPU utilization

column legend: usr = user space, sys = kernel space, idl = idle, wai = I/O wait, hiq = hard interrupts, siq = soft interrupts

image.png





Observing the number of context switches

[root@localhost ~]# vmstat 1                #the cs column is context switches

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----

 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st

 0  0      0 1669168  18864  74352    0    0     4     0    8    5  0  0 100  0  0

 0  0      0 1669012  18864  74352    0    0     0     0   45   30  0  0 100  0  0

 0  0      0 1669012  18864  74352    0    0     0     0   19   15  0  0 100  0  0

 0  0      0 1669012  18864  74352    0    0     0     0   28   28  0  0 100  0  0

 0  0      0 1669012  18864  74352    0    0     0     0   18   15  0  0 100  0  0

 0  0      0 1669012  18864  74352    0    0     0     0   32   26  0  0 100  0  0

 0  0      0 1669012  18864  74352    0    0     0     0   18   15  0  0 100  0  0



[root@localhost ~]# man sar

...................................................................................................................................................

  -w     Report task creation and system switching activity.        #-w: context switching


              proc/s

                     Total number of tasks created per second.    #tasks created per second system-wide


              cswch/s

                     Total number of context switches per second.  #context switches per second

...................................................................................................................................................



[root@localhost ~]# sar -w 1            #average context-switch and task-creation rates per second; the 1 after -w means one report every second (a 5 would mean one every 5 seconds)




Linux 2.6.32-754.el6.x86_64 (localhost.localdomain)     2021年07月16日  _x86_64_(8 CPU)


09时11分03秒    proc/s   cswch/s

09时11分04秒      0.00     18.00

09时11分05秒      0.00     90.00

09时11分06秒      0.00     81.00

09时11分07秒      0.00     94.00

#too many context switches suggests we simply have too many processes; a high interrupt rate suggests busy peripherals — for a network server, a high interrupt count is perfectly normal
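
To see which tasks are responsible for the switching, the sysstat package also ships pidstat; a minimal sketch (-w reports voluntary cswch/s and involuntary nvcswch/s per task):

[root@localhost ~]# pidstat -w 1 3        #per-process context-switch rates, one report per second, three reports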



Linux supports CPU scheduler domains: CPUs can be grouped ahead of time, and a process can then be bound — via functionality similar to what taskset provides — to a group of CPUs inside a domain rather than to one specific CPU

image.png


As shown in the figure below,

CPU scheduler domains organize the CPUs, much like the root filesystem, into an inverted tree exposed as a filesystem. All CPUs belong to the root domain; here it is split into a left and a right child domain with two CPUs each, and a process can be bound to, say, the left one. Any child domain can be subdivided again: split the left child once more, binding cpu0 into one grandchild domain and cpu2 into another. A process placed in the root domain may then run on every CPU; one placed in the right child domain may run only on cpu1 and cpu3; and so on down the tree.

This partitioning scheme groups memory as well as CPUs. On a NUMA machine that is straightforward: the domain holding cpu0 can carry memory node 0, the one holding cpu2 can carry memory node 2, and so on. On a non-NUMA machine there is only one memory node, so every subdomain has to include memory node 0. (A sketch of such a nested layout follows the figure.)

image.png
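
A sketch of how such a nested layout could be built with the cpuset filesystem that gets mounted at /cpusets below (the directory names are made up; a child domain may only use CPUs and memory nodes owned by its parent):

[root@localhost ~]# mkdir /cpusets/left                #left child domain
[root@localhost ~]# echo 0,2 > /cpusets/left/cpus      #cpu0 and cpu2
[root@localhost ~]# echo 0 > /cpusets/left/mems        #the only memory node on a non-NUMA box
[root@localhost ~]# mkdir /cpusets/left/sub0           #grandchild domain
[root@localhost ~]# echo 0 > /cpusets/left/sub0/cpus   #cpu0 only
[root@localhost ~]# echo 0 > /cpusets/left/sub0/mems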


To set this up, we only need to create a directory somewhere and then mount the cpuset filesystem on it:

device          mount point      filesystem type    mount options

cpuset        /cpusets        cpuset                 defaults        0        0

Once mounted, a number of control files are generated automatically under /cpusets:

/cpusets/cpus

/cpusets/mems

/cpusets/tasks



image.png




[root@localhost ~]# mkdir /cpusets

[root@localhost ~]# vim /etc/fstab


#

# /etc/fstab

# Created by anaconda on Wed Jul 14 21:11:42 2021

#

# Accessible filesystems, by reference, are maintained under '/dev/disk'

# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info

#

/dev/mapper/VolGroup-lv_root /                       ext4    defaults        1 1

UUID=02ded709-9bb2-40dd-b5ba-b274ae0bd2f0 /boot                   ext4    defaults        1 2

/dev/mapper/VolGroup-lv_home /home                   ext4    defaults        1 2

/dev/mapper/VolGroup-lv_swap swap                    swap    defaults        0 0

tmpfs                   /dev/shm                tmpfs   defaults        0 0

devpts                  /dev/pts                devpts  gid=5,mode=620  0 0

sysfs                   /sys                    sysfs   defaults        0 0

proc                    /proc                   proc    defaults        0 0

cpuset                  /cpusets                cpuset  defaults        0 0        #add this line



[root@localhost ~]# mount -a

[root@localhost ~]# mount

/dev/mapper/VolGroup-lv_root on / type ext4 (rw)

proc on /proc type proc (rw)

sysfs on /sys type sysfs (rw)

devpts on /dev/pts type devpts (rw,gid=5,mode=620)

tmpfs on /dev/shm type tmpfs (rw,rootcontext="system_u:object_r:tmpfs_t:s0")

/dev/sda1 on /boot type ext4 (rw)

/dev/mapper/VolGroup-lv_home on /home type ext4 (rw)

none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)

cpuset on /cpusets type cpuset (rw)        #now mounted

[root@localhost ~]#


[root@localhost ~]# ls /cpusets/    #now populated

cgroup.event_control  memory_migrate           notify_on_release

cgroup.procs          memory_pressure          release_agent

cpu_exclusive         memory_pressure_enabled  sched_load_balance

cpus (the root domain's CPUs)                  memory_spread_page       sched_relax_domain_level

mem_exclusive         memory_spread_slab       tasks (processes running in this domain)

mem_hardwall          mems (memory nodes attached to this domain)

[root@localhost ~]#

[root@localhost ~]# cat /cpusets/cpus        #we have 8 logical CPUs

0-7

[root@localhost ~]#

[root@localhost ~]# cat /cpusets/mems        #which memory nodes belong here; this is not a NUMA box, so there is only node 0, and it necessarily belongs to the root domain

0

[root@localhost ~]#

[root@localhost ~]# cat /cpusets/tasks    #every process is listed here

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60

61

62

63

64

65

66

67

68

69

70

71

72

73

74

75

76

77

78

79

80

81

82

83

84

85

86

87

88

89

90

91

92

93

94

95

96

97

98

99

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

139

140

141

142

143

144

145

146

147

148

149

150

151

158

159

160

161

162

163

164

165

166

168

169

170

203

204

243

424

425

426

437

438

565

567

633

634

669

740

926

1186

1226

1227

1228

1229

1326

1385

1390

1391

1392

1393

1394

1395

1396

1397

1406

1411

1416

1417

1418

1419

1420

1421

1422

1423

1428

1449

1683

1684

1717

1718

1719

1720

1751

1769

1791

1825

1827

1849

1881

1893

1894

1895

1926

1937

1961

1962

1963

1978

1981

2103

2120

2199

2206

2213

2240

2255

2272

2289

2335

2339

2342

2345

2347

2349

2360

2361

2368

2372

2911

2973

[root@localhost ~]#



Carving out a subdomain

[root@localhost ~]# cd /cpusets/

[root@localhost cpusets]# ls

cgroup.event_control  memory_migrate           notify_on_release

cgroup.procs          memory_pressure          release_agent

cpu_exclusive         memory_pressure_enabled  sched_load_balance

cpus                  memory_spread_page       sched_relax_domain_level

mem_exclusive         memory_spread_slab       tasks

mem_hardwall          mems

[root@localhost cpusets]# mkdir domain1        #create a directory

[root@localhost cpusets]#

[root@localhost cpusets]# cd domain1

[root@localhost domain1]# ls        #the control files were created for us automatically

cgroup.event_control  mem_hardwall        mems (set this domain's memory nodes here)

cgroup.procs          memory_migrate      notify_on_release

cpu_exclusive         memory_pressure     sched_load_balance

cpus (set this domain's CPUs here)                  memory_spread_page  sched_relax_domain_level

mem_exclusive         memory_spread_slab  tasks (bind processes into this domain here)

[root@localhost domain1]#

[root@localhost domain1]# cat cpus        #empty for now


[root@localhost domain1]# cat mems          #empty for now


[root@localhost domain1]# cat tasks          #empty for now

[root@localhost domain1]#

[root@localhost domain1]# echo 0 > cpus        #attach cpu 0 only

[root@localhost domain1]# echo 0 > mems       #attach memory node 0 — in fact the only node we have

[root@localhost domain1]#

Then, once we write a process's PID into this domain's tasks file, that process can run only on this CPU and this memory node

[root@localhost domain1]# ps axo pid,cmd

  PID CMD

    1 /sbin/init

    2 [kthreadd]

    3 [migration/0]

    4 [ksoftirqd/0]

    5 [stopper/0]

    6 [watchdog/0]

    7 [migration/1]

    8 [stopper/1]

    9 [ksoftirqd/1]

   10 [watchdog/1]

   11 [migration/2]

   12 [stopper/2]

   13 [ksoftirqd/2]

   14 [watchdog/2]

   15 [migration/3]

   16 [stopper/3]

   17 [ksoftirqd/3]

   18 [watchdog/3]

   19 [migration/4]

   20 [stopper/4]

   21 [ksoftirqd/4]

   22 [watchdog/4]

   23 [migration/5]

   24 [stopper/5]

   25 [ksoftirqd/5]

   26 [watchdog/5]

   27 [migration/6]

   28 [stopper/6]

   29 [ksoftirqd/6]

   30 [watchdog/6]

   31 [migration/7]

   32 [stopper/7]

   33 [ksoftirqd/7]

   34 [watchdog/7]

   35 [events/0]

   36 [events/1]

   37 [events/2]

   38 [events/3]

   39 [events/4]

   40 [events/5]

   41 [events/6]

   42 [events/7]

   43 [events/0]

   44 [events/1]

   45 [events/2]

   46 [events/3]

   47 [events/4]

   48 [events/5]

   49 [events/6]

   50 [events/7]

   51 [events_long/0]

   52 [events_long/1]

   53 [events_long/2]

   54 [events_long/3]

   55 [events_long/4]

   56 [events_long/5]

   57 [events_long/6]

   58 [events_long/7]

   59 [events_power_ef]

   60 [events_power_ef]

   61 [events_power_ef]

   62 [events_power_ef]

   63 [events_power_ef]

   64 [events_power_ef]

   65 [events_power_ef]

   66 [events_power_ef]

   67 [cgroup]

   68 [khelper]

   69 [netns]

   70 [async/mgr]

   71 [pm]

   72 [sync_supers]

   73 [bdi-default]

   74 [kintegrityd/0]

   75 [kintegrityd/1]

   76 [kintegrityd/2]

   77 [kintegrityd/3]

   78 [kintegrityd/4]

   79 [kintegrityd/5]

   80 [kintegrityd/6]

   81 [kintegrityd/7]

   82 [kblockd/0]

   83 [kblockd/1]

   84 [kblockd/2]

   85 [kblockd/3]

   86 [kblockd/4]

   87 [kblockd/5]

   88 [kblockd/6]

   89 [kblockd/7]

   90 [kacpid]

   91 [kacpi_notify]

   92 [kacpi_hotplug]

   93 [ata_aux]

   94 [ata_sff/0]

   95 [ata_sff/1]

   96 [ata_sff/2]

   97 [ata_sff/3]

   98 [ata_sff/4]

   99 [ata_sff/5]

  100 [ata_sff/6]

  101 [ata_sff/7]

  102 [ksuspend_usbd]

  103 [khubd]

  104 [kseriod]

  105 [md/0]

  106 [md/1]

  107 [md/2]

  108 [md/3]

  109 [md/4]

  110 [md/5]

  111 [md/6]

  112 [md/7]

  113 [md_misc/0]

  114 [md_misc/1]

  115 [md_misc/2]

  116 [md_misc/3]

  117 [md_misc/4]

  118 [md_misc/5]

  119 [md_misc/6]

  120 [md_misc/7]

  121 [linkwatch]

  124 [khungtaskd]

  125 [lru-add-drain/0]

  126 [lru-add-drain/1]

  127 [lru-add-drain/2]

  128 [lru-add-drain/3]

  129 [lru-add-drain/4]

  130 [lru-add-drain/5]

  131 [lru-add-drain/6]

  132 [lru-add-drain/7]

  133 [kswapd0]

  134 [ksmd]

  135 [khugepaged]

  136 [aio/0]

  137 [aio/1]

  138 [aio/2]

  139 [aio/3]

  140 [aio/4]

  141 [aio/5]

  142 [aio/6]

  143 [aio/7]

  144 [crypto/0]

  145 [crypto/1]

  146 [crypto/2]

  147 [crypto/3]

  148 [crypto/4]

  149 [crypto/5]

  150 [crypto/6]

  151 [crypto/7]

  158 [kthrotld/0]

  159 [kthrotld/1]

  160 [kthrotld/2]

  161 [kthrotld/3]

  162 [kthrotld/4]

  163 [kthrotld/5]

  164 [kthrotld/6]

  165 [kthrotld/7]

  166 [pciehpd]

  168 [kpsmoused]

  169 [usbhid_resumer]

  170 [deferwq]

  203 [kdmremove]

  204 [kstriped]

  243 [ttm_swap]

  424 [mpt_poll_0]

  425 [mpt/0]

  426 [scsi_eh_0]

  437 [scsi_eh_1]

  438 [scsi_eh_2]

  565 [kdmflush]

  567 [kdmflush]

  633 [jbd2/dm-0-8]

  634 [ext4-dio-unwrit]

  669 [flush-253:0]

  740 /sbin/udevd -d

  926 [vmmemctl]

 1186 [kdmflush]

 1226 [jbd2/sda1-8]

 1227 [ext4-dio-unwrit]

 1228 [jbd2/dm-2-8]

 1229 [ext4-dio-unwrit]

 1326 [kauditd]

 1385 [ib_addr]

 1390 [infiniband/0]

 1391 [infiniband/1]

 1392 [infiniband/2]

 1393 [infiniband/3]

 1394 [infiniband/4]

 1395 [infiniband/5]

 1396 [infiniband/6]

 1397 [infiniband/7]

 1406 [ib_mcast]

 1411 [iw_cm_wq]

 1416 [ib_cm/0]

 1417 [ib_cm/1]

 1418 [ib_cm/2]

 1419 [ib_cm/3]

 1420 [ib_cm/4]

 1421 [ib_cm/5]

 1422 [ib_cm/6]

 1423 [ib_cm/7]

 1428 [rdma_cm]

 1449 [ipoib_flush]

 1683 auditd

 1717 /sbin/rsyslogd -i /var/run/syslogd.pid -c 5

 1751 irqbalance --pid=/var/run/irqbalance.pid

 1769 rpcbind

 1791 rpc.statd

 1825 dbus-daemon --system

 1849 cupsd -C /etc/cups/cupsd.conf

 1881 /usr/sbin/acpid

 1893 hald

 1894 hald-runner

 1926 hald-addon-input: Listening on /dev/input/event2 /dev/input/event0

 1937 hald-addon-acpi: listening on acpid socket /var/run/acpid.socket

 1961 automount --pid-file /var/run/autofs.pid

 2103 /usr/sbin/mcelog --daemon

 2120 /usr/sbin/sshd

 2199 /usr/libexec/postfix/master

 2206 qmgr -l -t fifo -u

 2213 /usr/sbin/abrtd

 2240 crond

 2255 /usr/sbin/atd

 2272 /usr/bin/rhsmcertd

 2289 /usr/sbin/certmonger -S -p /var/run/certmonger.pid

 2335 /sbin/mingetty /dev/tty1

 2339 /sbin/mingetty /dev/tty2

 2342 /sbin/mingetty /dev/tty3

 2345 /sbin/mingetty /dev/tty4

 2347 /sbin/mingetty /dev/tty5

 2349 /sbin/mingetty /dev/tty6

 2360 /sbin/udevd -d

 2361 /sbin/udevd -d

 2368 sshd: root@pts/0

 2372 -bash

 2911 pickup -l -t fifo -u

 3079 /usr/sbin/httpd

 3081 /usr/sbin/httpd

 3082 /usr/sbin/httpd

 3083 /usr/sbin/httpd

 3084 /usr/sbin/httpd

 3085 /usr/sbin/httpd

 3086 /usr/sbin/httpd

 3087 /usr/sbin/httpd

3088 /usr/sbin/httpd    #let's bind this one

 3090 ps axo pid,cmd

[root@localhost domain1]#

[root@localhost domain1]# echo 3088 > tasks        #bind the process; PID 3088 can now run only on this domain's CPU

[root@localhost domain1]#
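
As a cross-check, each process also reports which cpuset it now belongs to (the path is relative to the mount point):

[root@localhost domain1]# cat /proc/3088/cpuset        #should print /domain1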


[root@localhost domain1]# man ps

........................................................................................

psr        PSR      processor that process is currently assigned to.        #ps can show which CPU a process is currently assigned to

........................................................................................


[root@localhost domain1]# ps -e -o psr,pid,cmd | grep httpd

  2  3079 /usr/sbin/httpd

  3  3081 /usr/sbin/httpd

  4  3082 /usr/sbin/httpd

  3  3083 /usr/sbin/httpd

  4  3084 /usr/sbin/httpd

  3  3085 /usr/sbin/httpd

  4  3086 /usr/sbin/httpd

  3  3087 /usr/sbin/httpd

  4  3088 /usr/sbin/httpd        #still shown on CPU 4? psr updates once the (currently idle) task gets scheduled again

  5  3133 grep httpd

[root@localhost domain1]#

[root@localhost domain1]# cat tasks        # the process is now bound

3088

[root@localhost domain1]#



[root@localhost domain1]# watch -n 0.5 'ps -e -o psr,pid,cmd | grep httpd'    #watch the processes continuously

Every 0.5s: ps -e -o psr,pid,cmd | grep httpd           Fri Jul 16 10:39:02 2021


  2  3079 /usr/sbin/httpd

  3  3081 /usr/sbin/httpd

  4  3082 /usr/sbin/httpd

  3  3083 /usr/sbin/httpd

  4  3084 /usr/sbin/httpd

  3  3085 /usr/sbin/httpd

  4  3086 /usr/sbin/httpd

  3  3087 /usr/sbin/httpd

  4  3088 /usr/sbin/httpd

  1  3139 watch -n 0.5 ps -e -o psr,pid,cmd | grep httpd        #keeps moving between CPUs

  3  3239 sh -c ps -e -o psr,pid,cmd | grep httpd        #keeps moving between CPUs

  5  3241 grep httpd        #keeps moving between CPUs


Open the other putty window again and run a stress test

[root@localhost ~]# ab -n 10000 -c 300 http://127.0.0.1/index.php

This is ApacheBench, Version 2.3 <$Revision: 655654 $>

Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/

Licensed to The Apache Software Foundation, http://www.apache.org/


Benchmarking 127.0.0.1 (be patient)

Completed 1000 requests

Completed 2000 requests

Completed 3000 requests

Completed 4000 requests

Completed 5000 requests

Completed 6000 requests

Completed 7000 requests

Completed 8000 requests

Completed 9000 requests

Completed 10000 requests

Finished 10000 requests



Server Software:        Apache/2.2.15

Server Hostname:        127.0.0.1

Server Port:            80


Document Path:          /index.php

Document Length:        283 bytes


Concurrency Level:      300

Time taken for tests:   1.441 seconds

Complete requests:      10000

Failed requests:        0

Write errors:           0

Non-2xx responses:      10000

Total transferred:      4640000 bytes

HTML transferred:       2830000 bytes

Requests per second:    6938.20 [#/sec] (mean)

Time per request:       43.239 [ms] (mean)

Time per request:       0.144 [ms] (mean, across all concurrent requests)

Transfer rate:          3143.87 [Kbytes/sec] received


Connection Times (ms)

              min  mean[+/-sd] median   max

Connect:        0    1  24.5      0    1000

Processing:     0   28 157.6      9    1412

Waiting:        0   28 157.6      9    1412

Total:          4   30 160.7      9    1422


Percentage of the requests served within a certain time (ms)

  50%      9

  66%     10

  75%     11

  80%     11

  90%     12

  95%     13

  98%     26

  99%   1416

 100%   1422 (longest request)

[root@localhost ~]#




[root@localhost domain1]# watch -n 0.5 'ps -e -o psr,pid,cmd | grep httpd'        #back in the watch window

Every 0.5s: ps -e -o psr,pid,cmd | grep httpd           Fri Jul 16 10:47:02 2021


  2  3079 /usr/sbin/httpd

  1  3081 /usr/sbin/httpd

  4  3082 /usr/sbin/httpd

  1  3083 /usr/sbin/httpd

  4  3084 /usr/sbin/httpd

  5  3085 /usr/sbin/httpd

  5  3086 /usr/sbin/httpd

  6  3087 /usr/sbin/httpd

  0  3088 /usr/sbin/httpd            #after the stress test in the other window, this process is now on CPU 0 — it was rescheduled, and however the others get moved around, it stays on CPU 0

  6  3139 watch -n 0.5 ps -e -o psr,pid,cmd | grep httpd

  6  5929 /usr/sbin/httpd

  3  5937 /usr/sbin/httpd

  3  5938 /usr/sbin/httpd

  0  6083 sh -c ps -e -o psr,pid,cmd | grep httpd

  6  6085 grep httpd



The subdomain settings above were all echoed in by hand, so they will be gone after a reboot. Why? The cpuset filesystem itself will still be there, since it is mounted automatically from fstab, but the subdirectories we created by hand under /cpusets will not be recreated.
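
One way to restore the subdomain at boot — a sketch that appends to /etc/rc.local (which runs after the fstab mounts on RHEL 6; PIDs change across boots, so only the domain itself is recreated):

[root@localhost ~]# cat >> /etc/rc.local <<'EOF'
mkdir -p /cpusets/domain1
echo 0 > /cpusets/domain1/cpus
echo 0 > /cpusets/domain1/mems
EOF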



The other approach: binding with taskset

[root@localhost domain1]# ps -e -o psr,pid,cmd | grep httpd

  0   508 grep httpd

  2  3079 /usr/sbin/httpd

  5  3081 /usr/sbin/httpd

  1  3087 /usr/sbin/httpd

  0  3088 /usr/sbin/httpd

  7  5938 /usr/sbin/httpd

  1  6813 /usr/sbin/httpd

  1  6821 /usr/sbin/httpd

  3  6824 /usr/sbin/httpd

  2  6826 /usr/sbin/httpd

  1  6838 /usr/sbin/httpd

  3  6842 /usr/sbin/httpd

  2  6844 /usr/sbin/httpd

  1  6846 /usr/sbin/httpd

  1  6848 /usr/sbin/httpd

  2  6850 /usr/sbin/httpd

  3  6867 /usr/sbin/httpd

  2  6880 /usr/sbin/httpd

  3  6882 /usr/sbin/httpd

  3  6883 /usr/sbin/httpd

  0  6885 /usr/sbin/httpd

  2  6886 /usr/sbin/httpd

[root@localhost domain1]#


[root@localhost domain1]# taskset -p -c 0 3087        #bind PID 3087 to CPU 0; the -p -c order matters

pid 3087's current affinity list: 0-7

pid 3087's new affinity list: 0

[root@localhost domain1]#

[root@localhost domain1]# taskset -c 0 -p  3087        #get the order wrong and it fails

execvp: No such file or directory

failed to execute -p

[root@localhost domain1]#

[root@localhost domain1]# taskset -p -c 1 3087        #re-bind PID 3087 to CPU 1

pid 3087's current affinity list: 0

pid 3087's new affinity list: 1

[root@localhost domain1]#
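
taskset can also query the raw affinity bitmask, and can launch a command pre-bound instead of re-binding a running PID — a sketch (the ab command line is just an example):

[root@localhost domain1]# taskset -p 3087        #without -c, prints the mask in hex: CPU 1 is bit 1, i.e. mask 2

[root@localhost domain1]# taskset -c 0,1 ab -n 1000 -c 10 http://127.0.0.1/index.php        #start a new process already bound to cpu0 and cpu1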




In CPU tuning, watch the CPU load and the various statistics: interrupt counts, context switches, the percentage of time spent in user space, the percentage in kernel space, I/O wait time, and so on — each of these values matters and each means something. A healthy user-space to kernel-space ratio is roughly 7:3. CPU utilization should not stay above 80%; if it does for long, the CPU has become a bottleneck, and the system may be so busy that it eventually falls over at some point. At peak times a multi-core machine's utilization can legitimately read several hundred percent (seven or eight hundred, depending on the core count). If context switches are excessive, and some very busy processes keep getting bounced back and forth, bind them. On a NUMA architecture in particular, bind processes that keep getting rebalanced onto a particular node (or CPU), so that they run, and load their data, only on that node. Manual binding works either with taskset or with the cpuset filesystem. That wraps up the CPU-tuning material; dstat, sar and iostat are the three commands worth remembering.



