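Introduction
A system's performance can be judged against many different metrics, each with its own test methods and tools. To keep results fair and authoritative, mature commercial benchmarking software is usually chosen. In some situations, though, such as a quick comparison of different systems or of the performance of a few libraries, good open-source tools can do the job. This article uses lmbench to give a brief introduction to testing overall system performance.
Test software
lmbench is a simple, portable suite of micro-benchmarks for UNIX/POSIX systems, written in ANSI C. In general it measures two key characteristics: latency and bandwidth. lmbench is intended to give system developers insight into the basic costs of key operations.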
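To make the two quantities concrete, here is a rough sketch in Python. It is not lmbench code and not how lmbench takes its measurements; it only times a trivial system call (latency) and a buffer copy (bandwidth). The function names, buffer size, and iteration counts are illustrative assumptions, and interpreter overhead dominates the latency figure, so the numbers are not comparable with lmbench results.

```python
import os
import time


def null_call_latency_us(iterations=200_000):
    """Average wall-clock cost of one os.getppid() call, in microseconds.

    Includes Python call overhead, so it overstates the raw syscall cost.
    """
    start = time.perf_counter()
    for _ in range(iterations):
        os.getppid()
    elapsed = time.perf_counter() - start
    return elapsed / iterations * 1e6


def copy_bandwidth_mb_s(size_mb=8, repeats=5):
    """Best observed rate, in MB/s, for copying a buffer of size_mb megabytes."""
    src = bytearray(size_mb * 1024 * 1024)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst = bytes(src)  # one full copy of the buffer
        best = min(best, time.perf_counter() - start)
    return size_mb / best


if __name__ == "__main__":
    print(f"null call:   {null_call_latency_us():.3f} us per call")
    print(f"memory copy: {copy_bandwidth_mb_s():.0f} MB/s")
```

Latency is reported as time per operation (smaller is better) and bandwidth as data moved per unit of time (bigger is better), which matches the way the lmbench summary labels its tables.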
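Software description:
lmbench is a multi-platform open-source benchmark for evaluating overall system performance. It can measure file reads and writes, memory operations, process creation and destruction overhead, networking performance, and more, and it is simple to run.
Because lmbench runs on many platforms, it can be used to compare systems of the same class and show where each is stronger or weaker; by building against different library functions, we can also compare the performance of those libraries. More importantly, as open-source software lmbench provides a test framework: if a tester needs more from a test item, a small change to the source code is often enough. For example, lmbench currently measures only process creation and termination and the overhead of switching between processes, but thread-level measurements can be added by modifying part of the code.
Download:
www.bitmover.com/lmbench (latest version: 3.0-a9)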
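Main functions of lmbench:
Bandwidth benchmarks
- Cached file read
- Memory copy
- Memory read
- Memory write
- Pipe
- TCP
Latency benchmarks
- Context switching
- Networking: connection establishment, pipe, TCP, UDP, and RPC hot potato
- File system creates and deletes
- Process creation
- Signal handling
- System call overhead
- Memory read latency
Other
- Processor clock rate calculation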
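Main features of lmbench:
- Portability test of the operating system
  The benchmarks are written in C and are reasonably portable (although they are easier to compile with GCC). This is useful for producing detailed, like-for-like comparisons between systems.
- Adaptive tuning
  lmbench is very useful for motivating change. When BloatOS turns out to be 4 times slower than all of its competitors, results like these tend to get resources allocated to fix the problem.
- Database of results
  The results database includes runs on machines from most of the major workstation vendors.
- Memory latency results
  The memory latency test shows the latency of all of the system's (data) caches, for example level 1, 2, and 3, as well as main memory and TLB miss latency. In addition, the cache sizes can be read off a properly plotted set of results. Hardware people like this one. This benchmark has found bugs in operating system page coloring policies.
- Context switching results
  Everybody seems to love context switch numbers. This benchmark is not particularly interested in quoting just the "in cache" number. It varies the number and size of the processes, and the results are plotted so that the user can see when the context no longer fits in cache. You also get the real cost of a cold-cache context switch.
- Regression testing
  Sun and SGI have used these benchmarks to find and fix performance problems. Intel used them during P6 development. Linus uses them for Linux performance tuning.
- New benchmarks
  The source is small, readable, and easy to extend. It can be recombined in different ways to test other things, for example the networking measurement library functions that handle connection establishment, server shutdown, and so on.
Directory layout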
[root@jiangyi01.sqa.zmf /tmp/lmbench3]# ls
lmbench3  lmbench3.tar.gz
[root@jiangyi01.sqa.zmf /tmp/lmbench3]# cd lmbench3/
[root@jiangyi01.sqa.zmf /tmp/lmbench3/lmbench3]# ls
ACKNOWLEDGEMENTS  CHANGES  COPYING-2  hbench-REBUTTAL  README   SCCS     src
bin               COPYING  doc        Makefile         results  scripts
Configuration files
[root@jiangyi01.sqa.zmf /tmp/lmbench3/lmbench3]# ll bin/x86_64-linux-gnu/*`hostname`
-rw-r--r-- 1 root root  719 Mar 8 17:18 bin/x86_64-linux-gnu/CONFIG.jiangyi01.sqa.zmf
-rwxr-xr-x 1 root root 1232 Mar 7 20:52 bin/x86_64-linux-gnu/INFO.jiangyi01.sqa.zmf
Script that generates the configuration file
[root@jiangyi01.sqa.zmf /tmp/lmbench3/lmbench3]# ll scripts/config-run
-r-xr-xr-x 1 14557 501 21018 Mar 8 17:18 scripts/config-run
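Generating the configuration file
The make results command actually calls scripts/config-run, which asks the questions shown below and writes the answers to the CONFIG file.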
[root@jiangyi01.sqa.zmf /tmp/lmbench3/lmbench3]# make results
cd src && make results
make[1]: Entering directory `/tmp/lmbench3/lmbench3/src'
gmake[2]: Entering directory `/tmp/lmbench3/lmbench3/src'
gmake[2]: Nothing to be done for `all'.
gmake[2]: Leaving directory `/tmp/lmbench3/lmbench3/src'
gmake[2]: Entering directory `/tmp/lmbench3/lmbench3/src'
gmake[2]: Nothing to be done for `opt'.
gmake[2]: Leaving directory `/tmp/lmbench3/lmbench3/src'
=====================================================================

        L M B E N C H   C ON F I G U R A T I O N
        ----------------------------------------

You need to configure some parameters to lmbench. Once you have configured
these parameters, you may do multiple runs by saying

        "make rerun"

in the src subdirectory.

NOTICE: please do not have any other activity on the system if you can
help it. Things like the second hand on your xclock or X perfmeters
are not so good when benchmarking. In fact, X is not so good when
benchmarking.

=====================================================================

Hang on, we are calculating your timing granularity.
OK, it looks like you can time stuff down to 5000 usec resolution.

Hang on, we are calculating your timing overhead.
OK, it looks like your gettimeofday() costs 0 usecs.

Hang on, we are calculating your loop overhead.
OK, it looks like your benchmark loop costs 0.00000197 usecs.

=====================================================================

If you are running on an MP machine and you want to try running
multiple copies of lmbench in parallel, you can specify how many here.

Using this option will make the benchmark run 100x slower (sorry).

NOTE: WARNING! This feature is experimental and many results are
      known to be incorrect or random!

MULTIPLE COPIES [default 1] 1
Options to control job placement
1) Allow scheduler to place jobs
2) Assign each benchmark process with any attendent child processes to its own processor
3) Assign each benchmark process with any attendent child processes to its own processor, except that it will be as far as possible from other processes
4) Assign each benchmark and attendent processes to their own processors
5) Assign each benchmark and attendent processes to their own processors, except that they will be as far as possible from each other and other processes
6) Custom placement: you assign each benchmark process with attendent child processes to processors
7) Custom placement: you assign each benchmark and attendent processes to processors

Note: some benchmarks, such as bw_pipe, create attendent child
processes for each benchmark process. For example, bw_pipe
needs a second process to send data down the pipe to be read
by the benchmark process. If you have three copies of the
benchmark process running, then you actually have six processes;
three attendent child processes sending data down the pipes and
three benchmark processes reading data and doing the measurements.

Job placement selection: 1
=====================================================================

Several benchmarks operate on a range of memory. This memory should be
sized such that it is at least 4 times as big as the external cache[s]
on your system. It should be no more than 80% of your physical memory.

The bigger the range, the more accurate the results, but larger sizes
take somewhat longer to run the benchmark.

MB [default 67535] 100
Checking to see if you have 100 MB; please wait for a moment...
100MB OK
100MB OK
100MB OK
Hang on, we are calculating your cache line size.
OK, it looks like your cache line is 128 bytes.

=====================================================================

lmbench measures a wide variety of system performance, and the full suite
of benchmarks can take a long time on some platforms. Consequently, we
offer the capability to run only predefined subsets of benchmarks, one
for operating system specific benchmarks and one for hardware specific
benchmarks. We also offer the option of running only selected benchmarks
which is useful during operating system development.

Please remember that if you intend to publish the results you either need
to do a full run or one of the predefined OS or hardware subsets.

SUBSET (ALL|HARWARE|OS|DEVELOPMENT) [default all] h
=====================================================================

This benchmark measures, by default, memory latency for a number of
different strides. That can take a long time and is most useful if you
are trying to figure out your cache line size or if your cache line size
is greater than 128 bytes.

If you are planning on sending in these results, please don't do a fast
run.

Answering yes means that we measure memory latency with a 128 byte stride.

FASTMEM [default no]
=====================================================================

This benchmark measures, by default, file system latency. That can
take a long time on systems with old style file systems (i.e., UFS,
FFS, etc.). Linux' ext2fs and Sun's tmpfs are fast enough that this
test is not painful.

If you are planning on sending in these results, please don't do a fast
run.

If you want to skip the file system latency tests, answer "yes" below.

SLOWFS [default no]
=====================================================================

This benchmark can measure disk zone bandwidths and seek times. These can
be turned into whizzy graphs that pretty much tell you everything you might
need to know about the performance of your disk.

This takes a while and requires read access to a disk drive.
Write is not measured, see disk.c to see how if you want to do so.

If you want to skip the disk tests, hit return below.

If you want to include disk tests, then specify the path to the disk
device, such as /dev/sda. For each disk that is readable, you'll be
prompted for a one line description of the drive, i.e.,

        Iomega IDE ZIP
or
        HP C3725S 2GB on 10MB/sec NCR SCSI bus

DISKS [default none]
=====================================================================

If you are running on an idle network and there are other, identically
configured systems, on the same wire (no gateway between you and them),
and you have rsh access to them, then you should run the network part
of the benchmarks to them. Please specify any such systems as a space
separated list such as: ether-host fddi-host hippi-host.

REMOTE [default none]
=====================================================================

Calculating mhz, please wait for a moment...
I think your CPU mhz is

        2194 MHz, 0.4558 nanosec clock

but I am frequently wrong. If that is the wrong Mhz, type in your
best guess as to your processor speed. It doesn't have to be exact,
but if you know it is around 800, say 800.

Please note that some processors, such as the P4, have a core which
is double-clocked, so on those processors the reported clock speed
will be roughly double the advertised clock rate. For example, a
1.8GHz P4 may be reported as a 3592MHz processor.

Processor mhz [default 2194 MHz, 0.4558 nanosec clock]
=====================================================================

We need a place to store a 100 Mbyte file as well as create and delete a
large number of small files. We default to /usr/tmp. If /usr/tmp is a
memory resident file system (i.e., tmpfs), pick a different place.
Please specify a directory that has enough space and is a local file
system.

FSDIR [default /usr/tmp]
=====================================================================

lmbench outputs status information as it runs various benchmarks.
By default this output is sent to /dev/tty, but you may redirect
it to any file you wish (such as /dev/null...).

Status output file [default /dev/tty]
=====================================================================

There is a database of benchmark results that is shipped with new
releases of lmbench. Your results can be included in the database
if you wish. The more results the better, especially if they include
remote networking. If your results are interesting, i.e., for a new
fast box, they may be made available on the lmbench web page, which is

        http://www.bitmover.com/lmbench

Mail results [default yes] n
OK, no results mailed.
=====================================================================

Confguration done, thanks.

There is a mailing list for discussing lmbench hosted at BitMover.
Send mail to majordomo@bitmover.com to join the list.

Using config in CONFIG.jiangyi01.sqa.zmf
Wed Mar 8 16:30:53 CST 2017
Latency measurements
Wed Mar 8 16:31:10 CST 2017
Local networking
Wed Mar 8 16:31:14 CST 2017
Bandwidth measurements
Wed Mar 8 16:31:27 CST 2017
Calculating effective TLB size
Wed Mar 8 16:31:29 CST 2017
Calculating memory load parallelism
Wed Mar 8 16:32:12 CST 2017
McCalpin's STREAM benchmark
Wed Mar 8 16:32:14 CST 2017
Calculating memory load latency
Wed Mar 8 16:52:16 CST 2017
make[1]: Leaving directory `/tmp/lmbench3/lmbench3/src'
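Two of the prompts above involve simple arithmetic that is easy to check by hand. The Python sketch below (helper names are my own, not part of lmbench) reproduces the clock period that config-run prints next to the MHz estimate, and applies the stated sizing rule for the memory range: at least 4 times the external cache and no more than 80% of physical memory. The cache and physical memory sizes in the example are hypothetical values chosen only for illustration.

```python
def clock_period_ns(mhz):
    """Length of one clock cycle in nanoseconds for a clock rate given in MHz."""
    return 1000.0 / mhz


def memory_range_ok(range_mb, cache_mb, physical_mb):
    """Apply the rule from the prompt: >= 4x external cache, <= 80% of RAM."""
    return 4 * cache_mb <= range_mb <= 0.8 * physical_mb


if __name__ == "__main__":
    # 2194 MHz is the value reported in the transcript above.
    print(f"{clock_period_ns(2194):.4f} nanosec clock")

    # Hypothetical machine: 20 MB of external cache, 1024 MB of RAM.
    print(memory_range_ok(100, cache_mb=20, physical_mb=1024))   # within the rule
    print(memory_range_ok(1000, cache_mb=20, physical_mb=1024))  # above 80% of RAM
```

For 2194 MHz the first line prints 0.4558, matching the "0.4558 nanosec clock" shown by config-run.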
Reading the results
[root@jiangyi01.sqa.zmf /tmp/lmbench3/lmbench3]# make see
cd results && make summary percent 2>/dev/null | more
make[1]: Entering directory `/tmp/lmbench3/lmbench3/results'

                 L M B E N C H  3 . 0   S U M M A R Y
                 ------------------------------------
                 (Alpha software, do not distribute)

Basic system parameters
------------------------------------------------------------------------------
Host                 OS Description              Mhz   tlb cache    mem scal
                                                     pages  line    par load
                                                           bytes
--------- ------------- ----------------------- ---- ----- ----- ------ ----
jiangyi01 Linux 3.10.0-        x86_64-linux-gnu 2194    32   128 6.4300    1

Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------------
Host                 OS  Mhz null null      open slct sig  sig  fork exec sh
                             call  I/O stat clos TCP  inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
jiangyi01 Linux 3.10.0- 2195 0.07 0.15 0.99 1.96 4.12 0.16 1.05

Basic integer operations - times in nanoseconds - smaller is better
-------------------------------------------------------------------
Host                 OS  intgr  intgr  intgr  intgr  intgr
                           bit    add    mul    div    mod
--------- ------------- ------ ------ ------ ------ ------
jiangyi01 Linux 3.10.0- 0.4600 0.0700 1.4100   10.3   11.5

Basic float operations - times in nanoseconds - smaller is better
-----------------------------------------------------------------
Host                 OS  float  float  float  float
                           add    mul    div   bogo
--------- ------------- ------ ------ ------ ------
jiangyi01 Linux 3.10.0- 1.3700 2.2800 6.5400 6.3900

Basic double operations - times in nanoseconds - smaller is better
------------------------------------------------------------------
Host                 OS double double double double
                           add    mul    div   bogo
--------- ------------- ------ ------ ------ ------
jiangyi01 Linux 3.10.0- 1.3700 2.2800   10.2   10.0

File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host                 OS    0K File       10K File      Mmap  Prot    Page 100fd
                        Create Delete Create Delete Latency Fault   Fault selct
--------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
jiangyi01 Linux 3.10.0-                                     0.309         1.549

*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------------------------
Host                 OS Pipe  AF   TCP   File   Mmap  Bcopy  Bcopy  Mem   Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
jiangyi01 Linux 3.10.0-                              2405.2 4445.5 6114 5489.

Memory latencies in nanoseconds - smaller is better
    (WARNING - may not be correct, check graphs)
------------------------------------------------------------------------------
Host                 OS   Mhz   L1 $   L2 $    Main mem    Rand mem    Guesses
--------- -------------   ---   ----   ----    --------    --------    -------
jiangyi01 Linux 3.10.0-  2194 1.8250 5.4860   49.3       117.6
make[1]: Leaving directory `/tmp/lmbench3/lmbench3/results'
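The memory latencies in the summary are given in nanoseconds, but it is often easier to reason about them in CPU clock cycles. As a small worked example (my own helper, not an lmbench feature), the sketch below converts the values from the "Memory latencies" table using the 2194 MHz clock that lmbench reported for this machine.

```python
def ns_to_cycles(latency_ns, mhz):
    """Convert a latency in nanoseconds to clock cycles at the given MHz."""
    return latency_ns * mhz / 1000.0


# Values copied from the "Memory latencies" table in the summary above.
MHZ = 2194
LATENCIES_NS = {
    "L1 $": 1.8250,
    "L2 $": 5.4860,
    "Main mem": 49.3,
    "Rand mem": 117.6,
}

if __name__ == "__main__":
    for name, ns in LATENCIES_NS.items():
        print(f"{name:<9} {ns:>8.4f} ns  ~ {ns_to_cycles(ns, MHZ):6.1f} cycles")
```

On this machine that works out to roughly 4 cycles for L1, 12 cycles for L2, about 108 cycles for sequential main memory access, and about 258 cycles for random access.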
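Technical parameters
Parameter descriptions
My notes on the result fields here are not exhaustive; for a fuller description, see the REF link.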
(1) Basic system parameters
Tlb pages: number of pages in the TLB (Translation Lookaside Buffer)
Cache line bytes: cache line size in bytes
Mem par: memory hierarchy parallelism
Scal load: number of lmbench copies run in parallel
(2) Processor, Processes (processor and process operation times)
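Null call: a trivial system call (getting a process ID)
Null I/O: a trivial I/O operation (average of an empty read and an empty write)
Stat: getting a file's status
Open clos: opening a file and then immediately closing it
Slct TCP: calling select() on TCP descriptors
Sig inst: installing a signal handler
Sig hndl: catching and handling a signal
Fork proc: forking a process that exits immediately
Exec proc: forking, then calling execve, then exiting
Sh proc: forking, then running a shell, then exiting
(3) Basic integer/float/double operations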
Omitted.
(4) Context switching (context switch times)
2p/16K: switching between 2 processes, each working on 16K of data
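Column labels of this form encode two numbers: the process count and the per-process data size in kilobytes. As a minimal sketch (the helper names are my own and are not part of lmbench), the following Python parses such a label and computes the combined footprint of all processes, which is the figure to compare against the cache size when judging whether the contexts still fit in cache.

```python
import re

_LABEL = re.compile(r"(\d+)p/(\d+)K")


def parse_ctx_label(label):
    """Split a label such as '2p/16K' into (process_count, size_kb)."""
    match = _LABEL.fullmatch(label.strip())
    if match is None:
        raise ValueError(f"not a context-switch label: {label!r}")
    return int(match.group(1)), int(match.group(2))


def total_footprint_kb(label):
    """Combined data size of all processes named by the label, in KB."""
    processes, size_kb = parse_ctx_label(label)
    return processes * size_kb


if __name__ == "__main__":
    for label in ("2p/16K", "8p/64K", "16p/64K"):
        print(label, parse_ctx_label(label), total_footprint_kb(label), "KB")
```

For example, 2p/16K touches 32 KB in total, while 16p/64K touches 1024 KB, so the latter is far more likely to spill out of the smaller caches.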
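(5) Local Communication latencies (local communication latency: data is sent over each mechanism and immediately read back on the same machine)
Pipe: pipe communication
AF UNIX: UNIX domain sockets
UDP: UDP
RPC/UDP: RPC over UDP
TCP: TCP
RPC/TCP: RPC over TCP
TCP conn: establishing a TCP connection and closing the descriptor
(6) File & VM system latencies (file and memory latencies)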
File Create & Delete: creating and deleting files
Mmap Latency: memory mapping
Prot Fault: protection fault
Page Fault: page fault
100fd selct: time for select() on 100 file descriptors
(7) Local Communication bandwidths (local communication bandwidth)
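Pipe: pipe transfers
AF UNIX: UNIX domain sockets
TCP: TCP communication
File reread: rereading a file
Mmap reread: rereading a memory-mapped file
Bcopy (libc): memory copy using the libc routine
Bcopy (hand): memory copy using a hand-written loop
Mem read: memory read
Mem write: memory write
(8) Memory latencies (memory access latency)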
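L1: level-1 cache
L2: level-2 cache
Main Mem: sequential main memory access
Rand Mem: random memory access latency
Guesses: if L1 and L2 are about the same, "No L1 cache?" is shown; if L2 and Main Mem are about the same, "No L2 cache?" is shown.
REF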
http://wenku.baidu.com/link?url=ok-5odtKwsn6kgkpHZFHsVsDxXA70fBjRX8koMzbcaxBvAKJks4pm2eSyEO78oPTkOHt0pcaXG37C-FZO3140yDGheAOYSRdDrEJqLEDytG