Rcollectl 1.0.1
Rcollectl
library("Rcollectl")
Collectl is a Unix-based tool that measures several kinds of system resource consumption, including CPU, memory, network, and disk activity. The package ships with a demonstration output file:
lk = cl_parse(system.file("demotab/demo_1123.tab.gz", package="Rcollectl"))
dim(lk)
#> [1] 478 71
attr(lk, "meta")
#> [1] "################################################################################"
#> [2] "# Collectl: V4.3.1-1 HiRes: 1 Options: -scdnm -P -f./col2.txt "
#> [3] "# Host: stvjc-XPS-13-9300 DaemonOpts: "
#> [4] "# Booted: 1606052236.57 [20201122-08:37:16]"
#> [5] "# Distro: debian bullseye/sid, Ubuntu 20.04.1 LTS Platform: "
#> [6] "# Date: 20201123-144054 Secs: 1606160454 TZ: -0500"
#> [7] "# SubSys: cdnm Options: Interval: 1 NumCPUs: 8 [HYPER] NumBud: 0 Flags: i"
#> [8] "# Filters: NfsFilt: EnvFilt: TcpFilt: ituc"
#> [9] "# HZ: 100 Arch: x86_64-linux-gnu-thread-multi PageSize: 4096"
#> [10] "# Cpu: GenuineIntel Speed(MHz): 1745.513 Cores: 4 Siblings: 8 Nodes: 1"
#> [11] "# Kernel: 5.4.0-54-generic Memory: 15969160 kB Swap: 2097148 kB"
#> [12] "# NumDisks: 1 DiskNames: nvme0n1"
#> [13] "# NumNets: 4 NetNames: lo:?? enxc03ebaccccfd:100 docker0:?? wlp0s20f3:??"
#> [14] "################################################################################"
lk[1:5,1:5]
#> #Date Time CPU_User% CPU_Nice% CPU_Sys%
#> 1 20201123 14:40:56 2 0 1
#> 2 20201123 14:40:57 1 0 0
#> 3 20201123 14:40:58 1 0 0
#> 4 20201123 14:40:59 2 0 0
#> 5 20201123 14:41:00 3 0 1
plot_usage(lk)
From this display, we can see that a burst of network activity around 14:43 is followed by consumption of CPU, memory, and disk resources. The % CPU active never exceeds 30. Memory consumption was already relatively high when sampling began and grew to about 15.5 GB, and about 250 MB were written to disk over the entire interval.
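Statements like the CPU figure above can also be checked numerically from the parsed table. The following is a minimal sketch in base R that uses only the five rows printed earlier rather than the full demo file. It assumes that "active" CPU can be approximated as user + nice + system time, which may not be exactly what plot_usage displays, and it parses timestamps as UTC purely for illustration.

```r
# Five rows copied from the lk[1:5, 1:5] output shown earlier.
# check.names = FALSE keeps collectl's original column names.
demo <- data.frame(
  `#Date` = rep("20201123", 5),
  Time = c("14:40:56", "14:40:57", "14:40:58", "14:40:59", "14:41:00"),
  `CPU_User%` = c(2, 1, 1, 2, 3),
  `CPU_Nice%` = c(0, 0, 0, 0, 0),
  `CPU_Sys%` = c(1, 0, 0, 0, 1),
  check.names = FALSE
)

# Assumption: "active" CPU is approximated as user + nice + system.
cpu_cols <- c("CPU_User%", "CPU_Nice%", "CPU_Sys%")
demo$cpu_active <- rowSums(demo[, cpu_cols])

# Combine the date and time columns into a POSIXct timestamp
# (UTC is an arbitrary choice for this illustration).
demo$when <- as.POSIXct(paste(demo[["#Date"]], demo$Time),
                        format = "%Y%m%d %H:%M:%S", tz = "UTC")

# Largest approximate CPU load in the sample, and when it occurred.
max(demo$cpu_active)
demo$when[which.max(demo$cpu_active)]
```

The same column arithmetic should carry over to the full table returned by cl_parse, provided its column names match those printed above.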
To generate a display like this, we use commands shown below. You can use an arbitrary string as [target file prefix]. Thus cl_start("foo") will produce a file foo-[hostname]-[yyyymmdd].tab.gz, containing timing and consumption data, where [hostname] is the value of hostname and [yyyymmdd] is a representation of the current date. Use different target file prefixes for runs you wish to distinguish.
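As a rough illustration of this naming scheme, the sketch below builds the file name one would expect for a given prefix. The helper expected_tab_name is hypothetical and not part of Rcollectl. It assumes that the host name collectl uses matches Sys.info()[["nodename"]] and that the date is the local date at the time of the run.

```r
# Hypothetical helper (not part of Rcollectl): builds the file name
# described in the text, prefix-[hostname]-[yyyymmdd].tab.gz.
expected_tab_name <- function(prefix,
                              host = Sys.info()[["nodename"]],
                              date = Sys.Date()) {
  sprintf("%s-%s-%s.tab.gz", prefix, host, format(date, "%Y%m%d"))
}

# Host name and date taken from the demo metadata shown earlier.
expected_tab_name("foo",
                  host = "stvjc-XPS-13-9300",
                  date = as.Date("2020-11-23"))
#> [1] "foo-stvjc-XPS-13-9300-20201123.tab.gz"
```

Knowing the expected name makes it easier to pick out a single run when several output files share a directory.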
prefix = "foo"  # [target file prefix]: any string you choose
id = cl_start(prefix)
# ... use R until the task to be measured is complete ...
cl_stop(id)
usage_df = cl_parse(dir(pattern = prefix))
# analyze or filter usage_df (for example, to trim away
# time related to task delay or delay of `cl_stop`)
plot_usage(usage_df)
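The trimming step mentioned in the comments can be sketched as follows. The helper trim_usage is hypothetical and not part of Rcollectl. It assumes the parsed table carries `#Date` and `Time` columns in the layout printed earlier, and it treats all times as UTC for simplicity. A small stand-in data frame is used so that the example runs without collectl installed.

```r
# Hypothetical helper (not part of Rcollectl): keep only the rows
# whose timestamp falls inside the closed interval [from, to].
trim_usage <- function(usage, from, to, tz = "UTC") {
  stamp <- as.POSIXct(paste(usage[["#Date"]], usage[["Time"]]),
                      format = "%Y%m%d %H:%M:%S", tz = tz)
  keep <- stamp >= as.POSIXct(from, tz = tz) &
          stamp <= as.POSIXct(to, tz = tz)
  usage[keep, , drop = FALSE]
}

# Stand-in for a parsed usage table, using rows shown earlier.
usage_df <- data.frame(
  `#Date` = rep("20201123", 5),
  Time = c("14:40:56", "14:40:57", "14:40:58", "14:40:59", "14:41:00"),
  `CPU_User%` = c(2, 1, 1, 2, 3),
  check.names = FALSE
)

# Drop the first and last samples, e.g. startup and shutdown delay.
trimmed <- trim_usage(usage_df,
                      from = "2020-11-23 14:40:57",
                      to   = "2020-11-23 14:40:59")
nrow(trimmed)
#> [1] 3
```

Because the helper only subsets rows, all other columns are left unchanged.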
Yubo Cheng has added functionality allowing us to annotate usage plots with labels related to task phases. Here is the code from the example showing how to introduce annotations in the time profile.
id <- cl_start()
Sys.sleep(2)
# code
cl_timestamp(id, "step1")
Sys.sleep(2)
# code
Sys.sleep(2)
cl_timestamp(id, "step2")
Sys.sleep(2)
# code
Sys.sleep(2)
cl_timestamp(id, "step3")
Sys.sleep(2)
# code
cl_stop(id)
path <- cl_result_path(id)
plot_usage(cl_parse(path)) +
  cl_timestamp_layer(path) +
  cl_timestamp_label(path) +
  ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 90, vjust = 0.5, hjust = 1))
The Rcollectl package (Carey and Cheng, 2023) was made possible thanks to: