
RabbitMQ performance test in a home lab

by Tomasz Jarosik

RabbitMQ is a very popular message broker written in Erlang. You can find out more about it on its website: http://www.rabbitmq.com/. In this post, I’m going to focus on the performance of RabbitMQ with persistent messages and durable queues.

Installing RabbitMQ server

Let’s start with setting it up. I’ll be using my home lab, so let’s first create a container for tests:

$ lxc launch ubuntu:18.04 prod-nvme-rabbitmq

Now, following the Install RabbitMQ on Ubuntu tutorial, we must install two components: Erlang and RabbitMQ itself. The tutorial also lists apt sources to add in order to get the latest versions. I’ll skip those details, since they are easy to find there, and just give the basic commands to install what we want:

$ apt update
$ apt install erlang-nox
$ apt install rabbitmq-server

Note: I ran these commands in an LXC container, which uses the root user by default.
We need a few more things to work efficiently with RabbitMQ. The first is to enable the management UI, which lets us inspect and modify various parts of RabbitMQ from a web UI. We also create an administrator user with full permissions on the default virtual host:

$ rabbitmq-plugins enable rabbitmq_management
$ rabbitmqctl add_user tjarosik tjarosik
$ rabbitmqctl set_user_tags tjarosik administrator
$ rabbitmqctl set_permissions -p / tjarosik ".*" ".*" ".*"
$ rabbitmqctl status

After these commands, the status of your RabbitMQ node should look like the snippet below:

Status of node rabbit@prod-nvme-rabbitmq ...
[{pid,191},
 {running_applications,
     [{rabbitmq_management,"RabbitMQ Management Console","3.7.7"},
      {rabbitmq_management_agent,"RabbitMQ Management Agent","3.7.7"},
      {rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.7.7"},
      {rabbit,"RabbitMQ","3.7.7"},
      {amqp_client,"RabbitMQ AMQP Client","3.7.7"},
      {rabbit_common,
          "Modules shared by rabbitmq-server and rabbitmq-erlang-client",
          "3.7.7"},
      {ranch_proxy_protocol,"Ranch Proxy Protocol Transport","1.5.0"},
      {cowboy,"Small, fast, modern HTTP server.","2.2.2"},
      {ranch,"Socket acceptor pool for TCP protocols.","1.5.0"},
      {ssl,"Erlang/OTP SSL application","8.2.3"},
      {public_key,"Public key infrastructure","1.5.2"},
      {asn1,"The Erlang ASN1 compiler version 5.0.4","5.0.4"},
      {cowlib,"Support library for manipulating Web protocols.","2.1.0"},
      {crypto,"CRYPTO","4.2"},
      {xmerl,"XML parser","1.3.16"},
      {jsx,"a streaming, evented json parsing toolkit","2.8.2"},
      {inets,"INETS  CXC 138 49","6.4.5"},
      {recon,"Diagnostic tools for production use","2.3.2"},
      {os_mon,"CPO  CXC 138 46","2.4.4"},
      {mnesia,"MNESIA  CXC 138 12","4.15.3"},
      {lager,"Erlang logging framework","3.6.3"},
      {goldrush,"Erlang event stream processor","0.1.9"},
      {compiler,"ERTS  CXC 138 10","7.1.4"},
      {syntax_tools,"Syntax tools","2.1.4"},
      {syslog,"An RFC 3164 and RFC 5424 compliant logging framework.","3.4.2"},
      {sasl,"SASL  CXC 138 11","3.1.1"},
      {stdlib,"ERTS  CXC 138 10","3.4.3"},
      {kernel,"ERTS  CXC 138 10","5.4.1"}]},
 {os,{unix,linux}},
 {erlang_version,
     "Erlang/OTP 20 [erts-9.2]  [64-bit] [smp:28:28] [ds:28:28:10] [async-threads:448] [kernel-poll:true]\n"},
 {memory,
     [{connection_readers,0},
      {connection_writers,0},
      {connection_channels,0},
      {connection_other,37136},
      {queue_procs,70273016},
      {queue_slave_procs,0},
      {plugins,3787216},
      {other_proc,23535968},
      {metrics,226832},
      {mgmt_db,1855576},
      {mnesia,150232},
      {other_ets,2252864},
      {binary,2464040},
      {msg_index,29296},
      {code,28479334},
      {atom,1131721},
      {other_system,50112505},
      {allocated_unused,145015432},
      {reserved_unallocated,0},
      {strategy,rss},
      {total,[{erlang,184335736},{rss,290844672},{allocated,329351168}]}]},
 {alarms,[]},
 {listeners,[{clustering,25672,"::"},{amqp,5672,"::"},{http,15672,"::"}]},
 {vm_memory_calculation_strategy,rss},
 {vm_memory_high_watermark,0.4},
 {vm_memory_limit,40510039654},
 {disk_free_limit,50000000},
 {disk_free,262025904128},
 {file_descriptors,
     [{total_limit,1048476},
      {total_used,14},
      {sockets_limit,943626},
      {sockets_used,0}]},
 {processes,[{limit,1048576},{used,422}]},
 {run_queue,0},
 {uptime,50275},
 {kernel,{net_ticktime,60}}]

As you can see in the output above, RabbitMQ is up and running. It is version 3.7.7, running on Erlang 20.2.2, and it holds 72 million ready messages. Because we are focusing on persistent messages, I’m going to enable “lazy queues” through a policy. This is a great feature that minimizes memory usage for long queues by keeping all messages on disk. You can read more in CloudAMQP – Solving the thundering herd problem with lazy queues and RabbitMQ – Lazy queues.
The policy can be set up in the management UI (under Admin > Policies).
You can also do it from the command line with the great rabbitmqctl tool, and then just cross-check in the management UI that the policy has been applied. Now we are ready.
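
For reference, here is a minimal sketch of the command-line variant. The policy name lazy-queues and the catch-all pattern ".*" are assumptions for illustration, not the exact policy used in this post; narrow the pattern if only some queues should be lazy. The guard simply skips the commands on machines where rabbitmqctl is not installed.

```shell
# Hypothetical policy name; the pattern ".*" matches every queue in the vhost.
policy_name="lazy-queues"
queue_pattern=".*"
policy_definition='{"queue-mode":"lazy"}'

if command -v rabbitmqctl >/dev/null 2>&1; then
    # Apply the policy to queues in the default vhost "/" and list the result.
    rabbitmqctl set_policy -p / --apply-to queues \
        "$policy_name" "$queue_pattern" "$policy_definition"
    rabbitmqctl list_policies -p /
else
    echo "rabbitmqctl not found; run this on the RabbitMQ host"
fi
```

The policy should then show up in the list_policies output and in the management UI.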

Installing RabbitMQ Java-based performance testing tools

RabbitMQ has a great tool for performance testing of various scenarios. All we have to do is download and unpack a binary. See the full tutorial here: RabbitMQ-Perf-Test tool

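I’m going to start with the simplest test: one publisher, one consumer, one queue, and messages of 512 bytes each. In the command below, -x and -y set the number of producers and consumers, -u the queue name, -f persistent marks messages as persistent, and -s the message size in bytes. I’m going to use another container to run the command: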

$ bin/runjava com.rabbitmq.perf.PerfTest -x 1 -y 1 -u "throughput-test-1" --id "test-1" -f persistent -s 512 -h amqp://tjarosik:tjarosik@10.39.124.201:5672
id: test-1, time: 66.018s, sent: 10404 msg/s, received: 9902 msg/s, min/median/75th/95th/99th latency: 783053/903448/927239/963003 µs
id: test-1, time: 67.020s, sent: 9865 msg/s, received: 9798 msg/s, min/median/75th/95th/99th latency: 776111/868255/894370/925178 µs
id: test-1, time: 68.020s, sent: 10226 msg/s, received: 9920 msg/s, min/median/75th/95th/99th latency: 802694/875716/903087/928019 µs
id: test-1, time: 69.021s, sent: 10364 msg/s, received: 9832 msg/s, min/median/75th/95th/99th latency: 840004/904027/928753/960552 µs
id: test-1, time: 70.021s, sent: 8740 msg/s, received: 9677 msg/s, min/median/75th/95th/99th latency: 816603/909349/932969/974417 µs
id: test-1, time: 71.021s, sent: 9988 msg/s, received: 9672 msg/s, min/median/75th/95th/99th latency: 811436/889258/920898/952304 µs
id: test-1, time: 72.021s, sent: 8740 msg/s, received: 8901 msg/s, min/median/75th/95th/99th latency: 903809/978200/1011541/1041838 µs
id: test-1, time: 73.022s, sent: 9875 msg/s, received: 9826 msg/s, min/median/75th/95th/99th latency: 798121/890863/926020/1002907 µs
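
Per-second lines like these are easier to compare between runs once they are reduced to an average. The snippet below is a small sketch of how that could be done with awk. The helper name mean_rate is made up for this example, and the sample input is the first four lines of the output above with the latency fields left out.

```shell
# Sample lines copied from the output above (latency fields omitted).
perftest_log='id: test-1, time: 66.018s, sent: 10404 msg/s, received: 9902 msg/s
id: test-1, time: 67.020s, sent: 9865 msg/s, received: 9798 msg/s
id: test-1, time: 68.020s, sent: 10226 msg/s, received: 9920 msg/s
id: test-1, time: 69.021s, sent: 10364 msg/s, received: 9832 msg/s'

# Print the mean (truncated to an integer) of the number that follows
# a label such as "sent:" or "received:" on each input line.
mean_rate() {
    awk -v label="$1" '
        {
            for (i = 1; i < NF; i++) {
                if ($i == label) { sum += $(i + 1); n++ }
            }
        }
        END { if (n > 0) printf "%d\n", sum / n }
    '
}

mean_sent=$(printf '%s\n' "$perftest_log" | mean_rate "sent:")
mean_received=$(printf '%s\n' "$perftest_log" | mean_rate "received:")
echo "mean sent: $mean_sent msg/s, mean received: $mean_received msg/s"
```

In a real run you would pipe the PerfTest output (or a saved log file) into mean_rate instead of using the embedded sample.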

Perfect 14 producers

Now, let’s focus only on publishing persistent messages without consuming them. And let’s use multiple publisher threads and multiple queues:

bin/runjava com.rabbitmq.perf.PerfTest --queue-pattern 'perf-test-%d' --queue-pattern-from 1 --queue-pattern-to 14 --producers 14 --consumers 0 -f persistent -h amqp://tjarosik:tjarosik@10.39.124.201:5672
id: test-134340-156, time: 50.486s, sent: 150786 msg/s
id: test-134340-156, time: 51.554s, sent: 115349 msg/s
id: test-134340-156, time: 52.554s, sent: 132186 msg/s
id: test-134340-156, time: 53.570s, sent: 154954 msg/s
id: test-134340-156, time: 54.578s, sent: 134205 msg/s
id: test-134340-156, time: 55.578s, sent: 122321 msg/s
id: test-134340-156, time: 56.578s, sent: 165273 msg/s
id: test-134340-156, time: 57.578s, sent: 133989 msg/s
id: test-134340-156, time: 58.674s, sent: 118304 msg/s
id: test-134340-156, time: 59.674s, sent: 171348 msg/s
id: test-134340-156, time: 60.690s, sent: 113101 msg/s
id: test-134340-156, time: 61.690s, sent: 135783 msg/s
id: test-134340-156, time: 62.710s, sent: 153132 msg/s
id: test-134340-156, time: 63.710s, sent: 160241 msg/s
id: test-134340-156, time: 64.742s, sent: 130908 msg/s
id: test-134340-156, time: 65.750s, sent: 129415 msg/s
id: test-134340-156, time: 66.750s, sent: 143648 msg/s
id: test-134340-156, time: 67.750s, sent: 158917 msg/s
id: test-134340-156, time: 68.750s, sent: 110700 msg/s
id: test-134340-156, time: 69.842s, sent: 165297 msg/s
id: test-134340-156, time: 70.842s, sent: 148905 msg/s
id: test-134340-156, time: 71.926s, sent: 112570 msg/s


More publishers but lower throughput

Let’s see what happens when we increase the number of producers even more, to 28:

Workloads with large number of queues

This workload has 10k publishers sending 1 msg/s each, 50k queues of which 10k are active, and 0 consumers. The management UI was not responding very well and showed a lot of ‘spikes’ in disk writes and published messages. From the client’s perspective, publishing was quite smooth, staying around the expected 10k msg/s:

id: test-151004-460, time: 1419.854s, sent: 10628 msg/s
id: test-151004-460, time: 1420.854s, sent: 7981 msg/s
id: test-151004-460, time: 1421.854s, sent: 10767 msg/s
id: test-151004-460, time: 1422.854s, sent: 10661 msg/s
id: test-151004-460, time: 1423.854s, sent: 8290 msg/s
id: test-151004-460, time: 1424.854s, sent: 12137 msg/s
id: test-151004-460, time: 1425.854s, sent: 9529 msg/s
id: test-151004-460, time: 1426.854s, sent: 9335 msg/s
id: test-151004-460, time: 1427.854s, sent: 10353 msg/s
id: test-151004-460, time: 1428.854s, sent: 9804 msg/s
id: test-151004-460, time: 1429.854s, sent: 11477 msg/s
id: test-151004-460, time: 1430.854s, sent: 8782 msg/s
id: test-151004-460, time: 1431.854s, sent: 9482 msg/s
id: test-151004-460, time: 1432.854s, sent: 10707 msg/s
id: test-151004-460, time: 1433.854s, sent: 8618 msg/s
id: test-151004-460, time: 1434.854s, sent: 11940 msg/s
id: test-151004-460, time: 1435.854s, sent: 10053 msg/s
id: test-151004-460, time: 1436.854s, sent: 9235 msg/s
id: test-151004-460, time: 1437.854s, sent: 7749 msg/s
id: test-151004-460, time: 1438.854s, sent: 12193 msg/s
id: test-151004-460, time: 1439.854s, sent: 10509 msg/s
id: test-151004-460, time: 1440.854s, sent: 8886 msg/s
id: test-151004-460, time: 1441.854s, sent: 10281 msg/s
id: test-151004-460, time: 1442.854s, sent: 10886 msg/s
id: test-151004-460, time: 1443.854s, sent: 8043 msg/s

Let’s reduce the number of publishers and active queues to 2000. That gives 2k publishers sending 1 msg/s each, 50k queues of which 2k are active (one per publisher), and 0 consumers. The management UI now responds very well, and the charts are also more accurate.

bin/runjava com.rabbitmq.perf.PerfTest --queue-pattern 'perf-test-%05d' \
    --queue-pattern-from 1 --queue-pattern-to 2000 \
    --producers 2000 --consumers 0 \
    --nio-threads 10 --producer-scheduler-threads 10 --consumers-thread-pools 10 \
    --publishing-interval 1 -s 512 \
    --heartbeat-sender-threads 10 -f persistent --producer-random-start-delay 60 -h amqp://tjarosik:tjarosik@10.39.124.201:5672
id: test-154316-044, time: 971.590s, sent: 1999 msg/s
id: test-154316-044, time: 972.590s, sent: 2001 msg/s
id: test-154316-044, time: 973.590s, sent: 2000 msg/s
id: test-154316-044, time: 974.590s, sent: 2000 msg/s
id: test-154316-044, time: 975.590s, sent: 2003 msg/s
id: test-154316-044, time: 976.590s, sent: 1994 msg/s
id: test-154316-044, time: 977.590s, sent: 2002 msg/s
id: test-154316-044, time: 978.590s, sent: 2000 msg/s
id: test-154316-044, time: 979.590s, sent: 2001 msg/s
id: test-154316-044, time: 980.590s, sent: 2000 msg/s
id: test-154316-044, time: 981.590s, sent: 2000 msg/s
id: test-154316-044, time: 982.590s, sent: 1999 msg/s
id: test-154316-044, time: 983.590s, sent: 1999 msg/s
id: test-154316-044, time: 984.590s, sent: 2001 msg/s
id: test-154316-044, time: 985.590s, sent: 1999 msg/s
id: test-154316-044, time: 986.590s, sent: 2002 msg/s
id: test-154316-044, time: 987.590s, sent: 1999 msg/s
id: test-154316-044, time: 988.590s, sent: 1997 msg/s
id: test-154316-044, time: 989.590s, sent: 2004 msg/s
id: test-154316-044, time: 990.590s, sent: 1998 msg/s
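
With rate-limited publishers, the expected throughput follows directly from the flags: 2000 producers, each publishing once per --publishing-interval (in seconds). The sketch below checks how far the reported rate strays from that target. The sample values are the first six "sent" figures from the output above, and the variable names are only illustrative.

```shell
producers=2000          # value passed to --producers
interval_seconds=1      # value passed to --publishing-interval
target_rate=$((producers / interval_seconds))

# First six "sent" values from the output above.
sent_rates='1999 2001 2000 2000 2003 1994'

max_dev=0
for rate in $sent_rates; do
    dev=$((rate - target_rate))
    # Take the absolute value of the deviation.
    if [ "$dev" -lt 0 ]; then
        dev=$((-dev))
    fi
    if [ "$dev" -gt "$max_dev" ]; then
        max_dev=$dev
    fi
done

echo "target: $target_rate msg/s, largest deviation: $max_dev msg/s"
```

For these samples the largest deviation is 6 msg/s, or 0.3% of the target.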

Tuning

RabbitMQ offers a lot of options that we can use to configure it for different use cases. Some of them go into the rabbitmq.conf file, and some into the advanced.config file. For a detailed description, see the documentation at https://www.rabbitmq.com/configure.html
Let’s start by enabling HiPE (High Performance Erlang, a native-code compiler) and repeating our test with 14 producers.

management.rates_mode=basic
collect_statistics_interval=5000
hipe_compile=true
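
One possible way to apply these settings is sketched below. It stages the lines in a scratch file first so they can be reviewed. The target path /etc/rabbitmq/rabbitmq.conf and the service name rabbitmq-server are the usual defaults for the Debian/Ubuntu package, but treat them as assumptions and check your own installation.

```shell
# Stage the settings in a temporary file for review.
staged_conf="$(mktemp)"
cat > "$staged_conf" <<'EOF'
management.rates_mode = basic
collect_statistics_interval = 5000
hipe_compile = true
EOF

echo "Staged settings in $staged_conf:"
cat "$staged_conf"

# After reviewing, apply as root on the RabbitMQ host (assumed default paths):
#   cat "$staged_conf" >> /etc/rabbitmq/rabbitmq.conf
#   systemctl restart rabbitmq-server
```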

To check that the config has been applied and HiPE is enabled, look for the [hipe] flag in the Erlang version line (the node is running Erlang/OTP 21 at this point):

root@prod-nvme-rabbitmq:~# rabbitmqctl status | grep "Erlang/OTP 21"
     "Erlang/OTP 21 [erts-10.0.8]  [64-bit] [smp:28:28] [ds:28:28:10] [async-threads:448] [hipe]\n"},

And the test itself:

bin/runjava com.rabbitmq.perf.PerfTest --queue-pattern 'perf-test-%d' --queue-pattern-from 1 --queue-pattern-to 14 --producers 14 -f persistent --consumers 0 -h amqp://tjarosik:tjarosik@10.39.124.201:5672
id: test-184223-546, time: 63.674s, sent: 246798 msg/s
id: test-184223-546, time: 64.674s, sent: 253016 msg/s
id: test-184223-546, time: 65.674s, sent: 260214 msg/s
id: test-184223-546, time: 66.774s, sent: 200255 msg/s
id: test-184223-546, time: 67.862s, sent: 231370 msg/s
id: test-184223-546, time: 68.886s, sent: 218316 msg/s
id: test-184223-546, time: 69.886s, sent: 231360 msg/s
id: test-184223-546, time: 70.938s, sent: 225584 msg/s
id: test-184223-546, time: 71.938s, sent: 225614 msg/s
id: test-184223-546, time: 72.938s, sent: 250840 msg/s

NOTE: We are publishing small messages of 12 bytes each. If we add the flag -s 512 to publish 512-byte messages, the rate goes down to about 180k msg/s, which is still higher than without HiPE.
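
Message rate alone hides how much data is actually moving. As a rough worked example, take about 250k msg/s as a round figure for the 12-byte run above (an approximation, not a measured average) and the 180k msg/s mentioned for 512-byte messages. The payload throughput then compares as follows. Only message bodies are counted; protocol and storage overhead are ignored.

```shell
# Approximate figures: ~250k msg/s at 12 bytes, ~180k msg/s at 512 bytes.
small_rate=250000; small_size=12
large_rate=180000; large_size=512

small_bytes_per_s=$((small_rate * small_size))
large_bytes_per_s=$((large_rate * large_size))

echo "12-byte messages:  $small_bytes_per_s bytes/s"
echo "512-byte messages: $large_bytes_per_s bytes/s"
echo "ratio: about $((large_bytes_per_s / small_bytes_per_s))x"
```

So although the message rate drops by roughly a quarter, the broker persists about 30 times more payload per second with the larger messages.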

Disable calculation of message rates

Let’s make another test with this config:

management.rates_mode=none
collect_statistics_interval=60000
hipe_compile=true

Then apply an IoT-type workload with 20k connections, 20k channels, 20k queues, and 1 msg/s per publisher, which should give us 20k msg/s:

id: test-190408-104, time: 584.517s, sent: 20381 msg/s
id: test-190408-104, time: 585.517s, sent: 21684 msg/s
id: test-190408-104, time: 586.517s, sent: 19393 msg/s
id: test-190408-104, time: 587.517s, sent: 20009 msg/s
id: test-190408-104, time: 588.517s, sent: 19493 msg/s
id: test-190408-104, time: 589.517s, sent: 21592 msg/s
id: test-190408-104, time: 590.517s, sent: 19699 msg/s
id: test-190408-104, time: 591.517s, sent: 19629 msg/s
id: test-190408-104, time: 592.517s, sent: 20610 msg/s
id: test-190408-104, time: 593.517s, sent: 18659 msg/s
id: test-190408-104, time: 594.517s, sent: 20087 msg/s
id: test-190408-104, time: 595.517s, sent: 19357 msg/s
id: test-190408-104, time: 596.517s, sent: 20279 msg/s

However, after this test RabbitMQ became unstable (or very, very slow) and I ended up restarting it.

queue_index_embed_msgs_below

After a restart, most small messages are loaded into memory, even for lazy queues: up to 16384 messages per queue whose size is below queue_index_embed_msgs_below (4096 bytes by default). After setting queue_index_embed_msgs_below to 0, all messages are kept on disk and nothing is held in memory. But this comes at a price: our test drops from 250k msg/s to 45k msg/s, because there are now two writes per message, one to the queue index journal and one to the message store.
