Reduce, Reuse & Recycle your thread pools ♻️
Real-world applications today are mostly multi-threaded. This means developers should be mindful of how they manage concurrency in their applications. Mastering threading can help boost an app’s performance. On the other hand, using concurrency without fully understanding it can lead to problems that hurt the app’s health.
For Android applications, every thread is mapped to a system-level thread at runtime. Each thread costs a minimum of 64k of memory on Android. If we create a new thread for every asynchronous task, we put memory pressure on the app. App performance may also suffer, because spawning new threads and context switching between them both cost time and resources. A thread that is still referenced, even when it is no longer active, stays in memory and cannot be cleaned up by the garbage collector.
Thread pools can help us manage concurrency more efficiently. ThreadPoolExecutor creates a pool of worker threads and schedules the tasks for them to execute. It can grow the pool size to meet the demand as new tasks arrive, and it can shrink the pool when threads are idle and no longer need to be kept alive. A thread pool therefore improves app performance by reducing the per-task overhead, and it controls resource usage by bounding the number of threads.
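To make the grow-and-shrink behavior concrete, here is a minimal sketch using only the JDK. The helper names newElasticPool and runBurst, the cap of 4 threads, and the short keep-alive are assumptions picked for illustration, not values taken from any library.

```kotlin
import java.util.concurrent.CountDownLatch
import java.util.concurrent.SynchronousQueue
import java.util.concurrent.ThreadPoolExecutor
import java.util.concurrent.TimeUnit

// Hypothetical helper: a pool that starts empty, grows on demand,
// and lets idle workers expire after keepAliveMillis.
fun newElasticPool(keepAliveMillis: Long): ThreadPoolExecutor =
    ThreadPoolExecutor(
        0,                              // corePoolSize: keep no threads while idle
        4,                              // maximumPoolSize: never more than 4 workers
        keepAliveMillis, TimeUnit.MILLISECONDS,
        SynchronousQueue<Runnable>()    // hand each task straight to a worker
    )

// Submits taskCount tasks that all block on a latch and returns the pool size
// observed while they are blocked. taskCount must not exceed the pool maximum.
fun runBurst(pool: ThreadPoolExecutor, taskCount: Int): Int {
    val release = CountDownLatch(1)
    repeat(taskCount) { pool.execute { release.await() } }
    val sizeUnderLoad = pool.poolSize
    release.countDown()
    return sizeUnderLoad
}

fun main() {
    val pool = newElasticPool(keepAliveMillis = 200)
    println("under load: ${runBurst(pool, 3)} threads")
    Thread.sleep(1_000) // stay idle for longer than the keep-alive time
    println("after idling: ${pool.poolSize} threads")
    pool.shutdown()
}
```

The pool creates one worker per blocked task, and once the workers have been idle for longer than the keep-alive time they are removed again.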
Thread pools seem to be the solution to our concurrency headaches, and many libraries have adopted the technique, such as OkHttp and AndroidX WorkManager. Each library maintains its own thread pool by default.
When we include these libraries in our application, each one spins up new threads without any awareness of the other thread pools, and we come full circle. We can end up with hundreds of threads in our application because the libraries do not know about each other.
In this story, we are going to walk through an example of configuring libraries to share a common thread pool. We will also summarize the benefits and costs of managing thread pools, as it is necessary for continuously evolving our apps without degrading the performance.
Setup
The sample Android application can be found in this GitHub repo. It displays a Reddit feed by querying https://www.reddit.com/top.json. The following libraries are used in this application:
- Retrofit/OkHttp for handling network requests
- Coroutine/Flow for data layer processing
- Jetpack Compose for rendering UI
- Coil-compose for loading images
- WorkManager for no reason (for enriching our example)
All these libraries involve concurrency and threading. They are configured in the default way without any advanced API usage.
Analysis
We choose the Android Studio template Basic Activity as our baseline, in order to compare it with our sample Reddit app. Let’s analyze our thread usage in both apps first.
We can inspect the currently active threads of Android applications in many ways:
- Use Android Studio Profiler and record system tracing on CPU
- Use the debugger to view the stack trace of any thread (we can then see all thread names in the drop-down)
- Log Thread.getAllStackTraces().keys.sortedBy { it.name } (don’t use this in production apps)
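The logging approach from the last bullet can be wrapped in a small function. This is a sketch for debug builds only; liveThreadNames is a hypothetical helper name.

```kotlin
// Returns the names of all live threads in the current process, sorted alphabetically.
// Thread.getAllStackTraces() is expensive, so keep this out of production code paths.
fun liveThreadNames(): List<String> =
    Thread.getAllStackTraces().keys.map { it.name }.sorted()

fun main() {
    val names = liveThreadNames()
    println("${names.size} live threads")
    names.forEach(::println)
}
```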
We use the last approach for demonstration purposes because it lets us sort the thread names alphabetically. Below is the comparison of all thread usage, with the baseline app on the left and our sample Reddit app on the right.
Surprise! Even with our simple Reddit app, there can be as many as 35 active threads. Our baseline app only uses 5 threads though.
Let’s look at the differences one by one.
ConnectivityThread
ConnectivityThread is from the Android OS. According to its Javadoc, it is a shared singleton connectivity thread for the system. It is used for connectivity operations such as AsyncChannel connections to system services.
In short, this thread appears once we introduce network operations into our app. Since it is bound to the Android OS, we do not have much control over it.
DefaultDispatcher-worker-*
These are the worker threads managed by CoroutineScheduler. If some coroutines use the non-main standard dispatchers as their context, namely Dispatchers.Default and Dispatchers.IO, the actual coroutine execution is performed on these worker threads.
Rather than relying on the ThreadPoolExecutor implementation from the Java SDK, CoroutineScheduler takes on the responsibility of adding new threads when tasks come in and removing threads when they idle. The same CoroutineScheduler instance backs both Dispatchers.Default and Dispatchers.IO.
CoroutineScheduler sets corePoolSize as the number of available CPU cores or 2, whichever is larger. Its maximumPoolSize is the number of available CPU cores or 64, whichever is larger. Both are configurable by their corresponding system property.
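As a quick worked example of the sizing rule described above, the arithmetic looks like this. The two function names are hypothetical, and the values mirror the description in this story rather than being read from the library.

```kotlin
// Sizing rule as described in the text: at least 2 core threads,
// and an upper bound of at least 64 threads.
fun corePoolSizeFor(cpuCores: Int): Int = maxOf(cpuCores, 2)
fun maxPoolSizeFor(cpuCores: Int): Int = maxOf(cpuCores, 64)

fun main() {
    val cores = Runtime.getRuntime().availableProcessors()
    println("cores=$cores corePoolSize=${corePoolSizeFor(cores)} maximumPoolSize=${maxPoolSizeFor(cores)}")
}
```

On a typical 8-core phone this gives 8 core threads and an upper bound of 64.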
Beyond scaling threads on demand, CoroutineScheduler is also a task scheduler that manages the ordering of tasks and applies a work-stealing policy. We will pause here for now. If you are curious to learn more, I recommend reading the documentation and the implementation of CoroutineScheduler.
OkHttp Dispatcher
These threads come from Dispatcher in OkHttp. Dispatcher creates an internal ThreadPoolExecutor with corePoolSize equal to 0 and an unlimited maximumPoolSize. Dispatcher is responsible for dispatching the actual work of sending requests and receiving responses to its threads.
All of these threads share the same name, OkHttp Dispatcher.
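A pool with the same shape can be sketched with the JDK alone. The corePoolSize of 0 and the unlimited maximumPoolSize come from the description above; the 60-second keep-alive and the SynchronousQueue are my assumptions about a typical setup, and newDispatcherLikeExecutor is a hypothetical name.

```kotlin
import java.util.concurrent.SynchronousQueue
import java.util.concurrent.ThreadFactory
import java.util.concurrent.ThreadPoolExecutor
import java.util.concurrent.TimeUnit

// Sketch of a dispatcher-style pool: no resident threads, no upper bound,
// and every worker carries the same name.
fun newDispatcherLikeExecutor(threadName: String): ThreadPoolExecutor =
    ThreadPoolExecutor(
        0,                               // corePoolSize
        Int.MAX_VALUE,                   // maximumPoolSize: effectively unlimited
        60L, TimeUnit.SECONDS,           // assumed keep-alive for idle workers
        SynchronousQueue<Runnable>(),    // assumed queue: each task goes straight to a worker
        ThreadFactory { runnable -> Thread(runnable, threadName) }
    )
```

Because the thread factory ignores which task it is creating a thread for, all workers end up with an identical name, which matches what we see in the thread list.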
OkHttp TaskRunner
The rest of the OkHttp-prefixed threads are managed by TaskRunner. TaskRunner lives in the internal package, so it should be treated as an implementation detail. Nonetheless, TaskRunner is a set of daemon worker threads that accepts Task instances. For instance, trimming the cache to its maximum size is submitted to TaskRunner as a Task.
As of OkHttp 4.9, every thread created by TaskRunner dynamically changes its name based on the current Task. For example, the thread name OkHttp www.reddit.com indicates that a Task named www.reddit.com is currently being run.
val currentThread = Thread.currentThread()
val oldName = currentThread.name
currentThread.name = task.name
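The renaming pattern above is only half of the picture, since the old name has to be restored once the task finishes. A self-contained sketch of the full pattern could look like this; runWithThreadName is a hypothetical helper and not part of OkHttp.

```kotlin
// Hypothetical helper: runs block with the current thread temporarily renamed,
// then restores the previous name even if block throws.
fun <T> runWithThreadName(name: String, block: () -> T): T {
    val currentThread = Thread.currentThread()
    val oldName = currentThread.name
    currentThread.name = name
    try {
        return block()
    } finally {
        currentThread.name = oldName
    }
}
```

This explains why the same worker thread can show up under different names at different moments in the thread list.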
Okio Watchdog
Watchdog is a single daemon thread that is responsible for invoking the action at the scheduled timeout. Okio’s Watchdog is created on the critical path of OkHttp executing a network request.
WM.task-* & androidx.work-*
WorkManager exposes public APIs to specify the executors used for Worker threading and for its internal tracking. To be more specific:
- Configuration.Builder#setExecutor(): This executor controls the thread that runs Worker.doWork(). Since doWork() is implemented by the application, these threads are prefixed with androidx.work.
- Configuration.Builder#setTaskExecutor(): This executor is used by WorkManager for its internal bookkeeping. For example, WorkManagerTaskExecutor wraps the task executor inside a SerialExecutor to guarantee the execution order of tasks such as StopWorkRunnable. Because they are used internally, these threads are prefixed with WM.task.
For either API, an internal ThreadPoolExecutor is created by default if no executor is specified. The default is a fixed thread pool whose size is the number of CPU processors minus 1, clamped between 2 and 4.
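The default sizing rule is easy to check with a few inputs. The sketch below only reproduces the arithmetic described above with the JDK; the function names are hypothetical and this is not WorkManager's own code.

```kotlin
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors

// Number of CPU processors minus 1, clamped to the range 2..4.
fun defaultWorkerThreadCount(cpuCores: Int): Int = (cpuCores - 1).coerceIn(2, 4)

// A fixed pool sized with the rule above.
fun newDefaultLikeExecutor(): ExecutorService =
    Executors.newFixedThreadPool(
        defaultWorkerThreadCount(Runtime.getRuntime().availableProcessors())
    )
```

With this rule, a 2-core device gets 2 threads, a 4-core device gets 3, and anything with 5 or more cores gets 4.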
kotlinx.coroutines.DefaultExecutor
This is a single thread initialized by DefaultExecutor. It is used primarily for delay() in coroutines.
queued-work-looper
This is an Android OS HandlerThread created by QueuedWork. Because the WorkManager library comes along with a few Android services, this path is hit upon app launch.
Merge
Based on the analysis above, the opportunity is quite obvious. We can merge three thread pools by creating our own ThreadPoolExecutor for reuse:
- OkHttp’s Dispatcher
- WorkManager’s Executor
- Coroutine’s CoroutineScheduler
Let’s make this happen for our sample Reddit app! We can start by defining our shared ThreadPoolExecutor, setting its corePoolSize to the number of CPU processors and its maximumPoolSize to unlimited.
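One possible shape for this shared pool is sketched below. The corePoolSize and maximumPoolSize follow the description above; the factory name newSharedExecutor, the thread name prefix, the 60-second keep-alive, and the SynchronousQueue are assumptions for illustration.

```kotlin
import java.util.concurrent.SynchronousQueue
import java.util.concurrent.ThreadFactory
import java.util.concurrent.ThreadPoolExecutor
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical factory for the shared pool.
fun newSharedExecutor(namePrefix: String = "shared-pool"): ThreadPoolExecutor {
    val counter = AtomicInteger()
    // Give every worker a recognizable, numbered name.
    val threadFactory = ThreadFactory { runnable ->
        Thread(runnable, "$namePrefix-${counter.incrementAndGet()}")
    }
    return ThreadPoolExecutor(
        Runtime.getRuntime().availableProcessors(), // corePoolSize
        Int.MAX_VALUE,                              // maximumPoolSize: effectively unlimited
        60L, TimeUnit.SECONDS,                      // assumed keep-alive for extra workers
        SynchronousQueue<Runnable>(),               // assumed queue, see note below
        threadFactory
    )
}
```

The queue choice matters: with an unbounded queue, ThreadPoolExecutor never grows past corePoolSize, so an unlimited maximumPoolSize only has an effect with a queue that can refuse tasks, such as SynchronousQueue.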
Next, we need to pass this shared ThreadPoolExecutor to OkHttp, Retrofit, and Coil. Keep in mind that if we forget to configure OkHttp for all libraries depending on it, say if we configure Retrofit but not Coil, we may still end up with two OkHttpClient instances. This would simply worsen the situation by creating more threads.
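A minimal sketch of this wiring for OkHttp and Retrofit follows, assuming OkHttp 4.x and Retrofit 2.x are on the classpath. The function names are hypothetical and the base URL is taken from the sample app's endpoint.

```kotlin
import java.util.concurrent.ExecutorService
import okhttp3.Dispatcher
import okhttp3.OkHttpClient
import retrofit2.Retrofit

// Builds the single OkHttpClient whose Dispatcher runs on the shared executor.
fun newSharedOkHttpClient(sharedExecutor: ExecutorService): OkHttpClient =
    OkHttpClient.Builder()
        .dispatcher(Dispatcher(sharedExecutor))
        .build()

// Retrofit reuses that client instead of creating one of its own.
fun newRetrofit(client: OkHttpClient): Retrofit =
    Retrofit.Builder()
        .baseUrl("https://www.reddit.com/")
        .client(client)
        .build()
```

The same OkHttpClient instance should then be handed to every other consumer, including the image loader, so that only one Dispatcher exists in the process.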
For WorkManager, we should follow the on-demand initialization section of the Android Developer documentation. This requires us to remove the ContentProvider that initializes WorkManager from AndroidManifest.xml, and to have our Application implement Configuration.Provider. In addition, make sure all WorkManager references are obtained through WorkManager.getInstance(Context), because WorkManager.getInstance(Context) takes the customized configuration into consideration whereas WorkManager.getInstance() does not.
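The configuration itself can be sketched as below, assuming the androidx.work artifact is available. newWorkManagerConfiguration is a hypothetical helper, and passing the same executor to both setters is one possible choice rather than a requirement.

```kotlin
import androidx.work.Configuration
import java.util.concurrent.Executor

// Builds a WorkManager configuration in which both the Worker executor and
// the internal task executor are backed by the shared pool.
fun newWorkManagerConfiguration(sharedExecutor: Executor): Configuration =
    Configuration.Builder()
        .setExecutor(sharedExecutor)
        .setTaskExecutor(sharedExecutor)
        .build()
```

The Application would return this object from its Configuration.Provider implementation. The exact member to override differs between WorkManager versions, so check the version in use.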
Last but not least, configuring the shared thread pool for coroutines requires a certain amount of effort. First, we need to make sure that all Dispatchers are injectable. Not only does injecting dispatchers make testing easier by allowing tests to provide a different implementation, but it also allows us to provide our own Dispatcher implementation.
Second, we need to provide our implementation through dependency injection. Here we are using ExecutorService.asCoroutineDispatcher to convert the ThreadPoolExecutor into a CoroutineDispatcher. We then need to inject this CoroutineDispatcher into all places that were previously using Dispatchers.Default or Dispatchers.IO.
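Both steps can be sketched together, assuming kotlinx-coroutines-core is on the classpath. FeedRepository and its load function are hypothetical stand-ins for any class that previously hard-coded Dispatchers.IO, and the small fixed pool stands in for the shared ThreadPoolExecutor.

```kotlin
import java.util.concurrent.Executors
import kotlinx.coroutines.CoroutineDispatcher
import kotlinx.coroutines.asCoroutineDispatcher
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Hypothetical repository: the dispatcher is injected instead of hard-coded.
class FeedRepository(private val backgroundDispatcher: CoroutineDispatcher) {
    // Pretends to do background work and returns the name of the thread that ran it.
    suspend fun load(): String = withContext(backgroundDispatcher) {
        Thread.currentThread().name
    }
}

fun main() {
    // Stand-in for the shared pool, with a recognizable thread name.
    val sharedExecutor = Executors.newFixedThreadPool(2) { runnable ->
        Thread(runnable, "shared-pool")
    }
    val repository = FeedRepository(sharedExecutor.asCoroutineDispatcher())
    runBlocking { println(repository.load()) }
    sharedExecutor.shutdown()
}
```

In a real app, the dependency injection framework would provide the CoroutineDispatcher, and tests could swap in a test dispatcher through the same constructor parameter.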
Fortunately, all the concurrency libraries used in our sample Reddit application expose APIs to configure their thread pools, so this is sufficient for this story. The full change can be found in this commit.
Benefit
Reduced number of threads in our application
Previously we had three different thread pools. Now we can reuse the same thread pool to run concurrent tasks. As illustrated below, we can observe the reduced number of threads from 35 to 19 in our sample application.
Indirect improvement of the app performance
When I ran an A/B test to quantify the impact of merging thread pools in a real Google Play app with 500M+ downloads, the results were quite promising: a 2% reduction in cold app launch time at the 90th percentile, with a low p-value (<0.01). Treat this number as a reference point only, because the result is highly contextual. It depends on the total number and the activity of the thread pools being merged.
An improvement in the general app performance is also within expectation. Because we have centralized control of our thread pool, we could analyze the required parallelism and configure the thread pool accordingly. We can adjust corePoolSize, maximumPoolSize, and our scheduling algorithm based on the production workflow.
Cost
Behavioral change
Not all thread pools are configured the same way. They may have different corePoolSize, maximumPoolSize, keepAliveTime, or priority (thread priority is itself an advanced topic). For example:
- OkHttp sets corePoolSize to 0 by default
- WorkManager uses a FixedThreadPool where the number of threads does not change by default
- Coroutine implements its own thread pooling and scheduling
These three thread pools are all configured with different parameters. Merging them is therefore likely to change the behavior of concurrent tasks. Because the thread pools are no longer isolated, thread safety issues that previously went undetected may surface after merging.
Additional effort may be required to preserve library-specific features. For example, Coroutine provides Dispatchers.Default for CPU-intensive work and Dispatchers.IO for IO work. If that separation is needed, we have to build an equivalent on top of the shared pool ourselves.
Vulnerability
If we forget to configure any library that uses a concurrency utility, say we configure OkHttp and Retrofit but forget to configure Coil, we end up with two OkHttpClient instances and leave the thread pool problem unresolved.
Another issue with Kotlin coroutines is that if Dispatchers.Default or Dispatchers.IO is still referenced anywhere in the production code, the default CoroutineScheduler will still be in use. As a result, we still have multiple thread pools.
Lastly, we need to audit each library to ensure it properly exposes a thread configuration API. A library may create some threads for internal usage, and we need to be fully aware of them in case they grow into potential problems.
Takeaway
Recycling and reducing thread pools can bring real performance benefits and improve overall app health. On the other hand, it requires meticulous auditing and continuous validation to ensure that the merged thread pool maintains consistent behavior. I suggest taking the following steps in order:
- Qualify all threads with meaningful names.
- Audit thread creation to identify opportunities to reduce and reuse thread pools.
- A/B test the change of merging thread pools to validate the behavioral change and performance improvement.
- Set up guidance or tooling to ensure no future regression.
Last but not least, thanks to my colleague Colin White for proofreading and giving valuable feedback.