# How do you ensure horizontal and vertical scalability?

Screena uses multiple threads per request. The response time per request (**latency**) depends on the resources of the server on which Screena is installed. To reduce latency, we can increase the **number of CPUs** (vertical scaling). For higher volumes, we can also use **GPU processing**.

To increase the number of transactions processed per second (**throughput**), we can increase the **number of VMs** behind the load balancer (horizontal scaling, for example with an **auto-scaling** group in AWS). Because each VM handles its own share of the traffic, the number of requests per second can be increased without impacting the average response time.
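
As a rough illustration of the horizontal-scaling arithmetic above, the following sketch estimates how many identical VMs would be needed for a target throughput. It assumes that throughput grows linearly with the number of VMs, which is a simplification. The function names, the `headroom` parameter, and all numbers are hypothetical and are not part of Screena or AWS.

```python
import math


def vms_needed(target_rps: float, per_vm_rps: float, headroom: float = 0.2) -> int:
    """Estimate the number of identical VMs needed behind the load balancer.

    target_rps: desired requests per second for the whole group (assumed value).
    per_vm_rps: requests per second one VM sustains at acceptable latency,
                measured by your own load test (assumed value).
    headroom:   fraction of each VM's capacity kept free for traffic spikes.
    """
    if target_rps < 0 or per_vm_rps <= 0 or not 0 <= headroom < 1:
        raise ValueError("invalid capacity parameters")
    usable_rps = per_vm_rps * (1 - headroom)
    # Always keep at least one VM running, even with no traffic.
    return max(1, math.ceil(target_rps / usable_rps))


def group_throughput(n_vms: int, per_vm_rps: float) -> float:
    """Total throughput under the linear-scaling assumption."""
    return n_vms * per_vm_rps


if __name__ == "__main__":
    # Hypothetical figures: 1000 requests/s target, 100 requests/s per VM.
    n = vms_needed(target_rps=1000, per_vm_rps=100, headroom=0.2)
    print(n, group_throughput(n, 100))
```

In practice, the per-VM figure depends on the CPU or GPU resources of each instance, so it should be measured again after any vertical scaling change.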

