Distributed Scheduling with Spring Boot: the challenges & pitfalls of implementing a background job

Abstract

Running scheduled jobs is not a trivial subject: there is a lot to consider, especially when something goes wrong.

Let's start from a simple scheduled job:

@Component
public class MyJob {
  private final CardsClient cardsClient;
  private final CardRepository cardRepository;
  private final ProposalRepository proposalRepository;

  public MyJob(CardsClient cardsClient,
               CardRepository cardRepository,
               ProposalRepository proposalRepository) {
    this.cardsClient = cardsClient;
    this.cardRepository = cardRepository;
    this.proposalRepository = proposalRepository;
  }

  @Scheduled(fixedDelay = 60_000) // runs 60s after the previous execution finishes
  public void execute() {
    var proposals = proposalRepository.findAllByStatusOrderByCreatedAtAsc(ELIGIBLE);
    proposals.forEach(proposal -> {
      // fetch the card issued for this proposal from the external service
      var cardData = cardsClient.findCardByProposalId(proposal.getId());

      var newCard = cardData.toModel();
      cardRepository.save(newCard);

      proposal.attachTo(newCard);
      proposalRepository.save(proposal);
    });
  }
}

Recoverability

Add @Transactional so the whole run executes atomically: if the server crashes mid-way, the transaction rolls back and no proposal is left half-processed:

@Scheduled(fixedDelay = 60_000)
@Transactional
public void execute() {

Batch processing

If the volume increases, memory usage grows with it, since the job loads every eligible proposal at once. So we need to apply batch processing. However, combined with the previous fix, processing everything inside a single @Transactional method creates a long-running transaction. To mitigate this, commit one batch at a time using programmatic transaction management:

private TransactionTemplate transactionTemplate;

// no @Transactional here
@Scheduled(fixedDelay = 60_000)
public void execute() {
  while (true) {
    // each batch commits in its own short-lived transaction
    var processed = transactionTemplate.execute(status -> {
      var proposals = proposalRepository.findTop50ByStatusOrderByCreatedAtAsc(ELIGIBLE);
      // ... same per-proposal logic as before ...
      return proposals.size();
    });
    if (processed == 0) break; // stop once nothing is left to process
  }
}
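
Spring Boot usually auto-configures a TransactionTemplate, but a dedicated one lets us cap how long a single batch may hold its transaction. A minimal wiring sketch (the bean and timeout value here are illustrative assumptions, not part of the original setup):

```java
@Configuration
public class JobTransactionConfig {

  // Builds a template on top of the platform's transaction manager;
  // the timeout makes a stuck batch fail instead of holding locks forever.
  @Bean
  public TransactionTemplate jobTransactionTemplate(PlatformTransactionManager txManager) {
    var template = new TransactionTemplate(txManager);
    template.setTimeout(30); // seconds per batch
    return template;
  }
}
```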

High availability

Use database pessimistic locking to ensure that no two nodes or threads process the same batch:

@Repository
public interface ProposalRepository extends JpaRepository<Proposal, Long> {
  @Lock(LockModeType.PESSIMISTIC_WRITE)
  List<Proposal> findTop50ByStatusOrderByCreatedAtAsc(ProposalStatus status);
}

which translates to SQL:

SELECT p.*
FROM proposals p
WHERE p.status = 'ELIGIBLE'
ORDER BY p.created_at ASC
LIMIT 50
FOR UPDATE;

Scalability

What if we want more throughput? The previous mitigation only ensures that a single job instance handles a given batch; a second instance would block, waiting on the same locked rows. To run instances concurrently, we can leverage the database's SKIP LOCKED clause, which makes the query pass over rows locked by other transactions instead of waiting for them to commit.

@Repository
public interface ProposalRepository extends JpaRepository<Proposal, Long> {
  @QueryHints({
    @QueryHint(
      name = "javax.persistence.lock.timeout",
      value = "-2" // org.hibernate.LockOptions.SKIP_LOCKED
    )
  })
  @Lock(LockModeType.PESSIMISTIC_WRITE)
  List<Proposal> findTop50ByStatusOrderByCreatedAtAsc(ProposalStatus status);
}

which translates to SQL:

SELECT p.*
FROM proposals p
WHERE p.status = 'ELIGIBLE'
ORDER BY p.created_at ASC
LIMIT 50
FOR UPDATE
SKIP LOCKED;
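
To see why this raises throughput, here is a plain-Java sketch (no Spring, no database; all names are hypothetical) that simulates the SKIP LOCKED semantics: each worker claims up to 50 unclaimed rows and simply skips rows another worker already holds, so two workers end up with disjoint batches instead of one blocking on the other:

```java
import java.util.*;
import java.util.stream.*;

// In-memory simulation of FOR UPDATE SKIP LOCKED: "locked" stands in
// for the rows currently held by other open transactions.
public class SkipLockedSketch {
  private final Set<Integer> locked = new HashSet<>();

  // returns the batch this worker managed to claim
  synchronized List<Integer> claim(List<Integer> eligible, int limit) {
    List<Integer> batch = eligible.stream()
        .filter(id -> !locked.contains(id)) // SKIP LOCKED: pass over rows held elsewhere
        .limit(limit)                       // LIMIT 50
        .collect(Collectors.toList());
    locked.addAll(batch);                   // FOR UPDATE: hold the claimed rows
    return batch;
  }

  public static void main(String[] args) {
    var table = new SkipLockedSketch();
    var eligible = IntStream.range(0, 100).boxed().collect(Collectors.toList());

    var workerA = table.claim(eligible, 50); // claims rows 0..49
    var workerB = table.claim(eligible, 50); // skips A's rows, claims 50..99

    System.out.println(Collections.disjoint(workerA, workerB)); // true: no overlap
    System.out.println(workerA.size() + workerB.size());        // 100: both batches processed concurrently
  }
}
```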

Optimal batch processing

The previous code issues one INSERT or UPDATE statement per entity. However, we can configure Hibernate to batch them, grouping many statements into a single JDBC round trip. To enable it, configure the Spring JPA properties like this:

spring.jpa.properties.hibernate:
  jdbc.batch_size: 50
  order_inserts: true
  order_updates: true
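
One caveat, assuming the entities use JPA-generated ids (the mappings are not shown here): Hibernate silently disables JDBC insert batching for entities with GenerationType.IDENTITY, because each insert must execute immediately to return its generated key. A sequence-based generator keeps insert batching effective; a sketch (entity and sequence names are illustrative):

```java
@Entity
public class Card {

  // SEQUENCE generation plays well with jdbc.batch_size;
  // allocationSize matching the batch size reduces sequence round trips
  @Id
  @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "card_seq")
  @SequenceGenerator(name = "card_seq", sequenceName = "card_seq", allocationSize = 50)
  private Long id;

  // ...
}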