Learnitweb

Data Masking and Tokenization in Microservices

1. Overview

With the increasing shift to microservices-based architectures, applications often become responsible for handling sensitive data such as credit card numbers, Social Security Numbers (SSNs), phone numbers, and health records. It becomes imperative to secure this sensitive information to meet compliance requirements such as GDPR, HIPAA, and PCI-DSS.

This tutorial explains two common data protection techniques—Data Masking and Tokenization—and how to apply them effectively in microservices.

2. Introduction to Data Security in Microservices

In monolithic systems, securing sensitive data typically occurs at a single point. But in microservices, where different services are responsible for different operations and data may travel across multiple services and layers (APIs, logs, caches, databases), the attack surface increases. Hence, security measures must be built into every service that handles sensitive data.

Two techniques that help mitigate data exposure risks are:

  • Data Masking: Hiding data with altered values that resemble the original.
  • Tokenization: Replacing sensitive data with non-sensitive placeholders (tokens) while storing the original data in a secure vault.

3. What is Data Masking?

Data masking is a method of creating a structurally similar but inauthentic version of data. The goal is to protect the original sensitive data while ensuring the masked data remains usable for business processes like testing, UI display, or logging.

Types of Data Masking:

  • Static Masking: Data is masked in storage. Commonly used in test environments.
  • Dynamic Masking: Data is masked at runtime for display or logging without modifying the source data.
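
To make the distinction concrete, here is a minimal sketch of dynamic masking (the class and field names are illustrative): the stored value is never altered, and the mask is applied only when the value is rendered for display or logging.

```java
public class Customer {

    // The source value is stored unmodified; dynamic masking never alters it
    private final String email;

    public Customer(String email) {
        this.email = email;
    }

    // The mask is applied only at display time (UI, logs)
    public String displayEmail() {
        int at = email.indexOf('@');
        return email.charAt(0) + "***" + email.substring(at);
    }
}
```

Static masking, by contrast, would overwrite the stored value itself, which is why it is typically reserved for non-production copies of the data.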

Examples:

  Original Data            Masked Data
  1234-5678-9012-3456      XXXX-XXXX-XXXX-3456
  john.doe@example.com     j***.d**@example.com
  +91-9876543210           +91-9XXXXXX210

Common Use Cases:

  • Masking data shown on UI for end users.
  • Masking logs to avoid leaking PII.
  • Debugging or auditing while avoiding full data exposure.

4. What is Tokenization?

Tokenization replaces a sensitive data element with a non-sensitive equivalent (token) that has no exploitable meaning or value. The mapping between the token and original data is stored in a secure token vault, making the process reversible only through authorized access.

Key Characteristics:

  • Tokens retain some format characteristics (e.g., length).
  • Only the vault service can resolve the token back to the original data.
  • Used for securing data in transit and at rest.

Examples:

  Original Data            Token
  4111-1111-1111-1111      tok_1001_ABCD5678
  SSN: 123-45-6789         tok_2009_XYZ9999
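
The format-retention point above can be illustrated with a small sketch (the class name is ours, and a real implementation would also record the token-to-original mapping in the vault): every digit except the last four is replaced with a random digit, so the token keeps the original length, separators, and last-four suffix.

```java
import java.security.SecureRandom;

public class FormatPreservingToken {

    private static final SecureRandom RNG = new SecureRandom();

    // Replaces every digit except the last four with a random digit,
    // keeping length, dashes, and the last-four suffix intact
    public static String tokenize(String value) {
        long totalDigits = value.chars().filter(Character::isDigit).count();
        StringBuilder sb = new StringBuilder(value.length());
        int seen = 0;
        for (char c : value.toCharArray()) {
            if (Character.isDigit(c)) {
                sb.append(seen < totalDigits - 4 ? (char) ('0' + RNG.nextInt(10)) : c);
                seen++;
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```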

Common Use Cases:

  • Storing payment data securely.
  • Sharing sensitive identifiers between internal services.
  • Avoiding full encryption overhead for data in motion.

5. Key Differences Between Masking and Tokenization

  Feature          Data Masking                        Tokenization
  Reversibility    Usually irreversible                Reversible via token vault
  Purpose          Obfuscate for display or logging    Replace for secure storage/processing
  Storage          Original data often removed         Original stored in secure vault
  Example Use      UI masking, test data               Storing credit card tokens
  Compliance Fit   Useful for partial compliance       Helps meet PCI-DSS, GDPR requirements

6. Architecture in a Microservices Environment

In microservices, both masking and tokenization can be applied in a layered architecture:

  1. Gateway Layer:
    • Apply masking before logging incoming payloads.
    • Strip sensitive data before sending to downstream services.
  2. Service Layer:
    • Tokenize sensitive fields before storage.
    • Use detokenization for processing within authorized services.
  3. Data Layer:
    • Store only tokens, not the raw sensitive values.
Client --> Gateway --> Masking --> Service A --> Tokenization --> DB
                               --> Logging (Masked)
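
At the gateway layer, masking before logging can be as simple as a regex pass over the payload. Below is a minimal sketch (the class name and pattern are illustrative; a real gateway would plug this into its logging filter or interceptor):

```java
import java.util.regex.Pattern;

public class PayloadMasker {

    // Matches 16-digit card numbers written as four dash-separated groups
    private static final Pattern CARD =
            Pattern.compile("\\b(?:\\d{4}-){3}(\\d{4})\\b");

    // Rewrites every match so only the last four digits survive in logs
    public static String maskForLogging(String payload) {
        return CARD.matcher(payload).replaceAll("XXXX-XXXX-XXXX-$1");
    }
}
```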

7. Implementation Example in Spring Boot

Step 1: Tokenization Controller

import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/tokenize")
public class TokenizationController {

    // In-memory stand-in for a secure token vault; use a real vault
    // (e.g. HashiCorp Vault) in production
    private final Map<String, String> tokenVault = new ConcurrentHashMap<>();

    @PostMapping
    public String tokenize(@RequestBody String cardNumber) {
        String token = "tok_" + UUID.randomUUID();
        tokenVault.put(token, cardNumber);
        return token;
    }

    @GetMapping("/{token}")
    public String detokenize(@PathVariable String token) {
        return tokenVault.getOrDefault(token, "INVALID_TOKEN");
    }
}

Step 2: Masking Utility

public class MaskingUtils {

    public static String maskCardNumber(String cardNumber) {
        if (cardNumber == null || cardNumber.length() < 4) {
            throw new IllegalArgumentException("Card number too short to mask");
        }
        return "XXXX-XXXX-XXXX-" + cardNumber.substring(cardNumber.length() - 4);
    }

    public static String maskEmail(String email) {
        int at = (email == null) ? -1 : email.indexOf('@');
        if (at < 1) {
            throw new IllegalArgumentException("Invalid email address");
        }
        String name = email.substring(0, at);
        String domain = email.substring(at + 1);
        return name.charAt(0) + "***" + name.charAt(name.length() - 1) + "@" + domain;
    }
}

Step 3: Integrating in Business Logic

import java.util.Map;

import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import org.springframework.web.client.RestTemplate;

@RestController
@RequestMapping("/payments")
public class PaymentController {

    private final RestTemplate restTemplate = new RestTemplate();

    @PostMapping
    public ResponseEntity<String> handlePayment(@RequestBody Map<String, String> request) {
        String cardNumber = request.get("cardNumber");

        // Exchange the raw card number for a token before anything is stored
        String token = restTemplate.postForObject("https://localhost:8081/tokenize", cardNumber, String.class);

        // Mask the card number for display and logging only
        String masked = MaskingUtils.maskCardNumber(cardNumber);

        // Here you would persist the token instead of the real card number

        return ResponseEntity.ok("Stored Token: " + token + ", Masked for UI: " + masked);
    }
}

8. Best Practices

  • Use masking for UI and logs, not for storage.
  • Store only tokenized data in your database.
  • Use HTTPS everywhere to protect masked/tokenized data in transit.
  • Secure access to the token vault and audit access logs.
  • Apply RBAC to restrict detokenization operations.
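
The last two points can be sketched in plain Java (the class and role names are illustrative; in Spring Security this check would typically be a @PreAuthorize rule on the detokenize endpoint):

```java
import java.util.Map;
import java.util.Set;

public class GuardedVault {

    private final Map<String, String> vault;

    public GuardedVault(Map<String, String> vault) {
        this.vault = vault;
    }

    // Detokenization is allowed only for callers holding the required role;
    // in a real system every attempt would also be written to an audit log
    public String detokenize(String token, Set<String> callerRoles) {
        if (!callerRoles.contains("VAULT_READER")) {
            throw new SecurityException("caller lacks VAULT_READER role");
        }
        return vault.getOrDefault(token, "INVALID_TOKEN");
    }
}
```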

9. Tools and Technologies

  Tool                   Use Case
  HashiCorp Vault        Secure token vault
  Spring Vault           Vault integration in Spring Boot
  AWS KMS + DynamoDB     Tokenization + storage
  Apache NiFi            Data flow + masking processors
  Redgate Data Masker    DB-level static masking