Have you ever felt that AEM is playing tricks on you? I bet you have. This is one of those stories - the more you dig in, the less you understand.
That was a classic headless setup: AEM’s role was to expose content via a REST-like JSON API so that various applications could consume the data. Things were running smoothly, but one day a misrendered JSON response showed up: a set of mandatory properties had simply disappeared from the object. The case was investigated thoroughly, yet no one could even reproduce it. Nothing had changed at the JCR level, there had been no deployment in the meantime, and when you visited the exact same URL, all the data were correct. “Oh, that must have been a one-off incident,” someone said. The ticket got closed and life went on. A week later a similar issue was reported. A different JSON object was broken this time, but at least you could reproduce it. Unfortunately, an hour later the problem magically went away. Time passed, and a slightly different variant of the problem surfaced in production: you kept requesting the affected URL and the response alternated between completely valid JSON and its broken form. Your team hopped on a call to get to the bottom of it, but within minutes the problem vanished without a trace again.
Interested in what happened and where we ended up? That’s what the talk’s going to be about.
19. Troubleshooting phase
▪ Not reproducible on local AEM
▪ Debug loggers / headers
▪ Page (re)activation solves the problem
▪ Cache bypassing tricks
▪ Ongoing monitoring
https://flic.kr/p/c5xUxS
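The cache bypassing and ongoing monitoring steps above can be sketched as a small polling script. This is a minimal sketch, not the tooling used in the investigation: the URL, the `cb` parameter name, and the list of mandatory properties are hypothetical. It relies on the Dispatcher's default behavior of not caching requests that carry a query string (assuming the parameter is not listed under `/ignoreUrlParams`); a unique value should also force a CDN cache miss, assuming the query string is part of the CDN cache key.

```python
import json
import time
import urllib.parse
import urllib.request
import uuid
from typing import Callable, Iterable


def missing_properties(payload: dict, required: Iterable[str]) -> set:
    """Return the mandatory top-level keys that are absent from a JSON object."""
    return {key for key in required if key not in payload}


def cache_busting_url(url: str, param: str = "cb") -> str:
    """Append a unique query parameter (hypothetical name 'cb') so the request
    is not answered from the dispatcher cache and, most likely, not from the CDN."""
    parts = urllib.parse.urlsplit(url)
    query = urllib.parse.parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, uuid.uuid4().hex))
    return urllib.parse.urlunsplit(parts._replace(query=urllib.parse.urlencode(query)))


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """Plain GET returning the decoded JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def monitor(url: str, required: Iterable[str], attempts: int = 10,
            interval: float = 30.0,
            fetch: Callable[[str], dict] = fetch_json) -> list:
    """Request the same URL repeatedly, once as-is and once cache-busted,
    and record which mandatory properties are missing on each attempt."""
    required = list(required)
    results = []
    for attempt in range(attempts):
        cached = missing_properties(fetch(url), required)
        bypassed = missing_properties(fetch(cache_busting_url(url)), required)
        results.append({"attempt": attempt, "cached": cached, "bypassed": bypassed})
        if attempt < attempts - 1:
            time.sleep(interval)
    return results
```

Comparing the `cached` and `bypassed` sets over time is what turns intermittent reports into a pattern: if only the cached variant ever loses properties, the broken response is being served from a cache rather than rendered fresh on each request.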
20. “If you torture the data long enough, it will confess to anything”
Ronald H. Coase
21. Analysis results
▪ Incomplete AEMaaCS logs (Loki 2.3 issue)
▪ No relevant log entries
▪ New endpoints with invalid JSON
▪ Bypassed cache == issue is gone*
▪ Warmup service goes first*
https://flic.kr/p/dgfRgD
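To see exactly what went missing when a cached response differs from a cache-bypassed one, both payloads can be flattened into key paths and compared. This is a minimal sketch; the `/`-separated path notation with `[index]` for list items is an arbitrary choice for illustration, and values are deliberately ignored because the symptom here was missing properties, not wrong ones.

```python
from typing import Any, Set


def key_paths(node: Any, prefix: str = "") -> Set[str]:
    """Flatten a JSON value into the set of key paths it contains,
    e.g. {"a": {"b": 1}} -> {"a", "a/b"}. List items are indexed."""
    paths: Set[str] = set()
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}/{key}" if prefix else key
            paths.add(path)
            paths |= key_paths(value, path)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            path = f"{prefix}[{index}]"
            paths.add(path)
            paths |= key_paths(value, path)
    return paths


def dropped_paths(reference: Any, suspect: Any) -> list:
    """Key paths present in the reference (e.g. cache-bypassed) response
    but absent from the suspect (e.g. cached) one, sorted for stable output."""
    return sorted(key_paths(reference) - key_paths(suspect))
```

Calling `dropped_paths(bypassed_payload, cached_payload)` on each broken sample yields a short list of vanished properties that can be attached to a ticket, which helps when the problem disappears before anyone else gets to look at it.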
22. AEMaaCS startup and warmup service
▪ AEMaaCS /systemready probe
▪ Warmup service
  ▪ goal: pre-populate dispatcher cache
  ▪ requests the most popular URLs (last 24h)
  ▪ Host header is taken into account
https://flic.kr/p/xGJnBw
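The warmup idea above can be modeled roughly as follows. This is a toy illustration, not Adobe's implementation: the `AccessLogEntry` record, the origin URL, and the ranking rule are hypothetical. It shows why the Host header matters: the same path requested for two different hosts counts as two separate entries, and each warmup request carries its original Host header so it is routed (and, depending on configuration, cached) for the right site.

```python
import urllib.request
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, List, NamedTuple, Tuple


class AccessLogEntry(NamedTuple):
    # Hypothetical, simplified access-log record.
    timestamp: datetime
    host: str
    path: str


def most_popular(entries: Iterable[AccessLogEntry], now: datetime,
                 limit: int = 100,
                 window: timedelta = timedelta(hours=24)) -> List[Tuple[str, str]]:
    """Rank (host, path) pairs by hit count within the time window.
    The host is part of the key because one path can serve several sites."""
    since = now - window
    counts = Counter((e.host, e.path) for e in entries
                     if since <= e.timestamp <= now)
    return [key for key, _ in counts.most_common(limit)]


def warmup_requests(origin: str,
                    targets: Iterable[Tuple[str, str]]) -> List[urllib.request.Request]:
    """Build GET requests against a (hypothetical) origin, preserving
    the original Host header of each popular URL."""
    return [urllib.request.Request(origin.rstrip("/") + path, headers={"Host": host})
            for host, path in targets]
```

Sending these requests right after startup, before regular traffic arrives, is what makes the warmup service "go first": whatever it receives is the first thing to land in the cache.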
25. New hope
▪ SKYOPS-16686 (~mid July 2021)
  ▪ dead end
▪ SKYOPS-17857 (~mid Aug 2021)
  ▪ delivered early Sep 2021
https://flic.kr/p/8JhwbU
26. Over the finish line
▪ AEMaaCS 2021.8.5755 includes the fix
▪ Dispatcher caching got re-enabled
▪ No JSON issues for 2 weeks - let’s celebrate!
https://flic.kr/p/5bdFXt
50. Asset link hurdles
▪ DM is not enabled everywhere
▪ Enabled DM implies blocked DAM access (404)
▪ /conf/global/settings/dms7enabled - bad idea!
▪ Per-asset DM detection (dam:scene7File)
▪ Dedicated option for DAM fallback
https://flic.kr/p/CyXpjV
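The link logic implied by these bullets might look like the sketch below. It is one possible reading of the slide, not the actual implementation: `AssetLinkConfig`, its fields, and the URL composition are hypothetical. The only detail taken from the slide is that an asset counts as available in DM when it carries a `dam:scene7File` property, instead of relying on a global switch such as `/conf/global/settings/dms7enabled`.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class AssetLinkConfig:
    # Hypothetical settings; names are illustrative, not AEM configuration keys.
    dm_enabled: bool
    dm_base_url: Optional[str] = None  # assumed to already include the image-serving prefix
    dam_fallback: bool = False         # the "dedicated option for DAM fallback"


def asset_link(path: str, metadata: Mapping[str, str],
               config: AssetLinkConfig) -> Optional[str]:
    """Pick a delivery URL for one asset based on per-asset DM detection."""
    scene7_file = metadata.get("dam:scene7File")
    if config.dm_enabled and config.dm_base_url and scene7_file:
        # Asset is known to DM: link to the DM delivery URL (composition is an assumption).
        return f"{config.dm_base_url.rstrip('/')}/{scene7_file}"
    if not config.dm_enabled or config.dam_fallback:
        # No DM on this environment, or fallback explicitly allowed: use the DAM path.
        return path
    # DM enabled, asset not in DM, no fallback: a DAM link would most likely 404.
    return None
```

Returning `None` instead of a DAM path in the last case lets the caller omit the link rather than publish one that is blocked once DM is enabled.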