Throughout the week, I read a lot of blog-posts, articles, and so forth, that have to do with things that interest me:
- data science
- data in general
- distributed computing
- SQL Server
- transactions (both db as well as non db)
- and other “stuff”
This blog-post is the “roundup” of the things that have been most interesting to me, for the week just ending.
Azure
- A fast, serverless, big data pipeline powered by a single Azure Function. This is a blog post about how to use Azure Serverless functions to build highly performant data pipelines. At work, we are looking at implementing something similar.
Distributed Computing
Everything is broken. This is a very cool post by Murat where he lists some very relevant quotes/statements from a recent Everything Is Broken meetup he attended. Some of the quotes I particularly liked:
- Without observability you don’t have chaos engineering, you have chaos.
- You don’t know what you don’t know, so dashboards are of very limited utility. Dashboards are only for anticipated cases: every dashboard is an artifact of past failures. There are too many dashboards, and they are too slow.
Prerequisites for chaos engineering:
- monitoring & observability
- on-call & incident management
- know the cost of your downtime per hour (British Airways’ 1 day outage cost $150 million)
How to choose a chaos experiment?
- identify top 5 critical systems
- choose 1 system
- whiteboard the system
- select attack: resource/state/network
- determine scope
Data Science
- Announcing ML.NET 0.6 (Machine Learning .NET). Microsoft just released ML.NET 0.6, and this post highlights some of the new enhancements.
Streaming
- Machine learning & Kafka KSQL stream processing — bug me when I’ve left the heater on. I like this post as it combines two of my favorite topics: Streaming and Machine Learning. The post shows how you can use Kafka and Machine Learning to monitor household power usage and alert when something out of the ordinary occurs.
- An introduction to ACID guarantees and transaction processing. A while ago dataArtisans introduced serializable, distributed ACID transactions directly on data streams in Flink. This post covers the foundations of that capability.
- KSQL Recipes Available Now in the Stream Processing Cookbook. A post which introduces the “KSQL Cookbook”: a collection of “recipes” designed to help people build event-driven, real-time systems.
- How Apache Flink manages Kafka consumer offsets. This post explains, with a step-by-step example, how Apache Flink works with Apache Kafka to ensure that records from Kafka topics are processed with exactly-once guarantees.
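To make the last item a bit more concrete, here is a toy simulation of the general idea as I understand it: the read offset and the operator state are snapshotted together, so after a failure both are rolled back to the same point and no record is counted twice. This is only a sketch in plain Python and not the Flink or Kafka API; `ToyPipeline`, `Checkpoint`, and `run_with_failure` are made-up names for illustration.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    offset: int   # next record to read from the log
    counts: dict  # operator state as of that offset


class ToyPipeline:
    """Counts keys from an in-memory list that stands in for a Kafka partition."""

    def __init__(self, log):
        self.log = log
        self.offset = 0
        self.counts = {}
        self.last_checkpoint = Checkpoint(0, {})

    def process_next(self):
        key = self.log[self.offset]
        self.counts[key] = self.counts.get(key, 0) + 1
        self.offset += 1

    def checkpoint(self):
        # Offset and state are saved together, as one consistent snapshot.
        self.last_checkpoint = Checkpoint(self.offset, dict(self.counts))

    def restore(self):
        # Rewind the reader and the state to the same snapshot.
        self.offset = self.last_checkpoint.offset
        self.counts = dict(self.last_checkpoint.counts)


def run_with_failure(log, checkpoint_every, fail_at):
    """Process the log, simulating one crash when the reader reaches fail_at."""
    pipeline = ToyPipeline(log)
    failed = False
    while pipeline.offset < len(log):
        if not failed and pipeline.offset == fail_at:
            failed = True
            pipeline.restore()  # work done since the last checkpoint is discarded
            continue
        pipeline.process_next()
        if pipeline.offset % checkpoint_every == 0:
            pipeline.checkpoint()
    return pipeline.counts


if __name__ == "__main__":
    records = ["a", "b", "a", "c", "a", "b"]
    print(run_with_failure(records, checkpoint_every=2, fail_at=3))
```

Even though the record at offset 2 is read twice (once before the simulated crash and once after), the final counts match a run without any failure, because the state that included the first read was thrown away on restore.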
~ Finally
That’s all for this week. I hope you enjoy what I put together. If you have ideas for what to cover, please comment on this post or ping me.