Interesting Stuff - Week 41

Oct 14, 2018 in ROUNDUP
data science distributed computing sql server sql server r services sql server machine learning services kafka flink azure serverless ml.net

Throughout the week, I read a lot of blog-posts, articles, and so forth, that have to do with things that interest me:

data science
data in general
distributed computing
SQL Server
transactions (both db as well as non db)
and other “stuff”

This blog-post is the “roundup” of the things that have been most interesting to me, for the week just ending.

Azure

A fast, serverless, big data pipeline powered by a single Azure Function. This blog post shows how to use Azure Functions, Azure's serverless compute offering, to build highly performant data pipelines. At work, we are looking at implementing something similar.

Distributed Computing

Everything is broken. This is a very cool post by Murat where he lists some very relevant quotes/statements from a recent Everything Is Broken meetup he attended. Some of the quotes I particularly liked:
- Without observability you don’t have chaos engineering, you just have chaos.
- You don’t know what you don’t know, so dashboards are of very limited utility. Dashboards are only for anticipated cases: every dashboard is an artifact of past failures. There are too many dashboards, and they are too slow.
- prerequisites for chaos engineering:
  1. monitoring & observability
  2. on-call & incident management
  3. know the cost of your downtime per hour (British Airways’ 1-day outage cost $150 million)
- How to choose a chaos experiment?
  - identify top 5 critical systems
  - choose 1 system
  - whiteboard the system
  - select attack: resource/state/network
  - determine scope
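To make the "cost of downtime per hour" prerequisite a bit more concrete, here is a back-of-the-envelope sketch. It simply takes the quoted $150 million at face value and assumes the outage lasted exactly 24 hours; the function name is illustrative only, not from the post.

```python
def downtime_cost_per_hour(total_cost: float, outage_hours: float) -> float:
    """Average cost of one hour of downtime, assuming the cost is spread evenly."""
    if outage_hours <= 0:
        raise ValueError("outage_hours must be positive")
    return total_cost / outage_hours


# Assumption: the quoted $150 million covers an outage of exactly 24 hours.
per_hour = downtime_cost_per_hour(150_000_000, 24)
print(f"${per_hour:,.0f} per hour")  # $6,250,000 per hour
```

In reality downtime cost is rarely linear in time, but even a crude per-hour figure gives you a number to weigh the cost of a chaos experiment against.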

Data Science

Announcing ML.NET 0.6 (Machine Learning .NET). Microsoft just released ML.NET 0.6, and this post highlights some of the new enhancements.

Streaming

Machine learning & Kafka KSQL stream processing — bug me when I’ve left the heater on. I like this post as it combines two of my favorite topics: streaming and machine learning. It shows how you can use Kafka and machine learning to monitor household power usage and raise an alert when something out of the ordinary occurs.
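The post combines Kafka, KSQL, and machine learning; the sketch below does not reproduce that setup. It swaps in a much simpler technique, a rolling z-score in plain Python, purely to illustrate the idea of flagging out-of-the-ordinary readings in a stream. The readings, window size, and threshold are made-up assumptions.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Iterable, Iterator, Tuple


def flag_anomalies(
    readings: Iterable[float], window: int = 5, threshold: float = 3.0
) -> Iterator[Tuple[int, float]]:
    """Yield (index, reading) for readings far outside the recent window.

    A reading is flagged when it lies more than `threshold` population
    standard deviations from the mean of the previous `window` readings.
    """
    recent: deque = deque(maxlen=window)
    for i, value in enumerate(readings):
        if len(recent) == window:
            mu = mean(recent)
            sigma = pstdev(recent)
            # Guard against a perfectly flat window (sigma == 0).
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                yield i, value
        recent.append(value)


# Hypothetical household power readings in watts; the spike mimics a heater left on.
power = [410, 395, 402, 398, 405, 400, 2400, 2390, 401, 399]
print(list(flag_anomalies(power)))  # [(6, 2400)]
```

Note that the second high reading (2390) is not flagged: the first spike has already entered the window and inflated its spread. That weakness is one reason a trained model, as in the post, can be the better choice.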
An introduction to ACID guarantees and transaction processing. A while ago, data Artisans introduced serializable, distributed ACID transactions directly on data streams in Flink. This post covers the foundations of that capability.
KSQL Recipes Available Now in the Stream Processing Cookbook. A post which introduces the “KSQL Cookbook”: a collection of “recipes” designed to help people build event-driven, real-time systems.
How Apache Flink manages Kafka consumer offsets. This post explains, with a step-by-step example, how Apache Flink works with Apache Kafka to ensure that records from Kafka topics are processed with exactly-once guarantees.
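The core idea is that Flink stores the Kafka consumer offsets inside its checkpoints together with the operator state, and on failure restores both and resumes reading from the checkpointed offsets. The toy model below imitates that in plain Python. It does not use any Flink or Kafka API, the `ToyJob` class and its methods are hypothetical, and it only illustrates exactly-once effects on internal state, not on external sinks.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ToyJob:
    """Hypothetical job that sums records from one partition with checkpointing."""

    partition: List[int]   # stands in for a Kafka topic partition
    offset: int = 0        # position of the next record to read
    total: int = 0         # operator state: running sum
    checkpoint: Optional[Dict[str, int]] = None

    def process(self, n: int) -> None:
        """Read and process up to n records, advancing the offset."""
        for _ in range(n):
            if self.offset >= len(self.partition):
                return
            self.total += self.partition[self.offset]
            self.offset += 1

    def take_checkpoint(self) -> None:
        # Offset and state are snapshotted together, so they always agree.
        self.checkpoint = {"offset": self.offset, "total": self.total}

    def fail_and_recover(self) -> None:
        # On failure, both the state and the read position roll back to the
        # last checkpoint; records after it are simply read again.
        assert self.checkpoint is not None, "no checkpoint to restore from"
        self.offset = self.checkpoint["offset"]
        self.total = self.checkpoint["total"]


job = ToyJob(partition=[1, 2, 3, 4, 5, 6])
job.process(2)           # reads 1, 2 -> total 3
job.take_checkpoint()    # checkpoint: offset 2, total 3
job.process(2)           # reads 3, 4 -> total 10 (not yet checkpointed)
job.fail_and_recover()   # back to offset 2, total 3
job.process(10)          # re-reads 3, 4, then 5, 6
print(job.total)         # 21: each record counted exactly once
```

Records 3 and 4 are read twice, but because the state was rolled back along with the offset, each one affects the sum only once.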

~ Finally

That’s all for this week. I hope you enjoy what I have put together. If you have ideas for what to cover, please comment on this post or ping me.

Blog Feed:

To automatically receive more posts like this, please subscribe to my RSS/Atom feed in your feed reader!