Throughout the week, I read a lot of blog posts, articles, and so forth that have to do with things that interest me:
- data science
- data in general
- distributed computing
- SQL Server
- transactions (both db and non-db)
- and other “stuff”
This blog post is the “roundup” of the things that have been most interesting to me for the week just ending.
Distributed Computing
- The many faces of consistency. A blog post by Murat where he dissects a white paper about consistency. The paper talks about two types of consistency: state and operation. Since Murat is currently on sabbatical at Microsoft (see below), he compares the two consistency types with what Cosmos DB provides. The post is a must-read if you are even the least bit interested in distributed computing and consistency.
Cloud / Big Data
- Globally replicated data lakes with LiveData using WANdisco on Azure. The post discusses how you can achieve globally replicated Azure Data Lakes. The replication can be hybrid (on-prem to Azure) as well as Azure to Azure.
- Cloud data and AI services training roundup August 2018. This post lists some free data and AI training sessions.
- Azure Cosmos DB. This post by Murat is about his first impressions of Azure Cosmos DB. Murat has taken a sabbatical and is spending a year with the Cosmos DB team at Microsoft. I look forward to more posts by Murat about Cosmos DB and other Azure-related topics.
Misc.
- HowTo - Docker on Windows. My mate and colleague, Charl, continues his blogging journey. This post covers how to run Docker on Windows.
Streaming
- Streaming Data Dominates: Over 2000 Developers Say “Only Batch” Is Almost Extinct. A survey by Lightbend (formerly known as Typesafe) makes it clear that developers are increasingly moving towards real-time processing as opposed to batch. That, my friends, is music to my ears!
- Apache Flink 1.6.0: What’s new in the latest Apache Flink release. What the title says; this is a post detailing some of the new features in the Flink 1.6.0 release.
- Getting Started with Apache Kafka and Kubernetes. A blog post about the work done to enable Kafka to run on Kubernetes. The post points to a white paper, Run Confluent Platform on Kubernetes Using Best Practices, which is really good!
- Kafka Streams in Action. A post about the upcoming book: Kafka Streams in Action. Apart from announcing the book, the post also contains the foreword to the book. This book is a must if you are interested in Kafka Streams!
- Building a Real-Time Attribution Pipeline with Databricks Delta. This blog post looks at how to use the Databricks DataFrame API to build Structured Streaming applications and how to use Databricks Delta to query the streams in near-real-time.
Data Science / AI
- Model Serving: Stream Processing vs. RPC/REST With Java, gRPC, Apache Kafka, TensorFlow. A short and sweet blog post comparing two ways of serving machine learning models: embedding them in stream processing applications versus calling a model serving infrastructure, like TensorFlow Serving.
- IEEE Language Rankings 2018. A post by David about the latest IEEE Spectrum language rankings.
- Scalable IoT ML Platform with Apache Kafka + Deep Learning + MQTT. This post describes a hybrid machine learning infrastructure leveraging Apache Kafka as a scalable central nervous system. Very interesting!
SQL Server Machine Learning Services
I have started on the third post in the Install R Packages in SQL Server ML Services series. I hope to be able to publish it in a week or so.
~ Finally
That’s all for this week. I hope you enjoy what I put together. If you have ideas for what to cover, please comment on this post or ping me.