Interesting Stuff - Week 31

Aug 5, 2018 in ROUNDUP
data science distributed computing sql server sql server r services sql server machine learning services kafka databricks databricks delta ai flink

Throughout the week, I read a lot of blog-posts, articles, and so forth, that have to do with things that interest me:

data science
data in general
distributed computing
SQL Server
transactions (both db as well as non db)
and other “stuff”

This blog-post is the “roundup” of the things that have been most interesting to me, for the week just ending.

.NET

Tiered Compilation Preview in .NET Core 2.1. A blog-post about a new feature in .NET Core 2.1: Tiered Compilation. Tiered Compilation allows .NET to have multiple compilations for the same method that can be hot-swapped at runtime. This should improve startup times drastically, as a method can be compiled quickly at first and then re-compiled with more optimisation once it turns out to be hot!
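To get a feel for what "multiple compilations for the same method, hot-swapped at runtime" means, here is a minimal toy sketch in Python. It is not how the .NET runtime implements the feature: the `TieredFunction` class, the two `sum_to_*` functions and the threshold of 3 calls are all made up for illustration.

```python
from typing import Callable


class TieredFunction:
    """Toy model of tiering: start with a cheap implementation and swap in
    an optimised one once the call count crosses a threshold."""

    def __init__(self, quick: Callable[[int], int],
                 optimised: Callable[[int], int],
                 threshold: int = 3) -> None:
        self._impl = quick            # tier 0: available immediately
        self._optimised = optimised   # tier 1: used once the method is "hot"
        self._threshold = threshold   # hypothetical value, for illustration only
        self.calls = 0
        self.tier = 0

    def __call__(self, n: int) -> int:
        self.calls += 1
        if self.tier == 0 and self.calls > self._threshold:
            # Hot-swap: this call and all later ones use the optimised version.
            self._impl = self._optimised
            self.tier = 1
        return self._impl(n)


def sum_to_quick(n: int) -> int:
    # Stands in for quickly generated, unoptimised code.
    total = 0
    for i in range(n + 1):
        total += i
    return total


def sum_to_optimised(n: int) -> int:
    # Stands in for code that takes longer to produce but runs faster.
    return n * (n + 1) // 2


sum_to = TieredFunction(sum_to_quick, sum_to_optimised, threshold=3)
results = [sum_to(10) for _ in range(5)]
```

The caller never notices the swap: every call returns the same result, and only the implementation behind the name changes once the function has been called often enough.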

Streaming

Apache Kafka - Whats That. This post about Kafka is by a colleague and a good friend of mine, Charl Lamprecht. In the post, he takes us through a very succinct overview of Kafka. Charl is “Mr Kafka” at Derivco, and he knows his “stuff”. Please be sure to follow his blog for more about Kafka (Charl, no pressure, hey?!).
Decoupling Systems with Apache Kafka, Schema Registry and Avro. An excellent post on how to decouple the systems you integrate via Kafka by using the Confluent Schema Registry. An added bonus in this post is that the code is .NET code!
A Practical Guide to Broadcast State in Apache Flink. This article discusses Broadcast State, a new feature in Apache Flink 1.5. With Broadcast State you can evaluate dynamic patterns on event streams by combining and jointly processing two streams of events in a specific way.
Introducing Confluent Platform 5.0. As the title says, this post introduces the latest version of Confluent Platform: 5.0. Lots and lots of new interesting features. Go and have a look!
Apache Kafka for Microservices: A Confluent Online Talk Series. This post is a link to a three-part online talk series which introduces fundamental concepts, use cases and best practices for getting started with microservices and Kafka.
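The Broadcast State idea from the Flink article above can be sketched without Flink at all. What follows is a plain Python simulation, not Flink's API: the `ParallelTask` class, the `run` function, the rule names and the thresholds are hypothetical. It assumes one low-volume stream of rules that is copied to every parallel task, and one stream of events that is partitioned by key and evaluated against whatever rules a task holds at that moment.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ParallelTask:
    """One parallel instance of the operator, holding its own copy of the rules."""
    rules: Dict[str, int] = field(default_factory=dict)  # rule name -> threshold
    matches: List[Tuple[str, str, int]] = field(default_factory=list)

    def on_rule(self, name: str, threshold: int) -> None:
        # Broadcast side: add or update a rule in the local state.
        self.rules[name] = threshold

    def on_event(self, user: str, amount: int) -> None:
        # Keyed side: evaluate the event against the rules known right now.
        for name, threshold in self.rules.items():
            if amount >= threshold:
                self.matches.append((name, user, amount))


def run(stream, parallelism: int = 2) -> List[ParallelTask]:
    tasks = [ParallelTask() for _ in range(parallelism)]
    for kind, payload in stream:
        if kind == "rule":
            # Every task receives every rule.
            for task in tasks:
                task.on_rule(*payload)
        else:
            user, amount = payload
            # Deterministic key partitioning: each event goes to one task only.
            index = sum(map(ord, user)) % parallelism
            tasks[index].on_event(user, amount)
    return tasks


stream = [
    ("event", ("alice", 500)),  # no rules yet, so nothing matches
    ("rule", ("large", 100)),
    ("event", ("bob", 150)),    # matches "large" (150 >= 100)
    ("event", ("alice", 50)),   # below the threshold
    ("rule", ("large", 40)),    # the rule changes while the job is running
    ("event", ("alice", 50)),   # now matches (50 >= 40)
]

tasks = run(stream)
all_matches = sorted(m for task in tasks for m in task.matches)
```

The point of the sketch is that the rules are data, not code: changing a rule is just another message on the broadcast stream, so the patterns can change without restarting anything.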

Big Data / Cloud

Processing Petabytes of Data in Seconds with Databricks Delta. In my roundups lately, I have covered Databricks Delta quite a bit and discussed how efficient it is at processing lots and lots of data. This blog post takes a look under the hood and examines what makes Databricks Delta capable of sifting through petabytes of data within seconds. If you, like me, are interested in knowing how “stuff” works under the covers, then this post is a must-read!
Databook: Turning Big Data into Knowledge with Metadata at Uber. This post is about Databook, Uber’s in-house platform that surfaces and manages metadata about the internal locations and owners of specific datasets, and allows Uber to turn data into knowledge.

Data Science / AI

Video: How to run R and Python in SQL Server from a Jupyter notebook. A short post by David linking to a video showing how to run Python and R from inside SQL Server.
The InfoQ eMag: Real-World Machine Learning: Case Studies, Techniques and Risks. An InfoQ link to an eMag focusing on the current landscape of machine-learning technologies and real-world case studies of applied machine learning.
3 Steps to Build Your First Intelligent App – Conference Buddy. A blog-post which takes us through how to build an application utilising AI.

SQL Server Machine Learning Services

sp_execute_external_script and SQL Compute Context - III. I finally managed to finish and publish the third post in the sp_execute_external_script and SQL Server Compute Context series. In this post we use WinDbg, Process Monitor and Wireshark to look in detail at what happens in SQL Server when we use RxSqlServerData to pull data.

~ Finally

That’s all for this week. I hope you enjoyed what I put together. If you have ideas for what to cover, please comment on this post or ping me.

Blog Feed:

To automatically receive more posts like this, please subscribe to my RSS/Atom feed in your feed reader!
