Throughout the week, I read a lot of blog-posts, articles, and so forth, that have to do with things that interest me:
- data science
- data in general
- distributed computing
- SQL Server
- transactions (both db and non-db)
- and other “stuff”
This blog-post is the “roundup” of the things that have been most interesting to me, for the week just ending.
.NET
- Profiling .NET Code with BenchmarkDotNet. If you want to benchmark your .NET code, you probably use BenchmarkDotNet (if you do not, you should). The man behind BenchmarkDotNet is Adam Sitnik, and in the linked blog post he announces how you, soon, can use the EtwProfiler to profile benchmarked code! Very cool!
Databases
- The design and implementation of modern column-oriented database systems. In this post, Adrian dissects a white paper about column-oriented databases. Having worked a little bit with SQL Server’s column store indexes, it is very cool to get the “lowdown” on the design behind it.
Azure Cloud
- Azure Databricks – Delta in preview, 9 regions added, and other exciting announcements. A blog post announcing that Azure Databricks Delta is available in preview. This is very interesting since I have been “chomping at the bit” to do some tests with Databricks Delta.
- Spark Debugging and Diagnosis Toolset for Azure HDInsight. This post is another announcement from Microsoft. This time the news is that the Spark Debugging and Diagnosis Toolset for Azure HDInsight is now available in preview. The toolset allows you to identify low parallelization, to detect data skew and run data skew analysis, and quite a lot more.
Streaming
- Real-Time Presence Detection at Scale with Apache Kafka on AWS. This post discusses how Zenreach has implemented a framework for real-time presence detection, using Kafka Streams.
- State TTL for Apache Flink: How to Limit the Lifetime of State. Instead of me summarising the post, I shamelessly copy the opening paragraph: “A common requirement for many stateful streaming applications is the ability to control how long application state can be accessed (e.g., due to legal regulations like GDPR) and when to discard it. This blog post introduces the state time-to-live (TTL) feature that was added to Apache Flink with the 1.6.0 release.” It is very, very interesting. I need to start playing around with Flink!
- Troubleshooting KSQL – Part 1: Why Isn’t My KSQL Query Returning Data?. The obligatory Kafka link. The post is the first in a series on how to troubleshoot KSQL. This and future posts in the series are, and will be, required reading for our Kafka team!
SQL Server
- Azure Data Studio for SQL Server. A post by Vicky Harp. Vicky is Principal Program Manager Lead at Microsoft for SQL Server tooling, and in the post, she introduces Azure Data Studio (the artist formerly known as SQL Operations Studio). Azure Data Studio is a new cross-platform desktop environment for both on-premises and cloud data platforms on Windows, macOS, and Linux.
- SQL Server 2019 preview combines SQL Server and Apache Spark to create a unified data platform. An announcement by Microsoft that SQL Server 2019 comes with support for both Spark and the Hadoop Distributed File System (HDFS). We do live in exciting times!
- Introducing Microsoft SQL Server 2019 Big Data Clusters. This post builds on top of the post above. It discusses how we can create big data clusters utilising the support for Spark and HDFS in SQL Server 2019.
- What is New in SQL Server 2019 Public Preview. A post by yours truly. I take a quick look at what is new in SQL Server 2019, and I especially look at the Java language extension.
- SQL Server 2019 for Linux in Docker on Windows. Another post by myself. Since SQL Server 2019 for Linux now has support for SQL Server Machine Learning Services, I want to have a look at how it works. For that I obviously need it installed, and I decided to install it as a Docker for Windows container. The post walks through what I did to get it installed. The post also discusses Azure Data Studio briefly.
~ Finally
That’s all for this week. I hope you enjoy what I put together. If you have ideas for what to cover, please comment on this post or ping me.