Geez, does time fly or what? We are already past the halfway mark of the year! Anyway, throughout the week, I read a lot of blog-posts, articles, and so forth, that have to do with things that interest me:
- data science
- data in general
- distributed computing
- SQL Server
- transactions (both db and non-db)
- and other “stuff”
This blog-post is the “roundup” of the things that have been most interesting to me, for the week just ending.
Databases
- A one size fits all database doesn’t fit anyone. A post by Werner Vogels, CTO at Amazon, where he argues that (from the article): “The days of the one-size-fits-all monolithic database are behind us, and developers are now building highly distributed applications using a multitude of purpose-built databases.” A very interesting read!
Streaming
- We ❤ syslogs: Real-time syslog processing with Apache Kafka and KSQL—Part 3: Enriching events with external data. This article is the third in the series about syslog processing and Apache Kafka. In this episode, Robin Moffatt discusses how the inbound streams of syslog data can be enriched with external data. Awesome article!
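To give a feel for what “enriching” a stream means, here is a minimal sketch in plain Python. It is not the KSQL approach from the article; it only illustrates the general idea of joining each inbound event against reference data. The `HOST_INFO` table, the field names, and the `enrich` helper are all made up for illustration.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical reference data: maps a host name to descriptive attributes.
HOST_INFO: Dict[str, Dict[str, str]] = {
    "ap-01": {"site": "office", "type": "access point"},
    "rtr-01": {"site": "datacentre", "type": "router"},
}


def enrich(events: Iterable[Dict[str, str]],
           lookup: Dict[str, Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Yield each event merged with the lookup attributes for its host.

    Events whose host is not in the lookup pass through unchanged,
    which is roughly what a left join would do.
    """
    for event in events:
        extra = lookup.get(event["host"], {})
        yield {**event, **extra}


# Hypothetical inbound syslog-like events.
events = [
    {"host": "ap-01", "message": "client associated"},
    {"host": "unknown-9", "message": "link down"},
]

enriched = list(enrich(events, HOST_INFO))
for row in enriched:
    print(row)
```

In a real streaming setup the lookup data would itself change over time, which is the kind of problem a stream-to-table join takes care of.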
Big Data / Cloud
- Structured streaming with Azure Databricks into Power BI & Cosmos DB. A post discussing the concept of Structured Streaming and how a data ingestion path can be built using Azure Databricks to enable the streaming of data in near-real-time. The post also talks about how Databricks can be connected directly to Power BI for reporting, and to Cosmos DB for persistence.
- The emerging big data architectural pattern. A very interesting blog post discussing the popularity and success of the Lambda architecture, as well as some of its shortcomings. The post then goes on to talk about how some of those shortcomings can be solved by the use of Azure and Azure Cosmos DB. In essence, the post discusses how we can implement the Kappa architecture in Azure.
- A closer look at Azure Data Lake Storage Gen2. Microsoft recently announced Azure Data Lake Storage Gen2 and this post discusses some of the new features and capabilities.
Data Science
- Announcing RStudio and Databricks Integration. This post announces the integration of RStudio with the Databricks Unified Analytics Platform. Databricks has been “popping up” all over the place lately. I really need to look into it!
- The Data Analysis Maturity Model – Level Three: Distributed, consistent reporting systems. The third “episode” in Buck Woody’s series about data analysis maturity levels. In this post, Buck talks about distributed and consistent reporting systems.
SQL Server Machine Learning Services
- Installing R Packages in SQL Server Machine Learning Services - II. I published part 2 of the Installing R Packages in SQL Server Machine Learning Services series. In this post, I discuss how to use functionality in RevoScaleR to install packages on a remote SQL Server ML Services instance.
~ Finally
That’s all for this week. I hope you enjoy what I put together. If you have ideas for what to cover, please comment on this post or ping me.