Interesting Stuff - Week 5

Feb 5, 2017 in ROUNDUP
data science, distributed computing, sql server, streaming, kafka, r, sql server on linux, azure stream analytics, managed data

Throughout the week, I read a lot of blog posts, articles, etc., that have to do with things that interest me:

data science
data in general
distributed computing
SQL Server
transactions (both db and non-db)
and other “stuff”

This is the “roundup” of the posts that have been most interesting to me this week.

Distributed Computing

Life Beyond Distributed Transactions. An excellent piece about distributed transactions in large-scale systems. As a side note, queue.acm.org is a goldmine if you are interested in papers related to enterprise computing.
How Uber Manages a Million Writes Per Second Using Mesos and Cassandra Across Multiple Datacenters. A very interesting post about how Uber has designed its systems.
The Infrastructure Behind Twitter: Scaling Networking, Storage and Provisioning. Similar to the post above, but this time about Twitter. Some interesting takeaways:
- There is no such thing as a “temporary change or workaround”. In most cases, workarounds are technical debt.
- Architect beyond the original specifications and requirements.

Data Science

Real Time Credit Card Fraud Detection with Apache Spark and Event Streaming. A post that shows you how to build a real-time solution for credit card fraud detection.
Introduction to Machine Learning with Python. The first part in a series about machine learning.
THE YEAR IN SQL ENGINES. This is not about relational databases, but a roundup of various SQL engines for data science and big data.
fst: Fast serialization of R data frames. A new R package for serialization of data.
A look back at the year in R and Microsoft. Looking at what happened in 2016 in R and Microsoft (related to machine learning).

Streaming

Streaming Live Data and the Hadoop Ecosystem. A very interesting presentation about Hadoop and streaming of data in Hadoop.
New in Azure Stream Analytics: Geospatial functions, Custom code and lots more! Microsoft has just released new features and functionality for Azure Stream Analytics (ASA). I have played around with the Visual Studio tools for ASA, and they rock!

SQL Server

JSON data in clustered column store indexes. Jovan has written a really nice post about how Clustered Column Store indexes can give you compression and query performance benefits for JSON data stored in SQL Server.
How to determine what causes a particular wait type. A post by Paul Randal from 2014 about how to find out when and why wait types occur.

Finally, two more posts by Bob Dorr about SQL Server and Linux:

SQL Server on Linux: An LLDB Debugging Tale. What Microsoft did in order to be able to debug SQL Server running on Linux.
SQL Server on Linux: Scatter/Gather == Vectored I/O. How scatter/gather I/O is done on Linux.

That’s all for this week. I hope you enjoy what I put together. If you have ideas for what to cover, please comment on this post or ping me.

Blog Feed:

To automatically receive more posts like this, please subscribe to my RSS/Atom feed in your feed reader!
