Throughout the week, I read a lot of blog posts, articles, etc., that have to do with things that interest me:
- data science
- data in general
- distributed computing
- SQL Server
- transactions (both database and non-database)
- and other “stuff”
This is the “roundup” of the posts that have been most interesting to me this week.
Distributed Computing
- Life Beyond Distributed Transactions. An excellent piece about distributed transactions in large-scale systems. As a side note, queue.acm.org is a goldmine if you are interested in papers related to enterprise computing.
- How Uber Manages a Million Writes Per Second Using Mesos and Cassandra Across Multiple Datacenters. A very interesting post about how Uber has designed its systems.
- The Infrastructure Behind Twitter: Scaling Networking, Storage and Provisioning. Similar to the post above, but this time about Twitter. Some interesting takeaways:
  - There is no such thing as a “temporary change or workaround”. In most cases, workarounds are technical debt.
  - Architect beyond the original specifications and requirements.
Data Science
- Real Time Credit Card Fraud Detection with Apache Spark and Event Streaming. A post showing how to build a real-time solution for credit card fraud detection.
- Introduction to Machine Learning with Python. The first part in a series about machine learning.
- THE YEAR IN SQL ENGINES. This is not about relational databases, but a roundup of various SQL engines for data science and big data.
- fst: Fast serialization of R data frames. A new R package for serialization of data.
- A look back at the year in R and Microsoft. A look at what happened in 2016 with R at Microsoft (related to machine learning).
Streaming
- Streaming Live Data and the Hadoop Ecosystem. A very interesting presentation about Hadoop and streaming data in Hadoop.
- New in Azure Stream Analytics: Geospatial functions, Custom code and lots more! Microsoft has just released new features and functionality for Azure Stream Analytics (ASA). I have played around with the Visual Studio tools for ASA, and they rock!
SQL Server
- JSON data in clustered column store indexes. Jovan has written a really nice post about how clustered column store indexes can give you compression and query performance benefits for JSON data stored in SQL Server.
- How to determine what causes a particular wait type. A post by Paul Randal from 2014 about how to find out when and why wait types occur.
Finally, two more posts by Bob Dorr about SQL Server and Linux:
- SQL Server on Linux: An LLDB Debugging Tale. What Microsoft did in order to be able to debug SQL Server running on Linux.
- SQL Server on Linux: Scatter/Gather == Vectored I/O. How scatter/gather I/O is done on Linux.
That’s all for this week. I hope you enjoy what I put together. If you have ideas for what to cover, please comment on this post or ping me.