Throughout the week, I read a lot of blog posts, articles, etc., that have to do with things that interest me:
- data science
- data in general
- distributed computing
- SQL Server
- transactions (both database and non-database)
- and other “stuff”
This is the “roundup” of the posts that have been most interesting to me this week.
This post is a little late, as I was in Cape Town during the weekend and gave a talk at satRday. The conference was really good; great job by Andrew Collier for arranging it. During the week I’ll put the code for my talk up on GitHub.
Data Science
- 6 Deep Learning Applications a beginner can build in minutes (using Python). Interesting article trying to de-mystify Deep Learning.
- Deep Learning in R. More about Deep Learning. This talks about various R packages for Deep Learning.
- Real-World, Man-Machine Algorithms. This article, which is part of the InfoQ series An Introduction To Machine Learning, talks about the end-to-end flow of developing machine learning models: where to get training data, how to pick the ML algorithm, and so forth.
- Performance improvements coming to R 3.4.0. Talks about what can be expected in the new R release 3.4, scheduled for March.
- RedQueen: An online algorithm for smart broadcasting in social networks. From the morning paper. This is about how algorithms can be used to find the optimal time to post on social networks.
Streaming
- Spark is the Future of Analytics. Interesting analysis of Spark.
- Kafka Streams - how does it fit the stream processing landscape? Post about Kafka Streams, a library for transforming and combining data streams in Kafka.
- Towards a real-time streaming architecture. How Sky Betting & Gaming went with Kafka for real-time streaming.
- User Activity Events using Kafka Streams. More about Kafka and Kafka Streams. How to enrich an incoming stream of events with side data, and then compute aggregations based on the enriched stream.
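As a rough illustration of the enrich-then-aggregate idea in the last link, here is a minimal plain-Python sketch. It does not use Kafka or the Kafka Streams API, and it is not the code from the linked post; the event shape, the `USER_REGIONS` lookup table, and the function names are all assumptions made up for this example.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator

# Hypothetical side data: user id -> region. In a real pipeline this
# would be a lookup table or a compacted topic, not a hard-coded dict.
USER_REGIONS: Dict[str, str] = {"u1": "EU", "u2": "US", "u3": "EU"}


def enrich(events: Iterable[dict], side_data: Dict[str, str]) -> Iterator[dict]:
    """Attach a region to each event, falling back to 'unknown'."""
    for event in events:
        region = side_data.get(event["user_id"], "unknown")
        yield {**event, "region": region}


def count_by_region(enriched: Iterable[dict]) -> Dict[str, int]:
    """Aggregate the enriched stream: number of events per region."""
    counts: Counter = Counter()
    for event in enriched:
        counts[event["region"]] += 1
    return dict(counts)


# Hypothetical incoming user activity events.
events = [
    {"user_id": "u1", "action": "click"},
    {"user_id": "u2", "action": "view"},
    {"user_id": "u3", "action": "click"},
    {"user_id": "u1", "action": "view"},
    {"user_id": "u9", "action": "click"},  # no side data for this user
]

result = count_by_region(enrich(events, USER_REGIONS))
print(result)
```

The enrichment step is a generator, so events are processed one at a time, which is roughly how a stream join against a lookup table behaves; a real stream processor would also handle windowing and state, which this sketch leaves out.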
Big Data & Databases
- Petabytes Scale Analytics Infrastructure @Netflix. About Netflix’s overall big data platform architecture, focusing on storage and orchestration.
- Spanner, the Google Database That Mastered Time, Is Now Open to Everyone. About Google Spanner, a database that can span multiple geo-locations and still be seen as one instance.
That’s all for this week. I hope you enjoy what I put together. If you have ideas for what to cover, please comment on this post or ping me.