A Byte of Coding Issue 410

Hi,

I missed yesterday’s issue because I had a late start and honestly didn’t find anything I thought was worth sharing.

Unrelated to programming, but very much on the data side of things: we launched our "use our data" page for FindEnergy yesterday. We publish electricity rate and production data on a monthly basis, along with a bunch of other data, and we're launching live power outage monitoring as well. If you think your company would be interested in purchasing this data, let me know! There are lots of useful example applications on the page I linked.

Anyway, here’s the issue.

Published: 28 May 2024

Tags: data processing, data, parquet

Trevor Hilton discusses the use of Bloom filters in Parquet files to improve query performance, especially for high-cardinality data. The article details the effectiveness and storage impact of Bloom filters with various parameters.

Some highlights:

  • Moderate Bloom filter settings (false-positive probability, FPP, of 0.01 and number of distinct values, NDV, of 1,000) optimize pruning efficiency, reducing query time significantly.

  • Bloom filters added storage costs ranging from 2 KB to 8 KB per column per row group.

  • Fine-tuning Bloom filter parameters to match data cardinality resulted in high storage penalties without significant pruning benefits.
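
To make the FPP and NDV knobs a bit more concrete, here is a minimal sketch of a textbook Bloom filter in Python and of how a reader could use one filter per row group to skip row groups. It is not Parquet's actual filter format and not the code from the article: the class, the function names, and the SHA-256 double-hashing scheme are all assumptions made for illustration.

```python
import hashlib
import math


def bloom_size_bits(ndv: int, fpp: float) -> int:
    """Textbook sizing: m = -n * ln(p) / (ln 2)^2, rounded up to whole bits."""
    return math.ceil(-ndv * math.log(fpp) / (math.log(2) ** 2))


def bloom_num_hashes(bits: int, ndv: int) -> int:
    """Textbook hash count: k = (m / n) * ln 2, at least one."""
    return max(1, round(bits / ndv * math.log(2)))


class BloomFilter:
    """Illustrative filter; names and hashing scheme are hypothetical."""

    def __init__(self, ndv: int, fpp: float) -> None:
        self.bits = bloom_size_bits(ndv, fpp)
        self.hashes = bloom_num_hashes(self.bits, ndv)
        self.array = bytearray((self.bits + 7) // 8)

    def _positions(self, value: str):
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so steps vary
        for i in range(self.hashes):
            yield (h1 + i * h2) % self.bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.array[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.array[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(value)
        )


def row_groups_to_scan(filters: list, value: str) -> list:
    """Return indexes of row groups whose filter cannot rule the value out."""
    return [i for i, f in enumerate(filters) if f.might_contain(value)]
```

With FPP 0.01 and NDV 1,000 the textbook formula asks for a little under 1.2 KB of bits per filter. Parquet sizes its filters differently, so this will not line up with the 2 KB to 8 KB figures above, but the trade-off is the same: a lower FPP or higher NDV buys fewer wasted row group reads at the cost of more bytes per column per row group.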

Published: 23 May 2024

Tags: api, http, infosec, architecture

Juhani Eronen argues against redirecting HTTP API requests to HTTPS, advocating for a fail-fast approach to improve security. The post highlights the risks of silent redirects and suggests disabling HTTP interfaces or returning clear error responses.

Some highlights:

  • Redirecting HTTP API calls to HTTPS can expose API keys in plaintext.

  • Fail-fast by disabling HTTP or returning errors to make security issues visible early.

  • Automatic revocation of API keys sent over HTTP enhances security.
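
As a rough sketch of what "fail fast instead of redirect" could look like, here is a small Python handler for the plaintext port, using only the standard library. The 403 status, the `Bearer` header format, the in-memory `REVOKED_KEYS` set, and every function name are assumptions for illustration, not something prescribed by the post.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory revocation list; a real service would persist this.
REVOKED_KEYS = set()


def reject_plaintext(headers) -> tuple:
    """Decide the response for any request that arrived over plain HTTP.

    Never redirects. Any API key seen here has already crossed the network
    unencrypted, so it is treated as compromised and revoked.
    """
    auth = headers.get("Authorization", "")
    if auth.startswith("Bearer "):
        REVOKED_KEYS.add(auth[len("Bearer "):])
    body = {"error": "HTTPS required. Any API key sent with this request has been revoked."}
    return 403, body


class PlainHttpRejectHandler(BaseHTTPRequestHandler):
    """Listens on the plaintext port only to refuse requests loudly."""

    def _reject(self) -> None:
        status, body = reject_plaintext(self.headers)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    # Same answer for every common method.
    do_GET = do_POST = do_PUT = do_PATCH = do_DELETE = _reject

    def log_message(self, format, *args) -> None:
        pass  # keep the sketch quiet


def make_plain_http_server(port: int = 8080) -> HTTPServer:
    """Build (but do not start) the rejecting server; the port is an arbitrary choice."""
    return HTTPServer(("127.0.0.1", port), PlainHttpRejectHandler)
```

Calling `make_plain_http_server().serve_forever()` would start it. The point of the design is that a client misconfigured with an `http://` URL breaks on the first request, rather than quietly working through a redirect while leaking its key every time.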

Published: 27 May 2024

Tags: seo, architecture

Not super technical, but we depend on SEO a good bit and I found it an interesting read. Mike King analyzes leaked Google internal documentation revealing insights into Google Search's ranking systems and data storage. He discusses the implications of these findings on SEO practices and Google's transparency.

Some highlights:

  • The leak exposes over 14,000 attributes and 2,596 modules used by Google Search.

  • Contrary to Google's public statements, the documentation shows the use of "siteAuthority" and click-based metrics in rankings.

  • The documentation reveals measures for sandboxing new sites and utilizing Chrome data for ranking.

Thanks for your Support!

Big thanks to all of the Patreon supporters and company sponsors. If you want to support the newsletter, you can check out the Patreon page. It's not necessary, but it lets me know that I'm doing a good job and that you're finding value in the content.