State of Data Engineering Q3 2024

Here is this quarter’s State of Data Engineering newsletter. There is only a little chat about AI this time; the focus is on Open Table Formats, the Apache Iceberg REST spec, and new updates in the Amazon data engineering ecosystem.

Prompt Engineering – Meta Analysis Whitepaper


One of my favorite AI podcasts, Latent Space, recently featured Sander Schulhoff, one of the authors of a comprehensive research paper on prompt engineering. This meta-study reviews over 1,600 published papers, with co-authors from OpenAI, Microsoft, and Stanford. 
 
[podcast] 
https://www.latent.space/p/learn-prompting 

 
[whitepaper] 
https://arxiv.org/abs/2406.06608 

 
The whitepaper is an interesting academic deep dive into prompting and how to increase its quality through exemplars (examples provided to an LLM), though not too many of them (more than 20 hurts quality). It also covers stranger territory, such as using Minecraft agents as tools to understand how this ecosystem works.
 
Other practical tips include asking for data in JSON or XML, which is generally more accurate, and formatting results the way the LLM’s training data is formatted.  That does kind of lead to a problem if you don’t know what the training data looks like.
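As a rough illustration of those two tips, here is a minimal Python sketch that caps the number of exemplars and asks for JSON output. The function name, the prompt wording, and the cap constant are my own assumptions for illustration; they are not taken from the paper.

```python
import json

# Assumption: cap taken from the newsletter's summary that more than
# 20 exemplars tends to hurt quality.
MAX_EXEMPLARS = 20


def build_few_shot_prompt(task, exemplars, question, max_exemplars=MAX_EXEMPLARS):
    """Assemble a few-shot prompt that asks the model to answer in JSON.

    `exemplars` is a list of (input_text, expected_answer) pairs.
    """
    kept = exemplars[:max_exemplars]  # drop any exemplars beyond the cap
    lines = [task, 'Respond only with JSON of the form {"answer": "..."}.', ""]
    for example_input, example_answer in kept:
        lines.append(f"Input: {example_input}")
        # Show each exemplar in the same JSON shape we want back.
        lines.append("Output: " + json.dumps({"answer": example_answer}))
        lines.append("")
    lines.append(f"Input: {question}")
    lines.append("Output:")
    return "\n".join(lines)
```

Calling `build_few_shot_prompt("Classify the sentiment.", pairs, "meh")` with 25 pairs would keep only the first 20 and end the prompt with an open `Output:` line for the model to complete.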
 
It provides a wide range of tools outside of the two common ones we use the most: chain of thought, which prompts the model to reason step by step, and retrieval augmented generation (RAG).
 
We can expect that in the next couple of years LLMs themselves will integrate these workflows so we don’t have to care about them anymore, but if you want to squeeze out some better performance, this paper is worth reading.

Open Table Format Wars – Continued

As a quick refresher, the history of data engineering in 30 seconds goes kind of like this:

  1. 1980s – big data warehouses exist.  SQL is lingua franca
  2. 2000s – Apache Hadoop ecosystem comes out to address limitations of data warehouses to cope with size and processing
  3. 2010s – Data lakes emerge, with data kept in cloud storage (e.g. Amazon S3)
  4. 2020ish – Data lakehouses, or transactional data lakes, come out to address the inability of data lakes to be ACID compliant
  5. 2023 –  Consensus emerges over the term Open Table Format (OTF) with three contenders
    • Apache Hudi
    • Databricks Delta Lake
    • Apache Iceberg
  6. Mid 2024
    • June 3, 2024 – Snowflake announces Polaris catalog support for Apache Iceberg
    • June 4, 2024 – Databricks buys Tabular (thereby bringing in the founders of Apache Iceberg)

Historically, we see a major shift in technology about every 20 years, with older systems being overhauled to meet new paradigms. Consider the companies that fully embraced Apache Hadoop in the 2000s—they’re now in the process of rebuilding their systems. Right now we are in the middle of the maturing of open table formats.

Data has always been kind of challenging to deal with because data is messy by nature. Moving data from one system to another seems simple, but it is quite a bit of work; as we know, most ETL is rarely straightforward once you take into account SLAs, schema changes, data volumes, etc.

OTFs really matter for us when we deal with big data, and especially for extremely large data (think Uber or Netflix size data).  Databases usually can handle the blue and green without problem, but break at the yellow and red.

When working with your data platform, these are key questions you should be asking to help in refining your technology stack.

  • How much data is being processed (are we talking hundreds of gigabytes, terabytes, or petabytes?)
  • What is the SLA within which the data needs to be queried?
  • What is the existing data footprint in your organization (are you using a lot of MySQL, Microsoft, etc)?
  • Does the organization have the capability to own the engineering effort of an OTF platform?
  • Do any of the customer’s data sources work for ZeroETL (like Salesforce, Aurora MySQL/Postgres, RDS)?
  • Is the customer already using Databricks, Hudi, Snowflake, Iceberg, Redshift, or Big Query?

The Future: Interoperability via the Apache Iceberg Catalog API

Apache Iceberg, which emerged from Netflix, has been making a lot of news lately. From the acquisition of Tabular (basically the guys who founded Iceberg), to Snowflake open sourcing the Polaris catalog, to Databricks support in private preview, many signs are pointing to a more cross compatible future if certain conditions are met.

In this article

https://www.snowflake.com/en/blog/introducing-polaris-catalog/

There is a pretty important diagram showing cross compatibility across AWS, Azure, and Google Cloud. We aren’t there yet, but if all 3 vendors move towards implementing the Apache Iceberg REST Catalog API spec, cross federated querying will be possible.
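To make the interoperability idea a bit more concrete, here is a small Python sketch that builds the URL a client would call to load a table from any catalog implementing the spec. The path layout (`/v1/{prefix}/namespaces/{namespace}/tables/{table}`, with multi-level namespaces joined by the 0x1F unit separator) is my reading of the Iceberg REST catalog spec; the host names and table names are hypothetical.

```python
from urllib.parse import quote


def load_table_url(base_url, namespace_levels, table, prefix=""):
    """Build the 'load table' URL for an Iceberg REST catalog.

    Assumption: path layout as I read it from the REST catalog spec.
    `namespace_levels` is a list such as ["analytics", "events"].
    """
    # Multi-level namespaces are joined with the unit separator (0x1F),
    # which is then percent-encoded as %1F.
    namespace = quote("\x1f".join(namespace_levels), safe="")
    parts = ["v1"]
    if prefix:  # the prefix is optional and catalog-specific
        parts.append(quote(prefix, safe=""))
    parts += ["namespaces", namespace, "tables", quote(table, safe="")]
    return base_url.rstrip("/") + "/" + "/".join(parts)


# Hypothetical catalog endpoints on two different clouds: the same
# client code produces a valid request path for each of them.
for host in ("https://catalog.cloud-a.example.com", "https://catalog.cloud-b.example.com"):
    print(load_table_url(host, ["analytics", "events"], "clicks"))
```

The point of the sketch is that nothing in the client changes between vendors except the base URL, which is what would make cross-cloud querying practical.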

I’m hopeful, because ETL’ing data from one place to another place has always been a huge hassle. This type of future really opens up interesting workloads where compute really can be separate even from your cloud.

Everything is a little strange to me, because moving towards the future really isn’t a technology problem, but more of a political one: it depends on whether each cloud chooses to move in that direction. We are getting signs, but I would say by this time next year, we will learn the intentions of all players. Meanwhile, stay tuned.


New emerging technology: DuckDB

DuckDB was created in 2018 and is a fast in-process analytical database.  There is a hosted version called MotherDuck, which is a serverless offering. DuckDB takes a different approach: you run analysis on a large data set either via a CLI or from your favorite programming language, so the compute runs closer to your application itself.


Article: Running Iceberg and Serverless DuckDB in AWS

https://www.definite.app/blog/cloud-iceberg-duckdb-aws

This article shows DuckDB querying Iceberg tables stored in S3.  As an alternative, it also describes deploying DuckDB in a serverless environment on ECS, using custom containers that are queried via HTTP requests.

I expect AWS to take more notice and integrate DuckDB into the ecosystem in the next couple of years.

ChatGPT even has a DuckDB analyst ready

https://chatgpt.com/g/g-xRmMntE3W-duckdb-data-analyst

Use Cases:

  • Say you have a lot of log data in EC2.  Typically, you would load it into S3 and query via Athena.  Instead, you could keep the data on EC2 and run a DuckDB instance there, where you can query it without penalty for exploration
  • Preprocessing and pre-cleaning of user-generated data for machine learning training
  • Any type of system that previously used SQLite
  • Exploration of any data set that is on your laptop – this one is a no brainer.
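For the log exploration use case, a minimal Python sketch might look like the following. It assumes the `duckdb` Python package is installed and that the logs are CSV files with a `status` column; both the column name and the file path are hypothetical.

```python
def build_log_query(log_glob, limit=10):
    """Build a SQL query that counts log lines per status code.

    Assumption: the log files are CSV and contain a `status` column.
    """
    safe_glob = log_glob.replace("'", "''")  # escape single quotes for SQL
    return (
        "SELECT status, count(*) AS hits "
        # read_csv_auto is DuckDB's CSV reader with automatic schema detection
        f"FROM read_csv_auto('{safe_glob}') "
        "GROUP BY status ORDER BY hits DESC "
        f"LIMIT {int(limit)}"
    )


def run_log_query(log_glob, limit=10):
    """Run the query in-process; no server or cluster is involved."""
    import duckdb  # third-party package: pip install duckdb

    connection = duckdb.connect()  # in-memory database inside this process
    return connection.execute(build_log_query(log_glob, limit)).fetchall()
```

Something like `run_log_query("/var/log/app/*.csv")` would scan the files where they sit, which is the appeal: the compute comes to the data rather than the data being shipped to S3 first.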

Iceberg Updates



[Article]: The AWS Glue Data Catalog now supports storage optimization of Apache Iceberg tables


TLDR: The AWS Glue Data Catalog now supports automatic snapshot expiration and orphan file removal for Iceberg tables
https://aws.amazon.com/blogs/big-data/the-aws-glue-data-catalog-now-supports-storage-optimization-of-apache-iceberg-tables/

Amazon previously tackled the issue of small file accumulation in Apache Iceberg tables by introducing automatic compaction. This feature consolidated small files into larger ones, improving query performance and reducing metadata overhead, ultimately optimizing storage and enhancing analytics workloads.

Building on this, Amazon has now released a new feature in AWS Glue Data Catalog that automatically deletes expired snapshots and removes orphan files. This addition helps control storage costs and maintain compliance with data retention policies by cleaning up unnecessary data, offering a more complete solution for managing Iceberg tables efficiently.
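To illustrate what those two cleanup steps mean, here is a toy Python model of a table's snapshots. It is a conceptual sketch only: the class and function names are made up for illustration and are not the Glue or Iceberg API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    timestamp_ms: int
    files: frozenset  # data files this snapshot references


def referenced_files(snapshots):
    """Union of all data files referenced by the given snapshots."""
    files = set()
    for snap in snapshots:
        files |= snap.files
    return files


def expire_snapshots(snapshots, older_than_ms, retain_last=1):
    """Drop snapshots older than the cutoff, always keeping the newest ones.

    Returns (kept_snapshots, files_freed), where files_freed are the files
    that only expired snapshots referenced and can now be deleted.
    """
    ordered = sorted(snapshots, key=lambda s: s.timestamp_ms)
    protected = {s.snapshot_id for s in ordered[-retain_last:]} if retain_last > 0 else set()
    kept = [s for s in ordered if s.timestamp_ms >= older_than_ms or s.snapshot_id in protected]
    freed = referenced_files(ordered) - referenced_files(kept)
    return kept, freed


def find_orphans(files_in_storage, snapshots):
    """Files sitting in storage that no snapshot references at all."""
    return set(files_in_storage) - referenced_files(snapshots)
```

In this model an orphan is a file that no snapshot ever pointed at (for example, left behind by a failed write), while expiring a snapshot frees the files that only that snapshot kept alive.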

[Feature]: Accelerate query performance with Apache Iceberg statistics on the AWS Glue Data Catalog

TLDR: If you want faster queries on Iceberg tables, run the table statistics feature for a potential 24% to 83% improvement in query time

https://aws.amazon.com/blogs/big-data/accelerate-query-performance-with-apache-iceberg-statistics-on-the-aws-glue-data-catalog

Column-level statistics are a method for enhancing the query performance of Iceberg tables in Amazon Redshift Spectrum and Athena. These statistics are based on the Puffin file format and allow query engines to optimize SQL operations more effectively. You can enable this feature via the AWS Console or by running an AWS Glue job. According to the performance data in the blog post, improvements range from 24% to 83%, simply by running a job to store metadata.

Summary:

  • Use this if you have large datasets and need consistent query performance. Small datasets may not benefit enough to justify the effort.
  • Be aware of the overhead involved in running and maintaining statistics jobs.
  • Since data will likely change over time, you should set up automated jobs to periodically regenerate the statistics to maintain performance gains. While manual effort is required now, this feature could be more integrated into the platform in the future.

[Article]: Petabyte-Scale Row-Level Operations in Data Lakehouses
Authors: Apache Foundation, Apple Employees, Founder of Apache Iceberg

TLDR: If you need to do petabyte scale row level changes, read this paper.
https://www.vldb.org/pvldb/vol17/p4159-okolnychyi.pdf

We have rarely run into the scale needed for petabyte row-level changes, but the paper details a strategy built on these techniques:


    
Technique | Explanation | Hudi Equivalent | Databricks Equivalent
Eager Materialization | Rewrites entire data files when rows are modified; suitable for bulk updates. | Copy-on-Write (COW) | Data File Replacement
Lazy Materialization | Captures changes in delete files, applying them at read time; more efficient for sparse updates. | Merge-on-Read (MOR) | Delete Vectors
Position Deletes | Tracks rows for deletion based on their position within data files. | - | Delete Vectors
Equality Deletes | Deletes rows based on specific column values, e.g., row ID or timestamp. | - | Delete Vectors
Storage-Partitioned Joins | Eliminates shuffle costs by ensuring data is pre-partitioned based on join keys. | - | Low Shuffle MERGE
Runtime Filtering | Dynamically filters out unnecessary data during query execution to improve performance. | - | Runtime Optimized Filtering
Executor Cache | Caches delete files in Spark executors to avoid redundant reads and improve performance. | - | -
Adaptive Writes | Dynamically adjusts file sizes and data distribution at runtime to optimize storage and prevent skew. | - | -
Minor Compaction | Merges delete files without rewriting the base data to maintain read performance. | Compaction in MOR | -
Hybrid Materialization | Combines both eager and lazy materialization strategies to optimize different types of updates. | - | -
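
The difference between eager and lazy materialization is easier to see in a toy example. The Python sketch below treats a data file as a list of rows; the function names are hypothetical, and real engines work on Parquet files and delete files rather than in-memory lists.

```python
def delete_eager(data_file, doomed_ids):
    """Eager (copy-on-write): rewrite the whole file without the deleted rows."""
    return [row for row in data_file if row["id"] not in doomed_ids]


def delete_lazy(data_file, doomed_ids):
    """Lazy (merge-on-read): leave the data file untouched and only
    record the positions of the rows to skip (position deletes)."""
    return {pos for pos, row in enumerate(data_file) if row["id"] in doomed_ids}


def read_with_position_deletes(data_file, position_deletes):
    """Apply position deletes at read time."""
    return [row for pos, row in enumerate(data_file) if pos not in position_deletes]
```

Both paths give readers the same rows. The eager path pays the cost at write time by rewriting the file, while the lazy path writes only a small set of positions and pays a little on every read until compaction merges them away.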

The paper also half reads as a marketing paper for Iceberg, but the interesting aspect is that half of the authors are from Apple.  One of the authors of that paper also made this video on how Apache Iceberg is used at Apple.

Video:
https://www.youtube.com/watch?v=PKrkB8NGwdY

[Article]: Faster EMR 7.1 workloads for Iceberg

TLDR: EMR 7.1’s customized Spark runtime on EC2 runs Iceberg workloads faster than open-source Spark

https://aws.amazon.com/blogs/big-data/amazon-emr-7-1-runtime-for-apache-spark-and-iceberg-can-run-spark-workloads-2-7-times-faster-than-apache-spark-3-5-1-and-iceberg-1-5-2

This article essentially serves as marketing for Amazon EMR, but it also demonstrates the product team’s commitment to enhancing performance with Apache Iceberg. It’s a slightly curious comparison, as most users on AWS would likely already be using EMR rather than managing open-source Spark on EC2. Nevertheless, the article emphasizes that EMR’s custom Spark runtime optimizations are significantly faster than running open-source Spark (OSS) on EC2.

  1. Optimizations for DataSource V2 (dynamic filtering, partial hash aggregates).
  2. Iceberg-specific enhancements (data prefetching, file size-based estimation).
  3. Better query planning and optimized physical operators for faster execution.
  4. Integration with Amazon S3 for reduced I/O and data scanning.
  5. Java runtime improvements for better memory and garbage collection management.
  6. Optimized joins and aggregations, reducing shuffle and join overhead.
  7. Increased parallelism and efficient task scheduling for better cluster utilization.
  8. Improved resource management and autoscaling for cost and performance optimization.

[Article]: Using Amazon Data Firehose to populate Iceberg Tables
TLDR: Use this technique if you might need Iceberg tables from the raw zone for streaming data and you need ACID guarantees

 https://www.tind.au/blog/firehose-iceberg/

Recently, a sharp-eyed developer spotted an exciting new feature in a GitHub Changelog: Amazon Data Firehose now has the ability to write directly to Iceberg tables. This feature could be hugely beneficial for anyone working with streaming data and needing ACID guarantees in their data lake architecture.

Warning: This feature isn’t production-ready yet, but it’s promising enough that we should dive into how it works and how it simplifies the data pipeline.

An Interesting Future: Example of Iceberg being queried from Snowflake and Databricks

Randy Pitcher from Databricks shows an example of how an Iceberg table created in Databricks is queried with Snowflake.   As mentioned earlier, the chatter is that not all vendors have fully implemented the Catalog API spec (yet), but once this matures in 2026-ish, expect querying data across clouds to be possible.

https://www.linkedin.com/posts/randypitcherii_snowflake-is-killing-it-with-their-iceberg-ugcPost-7239751397779419136-z1ue

Redshift Updates

Major updates for Zero ETL

All Other AWS Updates:

Other

Braces

Growing up I had the same dentist from childhood to adulthood. My dentist’s office was run by Dentist Chung (in Vietnamese I called him Bác Sĩ Chung – which means Dr Chung translated directly), with his sister managing the office.

The office was in Garden Grove, in between the Korean and Vietnamese districts. Walking in I would always smell the incense from an herbal shop next door.

The office looked like it was from the 1970s. They had this really old but comfortable couch and constantly played oldies music from the local radio station.

I distinctly recall being afraid as a kid going in, and somehow the office manager convinced me if I did a good job with a cleaning I could someday get the dentist’s chair. With my warped sense of rationalizing things, it all made sense and I calmed down.

When I was in early high school Dr Chung said, “you should think about getting braces and fixing your underbite.” I really had no issues with my teeth so far, but I entertained his proposal. I went to an Orthodontist consult.

The Orthodontist I saw was in the heart of Little Saigon – the Vietnamese area of Westminster. When I came in, I waited in the reception area for a bit before the Orthodontist admitted me into the office.


He asked me to bite down and said pretty quickly – “class 3 malocclusion – recommend jaw surgery.” He explained to me that the process would be to remove my 4 wisdom teeth, have braces for 2 years, have jaw surgery, and then have braces again for potentially another year. He didn’t explain much about the pros and cons and ushered me away to talk to the assistant for more details.

In another room, the assistant put on some DVD of the process of dealing with class 3 malocclusions. It meant that I had an underbite, and what they needed to do is remove my wisdom teeth to make space, and then crack my jaw and move it back. The recovery would involve sewing my lips (?) and going on a liquid diet for a while.

The assistant also said that some people liked having this jaw surgery because of improvements to their facial profile. She also mentioned that some people don’t even recognize them after the surgery.

The assistant ended with saying, “You know, Vietnamese are a superstitious bunch, so some say that doing jaw surgery will change your destiny!”

Okay, count me in for not believing in superstition, but really that is the absolute worst thing you could say to a teenager after getting a quick 5 minute consult, a gory video on the treatment of an underbite, and somebody saying it will change your destiny. At that point, I decided not to go along with my surgery and went along my merry way.

A couple years after the consult, I called the dentist’s office to book an appointment, and I was told the dentist had a heart attack! He evidently had been eating a pretty unhealthy diet (I know correlation isn’t causation, but he did eat McDonalds every day for lunch). Fortunately he bounced back and started working again.

A couple years after the heart attack, he had another heart attack, and this time it was fatal. When he passed away, my family went to his funeral and saw his grieving sister, and the dentist’s daughter, who I had talked with on and off throughout my years of going to the office. Oddly enough, the dentist’s daughter did a quick internship at one of my old startups back in the day.

After grieving the loss of my dentist, there were the practical issues of finding a new dentist. Pausing for a moment, I remembered, my optometrist’s brother (whose parents live next to my parents) was a dentist.

Dr Tan Huynh was also in the heart of Little Saigon, but when I drove into his office, they had  computers that could do x-rays, and an efficient staff to make cleanings and appointments way easier. I had realized at that point I had been going to Dr Chung’s office with technology from the stone ages.

With the first consult, the dentist asked me to bite down and asked if I considered braces and jaw surgery to fix my underbite. This time being older, I peppered him with questions on pros and cons. He mentioned my teeth were functionally fine at the moment, but in the future I might not be able to chew as my teeth wore down. Asking what age I might not be able to eat, he threw out what seemed to be the random number of 60.

Remembering the experience at my last Orthodontist, I wasn’t convinced the pros outweighed the cons (eg – cons meaning my destiny would change).

When I moved up to Vancouver, I was faced yet again with finding a new dentist. Jason recommended that I visit an office nearby, where Dr M was the first to see me.

He did the whole consult and analysis, but this time they took pictures and some fancy 360 xray scan. He brought up again my underbite, and we again talked through the pros and cons. I asked whether I should try to fix it and he said a lot of people have underbites and just manage it. Apparently when eating I push food through my back teeth immediately.

During the pandemic when I got my first cleaning I saw Dr F, a younger dentist who was one of the co-owners of the office. She saw my bite and asked if I wanted to fix my underbite, and after the 4th mention in my life it got me thinking a little bit more seriously about it. This time she said Invisalign might be able to fix it.

I came back to another appointment after my cleaning to get an Invisalign consult. They did some scans and because of the pandemic they wanted to limit in person meetings, so the follow-up was a zoom call.

Dr F proceeded to say that she initially thought she could take out my middle bottom tooth to fix my underbite.  However, she concluded Invisalign wouldn’t work and that I should see an Orthodontist.

This time I was a little more open to it because I was no longer traveling as a consultant during the pandemic, and wearing a mask would make it pretty easy to hide the fact I had braces.

Weeks later I saw the orthodontist Doctor D and they did the initial analysis. He basically said I have two options. First, remove 2 wisdom teeth, braces for 2 years, jaw surgery, then braces for 2 years. Second, remove 6 teeth, braces for 2 years and you are done.

I peppered him with questions on the pros and cons health wise, and he said functionally both would lead to the same outcome. He said the jaw surgery would change my profile, but would come with more risks since it was a surgery. I decided to go with option 2.  I also wondered why when I was a teenager I wasn’t presented with a non jaw surgery option, but I’m guessing it was because the technology for modeling these outcomes wasn’t available.

Dentistry is an interesting field because most dentists and orthodontists can’t tell you definitively what will happen with your teeth in the future. It all seems to come down to what risk/reward you are comfortable with.
As part of the assessment I had to pay $500.  If I chose to move forward with braces they would credit my account, but if not, I would lose it.  I think sunk cost fallacy nabbed me this time as this pushed me over the edge to do a final commitment of the decision.

Before putting on braces, I had to get 6 teeth extracted.  To ease the pain, I got 3 extracted by my regular dentist, and 3 extracted by an extraction specialist.  Let’s just say, the extraction specialist finished the entire job in about 30 minutes while my regular dentist took about 1.5 hours.  My regular dentist felt so guilty about taking so long that she gave me her cell phone number and told me to call her if I had any post extraction complications.

The process of wearing braces involved seeing the orthodontist about every 6 weeks for an adjustment, and compliance to get the results you want.  In addition to braces, you have a wire running across and little hooks where you can attach rubber bands to.  Throughout the process compliance meant always wearing and rotating the rubber bands as needed as well as avoiding eating really hard food (like nuts), to avoid breaking your bracket.  Slipping up on compliance inevitably leads to a longer total process.

When I saw my Orthodontist, I noticed I was the oldest person in the office as it was mostly kids and teenagers.  Often I would overhear my Orthodontist sternly warn the kids that they weren’t being compliant by either not brushing their teeth well or not wearing their rubber bands. I would then hear parents berate their children in one sentence, and in the next sentence beg them to be compliant.  It usually ended with the parents trying to guilt trip their children by saying seemingly unhelpful things like, “don’t you want good teeth like your brother.”

Getting braces as an adult is a bit different as I was on a mission to be compliant and to finish it as soon as possible because I paid for every penny of it.  Psychologically, something different clicks in your head when it is your money on the line.

The initial side effects I had were teeth sensitivity.  There were times hard food was difficult to eat (like sandwiches, cucumbers, steak, etc), so I bought these tiny tots scissors originally intended for parents to use when cutting food for their babies.  The scissors were an obnoxious bright blue color, but I liked it because it was compact and had a case.

One time I had a business meeting with a customer at a restaurant and when the food came I took out the scissors.  The person next to me paused and asked why I had bright blue scissors.  I explained to him the whole dental situation, and then the whole table caught wind of the conversation and asked me about the scissors.  It was a bit awkward in the beginning, but then the whole table spent the next hour talking about their dental issues.  Also through this experience I learned bringing scissors is generally helpful at restaurants if you are sharing food.


2.5 years later (6 months behind schedule mind you), I had an appointment to remove my braces.  The doctor told me, “there was a lot of movement of your teeth, we probably need to install a permanent wire retainer behind your bottom front teeth”.  At the same time I was told I needed to wear a retainer full time for 6 months, and then at night time for the rest of my life.

I was a little shocked as I never really put two and two together that after the braces I would have to wear a retainer at night in my mouth for the rest of my life.  I wonder if ortho offices gave a really honest assessment of the entire process (brackets breaking, wires poking, teeth sensitivity, retainers for the rest of your life), if fewer people would opt in.

Am I happy with the result?  Well my underbite is fixed now, but really the whole intended health outcome of being able to chew when I’m 60 might require another blog post in 20ish years.

State of Data Engineering 2023 Q2

When looking at data engineering for your projects, it is important to think about market segmentation. In particular, you might be able to think about it in four segments:

  • Small Data
  • Medium Data
  • Big Data
  • Lots and Lots of Data


Small Data – This refers to scenarios where companies have data problems (organization, modeling, normalization, etc), but don’t necessarily generate a ton of data. When you don’t have a lot of data, different tool sets are in use ranging from low code tools to simpler storage mechanisms like SQL databases.

 
Low Code Tools 

The market is saturated with low code tools, with an estimated 80-100 products available. Whether low code tools work for you depends on your use case. If your teams lack a strong engineering capacity, it makes sense to use a tool to help accomplish ETL tasks.

However, problems arise when customers need to do something outside the scope of the tool.

Medium Data – This refers to customers who have more data, making it sensible to leverage more powerful tools like Spark. There are several ways to solve the problem with data lakes, data warehouses, ETL, or reverse ETL.

Big Data – This is similar to medium data, but introduces the concepts of incremental ETL (aka transactional data lakes or lake houses). Customers in this space tend to have data in the hundreds of gigabytes to terabytes.

Transactional data lakes are essential because incremental ETL is challenging. For example, consider an Uber ride to the airport that costs $30. Later, you give a $5 tip, and now your trip costs $35. In a traditional database, you can run some ETL to update the record. However, Uber has tons of transactions worldwide, and they need a different way of dealing with the problem.
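
The trip example can be sketched as a tiny upsert in Python. This is a toy model with made-up field names (`trip_id`, `total`, `updated_at`), not how Uber or any table format actually implements it; it only shows why incremental ETL needs a key and a notion of which change is the latest.

```python
def apply_incremental_batch(table, batch):
    """Upsert a batch of changes into `table`, keyed by trip_id.

    The newest `updated_at` wins, so a late-arriving older record
    cannot overwrite a newer one.
    """
    for change in batch:
        current = table.get(change["trip_id"])
        if current is None or change["updated_at"] >= current["updated_at"]:
            table[change["trip_id"]] = change
    return table


trips = {}
# The ride is recorded at $30 ...
apply_incremental_batch(trips, [{"trip_id": "t1", "total": 30.0, "updated_at": 1}])
# ... and later the $5 tip arrives as an update to the same trip.
apply_incremental_batch(trips, [{"trip_id": "t1", "total": 35.0, "updated_at": 2}])
```

On a single database this is one UPDATE statement. On files in object storage, the same operation means finding and changing one row among billions, which is the gap transactional data lakes try to close.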

Introducing transactional data lakes requires more operational overhead, which should be taken into consideration.

Lots and Lots of Data – Customers in this space generate terabytes or petabytes of data a day. For example, Walmart creates 10 PB of data (!) a day.

https://medium.com/walmartglobaltech/lakehouse-at-fortune-1-scale-480bcb10391b

When customers are in this space, transactional data lakes with Apache Hudi, Apache Iceberg, and Databricks Delta Lake are the main tools used.

Conclusion

The data space is large and crowded. At the small data and lots of data ends, the market segments are clear. However, the mid-market data space will probably take some time for winners to emerge.

West Coast Trail – The 75km/48 mile death hike

Author Note: This trip was taken in 2021, but updated in 2023 with updated details.

I’m not really sure where I get these crazy ideas, but a friend and I booked the West Coast Trail. It is this multi day thru hike in the west coast of Vancouver Island, which is accessible via ferry. Unfortunately in 2020 the hike was canceled, but a friend and I fortunately got in the lotto and booked one of the most coveted start times, July 2nd. July typically is better to go because you want as little precipitation as possible.

I have done a lot of hiking, and cool trips, but never thru-hiking. What this means is you start from one point and end at another point. You carry everything on your back including your food, tent, and supplies.

To prepare for the trail, there pretty much were two resources to read: the book Blisters and Bliss and the super valuable Facebook group.

From reading the group, everybody recommended either buying dehydrated food or making it yourself. The reason is that you don’t want to carry real food because of the possibility of spoilage and the additional weight.

I bought the book from the backpacking chef, and decided to start experimenting. First thing I bought was a dehydrator.

There is a fan on top of the dehydrator and you set the temperature and time. It typically runs for a long time, taking about 8-20 hours to dehydrate certain foods. What you do is fully cook whatever you are going to eat, let it cool a bit, then dehydrate it at 120-135 degrees for multiple hours.

After much experimenting I successfully dehydrated:
+ rice
+ beans
+ lentils
+ tofu (you have to freeze it first)
+ kale
+ ratatouille
+ thai curry paste
+ quinoa

I didn’t really like dehydrating meat such as chicken breast because it kind of tasted weird at the end of the day.

For the food I would pack one meal in a ziplock bag.

At the end I made 7 meals consisting of
+ japanese curry – tofu, kale, beans, ratatouille mix, textured vegetable protein
+ thai curry – instant rice noodles, thai curry paste, tofu, beans
+ lentils – green lentils, quinoa, salsa macha

For breakfast I packed oatmeal, and for lunch tortillas and PB&J, plus some parmesan crackers and bars. Total weight – about 9-10 pounds.

Preparation #2: Packing

For the west coast trail, you want to only have a backpack which is about 20-30% of your body weight. The lighter the better. That meant for me about 30-40 pounds.

What a lot of people do for thru-hiking is weigh every item and enter it into a website called LighterPack. It is basically a fancy online Excel spreadsheet.

https://lighterpack.com/r/sokgof

During the pandemic, all sports gear in Vancouver was in short supply. I spent uhh, a lot of pennies upgrading all of my gear. I bought an ultralight 1.2 lb tent in the states, bought a new jacket, a new sleeping pad, and a gravity filter. I couldn’t find the tent in Canada, so I bought it from REI in the states, and then asked my parents to ship it up.

Visualizing my gear one last time I put everything in my bag for a final weigh in and test

Final weigh in was about 34 lbs. If I count the number of hours I spent dehydrating and packing and thinking about the trip, I for sure spent at least 40 hours planning.

One app which was incredibly useful was Avenza Maps. With this you are able to see where you are relative to the trail that Parks Canada provides as a PDF. However, be aware that the map is not 100% updated to the latest routes, so use Avenza Maps only as a reference and cross-check the physical map given.

Trail Report Day 1: 75km —> 70km – 3.1 miles
AKA – The day I despise ginormous large ladders

For the thru-hike there were two options, south to north or north to south. We opted to go south to north as it starts off super difficult, then slowly gets easier. Logistically, we spent a night in Victoria, and then got dropped off at the trailhead in Port Renfrew. After a quick orientation we took a ferry across and this was the first thing we saw:

If there was anything to wake you up, it is a ladder two stories high. At this point I turned off my brain and went up really slowly.

I didn’t realize it at the time, but this trail was actually quite dangerous, because if you fall or slip, the consequences could be fatal. In hiking, there are some interesting terms such as calling a trail ‘technical’.

When hikers call something technical, it means the terrain is more difficult and you don’t simply walk on a dirt path. More technical terrain may involve scrambling on rocks, uneven trail, roots, etc.

For this portion of the trail it wasn’t too technical, but rather high in elevation. The hiking in this section took about 4.5 hours to get to the campsite.

In this hike, every campsite is by a beach because there are glacial melt from rivers which feed into oceans. This is important because you need to filter water at each site when you are done. Carrying gallons of water for 7 days would be impossible!

At the campsite there was a mix of people finishing the trail and starting the trail. It is pretty typical on any really big hike to inquire about trail conditions. We heard that many people bailed out of the hike halfway because of the heat. I’m sure you heard about the ‘heat dome’ in the Pacific Northwest, when temperatures in Portland/Seattle/Vancouver were 100F and higher! Hiking in 100 degree weather would be brutal.

After we ate dinner, one of the ladies we were talking to came back to me and asked if I was a doctor. She asked if I had hydrogen peroxide and said I looked familiar and asked if I worked at the BC Women’s Hospital.

—— Aside
For some odd reason, people have pretty often asked me weird questions about my occupation. One time I was in Dallas Love Field Airport, flying on Southwest Airlines and waiting at my gate. Somebody asked me if I was a pilot.

I was kind of just puzzled like, what makes me look like a pilot? Just kind of weird what people assume of you.

Another time I was yet again at the airport (this was pre-covid life where I used to travel twice a month), where someone asked if I was an athlete competing in the Olympics. As flattered as I was, that was again a pretty weird assumption to make. I distinctly recall wearing sweat pants and having a Bose headset on me.
—— End Aside

I knew I didn’t want to cramp up, but doing yoga stretches on the beach was near impossible, so I did them on the platform of the restroom.

I’m sure people were wondering who that crazy person was doing yoga at night.

Unfortunately/fortunately I was getting strong 5G reception from T-Mobile from Washington. Most people had the true chance to disconnect, but uhh.. I was checking my e-mails before sleeping.

Trail Report Day 2: 70 —> 58km – 7.4 miles
AKA – The day I despise rocks

You would think sleeping by the beach is relaxing, but really that is far from the case. I didn’t sleep that well as the ocean was thundering in the middle of the night. I finally dug out my ear plugs and slept somewhat okay.

One of the things which was really beautiful and I couldn’t capture in photos was that morning’s unique sunrise. On the left where you see that bright light is the sun. As time progressed, because of the cloud formation, all I would see was an expanding line over the horizon.

Brushing your teeth also has some special considerations. That means brushing and flossing near the ocean and away from your campsite because you don’t want any food bits to be near your tent to attract animals.

Again, this was one of those times where I just shut off my brain, and prayed for safety the entire trek. This would be rated uber technical.

Later on in the Facebook group I read about someone who slipped off a rock and fell and had to be medevac’ed out. Looking back it was a pretty dicey section.

We finally reached a section called Owen Point, where you could not cross unless tides were low enough.

While my friend was taking a picture I witnessed someone attempt to cross when the tide was not low enough and slip off a rock. She fortunately was okay. After watching several people get hurt, we decided to really wait for the tides to be safe and then crossed.

After the boulder section there was a super interesting coastal walk for quite a long time. The waves really shaped the geography of the land in a unique way.

However, walking on coastal shelves had its own problems. You needed to be aware of what was slippery and what was not.

Certain spots looked like dead body markings, but they were just salt which had dried up, perhaps where rocks had previously been before moving?

Similar to Galiano Island, there were again so many interesting formations in the rocks.

After the coastal part, we reached KM 66 and went inland. The scenery changed back to forest.

At one point the trail turned pretty muddy, and as I was stepping off a slippery platform I slipped right off and fell 4 feet off the log, right onto my back. Fortunately I landed right on my backpack. I was pretty shaken up and extremely scared, but Praise God I had no injuries from that fall. Later on, I checked and nothing broke in my backpack.

We stayed at a pretty small campsite for the night.

Trail Report Day 3: 58km —> 41km / 10 miles cullite to cribs
AKA – The day I despise uneven coastal hiking and realized I forget stuff easily

Paranoia set in after falling off a log earlier. I basically was watching nearly every step I was taking.

We had a super long 10km walk along the beach. You would think walks along the beach are fun, but nope. First off when you step, you sink into the sand. Second off, you are kind of walking at a weird 45 degree slope where your left and right legs are uneven.

—— Aside: the grand debate about shoes
One of the topics debated quite heavily in the hiking community is whether to wear trail shoes or boots. For most of my hikes I have always worn trail shoes. The pros I would say are:

+ Lightweight
+ Dry quickly
+ You don’t develop blisters around your toes

I had always hiked in very hot areas so I never had an issue with trail shoes. EXCEPT on this trail I got my shoes and socks wet. My shoes never dried because of the mistiness and humidity of the trail, which caused 2 blisters on the bottom of my feet.

A lot of people say that boots protect your ankles, but I am of the view that having strong ankles protects your ankles. That means doing various lunges, steps, and light weights to help your feet.

I learned later from the Facebook group that trail shoe wearers should be bringing a mineral based cream to put on their feet when wet to avoid blisters.

Let’s say at the end of the day I am still a trail shoe fan, but now open to perhaps waterproof style shoes. Still not convinced about boots~
— End Aside

After endless walking, we went through tide pools again, and there were quite a few dead crabs, washed up kelp, and sea urchins. We even saw some green sand which I’ve only seen in Hawaii.

After a long slog we finally arrived at a pretty nice beach campsite.

When you cook in the back country, it is quite different than regular cooking. What you do is put your dehydrated food in a camping stove, add water, and bring it to a boil. Think of it as a healthier cup of noodles.

After dinner we chatted with a mom who was with 5 kids (!). She mentioned that her husband had a brain concussion 10 years ago and couldn’t do any of these hikes. She really liked talking with us because she wanted some adult time as all of her conversations were mainly jokes with kids.


I then proceeded to do my night routine and realized I couldn’t find my toothbrush. I started to panic and realized I couldn’t find my toiletry bag. I had left it at the previous campsite at the beach *face palm*.

Furthermore, the repercussions would be bigger because I wouldn’t be able to brush or floss for 4 days!

I approached Cindy (the mom) as she was sitting down with other people. I publicly explained my debacle and Cindy gave me some toothpaste in a ziplock bag. I needed to floss with braces, and another lady had dental floss picks which were BRACES FRIENDLY. The odds of getting this were so small. I offered chocolate to them, but they just said to pay it forward.

The bigger problem is I now had no toothbrush, but from talking to some people, they said that at the next stop, I probably would be able to pick up a toothbrush.

Late at night, I fell asleep to the chorus of frogs chirping. It actually was quite soothing after a stressful day.

Trail Report Day 4: 42km —> 33km cribs to nitinat narrows

— Warning: below talks about poop talk
One of the things hikers and campers talk a lot about is poop. You need to consider how you will poop and where. For this trail, there are outhouses, so all you have to do is bring toilet paper, hand sanitizer, and soap.

It is important to time your poop schedule because you want to go to the bathroom in the morning and then in the evening. If you need to go #2 in the middle of the day, it is extremely inconvenient as you have to dig a hole.

My routine pretty much is wake up to poop, eat breakfast, then poop one more time before heading out. Fortunately throughout the hike I have pretty much adhered to this routine.

Another huge issue is peeing in the middle of the night. When you are warm in the tent, you have to change, walk to the bathroom, then walk back. Imagine being at home, and instead of walking to your bathroom, you have to walk to the building next to you.

Many people try to alleviate this issue by doing a double pee. So peeing at night, hanging around the restroom for 20 minutes, and peeing again.
 End Poop Talk

This morning it didn’t rain, but the beach was EXTREMELY misty and everything got wet. That means packing up was miserable. I was so out of it I thwacked myself in the eye with my tent pole, but fortunately everything was fine.

We trekked inland and the trail was extremely overgrown and extremely muddy. After 5 hours of hiking we passed by this really beautiful lily field.

I knew the first half of the trip would be brutal, so I booked a cabin halfway. In the middle of the hike you have the opportunity to do something called ‘comfort camping’. There is a place where you can eat and order real food. Although it is at exorbitant prices, every morsel was worth it.

We finally arrived at Nitinat Narrows, which is an area run by First Nations, the Nitinat tribe. The area consists of cabins for rent and a super popular food shack pretty much everyone eats at.

It was odd that I had only been eating dehydrated food for 2 days, but I already was craving real food. I got the halibut and baked potato and it was GLORIOUS.

Afterwards we met Doug, one of the caretakers of the property. He showed us to our room and I was pretty pleasantly surprised. I had seen pictures, but this was actually better in person.

After drying all of our stuff outside, we sat in the patio area where there was a group of 5. They were heading north to south, and they asked about a bunch of tips on the difficult section.

Doug came by to talk about the land and his experiences here. He talked about how his family escaped residential schooling because his mom was white, but many were taken away.

Residential schooling has occurred in the United States, but it is a pretty hot button issue in Canada. In short, there has been a long history of First Nations children (in the US called Indians or Native Americans) being taken away from their families to be educated in government run schools. You can imagine the trauma and destruction of families this caused.

We were talking with 5 other guys in the afternoon when Doug asked if we all wanted to go out in a boat and pick up crabs from their crab traps!

We all headed into the boat with the DOG, who amazingly enjoyed the experience and was probably quite used to it. Crab traps baited with fish heads are spread out in the lake and then picked up later on.

There are regulations where crabs have to be a certain size, or else they are thrown back. This does make sense from a sustainability perspective.

Trail Report Day 5: Nitinat Narrows 32 km to 23 km Klanawa River
AKA: Approaching easy town

—Aside Hiking Debate #2 – poles or no poles
You would be surprised, but there are so many debates in the hiking community. This debate is whether to bring hiking poles or not.

Hiking poles to me are insurance that if you have a slip you have the opportunity to catch yourself with your poles.

For gear, my opinion is to buy higher quality but more expensive gear, because if it breaks on the trail, you are without it for the rest of the trip. I remember buying Cascade hiking poles from Costco, and one breaking in the middle of a hike in Peru. That really was not a cool experience.

My vote is if the trail is remotely technical – yes poles!
—End Aside

After a refreshing night’s sleep, we headed out once again. There was some mud, some slippery boardwalks, and a lot of walking through twisted roots in a forest.

We did a brief stop at Tsusiat Falls where we both jumped into the lake. About 2 km later, we arrived at a campsite where it was only the two of us.

After setting up camp, I explored the beach area.

Near the campsite I saw mussel shells and a ton of logs everywhere. I remember reading that during the winter, torrential storms come in and reshape the beach landscape. Here are tons of logs that washed up on the beach.

Trail Report Day 6: 23km to 0km Pachena Bay
AKA: Let’s get out of here!

The trail started again coastal with an endless slog of beach and tons of rocks and boulders. At this point I had developed two blisters from wet socks so I was cautious. We arrived at the last campsite before the exit at 1pm, and decided just to exit out of the park immediately. It was another 4 long hours, but then we exited!

The ending was super uneventful. Like, we could barely find the parking lot, and there were no cheers or anyone to even meet us.

At the end of the day, a lot of people have been asking me, was the hike enjoyable or worth it?

I’ve been thinking about it a lot. I think my style of hiking is to hike to a super gorgeous viewpoint and take photos. The West Coast Trail to me is more of a hike of endurance as I’ve never done a thru-hike before.

Life revelations?

As I have told some people before, I usually don’t have any life revelations during really challenging hikes. I guess that’s a good sign?

As with most things in life, going outdoors is part preparation, part training, part luck, and all prayer.

Addendum: Here are the recipes I used for my trip
Dehydrated Recipes:

  • Black beans – 125F, 5 hours
  • Mayocoba beans – 125F, 5 hours
  • Ratatouille – 135F, 18 hours – need to break it up half way, make sure all vegetables
  • Quinoa – 135F, 5-8 hours, fruit roll up tray
  • Lentils – 135F, 8 hours

OATMEAL

  • 50 g
  • 3g chia
  • 6 g barley
  • 10g blueberries


Japanese Curry (2x)

  • 66g dried rice
  • 4g kale
  • 50g lentils
  • 25g tofu
  • 15g tvp
  • 21g beans
  • 8g ratatouille
  • 1g spring onions
  • 10g dried mushrooms
  • 1/2 block Japanese curry
  • Furikake spice

Turmeric Curry

  • 1 package rice noodles
  • 2g curry packet
  • 50g tofu
  • 50g beans
  • 5g coconut milk powder
  • 10g tvp
  • fish sauce

Green Lentils

  • 100g lentils
  • 50g quinoa
  • 20g vegetables
    Addition
  • Salsa macha
  • Raisins
  • Olive Oil

Discuss:
https://news.ycombinator.com/item?id=35681810