State of Data Engineering 2023 Q2

When evaluating data engineering for your projects, it is important to think about market segmentation. In particular, it is useful to think of the market in four segments:

  • Small Data
  • Medium Data
  • Big Data
  • Lots and Lots of Data


Small Data – This refers to scenarios where companies have data problems (organization, modeling, normalization, etc.) but don’t necessarily generate a ton of data. When you don’t have a lot of data, a different set of tools comes into play, ranging from low code tools to simpler storage mechanisms like SQL databases.

 
Low Code Tools 

The market is saturated with low code tools, with an estimated 80-100 products available. Whether low code tools work for you depends on your use case. If your teams lack a strong engineering capacity, it makes sense to use a tool to help accomplish ETL tasks.

However, problems arise when customers need to do something outside the scope of the tool.

Medium Data – This refers to customers who have more data, making it sensible to leverage more powerful tools like Spark. There are several ways to solve the problem here with data lakes, data warehouses, ETL, or reverse ETL.

Big Data – This is similar to medium data, but introduces the concept of incremental ETL (aka transactional data lakes or lakehouses). Customers in this space tend to have data in the hundreds of gigabytes to terabytes.

Transactional data lakes are essential because incremental ETL is challenging. For example, consider an Uber ride to the airport that costs $30. Later, you give a $5 tip, and now your trip costs $35. In a traditional database, you can simply run an update to correct the record. However, Uber has tons of transactions worldwide, and they need a different way of dealing with the problem.
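Conceptually, the tip scenario is an upsert (update-or-insert) keyed on a trip ID, which is the core operation transactional data lake formats make efficient at scale. Here is a minimal sketch in plain Python; the schema, IDs, and fares are illustrative, not Uber's actual data model:

```python
# Minimal sketch of the upsert (merge) operation that transactional data
# lakes like Hudi, Iceberg, and Delta Lake provide at scale.
# All names and values below are illustrative only.

def upsert(table: dict, records: list) -> None:
    """Merge incoming records into the table, keyed by trip_id."""
    for rec in records:
        existing = table.get(rec["trip_id"], {})
        table[rec["trip_id"]] = {**existing, **rec}

trips = {}
upsert(trips, [{"trip_id": "t1", "fare": 30}])            # the $30 ride
upsert(trips, [{"trip_id": "t1", "fare": 35, "tip": 5}])  # the $5 tip arrives later
# trips["t1"] now reflects the $35 total without rewriting the whole table
```

At data lake scale the same merge runs as a distributed job over columnar files, which is exactly why these table formats exist.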

Introducing transactional data lakes requires more operational overhead, which should be taken into consideration.

Lots and Lots of Data – Customers in this space generate terabytes or petabytes of data a day. For example, Walmart creates 10 PB of data (!) a day.

https://medium.com/walmartglobaltech/lakehouse-at-fortune-1-scale-480bcb10391b

When customers are in this space, transactional data lakes with Apache Hudi, Apache Iceberg, and Databricks Delta Lake are the main tools used.

Conclusion

The data space is large and crowded. At the small data and lots-of-data ends, the market segments are clear. However, it will probably take some time for winners to emerge in the mid-market data space.

Data Engineering Low Code Tools

In the data engineering space we have seen quite a few low code and no code tools cross our radar. Low code tools have their own nuances: you can operationalize quicker, but the minute you need to customize something outside of the toolbox, you may run into problems. That’s when we usually deploy custom development using things like Glue, EMR, or even transactional data lakes, depending on your requirements.

This list is split into open source, ELT, streaming, popular tools, orchestration tools, and the rest of the tools. In this space, one thing I have been looking for is a first-class open source product. I know that many of these products start as open source and end up releasing a managed version of the product. Of course I am all for open source teams making their money back somehow, but it would be ideal for the platforms to keep an open source license.

One thing my team has been noticing is the traction dbt has been gaining in the market. It flips the paradigm a bit by doing ELT (Extract, Load, Transform), where everything is loaded into your data warehouse first and then you run transformations on it.
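A rough sketch of that load-first flow, using sqlite3 as a stand-in for a real warehouse (the table names are made up, and dbt would manage the transform step as a versioned SQL model rather than inline strings):

```python
# ELT sketch: land raw data first, then transform with SQL inside the
# warehouse. sqlite3 stands in for a real warehouse here.
import sqlite3

con = sqlite3.connect(":memory:")

# 1. Extract + Load: raw records land as-is, no upfront transformation
con.execute("CREATE TABLE raw_orders (id INTEGER, amount_cents INTEGER)")
con.executemany("INSERT INTO raw_orders VALUES (?, ?)", [(1, 1999), (2, 2500)])

# 2. Transform: build a cleaned model from the raw table (the dbt step)
con.execute("""
    CREATE TABLE orders AS
    SELECT id, amount_cents / 100.0 AS amount_dollars
    FROM raw_orders
""")
total = con.execute("SELECT SUM(amount_dollars) FROM orders").fetchone()[0]
```

The appeal is that the raw data is always available in the warehouse, so transformations can be rebuilt, tested, and version controlled without re-extracting from the sources.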

Another project I have been watching, on Zach Wilson’s recommendation, is mage.ai. It is a pretty spiffy way of creating quick DAGs with executable Python notebooks. The team is pretty active in soliciting feedback on Slack, and it is one to watch for the future. Airbyte and Meltano are newer to me, and I hope to take some time to play with those tools. This list is by no means exhaustive, but let me know if there is anything I have missed.
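Under the hood, all of these pipeline tools reduce to the same idea: tasks plus dependencies executed in topological order. A toy sketch of that concept (the task names are invented, and mage.ai's actual API looks different):

```python
# Toy DAG: tasks plus dependencies, executed in topological order.
# This illustrates the concept only, not mage.ai's real API.
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

tasks = {
    "load":      lambda: "raw rows loaded",
    "transform": lambda: "rows cleaned",
    "export":    lambda: "written to warehouse",
}
# each key depends on the tasks in its set
deps = {"transform": {"load"}, "export": {"transform"}}

order = list(TopologicalSorter(deps).static_order())
results = [tasks[name]() for name in order]
```

Everything the orchestration tools below add (retries, scheduling, monitoring, distributed execution) is layered on top of this core loop.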

Open Source Tools

Product: Airbyte
Description: Airbyte is an open-source data integration platform that allows users to replicate data from various sources and load it into different destinations. Its features include real-time data sync, robust data transformations, and automatic schema migrations.
Link: https://airbyte.io/
Github Link: https://github.com/airbytehq/airbyte
Cost: Free, with paid plans available
Release Date: 2020
Number of Employees: 11-50

Product: mage.ai
Description: mage.ai is an open-source data pipeline tool for integrating and transforming data. It lets users build, run, and schedule DAGs using interactive Python notebooks, with blocks for loading, transforming, and exporting data.
Link: https://mage.ai/
Github Link: https://github.com/mage-ai
Cost: Open source
Release Date: 2020
Number of Employees: 11-50

Product: Meltano
Description: Meltano is an open-source data integration tool that allows users to build, run, and manage data pipelines using YAML configuration files. Its features include source and destination connectors, transformations, and orchestration.
Link: https://meltano.com/
Github Link: https://github.com/meltano/meltano
Cost: Free, with paid options available
Release Date: 2020
Number of Employees: 11-50

Product: Apache Nifi
Description: Apache Nifi is a web-based dataflow system that allows users to automate the flow of data between systems. Its features include a drag-and-drop user interface, data provenance, and support for various data sources and destinations.
Link: https://nifi.apache.org/
Github Link: https://github.com/apache/nifi
Cost: Free
Release Date: 2014
Number of Employees: N/A

Product: Apache Beam
Description: Apache Beam is an open-source, unified programming model for batch and streaming data processing. It provides a simple, portable API for defining and executing data processing pipelines, with support for various execution engines.
Link: https://beam.apache.org/
Github Link: https://github.com/apache/beam
Cost: Free
Release Date: N/A
Number of Employees: N/A

ELT

Product: dbt (data build tool)
Description: dbt is an open-source data transformation and modeling tool that enables analysts and engineers to transform their data into actionable insights. It provides a simple, modular way to manage data transformation pipelines in SQL, with features such as version control, documentation generation, and testing.
Link: https://www.getdbt.com/
Github Link: https://github.com/dbt-labs/dbt
Cost: Free, with paid options available for enterprise features and support
Release Date: 2016
Number of Employees: 51-200

Streaming

Product: Confluent
Description: Confluent is a cloud-native event streaming platform based on Apache Kafka that enables organizations to process, analyze, and respond to data in real-time. It provides a unified platform for building event-driven applications, with features such as data integration, event processing, and management tools.
Link: https://www.confluent.io/
Github Link: https://github.com/confluentinc
Cost: Free, with paid options available for enterprise features and support
Release Date: 2014
Number of Employees: 1001-5000

Popular Tools

Product: Fivetran
Description: Fivetran is a cloud-based data integration platform that automates the process of data pipeline building and maintenance. It provides pre-built connectors for over 150 data sources and destinations, with features such as data synchronization, transformation, and monitoring.
Link: https://fivetran.com/
Github Link: https://github.com/fivetran
Cost: Subscription-based, with a free trial available
Release Date: 2012
Number of Employees: 501-1000

Product: Alteryx
Description: Alteryx is an end-to-end analytics platform that enables users to perform data blending, advanced analytics, and machine learning tasks. It provides a drag-and-drop interface for building and deploying analytics workflows, with features such as data profiling, data quality, and data governance.
Link: https://www.alteryx.com/
Github Link: https://github.com/alteryx
Cost: Subscription-based, with a free trial available
Release Date: 1997
Number of Employees: 1001-5000

Product: Informatica
Description: Informatica is a data management platform that enables users to integrate, manage, and govern data across various sources and destinations. It provides a unified platform for data integration, quality, and governance, with features such as data profiling, data masking, and data lineage.
Link: https://www.informatica.com/
Github Link: https://github.com/informatica
Cost: Subscription-based, with a free trial available
Release Date: 1993
Number of Employees: 5001-10,000

Product: Matillion
Description: Matillion is a cloud-native ETL platform that enables users to extract, transform, and load data into cloud data warehouses. It provides a visual interface for building and deploying ETL workflows, with features such as data transformation, data quality, and data orchestration.
Link: https://www.matillion.com/
Github Link: https://github.com/matillion
Cost: Subscription-based, with a free trial available
Release Date: 2011
Number of Employees: 501-1000

Orchestration Tools

Product: Prefect
Description: Prefect is a modern data workflow orchestration platform that enables users to automate their data pipelines with Python. It provides a simple, Pythonic interface for defining and executing workflows, with features such as distributed execution, versioning, and monitoring.
Link: https://www.prefect.io/
Github Link: https://github.com/PrefectHQ/prefect
Cost: Free, with paid options available for enterprise features and support
Release Date: 2018
Number of Employees: 51-200

Product: Dagster
Description: Dagster is a data orchestrator and data integration testing tool that enables users to build and deploy reliable data pipelines. It provides a Python-based API for defining and executing pipelines, with features such as type-checking, validation, and monitoring.
Link: https://dagster.io/
Github Link: https://github.com/dagster-io/dagster
Cost: Free, with paid options available for enterprise features and support
Release Date: 2019
Number of Employees: 11-50

Product: Airflow
Description: Airflow is an open-source platform for creating, scheduling, and monitoring data workflows. It provides a Python-based API for defining and executing workflows, with features such as task dependencies, retries, and alerts.
Link: https://airflow.apache.org/
Github Link: https://github.com/apache/airflow
Cost: Free
Release Date: 2015
Number of Employees: N/A (maintained by the Apache Software Foundation)

Product: Azkaban
Description: Azkaban is an open-source workflow manager that enables users to create and run workflows on Hadoop. It provides a web-based interface for creating and scheduling workflows, with features such as task dependencies, notifications, and retries.
Link: https://azkaban.github.io/
Github Link: https://github.com/azkaban/azkaban
Cost: Free
Release Date: 2010
Number of Employees: N/A (maintained by the Azkaban Project)

Product: Luigi
Description: Luigi is an open-source workflow management system that enables users to build complex pipelines of batch jobs. It provides a Python-based API for defining and executing workflows, with features such as task dependencies, retries, and notifications.
Link: https://github.com/spotify/luigi
Github Link: https://github.com/spotify/luigi
Cost: Free
Release Date: 2012
Number of Employees: N/A (maintained by Spotify)

Product: Oozie
Description: Oozie is a workflow scheduler system for managing Hadoop jobs. It provides a web-based interface for defining and scheduling workflows, with features such as task dependencies, triggers, and notifications.
Link: https://oozie.apache.org/
Github Link: https://github.com/apache/oozie
Cost: Free
Release Date: 2009
Number of Employees: N/A (maintained by the Apache Software Foundation)

Other Tools

3forge – https://3forge.com/ – 3forge delivers software tools for creating financial applications and data delivery platforms.

Ab Initio Software – https://www.abinitio.com/ – Ab Initio Software provides a data integration platform for building large-scale data processing applications.

Adeptia – https://adeptia.com/ – Adeptia offers a cloud-based, self-service integration solution that allows users to easily connect and automate data flows across multiple systems and applications.

Aera – https://www.aeratechnology.com/ – Aera provides an AI-powered platform for enterprises to accelerate their digital transformation by automating and optimizing business processes.

Aiven – https://aiven.io/ – Aiven offers managed cloud services for open-source technologies such as Kafka, Cassandra, and Elasticsearch.

Ascend.io – https://ascend.io/ – Ascend.io provides a unified data platform that allows users to build, scale, and automate data pipelines across various sources and destinations.

Astera Software – https://www.astera.com/ – Astera Software offers a suite of data integration and management tools for businesses of all sizes.

Black Tiger – https://blacktiger.io/ – Black Tiger provides an open-source data pipeline framework that simplifies the process of building and deploying data pipelines.

Bryte Systems – https://www.brytesystems.com/ – Bryte Systems offers an AI-powered data platform that helps organizations manage their data operations more efficiently.

CData Software – https://www.cdata.com/ – CData Software provides a suite of drivers and connectors for integrating with various data sources and APIs.

Census – https://www.getcensus.com/ – Census offers an automated data syncing platform that allows businesses to keep their customer data up-to-date across various systems and applications.

CloverDX – https://www.cloverdx.com/ – CloverDX provides a data integration platform for building and managing complex data transformations.

Data Virtuality – https://www.datavirtuality.com/ – Data Virtuality offers a data integration platform that allows users to connect and query data from various sources using SQL.

Datameer – https://www.datameer.com/ – Datameer provides a data preparation and exploration platform that enables users to analyze large datasets quickly and easily.

DBSync – https://www.mydbsync.com/ – DBSync provides a cloud-based data integration platform for connecting and synchronizing data across various systems and applications.

Denodo – https://www.denodo.com/ – Denodo provides a data virtualization platform that allows users to access and integrate data from various sources in real-time.

Devart – https://www.devart.com/ – Devart offers a suite of database tools and data connectivity solutions for various platforms and technologies.

DQLabs – https://dqlabs.ai/ – DQLabs provides a self-service data management platform that automates the process of discovering, curating, and governing data assets.

eQ Technologic – https://www.eqtechnologic.com/ – eQ Technologic offers a data integration platform that enables users to extract, transform, and load data from various sources.

Equalum – https://equalum.io/ – Equalum provides a real-time data ingestion and processing platform that enables organizations to make data-driven decisions faster.

Etleap – https://etleap.com/ – Etleap offers a cloud-based data integration platform that simplifies the process of building and managing data pipelines.

Etlworks – https://www.etlworks.com/ – Etlworks provides a data integration platform that allows users to create and manage complex data transformations.

Harbr – https://harbr.com/ – Harbr is a data exchange platform that connects and facilitates secure data collaboration between organizations.

HCL Technologies (Actian) – https://www.actian.com/ – Actian provides hybrid cloud data analytics software solutions that enable organizations to extract insights from big data and act on them in real time.

Hevo Data – https://hevodata.com/ – Hevo Data provides a cloud-based data integration platform that enables companies to move data from various sources to a data warehouse or other destination in real time.

Hitachi Vantara – https://www.hitachivantara.com/ – Hitachi Vantara provides data management, analytics, and storage solutions for businesses across various industries.

HULFT – https://www.hulft.com/ – HULFT provides data integration and management solutions that enable businesses to streamline data transfer and reduce data integration costs.

ibi – https://www.ibi.com/ – ibi provides data and analytics software solutions that help organizations make data-driven decisions.

Impetus Technologies – https://www.impetus.com/ – Impetus Technologies provides data engineering and analytics solutions that enable businesses to extract insights from big data.

Infoworks – https://www.infoworks.io/ – Infoworks provides a cloud-native data engineering platform that automates the process of data ingestion, transformation, and orchestration.

insightsoftware – https://insightsoftware.com/ – insightsoftware provides financial reporting and enterprise performance management software solutions that help organizations improve their financial and operational performance.

Integrate.io – https://www.integrate.io/ – Integrate.io provides a cloud-based data integration platform that enables businesses to integrate and manage data from various sources.

Intenda – https://intenda.net/ – Intenda provides a data integration and analytics platform that enables businesses to unlock insights from their data.

IRI – https://www.iri.com/ – IRI provides data management and integration software solutions that enable businesses to integrate and manage data from various sources.

Irion – https://www.irion-edm.com/ – Irion provides a data management and governance platform that enables businesses to automate data quality and compliance processes.

K2view – https://www.k2view.com/ – K2view provides a data fabric platform that enables businesses to connect and manage data across various sources and applications.

Komprise – https://www.komprise.com/ – Komprise provides an intelligent data management platform that enables businesses to manage and optimize data across various storage tiers.

Minitab – https://www.minitab.com/ – Minitab is a statistical software package designed for data analysis and quality improvement.

Nexla – https://www.nexla.com/ – Nexla offers a data operations platform that automates the process of ingesting, transforming, and delivering data to various systems and applications.

OpenText – https://www.opentext.com/ – OpenText is a Canadian company that provides enterprise information management software.

Palantir – https://www.palantir.com/ – Palantir is an American software company that specializes in data analysis.

Precisely – https://www.precisely.com/ – Precisely provides data integrity, data integration, and data quality software solutions.

Primeur – https://www.primeur.com/ – Primeur is an Italian software company that offers products and services for data integration, managed file transfer, and digital transformation.

Progress – https://www.progress.com/ – Progress is an American software company that provides products for application development, data integration, and business intelligence.

PurpleCube – https://www.purplecube.ca/ – PurpleCube is a Canadian consulting company that specializes in data integration, data warehousing, and business intelligence.

Push – https://www.push.tech/ – Push is a French software company that provides products and services for data processing and analysis.

Qlik – https://www.qlik.com/ – Qlik provides business intelligence software that helps organizations visualize and analyze their data.

RELX (Adaptris) – https://www.adaptris.com/ – Adaptris, now a RELX company, offers data integration software that helps organizations connect systems and applications.

Rivery – https://rivery.io/ – Rivery is a cloud-based data integration platform that allows businesses to consolidate, transform, and automate data.

Safe Software – https://www.safe.com/ – Safe Software provides spatial data integration and spatial data transformation software.

Semarchy – https://www.semarchy.com/ – Semarchy provides a master data management platform that helps organizations consolidate and manage their data.

Sesame Software – https://www.sesamesoftware.com/ – Sesame Software offers data management solutions that simplify data integration, data warehousing, and data analytics.

SnapLogic – https://www.snaplogic.com/ – SnapLogic provides a cloud-based integration platform that enables enterprises to connect cloud and on-premise applications and data.

Software AG – https://www.softwareag.com/ – Software AG offers a platform that enables enterprises to integrate and optimize their business processes and systems.

Stone Bond Technologies – https://www.stonebond.com/ – Stone Bond Technologies offers a platform that enables enterprises to integrate data from various sources and systems.

Stratio – https://www.stratio.com/ – Stratio offers a platform that enables enterprises to process and analyze large volumes of data in real-time.

StreamSets – https://streamsets.com/ – StreamSets offers a data operations platform that enables enterprises to ingest, transform, and move data across systems and applications.

Striim – https://www.striim.com/ – Striim offers a real-time data integration and streaming analytics platform that enables enterprises to collect, process, and analyze data in real-time.

Suadeo – https://www.suadeo.com/ – Suadeo provides a platform that enables enterprises to integrate and manage their data from various sources.

Syniti – https://www.syniti.com/ – Syniti offers a data management platform that enables enterprises to integrate, enrich, and govern their data.

Talend – https://www.talend.com/ – Talend provides a cloud-based data integration platform that enables enterprises to connect, cleanse, and transform their data.

Tengu – https://tengu.io/ – Tengu offers a data engineering platform that enables enterprises to automate the process of ingesting, processing, and delivering data.

ThoughtSpot – https://www.thoughtspot.com/ – ThoughtSpot offers a cloud-based platform that enables enterprises to analyze their data in real-time.

TIBCO Software – https://www.tibco.com/ – TIBCO Software offers a platform that enables enterprises to integrate and optimize their business processes and systems.

Tiger Technology – https://www.tiger-technology.com/ – Tiger Technology offers a platform that enables enterprises to manage, move, and share their data across systems and applications.

Timbr.ai – https://timbr.ai/ – Timbr.ai provides a platform that enables enterprises to manage and process their data in real-time.

Upsolver – https://www.upsolver.com/ – Upsolver offers a cloud-native data integration platform that enables enterprises to process and analyze their data in real-time.

WANdisco – https://wandisco.com/ – WANdisco offers a platform that enables enterprises to replicate and migrate their data across hybrid and multi-cloud environments.

ZAP – https://www.zapbi.com/ – ZAP offers a data management platform that enables enterprises to integrate, visualize, and analyze their data.

Domo – https://www.domo.com/ – Domo is a cloud-native platform that gives data-driven teams real-time visibility into all the data and insights needed to drive business forward.

Dell Boomi – https://boomi.com/ – Dell Boomi is a business unit acquired by Dell that specializes in cloud-based integration, API management, and Master Data Management.

Stitch – https://www.stitchdata.com/ – Stitch is a cloud-first, open-source platform for rapidly moving data. It allows users to integrate with over 100 data sources and automate data movement to a cloud data warehouse.

Sparkflows – https://sparkflows.io/ – Sparkflows is a low-code, drag-and-drop platform that enables organizations to build, deploy, and manage Big Data applications on Apache Spark.

Liquibase – https://www.liquibase.com/ – Liquibase is an open-source database-independent library for tracking, managing, and applying database schema changes.

Shipyard – https://shipyardapp.com/ – Shipyard is a cloud-based data orchestration platform that makes it easy to build, automate, and monitor data workflows.

Flyway – https://flywaydb.org/ – Flyway is an open-source database migration tool that allows developers to evolve their database schema easily and reliably across different environments.

Software Estimations Using Reference Class Forecasting

18 years ago I’m sitting in my cubicle doing Java programming, and my tech lead comes up to me to chat about my next project. We discuss the details, and then she asks me the dreaded question programmers fear: “how long will it take?” I stumble through some guesstimate based on my limited experience, and she goes along her merry way and plugs the number into a Gantt chart.

Even with the emergence of the agile manifesto, and now the current paradigm of using 1-2 week sprints to plan projects, businesses and customers still ask technologists how long a project will take.

The unfortunate thing about agile is that even though it is an ideal way to run a project, financial models rarely follow the methodology. Most statements of work are written with a time estimate for the project. There are some exceptions, where customers pay for work two weeks at a time, but that is pretty rare.

Throughout my technical career, I have rarely seen a formalized software estimation model emerge that we all use, so I was surprised to find a mention of software project estimation while reading How Big Things Get Done. The opening chapters cover the challenges and successes of large architectural projects, ranging from the Sydney Opera House (a problematic project) to the Guggenheim in Bilbao (amazingly under budget).

The book proposes using reference class forecasting, which asks you to:

  1. Gather the estimates and actual durations of all projects your organization has performed in the past that are similar to your current project
  2. Take the mean value
  3. Use that as an anchor

For example, if I were doing an application modernization from Hadoop to EMR and had no idea how long it would take, I would look for references to other projects of similar complexity. Let’s say I had data from 10 previous projects and the mean came out to 6 months. Then 6 months would be my anchor point.
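The mechanics are deliberately simple. Assuming hypothetical actuals for those 10 projects, the anchor is just the mean:

```python
# Reference class forecasting, step by step: collect the actual durations
# of similar past projects, take the mean, and use it as the anchor.
# The durations below are hypothetical, matching the 10-project example.
from statistics import mean

past_durations_months = [5, 7, 6, 8, 4, 6, 7, 5, 6, 6]
anchor = mean(past_durations_months)
print(f"Anchor estimate: {anchor} months")
```

You would then adjust up or down from the anchor based on how your project differs from the reference class, rather than estimating from scratch.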

The book immediately points out that the biggest problem isn’t the approach itself; it is obtaining the historical data on how long previous projects took. Think about it this way: of all the projects you have ever estimated, have you compared the actuals to your forecast? I bet most of us haven’t done these retros at all.

Some takeaways for me:

  1. If you are in a large organization and have done multiple projects, take the time to do a retro on them and record in a spreadsheet each project, its tasks, its complexity, and the actual time it took to finish. Unfortunately, large companies have this valuable data but don’t go through the exercise of collecting it. With it, some rudimentary reference class forecasting can start to replace subjective software estimations.
  2. If you are a small organization, or don’t have a history of projects to serve as a reference point, then unfortunately I think you are just out of luck.

At the end of the day, I think the industry needs to get better at software estimation, and the only way is to develop some type of methodology and refine it over time.

Chile

Easter Island

Before going on any trip, my travel style is usually to do as much research as possible. My general methodology is to ask friends for advice, read guidebooks like Lonely Planet, look up travel itineraries on Reddit, and try to schedule Zoom calls with travel writers.

That last idea, scheduling Zoom calls with travel writers, was actually a new one that has been rather successful. When we were planning Portugal, we had to make some major decisions on travel routing, and the guidebooks didn’t provide clear guidance one way or another. One day I was listening to a Rick Steves podcast about Portugal, and a local guide, Cristina Duarte, seemed rather knowledgeable. I decided to cold email her to ask if she would be open to doing a one hour paid Zoom session.

She responded pretty quickly, and we scheduled a Zoom session a couple of days later. On the call we had a Google Sheet up with a draft itinerary and a Google Map for asking questions about locations. Cristina was able to guide us on some major decision points and gave us a lot of tips about the country.

This model was so successful in helping plan the Portugal trip that we did the same thing again with another travel writer, Mark Johanson, for Chile.

As we talked to Mark, he guided us in selecting a few regions, and for Easter Island he said you have to book a guide to see the sights; you can’t see things on your own. Post pandemic, government regulations have changed (for the better, I think) to help protect the moais from tourists.

Oddly enough, I think my first exposure to Easter Island was when I was 8 years old, playing a Nintendo game called Gradius. The game was a 2D side scroller where you piloted a spaceship, basically blowing things up. For whatever weird reason, there were moai enemies in the game shooting bubbles out of their mouths.

To this day, I wonder why those moais were even in Gradius. Perhaps because of a story you will see later in this post?

Other than that video game, I really didn’t know much about the island at all. Even guidebooks, for some reason, didn’t provide much information about Easter Island.

With the advent of the Internet, I think there are definitely tradeoffs to traveling. On one hand, doing research and learning about your destination is easier than ever. There is no shortage of travel influencers creating YouTube videos about destinations. With the Internet, you can virtually experience almost everything before getting to your destination.

On the other hand, we have lost a sense of truly exploring the unknown and of genuine wonder. I remember backpacking through Europe in 2005 with really limited access to the Internet, because smartphones were not widely available. We got lost in cities so often; nowadays, with Google Maps, it is just really hard to get lost.

Getting to this tiny remote island was an adventure on its own. You first have to get to Santiago, Chile, then take a 5 hour flight directly west. You are so far out in the middle of the ocean that you can continue on a direct flight to Tahiti and then fly back to North America.

After landing and settling into our hotel, we had a day to explore the tiny town, as our tour didn’t begin until the next day.

The first huge surprise was seeing everyone in the streets dressed up in costumes like Batman, The Flash, and even anime characters like Naruto. After seeing that they were getting candy from all of the store owners, we realized it was Halloween, Oct 31st. I asked how long this tradition had been around on Easter Island, and some shop owners said it arrived probably 10 years ago. I’m amazed how some of the weirdest holidays in the Western world can migrate all over.

The second surprise was that I expected Easter Island to be way more touristy. In town, barely anyone spoke English, so those old high school Spanish lessons fortunately helped us survive getting around town with Jason.

Kava Kava

To tour or not to tour? This question is often asked when you visit a foreign country, but for Easter Island the choice is taken away from you. You can now only visit sites with a tour or a local indigenous guide. Lonely Planet recommended the tour group Kava Kava, so we booked a 2.5 day tour with them.

Our tour guide was Sebastian, and we were super lucky, as he is one of the owners of Kava Kava.  In addition to being a tour guide, he was formerly a park ranger and was involved with the government’s tourism ministry.

The first fact we learned from him was that the name Easter Island came from the early colonizers discovering the island on Easter Day.  The local indigenous people, however, call the island Rapa Nui (which is also the name of the indigenous population).

La Pandemia

We were all affected by the pandemic in different ways, but for Rapa Nui it was an especially rough situation.  When borders were shut, the people on the island had to survive mostly by themselves, cut off from the world for about 1.5 years.

In a normal year, the people of Rapa Nui would get their goods from the bellies of commercial flights, but once that supply dried up due to the closures, they had to fend for themselves.

Many of the Rapa Nui people reverted to their roots of fishing and farming, but many of the Chileans on the island didn’t have that ancestral knowledge to rely on and took free repatriation flights from the government to leave.  The population went from 10,000 people down to 5,000.

There was a lot of fear about reopening the borders until Omicron arrived. Once most people on the island had been infected without any deaths, the island finally decided to open in 2022 with the requirement of a negative PCR test. As of this writing, however, I believe tests are no longer required to enter the island.

One thing that changed the experience of Rapa Nui for us was being on the island without many people.  Since they had only just started to open up to the world, about half of the restaurants, and the town generally speaking, were understaffed.

Touring


Sebastian could write a whole book about the history of the moais and the Rapa Nui people if he wanted to.  The most interesting thing about the trip for me was the reminder that history, contrary to how we usually think of it, is neither absolute nor set in stone.

Sebastian told us there are 3 important parts of a moai.  First, the ahu, the ceremonial platform the moai stands on.  Second, the actual moai statue.  And lastly, the pu’kau, the hat on top.

We spent the first day visiting Rano Raruku, the moai factory.  Moais were painstakingly handcrafted out of the mountain and then moved up to 11 miles away.  In a totally different part of the island, the pu’kaus were made and then transported to the moais and placed on top of their heads.

Since the island had only recently opened, we had most of the site to ourselves. It was quite surreal to walk among moais in various states of restoration.

The current theory is that 4 people with 4 ropes ‘walked’ each moai.  With one person at each corner, they moved the moais slowly.

Pu’kaus, on the other hand, are still a big mystery.  To this day, archeologists don’t have a settled theory on how these huge hats were placed on the moais. Deepening the mystery, they are unable to carbon date the moais because there is no organic material in them.

The thing that moved me the most was that these people spent most of their lives making the moais, moving them, and repeating the process. Talk about a legacy: we are still admiring these statues 11 centuries later.

Sebastian said that most people ask how long it took to build the moais. In our current capitalistic society, he said, time is the unit of measure we are most interested in; back in the day, the Rapa Nui cared about no such thing. I think there is something to be learned from this: perhaps time shouldn’t be the most important factor, but instead the quality and the journey of what we go through.

I look at my life today and find myself distracted by so many things I want to do.  Even in Google Chrome, I probably have 50 tabs open across 3 instances of the browser, a testament to the extent of the multitasking on my mind.

Looking at how the people worked on the moais has got me thinking: what kind of legacy do we want to leave?  And does leaving a legacy require a singular focus? Learning about the history of the Rapa Nui people revealed the beauty of doing one thing well, like really well, for a long time.

The Most Run Down Site

Towards the end of the tour, Sebastian took us to Ahu Te Peu (which, oddly, is on Google Maps with a 4.1 rating), where all the statues were knocked down and still lying on the ground.  It was definitely off the beaten path; we had to hike there, and nobody else was around at all.

He said that archeologists have deliberately not restored the site so they can continue to do research on the moais.  I never really thought about it, but most of the historical things we see, gaze at in awe, and post pictures of on Instagram are restored sites. Whether it be sites in Europe, pyramids, or temples, if things weren’t restored you would just see rubble.

As we walked around and looked at everything in its decrepit state, I found this moai site to be my favorite on the island.

Very rarely in life do we see something in its raw, unfiltered state: half destroyed, but preserved, and probably never to be restored.

Japan and Rapa Nui

We were standing by Ahu Tongariki, a set of moais by the ocean. Sebastian explained that in 1960 a tsunami caused by an earthquake swept the moais off the platform and scattered them all over the place. Since the statues were so huge, specialized cranes had to be used to move them because of their weight.

The Japanese company Tadano had the idea to volunteer to help restore the moais, reasoning that the company would benefit from the press of having one of its cranes restore such a historic site; the cranes and the moais would feature in its marketing material.

As the Japanese worked on the restoration, they came to understand the spiritual significance of the site and withdrew all of their marketing materials as a sign of respect. From that time forward, the Rapa Nui people and Japan have had a special relationship.

In recent times, there has been much discussion about repatriating artifacts stolen by colonial powers. At the front of the British Museum stands a moai that the Rapa Nui have petitioned the British government to return.

In one sense, museums give people the ability to experience artifacts and cultures without traveling afar. On the other hand, there is a troublesome history of artifacts stolen from other countries never being returned. From the perspective of the British Museum, once that floodgate opens, every country will start asking for its goods back.

Sebastian told us the Japanese are also helping the Rapa Nui people recoat all the moais every 15 years or so, as the moais are indeed falling victim to weather and time; eventually there will come a time when they no longer exist.

Given this special relationship between the Japanese and the Rapa Nui, it was interesting to learn that the Rapa Nui voluntarily loaned one of their moais to Japan for an exhibit in Osaka in 1982. All of this reminds me of the importance of earning the respect of the cultures we encounter, and of the special bonds and relationships that can form from mutual respect.

San Pedro de Atacama – Explora

The Rapa Nui trip was so fulfilling in every sense that if we had ended our Chile trip there, we could have gone home happy.  Fortunately, we had another leg of the trip: the north of Chile, San Pedro de Atacama.

Talking with Mark earlier, he mentioned that this area catered either to low-end backpackers or to high-end, fully established excursion experiences.  Given there wasn’t a mid-range option, we opted to splurge on the latter. We chose Explora, as they plan all the tour guiding for you.

After stepping off the plane and being exposed to the elements, it was as if a machine had sucked all the moisture from my skin.  I knew San Pedro de Atacama was a desert, but I didn’t realize how harsh it was.  If you have ever been to Death Valley National Park, it is similar to that, but more intense.

Looking at the UV forecast, the index was at 14, and uhh, I always thought 10 was the highest number.  After an hour of ground transport, we arrived at the hotel.

Explora is one of those experiences where everyone there is on the fancier side of the income spectrum.  We met quite a few people there on their honeymoon who chose Explora so they wouldn’t have to worry about the hassle of planning.

A trip planner helped plan our next 4 days.  We did a mix of hiking and photogenic driving tours around the area.

The tours we went on were really great and were designed to maximize enjoyment.  For example, you would start at one end of a hike, and they would meet you at the other end with a picnic.  On a regular hike you would start at some trailhead, but afterwards you would have to double back.

I had mixed feelings about the experience: it was super nice, but only available to those with the financial means to take these tours.

When I was recently back in Southern California visiting my parents, I met up with an old friend who had also been in the computer science program at the University of California, Irvine.  After playing a set of tennis, we chatted about our jobs and got onto the topic of salary transparency.

I asked: if we had an alumni event, would he share his salary with our friends from our graduating class?  He said yes, and we chatted a bit about how our generation is a bit reluctant to share salaries with others.

I think within our social networks we can probably guess how much other people make through visual cues (cars, lifestyles, housing, etc.), but if we knew exactly how much our friends made, would it affect our relationships with them?

My friend mentioned something super interesting: at his company, Generation Z employees transparently share their salaries with each other, regardless of the gaps among their peers, so they have the most information possible to see if any of them are being underpaid.

I think there is a fear in letting people know how much you make: fear that it might change the relationship.  But perhaps some transparency might actually help people make decisions when pondering career paths.  I don’t know; this is still an unresolved topic in my mind.

Rare Plants

The most memorable tour of the trip in San Pedro de Atacama was also the scariest one.  We started a hike pretty high up, at about 4,000 meters, to see the Tatio Geysers.  The geysers were super hot and cool to see, but the best part was hiking from there along a hot river running at 40C.  As we walked along the riverbed, we saw a yellow grass that only exists above 4,000 meters.

Towards the end of the hike we saw these green fuzzy things. Our hiking guide said they were called llaretas and grow one millimeter a year. He guessed that this fuzzy plant was probably a thousand years old.

I just stood there in awe at something that could survive for such a long time. When I reflect upon this time frame, I wonder: what things in our society today will last a thousand years?

The things we collect will inevitably go out of date and decay, but I think nature will reign supreme in the long run, reminding us with its longevity of what is important.

Valparaiso

The last part of our trip was visiting the city of Valparaiso.  I had wanted to visit the city (which is about an hour away from Santiago) the last time I was there in 2015, but I had gotten massively sick from some restaurant food back then.

Okay, this is my theory, but Jason disagrees.  Back in 2015 we were eating at Astrid y Gaston in Lima, Peru, where we ordered the tasting menu.  There were four of us, and we had one vegetarian menu, as one friend didn’t eat meat.  Protein-wise, the rest of us had raw fish, chicken, and some goat.

After the meal, the 3 people who ordered the meat tasting menu all got bad cases of diarrhea, while the vegetarian friend was fine.  Jason said it was because the vegetarian friend was from India and had a better microbiome, but I really just think we got nailed by the meat.

This was an awful situation because I was about to hike the W, a 5-day trek in Chilean Patagonia, roughly 4 days later.  After a couple days of suffering with diarrhea, I gave up (as did the other 3 meat eaters), and we took antibiotics to clear out our systems.  Seriously, after that experience I considered going vegetarian.

When we flew back from San Pedro de Atacama to Santiago, we took transit directly to Valparaiso.  The city is known for being completely covered with graffiti art.

Our tour guides in the city were Sebastian and Esteban.  We learned about the art on the buildings, and really, each piece was either a love letter to the city or a statement of social protest.

This piece depicts a protest against big farms taking water from all the smaller farms. Everywhere you turn, there is graffiti art somewhere. Local businesses even ask local artists to paint something relevant to their business on the storefront.

As with all informal economies, graffiti art has unwritten cultural rules, which Sebastian explained to us. The first rule is that artists don’t paint over each other’s art. If you did, and people found out, you would get a bad rap.

As we drove through the byzantine streets, Sebastian stopped to explain this piece of art. What you can’t see is that to the right of the building is the oceanfront. This one is an ode to locals who simply enjoy, and never take for granted, the sights and scenery of the town. Many of the people in the murals are local figures: the old milkman, the newspaper delivery person, the fisherman, and so on.

This piece of graffiti art is an ode to refugees. As we finished the tour, I didn’t think of any of it as ‘graffiti’ in the sense we have in our urban cities. Instead I saw everything as art, and perhaps this type of guerrilla architecture and design is what we need in the hyper-planned cities of North America.