How the Wikimedia Foundation Balances Security and Open Information in Web Development

By Blog, Case Study

Background

The Wikimedia Foundation is the non-profit that hosts Wikipedia and other free knowledge and open data projects. These projects are made possible by a global community who, together with the Foundation, comprise the “Wikimedia movement”. The Wikimedia movement is united by a vision: to bring about a world in which every single human being can freely share in the sum of all knowledge.

Photo by Sage Ross (CC BY-SA 2.0)

We talked with Timo Tijhof, Principal Engineer at the Wikimedia Foundation, to find out how the organization approaches security and performance at scale. Timo has worked at the Wikimedia Foundation for over 10 years, starting as a front-end developer and eventually joining the Performance Team.

The Wikimedia movement is rooted in the culture of freely licensed software. The MediaWiki application that Wikipedia runs on, and all other software developed at the Foundation, is open source. “That includes the configuration and datacenter automation of our web servers, databases, and CDN service,” said Timo. The Wikimedia community and any other individual or organization may inspect, contribute to, reuse for themselves, or fork any aspect of the platform at any time. This philosophy is also the basis of long-standing security practices which support visibility and openness.

Increased Security is about Increased Visibility and Trust

We live in an incredible world. Today, most online devices are powered by open source. Whether it's the data centers of video streaming giants and social media sites or your smartphone, they likely run an open source operating system like Linux or a BSD derivative. The vast majority of websites are also built with open source tools, or run on open source platforms. When you build on existing software that is developed by another organization or community, that software is called an “upstream”.

The Wikimedia Foundation relies heavily on upstream technology to power its platforms. This allows the organization to focus on its core mission of providing free knowledge to the world, rather than on developing and maintaining technology from scratch. Additionally, by collaborating with other open source projects, the Foundation is able to give back to the broader free software ecosystem and help advance the state of technology for everyone.

The Wikimedia Foundation is notable for operating exclusively with upstreams that are also open source. This ensures the community’s freedom principles (to freely inspect, modify, reuse, and fork) are not hindered by proprietary components.

New Wikimedia production software components or dependencies must pass certain fitness checks and establish a chain of trust for the software’s security and integrity. When the Wikimedia community creates software that is peer-reviewed during development, this trust follows implicitly from its public policies and standards. When adding a new third-party package or dependency (“upstream”), this chain needs to be established by other means.

The Wikimedia Foundation extends its chain to several credible upstream vendors and communities. For example, Debian, known for its Linux operating system, is host to the highly trusted and curated Debian package repository. When a package is present in the Debian repository, this signals trust, stability, and confidence to the industry. Timo adds, “While we usually don’t audit source code of Debian packages, installing a Debian package may still require a concept review to validate and verify that the package actually intends to meet our scale, threat model, and performance requirements.”

When considering PHP or JavaScript libraries from an anonymous and open registry like npm or Packagist, the Wikimedia Foundation audits the code as if it were its own. The Wikimedia Foundation keeps on-going costs to a minimum by adopting upstream packages in areas that solve non-trivial problems, have stable external requirements, and sit behind a module boundary. “Dependencies should reduce cost, not increase it. In practice, we only consider packages with few or no transitive dependencies, written for a stable runtime,” said Timo.

As an added precaution, the Wikimedia Foundation prohibits networking to third-party services in its production realm. When deploying or installing the MediaWiki application, it does not download JavaScript or PHP packages from npm or Composer. Instead, upstream packages are downloaded as a file with an integrity hash, and already checked into Git. This approach implements the organization’s security requirements, allowing for transparent auditing, patch-ability, and independent offline deployment. “It also helps with faster onboarding, consistent and reproducible development, and creates a natural space for auditing upstream changes,” said Timo.
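As a rough sketch of what such a check can look like (illustrative only, not the Foundation's actual tooling; the file name and hash placeholder are hypothetical), a script can verify a vendored release against the hash recorded at review time:

const crypto = require('crypto');
const fs = require('fs');

// Hash recorded when the upstream release was reviewed (placeholder value)
const expectedSha384 = '<hash-recorded-at-review-time>';

// The vendored release file, checked into Git alongside this script
const tarball = fs.readFileSync('./vendor/example-library-1.2.3.tgz');

const actual = crypto.createHash('sha384').update(tarball).digest('hex');
if (actual !== expectedSha384) {
  throw new Error('Integrity check failed: refusing to vendor this package');
}
console.log('Integrity verified');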

The Most Localized Software in the World

With over 300 language editions, Wikipedia might be among the most-translated literature in the world. Wikipedia editors usually write or translate articles manually, and in recent years, the ContentTranslation tool has helped editors do this more efficiently, producing over 1 million articles through this new tool alone. 

The MediaWiki platform underneath it all recognizes and localizes its user interface in over 400 languages, including gender, pluralization rules (“10 new messages”), and ICU collations for sort order. “We contribute to the Unicode CLDR standard on behalf of Wikipedia’s language communities. These contributions flow downstream to other Unicode customers such as Linux, Apple, and Microsoft,” said Timo.
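To give a flavor of what this looks like in practice, MediaWiki messages carry pluralization and gender inline via its {{PLURAL}} and {{GENDER}} syntax; the keys and strings below are hypothetical examples, not actual MediaWiki messages:

// Illustrative messages using MediaWiki's {{PLURAL}} and {{GENDER}} syntax
const messages = {
  'example-newmessages': 'You have {{PLURAL:$1|one new message|$1 new messages}}',
  'example-greeting': '{{GENDER:$1|Welcome}}, $1!'
};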

Languages like Arabic and Hebrew are written from right to left. CSSJanus takes stylesheets designed and developed for left-to-right languages like English, and automatically converts them into right-to-left layouts. “We deploy the MediaWiki platform on a weekly basis. Each change to functionality is deployed to all supported languages from day 1, every time. CSSJanus is part of what makes this feasible and with little to no developer training,” said Timo.
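The cssjanus package on npm exposes this as a single transform call; a minimal example:

// Flip a left-to-right stylesheet into its right-to-left equivalent
const cssjanus = require('cssjanus');

const ltrCss = '.infobox { float: right; margin-left: 1em; }';
const rtlCss = cssjanus.transform(ltrCss);
// rtlCss: '.infobox { float: left; margin-right: 1em; }'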

Not all issues are that easy! During VisualEditor development, extensive effort went into localizing the bold and italic toolbar buttons. The familiar “B” and “I” buttons usually make way for an equivalent abbreviation, such as F (Fett) and K (Kursiv) in German, with a stylized A for language communities that have no accepted standard. But early adoption of English-centric software led to “B” and “I” becoming the established and culturally familiar design pattern in some languages. In Hebrew, Czech, and Malayalam, “correcting” these with a translation actually created confusion.

No Profit Motive Means Better Support

Large corporations, driven by profit motives, regularly drop support for older devices and browsers. The Wikimedia Foundation, however, has an imperative to make information more accessible, not less.

How does the organization pull that off without the resources of a large corporation? “Through equal parts being aggressively lean and aggressively uncompromising,” says Timo.

The organization saves development and testing costs by writing and deploying native JavaScript that targets only modern browsers. Through an approach inspired by BBC News’ “cutting the mustard”, the Foundation enables millions of people (1% of its 2 billion monthly users) to access Wikipedia through a JavaScript-free experience. This is the same experience that all page views start with, prior to the (optional) arrival of JavaScript code.
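BBC News' original "cutting the mustard" check is a few lines of feature detection that gate the JavaScript layer; in its classic form it looks roughly like this (the exact checks a given site uses will vary):

// Classic "cutting the mustard" test: only capable browsers get the
// JavaScript enhancement layer; everyone else keeps the baseline.
if ('querySelector' in document &&
    'localStorage' in window &&
    'addEventListener' in window) {
  // Load and boot the JavaScript experience
}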

The Wikimedia Foundation’s development principles and browser support policy reflect this by emphasizing the importance of progressive enhancement.

Viewing Wikipedia through a web browser is the most common access method, but Wikipedia’s knowledge is consumed far beyond the canonical experience at Wikipedia.org. “Wikipedia content goes everywhere. It’s distributed offline through Kiwix and IPFS, rendered in native apps like Apple Dictionary, and even shared peer-to-peer through USB sticks,” said Timo. What these environments have in common is that they may not run JavaScript, because they require high security and high privacy. This is made possible at no extra cost by APIs that offer complete content HTML-first, with CSS and embedded media based on open formats only.
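For example, Wikimedia's public REST API serves complete, self-contained article HTML that such environments can consume directly; a minimal Node.js fetch (error handling omitted for brevity):

// Fetch complete article HTML from Wikimedia's REST API
const https = require('https');

https.get('https://en.wikipedia.org/api/rest_v1/page/html/Earth', (res) => {
  let html = '';
  res.on('data', (chunk) => { html += chunk; });
  res.on('end', () => console.log(html.slice(0, 200)));
});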

Summary

The Wikimedia Foundation prioritizes both security and openness. To achieve this balance, it implements a number of practices and policies that ensure that it protects both the freedoms and the privacy of its audience, all while sharing information transparently.

For example, the Foundation publishes a transparency report twice per year detailing its response to information and takedown requests. The Wikimedia Foundation’s Board positions are largely held by community members and filled by public election through anonymous and cryptographically-verifiable votes from any eligible Wikipedia account. Its Governance Wiki publishes the Foundation’s bylaws, board decisions, and meetings.

The Foundation participates in an ecosystem of organizations that collaborate on freely-licensed information and open-source software. Overall, the organization balances exceptional security and openness by implementing strong security practices and providing transparency about its policies and procedures.

——

Timo is currently helping with the Open Source Security Foundation (OpenSSF) project Alpha-Omega and the OpenJS Foundation to reduce potential security risks for jQuery. OpenSSF has committed close to $350,000 to reduce potential security incidents for jQuery by helping modernize its infrastructure and its code. The goal for 2023 is to update jQuery infrastructure, identify potential security risks and pain points for end-users, and understand the factors that influence the adoption of new software versions.

OpenJS in Action: There’s Open Source in Your Credit Card with Neo Financial

By Blog, Case Study, OpenJS In Action

We recently met with Ian Sutherland, engineering lead and Head of Developer Experience at Canadian fintech startup, Neo Financial. Ian has been with Neo Financial from the very beginning and has seen the engineering team grow from 1 employee to over 150 individuals in the last three years. Ian is also a Collaborator on the Node.js project hosted at the OpenJS Foundation. Watch the full interview:

https://youtu.be/pCfM4_jxH0E

What is Neo Financial?

Neo is a financial technology company that is reimagining how Canadians bank. Their first product was a rewards credit card; they later introduced a top-rated high-interest savings account and recently launched Neo Invest, the first fully digital, actively managed investment experience. With Neo Invest, your portfolio is actively managed by experts and engages a greater range of asset classes and investment strategies than most competitor portfolios.

Developed with JavaScript First

The Neo Financial web banking portal provides seamless mobile-first interactions for users. The backend of the portal is built entirely using JavaScript and Node.js and powers all of the app’s microservices and transaction processing. Ian and the engineering team decided early on that they would use JavaScript for everything they possibly could in developing their product. Ian shared his opinion that “Node.js is the technology of choice for running JavaScript on the server, so from day one, a decision was made that the team would use Node.js.”

Other factors influenced their decision to work with JavaScript and related technologies like Node.js. JavaScript is currently the most widely used programming language, which they felt would make it easier to scale their team of developers quickly. The language also, put simply, works well for them. Their team finds it easy to containerize apps using Node.js. It’s also easy to build serverless functions written in JavaScript running on Node.js, with no compilation step required. It’s fair to say that Node.js provides excellent performance and scalability, keeping their team’s infrastructure costs low. 
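As a minimal sketch of the kind of serverless function described here (the handler shape follows the common AWS Lambda convention; the logic is purely illustrative), plain JavaScript on Node.js deploys with no compilation step:

// Illustrative serverless function: plain JavaScript, no build step
exports.handler = async (event) => {
  const body = JSON.parse(event.body || '{}');
  return {
    statusCode: 200,
    body: JSON.stringify({ received: body }),
  };
};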

Working with Other Open Source Technologies

Using Node.js and JavaScript for local development has worked well for the Neo Financial dev team. Ian shared that they have a swift development experience where a person can change some code and have the project reload instantly. He also cited the npm ecosystem and the “millions” of packages out there as a benefit, helping his team work very productively. They also use TypeScript and Fastify in all of their services, and webpack indirectly through other frameworks.

Open source packages and plugins help speed up their team’s development work. Chances are, someone has already dealt with a similar problem. These packages make it easier to solve whatever the needs are without reinventing the wheel.

A Note on Security

As a financial company, security is top of mind for Ian and his team. The Node.js project has also been focusing more on the security of Node.js itself. As a result, the devs at Neo feel very comfortable running it in production.

Contributing to Open Source

On a personal level, Ian has been involved in open source for several years. He started by making smaller contributions and later got involved in the React community. He eventually became a core maintainer of Create React App and has been working on that project for the last three or four years. Then, about four years ago, he got involved in Node.js itself, primarily as part of a working group called the Tooling Group. The focus of this group is on making Node.js the best tool it can be for building things like CLI tools, or other tools that might run in a CI or build environment, lambdas, etc.

As a team, the Neo Financial engineers try to do their part. They’ve open sourced developer tools and GitHub actions and try their best to give back wherever they possibly can. In a big thank you to the open source community, Ian said, “We have an awesome community. People are doing development on open source projects for free as volunteers when they contribute lines of code and fixes and documentation.”

We at the OpenJS Foundation feel the same way. We wouldn’t be anywhere without our contributors and our fantastic community. It was a pleasure speaking with Ian, and we’re grateful for his input as an individual and an engineering team leader. 

OpenJS In Action: How Wix Applied Multi-threading to Node.js and Cut Thousands of SSR Pods and Money

By Blog, Case Study, OpenJS In Action
Author: Guy Treger, Sr. Software Engineer, Viewer-Server team, Wix

Background:

At Wix, as part of our sites’ server-side rendering architecture, we build and maintain the heavily used Server-Side-Rendering-Execution platform (aka SSRE).

SSRE is a Node.js-based multipurpose code execution platform that’s used for executing React.js code written by front-end developers all across the company. Most often, these pieces of JS code perform CPU-intensive operations that simulate activities related to site rendering in the browser.

Pain: 

SSRE has reached a total traffic of about 1 million RPM, at times requiring far more production Kubernetes pods than acceptable to serve it properly.

This made us face an inherently painful mismatch:
On one side, the nature of Node.js – an environment best suited for running I/O-bound operations on its single-threaded event loop. On the other, the extremely high traffic of CPU-oriented tasks that we had to handle as part of rendering sites on the server.

The naive solution we started with clearly proved inefficient, causing ever-growing pains in Wix’s server and infrastructure, such as having to manage tens of thousands of production Kubernetes pods.

Solution: 

We had to change something. The way things were, all of our heavy CPU work was done by a single thread in Node.js. The straightforward thing that comes to mind is: offload the work to other compute units (processes/threads) that can run in parallel on hardware with multiple CPU cores.

Node.js already offers multi-processing capabilities, but for our needs this was overkill. We needed a lighter solution that would introduce less overhead, both in terms of resources required and in overall maintenance and orchestration.

More recently, Node.js introduced what it calls worker threads. This feature became stable in v14 (LTS), released in October 2020.

From the Node.js Worker-Threads documentation:

The worker_threads module enables the use of threads that execute JavaScript in parallel. To access it:

const worker = require('worker_threads');

Workers (threads) are useful for performing CPU-intensive JavaScript operations. They do not help much with I/O-intensive work. The Node.js built-in asynchronous I/O operations are more efficient than Workers can be.

Unlike child_process or cluster, worker_threads can share memory.

So Node.js offers native support for threading that we could use, but since it’s pretty new, it still lacks a bit of maturity and is not super smooth to use in the production-grade code of a critical application.

What we were mainly missing was:

  1. Task-pool capabilities
    What does Node.js offer?
    One can manually spawn some Worker threads and maintain their lifecycle oneself, e.g.:
const { Worker } = require("worker_threads");

// Create a new worker
const worker = new Worker("./worker.js", { workerData: { /* … */ } });

worker.on("exit", (exitCode) => {
  console.log(exitCode);
});


We were reluctant to spawn our Workers manually: making sure there are enough of them at any given time, re-creating them when they die, implementing various timeouts around their usage, and the other things generally expected of a multithreaded application.

  2. RPC-like inter-thread communication
    What does Node.js offer?
    Out of the box, threads can communicate among themselves (e.g. the main thread and its spawned workers) using an async messaging technique:
// Send a message to the worker
aWorker.postMessage({ someData: data });

// Listen for a message from the worker
aWorker.once("message", (response) => {
  console.log(response);
});

Dealing with messaging can really make the code much harder to read and maintain. We were looking for something friendlier, where one thread could asynchronously “call a method” on another thread and just receive back the result.

We went on to explore and test various open source packages around thread management and communication.


Along the way we found some packages that solve both the thread-pool problem and elegant RPC-like task execution on threads. A popular example was piscina. It looked all nice and dandy, but there was one showstopper.


A critical requirement for us was to have a way for our worker threads to call some APIs exposed back on the main thread. One major use-case for that was reporting business metrics and logs from code running in the worker. Due to the way these things are widely done in Wix, we couldn’t directly do them from each of the workers, and had to go through the main thread.

So we dropped these good looking packages and looked for some different approaches. We realized that we couldn’t just take something off the shelf and plug it in.

Finally, we reached a setup we were happy with.

We mixed and wired the native Workers API with two great open source packages:

  • generic-pool (npmjs) – a very solid and popular pooling API. This one helped us get our desired thread-pool feel.
  • comlink (npmjs) – a popular package mostly known for RPC-like communication in the browser (with the long-existing JS Web Workers). It recently added support for Node.js Workers. This package made inter-thread communication in our code look much more concise and elegant.

The result now looks along the lines of the following:

import { Worker } from 'worker_threads';
import * as genericPool from 'generic-pool';
import * as Comlink from 'comlink';
import nodeEndpoint from 'comlink/dist/umd/node-adapter';

// OurWorkerThread, ModuleExecutionWorkerApi, and diagnosticApis are
// application-specific and defined elsewhere.
export const createThreadPool = ({
    workerPath,
    workerOptions,
    poolOptions,
}): genericPool.Pool<OurWorkerThread> => {
    return genericPool.createPool(
        {
            create: () => {
                // Spawn a worker and expose the main thread's diagnostic
                // APIs to it over a Comlink channel
                const worker = new Worker(workerPath, workerOptions);
                Comlink.expose({
                    ...diagnosticApis,
                }, nodeEndpoint(worker));

                return {
                    worker,
                    // Wrap the worker's exposed API in an RPC-like proxy
                    workerApi: Comlink.wrap<ModuleExecutionWorkerApi>(nodeEndpoint(worker)),
                };
            },
            destroy: ({ worker }: OurWorkerThread) => worker.terminate(),
        },
        poolOptions
    );
};


And the usage at the web-server level:

const workerResponse = await workerPool.use(({ workerApi }: OurWorkerThread) =>
    workerApi.executeModule({
        ...executionParamsFrom(incomingRequest)
    })
);

// … Do stuff with response

One major takeaway from the development journey was that imposing worker-threads on some existing code is by no means straightforward. Logic objects (i.e. JS functions) cannot be passed back and forth between threads, and so, sometimes considerable refactoring is needed. Clear concrete pure-data-based APIs for communication between the main thread and the workers must be defined and the code should be adjusted accordingly.
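A small illustration of that constraint (illustrative code, not Wix's): workerData is passed by structured cloning, so functions can't cross the thread boundary and must be replaced with pure-data descriptions that the worker maps back to behavior:

const { Worker } = require('worker_threads');

// Throws a DataCloneError: functions are not structured-cloneable
// new Worker('./worker.js', { workerData: { callback: () => {} } });

// Works: pass pure data and let the worker decide what to run
new Worker('./worker.js', { workerData: { action: 'renderModule', moduleId: 42 } });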

Results:

The results were amazing: we cut the number of pods by over 70% and made the entire system more stable and resilient. A direct consequence was cutting much of Wix’s infra costs accordingly.

Some numbers:

  • Our initial goal was achieved – total SSRE pod count dropped by ~70%.
    Correspondingly, RPM per pod improved by 153%.
  • Better SLA and a more stable application – 
    • Response time p50 : -11% 
    • Response time p95: -20%
    • Error rate decreased even further: 10x better
  • Big cut in total direct SSRE compute cost: -21%

Lessons learned and future planning:

We’ve learned that Node.js is indeed suitable for CPU-bound, high-throughput services as well. We managed to reduce our infra management overhead, but this modest goal turned out to be overshadowed by much more substantial gains.

The introduction of multithreading into the SSRE platform has opened the way for follow-up research of optimizations and improvements:

  • Finding the optimal number of CPU cores per machine, possibly allowing for non-constant-size thread-pools.
  • Refactoring the application to make workers do work that’s as pure-CPU as possible.
  • Researching memory sharing between threads to avoid a lot of large-object cloning.
  • Applying this solution to other major Node.js-based applications in Wix.

Node.js Case Study: Ryder

By Blog, Case Study, member blog, Node.js

Ryder Delivers Real-Time Visibility in Less Time with Profound Logic’s Node.js solution

This case study was initially published on Profound Logic’s website. Profound Logic is a member of the OpenJS Foundation.

Ryder’s low-code, screen-scraping solution was effective for a long time, yet as their customers’ expectations evolved, they had an opportunity to upgrade.

To keep up with consumer demand, they implemented Profound Logic’s Node.js development products to create RyderView. Their new web-based solution helped transform usability for their customers and optimize internal business processes for an overall better experience.

The Challenge

Third-party freight carriers across North America rely on Ryder’s Last Mile legacy systems to successfully deliver packages. Constantly adding features to the legacy system made for a monolithic application that was no longer intuitive or scalable.

The Solution

The Ryder team, led by Barnabus Muthu, IT & Application Architect, wanted to develop an intuitive web application that provided real-time access to critical information. Muthu wanted to balance the need for new development with his legacy programs’ extensive business logic.

Profound Logic’s Node.js development solutions were a great fit, allowing Muthu to expose his IBM i databases via API to push and pull data from external cloud systems in real time. He was also able to speed up dev time by using npm packages. Using Node.js, Ryder was able to build a modern, web-based application that no longer relied on green screens, while leveraging its existing RPG business logic.

The Result

This new solution was named RyderView and it transformed usability for its customers, translating to faster onboarding and reduced training costs for Ryder.

For third-party users, it led to improved productivity as entire time-consuming processes were made obsolete. Previously, Ryder’s third-party agents used paper-based templates to capture information while in the field. Now that Ryder’s new application uses microservices to push and pull data from iDB2, end users were upgraded to a mobile application. These advancements benefited Ryder as well, allowing them to eliminate paperwork, printing costs, and the licensing of document processing software.

Read the full case study: https://www.profoundlogic.com/case-study/ryder/

AMP Project Case Study: VOGSY

By AMP, Blog, Case Study

VOGSY Improves Services Firms’ Quote-to-Cash Speed by 80% with AMP-powered dynamic emails

The full case study was originally published on the AMP website. AMP is a hosted project at the OpenJS Foundation.


VOGSY is a professional services automation solution built on Google Workspace. By offering a single source of engagement to efficiently manage projects, resources, tasks, timesheets and billing, VOGSY streamlines services firms’ business operations from quote to cash, preventing handoff delays between sales, project delivery, and accounting teams.

Challenge

VOGSY’s clients were facing challenges due to siloed departments and disparate tools. Seeing an opportunity to never drop the quote-to-cash baton, VOGSY implemented AMP for Email to send actionable workflow emails directly to its users’ inboxes.

Results

Using the open source project led to huge efficiency gains for VOGSY clients, including an 80 percent increase in approval speed for invoices, timesheets, quotes and expenses.

To read more about the benefits for VOGSY read the full case study: https://amp.dev/success-stories/vogsy/

Dressed to Impress: NET-A-PORTER, Mr Porter and JavaScript Frameworks

By Blog, Case Study, Fastify, OpenJS In Action

For this OpenJS In Action, Robin Glen, Principal Developer at YNAP, joined OpenJS Foundation Executive Director Robin Ginn to discuss their use of JavaScript in building a global brand. YNAP is the parent company of luxury retailer NET-A-PORTER. Glen works within the Luxury division team at NET-A-PORTER (NAP), working on NET-A-PORTER and Mr Porter. He has been with NAP for over 10 years. In addition to his work with NAP, Glen is also a member of the Chrome Advisory Board.

Glen has been leading the developer team at YNAP for almost a decade, and continues to test, iterate and implement cutting edge open source technologies. For example, he was an early adopter of the Fastify web framework for Node.js to help increase web performance, particularly with the demand spikes his company experiences during holidays and sales.

Topics ranged from ways to make the user experience feel more pleasant and secure, to issues around JavaScript bloat. Questions focused on the history of NAP, how NAP chose their current framework, and how that framework allows them to best serve customers on their e-commerce site.

The full interview is available here: OpenJS In Action: NET-A-PORTER, Mr Porter and JavaScript Frameworks 

Timestamps

0:00 Brief Introduction

2:09 Technology and NET-A-PORTER 

3:11 Defining Architectural Moment

4:50 Where YNAP is Today

6:50 Factors in Choosing Technologies? 

10:30 Fastify

14:00 YNAP and JS Foundation

15:10 Looking Forward: Engineering Roadmap  

18:58 What’s a “Good Day At Work” for you?

20:00 Wrap-Up

OpenJS In Action: ESRI powering COVID-19 response with open source

By Blog, Case Study, Dojo, ESLint, Grunt, OpenJS In Action

The OpenJS In Action series features companies that use OpenJS Foundation projects to develop efficient, effective web technologies. 

Esri, a geographic information systems company, is using predictive models and interactive maps with JavaScript technologies to help the world better understand and respond to the recent COVID-19 pandemic. Recently, they have built tools that visualize how social distancing precautions can help reduce cases and the burden on healthcare systems. They have also helped institutions like Johns Hopkins create their own informational maps by providing a template app and resources to extend functionality. 

Esri uses OpenJS Foundation projects such as Dojo Toolkit, Grunt, ESLint and Intern to increase developer productivity and deliver high-quality applications that help the world fight back against the pandemic. 

Esri’s contributions to the COVID response effort and an explanation of how they created the underlying technologies are available in this video:

https://youtu.be/KLnht-1F3Ao

Robin Ginn, Executive Director of the OpenJS Foundation, spoke with Kristian Ekenes, Product Engineer at Esri, to highlight the work his company has been doing. Esri normally creates mapping software, databases and tools to help businesses manage spatial data. However, Ekenes started work on a tool called Capacity Analysis when the COVID-19 pandemic began to spread. 

Capacity Analysis is a configurable app that allows organizations to display and interact with results from two scenarios predicting a hospital’s ability to meet the demand of COVID-19 patients given configurable parameters, such as the percentage of people following social distancing guidelines. Health experts can create two hypothetical scenarios using one of two models: Penn Medicine’s COVID-19 Hospital Impact Model for Epidemics (CHIME) or the CDC’s COVID-19Surge model. Then they can deploy their own version of Capacity Analysis to view how demand for hospital beds, ICU beds, and ventilators varies by time and geography in each scenario. This tool is used by governments worldwide to better predict how the pandemic will challenge specific areas.

During the interview, Ekenes spoke on the challenges that come with taking on ambitious projects like Capacity Analysis. Esri has both a large developer team and a diverse ecosystem of applications. This makes it difficult to maintain consistency in the API and SDKs deployed across desktop and mobile platforms. To overcome these challenges, Esri utilizes several OpenJS Foundation projects, including Dojo Toolkit, Grunt, ESLint and Intern.

Ekenes explained that Grunt and ESLint increase developer productivity by providing real-time feedback when writing code. The linter also standardizes work across developers by indicating when incorrect practices are being used. This reduces the number of pull requests between collaborators and saves time for the entire team. Intern allows developers to write testing modules and create high-quality apps by catching bugs early. In short, Esri helps ensure consistent and thoroughly tested applications by incorporating OpenJS Foundation projects into their work. 

From streaming to studio: The evolution of Node.js at Netflix

By Blog, Case Study, Node.js, Project Update

As platforms grow, so do their needs. However, the core infrastructure is often not designed to handle these new challenges as it was optimized for a relatively simple task. Netflix, a member of the OpenJS Foundation, had to overcome this challenge as it evolved from a massive web streaming service to a content production platform. Guilherme Hermeto, Senior Platform Engineer at Netflix, spearheaded efforts to restructure the Netflix Node.js infrastructure to handle new functions while preserving the stability of the application. In his talk below, he walks through his work and provides resources and tips for developers encountering similar problems.

Check out the full presentation 

Netflix initially used Node.js to enable high volume web streaming to over 182 million subscribers. Their three goals with this early infrastructure were to provide observability (metrics), debuggability (diagnostic tools) and availability (service registration). The result was the NodeQuark infrastructure. An application gateway authenticates and routes requests to the NodeQuark service, which then communicates with APIs and formats responses that are sent back to the client. With NodeQuark, Netflix also created a managed experience — teams could create custom API experiences for specific devices. This allows the Netflix app to run seamlessly on different devices. 

Beyond streaming

However, Netflix wanted to move beyond web streaming and into content production. This posed several challenges to the NodeQuark infrastructure and the development team. Web streaming requires relatively few applications, but serves a huge user base. On the other hand, a content production platform houses a large number of applications that serve a limited user base. Furthermore, a content production app must have multiple levels of security for employees, partners and users. An additional issue is that development for content production is ideally fast-paced, while platform releases are slow, iterative processes intended to ensure application stability. Grouping these two processes together seems difficult, but the alternative is to spend unnecessary time and effort building a completely separate infrastructure. 

Hermeto decided that in order to solve Netflix’s problems, he would need to use self-contained modules. In other words, plugins! By transitioning to plugins, the Netflix team was able to separate the infrastructure’s functions while still retaining the ability to reuse code shared between web streaming and content production. Hermeto then took plugin architecture to the next step by creating application profiles. The application profile is simply a list of plugins required by an application. The profile reads in these specific plugins and then exports a loaded array. Therefore, the risk of a plugin built for content production breaking the streaming application was reduced. Additionally, by sectioning code out into smaller pieces, the Netflix team was able to remove moving parts from the core system, improving stability. 
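A minimal sketch of the application-profile idea as described (names are illustrative; this is not Netflix's NodeQuark code): a profile is just the list of plugins an application needs, loaded and exported as an array:

// Hypothetical "streaming" application profile: only the plugins
// the streaming app needs, nothing from content production.
const metricsPlugin = require('./plugins/metrics');
const authPlugin = require('./plugins/auth');
const routingPlugin = require('./plugins/routing');

module.exports = [metricsPlugin, authPlugin, routingPlugin].map(
  (plugin) => plugin.load()
);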

Looking ahead

In the future, Hermeto wants to allow teams to create specific application profiles that they can give to customers. Additionally, Netflix may be able to switch from application versions to application profiles as the code breaks into smaller and smaller pieces. 

To finish his talk, Hermeto gave his personal recommendations for open source projects that are useful for observability and debuggability. Essentially, a starting point for building out your own production-level application!  

Personal recommendations for open source projects 

  • Metrics and alerts
  • Centralized logging
  • Distributed tracing
  • Diagnostics
  • Exception management

Expedia Group: Building better testing pipelines with open source

By Blog, Case Study, ESLint, OpenJS In Action

The OpenJS In Action series features companies that use OpenJS Foundation projects to help develop efficient, effective web technologies. 

Software developers at global travel company Expedia Group are using JavaScript, ESLint and robust testing pipelines to reduce inconsistency and duplication in their code. Switching from Java and JSP to Node.js has streamlined development and design systems. Beyond that, Expedia developers are looking into creating a library of reusable design and data components for use across their many brands and pages. 


Expedia is an example of how adoption of new technologies and techniques can improve customer and developer experiences. 

A video featuring Expedia is available here: https://youtu.be/FDF6SgtEvYY

Robin Ginn, executive director of the OpenJS Foundation, interviewed Tiffany Le-Nguyen, Software Development Engineer at Expedia Group. Le-Nguyen explained how accessibility and performance concerns led developers to modernize Expedia’s infrastructure. One of the choices they made was to integrate ESLint into their testing pipeline to catch bugs and format input before content was pushed live. ESLint also proved to be a huge time-saver — it enforced development standards and warned developers when incorrect practices were being used. 

ESLint was especially useful for guiding new developers through JavaScript, Node.js and TypeScript. Expedia made the bold move to switch most of their applications from Java and JSP to Node.js and TypeScript. Le-Nguyen is now able to catch most errors and quickly push out new features by combining Node.js with Express and a robust testing pipeline. 

However, Expedia is used globally to book properties and dates for trips. Users reserve properties with different currencies across different time zones. This makes it difficult to track when a property was reserved and whether the correct amount was paid. Luckily, Expedia was able to utilize Globalize, an OpenJS project that provides number formatting and parsing, date and time formatting and currency formatting for languages across the world. Le-Nguyen was able to simplify currency tracking across continents by integrating Globalize into the project. 
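A minimal sketch of the kind of formatting Globalize provides (setup abbreviated; this assumes the companion cldr-data package for loading CLDR locale data):

const Globalize = require('globalize');
const cldrData = require('cldr-data');

// Load the CLDR data the formatters rely on
Globalize.load(cldrData.entireSupplemental());
Globalize.load(cldrData.entireMainFor('en', 'de'));

const usd = Globalize('en').currencyFormatter('USD');
const eur = Globalize('de').currencyFormatter('EUR');

console.log(usd(149.99)); // "$149.99"
console.log(eur(149.99)); // "149,99 €"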

To end the talk, Le-Nguyen suggested that web developers should take another look into UI testing. Modern testing tools have simplified the previously clunky process. Proper implementation of a good testing pipeline improves the developer experience and leads to a better end product for the user. 

How Node.js saved the U.S. Government $100K

By Blog, Case Study, Node.js, OpenJS World

The following blog is based on a talk given at the OpenJS Foundation’s annual OpenJS World event and covers solutions created with Node.js.

When someone proposes a complicated, expensive solution, ask yourself: can it be done cheaper, better and/or faster? Last year, an external vendor wanted to charge $103,000 to create an interactive form and store the responses. Ryan Hillard, Systems Developer at the U.S. Small Business Administration, was brought in to create a less expensive, low-maintenance alternative to the vendor’s proposal. Hillard was able to create a solution using ~320 lines of code and $3,000. In the talk below, Hillard describes what the difficulties were and how his Node.js solution fixed the problem.

Last year, Hillard started work on a government case management system that received and processed feedback from external and internal users. Unfortunately, a recent upgrade and rigorous security measures prevented external users from leaving feedback. Hillard needed to create a secure interactive form and then store the data. However, the solution also needed to be cheap, easy to maintain and stable.

Hillard decided to use three common technologies: Amazon Simple Storage Service (S3), Amazon Web Services (AWS) Lambda and Node.js. Together, these pieces provided a simple and versatile way to capture and then store response data. Maintenance is low because the servers are maintained by Amazon. Additionally, future developers can easily alter and improve the process, as all three technologies are widely used.
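A minimal sketch of that pattern (illustrative, not Hillard's actual code; the bucket name and key scheme are hypothetical): a Node.js Lambda handler that stores a submitted form response in S3:

// Store a submitted form response in S3 (AWS SDK v3)
const { S3Client, PutObjectCommand } = require('@aws-sdk/client-s3');
const s3 = new S3Client({});

exports.handler = async (event) => {
  const response = JSON.parse(event.body || '{}');
  await s3.send(new PutObjectCommand({
    Bucket: 'feedback-responses',           // hypothetical bucket name
    Key: `responses/${Date.now()}.json`,    // hypothetical key scheme
    Body: JSON.stringify(response),
  }));
  return { statusCode: 200, body: 'Thank you for your feedback!' };
};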

To end his talk, Hillard discussed the design and workflow processes that led him to his solution. He compares JavaScript to a giant toolkit with hundreds of libraries and dependencies — a tool for every purpose. However, this variety can be counterproductive as the complexity – and thus the management time – increases.

Developers should ask themselves how they can solve their problems without introducing anything new. In other words, size does matter — the smallest, simplest toolkit is the best!