Rhizomic Software Architecture
This is a high-level overview of an approach to micro-services that leans heavily on structured data to create a system of linked data, facilitating intelligent heterogeneity across a fragmented content infrastructure.
Contents
Collected Notes
0x01
Any node in the system can directly connect to any other node in the system.
0x02
The nodes in our system are micro-services, content management systems, digital asset managers, identity providers, and audience experiences.
0x03
Micro-services include systems like Agora for products and Encore for backlinks.
0x04
Contrast with macro-services, which are things like Hermano for structured data, the Content API for entry data, and SBN for everything.
0x05
Splitting apart the macro-services into many micro-services allows for greater performance, stability, and domain-driven data structures.
0x06
A distributed system requires a thoughtful approach to both querying data from across the system, as well as emitting data into the system.
0x07
A universal event stream layer can synchronize and coordinate data transfer from an originating service to whichever services require that data.
0x08
A Linked Data approach can allow services to reference each other without reifying data.
0x09
Assume two micro-services: one for Entries and one for Authors. Each Entry and each Author is identified by a URL that resolves that node's data. An Entry has an array of Authors, each identified by the URL of the Author node. An Author has an array of Entries, each identified by the URL of the Entry.
0x10
// https://authors.voxmedia.org/@sarahjeong
{
  "name": "Sarah Jeong",
  "profile": "https://images.voxmedia.org/123456",
  "bio": "Sarah Jeong is smart and cool.",
  "contributedTo": [
    "https://anthem.voxmedia.org/someuuid",
    "https://pinnacle.voxmedia.org/anotheruuid"
  ]
}
// https://anthem.voxmedia.org/someuuid
{
  "hed": "Bluesky showed everyone’s ass",
  "dek": "In many cases, literally.",
  …
  "byline": [
    "https://authors.voxmedia.org/@sarahjeong"
  ]
}
0x11
Connections can be traversed by resolving the URLs. If a service needs to know the titles of the entries that Sarah Jeong has published, it can request the data from the array of URLs under `contributedTo`.
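As a sketch of that traversal, assuming a `resolve` helper that stands in for an HTTP GET of a node's URL (backed here by an in-memory map rather than the real services):

```javascript
// In-memory stand-in for the network: each key is a node URL,
// each value is the JSON that URL would resolve to.
const nodes = {
  "https://authors.voxmedia.org/@sarahjeong": {
    name: "Sarah Jeong",
    contributedTo: ["https://anthem.voxmedia.org/someuuid"]
  },
  "https://anthem.voxmedia.org/someuuid": {
    hed: "Bluesky showed everyone’s ass"
  }
}

// Stand-in for an HTTP GET of a node URL.
const resolve = async (url) => nodes[url]

// Resolve the author node, then resolve every URL under contributedTo
// and collect the entry titles.
const titlesFor = async (authorUrl) => {
  const author = await resolve(authorUrl)
  const entries = await Promise.all(author.contributedTo.map(resolve))
  return entries.map((entry) => entry.hed)
}
```

Calling `titlesFor` with the author URL yields the `hed` of each entry in `contributedTo`.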
0x12
Relationships between nodes can be aggregated into a single micro-service. This service can track relationships like `author:sarahjeong contributedTo anthem:someuuid` or `anthem:someuuid hasProduct agora:someproductid` and make those relationships queryable.
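As an illustrative sketch (these identifiers and predicates are examples, not a fixed vocabulary), such a relationship store reduces to a queryable list of subject/predicate/object triples:

```javascript
// Each relationship is a (subject, predicate, object) triple.
const triples = [
  ["author:sarahjeong", "contributedTo", "anthem:someuuid"],
  ["anthem:someuuid", "hasProduct", "agora:someproductid"]
]

// Find all triples matching a pattern; null acts as a wildcard,
// so the store can be queried from either end of a relationship.
const query = (s, p, o) =>
  triples.filter(([ts, tp, to]) =>
    (s === null || ts === s) &&
    (p === null || tp === p) &&
    (o === null || to === o)
  )
```

`query("author:sarahjeong", "contributedTo", null)` returns everything Sarah Jeong has contributed to; `query(null, "hasProduct", "agora:someproductid")` finds every entry that references the product.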
0x13
A network of entity relationships can be explored with complex queries, for instance “find all authors who have contributed to the same stories this author has contributed to” or “find all authors who have contributed to entries that mention this product”.
0x14
A micro-service like Encore is CMS agnostic. Any CMS can access the API for manipulating the scoped backlinks structure.
0x15
Entries across CMSs have different data structures, but many of the concepts are the same. It’s possible to translate these concepts between data structures, provided there is an approach to handle concepts that simply don’t exist from one to the other.
0x16
With JSON-LD, the data in any JSON blob can be separated from the format of that blob, and cast into different formats.
0x17
const jsonld = require("jsonld")

const context = {
  id: "@id",
  name: "uri:uuid:thing-name",
  uri: "@id",
  title: "uri:uuid:thing-name"
}

const objects = {
  "@context": context,
  "@graph": [
    {
      id: "0x01",
      name: "Widget"
    },
    {
      uri: "thing-two-slug",
      title: "Fidget"
    }
  ]
}

const harmonize = async () => {
  const expanded = await jsonld.expand(objects)
  const terser = await jsonld.compact(expanded, {})
  return terser
}

harmonize().then(console.log)
Returns:
{
  "@graph": [
    {
      "@id": "0x01",
      "uri:uuid:thing-name": "Widget"
    },
    {
      "@id": "thing-two-slug",
      "uri:uuid:thing-name": "Fidget"
    }
  ]
}
0x18
No service within the system should need to refactor or make serious changes in its architecture to participate in the linked data system.
0x19
The first step in integrating with the Linked Data system is for a micro-service to expose a node’s data as JSON at a URL.
0x20
Any other service that wants to reference external data should store the external node’s URL as an ID. The external node's data could be reified into the service's data model, resolved in the service's runtime logic, or pulled from a cache.
0x21
A universal event system could be used to update caches across the system, as well as track when relationships between service nodes change and record those changes in the relationship tracking service.
0x22
A universal query service like Tower can easily resolve data from micro-services by integrating with those services' APIs – either REST or GraphQL.
0x23
The worst-case query across Tower would be something that hops back and forth across services multiple times; for instance “get stories from this author, get authors from those stories, get stories from those authors”. This cross-boundary query can be done in a single request with the relationship tracking service.
0x24
Micro-services can have their own stand-alone UIs for manipulating their data, apart from any CMS integration.
0x25
Micro-services can provide a simple CRUD API or a web component that allows a CMS to integrate with them directly rather than send users to an external UI.
0x26
Any service in the system can be removed without disrupting the rest of the system, and any other service can rely on just a subset of an entire service.
0x27
Cache layers for each service provide resilience to outages, latency, and other problems.
0x28
Each micro-service runs on its own K8s cluster, allowing for workloads to be scaled up under load.
0x29
The macro-services can be split up into a collection of micro-services.
0x30
The Hermano macro-service can split into Map, Venue, Product, Game, and other domain-specific micro-services.
0x31
The macro-services do not need to be split up before realizing efficiency improvements.
0x32
The SBN macro-service can be split into audience data, identity management, authors, link sets, hub layouts, taxonomies, communities/networks, and RSS feed filters.
0x33
Micro-services can provide web components that render the data they expose. Any given audience layer could query any given service node – say, a Product – and attach the web component provided by that product.
0x34
// Product Service
{
  "name": "Widget",
  "id": "https://products.voxmedia.org/someproduct",
  "simpleComponent": "https://products.voxmedia.org/simpleproductcomponent",
  "complexComponent": "https://products.voxmedia.org/complexproductcomponent"
}
0x35
This approach to linked data and components is very similar to the structure of Clay.
0x36
Encore was a service that was created quickly, and was a light lift. Most of the work was in setting up the systems that manage its deployment and deciding on its architecture. Now that those are settled, it should be possible to rapidly spin up additional services in the same model.
0x37
The Content API macro-service can be broken into smaller services for signaling and managing Operational Transforms, and storing and querying entry data.
0x38
Operational Transforms are a technique, closely related to Conflict-free Replicated Data Types, that allows multiple authors to edit a single text field at the same time. This is how Google Docs works.
0x39
Complexity in the Anthem CMS currently lies in managing how components interact with the Autosave system, and in creating user interfaces for those components.
0x40
Adding a single key:value pair into Anthem autosave currently involves editing the document schema, adjusting the Content API GraphQL schema, and adjusting the Tower GraphQL schema. This can be simplified.
0x41
The process of users pushing their document edits into the OT and CRDT system is called Autosave.
0x42
Creating new components with autosaving data fields in the Anthem CMS can be largely automated with scripts.
0x43
It should be possible to create new entry body component types and entry metadata fields, exposed to downstream consumers, in under an hour in the Anthem CMS.
0x44
Autosave works by accepting a `PATCH` request to the endpoint `https://voxmedia.stories.usechorus.com/api/content/docs/{id}/contents`.
0x45
The request JSON has some metadata:
{
  "content_hash": "5b8b1aaf01498da24d86bddb6c55408e",
  "delta": […],
  "schema_version": 4,
  "sequence": 17,
  "user_id": 8465140,
  "uuid": "2b50230a-4085-4a95-8340-61f844fe4e3f"
}
This tells us which schema we need, which user is making the edit, and the entry uuid. The content hash is presumably a hash of the deltas.
0x46
The delta array contains the operational transforms themselves:
"delta": [
  {
    "o": {
      "ops": [
        { "retain": 17 },
        { "insert": " signature plz" }
      ]
    },
    "p": ["draft", "dek"],
    "t": "rich-text"
  }
]
`o` is the operations, `p` is the path through the document tree to the proper key, and `t` is the format, either rich text or plain text.
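As a sketch of what applying such an `ops` array to a plain-text field looks like (real implementations also handle `delete` ops and rich-text attributes, which are omitted here):

```javascript
// Apply an ops array like the one above to a plain string:
// `retain` copies characters through unchanged, `insert` splices text in.
const applyOps = (text, ops) => {
  let cursor = 0
  let result = ""
  for (const op of ops) {
    if (op.retain !== undefined) {
      result += text.slice(cursor, cursor + op.retain)
      cursor += op.retain
    } else if (op.insert !== undefined) {
      result += op.insert
    }
  }
  // Any text past the last retain is carried over as-is.
  return result + text.slice(cursor)
}
```

Applying `[{ retain: 17 }, { insert: " signature plz" }]` to the dek splices the new text in after the seventeenth character.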
0x47
The Autosave service needs to collect every operational transform sent to it. It does not need a schema, but it does need the author id and the node id. Autosave then sits at the intersection of the data pipeline.
0x48
Autosave allows clients to connect via web socket, and sends the operational transforms it receives from its `PATCH` endpoint down to those clients.
0x49
When autosave receives an operational transform for a given field on a given document, it resolves all the operational transforms for that field, then emits an event with the document id, the field path, and the new computed value.
0x50
Autosave becomes a micro-service that's decoupled from any given CMS or datastore, allowing it to serve any editing experience within the entire system.
0x51
Autosave's own datastore can be domain-specific. Types include `documents`, `paths`, `transforms`, and `versions`.
0x52
A Taxonomy micro-service is entirely independent from any expression of content or any particular CMS.
0x53
A Collections micro-service is a generic, CMS-agnostic feed-building tool that can be used to create feeds that power RSS, ActivityPub, or any component that consumes a feed. Decisions about what feeds exist, what their source query is (groups, entry type, etc.), and what stories are “pinned” in the feed happen there, in a destination-neutral way.
0x54
The Taxonomy micro-service can implement the IPTC SKOS vocabulary, or we can layer our own vocabulary over the top of it and extend it for our own uses.
0x55
It may be worth forking the IPTC triplestore and hosting our own instance of it.
0x55
The Taxonomy micro-service only needs to know the associations between entry identifiers and IPTC identifiers. The relationships between those identifiers can be nuanced – for instance, a given entry could have a `subject`, `mentions`, or `references` relationship with any IPTC identifier.
0x56
Feed building from the Taxonomy Service can leverage the transitive nature of the IPTC identifier relationships to capture the full web of entries associated with an identifier: get all stories with this subject, as well as all stories with subjects that are transitively narrower than this subject. This reduces data reification and free-form tag population (i.e., every entry must have every relevant tag).
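A sketch of that transitive walk, assuming an in-memory map from a concept to its directly narrower concepts (the identifiers echo the IPTC `medtop` prefix but are illustrative):

```javascript
// Direct narrower-than edges, keyed by the broader concept.
const narrower = {
  "medtop:sport": ["medtop:basketball", "medtop:hockey"],
  "medtop:hockey": ["medtop:icehockey"]
}

// Collect a concept plus everything transitively narrower than it.
const narrowerClosure = (concept, seen = new Set()) => {
  if (seen.has(concept)) return seen
  seen.add(concept)
  for (const child of narrower[concept] ?? []) {
    narrowerClosure(child, seen)
  }
  return seen
}
```

A feed query for `medtop:sport` would then match entries tagged with any concept in the closure, without each entry needing every broader tag.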
0x57
A structured-data store like SKOS is an example of a rhizomic system – allowing arbitrary cross-connections of heterogeneous nodes, seamless movement along the spectrum of smooth-to-striated, and the casting of a given network of flat connections into an arborescent tree structure.
0x58
A proof-of-concept app ecosystem could be interesting. We would have:
- a collection of domain-specific CRUD apps
  - entry cms one
  - entry cms two
  - autosave documents
  - autosave authors
  - assets
  - skos
- an event message system for synchronizing data
  - stream triples with SSE
0x59
A data-store layer would provide API access to entry data via REST and GraphQL. It would accept events from the Autosave service, and store them on the appropriate documents. It would act as a consumer client app for the rhizome system, structuring data into a shape for consumption by a presentation app further downstream.
0x60
We don't yet know which of these ideas will work to achieve our goals.
0x61
Anthem can create experimental components that can be re-used across a community. These can be created incredibly quickly, and have minimal risk. If they are successful, they can be formalized easily into official components and distributed across the entire organization.
0x62
Experimental components (mc) deliver their own front-end HTML and require no adjustments to schema, client app, or any other system.
0x63
Experimental components (kv) accept a json blob as a schema, which generates a UI in the entry body compose. The generated form accepts values according to the schema, and delivers the JSON blob downstream to the consumer apps.
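The shape of that schema blob isn't specified here; as a hypothetical sketch, it might describe each field's key, type, and label, from which the compose UI generates a form:

```json
{
  "title": "Pull Quote",
  "fields": [
    { "key": "quote", "type": "text", "label": "Quote text" },
    { "key": "attribution", "type": "string", "label": "Attribution" }
  ]
}
```

The filled-in values would then travel downstream as a plain JSON blob keyed by those field names.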
0x64
Flyvbjerg tells us that successful big systems are often made from an agglomeration of small, modular pieces. His example is the solar farm — a project where solar cells aggregate into solar panels, panels into arrays, and arrays into the farm — which reliably comes in on-time and under-budget.
0x65
Alexander has a nuanced understanding of modularity – no two things in a living system are truly identical, but instead have minor variations that allow them to respond to local conditions.
0x66
Flyvbjerg notes that a project to build schools across Nepal was accomplished under budget and ahead of schedule, thanks to the modular nature of the classrooms built using local processes. He notes that timelines and budgets could have been improved further by prefabricating classroom modules and delivering them to the site. Alexander tells us that this would in fact have damaged the project, by removing the ability for local labor to respond to local conditions with local solutions.
0x67
Flyvbjerg and Alexander both identify a need for the “construction site” to become an “assembly site”.
0x68
Flyvbjerg endorses the removal of the situational context, and the abstraction of the site by creating detailed computer models. Alexander endorses transforming the site into a model of itself, a machine of maximum information designed to simulate its own next step.
0x69
Key to Alexander's way of working, which is supported by Flyvbjerg's insistence on a “maximum virtual model”, is engaging in what Easterling calls “medium design” — the act of planning and designing a process that the work follows.
0x70
Flyvbjerg identifies conceptual labor becoming necessary during the delivery of a project as a key indicator that the project will blow its budget and timeline. The solution is to do as much conceptual labor as you can at the front of the project, leaving conventional labor to occur in a single lump.
0x71
Alexander’s work on process suggests that front-loading all of a project's conceptual labor is not feasible, and deprives the project of critical information that can only be known through the doing of the project. The hard part becomes understanding which segments of a project will require conceptual labor, and ordering them so that conceptual labor builds off the previous steps instead of undermining them.
0x72
Deleuze and Guattari identify an information system structure they call the “rhizome” that is juxtaposed against the traditional tree structure of hierarchy. They lay forth the principles of the rhizome as connection, heterogeneity, multiplicity, asignifying rupture, cartography, and decalcomania.
0x73
Connectivity in a rhizome system is the principle that any given section of the system can and must connect to any other given section. Lateral exchange of information is a necessity.
0x74
Heterogeneity in a rhizome system is the principle that there may be many different kinds of things. Deleuze and Guattari show that the rhizome in fact has no nodes, and is composed only of lines. The “nodes” in a rhizome are assemblages of lines, dense knots that come together in the form of a bulb or tuber.
0x75
These principles can be stated in a software architecture model as a rejection of centralized, tree-shaped data structures and flows. Instead, data and systems can and should be cross-compatible, able to exist independently or within a network of endless other segments of the system. The data and structure of the system must devolve entirely into edges, while services and documents form the tuber-like aggregation of those edges.
0x76
The rhizome is an assemblage of statements, with no subject or object and no central order or control. Language and vocabulary create pressure systems of order which can seize power or be disrupted.
0x78
In concrete terms, our software architecture has scalar typed values and relationships. Scalar values may be related to each other and to relationships. Relationships themselves can be related to other relationships. Any relationship is directional, with an implied inverse.
0x79
Relationships and values can be interpreted across varying contexts, with the underlying information they contain preserved, destroyed, translated, or transformed.
0x78
A micro-service for the domain of “Authors” is an example of lines and edges aggregating into a useful tuber or bulb within the Rhizome.
0x79
An ‘Authors’ micro-service would create a domain vocabulary for speaking about what we want to communicate about an author. This would include statements about their name, their biography, images, articles they have contributed to, anything relevant to the domain of the author.
0x80
The `Author` micro-service exposes basic CRUD operations on that data for any other service that wants to operate on it, either editing and updating the data or consuming and rendering it. The micro-service may have its own stand-alone interface that may be exposed as a dynamic island to other services.
0x81
A micro-service may implement the `Autosave` model of service interaction, allowing for multiple concurrent edits of any given piece of information, along with complete version control and attribution of changes.
0x82
A micro-service can have its data consumed and translated into any given context for any other service, given that a context-mapping resolution is written for that moment of connection.
0x83
Data in the author service is considered canonical for its domain. Ideally, that data is not reified into other services.
0x84
Micro-services can be queried against as a single, federated endpoint. In this way, the collected services can be treated as a single large service which maintains relationships and connections across them.
0x85
The federated query endpoint can be accessed from any of the individual services.
0x86
Autosave is an implementation for a service. It works by defining a document. Each document has a stack of versions, each referencing the one before. Versions are immutable, and each references a static collection of operational transforms. Each document also has a draft. Drafts are mutable, and each has a stack of operational transforms that define its difference from the version at the head of the document. As operational transforms come in, Autosave re-emits them to other clients listening over a web socket connection. When the draft is published, it is attached to the head of the versions and a new draft is created.
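The version and draft mechanics described above can be sketched as follows (the shapes and function names are assumptions, not the real API):

```javascript
// A document is an immutable stack of versions plus one mutable draft.
const createDocument = () => ({ versions: [], draft: { transforms: [] } })

// Incoming operational transforms accumulate on the draft.
const patchDraft = (doc, transform) => {
  doc.draft.transforms.push(transform)
}

// Publishing freezes the draft's transforms into a new version that
// references the previous head, then starts a fresh draft.
const publish = (doc) => {
  const head = doc.versions.length > 0 ? doc.versions.length - 1 : null
  doc.versions.push({
    previous: head,
    transforms: Object.freeze([...doc.draft.transforms])
  })
  doc.draft = { transforms: [] }
}
```

Each published version points at the one before it, so a document's history can be replayed from any point in the stack.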
0x87
The read APIs for an autosave service are oriented around requesting a document's head, its draft, or a specific version and its transforms.
0x88
The write APIs for an autosave service cover creating new documents and patching transforms onto a draft.
0x89
The strength of the rhizome system is in embracing heterogeneity and the loosening of control. It allows many agents within a system to control their own domains; it can grow, shrink, spread, and incorporate new systems and ideas easily. It creates a framework for maximum interoperability and minimum lock-in to structure, shape, and past decisions.
0x90
The weakness of the rhizome system is in its chaotic complexity. Understanding the whole system becomes difficult, and reasoning through complex third order consequences may be challenging.
0x91
Methods of addressing the weaknesses in the rhizome system hinge on re-imagined ways of working that are individually smaller and individually simpler.
0x92
Alongside working small, the rhizome should be thought large. Models of the system should exist within the system, and allow for testing and understanding changes that can happen to the system. The rhizome requires maintenance and thought at every step along the specificity gradient, not just at the code.
0x93
The assemblage is a body without organs, it builds upon itself through density and multiplicity.
0x94
Individual editors of individual publications don’t want or need to be constrained by
Goals
The purpose of this system is twofold:
- Track and expose complex relationships between items in order to facilitate queries.
- Reduce reification of canonical data across systems to allow for abstracted re-use of items across sites, verticals, and content management systems.
For example, The Verge has a new format of short entries called Quickposts. These often link out to the wider internet, and have some additional contextual information. Staff at The Verge want to know if any previous Quickpost has linked to any given URL in order to avoid duplicating content. A variety of other properties have the same question when it comes to the use of images to illustrate stories. A common question is “how many other stories have used this image”.
As an example of reducing reification, our Agora system has a collection of Products that many different stories can reference. Ideally, this data lives in a single record in Agora, and is referenced whenever it is needed by any story inside any Content Management System. Another example of data reification today is with Authors. The data for any given Author is reified across all the stories that author has contributed to, making updates to that data large, slow affairs. Additionally, an Author in Anthem cannot be reused in either Pinnacle or Clay.
Approach
The Rhizome consists of a large number of separate services that must interact with each other. These include: