The Renaissance of the Document

Why unstructured data might become a trusted sanctuary, again

Martin Jahr
Feb 19, 2023

Knowledge Evolution

“Lifelong Learning”, “Knowledge Society”, “The Learning Organization”: not a day passes without some piece of advice on how we should tame the information tornado around us. In fact, the sheer amount of information within our reach forces us to build personal knowledge strategies. Self-management books and coaching in this domain are in high demand, and on the corporate level, learning and knowledge management are more interwoven with everyday activities than they were in the last “Knowledge Management Summer” 20 years ago.

On the other side, we see many who have not sharpened their knowledge tools. Fake news, conspiracy theories, wonder healers, angel coaches, and other post-science nonsense are a big (and I would assume growing) part of said tornado. The mood on social media is edgy, and being put in the stocks, a medieval rite we overcame in literate society, is back in the form of flame wars and ad hominem attacks on opinions that stick out.

Communication Tipping Point

Within the next two years or so, the tornado will grow into a tsunami. AI text generators make content production so quick and cheap that regular communication channels will drown in generated content of mediocre quality (not that typical LinkedIn ads regularly win literary awards, but this new content will be so much easier to produce).

Not only marketing agencies are at stake. We see how AI is becoming the perfect liar, phrasing ungrounded statements in such a way that people do not give a second thought to the truth of what they are hearing. The new conspiracy generator is up and running. Science publications, Wikipedia articles, political debate threads, medical treatment plans, homework in schools and universities: all those areas will be threatened by content pollution.

The end of search engines as we know them also seems to be near. AI will generate the answers to our questions, making it impossible to unlock a knowledge domain by “lateral search”. We will lose the process of having to learn the domain lingo first. In normal searches, this is most often the first step in building up a domain, since otherwise you would not get satisfying results.

So does that mean that all information and knowledge exchange will drown in machine-controlled “content swamps”? Critical areas in business, science, education and public society cannot survive without high-quality information. These domains will have to adapt to the changed environment, and will need to build a trust ecosystem in order to follow secured knowledge chains.

This means we will see a renaissance of human-centered communication, with digestible and trustworthy pieces of information that we can build our own knowledge system upon.

Documents are a perfect carrier for this purpose. They have boundaries and thus establish a scope, which is good for human perception. When the flow of documents is properly managed and secured, and when lineage is transparent and recognizable, we are quite close to the original reason why documents were invented.

The medieval scroll of records with the king’s seal on it was built for exactly these purposes. Now we have to bring these processes to warp speed.

What is a Document, Anyway?

Before we start building a human-centric knowledge flow, it might be helpful to reflect on what a document actually is. ISO 12651-2 tells us that a document is “recorded information or object which can be treated as a unit.”

Here we can see the influence of the early documentalist Suzanne Briet, who shaped this definition in the 1950s: “[A document is] any concrete or symbolic indication, preserved or recorded, for reconstructing or for proving a phenomenon, whether physical or mental.”

This interpretation goes beyond the typical king’s scroll. It shows three important aspects. First, “content” is not only the written word on a page; it can be virtually anything, as long as there is the intent to create it. Second, the document is created to be proof of something, evidence of a fact or idea. This includes purpose, the willful act of achieving something, which fits our knowledge-building effort perfectly. And last, the intended proof implies that a document, as an isolated piece of information, is only as good as the context to which we can reliably relate it.

“An object’s treatment as evidence is contingent not only on its own properties but on its very framing as a source of information. It must be organized into a meaningful relationship with other evidence in order to have the indexical power of evidence.” (from: https://en.wikipedia.org/wiki/Suzanne_Briet)

So, is a tweet a document? Yes, even though these days one can be concerned about the proof part, and surely a lot of automatically generated tweets might be on the edge of the definition.

And an AI-generated text, stored in a blog? Here it is harder to decide. As long as a human intent is behind the production, and the phenomenon refers to some persistent entity outside the machine, we could argue in favor of it being a document. It might need labels that show its origin as machine-generated. And even though current neural networks have difficulty proving their information lineage, we will see these types of linked systems, e.g. perplexity.ai, which name the sources from which they primarily drew their conclusions.

What about the wave of fake posts we can expect to emerge in the next years? And what about the result of a text AI scanning the internet and digesting content created by other AI engines? As when copying an analog audio tape again and again, the signal-to-noise ratio keeps going down. So surely at some point we can stop talking about “preserving” or “recording” content, and should speak of merely replicating noise? This goes deep into a triage philosophy that I would like to omit for the moment, since it goes somewhat beyond this document’s boundary.

DocOps

The classical field of document management today is mostly concerned with storing, managing and tracking electronic documents and scanned images coming from paper-based document entry. I think this was fine for corporate needs over the last 40 years, but seeing the current dynamics, it becomes clear that we need more.

Fortunately, we already have some exciting possibilities to secure and evolve knowledge in a broader way than we do today. Inspiring patterns are:

  • The scientific publication process, including peer reviews. This might not be enough because the volume we need to process is too high, and artifacts might be much more volatile than a science publication. But it can help in building the trust relationships we need for our chains.
  • Logistics might help with their “batch size one” tracking systems, when we add authenticity checks and modification capabilities at each node to support knowledge evolution. But what is the difference between a parcel and a piece of knowledge being managed in high-velocity streaming processes? The parcel has a destination, and its contents are opaque on the journey. Knowledge, on the contrary, might be inspected on every hop, either as it passes by (e.g. review or approval scenarios) or for later analysis.
  • Modern real-time analytics systems have all the mechanics to manage such transformative flows of data, so why not knowledge pieces as well? Modern object stores in the cloud can easily compete with the classic archives when it comes to durability, secure access, versioning etc.
  • Blockchains and smart contracts, of course! But after 10 years we are still at the starting point of that journey. Recent crashes of blockchain-based cryptocurrencies have shown that the idea of a self-sustaining documentation network is not yet completely plausible. Hooks into the transactional business world are unclear, regulation is a nightmare, and energy consumption is not to be underestimated.
  • Last but not least, the software DevOps ecosystem has invented not only the tools to keep all this together but also the management methods and group rituals we need to make it happen. These are often too deeply rooted in technology today, so we will have to find ways to bring the ideas of version control, pull requests, and automatic or continuous testing and deployment to the knowledge world.
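
To make the combination of “authenticity checks at each node” and version control a little more tangible, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a proposal for how DocOps must work: the names Revision, append_revision and verify_chain, as well as the fields author, action and content, are hypothetical. Each revision stores the SHA-256 digest of its predecessor, roughly the way commits in a version control system point to their parents, so that a later change to any earlier hop becomes detectable.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Revision:
    """One hop in a document's journey (all field names are hypothetical)."""
    author: str
    action: str            # e.g. "create", "review", "approve"
    content: str
    parent: Optional[str]  # digest of the previous revision, None for the first
    digest: str            # SHA-256 over the four fields above


def _digest(author: str, action: str, content: str, parent: Optional[str]) -> str:
    # Serialize deterministically so the same revision always hashes the same.
    payload = json.dumps(
        {"author": author, "action": action, "content": content, "parent": parent},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_revision(chain: List[Revision], author: str, action: str,
                    content: str) -> List[Revision]:
    """Return a new chain with one more revision linked to the last one."""
    parent = chain[-1].digest if chain else None
    digest = _digest(author, action, content, parent)
    return chain + [Revision(author, action, content, parent, digest)]


def verify_chain(chain: List[Revision]) -> bool:
    """Check that every revision links to its predecessor and is unmodified."""
    expected_parent = None
    for rev in chain:
        if rev.parent != expected_parent:
            return False  # lineage is broken or reordered
        if rev.digest != _digest(rev.author, rev.action, rev.content, rev.parent):
            return False  # the revision was altered after it was recorded
        expected_parent = rev.digest
    return True
```

A chain would be built by calling append_revision once per hop, for example a “create” by one person followed by a “review” by another, and checked with verify_chain at any later node. Note that a plain hash chain like this only makes changes to recorded revisions detectable; proving who actually wrote a revision would additionally require digital signatures, which this sketch leaves out.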

Even though no clear usage pattern has emerged yet, and the solution space may take some time to evolve these ideas into the mainstream, I think we have enough to start working on these human-centric, flow-based document systems. Let’s call the approach DocOps for the moment, and let’s start building the first pilots where trusted knowledge really matters: in the public society space.

Martin Jahr

Digital Designer & life-long learner of computers & humans. Now up to create, coach and deliver learning deployment strategies in Germany where things are late.