Storing and Retrieving All Knowledge (and a request for feedback) [written by Luke Call probably in 1998; posted on]

The value of knowing *everything* you need, when you need it, is beyond expression. ("Knowledge is power," but knowing everything you need to know is hard.) I propose that we create a new kind of computing system that can store and retrieve *all available knowledge* in a systematically useful way. The penalty for not building such a system is to continue wallowing in the huge and growing masses of data and information that we have stored in rows, columns, and documents, with which we have not found any really efficient means to cope. ("We can't solve our current problems with the same level of thinking that created them," to paraphrase Einstein or somebody.)

This posting attempts to show the structure and benefits of such a system, as well as the feasibility of building it.

Every solution to knowledge management I've seen has a key limitation: the use of words as the dominant communication tool. The human mind handles input/output of words very slowly, compared to the amount of knowledge it processes when manipulating a moving image or 3-d object by hand. To read and write, a person's mind has to spend time making the effort to translate from words to understanding, and vice-versa. How about if we skip the translation step? Instead of reading, we will interact with all knowledge in the form of moving 3-d representations that respond to our physical or verbal actions.

Now the question: can we decompose all knowledge into some "native format", so that it can be sensibly manipulated in a systematic way? The benefits of decomposing data into a systematic format, such as with relational data, are clearly enormous. In considering my own knowledge, I would say that it consists of models: moving, manipulable images and patterns, based on experiences and the attendant facts that I have stored.

I propose that the native format for storing and manipulating all knowledge in a systematic way is simply a comprehensive object model. This is familiar to software developers who have done object-oriented analysis & design (OOA&D). One could consider it a logical representation of all the physical world, stored in an object-database system, with data describing all observed phenomena (derived initially from everything that has been written about them), and methods or named functions that operate on that data, stored with the data on which they operate. The human interface for such a system would be a 3-d interactive virtual reality environment (or some temporary 2-d substitute during its early development).

Such a system could be built by first creating an engine to read and parse all available textual material, extracting the grammatical patterns that represent knowledge (i.e., nouns map to classes or instances of objects, adjectives map to object attributes, etc., just as a software engineer doing object modeling uses noun-verb lists or CRC cards representing usage scenarios), and to systematically store those patterns in a data-dictionary that models a given domain, with the intent that all such data dictionaries would eventually be merged into one.
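
As a rough illustration of the noun-to-class and adjective-to-attribute mapping, here is a minimal Python sketch. It assumes the text has already been tagged with parts of speech, which is the hard part and is not attempted here. The tag names, the `build_dictionary` function, and the sample words are all hypothetical, and the rule that an adjective describes the next noun is a deliberate oversimplification.

```python
from collections import defaultdict

# Hypothetical pre-tagged input: (word, part_of_speech) pairs.
# A real grammar parser would produce these; here they are written by hand.
TAGGED = [
    ("red", "ADJ"), ("car", "NOUN"), ("accelerates", "VERB"),
    ("old", "ADJ"), ("car", "NOUN"), ("stops", "VERB"),
]

def build_dictionary(tagged):
    """Map nouns to classes, adjectives to attributes, verbs to methods.

    Simplifying assumption: an adjective describes the next noun, and a
    verb belongs to the most recent noun.
    """
    classes = defaultdict(lambda: {"attributes": set(), "methods": set()})
    pending_adjectives = []
    current_noun = None
    for word, tag in tagged:
        if tag == "ADJ":
            pending_adjectives.append(word)
        elif tag == "NOUN":
            current_noun = word
            classes[word]["attributes"].update(pending_adjectives)
            pending_adjectives = []
        elif tag == "VERB" and current_noun is not None:
            classes[current_noun]["methods"].add(word)
    return dict(classes)

data_dictionary = build_dictionary(TAGGED)
```

Both mentions of "car" land in the same entry, so the dictionary accumulates attributes and methods as more text is processed.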

This system would consist of these parts:
1) A software engine that would parse text from all sources in order to: a) perform the initial creation of a data dictionary with classes (thus creating an object-oriented class hierarchy that mimics the patterns of the recorded world), and then, b) populate an object database with the objects that implement those classes (i.e., the real physical objects, people, organizations, etc. that are found in the world).
2) The data dictionary itself. In addition to class names, "state" variables, and methods/functions, the data dictionary would include relationships between classes (is-a, has-a), as well as historical actions performed by one object over a given timespan, that affect its own state or the state of other objects. (These actions would consist of method calls to other objects at specific points in time.) The system would dynamically find and rearrange is-a and has-a relationships in the model as it grows the amount of data it uses to determine them.
3) A program to generate software which implements the classes in the data dictionary, based solely on the data dictionary. Note especially that humans should not write this software; they should write the program which generates it from the data dictionary, and the generated software should be updated in real time (during usage) as the data dictionary driving it is updated.
4) A virtual reality 3-d human interface (supplemented by a more traditional 2-d interface where useful) for navigating and interacting with the representations in the object database. The interface provides a visual navigation of the knowledge in the model, and allows modifications to it, for the purposes of: a) adding to the knowledge by "experimenting", "building", "browsing", "working", or "playing" with objects found there, and b) easily correcting errors or making updates.
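
Parts 2 and 3 can be sketched together in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout (`is_a`, `has_a`, `state`, `methods`), the sample classes, and the helper names `generate_classes` and `_make_placeholder` are hypothetical, and the generated methods do nothing except record that they were called.

```python
# Hypothetical data dictionary: each entry names its parent (is-a),
# its parts (has-a), its state variables, and its methods.
DATA_DICTIONARY = {
    "Organization": {"is_a": None, "has_a": [], "state": ["name"], "methods": []},
    "Team": {"is_a": "Organization", "has_a": ["Individual"],
             "state": ["size"], "methods": ["add_member"]},
    "Individual": {"is_a": None, "has_a": [], "state": ["name"], "methods": []},
}

def generate_classes(dictionary):
    """Build Python classes from the dictionary, parents before children."""
    generated = {}

    def build(name):
        if name in generated:
            return generated[name]
        entry = dictionary[name]
        parent = build(entry["is_a"]) if entry["is_a"] else object
        # State variables become attributes initialised to None; methods
        # become placeholders that only record that they were called.
        namespace = {var: None for var in entry["state"]}
        namespace["has_a"] = tuple(entry["has_a"])
        for method in entry["methods"]:
            namespace[method] = _make_placeholder(method)
        generated[name] = type(name, (parent,), namespace)
        return generated[name]

    for name in dictionary:
        build(name)
    return generated

def _make_placeholder(method_name):
    def placeholder(self, *args):
        # Record the call as (method name, arguments) on the instance.
        self.__dict__.setdefault("history", []).append((method_name, args))
    return placeholder

classes = generate_classes(DATA_DICTIONARY)
team = classes["Team"]()
team.add_member("Kim")
```

No class is written by hand here. Changing an entry in `DATA_DICTIONARY` and calling `generate_classes` again yields updated classes, which is the point of part 3. A fuller version would also timestamp each recorded call.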

No person manually designs this object model--the engine should create and update the model automatically from all the textual input, with corrections as needed from humans during refinement. The model updates itself by observing the real world. Humans then update the model on an ongoing basis as they begin using this system to look things up, and then they correct or add to what they find, via a virtual reality interface.

Here are some examples of how such a system could be used:

Example #1:
A project manager, Kim, asks: "What is the most successful way anyone has ever found to manage a certain type of project?" To find the answer, Kim, with the system's help, defines "success" for her particular project as one where the team members have a high degree of individual specialization before starting, team members and project customers are happy with the outcome, and the project was completed within its time and budget goals.

Since the system stores everything as an object with data (properties) and methods (or functions to operate on the data), everything to be queried is an object in the model, and can be systematically processed by the query engine. Most objects in the system contain references to many other objects. Objects also have measurable properties which reflect their characteristics. (Note: the objects aren't stored using their English or other human-language names as identifiers, since these aren't unique. However, such human-language names are saved with the object, and can be used in the system's human interface to help the users identify the objects. Through these common names, and also by observing the context and relationships of an object (is-a, has-a, belongs-to), a user will quickly understand an object's identity.)
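
The note about identifiers can be made concrete with a small Python sketch. It assumes generated IDs (here from the standard `uuid` module) serve as the real identifiers, with human-language names kept only as labels. The `ObjectStore` class and the sample objects are hypothetical.

```python
import uuid

class ObjectStore:
    """Stores objects under generated IDs; human names are only labels."""

    def __init__(self):
        self._objects = {}   # id -> record
        self._by_name = {}   # lowercase name -> list of ids

    def add(self, name, is_a, **properties):
        object_id = uuid.uuid4().hex
        self._objects[object_id] = {"name": name, "is_a": is_a, **properties}
        self._by_name.setdefault(name.lower(), []).append(object_id)
        return object_id

    def find_by_name(self, name):
        """Return every (id, record) pair sharing this name."""
        ids = self._by_name.get(name.lower(), [])
        return [(i, self._objects[i]) for i in ids]

store = ObjectStore()
store.add("Mercury", is_a="planet")
store.add("Mercury", is_a="chemical element")
matches = store.find_by_name("mercury")
```

A lookup by name returns both objects, each with its own ID, and the `is_a` relationship is what lets the user tell them apart.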

Here are some of the classes and objects used in this query (in quotes): "team" contains several "individuals" (as team members); "individuals" have "skills" with a recorded level of "specialization"; "projects" also have "customers", and "customers" are "individuals" who have (at given points during their recorded lifetime) levels of "satisfaction" which can be queried by the system, since these are defined and can be recorded over time; "projects" also have "objectives" and "outcomes", each of which includes things like "work hours required to complete". These objects were set up when the system was created, and were populated with data when the grammar parsing engine processed all the text in all documents in the organization.

Since Kim uses the knowledge system regularly, she knows that she and her team members are stored as objects in it; it has recorded the desired information about her work and her team members' work in the past--well enough to run this query. Additionally, the system has processed all the text from documents on her company's past projects and those of several companies for which her firm has done work. Thus, she can expect that the above classes of objects (teams, individuals, projects, etc.) are stored in the system, and that there is enough information recorded about them for the query to run usefully.

The knowledge system can run the query, since Kim's definition of "success" in this instance consists of criteria which can be compared to measurable aspects of these objects. The query engine then searches the system for all organizational structure objects that are of a class "like" Kim's (i.e., they are "teams" which are in the same class hierarchy in the model as Kim's team, or they are other organization types which have taken on similarly classed projects), and whose data can be observed as to whether they have changed over any period of time in a way that reflects her definition of "success." The system finds the projects with the highest degrees of success and reports on them.
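
A toy version of Kim's query might look like the following Python sketch. Everything specific in it is an assumption made for illustration: the `Project` fields, the numeric thresholds, the sample records, and the choice to rank qualifying projects by satisfaction.

```python
from dataclasses import dataclass

@dataclass
class Project:
    name: str
    kind: str               # class of project in the model
    specialization: float   # average team specialization, 0 to 1
    satisfaction: float     # average team and customer satisfaction, 0 to 1
    hours_planned: int
    hours_actual: int
    cost_planned: int
    cost_actual: int

def is_success(p, min_specialization=0.7, min_satisfaction=0.8):
    """Kim's definition of success, with hypothetical thresholds."""
    return (p.specialization >= min_specialization
            and p.satisfaction >= min_satisfaction
            and p.hours_actual <= p.hours_planned
            and p.cost_actual <= p.cost_planned)

def most_successful(projects, kind):
    """Successful projects of the given kind, most satisfying first."""
    candidates = [p for p in projects if p.kind == kind and is_success(p)]
    return sorted(candidates, key=lambda p: p.satisfaction, reverse=True)

# Hypothetical records standing in for what the parsing engine extracted.
PROJECTS = [
    Project("A", "software rollout", 0.9, 0.85, 1000, 950, 50000, 48000),
    Project("B", "software rollout", 0.8, 0.95, 800, 800, 40000, 40000),
    Project("C", "software rollout", 0.9, 0.90, 500, 700, 30000, 30000),
    Project("D", "office move", 0.9, 0.99, 100, 90, 5000, 4000),
]

results = most_successful(PROJECTS, "software rollout")
```

Project C is excluded because it ran over its planned hours, and D because it is a different kind of project. The real system would use the class hierarchy rather than an exact string match to decide which projects count as "like" Kim's.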

For a more complex and thorough query, the system could also search through data which has been added to the system by vendors of collaboration or project management software, where those vendors' data refer to their customers' use of such software in real-world projects. Thus Kim could see how those vendors' approaches compare to the experiences of her own company's past projects, based on her defined criteria for success.

Once Kim finds the project whose processes or strategy she will use as a model for her own, she may adopt it as the "best practice" project for her group until she improves it herself. Both actions (the "best practice" adoption and the improvement(s) she makes to it later) will result in updating the system's object model with these attributes for her and/or her "team". If her management is quite pleased with the results they may adopt it as a "best practice" at the department or division level, which would also result in a similar update to the object model--saving it as a "model procedure" for the department in this situation. Thus, future queries that mirror Kim's original question, "What is the most successful way anyone has found to do this type of project?", could add: "..., or what is the standard practice for this situation for our team, department, or division?", and the object model would reflect and report on that data.

Example #2:
Paul uses the knowledge modeling system to find out how his home VCR works, so he can make a needed repair. The company that made the VCR allowed the knowledge system to process the VCR's component specifications, so all the necessary data for his inquiry is on file. Paul is not an electronics repair technician, but the system displays the VCR's internal components in a 3-d interface, drawing their dimensions, colors, etc. from the object model, and Paul can navigate them at the desired level of detail to see which part is probably misbehaving. The dimensions, materials, and subcomponents are also stored in the object model, with all their known physical and electrical properties. The system has processed enough engineering, physics and electronics materials (thus it can model those aspects of the VCR as well) to be able to show Paul what will happen to the VCR's performance if he makes a certain change. He tests the considered repair by using VR gloves in the model and observing the results.
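
Navigating the VCR "in the desired level of detail" amounts to walking its has-a hierarchy to a chosen depth. The Python sketch below shows only that part, not the 3-d display or the physical simulation. The `Component` class, the `outline` function, and the component names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Component:
    name: str
    properties: dict = field(default_factory=dict)
    parts: list = field(default_factory=list)   # has-a relationships

def outline(component, max_depth, depth=0):
    """List component names down to max_depth levels below the root."""
    lines = ["  " * depth + component.name]
    if depth < max_depth:
        for part in component.parts:
            lines.extend(outline(part, max_depth, depth + 1))
    return lines

# Hypothetical component tree; a real one would come from the
# manufacturer's specifications.
vcr = Component("VCR", parts=[
    Component("tape transport", parts=[
        Component("drive belt", {"material": "rubber"}),
        Component("capstan motor"),
    ]),
    Component("power supply"),
])

overview = outline(vcr, max_depth=1)
detail = outline(vcr, max_depth=2)
```

With `max_depth=1` Paul sees only the major assemblies, and with `max_depth=2` the individual parts inside them appear.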

Of course these examples would work best if all human knowledge could be stored in the system in a single instance.

This system may sound much too far-fetched to be implementable. There are clearly obstacles. However, I believe there are legitimate avenues to realizing such a comprehensive system for modeling all human knowledge. I believe that the value of having all knowledge represented effectively in one place makes it imperative to try.

Microsoft and others are doing very interesting work in grammar parsing (see and search for the paragraph containing the word "grammar", for one example). If we can parse the grammar of all the recorded text we can find (simply extract nouns, adjectives, verbs, etc.), throw it through an object modeling process that further extracts "is-a" and "has-a" and "does-this..." etc. relationships, while letting humans merely watch the process to correct errors or "help" the parser (but never develop the data dictionary manually), then we could conceivably build an object model representing everything in our known universe that anyone has written about. (I'll betray what I think is (currently) the concept's main weakness: Unfortunately, the Microsoft grammar parser, while it sounds promising based on public statements as well as on info from one of the developers, is not ready for public use at this time. I know Oracle bought a company that did some pretty interesting work in text parsing, and AI people have been working on natural language understanding for years, but building this part of the engine is the main area needing work. I'd give several quarts of blood to get my hands on the right natural-language grammar parser. :)

We need to move our concept of information systems a notch above the current textual mode of thought, to a more realistic and useful representation of our world and ourselves. We now represent the world awkwardly with rows and columns of data, and with ASCII text; in isolated cases we also have models but these are difficult and time-consuming to create.

I would like to see general discussion on a way to systematically store & retrieve all knowledge, especially on ways to strengthen the concept, and on concrete strategies for bringing it about. All comments are welcome and appreciated.