From Pickwiki
Jump to navigationJump to search

A formal mathematical critique of Relational Theory and its MultiValue opposition.

This paper is intended to show, by use as far as possible of formal mathematics, that relational theory for the most part is a subset of MultiValue, and that many of the problems with Relational stem from the constraints applied to it that are not relevant to its MultiValue sibling.

Mathematical Logic

But firstly, we need a basic introduction into Mathematical Logic. It's all very well saying "let's be mathematically rigorous", but if your mathematical foundation has thumping great cracks running throught it, then the entire superstructure is going to come crashing down!

All mathematical theory is built upon axioms. An axiom is true "because I said so". In order to "prove" an axiom (in the old-fashioned meaning of the word - to test - "the exception proves the rule (is wrong)"), you start with the assumption that the axiom is true and try to prove it is false, and vice versa. If it survives this battering, then it is consistent, and circular logic at least proves you're not being stupid. Even better, however, is to succeed in proving your axiom is true without assuming it is true to start with - ie proving it using solely other axioms. If you can do this, it ceases to be an axiom, and becomes a theorem - a fact derived from other axioms.

In all walks of intellectual endeavour, the desire is to reduce the number of axioms and replace them by theorems. To the extent that many lay (and not so lay) practitioners of theory do not even realise the difference between an axiom and theorem - they think axioms are proven fact.

To give a perfect example of the havoc this can cause, consider Euclid's statement that parallel lines never meet. For thousands of years, this was considered an axiom of geometry. So much so, that even Newton didn't question it. Then, in the 19th century, various logicians said "what if we assume parallel lines *can* meet?", and they came up with some very strange geometries. Einstein picked up on this, and we ended up with Relativity.

Just take a look at the wikipedia's entries for Euclid and Axioms

The Relational Model and the Two-Dimensions Rule

According to C.J.Date, the first rule of relational databases is:

1. The Information Rule simply requires all information in the database to be represented in one and only one way, namely by values in column positions within rows of tables.

And the second rule is:

2. The Guaranteed Access Rule. This rule is essentially a restatement of the fundamental requirement for primary keys. It says that every individual scalar value in the database must be logically addressable by specifying the name of the containing table, the name of the containing column, and the primary key value of the containing row.

(Quoted from "An introduction to Database Systems", 5th Edition)

Note that the formal proof, as noted above, is "because". These two rules basically require that the database store data as two-dimensional tables. Now what's all this crap about "clustering", if not an attempt to get round these rules without actually breaking them?

Meanwhile, MultiValue represents data as n-dimensional arrays where n is unrestricted (but usually limited in practice to 3 or 4 at the outside). Any individual scalar value can be accessed by specifying the FILE name, RECORD primary key, FIELD name, and then as many sequence offsets as are required (almost invariably 1 at most). So we can easily meet relational's requirements by specifying a subset of MultiValue with n=2.

Don't forget - in Mathematics, the general proof always trumps the specific! So if I can up with a proof for "n", where n is any number, it will always trump a proof for "n where n=2".

Systematic Treatment of Null Values

Both relational and MultiValue fall down here, in practice. To quote Date:

3. Systematic Treatment of Null Values. The DBMS is required to support a representation of "missing information and inapplicable information" that is systematic, distinct from all regular values (for example, "distinct from zero or any other number", in the case of numeric values), and independent of type. It is also implied that such representations must be manipulated by the DBMS in a systematic way.

MultiValue uses the empty string. Relational uses NULL. Both mechanisms are incapable of distinguishing between "missing" and "inapplicable", and hence are incapable of manipulating the two in a systematic way. What is even worse, picking on Date's own example, is that relational cannot handle "infinity", which is a valid number ...

The important point to note here is that relational seeks to handle it within the database (and fails). MultiValue just admits that it can't, and pushes it out to the application level.

Defining the Database in terms of itself

Both MultiValue and Relational require that the metadata be accessed using the standard query tools. In principle, there is no difference.

In practice, however, MultiValue simply makes no distinction between data and metadata. Relational treats metadata as very distinct from data, just accessed via the same tools. There is nothing to stop relational from implementing a metadata query differently from a data query so long as the user doesn't see the difference. MultiValue is not allowed to know that there is a difference. Which is approach is simpler and more generic?

Comprehensive Data Access and Update

WHY!? The primary aim of Date's rules 5, 6 and 7 seems to be to ensure that data integrity can be maintained when the data to be updated is necessarily scattered across multiple tables. The obvious cause of this is where data in the real world is packaged in n-dimensional chunks called objects, and it is a requirement to keep reality and the database in sync. Now why does MultiValue define data as coming in n-dimensional arrays, again?

Implementing a bi-directional access language (read and write) is a nightmare. SQL was meant to be for end-users - it was originally called Structured English Query Language. I can only presume the "English" was dropped because the end result bore no acceptable resemblance to English.

On the other hand, one of the names for MultiValue's query language is "English", precisely because it is reasonably close to it. There is no standard write language for MultiValue (unless you count SQL as it), though, precisely because the generic task is so complex.

However, MultiValue does not need to require that we be able to update views. A relational view is n-dimensional, and for any view where requiring updates would make logical sense, MultiValue would store it as a single array.

Equally, high level insert, update and delete is a completely arbitrary requirement that is presumably relied upon to delete all related rows that, in MultiValue, would have been stored in a single array.

Physical and Logical Independence

This requirement in particular seems predicated on viewing things as a database, rather than an information store. Okay, it is a Relational Database Management System we're trashing...

"Logical Independence" requires that the DBA be able to rearrange how data is stored without affecting the user's view of that data. MV, however, restricts the DBA's freedom by imposing a practical requirement that that view approximates to the real-world view of the thing being modelled by the data. No single requirement highlights the divorce behind relational's theory of data and the real world more clearly than this one.

Given that MV imposes some resemblance between the database and the reality it is modelling, and given also that modern implementations use an OS file store rather than seeking to manage their own storage as one huge blob, the physical independence is easily achieved. FILEs can be placed in any convenient location.

This also addresses rule 11 - Distribution independence - in pretty much the same way.

Integrity Independence and Non-Subversion

Currently missing from the MV model as a formal constraint. But the relational model does not distinguish between "natural law" constraints, where a description has no meaning outside of the object it describes, and "statute law" constraints, relationships where the two objects can have an independent existence independent of the relationship between them.

A "natural law" constraint is unnecessary in MV because in a properly designed database, the description is stored with the object and if the object does not exist there is nowhere to store the description.

On the other hand, "statute law" constraints can be very dangerous - enforcing constraints that don't exist in the real world thereby requiring that the user enter data that is invalid in order to persuade the database to accept data that is valid.

So this sort of satisfies the non-subversion rule in exactly the same way - you can't subvert "natural law" constraints because the MV structure enforces it. But you can subvert "statute law" relationships because you can get at and edit the data directly.