8/31/17

Canonical Schema - Interview Question of the Week #1 Day 3

A canonical schema is a design pattern applied within the service-oriented paradigm, and within BizTalk Server, to establish loose coupling between systems. Messages from one system are transformed to the canonical schema, and from the canonical schema to the message format of another system, so the systems have no direct relationship with each other. A canonical schema can also be viewed as an internal schema in BizTalk, and it helps you structure your solution by following the best practice of creating separate projects for maps, orchestrations, internal schemas and external schemas.

The term canonical schema is often used to describe the creation of schemas internal to an integration mechanism such as BizTalk.

Use of canonical schemas is widely regarded as a best practice, as it decouples your BizTalk flow control mapping from any 'other' system's schemas (the other system could be internal to your organisation or external to it, e.g. a supplier, customer or partner system). This way, if any of the systems integrated via BizTalk change, only the external schemas and the maps to the canonical schemas need to be changed. It also prevents the foreign conventions, naming and hierarchy differences inherent in external schemas from leaking into your internal BizTalk artefacts.

Generally, transformation of incoming messages to a canonical schema is done as early as possible, e.g. in a map on the receive port, and similarly, transformation out of the canonical schema is done as late as possible, e.g. in a map on the send port.
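
To make the "map in early, map out late" idea concrete, here is a minimal sketch in Python rather than BizTalk maps. The message shapes, field names and the `approve` rule are all hypothetical and exist only for illustration. The point is that the processing step in the middle only ever sees canonical fields.

```python
# Hypothetical message shapes; none of these names come from BizTalk itself.

def supplier_to_canonical(msg: dict) -> dict:
    """Inbound map, applied on the receive side."""
    return {
        "invoice_id": msg["InvNo"],
        "total": float(msg["Amt"]),
        "currency": msg["Cur"],
    }


def canonical_to_ledger(msg: dict) -> dict:
    """Outbound map, applied on the send side."""
    return {
        "DocRef": msg["invoice_id"],
        "Value": f'{msg["total"]:.2f} {msg["currency"]}',
    }


def approve(invoice: dict) -> dict:
    """Processing logic that only knows the canonical fields."""
    return {**invoice, "approved": invoice["total"] < 1000}


def handle(raw: dict) -> dict:
    canonical = supplier_to_canonical(raw)  # transform in as early as possible
    processed = approve(canonical)          # work on canonical data only
    return canonical_to_ledger(processed)   # transform out as late as possible


print(handle({"InvNo": "A-17", "Amt": "250.5", "Cur": "EUR"}))
```

If the supplier renames `InvNo`, only `supplier_to_canonical` has to change in this sketch; `approve` and the outbound map are untouched.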

A common scenario for Canonical Schemas (CS) is where a single orchestration or message flow is common to multiple trading parties (e.g. you may have many suppliers with different systems, but all of them submit invoices for processing). In this case, each new supplier system just needs to be integrated with your CS, and no new processing logic needs to be added or duplicated. CS can actually reduce the overall effort in such instances. Another example of where CS are vital is where your business is the switching of messages, e.g. a medical industry switch will have many doctor and practice systems sending authorisation requests and invoices, and these need to be mapped and routed to multiple medical fund (medical aid) systems.

And For What It's Worth:

  1. In my opinion, CS make most sense when BizTalk is the end-to-end solution in an EAI or ESB scenario, e.g. direct integration of 2 or more line of business systems. Otherwise, if BizTalk is just one endpoint on a larger corporate ESB, then it probably makes sense to use the corporate ESB schemas internally, and hence map external schemas directly to the ESB schemas (i.e. no need for another set of CS within BizTalk, provided that you have a good change management / version control mechanism across your enterprise).
  2. If standard schemas (e.g. EDIFACT) exist for your industry, it is debatable whether adopting them as your internal CS should be a goal. In general they may conflict with the meaning of canonical as being 'simple', as industry schemas often need to be verbose in order to model all flavours and 'edge cases' of the document. Personally I would ensure that I have a mapping to / from said industry schemas, but would use a custom schema internally.

As they say, "a picture paints a thousand words", so here is a graphical view of this pattern compared against a solution that does not implement it (i.e. a peer-to-peer (P2P) solution):


As you can see, with the canonical pattern (in green/with a tick), documents that are logically equivalent map to a standard application-specific format (the canonical format).  Let's unpack this statement a little.

The term logically equivalent is specific to our application; for example, external purchase orders in the formats indicated in the diagram above are equivalent in the context of the solution and so map to a standard format internally.  This means that in the context of the application, these external purchase order formats are the same and will be processed in the same way.  To the stores and suppliers, however, these different purchase order formats are quite distinct.

Canonical format describes how documents will be represented internally in our solution.  In BizTalk, this has to be in XML (since BizTalk uses XML internally to represent messages).

The next question is how we build our canonical document such that it can represent documents that are logically equivalent but may actually be formatted quite differently.  Actually, that question is framed the wrong way round: the canonical document should be created first, independently of any external representations (e.g. to represent the essence of what a purchase order is), and then it is a case of deciding how external representations map to the canonical representation.  In the case of BizTalk, this will typically involve writing some XSLT that converts various formats from or to the canonical format.
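
As a rough illustration of "canonical first, maps second", the sketch below uses Python's standard `xml.etree.ElementTree` in place of XSLT (so it is a stand-in, not what a BizTalk map looks like). The two store formats, the `PurchaseOrder` canonical shape and all element names are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Two hypothetical external purchase order formats.
STORE_A = "<PO><Number>1001</Number><Item sku='X1' qty='2'/></PO>"
STORE_B = "<Order id='1002'><Line><Code>X1</Code><Count>5</Count></Line></Order>"


def canonical_po(order_id, lines):
    """Build the canonical document; defined once, independent of any source."""
    root = ET.Element("PurchaseOrder", {"id": order_id})
    for sku, quantity in lines:
        ET.SubElement(root, "Line", {"sku": sku, "quantity": str(quantity)})
    return root


def map_store_a(xml_text):
    """Map store A's format to the canonical purchase order."""
    src = ET.fromstring(xml_text)
    lines = [(item.get("sku"), int(item.get("qty"))) for item in src.findall("Item")]
    return canonical_po(src.findtext("Number"), lines)


def map_store_b(xml_text):
    """Map store B's format to the canonical purchase order."""
    src = ET.fromstring(xml_text)
    lines = [
        (line.findtext("Code"), int(line.findtext("Count")))
        for line in src.findall("Line")
    ]
    return canonical_po(src.get("id"), lines)


for mapped in (map_store_a(STORE_A), map_store_b(STORE_B)):
    print(ET.tostring(mapped, encoding="unicode"))
```

Both inputs come out in the same `PurchaseOrder` shape, even though one carries its data in attributes and the other in child elements.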

I have to admit that when I first started out building BizTalk solutions I didn’t immediately grasp the benefits of having canonical representations of messages in my solution.  This quickly changed, however.  Obviously there is a performance hit, since every message will be transformed twice, but I think this overhead is well justified given the benefits listed below (I have tried to list these in order of importance):
  1. Impact of schema change is minimised – since all messages map to or from the canonical document, if (following our example) a store or supplier decides to change their schema, it will only be necessary to change one map.  Compare this to the P2P solution: 4 maps would need to be changed if a store changed their schema, and not only that, each supplier would need to be contacted and regression testing would need to be arranged with each.  By utilising a canonical document type, we protect parties from the impact of schema changes.
  2. Minimising impact of change (2) – since orchestrations, for example, will work on the canonical schema, any changes to external schemas will not require orchestration changes and redeployment.
  3. Additional document formats can be added with relative ease – only one new additional map would be required to or from the canonical format.  Also it would only be necessary to deal with one integration partner and specific knowledge of all downstream message formats is not required – only detailed knowledge of the new message format and the canonical format is needed.
  4. Reduction in solution complexity – with the canonical solution, 7 maps need to be maintained; 12 maps need to be maintained with the P2P solution.
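
The map counts above can be checked with a little arithmetic. The sketch below assumes 3 stores and 4 suppliers with messages flowing one way from stores to suppliers; those party counts are my inference from the 7, 12 and 4 figures quoted in the list, not something stated explicitly.

```python
def canonical_map_count(stores: int, suppliers: int) -> int:
    # One map per party, to or from the canonical format.
    return stores + suppliers


def p2p_map_count(stores: int, suppliers: int) -> int:
    # One map per (store, supplier) pair.
    return stores * suppliers


stores, suppliers = 3, 4  # assumed party counts

print(canonical_map_count(stores, suppliers))  # 7
print(p2p_map_count(stores, suppliers))        # 12

# Maps added when a fifth supplier joins:
print(canonical_map_count(stores, 5) - canonical_map_count(stores, 4))  # 1
print(p2p_map_count(stores, 5) - p2p_map_count(stores, 4))              # 3
```

The canonical count grows by addition and the P2P count by multiplication, so the gap widens as parties are added.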
Here are a couple of caveats that I have come across with respect to this pattern:
  1. There can be only one canonical representation for your logical message type!  I recently worked on a solution where Xsd.exe had been used to create classes for the canonical schemas, and then these classes were used in the solution orchestrations.  As the canonical schemas changed, the classes were not recreated.  This can introduce subtle bugs; for example, if you were to assign canonical message 1 (schema) to canonical message 2 (class) in your orchestration, data not defined in message 2 will be lost.  So it is definitely best practice to ensure that only one canonical representation is available in your solution.
  2. It is harder to implement this pattern retrospectively, after the solution is in Production.  So even if your solution is simple, do yourself a favour and future-proof it by baking in a canonical schema.
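
The first caveat, a class generated from an older copy of the schema silently dropping data, can be imitated in a few lines. This is a Python stand-in for the Xsd.exe-generated .NET classes, and the class name, fields and message contents are all hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class CanonicalOrderV1:
    """Hypothetical class generated from an older canonical schema."""
    order_id: str
    total: float


# A message that conforms to the current schema, which has since gained a field.
current_message = {"order_id": "1001", "total": 99.0, "delivery_date": "2017-09-01"}


def to_stale_class(message: dict) -> CanonicalOrderV1:
    known = {f.name for f in fields(CanonicalOrderV1)}
    # Anything the stale class does not declare is silently dropped here.
    return CanonicalOrderV1(**{k: v for k, v in message.items() if k in known})


order = to_stale_class(current_message)
lost = set(current_message) - {f.name for f in fields(order)}
print(order)
print("lost fields:", lost)
```

No error is raised anywhere, which is what makes this kind of bug subtle: the assignment succeeds and the missing field only shows up downstream.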
Thoughts on the Canonical Messaging Pattern
Create a Canonical Schema – Step by Step
