Omair Shakeel

Thursday, May 05, 2011

SgmlReader - converting HTML into a well formed XML


SgmlReader is a nice .NET library that converts an SGML document into well-formed XML. It also has built-in support for converting HTML.


One of our clients was sending emails to our systems, which extracted the required information from them. The content type of these emails was HTML. The main job was to convert the HTML into XML, parse the XML, and extract the information our system was looking for. Most of today's browsers and email clients can display content even if the HTML is not well formed, and HTML itself is not required to be well formed.

Unclosed tags such as <br> are acceptable, and attribute values do not need to be enclosed in double quotes, as in <div id=mDiv> </div>. Loading such an HTML string into an XmlDocument throws an exception. Free libraries such as SgmlReader come in handy in these cases, because they can correct ill-formed HTML into a well-formed XML document.
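
To see the problem for yourself, here is a minimal sketch that uses only the standard System.Xml classes. The class and method names (IllFormedHtmlDemo, IsWellFormedXml) are made up for this illustration; the method simply reports whether XmlDocument accepts a given piece of markup.

```csharp
using System;
using System.Xml;

public static class IllFormedHtmlDemo
{
    // Hypothetical helper: returns true if XmlDocument can load the markup,
    // false if the parser rejects it as ill-formed XML.
    public static bool IsWellFormedXml(string markup)
    {
        try
        {
            XmlDocument document = new XmlDocument();
            document.XmlResolver = null;   // do not resolve external resources
            document.LoadXml(markup);
            return true;
        }
        catch (XmlException)
        {
            // Thrown for unquoted attribute values, unclosed tags, and so on
            return false;
        }
    }
}
```

Calling IsWellFormedXml("<div id=mDiv><br></div>") returns false, while the XML-style equivalent with a quoted attribute and a self-closed <br /> returns true.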

SgmlReader is an XmlReader API over any SGML document. You can download it from here.

A common code example looks like this:

using System.IO;
using System.Xml;
using Sgml;   // namespace of the SgmlReader library

// html is the string that holds the HTML to convert
SgmlReader reader = new SgmlReader();
reader.DocType = "HTML";
reader.WhitespaceHandling = WhitespaceHandling.All;
reader.CaseFolding = CaseFolding.ToLower;

using (StringReader htmlStringReader = new StringReader(html))
{
    reader.InputStream = htmlStringReader;

    // Load the XML document through the SgmlReader
    XmlDocument document = new XmlDocument();
    document.PreserveWhitespace = true;
    document.XmlResolver = null;
    document.Load(reader);
}
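
Once the document is loaded, the information can be pulled out with ordinary XPath queries. The sketch below is only an illustration: the HtmlExtraction class, the GetCellTexts method, and the choice of <td> cells are assumptions, since what you extract depends on the layout of your own emails. The element name is written in lowercase because the reader above folds names to lowercase.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class HtmlExtraction
{
    // Hypothetical helper: collects the trimmed text of every <td> element
    // in a document that has already been converted to well-formed XML.
    public static List<string> GetCellTexts(XmlDocument document)
    {
        List<string> texts = new List<string>();

        foreach (XmlNode cell in document.SelectNodes("//td"))
        {
            texts.Add(cell.InnerText.Trim());
        }

        return texts;
    }
}
```

If the source HTML declares a default namespace (for example an xmlns attribute on the html element), a plain "//td" query will likely match nothing, and you would need to pass an XmlNamespaceManager to SelectNodes instead.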

Tuesday, May 03, 2011

Send HTML emails using BizTalk's SMTP adapter without orchestrations


This article explains how to send HTML emails using BizTalk's SMTP adapter without the help of orchestrations. My problem was simply to route messages to an external party at the ports, and using an orchestration would be overkill for that.

Initially this looks quite simple: just map your source message to an XHTML message and let the SMTP adapter send it as an email. The problem is that the SMTP adapter automatically sets the content type to text/plain (or text/xml if the XML pipeline is selected). There is no ContentType property that you can set among the properties of the SMTP adapter. I googled for this and found plenty of articles on sending HTML emails using the SMTP adapter, but they all used orchestrations.

The job is pretty easy when you use an orchestration. All you need to do is set your email body as a RawString (Microsoft.Samples.BizTalk.XlangCustomFormatters.RawString), such as:
EmailMessage.Body = new Microsoft.Samples.BizTalk.XlangCustomFormatters.RawString(htmlstring);
and set the ContentType, such as:
EmailMessage.Body(Microsoft.XLANGs.BaseTypes.ContentType) = "text/html";

I needed to do the same thing, but without orchestrations. I decided to create a custom pipeline component that sets the ContentType value during the Encode stage. Here is the code of the pipeline component:

[ComponentCategory(CategoryTypes.CATID_PipelineComponent)]
[System.Runtime.InteropServices.Guid("CA9082F3-557A-48f6-A36C-B999B2BB6BE7")]
[ComponentCategory(CategoryTypes.CATID_Encoder)]
public class ContentTypeEditingComponent : Microsoft.BizTalk.Component.Interop.IComponent, IBaseComponent, IPersistPropertyBag, IComponentUI
{
        private string contentType;

        private static readonly string CONTENTTYPE_NAME = "ContentType";

        #region IComponent Members

        public Microsoft.BizTalk.Message.Interop.IBaseMessage Execute(IPipelineContext pContext, Microsoft.BizTalk.Message.Interop.IBaseMessage pInMsg)
        {
            if (pInMsg != null)
            {
                if (pInMsg.BodyPart != null)
                {
                    // Use the configured content type; if none is set, default to text/html
                    pInMsg.BodyPart.ContentType = string.IsNullOrEmpty(contentType) ? "text/html" : contentType;
                }
            }

            return pInMsg;
        }

        #endregion
}
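
The listing above shows only the IComponent part, but the class also declares IBaseComponent, IPersistPropertyBag and IComponentUI, so it will not compile until those members exist. The contentType field and CONTENTTYPE_NAME suggest the value was meant to be stored in the property bag. The following is a rough sketch of what the remaining members could look like, not the original code: the Name, Version and Description strings and the public ContentType property are my assumptions. These members go inside the ContentTypeEditingComponent class and need the BizTalk pipeline interop assemblies referenced.

```csharp
// Sketch only: place these members inside ContentTypeEditingComponent.
// Needs: using System; using System.Collections;
//        using Microsoft.BizTalk.Component.Interop;

#region IBaseComponent Members
// The three strings below are placeholder values (assumptions)
public string Name { get { return "ContentTypeEditingComponent"; } }
public string Version { get { return "1.0"; } }
public string Description { get { return "Sets the content type of the message body part."; } }
#endregion

#region IComponentUI Members
// No custom icon for the pipeline designer toolbox
public IntPtr Icon { get { return IntPtr.Zero; } }

// Returning null reports no validation errors
public IEnumerator Validate(object projectSystem) { return null; }
#endregion

// Assumed public property so the value can be edited in the pipeline designer
public string ContentType
{
    get { return contentType; }
    set { contentType = value; }
}

#region IPersistPropertyBag Members
public void GetClassID(out Guid classID)
{
    // Same GUID as in the Guid attribute on the class
    classID = new Guid("CA9082F3-557A-48f6-A36C-B999B2BB6BE7");
}

public void InitNew() { }

public void Load(IPropertyBag propertyBag, int errorLog)
{
    object value = null;
    try
    {
        propertyBag.Read(CONTENTTYPE_NAME, out value, errorLog);
    }
    catch (ArgumentException)
    {
        // The property has not been saved yet; keep the current value
    }

    if (value != null)
    {
        contentType = (string)value;
    }
}

public void Save(IPropertyBag propertyBag, bool clearDirty, bool saveAllProperties)
{
    object value = contentType;
    propertyBag.Write(CONTENTTYPE_NAME, ref value);
}
#endregion
```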
Compile the pipeline component assembly, create a new Send pipeline in Visual Studio .NET, add the component assembly to the toolbox of the pipeline designer, place the component in the Encode stage, and deploy the pipeline to BizTalk. This should do the trick, and your email messages will be sent with the content type "text/html" :)