Tuesday, 16 February, 2010

Solving the connection problems of Windows Live Writer (WLW), Part 1

Problems connecting to a blog site using Windows Live Writer (WLW) are common, especially with PHP-based systems.  When this happens, you usually see an error message like this one:

  • An error occurred while attempting to connect to your blog: Invalid Server Response - The response to the blogger.getUsersBlogs method received from the web-log server was invalid. You must correct this error before proceeding.

In this article, we'll look at the measures that we can take to resolve many of these communication problems, especially those that are not explained, or are poorly explained, on the Internet.

1- Introduction

The vast majority of the WLW communication problems mentioned on the Internet involve blogging systems based on PHP scripts. This is not only because these systems are the most widely used (think of WordPress, Drupal and Joomla) but also, sadly, because of the way web services - an essential component of protocols such as XML-RPC, which WLW uses - are implemented in PHP, which makes them very sensitive to many types of programming errors, even one as small as an extraneous blank line at the beginning or the end of a script file.  The rest of this article deals only with communication problems related to PHP.  Of these blogging systems, WordPress (WP) takes the lion's share, so the examples used in this article are based on WP.  However, this doesn't mean that you won't find anything interesting here if you are using a blogging platform other than WP.  Except for the first example, which uses a common problem specific to WP to explain the log file of WLW, most of the explanations given here fully apply to many other blogging systems using PHP.

With the exception of a particular problem related to the uploading of images and video files, the vast majority of the communication problems that you'll get with WLW happen when you try to create a new account.  In these cases, you'll be greeted with the following error message: "An error occurred while attempting to connect to your blog: Invalid Server Response - The response to the blogger.getUsersBlogs method received from the web-log server was invalid. You must correct this error before proceeding."  Less often, an error will happen when you try to publish a new article using an account that was working correctly in the past. In these cases, the method "blogger.getUsersBlogs" is replaced in the error message by another method such as metaWeblog.newPost or metaWeblog.getRecentPosts:

  The response to the metaWeblog.newPost method received from the web-log server was invalid.

  The response to the metaWeblog.getRecentPosts method received from the web-log server was invalid.

Obviously, if you can no longer publish an article using an existing account that worked correctly before, it's most probably because something has changed in the meantime, and we can safely assume that if you now try to recreate this account, you will be greeted by the first error message, the one with "blogger.getUsersBlogs".

2- metaWeblog.newMediaObject and other well-known problems

Before continuing, we will take a look at some of the best-known problems of WLW that you'll find on the Internet. This will be only a quick look because I don't want to repeat what you can already find on the Internet and also because many of these are already a thing of the past; i.e., they are no longer present in the latest versions of either WLW or the blogging systems.

The most important of these is an intermittent failure when uploading images, videos and other big media files. It usually occurs only when you upload multiple files at once or a single big one, but not when uploading a few small files or a medium one. This problem is different from most of the other communication problems because, in the meantime, all the other publishing features of WLW keep working correctly.  You can identify this problem by the presence of the metaWeblog.newMediaObject method in the error message: "The response to the metaWeblog.newMediaObject method received from the weblog server was invalid". The name of this method tells us that the problem is clearly related to the creation of a new media file on the system and is not something like an invalid password.

This problem happens when the upload of your media files exceeds the internal memory limits of PHP. It can be fixed in different ways: by reducing the image size, by reducing the number of files uploaded at one time, by using FTP, or by increasing the memory available to PHP for the execution of a script.  To learn how to use FTP with WLW, take a look at the following article: "Live Writer Error – The response to the metaWeblog.newMediaObject method received from the weblog server was invalid – Invalid Response Document Returned From XmlRpc Server".

To increase the PHP memory limit, you must either change the value of the memory_limit parameter that you will find in your "php.ini" file (or add the line if this parameter is not already there) or add the following line of code « ini_set("memory_limit", "12M"); » (without the quotes «...», of course) right after the <?php characters in the PHP script that handles either the whole XML-RPC protocol or only the uploading of images and other media files.  For example, in the case of WordPress, you could change the "wp-admin/includes/image.php" file.  You'll find more details about this in the article "PHP Allowed Memory Size Exchausted Fatal Error » My Digital Life".  Also, the value of 12M above is only an example; the actual value is up to you.

Before going to the next round, here are a few references that list the most common known connection problems of Windows Live Writer (WLW), although many of these are old problems that no longer exist in the latest versions of WLW:

3- The Log File of Windows Live Writer

When you have a problem communicating over XML-RPC, the log file of Windows Live Writer (WLW) is probably the first place to look. To explain its use, we will start by taking as an example a WordPress (WP) configuration error that is familiar to its users and which usually - but not necessarily - happens right after a new installation.  Note that even if you don't use WP, the following discussion is still interesting because it's an introduction to the use of WLW's log file.

Since version 2.7 of WP, the XML-RPC remote communication protocol is not enabled by default on a new installation of WP. Therefore, if you forget to activate it explicitly in the configuration options, WLW will obviously not be able to connect to the WordPress server using the XML-RPC protocol and you will be welcomed with the following error message:

captured_Image.png[3]

This error message describes exactly the cause of the problem and clearly says that the XML-RPC remote communication protocol should be enabled; which can be done by going to the Admin Dashboard of WP under Settings | Writing | Remote Publishing and then checking the box "XML-RPC - Enable XML-RPC protocols (WordPress, Movable Type, MetaWeblog, Blogger XML-RPC)".

In this example, identifying and correcting the source of the problem was relatively easy because the system provided us with an explicit error message. However, the situation is much less clear when, instead of an explicit error message, you get nothing but a generic error message that, except for telling you that something went wrong, provides absolutely no clue whatsoever about the nature of the underlying problem:

captured_Image.png

"An error occurred while attempting to connect to your blog: Invalid Server Response - The response to the blogger.getUsersBlogs method received from the web-log server was invalid. You must correct this error before proceeding."

When you get this error message, you can either start trying at random solutions gleaned from all over the Internet - I do not recommend it and I don't think that you'll disagree, otherwise you probably would not be here reading this - or get to the bottom of things and find out what is happening before trying any solution. For this, there are two very effective tools: the first one is the log file of WLW and the second one is an HTTP protocol analyzer such as Fiddler. We shall see how to use Fiddler later but first, let's take a look at the log file of WLW.

This log file is aptly called "Windows Live Writer.log" and is created automatically by WLW in the directory "%USERPROFILE%\Local Settings\Application Data\Windows Live Writer\" or "C:\Documents and Settings\__YOUR_USER_NAME__\Local Settings\Application Data\Windows Live Writer" if you're running WinXP. In the case of Vista, you will find it in the directory "%localappdata%\Windows Live Writer".  To access this directory, you can of course use Windows Explorer, or you can open the command Start | Run and copy the desired address into the dialog, e.g. for WinXP:
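As a small illustration of the two locations given above, here is a minimal Python sketch that builds the candidate paths of the log file from the environment variables. The function name `wlw_log_candidates` is my own invention, and the sketch assumes that the variables USERPROFILE and LOCALAPPDATA are defined as usual on the machine; it only builds the paths and doesn't check that the file exists.

```python
import ntpath  # Windows path rules, usable on any platform


def wlw_log_candidates(env):
    """Return the possible locations of 'Windows Live Writer.log'.

    `env` is a mapping of environment variables, for example os.environ.
    """
    log_name = "Windows Live Writer.log"
    candidates = []
    # Vista: %localappdata%\Windows Live Writer
    if "LOCALAPPDATA" in env:
        candidates.append(
            ntpath.join(env["LOCALAPPDATA"], "Windows Live Writer", log_name))
    # WinXP: %USERPROFILE%\Local Settings\Application Data\Windows Live Writer
    if "USERPROFILE" in env:
        candidates.append(
            ntpath.join(env["USERPROFILE"], "Local Settings",
                        "Application Data", "Windows Live Writer", log_name))
    return candidates
```

On a real machine, you would call it with `os.environ` and keep the first path for which `os.path.exists` returns true.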

captured_Image.png[7]

An even simpler method is to use the "About Windows Live Writer" dialog box of WLW, available from the "Help" menu, which contains a direct link to the directory behind the "Show log file" link, as shown in the figure below (red arrow):

captured_Image.png[1]

By default, WLW writes only a few things in the log file but it will include, at a minimum, any error message. This is sufficient in many cases but to get a better view of what happened, we can activate the verbose logging mode, which will greatly increase the amount of written information and will make things easier (or more complicated!) to understand. All the examples shown in this article are based on the verbose logging mode. To find out how to activate this mode for WLW, please consult the following reference: "Donkblog : How to enable verbose logging in Windows Live Writer".  Important: note that in the case of WLW, even if you have multiple windows opened, there is always only a single WLW process controlling them all. You cannot change the verbose mode of WLW while at least one window is open, so you must close all WLW windows before you can switch to the verbose logging mode.

In order not to get lost in a mass of useless information, the first thing to do is obviously to delete any previous log file that's already there, so that we see only the most recent and relevant events.  There is absolutely no problem deleting this file at any time, even when WLW is already open.  You can use any program such as Notepad, WordPad or your favorite text editor to open it and read it, but as Notepad doesn't convert a partial end of line (only chr(10)) to a full Windows end of line (chr(13)+chr(10)), it's a better idea to use WordPad instead; otherwise some stuff will be crammed onto a single line instead of multiple lines.
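If you prefer to keep using Notepad, the partial ends of line mentioned above can be converted beforehand. The following Python sketch is only one possible way of doing it; the function names are mine, and it works on raw bytes so that it doesn't have to assume anything about the character encoding of the log file.

```python
import re


def to_windows_newlines(data: bytes) -> bytes:
    """Convert each bare LF (chr(10)) to CR+LF, leaving existing CR+LF pairs alone."""
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)


def convert_log(src_path, dst_path):
    """Write to dst_path a copy of the log that Notepad can display line by line."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(to_windows_newlines(data))
```

The conversion is written to a copy on purpose, so that the original log file is left untouched.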

If we now repeat the same test as above with the XML-RPC remote publishing configuration error on a new installation of WordPress, this is what we'll find in the log file of WLW:

 

Note the presence of two XML documents in this file, as shown by the two lines beginning with <?xml ...> and followed by a series of elements delimited by <...>.  Each of these tags has a matching end tag delimited by </...>; for example, we have the pair <value> and </value> and, between these two tags, there is some piece of information, which can include other pairs of tags.  There is no matching end tag for the very first one, the <?xml ...> tag.  Instead, the end tag of the very first thing to follow it, the root tag, which is <methodCall> in the example above, is used to indicate the end of the corresponding XML document.

The first XML document describes the connection request that has been sent to the server, as indicated by the name of the root tag: <methodCall> (in yellow below).  After that, we have another tag, <methodName>, which gives us the name of the method that has been called on the server: blogger.getUsersBlogs.  The other tags that follow are simply the parameters to be used for this method.

<?xml version=""1.0"" encoding=""utf-8""?>
<methodCall>
<methodName>blogger.getUsersBlogs</methodName>
  <params>

       ...
  </params>
</methodCall>",""
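To get a feel for how such a request document is put together, here is a minimal sketch that uses the standard xmlrpc.client module of Python to build a similar <methodCall> document. This is not how WLW builds its own request; the three parameter values are placeholders of mine, following the usual (appkey, username, password) order of the Blogger API.

```python
import xmlrpc.client

# Placeholder values (assumptions): blogger.getUsersBlogs conventionally
# takes an application key, a user name and a password, in that order.
params = ("appkey-placeholder", "my-user", "my-password")

# Build the XML document of the request, without sending anything.
request_xml = xmlrpc.client.dumps(params, methodname="blogger.getUsersBlogs")
print(request_xml)
```

The printed document starts with the <?xml ...> line, followed by the <methodCall> root tag, the <methodName> tag and one <param> per value, and it ends with the </methodCall> end tag.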

There is nothing really interesting in this first XML document because it has been built by WLW to be sent to the server and doesn't contain any error.  Much more interesting is the second XML document, which contains the server response, as indicated by the name methodResponse of its root tag, and which contains in this case an item called <fault> (in pink below).  Clearly, this is an indication from the server that something went wrong on its side.  After that, you can recognise the text of the error message that has been displayed by WLW and which was telling us about the configuration error in the options of WordPress (in red below; the French text reads "XML-RPC services are disabled on this blog. An administrator can enable them at /BlogueWP1/wp-admin/options-writing.php"):

<?xml version=""1.0""?>
<methodResponse>
  <fault>

      ...
          <value><string>Les services XML-RPC sont désactivés sur ce blog. Un administrateur peut les activer à /BlogueWP1/wp-admin/options-writing.php</string></value>

      ...
  </fault>
</methodResponse>
",""

In this example, it is important to pay very special attention to the end of the XML document and specifically to the termination tag "</methodResponse>" that I have highlighted in yellow. This ending tag might look insignificant but we will soon see that it often holds the key to solving many communication problems of WLW.
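A <fault> response like the one above can also be decoded by a program. The following sketch uses Python's standard xmlrpc.client module on a complete, hypothetical fault document: the fault code 405 and the English message are assumptions of mine, standing in for the parts that were elided with "..." in the log excerpt.

```python
import xmlrpc.client

# Hypothetical complete fault document (fault code and message are assumptions).
FAULT_RESPONSE = """<?xml version="1.0"?>
<methodResponse>
  <fault>
    <value>
      <struct>
        <member><name>faultCode</name><value><int>405</int></value></member>
        <member><name>faultString</name><value><string>XML-RPC services are disabled on this blog.</string></value></member>
      </struct>
    </value>
  </fault>
</methodResponse>
"""


def describe_response(xml_text):
    """Return a one-line summary of an XML-RPC response document."""
    try:
        params, _method = xmlrpc.client.loads(xml_text)
    except xmlrpc.client.Fault as fault:
        # The server answered correctly, but with an error of its own.
        return "fault {}: {}".format(fault.faultCode, fault.faultString)
    return "ok: {!r}".format(params)


print(describe_response(FAULT_RESPONSE))
```

The point to remember is that a fault is still a well-formed answer from the server: the document is complete and its error message can be read and displayed.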

For the following example, I have reproduced on my system a problem that I've had to deal with recently and that seems to be quite prevalent among people using WordPress, but that will also happen - albeit less frequently - with other blogging systems using PHP.  In this particular case, the file wp-config.php has been corrupted in a very specific manner that I will describe later, but many other files in WordPress or in another blogging system could be corrupted in the same way and would give the same erroneous result.  The interesting part of this example is that it results precisely in an aborted communication, and WLW then shows us its infamous error message: "An error occurred while attempting to connect to your blog: Invalid Server Response - The response to the blogger.getUsersBlogs method received from the web-log server was invalid. You must correct this error before proceeding".  This example is therefore a perfect demonstration of what's written in the log file when such an error occurs and of how to interpret this information to solve our problem.

As we saw earlier, there is a setting for blocking the use of XML-RPC on WordPress, and any attempt to use it will result in an error message sent back by WordPress, which is then displayed to us by WLW.  However, if I repeat this test with the corrupted script file in place, instead of getting this explicit error message I get the generic error message (the one without any information).

If we take a look at the log file, we can see that the error message about the incorrect configuration has indeed been sent back by WP, but now there is something fishy about this entry: the last characters of the termination tag have gone missing!

<?xml version=""1.0""?>
<methodResponse>

     ...
          <value><string>Les services XML-RPC sont désactivés sur ce blog. Un administrateur peut les activer à /BlogueWP1/wp-admin/options-writing.php</string></value>

     ...

</methodRespons",""

Compare this tag « </methodRespons","" » with the previous one: « </methodResponse> ».  In fact, even the end-of-line characters have vanished, which is why the comma and the two double-quotes that normally sit on the following line now appear right after the truncated tag.
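This kind of truncation is easy to detect mechanically, because a document whose last end tag is cut off is no longer well-formed XML. Here is a minimal sketch using Python's standard XML parser; the function name is mine and the test documents are shortened stand-ins for the real responses.

```python
import xml.etree.ElementTree as ET


def check_xmlrpc_response(xml_text):
    """Return None when the document is complete, else a short description of the problem."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # A cut-off end tag such as "</methodRespons" ends up here.
        return "not well-formed: {}".format(exc)
    if root.tag != "methodResponse":
        return "unexpected root tag: {}".format(root.tag)
    return None


complete = '<?xml version="1.0"?>\n<methodResponse><params></params></methodResponse>\n'
truncated = complete[:-3]  # drops "e>" and the end of line, as in the log above

print(check_xmlrpc_response(complete))
print(check_xmlrpc_response(truncated))
```

The first call reports no problem, while the second one reports that the document is not well-formed; presumably, this is the same kind of check that makes WLW reject the response.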

Immediately after this truncated XML document, we can see that WLW has written an entry to the effect that the response received from the server was invalid, and it is this last entry that it has shown us:

WindowsLiveWriter,5612,Fail,00031,16-Jan-2010 04:27:02.953,"WindowsLive.Writer.Extensibility.BlogClient.BlogClientInvalidServerResponseException: Invalid Server Response - The response to the blogger.getUsersBlogs method received from the blog server was invalid:

     ...
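Judging from the excerpts above (fields separated by commas, a message enclosed in double-quotes, and every double-quote inside the message doubled as in ""1.0""), the log file appears to follow a CSV-like layout. Under that assumption, which I haven't verified against any official description of the format, the entries can be read with Python's standard csv module. The sample entry below is a hypothetical one, assembled from the fragments shown in this article.

```python
import csv
import io


def read_log_entries(log_text):
    """Return each log entry as a list of fields (assuming a CSV-like layout)."""
    # newline="" lets the csv module handle the line breaks inside quoted fields.
    reader = csv.reader(io.StringIO(log_text, newline=""))
    return [row for row in reader if row]


# Hypothetical entry assembled from the fragments shown in this article.
sample = (
    'WindowsLiveWriter,5612,Fail,00031,16-Jan-2010 04:27:02.953,'
    '"<?xml version=""1.0""?>\n<methodResponse>\n</methodRespons",""\n'
)

for fields in read_log_entries(sample):
    print(fields[2], fields[4])  # the third and fifth fields of the entry
    print(fields[5])             # the message, with its doubled quotes undone
```

Read this way, a multi-line message comes back as a single field, which makes it easy to see whether its last tag is complete or not.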

If we keep the corrupted script file in place but retry the connection after activating the XML-RPC protocol on the WordPress server, we now see a full response coming back from WordPress, but still with its ending tag truncated:

<?xml version=""1.0""?>
<methodResponse>
  <params>
    <param>
      <value>
        <array><data>
  <value><struct>
  <member><name>isAdmin</name><value><boolean>1</boolean></value></member>
  <member><name>url</name><value><string>/BlogueWP1/</string></value></member>
  <member><name>blogid</name><value><string>1</string></value></member>
  <member><name>blogName</name><value><string>Le p&amp;#039;tit blogue à Sylvain</string></value></member>
  <member><name>xmlrpc</name><value><string>/BlogueWP1/xmlrpc.php</string></value></member>
</struct></value>
</data></array>
      </value>
    </param>
  </params>

</methodRespons",""

And sure enough, because of this truncated ending tag, WLW keeps telling us that the response received from the XML-RPC server was invalid and aborts the rest of the process. So even if, at this stage, the XML-RPC publishing protocol has been correctly set up on the WordPress server, the problem caused by the corrupted file still blocks us from using WLW against this server.

Of course, the next step is to identify the corruption that has been introduced into the script file and for this, we will use another tool: Fiddler, a free tool used to capture and analyze the raw HTTP traffic that passes between a client and a web server.  Of course, you could also use any other tool available for capturing and analyzing this HTTP traffic, but Fiddler is free, simple to use and more than powerful enough to give us the solution that we are looking for.  Also, the following discussion about Fiddler might look complicated at first but you'll see at the end that it's relatively easy to get the information that we need to solve our little communication problem.

4- Fiddler

You can get Fiddler from the following website: "Fiddler Web Debugger - A free web debugging tool".  Once it is installed, click Fiddler2 in your Windows menu to launch the program:

image

Again, even if the above figure might look complicated, you'll see that it's relatively easy to get the information that we want. You don't have to worry about most of these parameters, but I prefer to explain too much rather than not enough.

At the left of the previous figure, we can see a list of TCP/IP sessions, each session consisting of a request sent to the server (which may be any kind of TCP/IP server: a web server or an XML server, for example) and a response sent back by the server. This is the usual pattern for HTTP communications over TCP/IP: for each request sent, there is a single response sent back, and the pair of the two forms a session.  In our case, these sessions are of the type HTTP, as we can see on the figure with the light blue arrow #3.  The XML-RPC protocol used by WLW to communicate with a blog uses standard HTTP sessions over TCP/IP in exactly the same way as your browser (Internet Explorer, Firefox) communicates with a web server. This means that you shouldn't have any firewall or proxy issue using this protocol if you can access your blog using any browser.  At the left of each session, Fiddler also puts an icon describing its type and status.  You can get a list of all these icons with their meanings on the following page: "Fiddler Web Debugger - User Interface".

At the right, we can see the details of each of these sessions if we select the Inspectors tab (#1 - orange circle).  The details of the request are displayed at the top and the details of the response below them.  We can also choose between a variety of formats for displaying those details: Headers, TextView, RAW, HexView, etc., as indicated by the red arrows (#2).  Note that you can choose a different display format for the request and for the response.

With the HTTP protocol, each request and each response consists of a single file (more exactly, a file for the request and a second file for the response) that contains two parts separated by a single blank line (arrows #6 and #7): the header and the text.  So we have a header and a text for the request and another header and text for the response, the two parts of each pair being separated by exactly one blank line in their respective file.  If you click on either the Headers or the TextView tab, only the header or only the text will be shown - each of them with a special formatting - but we can also choose the RAW format to display both of them, this time without any special formatting.  This is the format that has been chosen for both the request and the response windows in the previous figure.
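The header/text layout described above can be sketched in a few lines of Python. The function below is a simplified illustration of mine, not something taken from Fiddler or WLW: it splits a raw HTTP message at the first blank line and stores the header lines in a dictionary, ignoring refinements such as repeated or folded header lines.

```python
def split_http_message(raw: bytes):
    """Split a raw HTTP message into (start_line, headers, body)."""
    # The header and the text are separated by one blank line, i.e. CR LF CR LF.
    head, _sep, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _colon, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/xml\r\n"
       b"Content-Length: 5\r\n"
       b"\r\n"
       b"hello")
start_line, headers, body = split_http_message(raw)
print(start_line)
print(headers["content-length"], len(body))
```

In this tiny example, the value of Content-Length and the real length of the text agree with each other (5 in both cases).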

Finally, the HexView format also shows us both the header and the text - like the RAW format - but adds the hexadecimal value of each character in a second column, which enables us to identify each character exactly, including the various *blank* and control characters that might be present. Later, in section 5, we will see an example of this hexadecimal format that will show us the identity of some strange characters coming from the corrupted file.

If we take a closer look at the header of the request shown in the previous figure, we can see that it first contains a POST parameter (#8, blue arrow) with the URL of the script file that the XML-RPC request was sent to ("http://sylvain2/BlogueWP1/xmlrpc.php" in this case).  After a few other parameters, we find toward the end of the header a very important parameter, the Content-Length.  By the way, notice that the exact order of the parameters in the header has no significance and is not guaranteed to remain the same.  After that, we see one last parameter ("Connection: Close") and finally, the blank line that is used as a separator between the header and the text that follows.  In the case of XML-RPC, this text is an XML document that contains the information that we are sending to the server, but with other requests, we could also see the data of an ordinary HTML form POST.

If we now take a look at the details of the response that has been sent back by the XML-RPC server (bottom right window), we can see that it follows a similar format: a header followed by a blank line and then an XML document as the text.  In the header, there is no POST parameter because this is a response, but there is still a Content-Length parameter (#10, pink arrow) which contains the length of the text.  Well, we should rather say that this Content-Length *should* contain the length of the following text.  However, we will see later that with PHP this is not always the case, and that this little problem with the Content-Length parameter is indeed the source of many of our communication problems with WLW; it also explains the truncated ending tag that we saw earlier in the log file.

-- The rest of this section is for explaining some basic configuration options of Fiddler but if you want to, you can skip this reading and jump directly to the next section "5- Issue #1 - Appearance of the UTF-8 BOM" to see what happens with the Content-Length parameter. --

When we use Fiddler, the first setting to check is the Capturing mode.  Its function is very simple: when the Capture mode is off, nothing is captured by Fiddler. We can see its value displayed at the bottom left of Fiddler's window on the status bar: #4, green arrow.  We switch between the two modes (ON and OFF) by clicking on the displayed value (note: only a white space is displayed when the mode is OFF but you can still click on it), by pressing the function key F12 on the keyboard or by using the option "Capture Traffic: F12" under the File menu.

Even when the capturing mode is ON, Fiddler doesn't necessarily capture everything, and there are many settings with which you can adjust what will be captured. One of these settings is the second text icon directly at the right of the Capture mode text icon on the status bar, as indicated on the figure by the yellow arrow #5.  This setting can be set to "All Processes", "Web Browsers", "Non-Browser" or "Hide-All" and you can change this value simply by clicking on the text icon.  In our case with WLW, this setting must be set to either "All Processes" or "Non-Browser"; otherwise, we won't be capturing the information that we need.

When Fiddler starts capturing, it is not uncommon to quickly get dozens or even hundreds of captured sessions, so we must have a way to tidy this up a bit. Evidently, the first thing we can do is to delete all the older sessions captured during previous runs.  This can be done with the menu Edit | Remove | All Sessions.  Besides that, Fiddler offers us many filtering options that are found under the Filters tab, which is shown in the following figure (orange circle, #1):

image

Before using many of these filtering options, we must first activate them by checking the "Use Filters" checkbox in the upper left (#2).  Note that the Filters tab visually indicates when these filters are active by displaying a check mark directly on the tab itself (orange circle, #1).

The first available option is "Keep only the most recent [200] sessions" (#7), which automatically deletes the older sessions above a certain number.  You can adjust this value but it's set to 200 by default.  Below that, there are the filtering options that apply to the servers - or "Hosts" in Fiddler terminology - for which capturing will be turned ON or OFF.  There, we can choose to capture the requests sent to any server (or "Host") or to restrict the capture to servers located either on the local LAN (intranet) or on the Internet (Wide Area Network (WAN)) by selecting "Show only Intranet Hosts" or "Show only Internet Hosts" in the combobox #3.

We can be even more specific than that and directly name the desired servers or Hosts by selecting the option "Show only the following Hosts" in the next combobox and then indicating their names in the text box that follows right below, see #4. Here, I've set up the local server "sylvain2" as the specific host for which the capturing must be made.  You can name more than one server if you want to.

Instead of filtering based on the servers (or "hosts"), you can also choose to filter based on one of the processes running on your machine, and the "Client Process" section is just for that.  We must first check the box "Show only traffic from" (#6) and then select the desired process in the dropdown menu at the right.  Here, I have chosen the process "WindowsLiveWriter: 3884 - Solutionner problème connection WLW" so that Fiddler will capture only the TCP/IP traffic initiated by Windows Live Writer.  The process number (here 3884) and the shown title (here "Solutionner problème connection WLW", in French) will of course be different in your case.  Beware: WLW always uses a single process for managing all of its windows, so even if you have multiple WLW windows opened, this list will only show you one of them, usually the one that's currently active.  However, this choice is valid for every opened WLW window, regardless of the currently active window.

Of course, you don't really need to select all of these options, or even any of them.  Also, be aware that some of these filtering options affect more than just what is captured by Fiddler: with them, you can also block or change the transmission of requests or responses related to any kind of TCP/IP activity.  For example, if you check the "Block image files" option in the "Response Type and Size" section, the transmission of all image files will be blocked not only for WLW but for the other processes as well, including your current browser.  In effect, this means that web pages will now be shown without any image displayed on them, as you can see in the second figure below.  Therefore, you must be careful when using a program like Fiddler and you must check all the options carefully when something is not working or looking right.

image

image

5- Issue #1 - Appearance of the UTF-8 BOM

Now that we know Fiddler, we can use it to solve our little problem: if we try again to communicate with WordPress using Windows Live Writer (WLW), but this time using Fiddler to watch the process, we are quickly greeted by the following error message from Fiddler:

image

"Fiddler has detected a protocol violation in session #633.  Content-Length mismatch: Response Header claimed 634 bytes, but server sent 637 bytes."

Immediately after that, WLW shows us the generic error message that we have already seen and aborts the rest of the communication. So it looks like the problem comes from a badly formatted response from the server: the length of the text as indicated in the header (the Content-Length parameter) is too short by 3 characters. This difference of 3 characters explains why the text of the response recorded by WLW in its log file has been truncated: WLW stopped reading the text after reaching the total number of characters indicated in the header and simply dropped the rest. So WLW gives full precedence to the Content-Length parameter instead of trying to follow the textual layout of the XML document as it has been written in the response by the XML-RPC server.
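The effect of such a mismatch can be simulated in a few lines of Python. In this sketch, the three stray bytes are a placeholder of mine ("XYZ") and I assume that they arrive in front of the XML document; the function `read_like_client` only imitates a client that stops reading once it has received the number of bytes announced in Content-Length, as described above.

```python
# A shortened XML-RPC response, ending with an end of line like the real one.
xml_body = b'<?xml version="1.0"?>\n<methodResponse>\n</methodResponse>\n'

declared_length = len(xml_body)  # what the Content-Length header claims
stray = b"XYZ"                   # placeholder for 3 unexpected extra bytes
sent = stray + xml_body          # what the server really sends


def read_like_client(body: bytes, content_length: int) -> bytes:
    """Imitate a client that trusts Content-Length and drops everything after it."""
    return body[:content_length]


received = read_like_client(sent, declared_length)
print(len(sent) - declared_length)  # 3
print(received[-15:])               # b'</methodRespons'
```

The last three bytes of the document, the characters "e>" and the end of line, are dropped, which reproduces the truncated tag seen in the log file of WLW.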

One way to correct that would be to adjust this parameter in the PHP script where it is calculated.  For example, Peter Van Eeckhoutte suggests in his blog ("Windows Live Writer unable to connect to Wordpress Blog | Peter Van Eeckhoutte's Blog") to modify the WordPress file class-IXR.php - which is the script that generates the XML response - located in the directory "wp-includes", by changing the following line located in the function output($xml):

$length = strlen($xml);

so that it will now automatically add 3 characters every time:

$length = strlen($xml) + 3;

It doesn't take much explanation to see that this correction is not very satisfactory: not only is it a "blind" correction, but it also doesn't tell us anything about why the calculated length is invalid in the first place and, of course, it doesn't correct the problem at its source at all.  We don't know if the difference will always be the same or if the problem will get corrected at some point in time and, furthermore, we don't know what will happen if someone wants to use blog editing software other than WLW.

So this is clearly not a very good solution, but to find a better one, we must first know why there is a discrepancy in the first place. This is exactly what we can find out if we take a deeper look at the response using the hexadecimal mode of Fiddler by choosing the HexView tab (orange circle):

image

In this mode, the information is displayed in three columns: in the first, we can see the address of each line in hexadecimal; in the second, the hexadecimal value of each character; and finally in the third, the characters themselves, with the exception of the control and other non-printable characters, which are replaced by a dot.
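The three-column layout described above is easy to imitate, which can be handy if you want to inspect a saved response without Fiddler. The following Python function is only a rough imitation of mine, not Fiddler's actual algorithm: it prints the address, the hexadecimal values and the characters, replacing the control characters by a dot and showing the other bytes as Latin-1 characters (an assumption about how they should be displayed).

```python
def hex_view(data: bytes, width: int = 16) -> str:
    """Return a three-column dump: address, hexadecimal values, characters."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join("{:02X}".format(b) for b in chunk)
        # Printable ASCII and Latin-1 characters are shown as-is, the rest as a dot.
        text_part = "".join(
            chr(b) if 32 <= b < 127 or b >= 0xA0 else "." for b in chunk)
        lines.append("{:08X}  {:<{w}}  {}".format(
            offset, hex_part, text_part, w=width * 3 - 1))
    return "\n".join(lines)


print(hex_view(b"\r\n\r\n<?xml version=\"1.0\"?>"))
```

In the output, the four bytes 0D 0A 0D 0A that end the header appear as four dots in the third column, followed by the readable text of the document.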

If we look in the third column for the beginning of the XML document, that is, the five characters <?xml, we can see that they are immediately preceded by three strange characters (the bytes EF BB BF, typically displayed as ï»¿) that I have highlighted in blue in the figure.  These three characters are the BOM (Byte Order Mark, see "Byte order mark - Wikipedia, the free encyclopedia") for a UTF-8 (Unicode 8 bit) encoded file.  Usually, the BOM of a Unicode file is present only at its beginning and not in the middle.  While a BOM in the middle is not really forbidden, its presence anywhere other than the beginning of the file has no significance as an encoding marker and normally, a program should simply read past it when it appears in the middle of a file.

However, Windows Live Writer (WLW) does not skip these three characters entirely: while it doesn't use them to determine the encoding of the file, as it should be because they are not at its beginning, it still counts them against the total number of characters indicated in the Content-Length parameter.  A close look at the PHP code reveals, however, that when they are emitted, they are not counted as part of the length of the text; in the end, this gives us an XML document truncated by three characters.
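The truncation described above can be reproduced with a few lines of Python. This is only a sketch of the mechanism, not the code of WLW or WordPress: the function name body_as_read_by_client and the sample XML are made up for the illustration, and the client is assumed to stop reading exactly at the declared length.

```python
BOM_UTF8 = b"\xef\xbb\xbf"  # the three bytes EF BB BF

def body_as_read_by_client(xml, stray_prefix=b""):
    """Simulate a client that trusts Content-Length (hypothetical helper)."""
    declared_length = len(xml)          # what strlen($xml) reports on the server
    sent_body = stray_prefix + xml      # what actually goes out on the wire
    return sent_body[:declared_length]  # the client stops at the declared length

xml = b'<?xml version="1.0"?><methodResponse><params/></methodResponse>'

print(body_as_read_by_client(xml))            # complete document
print(body_as_read_by_client(xml, BOM_UTF8))  # ends with "</methodRespon"
```

With the three stray bytes in front, the last three characters of the closing tag fall outside the declared length, which matches the truncated tag seen in the log file.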

(Note: the sequence of 4 characters "0D-0A-0D-0A" that you can see just before the BOM and the "<?xml" tag is two consecutive line endings, each made of a carriage return (0x0D) followed by a line feed (0x0A); together they create the blank line separating the header from the text that follows. This two-character line ending is the one required by HTTP and is also the Windows convention for a new line; in Unix text files you'll find only the character 0x0A, and on the classic Macintosh only 0x0D. These four non-printable characters are shown as 4 dots in the third column.)

It looks like we have found the cause of our truncated response: the presence of the BOM in mid-file offsets the reading of the XML document by WLW by 3 characters. There is a serious problem here: if we don't adjust the Content-Length, WLW is thrown off by three characters.  However, we don't know for sure if this will always be the case, and we don't know either what will happen if someone wants to use blog editing software other than WLW.  Furthermore, the presence of the BOM is only one of the many possible problems that can affect the value of the Content-Length parameter. For example, on his blog (in Italian), Etrusco says that he has found a difference of 10 characters between the length specified in Content-Length and the actual length of the XML document; see "Windows Live Writer e wordpress, quando l’xmlrpc smette di funzionare! | pensando.it".
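Whatever the size of the gap, it can be measured directly on a saved response instead of counting characters by eye. The following Python sketch assumes that the raw bytes of one complete, non-chunked response (header plus body) have been saved to a file, for example from Fiddler; the function name content_length_gap is hypothetical.

```python
def content_length_gap(raw):
    """Return (declared, actual, leading) for one raw HTTP response in bytes.

    declared: value of the Content-Length header
    actual:   number of bytes really present after the blank line
    leading:  whatever precedes "<?xml" in the body (BOM, blank lines, ...)
    """
    head, separator, body = raw.partition(b"\r\n\r\n")
    if not separator:
        raise ValueError("no blank line between header and body")
    declared = None
    for line in head.split(b"\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            declared = int(value.strip())
    if declared is None:
        raise ValueError("no Content-Length header found")
    start = body.find(b"<?xml")
    leading = body[:start] if start > 0 else b""
    return declared, len(body), leading
```

A difference of 3 with a leading value of b"\xef\xbb\xbf" points to a BOM; a leading value made only of newlines or spaces points to extraneous characters in one of the script files.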

So our best solution here would be to get rid of this BOM in the middle of the file, and for this, we must know how and why PHP is writing this BOM in the first place and whether there is any way of stopping it from doing so.  This is the topic of the next section.

6- BOM (Byte Order Mark),  PHP and XVI32.EXE

The primary function of the BOM (Byte Order Mark) is to indicate the storage order of bytes for files that have been encoded using the 16 bit form of Unicode (UTF-16). In the 16 bit mode, two bytes are used for each character but, depending on the operating system, either the high order byte or the low order byte could be written first in each pair.  Writing a BOM at the beginning of the file is then useful for specifying which one of these two orders has been used.  The case of the 8 bit mode of Unicode (UTF-8) is a little different because there is no byte order in this case: whatever the operating system, all files encoded with UTF-8 are always encoded in the same way.  However, writing a marker at the beginning of a file encoded with UTF-8 can still be useful to distinguish it from other encodings that are not UTF-8 (whether it is one of the two UTF-16 orders, one of the various ASCII modes or anything else).
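As a small illustration, the three byte order marks discussed here are available as constants in Python's standard codecs module, and checking the first bytes of a file against them is enough to tell them apart. The function name detect_bom is made up for this sketch, and only these three marks are considered.

```python
import codecs

# The marks discussed above, as defined in the standard codecs module.
KNOWN_BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),                   # EF BB BF
    (codecs.BOM_UTF16_BE, "UTF-16 big-endian"),   # FE FF (high order byte first)
    (codecs.BOM_UTF16_LE, "UTF-16 little-endian"),  # FF FE (low order byte first)
]

def detect_bom(data):
    """Return the name of the BOM found at the start of data, or None."""
    for bom, name in KNOWN_BOMS:
        if data.startswith(bom):
            return name
    return None

# The same letter "A" stored with the two possible byte orders:
print("A".encode("utf-16-be"))  # high order byte first
print("A".encode("utf-16-le"))  # low order byte first
print(detect_bom(b"\xef\xbb\xbf<?php"))
```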

Besides the ordering of the bytes, the big problem with UTF-16 is that for Latin alphabets, files take twice the size of files encoded with a single-byte character set such as Windows-1252, but this is not a problem with UTF-8.  English files encoded in UTF-8 occupy the same size, and for other Latin alphabets such as French or Spanish, the files are only slightly bigger.  This size advantage is not always there for other alphabets such as Thai, but then there is the problem of mixing multiple alphabets inside the same file, like a combination of English, French, Spanish and Thai if someone wants to do that.  This is not a problem with UTF-8, so in recent years, it has slowly but surely been on its way to becoming the de facto standard for the Internet and for the storage of internationalized files.
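The size differences are easy to check. The short Python sketch below encodes three sample strings (chosen arbitrarily for the illustration) and compares the number of bytes taken in UTF-8 and in UTF-16, without counting any BOM; the function name encoded_sizes is hypothetical.

```python
def encoded_sizes(text):
    """Return (bytes in UTF-8, bytes in UTF-16) for text, BOM not included."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

samples = {
    "English": "Hello world",
    "French": "Élève très âgé",
    "Thai": "สวัสดี",
}

for language, text in samples.items():
    utf8_size, utf16_size = encoded_sizes(text)
    print(language, utf8_size, utf16_size)
```

For the English and French samples UTF-8 is the smaller of the two, while for the Thai sample it is UTF-16 that takes fewer bytes.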

However, one small problem remains: the question of whether or not to write a BOM for UTF-8 files, and on this question, the support of PHP 5 for UTF-8 is seriously lagging behind.  While many other languages offer the option of writing a BOM or not, PHP 5 does not, by default, recognize a BOM at the beginning of a script file; the three bytes sit before the starting tag <?php, so they are sent to the output as they are.  The net effect is that a BOM is written or not based on its presence in any of the PHP script files involved.  So to be clear: neither you nor the programmer decides whether a BOM is written; it will be written as soon as PHP finds at least one script file with a BOM among the script files it executes for a call.  So the presence of a single file with a BOM wins it all.  And to make things worse, there is not even a simple way of knowing from inside a script whether one will be written or not; so this is something that is totally out of our control.

However, in our case, the solution is simple: if a BOM has been written then it's because one (or more) of the PHP script files has one and all we have to do is to find this culprit and strip away its BOM marker.

There are many ways of finding it.  First, you could use a program such as EGREP to scan all the files and report those that have a BOM, or you could use an editor such as Notepad++, capable of showing - and deleting - a BOM at the beginning of a file, and look at the files one by one.  Many of you probably have a favorite tool, but for those who don't, I will suggest here a small program because of its simplicity of use: XVI32, a free hexadecimal editor that doesn't need to be installed on your system. Once it is downloaded, all you have to do is unzip it and then double-click on the executable file XVI32.EXE to launch the program. You can then use the menu to open the desired file or better yet, drag and drop the desired file directly onto the program's window:

image

The only problem with this method is that you have to verify the files one by one until you find the culprit, but in the case of WordPress, there is one likely suspect: the configuration file wp-config.php.  While the distributed version of this file doesn't have a BOM, it is our likely suspect because it is a file that we have to open in order to change the configuration of the WordPress installation and by doing so, we put ourselves at the mercy of any editor that thinks it is a good idea to add a BOM at the beginning of a file without even telling us (or that tells us while we don't understand the full implication of doing so).  This is the root of the problem: even if the distributed file doesn't have any BOM, there are a lot of ways to end up with one added or, just as confusing, removed again.  Therefore, it's a moving target and you must be careful, when you are editing this file or any other PHP script file, not to inadvertently introduce a BOM at its beginning.
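For those who would rather not open each file by hand, the scan can also be scripted. The following Python sketch walks a local copy of the site and lists the .php files whose first three bytes are the UTF-8 BOM; the function name find_php_files_with_bom and the idea of working on a downloaded copy of the site are assumptions made for the example.

```python
import os

BOM_UTF8 = b"\xef\xbb\xbf"

def find_php_files_with_bom(root):
    """Return the sorted paths of the .php files under root that start with a BOM."""
    hits = []
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            if not name.lower().endswith(".php"):
                continue
            path = os.path.join(folder, name)
            with open(path, "rb") as handle:  # binary mode: we want the raw bytes
                if handle.read(3) == BOM_UTF8:
                    hits.append(path)
    return sorted(hits)
```

Calling find_php_files_with_bom on the folder that contains wp-config.php would then return the list of suspects to inspect with XVI32 or another editor.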

If you want to remove the BOM using XVI32, simply use the Delete key over its three characters as they are shown above.  Note however that with other editors, the process for removing a BOM is not necessarily the same and that you might have to look at the options provided by the program.  Again, some programs might add one - or strip it - automatically without telling you, so you must be careful and always check before using any editing program with a PHP script file.
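The same removal can be sketched in a few lines of Python for a file that is already known to start with a BOM. The function name strip_bom is hypothetical, the file is rewritten in place, and keeping a backup copy before running anything like this on a live site is assumed.

```python
BOM_UTF8 = b"\xef\xbb\xbf"

def strip_bom(path):
    """Remove a leading UTF-8 BOM from the file at path.

    Returns True if the file was rewritten, False if it had no BOM.
    """
    with open(path, "rb") as handle:
        data = handle.read()
    if not data.startswith(BOM_UTF8):
        return False
    with open(path, "wb") as handle:
        handle.write(data[len(BOM_UTF8):])  # everything after the three bytes
    return True
```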

7- Presence of extraneous characters in a PHP script file

As we have also said earlier, the presence of a BOM is not the only thing that can throw off the calculation of the exact value of the Content-Length parameter in a PHP script; two others are the presence of extraneous characters - usually newlines and blank lines - at the beginning and at the end of one or more PHP script files.  These extraneous characters are anything - including a blank space or a newline - that comes before the PHP starting tag <?php or after the ending tag ?> (although there is a little leeway for the end).  A PHP script used for a web page can normally have many of these starting and ending tags and also have a lot of other characters outside of these delimiters to be transcribed verbatim to its output, but the case of a script file used to build an XML-RPC response is different: these XML responses must follow a strict format and any illegal or extraneous character is forbidden. Therefore, those scripts must consist of a single pair of tags and have nothing else outside of them; not even a blank space or a newline at the beginning or at the end.  The only exception is one single newline after the ending ?>, with maybe one or more blank spaces between the final ?> and its single newline, but nothing else after that; not even a second newline or a blank line.
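These rules can be turned into a rough check. The Python sketch below reports what lies before the first <?php and after the last ?> of a script, on the assumption that a single newline immediately after the closing tag is harmless. It is deliberately stricter than the leeway described above, since it also reports blank spaces after the tag, and it does not try to parse PHP: a ?> inside a string or a comment would fool it. The function name stray_output is made up for the example.

```python
def stray_output(source):
    """Return (leading, trailing) bytes found outside the PHP tags of source."""
    first_open = source.find(b"<?php")
    if first_open == -1:
        return source, b""      # no PHP tag at all: everything is plain output
    leading = source[:first_open]
    last_open = source.rfind(b"<?php")
    last_close = source.rfind(b"?>")
    if last_close < last_open:  # no closing tag after the last opening tag
        return leading, b""
    trailing = source[last_close + 2:]
    # Assumption: one newline right after the closing tag is not sent out.
    if trailing.startswith(b"\r\n"):
        trailing = trailing[2:]
    elif trailing.startswith(b"\n"):
        trailing = trailing[1:]
    return leading, trailing
```

For a clean file both values are empty; a blank line before <?php shows up in the first value, and a second newline after ?> shows up in the second.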

Short of taking a very convoluted look at the output buffer, a PHP script cannot know if there is anything outside of these two delimiters, so it cannot take it into account when calculating the length of the XML document to be put in the Content-Length parameter. Therefore, if there are indeed some extraneous characters in any of the script files, this parameter will be invalid and WLW will be unable to read the XML response correctly and will show an error message.  In that case, we cannot directly determine which script is badly written by looking at the response or at the log file; however, we can at least determine whether we are facing a problem caused by extraneous characters rather than something else, such as the presence of a BOM or an error or warning message from the script itself, thus greatly narrowing the search for the cause of the communication problem.  In the next two sections, we will see how to look in the Response for the presence of extraneous characters at either the beginning or the end of a script file.

8- Presence of extraneous characters at the beginning of a PHP script

We already know that when the Response is correctly formatted, one blank line (#5 and 6) separates the header from the XML response (#3 and 4), as we can see in the figure below, which shows a capture from Fiddler in Raw mode.  Remember that the top window on the right is the Request and the bottom window is the Response, and that we are mainly interested in the Response in these cases.  (With most XML-RPC communication problems between WLW and a PHP server, there is no problem with the Request, only with the Response.)

image

If we take a look in the log file of WLW for the same correctly formatted Response, we don't see the header or the blank line - because normally WLW doesn't record them - but we can see a proper and well written XML document, with no blank line before it and no truncation of its ending tag </methodResponse> at the end:

image

Now, let's take a look at what happens if we add a single extraneous blank line (#1) at the beginning of one of the PHP scripts, for example wp-config.php, just before the starting tag <?php :

image

As expected, we will now receive an error message from WLW to the effect that the received XML-RPC response from the server is invalid and the capture from Fiddler will now show a second blank line separating the header from the XML text in the case of the Response (#2, bottom window) :

image

Furthermore, a look at the entry in the log file will now show a blank line before the XML document (#1, there was none in the case of a correctly formatted Response) and the presence of a truncated ending tag (#2) :

image

Of course, if there are other extraneous characters such as additional blank lines or anything else, we will see all of these in both the capture from Fiddler and the log file.  For anything more than a single extraneous blank line, Fiddler will also start telling us about a difference between the Content-Length parameter and the real length of the text. (But it won't say anything when there is only a single extraneous blank line, because there is some leeway for the possible presence of a newline at the very end of the file.)

The problem here is that we don't know which file(s) on the server side contain these extraneous characters, so we will have to look at them one by one.  The best way of doing this is to switch to a standard template and deactivate all the plugins.  If the problem doesn't disappear, we can say that it is located in one of the files that are part of the standard distribution of the blogging system; probably wp-config.php in the case of WordPress.  If it disappears, then we can say it's either in the installed template or in one of the plugins.  Most likely, it will be the second case, i.e., one of the plugins, so you can start re-activating them one by one until you find the culprit.

9- Presence of extraneous characters at the end of a PHP script file

Besides possibly being at the beginning of a script file, extraneous characters can also be present at the end of the file, after the final ?> ending tag.  However, there is a little twist here: the fact that they are at the end of the script file doesn't mean that you will find them at the end of the Response, after the XML document.  When PHP executes the main script file (xmlrpc.php in the case of WordPress) for an XML-RPC call, this script file will usually include one or more other script files near its beginning in order to get access to some additional PHP functionality, and often, many of these included files will themselves include other additional files.

However, while these additional functions might be called later (or never) during the execution of the script, this is not the case for extraneous characters: their printing is not delayed but instead is done as soon as the containing file is included, which usually means before the execution of the main call in the main script file.  Therefore, any extraneous characters at the end of an included file will appear before the XML document of the response; i.e., they will appear at the same place as extraneous characters located at the beginning of a script file.

For example, I have put one more blank line at the end of the file wp-config.php, as you can see in the following figure:

image

I had to select the final characters so that you can see them.  As expected, after this change, the XML-RPC communication between WLW and WordPress doesn't work anymore, and the capture of the Response by Fiddler or the log file of WLW will now show us an extraneous blank line before the XML document and a truncated ending tag, just as in the case of an extraneous blank line at the beginning of the script file:

image

So, if it's difficult or impossible to distinguish between extraneous characters at the beginning and at the end of a script file, what is our best solution?  Well, there is no magic solution here and, as was the case for extraneous characters at the beginning, the first step is to identify the culprit by checking all the main files - always think first about the configuration file wp-config.php in the case of WordPress - and if you don't find it, by deactivating all plugins and all installed templates and going through them one by one, carefully checking their end.

However, in this case, PHP has a nice little feature that can greatly simplify the task of carefully checking the end of each file: in recent versions, the ending tag ?> at the end of a PHP script file is optional.  This means that instead of carefully checking the end of each file, we can simply delete the ending tag ?>; after that, the scripting part automatically extends to the very end of the file, with no possibility left for extraneous characters at its end:

image

-- Look Mom, No Hands!

 

The fact that the last closing tag ?> is optional at the end of a script file is a peculiarity of the PHP language, but it comes in very handy in this situation.
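Removing the final tag by hand is easy for one file; for a whole set of files it could be scripted along the following lines. This Python sketch is deliberately cautious: it only touches a file whose last non-blank characters are ?> and whose code before that tag ends with a semicolon or a closing brace, and it leaves anything else unchanged for a manual check. The function name drop_final_closing_tag is made up for the example.

```python
def drop_final_closing_tag(source):
    """Return source without its final ?> tag and the whitespace around it.

    The input is returned unchanged unless the file ends (ignoring trailing
    whitespace) with ?> and the code before the tag ends with ";" or "}".
    """
    stripped = source.rstrip()
    if not stripped.endswith(b"?>"):
        return source             # no closing tag at the end: nothing to do
    code = stripped[:-2].rstrip()
    if not code.endswith((b";", b"}")):
        return source             # unusual ending: better checked by hand
    return code + b"\n"           # PHP code now runs to the end of the file
```

The function works on the bytes of the file, so reading in binary mode and writing the result back is left to the caller, ideally after making a backup copy.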

This completes the first part of this quick overview of some of the communication problems of Windows Live Writer with PHP based blog systems using XML-RPC.  In the next part, we'll see some of the other possible causes of communication problems, and in particular, we will take a look at the various warning messages that PHP can wrongly insert inside an XML Response under various conditions and that can disrupt the XML-RPC communication.


