Thursday, July 1, 2010

.NET SerialPort Woes

Preface

This is a very long post describing the .NET SerialPort bug and the fix I propose Microsoft implement in its post-.NET 4.0 framework. An interim fix that doesn't depend on Microsoft is also provided.


Introduction

The built-in serial port support in .NET has been a major letdown, and it has remained largely unchanged from its introduction in v2.0 through the latest v4.0. Posts on MSDN suggest that a lot of people, C# and VB users alike, are in fact having difficulties using System.IO.Ports.SerialPort:

IOException when reading serial port using .NET 2.0 SerialPort.Read

Port IOException

SerialPort.Close throws IOException

IOException when SerialPort.Open()

WinCE 5.0 - IOException when serialPort.Open()

WARNING! SerialPort in .NET 3.5

... and many more, but take the last one with a grain of salt.


Yet Microsoft couldn't seem to reproduce the bug or, worse, brushed it aside as a problem with users' code. That is likely due to the noise (i.e. incorrect answers accepted as correct answers) introduced by the forum's self-proclaimed serial port experts, a lot of them Microsoft support staff: so far none of the posts has been answered correctly, yet all of them have been marked as answered! Yay for the quality of the MSDN forum, a forum where n00bs answer questions from other n00bs. The truth is, all of the posts have only one thing in common: they all encountered an IOException.


The Problem - IOException

To understand why IOException occurs in SerialPort (or, to be exact, in SerialStream), you only need to look at the WinAPI Comm functions. SerialStream calls several Comm APIs to do the real work, and when one of them fails and returns an error, SerialStream simply throws an IOException whose message is what FormatMessage returns for the error code from GetLastError.

So why do the WinAPI functions fail? The posts all share a common error message:
"The I/O operation has been aborted because of either a thread exit or an application request." (error code 995, ERROR_OPERATION_ABORTED)
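Before getting into the cause, it helps to be able to tell this particular failure apart from other IOExceptions (a missing port, say). A minimal sketch, assuming the exception's HResult is HRESULT_FROM_WIN32(995), i.e. 0x800703E3, which is how the framework's internal error helper appears to build these exceptions; check that on your target framework. Exception.HResult is only publicly readable from .NET 4.5 onwards, and the class name below is made up for illustration.

```csharp
using System.IO;

static class SerialErrors
{
    // Win32 ERROR_OPERATION_ABORTED: "The I/O operation has been aborted because of
    // either a thread exit or an application request."
    public const int ErrorOperationAborted = 995;

    // Assumption: the IOException carries HRESULT_FROM_WIN32(code), i.e. 0x8007xxxx
    // with the Win32 error code in the low 16 bits.
    public static bool IsOperationAborted(IOException ex)
    {
        if (ex == null) return false;
        int hr = ex.HResult; // public getter since .NET 4.5
        bool fromWin32 = (hr & unchecked((int)0xFFFF0000)) == unchecked((int)0x80070000);
        return fromWin32 && (hr & 0xFFFF) == ErrorOperationAborted;
    }
}
```

You would typically call it in a catch block around Open(), Read() or Write() to decide whether the exception is this aborted-port condition rather than something else.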


A thread exit will also implicitly abort an overlapped (asynchronous) operation, but in this case the operation is aborted because the serial port is in a mode where any error detected by the UART chip, such as a parity error or a buffer overrun, triggers the abort condition. All current and subsequent serial port calls then fail until the application acknowledges the error by calling ClearCommError.

Some developers have even encountered an IOException as soon as they call SerialPort.Open(), especially on slow devices such as handhelds running the .NET Compact Framework. Some encounter it when the garbage collector disposes of the serial port. Others encounter it when they call SerialPort.Read().

They're all due to one mistake in .NET's SerialStream implementation: neglecting to explicitly set the fAbortOnError flag in the DCB structure to false when initializing the serial port.

This negligence on Microsoft's part means you could encounter different behavior each time you run your application: the flag persists across app runs, and its default is determined by the BIOS and/or the UART hardware vendor. It also explains why some people claim the problem happens on one machine and not on others, and why it has remained such a pesky problem for both developers and Microsoft since the first incarnation of the SerialPort class.

With fAbortOnError set to true, this is indeed the expected behavior, but is it the behavior Microsoft intended for its users? No. System.IO.Ports.SerialStream was never meant to work with fAbortOnError set to true, because it never calls ClearCommError, the WinAPI function that goes hand in hand with that flag, to recover an aborted port. Clearly, whoever wrote SerialStream made a mistake (and needs to be shot).
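For reference, the Win32 DCB structure packs its boolean options into a single 32-bit bitfield, and fAbortOnError sits at bit 14, right after the two-bit fRtsControl field. The sketch below shows only the bit arithmetic on that flags word; the class and member names are hypothetical and not part of any .NET API.

```csharp
static class DcbFlags
{
    // Bit order of the DCB bitfield as declared in winbase.h:
    // fBinary(0) fParity(1) fOutxCtsFlow(2) fOutxDsrFlow(3) fDtrControl(4-5)
    // fDsrSensitivity(6) fTXContinueOnXoff(7) fOutX(8) fInX(9) fErrorChar(10)
    // fNull(11) fRtsControl(12-13) fAbortOnError(14) fDummy2(15-31)
    public const int AbortOnErrorBit = 14;
    public const uint AbortOnErrorMask = 1u << AbortOnErrorBit;

    public static bool IsAbortOnError(uint flags)
    {
        return (flags & AbortOnErrorMask) != 0;
    }

    // Clears only fAbortOnError; every other setting in the word is preserved.
    public static uint ClearAbortOnError(uint flags)
    {
        return flags & ~AbortOnErrorMask;
    }
}
```

Masking the single bit, rather than writing a fresh flags value, matters because the same word also holds the flow-control and parity settings the port is already configured with.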


The Solution

It took me an entire day to find the root cause of this problem. Luckily, the solution is much simpler.
Here's what Microsoft needs to do to fix the problems (with reference to the .NET 4.0 source):

1) In InitializeDCB, use SetDcbFlag to set bit 14 to zero; this sets fAbortOnError to false. Also, retry GetCommState and SetCommState if either fails with error 995, calling ClearCommError() before retrying.

2) In SerialStream's constructor, move the InitializeDCB call to the line before GetCommProperties. This fixes the problem for the folks who've been getting an IOException when calling SerialPort.Open(). SerialPort.Open() failed only on slow devices because, between the port's CreateFile call and the GetCommProperties() call, a physical comm port error may already have occurred, and with fAbortOnError still set, that error makes GetCommProperties() fail.
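The retry in step 1 might look something like the helper below. It is written against delegates so the logic can be read, and tested, without a serial port attached: in real code `attempt` would wrap GetCommState or SetCommState, `getLastError` would be Marshal.GetLastWin32Error, and `clearErrors` would call ClearCommError on the port handle. The helper's name, its return convention and the default of three attempts are my own assumptions, not anything taken from the framework source.

```csharp
using System;

static class CommRetry
{
    const int ErrorOperationAborted = 995;

    // Runs a Win32-style call that returns false on failure. If it fails with
    // ERROR_OPERATION_ABORTED, acknowledge the comm error and try again, up to
    // maxAttempts times. Any other error code is returned immediately.
    // Returns 0 on success, otherwise the last Win32 error code.
    public static int Run(Func<bool> attempt, Func<int> getLastError,
                          Action clearErrors, int maxAttempts = 3)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        int error = 0;
        for (int i = 0; i < maxAttempts; i++)
        {
            if (attempt()) return 0;

            error = getLastError();
            if (error != ErrorOperationAborted) return error;

            // e.g. ClearCommError(handle, out errors, IntPtr.Zero) in real code
            clearErrors();
        }
        return error;
    }
}
```

Only error 995 is retried; anything else (a missing port, access denied) is a genuine failure that retrying would not fix.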


Some people have claimed that their app simply crashes when it terminates. That happens because DiscardInBuffer() in SerialStream.Dispose() throws an IOException when PurgeComm fails with error 995, most likely after a buffer overrun, since their serial devices keep sending and filling up the input buffer until the user closes the app. Mind you, at that point Dispose() is being called from the garbage collector's finalizer thread, so the exception goes unhandled and the app hard-crashes. A try-catch is ineffective there, unless of course you've manually disposed of the object before closing the app.
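Following that reasoning, the defensive move is to close the port yourself, on a thread where you can catch the exception, before the application exits. A minimal sketch, assuming your code owns the SerialPort instance; the GC.SuppressFinalize call is the trick Barton describes in the comments, and should stop the stream's finalizer from repeating the failing cleanup later. The helper name is hypothetical.

```csharp
using System;
using System.IO;
using System.IO.Ports;

static class PortShutdown
{
    // Closes the port on the calling thread, so an IOException thrown by the
    // purge inside SerialStream's Dispose can be caught here instead of
    // surfacing later, unhandled, on the finalizer thread.
    public static void CloseQuietly(SerialPort port)
    {
        if (port == null) return;
        try
        {
            if (port.IsOpen)
            {
                // BaseStream is only accessible while the port is open.
                GC.SuppressFinalize(port.BaseStream);
            }
            port.Close();
        }
        catch (IOException)
        {
            // Typically error 995 from PurgeComm; the handle is being torn down anyway.
        }
    }
}
```

Call it from your form's closing handler, or wherever your application shuts down, rather than leaving the port for the garbage collector.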

How do you fix it in the interim? Simple. Before you call SerialPort.Open(), open the serial port yourself with CreateFile and use SetCommState to set fAbortOnError to false. You can then safely open the serial port without worrying that it might throw an IOException. I've whipped up a sample workaround in C# that you can use as a reference.
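Here is an independent, minimal sketch of those interim steps (not the sample mentioned above). It assumes Windows and the desktop .NET Framework; on the Compact Framework, SafeFileHandle would have to become IntPtr, as discussed in the comments. The class and method names are hypothetical. It opens the port, acknowledges any pending error with ClearCommError, reads the DCB, clears bit 14 if it is set, and closes the handle so SerialPort can open the port afterwards.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

static class AbortOnErrorFixer
{
    // Managed mirror of the Win32 DCB structure (28 bytes).
    [StructLayout(LayoutKind.Sequential)]
    internal struct DCB
    {
        public int DCBlength;
        public int BaudRate;
        public uint Flags;          // packed bitfield; fAbortOnError is bit 14
        public ushort wReserved;
        public ushort XonLim;
        public ushort XoffLim;
        public byte ByteSize;
        public byte Parity;
        public byte StopBits;
        public byte XonChar;
        public byte XoffChar;
        public byte ErrorChar;
        public byte EofChar;
        public byte EvtChar;
        public ushort wReserved1;
    }

    const uint GENERIC_READ = 0x80000000;
    const uint GENERIC_WRITE = 0x40000000;
    const uint OPEN_EXISTING = 3;
    const uint AbortOnErrorMask = 1u << 14;

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern SafeFileHandle CreateFile(string lpFileName, uint dwDesiredAccess,
        uint dwShareMode, IntPtr lpSecurityAttributes, uint dwCreationDisposition,
        uint dwFlagsAndAttributes, IntPtr hTemplateFile);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool GetCommState(SafeFileHandle hFile, ref DCB lpDCB);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool SetCommState(SafeFileHandle hFile, ref DCB lpDCB);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool ClearCommError(SafeFileHandle hFile, out uint lpErrors, IntPtr lpStat);

    // Call before SerialPort.Open(). Relies on the premise of this post: the
    // setting persists after the handle is closed.
    public static void Apply(string portName)
    {
        using (SafeFileHandle handle = CreateFile(@"\\.\" + portName,
                   GENERIC_READ | GENERIC_WRITE, 0, IntPtr.Zero, OPEN_EXISTING, 0, IntPtr.Zero))
        {
            if (handle.IsInvalid)
                throw new Win32Exception(Marshal.GetLastWin32Error());

            // Acknowledge any error that has already put the port in the aborted state.
            uint commErrors;
            ClearCommError(handle, out commErrors, IntPtr.Zero);

            var dcb = new DCB { DCBlength = Marshal.SizeOf(typeof(DCB)) };
            if (!GetCommState(handle, ref dcb))
                throw new Win32Exception(Marshal.GetLastWin32Error());

            if ((dcb.Flags & AbortOnErrorMask) != 0)
            {
                dcb.Flags &= ~AbortOnErrorMask;
                if (!SetCommState(handle, ref dcb))
                    throw new Win32Exception(Marshal.GetLastWin32Error());
            }
        }
    }
}
```

Usage would be along the lines of calling AbortOnErrorFixer.Apply("COM4") immediately before new SerialPort("COM4").Open(). Because CreateFile is called with no sharing, it will fail if another process, or your own SerialPort, already has the port open.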

34 comments:

Andy said...

Zach, this post is very interesting indeed. I have been fighting .NET serial port woes for the longest time, including random 'thread exit...' exceptions for no apparent reason after running for a long time.

I'm not sure I totally understand, though. Are you saying that setting this DCB flag to false stops UART errors from rippling their way up to become an IOException?

Would this lead to other problems, though, because the errors would then be ignored?

I'm extremely curious about this, because the .NET serial port has driven me nuts!

Thanks for any clarification you can give.

Zach Saw said...

Errors will NOT be ignored. The SerialPort class was not written to handle operations aborted due to UART errors. It is, however, designed to report all UART errors.

Note the distinction between aborting operations and reporting errors. For example, a read doesn't have to abort when the UART encounters an error. The error, however, still comes through via a separate thread that raises an error event. You then have the option to discard the bytes read if you choose to.
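In SerialPort terms, that separate thread raises the ErrorReceived event, whose SerialErrorReceivedEventArgs.EventType reports which UART error occurred (Frame, Overrun, RXOver, RXParity or TXFull). A rough sketch of reacting to errors without aborting reads; the flag-and-discard policy and the class name are illustrative choices, not something SerialPort prescribes.

```csharp
using System;
using System.IO.Ports;
using System.Threading;

class UartErrorMonitor
{
    // 1 while an error has been reported but not yet handled by the reader.
    int _errorPending;

    public void Attach(SerialPort port)
    {
        port.ErrorReceived += OnErrorReceived;
    }

    // Raised on a worker thread, not on the thread that is calling Read().
    void OnErrorReceived(object sender, SerialErrorReceivedEventArgs e)
    {
        Console.WriteLine("UART error reported: " + e.EventType);
        Interlocked.Exchange(ref _errorPending, 1);
    }

    // The reading code polls this; true means an error occurred since the last
    // check, so it may choose to discard the bytes it has collected so far.
    public bool TakeErrorFlag()
    {
        return Interlocked.Exchange(ref _errorPending, 0) == 1;
    }
}
```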

Andy said...

A-ha, understood. Many thanks Zach, good work mate.

Thomas Kjørnes said...

I submitted this bug on Microsoft Connect earlier, and I recently added a reference to your blog and its more complete solution.

I still haven't heard anything specific. Have you had any response from Microsoft on this issue?

Regards
Thomas

Zach Saw said...

No I haven't got a response from Microsoft either.

Anonymous said...

Excellent, many thanks for your hard work!

Anonymous said...

Zach,

I've encountered an odd issue on a customer's machine, related to a USB/serial converter, that results in the error "System.ArgumentException: The given port name does not start with COM/com or does not resolve to a valid serial port". It occurs after getting the list of serial ports (the static SerialPort.GetPortNames() method) and using the first port returned, which I believe is the USB/serial converter.

The exception originates at "System.IO.Ports.SerialStream..ctor(...)".

Now, I'm not posting to have you debug this for me; I just wanted to know whether, in your experience, the bug you wrote about here may be related. I realize this is somewhat vague. Unfortunately, I have no access to the customer's USB/serial port information (type, driver, etc.) or to the customer.

Thanks...and thanks for your article!

Anonymous said...

Excellent post. Thanks for the detailed info; it got my app running reliably.

admin said...

I made a console app and pasted your code into it, but it gives this error:

Microsoft.Win32.SafeHandles

Error 1 The type or namespace name 'SafeHandles' does not exist in the namespace 'Microsoft.Win32' (are you missing an assembly reference?) C:\.....

Please help.

Zach Saw said...

@admin

You're most likely targeting the .NET Compact Framework, where the Microsoft.Win32.SafeHandles namespace doesn't exist, so you'll have to change SafeHandle to IntPtr.

The code I implemented was in response to a reader who requested it for .NET 2.0.

Mateusz said...

Great article. I was fighting with this problem on .NET CF 2.0.
Another solution for SafeHandles is to import OpenNETCF.Win32.SafeHandles, which is free in the Community Edition.
Regards

Mateusz said...

Hey Zach,
I made some changes to your code so it works on CF without OpenNETCF. Can I post it on my blog with a link to your post?
Regards,
Mateusz

Anonymous said...

Zach, I've been fighting a mystery hang at SerialPort.Open() for two days. My code now works. I'm standing on the shoulders of a Giant and the view is pretty good! Thanks!!!

Anonymous said...

You, sir, are a real lifesaver. Can you believe this problem has plagued me for months? Your fix also solved the problem of a USB-to-serial port being removed while it's still open in the application.

Anonymous said...

Thanks for this, Zach. As well as this problem, however, I also have the stream dispose issue. It seems MS completely *****ed up this class.

Barton said...

I've put this fix in on spec, although I'd never actually seen this behavior.

What I have seen (and found a fix for) is the program crashing when an existing connection goes away before the app terminates. This happens a lot, for example, when a usbser-hosted USB device is unplugged.

Through a little experimentation and some luck, I found that the crash happens when the framework tries to dispose of a stream that no longer exists. To completely free the port so that it can be reconnected, it is also necessary to call Close(), even though we know the port no longer exists.

SafeDisconnect() must be called in response to the InvalidOperationException raised when a working usbser-hosted serial connection becomes disconnected.

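public void SafeDisconnect()
{
    if (_port == null) return;
    // The device is gone, so stop the finalizer from later disposing the
    // vanished stream (that is what crashes the app at exit).
    if (_port.IsOpen) GC.SuppressFinalize(_port.BaseStream);
    // Close() is still needed so the port can be reopened; it may throw
    // because the device no longer exists, so any exception is swallowed.
    try { _port.Close(); }
    catch { }
    _port = null;
}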

Hope this helps others...

Anonymous said...

Brilliant, works perfectly.

Thanks so much Zach!

Anonymous said...

Thanks, Zach... awesome work. It makes you wonder whether MS uses its own classes for development.

Jóse said...

Hi: None of those workarounds worked for me. I finally found a solution by adding the following to app.config, under the runtime section:

<legacyUnhandledExceptionPolicy enabled="1"/>

Hope this helps.

lalitha said...

CreateFile is not creating the file; it throws a file-not-found exception:

SafeFileHandle hFile = CreateFile(@"\\.\" + portName, dwAccess, 0, IntPtr.Zero, 3,
    dwFlagsAndAttributes, IntPtr.Zero);
Please help me

magnum said...

About the USB serial port "not found" problem: the cause is the converter's driver configuration.

When you try to access the COM port, even though it's already listed in Device Manager or in the list returned by the app, the driver is not loaded.
START_TYPE has to be changed from 3 (demand start) to 1 (system start).

Stavros said...

I also had problems with the serialport.

I submitted feedback about the SerialPort instability on the Microsoft Connect website:
http://connect.microsoft.com/VisualStudio/feedback/details/755055/serialport-is-not-stable

If you have problems with the SerialPort (the app hangs), please vote for it and click the "I also have this problem" option.

Maybe that way MS will be interested in fixing the bugs.

ramakrishna said...

SerialPortSettings.Write(sp, "05IAE\r\n", ref errMsg);
When I write this command to the serial port, I get a timeout error and my application crashes. Can I handle this by catching the timeout exception and continue connecting to the other ports as scheduled, even though it fails for one COM port? Zach, could you suggest something to handle such exceptions? The problem might be with the device I'm connecting to, but I still have to communicate with other devices on other COM ports without my application crashing.

Anonymous said...

private void button1_Click(object sender, EventArgs e)
{
    _displayWindow = rtbDisplay;
    SerialPortTester.SerialPortFixer.Execute("COM4");
    using (SerialPort port = new SerialPort("COM4"))
    {
        port.Open();
        int i = 0;
        while (true)
        {
            port.Write("test");
            DisplayData(MessageType.Outgoing, "Test " + i++);
        }
    }
}

Here is my button handling routine.
It works, but as soon as I disconnect the USB device (a USB-to-serial converter), I get an IOException.
Any idea?

Thanks,

Cesar said...

Awesome, THANKS!!!!

Anonymous said...

@Mateusz
How did you make it run on CF without OpenNetCF?
Please help

Ryan Lush said...

I realize this isn't really the subject of your original post, but you seem pretty knowledgeable about the subject, so I'm hoping you can help me. My first problem is communicating with an FT2232. I have one of the ports spitting out valid data that any terminal program can read, but my buffer in C# is full of '?'s and nulls. So I've resorted to sending the data out of one of the other UARTs on my ARM chip and feeding it to an XBee chip, then receiving it with another XBee plugged into an FT232. The same C# program has no problem accepting that data. I can send data from the C# code to the ARM device with either setup with no problem.

The other problem is speed. I've tried both the DataReceived event and calling port.ReadLine() in a while(1) loop. In either case, if I let the ARM spit out data for about 60 seconds and then turn it off, the C# code continues processing received data for another 30 seconds. The strangest thing to me is that sometimes it keeps up and sometimes it doesn't, even though I haven't changed anything.

Jim said...

I am using a level shifter that is sketchy at 115200 baud, and I get lots of UART errors. I couldn't run for more than a second without the read operations locking up with the "I/O operation aborted" error. I set dcb.fAbortOnError = FALSE; and now I am in business. Thanks.

Anonymous said...

Hi, I have the IOException error, but your fix does not seem to solve my problem.

I get "A device attached to the system is not functioning.".

Any ideas?
Regards Robert P

Walter Holman said...

Through a little analysis and some luck, I discovered that the crash happens when the framework tries to dispose of a stream that no longer exists.

Jimmy James said...

Zach, it's been a while since this article, but we had the potential for the condition you described, so we tried your solution. The question now is how we can test that the fix is actually in place, since in our case it could literally take a very long time (say, a year) before a serial port failure shows up, such as a serial port lockup where you have to reboot the machine before the port is available for reuse again. Any suggestions for test cases? Thank you.

Anonymous said...

Has anyone been able to figure this out for .NET CF? Been trying for a while now and still no joy.