The following is not my content, it is cut and paste, but it does a very good job of explaining the difference between mutexes and semaphores:
The Myth: Mutexes and semaphores are similar--even interchangeable--operating system primitives.
The Truth: Mutexes and semaphores should always be used for distinct purposes, and should thus feature distinct APIs. (My recommendations to RTOS vendors are at the end.)
The cause of the confusion between mutexes and semaphores is historical, dating all the way back to the 1974 invention of the semaphore by Djikstra. Prior to that date, the interrupt-safe task synchronization and signaling mechanisms known to computer scientists were not efficiently scalable for use by more than two tasks. Dijkstra's scalable semaphore could be used for task synchronization (including mutual exclusion) as well as signaling.
After the introduction of commercial real-time operating systems (beginning with VRTX, ca. 1980) and the publication of a 1990 paper on priority inheritance protocols it became apparent that mutexes needed to be more than just semaphores with a binary value. Because of the possibility of unbounded priority inversion, which would break RMA assumptions, ordinary semaphores cannot be used for mutual exclusion.
Many bad sources of information add to the general confusion by introducing the alternate names binary semaphore for mutex and counting semaphore. The current wikipedia entry for semaphore is a prime example.
The correct and appropriate solution is a distinct set of RTOS primitives: one for semaphores and another for mutexes. Mutexes must prevent unbounded priority inversion. The APIs for semaphores and mutexes should be as distinct as possible, as their use is quite different.