So I was very surprised when I found that a memcpy() from a packet buffer to a structure raised a Fault exception whereas a naive structure copy operation worked just fine.
From the STM32 datasheet:
The Cortex-M3 processor supports unaligned access only for the following instructions:
● LDR, LDRT
● LDRH, LDRHT
● LDRSH, LDRSHT
● STR, STRT
● STRH, STRHT
All other load and store instructions generate a usage fault exception if they perform an unaligned access, and therefore their accesses must be address aligned.
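To make the rule above concrete, here is a minimal sketch of an alignment check in C. The helper name `is_aligned` is hypothetical and does not come from the datasheet or any STM32 library; it only tests whether an address is a multiple of the access size.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of any STM32 library): returns true when
 * the address `p` is a multiple of `size`. `size` must be a power of two,
 * e.g. 2 for a halfword access or 4 for a word access. */
static inline bool is_aligned(const void *p, size_t size)
{
    return ((uintptr_t)p & (size - 1u)) == 0;
}
```

Going by the quoted text, a plain LDR or STR tolerates an address for which `is_aligned(p, 4)` is false, while any load or store instruction not on the list needs it to be true.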
The problem was that GCC optimises certain instances of memcpy() and structure assignment into Load Multiple (LDMIA) and Store Multiple (STMIA) instructions, which are not on the list above and therefore fault on an unaligned address. If you instead write your own word-at-a-time memcpy() as an inline function, the compiler generates LDR and STR instructions, which execute nearly as fast but also work for unaligned reads and writes:
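    #include <stddef.h>
    #include <stdint.h>
    /* ptr_t is not defined here; assumed to be a byte-granular pointer type such as uint8_t *. */
    static inline void memcpy(ptr_t dst, ptr_t src, size_t sz)
    {
        if (sz & 1) {       /* copy the odd byte first */
            *(uint8_t*)dst = *(uint8_t*)src;
            src += 1;
            dst += 1;
            sz -= 1;
        }
        if (sz & 2) {       /* then a single halfword */
            *(uint16_t*)dst = *(uint16_t*)src;
            src += 2;
            dst += 2;
            sz -= 2;
        }
        while (sz) {        /* what remains is a multiple of 4 bytes */
            *(uint32_t*)dst = *(uint32_t*)src;
            src += 4;
            dst += 4;
            sz -= 4;
        }
    }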
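As a quick check of the routine above, the following sketch copies a header out of a packet buffer at an odd offset and compares the result with the source bytes. The names `copy_unaligned`, `struct pkt_header`, `u16_unaligned`, `u32_unaligned` and `header_copy_ok` are all hypothetical. The packed wrapper types rely on the GCC/Clang `__attribute__((packed, may_alias))` extension; that is an assumption added here so the unaligned accesses stay well-defined when the code is built on a host machine, not something the original routine uses.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Packed wrappers (GCC/Clang extension): tell the compiler the value may
 * sit at any byte address. Hypothetical names, not from the original post. */
typedef struct __attribute__((packed, may_alias)) { uint16_t v; } u16_unaligned;
typedef struct __attribute__((packed, may_alias)) { uint32_t v; } u32_unaligned;

/* Same byte / halfword / word order as the routine above, renamed so it
 * does not clash with the standard library's memcpy(). */
static inline void copy_unaligned(void *dst_, const void *src_, size_t sz)
{
    uint8_t *dst = dst_;
    const uint8_t *src = src_;

    if (sz & 1) {                       /* odd byte first */
        *dst = *src;
        src += 1; dst += 1; sz -= 1;
    }
    if (sz & 2) {                       /* then one halfword */
        ((u16_unaligned *)dst)->v = ((const u16_unaligned *)src)->v;
        src += 2; dst += 2; sz -= 2;
    }
    while (sz) {                        /* remainder is a multiple of 4 */
        ((u32_unaligned *)dst)->v = ((const u32_unaligned *)src)->v;
        src += 4; dst += 4; sz -= 4;
    }
}

/* Hypothetical 8-byte packet header, used only for this example. */
struct pkt_header {
    uint32_t seq;
    uint16_t len;
    uint16_t flags;
};

/* Copies a header that starts `offset` bytes into `pkt`, then returns 1 if
 * the copy matches the source bytes and 0 otherwise. */
static int header_copy_ok(const uint8_t *pkt, size_t offset)
{
    struct pkt_header h;
    copy_unaligned(&h, pkt + offset, sizeof h);
    return memcmp(&h, pkt + offset, sizeof h) == 0;
}
```

Calling `header_copy_ok(pkt, 1)` exercises the case that caused the fault: a source address that is not a multiple of 4. Whether a particular compiler version turns the packed accesses into LDR and STR on the Cortex-M3 is worth confirming in the disassembly.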