Key points for debugging STM32+SDIO+FATFS on platforms with DMA and CACHE

Posted by fhil85 on Mon, 03 Jan 2022 11:03:59 +0100

1. Preface

FATFS tutorials and examples are common, and the library runs on everything from 8051-core MCUs to high-performance Cortex-M7 MCUs and even faster application processors. However, most STM32 tutorials and examples avoid combining the D-Cache in write-back mode with DMA. For example, the examples from ALIENTEK and Wildfire set the D-Cache to write-through mode. This greatly reduces the benefit of having a D-Cache. The effect is worst when SDRAM is attached: in write-through mode every store is also written to SDRAM, which has a long latency, and the processor has to wait for the write to complete. In write-back mode SDRAM does not have to be written so often. It is written one whole cache line at a time when a line is evicted or cleaned, which takes full advantage of SDRAM burst reads and writes.

2. Relevant points

When DMA and the Cache are used together, the data in the Cache can become inconsistent with the corresponding memory, because the content of the Cache is determined entirely by the memory accesses that the CPU itself makes.

In a system with only a CPU and a Cache, the CPU is the only agent that reads and writes memory and the Cache, so the final result is determined entirely by the program. Once DMA is added, DMA can change memory without the change reaching the Cache, and, when the Cache is in write-back mode, DMA can also output stale data from memory that has not yet been updated with the latest values held in the Cache. This is how the data becomes inconsistent.

To solve this problem, two cache maintenance operations must be used appropriately to keep the Cache and memory in sync: cleaning the Cache (writing modified lines back to memory) and invalidating the Cache (discarding lines so that the next read fetches from memory). In practice these are not simple calls. If you do not pay attention to address alignment and to the order of operations, you risk losing data. The key problem solved in this post is exactly of this kind: the official sd_diskio.c file does not handle address alignment well. After DMA has received data into memory, the Cache must be invalidated so that the CPU sees the new data. If the buffer is not aligned to a cache line, that invalidation also discards data at adjacent addresses that was written to the Cache but has not yet been written back to memory.

3. Detailed introduction to key issues

3.1 Inconsistency between Cache and memory data

In a system with only a CPU and a Cache, when the CPU issues a memory access that hits the Cache, it reads or writes the Cache directly. On a miss, memory must be accessed: the access address is ANDed with the bitwise complement of (Cache line size - 1) to get the address of the Cache line that contains it, and the whole line is loaded into the Cache. For example, if the access address is 0x1314 and the Cache line size is 32 bytes (0x20; the offset mask is 0x1F and its 32-bit complement is 0xFFFFFFE0), the Cache line address of 0x1314 is 0x1314 & 0xFFFFFFE0 = 0x1300.
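The address arithmetic above can be checked on a PC with a few lines of C. This is only a sketch: the 32-byte line size comes from the example, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Cache line size used in the example above (assumed to be a power of two). */
#define CACHE_LINE_SIZE 32u

/* Round an address down to the start of the cache line that contains it. */
uint32_t cache_line_base(uint32_t addr)
{
    /* The mask trick only works for power-of-two line sizes. */
    assert((CACHE_LINE_SIZE & (CACHE_LINE_SIZE - 1u)) == 0u);
    return addr & ~(uint32_t)(CACHE_LINE_SIZE - 1u);
}

/* Byte offset of an address inside its cache line (0 means line-aligned). */
uint32_t cache_line_offset(uint32_t addr)
{
    return addr & (CACHE_LINE_SIZE - 1u);
}
```

For the address 0x1314 this gives a line base of 0x1300 and an offset of 0x14 inside the line.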

When DMA is present in the system, although DMA only reads and writes memory, its accesses and the CPU's accesses are asynchronous and neither side notifies the other directly. This leads to two problems:

  1. After DMA has input data, if the input address hits in the Cache, the CPU reads only the (old) data in the Cache and never sees the new data in memory.
  2. When DMA outputs data, the results produced by the CPU may still be in the Cache rather than in memory, because the CPU reads and writes the Cache directly on a hit. If DMA then outputs what is in memory, it outputs the old, unprocessed data.
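Both cases can be reproduced on a PC with a toy model of a single write-back cache line. This is only a sketch to illustrate the idea, not how the Cortex-M7 cache is implemented; every name in it is made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE 32u

/* Toy model: one line of memory plus one write-back cache line that mirrors it. */
typedef struct {
    uint8_t mem[LINE];    /* what DMA reads and writes */
    uint8_t cache[LINE];  /* what the CPU reads and writes while the line is valid */
    int valid;
    int dirty;
} ToyLine;

static void fill(ToyLine *t)   /* load the line from memory on a miss */
{
    if (!t->valid) { memcpy(t->cache, t->mem, LINE); t->valid = 1; t->dirty = 0; }
}

uint8_t cpu_read(ToyLine *t, unsigned i) { assert(i < LINE); fill(t); return t->cache[i]; }
void cpu_write(ToyLine *t, unsigned i, uint8_t v) { assert(i < LINE); fill(t); t->cache[i] = v; t->dirty = 1; }
void dma_write(ToyLine *t, unsigned i, uint8_t v) { assert(i < LINE); t->mem[i] = v; }
uint8_t dma_read(const ToyLine *t, unsigned i) { assert(i < LINE); return t->mem[i]; }

/* Clean: write a dirty line back to memory. Invalidate: drop the cached copy. */
void cache_clean(ToyLine *t) { if (t->valid && t->dirty) { memcpy(t->mem, t->cache, LINE); t->dirty = 0; } }
void cache_invalidate(ToyLine *t) { t->valid = 0; t->dirty = 0; }

/* Case 1: value the CPU reads after DMA has input 0xAA into memory. */
uint8_t demo_dma_input(int invalidate_first)
{
    ToyLine t;
    memset(&t, 0, sizeof t);
    (void)cpu_read(&t, 0);          /* the line is now cached with the old value 0 */
    dma_write(&t, 0, 0xAA);         /* DMA updates memory only */
    if (invalidate_first) cache_invalidate(&t);
    return cpu_read(&t, 0);
}

/* Case 2: value DMA outputs after the CPU has written 0x55. */
uint8_t demo_dma_output(int clean_first)
{
    ToyLine t;
    memset(&t, 0, sizeof t);
    cpu_write(&t, 0, 0x55);         /* the new value lands in the cache only */
    if (clean_first) cache_clean(&t);
    return dma_read(&t, 0);
}
```

Without maintenance, demo_dma_input(0) and demo_dma_output(0) both return the stale 0x00. Invalidating after DMA input, or cleaning before DMA output, returns 0xAA and 0x55 respectively.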

3.2 Data loss caused by synchronizing Cache and memory

Cache synchronization is performed on whole lines. If the buffer being synchronized is not aligned to a cache line boundary, invalidating the Cache also discards data that is adjacent to the buffer, shares a cache line with it, and is held only in the Cache. This problem is more subtle, as shown in the figure below.

Comparing the situation before and after synchronization makes the problem clear: invalidating the Cache does make the data received by DMA visible to the CPU, but the next read also reloads from memory the old values of the neighboring data, from before the CPU modified them, so the new values written by the CPU are lost.
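The loss can be reproduced on a PC with a toy model of a write-back cache line. The sketch below is an illustration only: the layout (a CPU variable at byte 0, the DMA buffer starting at byte 20) and all names are assumptions chosen for the demonstration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE 32u

/* Toy model of one cache line and the memory behind it. */
typedef struct {
    uint8_t mem[LINE];
    uint8_t cache[LINE];
    int valid;
} Line;

static void fill(Line *l) { if (!l->valid) { memcpy(l->cache, l->mem, LINE); l->valid = 1; } }
static void cpu_write(Line *l, unsigned i, uint8_t v) { assert(i < LINE); fill(l); l->cache[i] = v; }
static uint8_t cpu_read(Line *l, unsigned i) { assert(i < LINE); fill(l); return l->cache[i]; }
static void invalidate(Line *l) { l->valid = 0; }   /* any unwritten cache content is dropped */

/*
 * Returns the CPU variable as the CPU sees it after a DMA receive has been
 * synchronized by invalidating the buffer's cache line.
 * shared != 0: the variable and the DMA buffer live in the same cache line.
 * shared == 0: the DMA buffer has a cache line of its own.
 */
uint8_t var_after_dma_sync(int shared)
{
    Line a, b;
    memset(&a, 0, sizeof a);
    memset(&b, 0, sizeof b);
    Line *buf_line = shared ? &a : &b;

    cpu_write(&a, 0, 0x11);        /* CPU updates the variable: new value only in the cache */
    buf_line->mem[20] = 0xAA;      /* DMA receives into the buffer: memory only */
    invalidate(buf_line);          /* make the DMA data visible to the CPU */

    assert(cpu_read(buf_line, 20) == 0xAA);   /* the DMA data is visible either way */
    return cpu_read(&a, 0);
}
```

With a shared line the function returns 0x00, the value from before the CPU's write. With the buffer in its own line it returns 0x11.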

4. Practical steps

  1. Configure SDIO and FATFS using CubeMX
  2. Configure the Cache and DMA
  3. Modify the sd_diskio.c file
  4. Write the test program

5. Configure SDIO, FATFS and DMA

5.1 SDIO configuration

On the STM32H7 and STM32F7 series the peripheral is called SDMMC. Nothing in this post depends on the differences between SDIO and SDMMC, so the two names are used interchangeably here to mean the interface that connects the SD card. The basic parameter configuration is shown in the figure below.

Pay special attention to the "SDMMC clock divide factor" setting. It has to be chosen together with the input clock of the SDMMC peripheral in the clock tree, so that the clock after division does not exceed 25 MHz. In this post, for example, the clock tree supplies 90 MHz to SDMMC. Dividing by four gives 22.5 MHz, which meets the requirement.
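The divider arithmetic can be written as a small helper. This is a sketch of the calculation only: the function name is made up, and how the resulting divisor maps to the value entered in CubeMX depends on the chip family, so check the reference manual for your part.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest integer divisor d such that clk_hz / d does not exceed max_hz. */
uint32_t min_clock_divisor(uint32_t clk_hz, uint32_t max_hz)
{
    assert(max_hz > 0u);
    uint32_t d = (clk_hz + max_hz - 1u) / max_hz;   /* ceiling division */
    return (d == 0u) ? 1u : d;
}
```

For a 90 MHz input and a 25 MHz limit the result is 4, which gives the 22.5 MHz used in this post.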

 

5.2 FatFs configuration

First select the "SD Card" option. A list of settings then appears below it. Leave everything at its default except the setting marked by the red box in the figure, which allows the file system to recognize files with Chinese names. Because this setting keeps its working buffer on the system stack, the stack size must also be increased, as shown in the figure.

Testing shows that the heap size can be left at its default, but the stack size must be increased.

Finally, configure FATFS to use DMA for the low-level reads and writes, as shown in the figure. DMA must be selected here when FreeRTOS is used.

 

5.3 Cache configuration

In the "System Core" category, open the "CORTEX_M7" settings and enable both ICache and DCache. Once enabled, DCache operates in write-back mode by default.

5.4 Modifying the sd_diskio.c file

First, add the following macro definitions:

#define ENABLE_SCRATCH_BUFFER			1
#define ENABLE_SD_DMA_CACHE_MAINTENANCE  1

Because the Cache is enabled, these macros are required: without them, conditional compilation leaves out the code that synchronizes the Cache during reads and writes, and the driver cannot work correctly. Next, note the scratch array defined in the second red box in the figure. This array is a fallback for the case where the buffer passed to a read or write is not suitably aligned. The DMA built into SDIO transfers data in words, so a DMA buffer must be 4-byte aligned. The D-Cache line of the Cortex-M7 is 32 bytes, so when the Cache is used the buffer must be 32-byte aligned.
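As an illustration of the 32-byte requirement, here is one way to declare a cache-line-aligned buffer in standard C11 and to check that a buffer covers whole cache lines. It is a sketch that compiles on a PC: dma_buf, covers_whole_lines and get_dma_buf are hypothetical names, and the generated sd_diskio.c may use its own alignment macros for the scratch array.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define BLOCKSIZE   512u   /* SD sector size, same name as used in sd_diskio.c */
#define DCACHE_LINE 32u    /* Cortex-M7 D-Cache line size */

/* Hypothetical DMA buffer that starts on a cache line boundary. */
static alignas(DCACHE_LINE) uint8_t dma_buf[BLOCKSIZE];

/* Nonzero when both the start address and the length are multiples of the line size. */
int covers_whole_lines(const void *p, uint32_t len)
{
    return (((uintptr_t)p | len) & (DCACHE_LINE - 1u)) == 0u;
}

uint8_t *get_dma_buf(void)
{
    assert(covers_whole_lines(dma_buf, BLOCKSIZE));
    return dma_buf;
}
```

A 512-byte sector is a multiple of 32 bytes, so once the start address is aligned the buffer occupies exactly 16 cache lines and shares none of them with other variables.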

This fallback was originally meant for special cases only. However, the buffer that FATFS uses when mounting the disk, to read basic information such as the first sector of the SD card and the first sector of the file system, is defined inside the structure named FATFS. The structure is defined as follows:

/* File system object structure (FATFS) */

typedef struct {
	BYTE	fs_type;		/* File system type (0:N/A) */
	BYTE	drv;			/* Physical drive number */
	BYTE	n_fats;			/* Number of FATs (1 or 2) */
	BYTE	wflag;			/* win[] flag (b0:dirty) */
	BYTE	fsi_flag;		/* FSINFO flags (b7:disabled, b0:dirty) */
	WORD	id;				/* File system mount ID */
	WORD	n_rootdir;		/* Number of root directory entries (FAT12/16) */
	WORD	csize;			/* Cluster size [sectors] */
#if _MAX_SS != _MIN_SS
	WORD	ssize;			/* Sector size (512, 1024, 2048 or 4096) */
#endif
#if _USE_LFN != 0
	WCHAR*	lfnbuf;			/* LFN working buffer */
#endif
#if _FS_EXFAT
	BYTE*	dirbuf;			/* Directory entry block scratchpad buffer */
#endif
#if _FS_REENTRANT
	_SYNC_t	sobj;			/* Identifier of sync object */
#endif
#if !_FS_READONLY
	DWORD	last_clst;		/* Last allocated cluster */
	DWORD	free_clst;		/* Number of free clusters */
#endif
#if _FS_RPATH != 0
	DWORD	cdir;			/* Current directory start cluster (0:root) */
#if _FS_EXFAT
	DWORD	cdc_scl;		/* Containing directory start cluster (invalid when cdir is 0) */
	DWORD	cdc_size;		/* b31-b8:Size of containing directory, b7-b0: Chain status */
	DWORD	cdc_ofs;		/* Offset in the containing directory (invalid when cdir is 0) */
#endif
#endif
	DWORD	n_fatent;		/* Number of FAT entries (number of clusters + 2) */
	DWORD	fsize;			/* Size of an FAT [sectors] */
	DWORD	volbase;		/* Volume base sector */
	DWORD	fatbase;		/* FAT base sector */
	DWORD	dirbase;		/* Root directory base sector/cluster */
	DWORD	database;		/* Data base sector */
	DWORD	winsect;		/* Current sector appearing in the win[] */
	BYTE	win[_MAX_SS];	/* Disk access window for Directory, FAT (and file data at tiny cfg) */
} FATFS;

The buffer is the last member of the structure, win[_MAX_SS]. Even if a variable of this structure type is defined with 32-byte alignment, the start address of the array inside it is not guaranteed to be 32-byte aligned. Therefore the fallback described above is needed to avoid this problem.
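The offset of win inside the structure can be checked with offsetof. The sketch below uses a simplified stand-in for FATFS, not the real definition: it assumes a 32-bit target, LFN enabled, a fixed sector size, no exFAT, no re-entrancy, read/write support and no relative paths, and it models the lfnbuf pointer as a 32-bit integer so that the layout is the same on a PC. Other configurations give a different offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for FATFS under the configuration assumed above. */
typedef struct {
    uint8_t  fs_type, drv, n_fats, wflag, fsi_flag;
    uint16_t id, n_rootdir, csize;
    uint32_t lfnbuf;                 /* WCHAR* on the target, 4 bytes on Cortex-M */
    uint32_t last_clst, free_clst;
    uint32_t n_fatent, fsize, volbase, fatbase, dirbase, database, winsect;
    uint8_t  win[512];
} MockFatFs;

/* Offset of win from the start of the structure. */
size_t win_offset(void)
{
    return offsetof(MockFatFs, win);
}

/* Offset of win inside its cache line when the structure itself is 32-byte aligned. */
size_t win_misalignment(uintptr_t struct_addr)
{
    assert((struct_addr & 31u) == 0u);
    return (struct_addr + offsetof(MockFatFs, win)) & 31u;
}
```

Under these assumptions win sits 52 bytes into the structure, so a 32-byte aligned structure leaves win 20 bytes past a cache line boundary. A structure placed at the hypothetical address 0x240003E0 would put win at 0x24000414, which is 4-byte aligned but not 32-byte aligned.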

5.5 Writing the test code and testing

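#include <stdio.h>
#include <string.h>
#include "fatfs.h"	/* CubeMX-generated: declares retSD, SDPath and SDFatFS */
/* FR_Table (FRESULT code to text) is defined elsewhere in the project */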
char filename[] = "STM32H743_SDMMC_TEST.txt";
char wtext[] = "SUPER IDIL Your smile is not as sweet as yours, the sunshine in August is not as dazzling as yours, falling in love with you at 105 degrees, and pure distilled water";
char rtext[100];
void Fatfs_RW_test(void)
{
	uint32_t fre_clust, fre_sect=0, tot_sect=0;
	UINT write_count;	/* FatFs byte counters are UINT, not uint32_t */
	UINT br;
	FATFS *fs1;
	retSD = f_mount(&SDFatFS, (TCHAR const *)SDPath, 1);
	SCB_CleanDCache();

	if(retSD){
		printf("mount error: %d,%s\r\n", retSD,FR_Table[retSD]);
	} else{
		printf("mount success!!! \r\n");
	}
	
	retSD = f_getfree((const TCHAR*)SDPath, (DWORD*)&fre_clust, &fs1);
	if(retSD==0){
		tot_sect = (SDFatFS.n_fatent-2)*fs1->csize;
		fre_sect = fre_clust*fs1->csize;
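		/* sectors >> 11 converts 512-byte sectors to MB; cast to match %lu */
		printf("total:%luMB, free:%luMB\r\n", (unsigned long)(tot_sect>>11), (unsigned long)(fre_sect>>11));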
	}
	
	FIL fil;
	FILINFO finfo;
	DIR fdir;

	retSD = f_open(&fil, filename, FA_CREATE_ALWAYS|FA_WRITE);
	if(retSD){
		printf("open error: %d,%s\r\n", retSD,FR_Table[retSD]);
	} else{
		printf("open file success!!!\r\n");
	}
	memset(rtext, 0X00, sizeof(rtext));
	retSD = f_write(&fil, wtext, sizeof(wtext), (void *)&write_count);
	if(retSD){
		printf("write error: %d,%s\r\n", retSD,FR_Table[retSD]);
	} else{
		printf("write file success!!!\r\n");
	}
	retSD = f_close(&fil);	/* check the result of f_close itself */
	if(retSD){
		printf("close error: %d,%s\r\n", retSD,FR_Table[retSD]);
	} else{
		printf("close success!!!\r\n");
	}
	
	retSD = f_open(&fil, "SD Card test documentation.txt", FA_READ);
	if(retSD){
		printf("open error: %d,%s\r\n", retSD,FR_Table[retSD]);
	} else{
		printf("open file success!!!\r\n");
	}
	memset(rtext, 0X00, sizeof(rtext));	
	retSD = f_read(&fil, rtext, 50, &br);
	if(retSD){
		printf("read error: %d,%s\r\n", retSD,FR_Table[retSD]);
	} else{
		printf("read file %s success!!!\r\n", "SD Card test documentation.txt");
		printf("%s\r\n", rtext);
	}
	retSD = f_close(&fil);	/* check the result of f_close itself */
	if(retSD){
		printf("close error: %d,%s\r\n", retSD,FR_Table[retSD]);
	} else{
		printf("close success!!!\r\n");
	}
	
	retSD = f_open(&fil, "SD_TEST_TEXT.txt", FA_READ);
	if(retSD){
		printf("open error: %d,%s\r\n", retSD,FR_Table[retSD]);
	} else{
		printf("open file success!!!\r\n");
	}	
	memset(rtext, 0X00, sizeof(rtext));
	retSD = f_read(&fil, rtext, 50, &br);
	if(retSD){
		printf("read error: %d,%s\r\n", retSD,FR_Table[retSD]);
	} else{
		printf("read file %s success!!!\r\n", "SD_TEST_TEXT.txt");
		printf("%s\r\n", rtext);
	}
	retSD = f_close(&fil);	/* check the result of f_close itself */
	if(retSD){
		printf("close error: %d,%s\r\n", retSD,FR_Table[retSD]);
	} else{
		printf("close success!!!\r\n");
	}
	
	retSD = f_open(&fil, filename, FA_READ);
	if(retSD){
		printf("open error: %d,%s\r\n", retSD,FR_Table[retSD]);
	} else{
		printf("open file success!!!\r\n");
	}	
	memset(rtext, 0X00, sizeof(rtext));
	retSD = f_read(&fil, rtext, sizeof(rtext) - 1, &br);	/* leave room for the terminating NUL */
	if(retSD){
		printf("read error: %d,%s\r\n", retSD,FR_Table[retSD]);
	} else{
		printf("read file %s success!!!\r\n", filename);
		printf("%s\r\n", rtext);
	}
	retSD = f_close(&fil);	/* check the result of f_close itself */
	if(retSD){
		printf("close error: %d,%s\r\n", retSD,FR_Table[retSD]);
	} else{
		printf("close success!!!\r\n");
	}
	
	retSD = f_mount(NULL, (TCHAR const *)SDPath, 1);
	if(retSD){
		printf("UNmount error: %d,%s\r\n", retSD,FR_Table[retSD]);
	} else{
		printf("UNmount success!!! \r\n");
	}
}

The test code does the following:

  1. Mount the disk.
  2. Query the total and free space of the SD card.
  3. Create the file STM32H743_SDMMC_TEST.txt, overwriting it if it already exists.
  4. Write the test content to STM32H743_SDMMC_TEST.txt.
  5. Close STM32H743_SDMMC_TEST.txt.
  6. Read test on a file with an English name.
  7. Read test on a file with a Chinese name.
  8. Read back STM32H743_SDMMC_TEST.txt.
  9. Unmount the disk.

However, the test revealed an error, as shown in the following figure.

After the disk is mounted, opening the file fails with error code 2 (FR_INT_ERR in FatFs).
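The test code prints FR_Table[retSD], but the table itself is not shown in this post. A minimal sketch of such a table follows, assuming the standard FRESULT codes of FatFs R0.12 (0 to 19). The helper fr_name is a made-up name that adds a bounds check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* FRESULT names in numeric order, FatFs R0.12 (assumed version). */
static const char *const FR_Table[] = {
    "FR_OK", "FR_DISK_ERR", "FR_INT_ERR", "FR_NOT_READY", "FR_NO_FILE",
    "FR_NO_PATH", "FR_INVALID_NAME", "FR_DENIED", "FR_EXIST",
    "FR_INVALID_OBJECT", "FR_WRITE_PROTECTED", "FR_INVALID_DRIVE",
    "FR_NOT_ENABLED", "FR_NO_FILESYSTEM", "FR_MKFS_ABORTED", "FR_TIMEOUT",
    "FR_LOCKED", "FR_NOT_ENOUGH_CORE", "FR_TOO_MANY_OPEN_FILES",
    "FR_INVALID_PARAMETER"
};

static_assert(sizeof FR_Table / sizeof FR_Table[0] == 20, "one name per FRESULT code");

/* Bounds-checked lookup, so an unexpected code cannot index past the table. */
const char *fr_name(unsigned code)
{
    size_t n = sizeof FR_Table / sizeof FR_Table[0];
    return (code < n) ? FR_Table[code] : "FR_UNKNOWN";
}
```

Error code 2 maps to FR_INT_ERR, which FatFs returns when an internal consistency check fails. Corrupted fields in the file system object are one possible cause.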

Subsequent verification showed that the cause was not a problem with the file system at all: an error during D-Cache synchronization caused key information to be lost.

A careful review of the SD_read function in sd_diskio.c showed that the problem lies in how this function is written. The function code is as follows:

/* USER CODE BEGIN beforeReadSection */
/* can be used to modify previous code / undefine following code / add new code */
/* USER CODE END beforeReadSection */
/**
  * @brief  Reads Sector(s)
  * @param  lun : not used
  * @param  *buff: Data buffer to store read data
  * @param  sector: Sector address (LBA)
  * @param  count: Number of sectors to read (1..128)
  * @retval DRESULT: Operation result
  */

DRESULT SD_read(BYTE lun, BYTE *buff, DWORD sector, UINT count)
{
  DRESULT res = RES_ERROR;
  uint32_t timeout;
#if defined(ENABLE_SCRATCH_BUFFER)
  uint8_t ret;
#endif
#if (ENABLE_SD_DMA_CACHE_MAINTENANCE == 1)
  uint32_t alignedAddr;
#endif

  /*
  * ensure the SDCard is ready for a new operation
  */

  if (SD_CheckStatusWithTimeout(SD_TIMEOUT) < 0)
  {
    return res;
  }

#if defined(ENABLE_SCRATCH_BUFFER)
  if (!((uint32_t)buff & 0x1f))
  {
#endif
    if(BSP_SD_ReadBlocks_DMA((uint32_t*)buff,
                             (uint32_t) (sector),
                             count) == MSD_OK)
    {
      ReadStatus = 0;
      /* Wait that the reading process is completed or a timeout occurs */
      timeout = HAL_GetTick();
      while((ReadStatus == 0) && ((HAL_GetTick() - timeout) < SD_TIMEOUT))
      {
      }
      /* incase of a timeout return error */
      if (ReadStatus == 0)
      {
        res = RES_ERROR;
      }
      else
      {
        ReadStatus = 0;
        timeout = HAL_GetTick();

        while((HAL_GetTick() - timeout) < SD_TIMEOUT)
        {
          if (BSP_SD_GetCardState() == SD_TRANSFER_OK)
          {
            res = RES_OK;
#if (ENABLE_SD_DMA_CACHE_MAINTENANCE == 1)
            /*
            the SCB_InvalidateDCache_by_Addr() requires a 32-Byte aligned address,
            adjust the address and the D-Cache size to invalidate accordingly.
            */
            alignedAddr = (uint32_t)buff & ~0x1F;
            SCB_InvalidateDCache_by_Addr((uint32_t*)alignedAddr, count*BLOCKSIZE + ((uint32_t)buff - alignedAddr));
#endif
            break;
          }
        }
      }
    }
#if defined(ENABLE_SCRATCH_BUFFER)
  }
    else
    {
      /* Slow path, fetch each sector a part and memcpy to destination buffer */
      int i;

      for (i = 0; i < count; i++) {
        ret = BSP_SD_ReadBlocks_DMA((uint32_t*)scratch, (uint32_t)sector++, 1);
        if (ret == MSD_OK) {
          /* wait until the read is successful or a timeout occurs */

          timeout = HAL_GetTick();
          while((ReadStatus == 0) && ((HAL_GetTick() - timeout) < SD_TIMEOUT))
          {
          }
          if (ReadStatus == 0)
          {
            res = RES_ERROR;
            break;
          }
          ReadStatus = 0;

#if (ENABLE_SD_DMA_CACHE_MAINTENANCE == 1)
          /*
          *
          * invalidate the scratch buffer before the next read to get the actual data instead of the cached one
          */
          SCB_InvalidateDCache_by_Addr((uint32_t*)scratch, BLOCKSIZE);
#endif
          memcpy(buff, scratch, BLOCKSIZE);
          buff += BLOCKSIZE;
        }
        else
        {
          break;
        }
      }

      if ((i == count) && (ret == MSD_OK))
        res = RES_OK;
    }
#endif

  return res;
}

The code has two main branches: a fast path in which DMA writes directly into buff, and a slow path that goes through the scratch buffer. In the file as generated, the fast path is chosen whenever buff is 4-byte aligned, which does not take into account the 32-byte alignment required when the D-Cache is used (the listing above already tests the 0x1f mask, that is, the corrected check from section 5.6). So if buff is 4-byte aligned but not 32-byte aligned, the problem appears. For example, see the following figure:

This figure shows an instance of the FATFS structure described above at run time. In this example the address of the win array is 0x24000414, which is 4-byte aligned but not 32-byte aligned. As a result, the member variables next to the win array are corrupted when the D-Cache is synchronized. The actual test shows that the values from volbase to winsect are wrong, which is the direct cause of the error when opening the file. The root cause is that address alignment was not considered when synchronizing the D-Cache.
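How far the invalidation reaches beyond the buffer can be worked out from the address. The following sketch can be run on a PC. It assumes a 32-byte line and a 512-byte sector, and the type and function names are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

#define DCACHE_LINE 32u

/* Range actually affected when [addr, addr + len) is invalidated by cache line. */
typedef struct {
    uint32_t first_line;   /* start of the first affected cache line */
    uint32_t end;          /* end (exclusive) of the last affected cache line */
    uint32_t head_spill;   /* bytes before the buffer that are invalidated too */
    uint32_t tail_spill;   /* bytes after the buffer that are invalidated too */
} InvalidateSpan;

InvalidateSpan invalidate_span(uint32_t addr, uint32_t len)
{
    assert(len > 0u);
    InvalidateSpan s;
    s.first_line = addr & ~(DCACHE_LINE - 1u);
    s.end        = (addr + len + DCACHE_LINE - 1u) & ~(DCACHE_LINE - 1u);
    s.head_spill = addr - s.first_line;
    s.tail_spill = s.end - (addr + len);
    return s;
}
```

For win at 0x24000414 and one 512-byte sector, the affected range starts at 0x24000400, so 20 bytes in front of win are invalidated as well. Those 20 bytes are exactly the five DWORD members volbase, fatbase, dirbase, database and winsect, which matches the corrupted fields seen in the test. Another 12 bytes after the end of win are affected in the same way.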

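5.6 Solving the problem

Change the statement that checks for 4-byte alignment so that it checks for 32-byte alignment instead, that is, mask the buffer address with 0x1F instead of 0x3, as in if (!((uint32_t)buff & 0x1f)). A buffer that is not 4-byte aligned cannot be 32-byte aligned either, so the stricter test still covers the original case.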
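The effect of the changed test can be seen by running both checks on the address from the example. The helper names below are made up for the illustration. In sd_diskio.c the test is written inline on buff.

```c
#include <assert.h>
#include <stdint.h>

/* 4-byte alignment: enough for word-wise DMA, not for cache maintenance. */
int aligned4(uint32_t addr)  { return (addr & 0x3u) == 0u; }

/* 32-byte alignment: the buffer starts on a D-Cache line boundary. */
int aligned32(uint32_t addr) { return (addr & 0x1Fu) == 0u; }

/* 1: let DMA write directly into the caller's buffer; 0: go through scratch. */
int use_direct_dma(uint32_t buff_addr)
{
    int direct = aligned32(buff_addr);
    /* 32-byte alignment implies 4-byte alignment, so the DMA requirement still holds. */
    assert(!direct || aligned4(buff_addr));
    return direct;
}
```

The address 0x24000414 passes the 4-byte test but fails the 32-byte test, so with the corrected check reads into win go through the scratch buffer. Since each sector is 512 bytes, an aligned start address also means the transfer ends on a cache line boundary.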

5.7 Test results after the fix

The serial output shows that the file can be written and read normally. The file with the Chinese name can also be opened normally, but because the file is encoded in Unicode and the serial terminal encodes and decodes in GB2312, its content is not displayed correctly.

Finally, after connecting the SD card to a computer, you can see the file written by the MCU.

6. Follow-up

The next step is to implement the same functionality under the FreeRTOS operating system.
