
RK3576 with UFS Storage: In-depth Analysis of Performance Advantages and Read-Write Test Data

forlinx999

In the embedded storage field, UFS (Universal Flash Storage) is steadily gaining ground. UFS is a type of flash memory. Like eMMC, it is built on NAND flash dies with an integrated controller, a standard interface, and standard packaging, forming a highly integrated storage chip. Thanks to its compact form factor, UFS is widely used in embedded devices such as mobile phones and tablets. Because its performance far exceeds that of eMMC, it is often found in high-end products.

Advantages of UFS

1. Faster response when multitasking. UFS 2.0 devices use a dedicated serial interface based on LVDS (Low-Voltage Differential Signaling), so read and write operations can be carried out simultaneously. The command queue (CQ) schedules tasks dynamically, without waiting for the previous command to finish. It is like cars entering a highway, where multiple lanes allow fast, smooth travel. In contrast, phones using eMMC must perform reads and writes one at a time, and commands are also handled in packed batches. eMMC is therefore already slower, and the gap widens under multitasking, much like traveling on an ordinary two-lane road with speed limits.

2. Lower latency: UFS responds about three times faster. When loading large games and large files, UFS 2.0 takes less time; loading a game takes about one-third of the time required with eMMC 5.0. Correspondingly, phones with UFS 2.0 show lower latency and smoother graphics during gameplay.

3. Shorter loading time for photo thumbnails in the album. Take the phone's photo album as an example: many people have hundreds or even thousands of photos on their phones, and when opening the thumbnail view you can clearly see the thumbnails loading. This happens because the phone cannot read photos from flash memory fast enough to keep up with the screen refresh. On a phone with fast storage, thumbnails load smoothly as you scroll, while on a less capable phone the lag is clearly noticeable.

4. Higher speed and lower power consumption. Because UFS is faster, it completes the same task in less time, and finishing sooner means less energy is consumed. Under active workloads, UFS consumes about 10% less power than eMMC, and in daily use it can save approximately 35% of power.

UFS interface read-write performance test

The RK3576 processor provides both a UFS 2.0 interface and an eMMC 5.1 interface.
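On a Linux system the two interfaces appear under different device names. UFS is handled through the SCSI subsystem, so its logical units show up as /dev/sdX (the tests below target /dev/sda), while eMMC appears as /dev/mmcblkN. As a minimal sketch, the Python helpers below (classify_block_device and list_block_devices are hypothetical names) classify the entries in /sys/block by name alone. This is only a heuristic: USB and SATA disks are also named sdX, and SD cards also appear as mmcblkN, so confirm the target before running any write test.

```python
import os
import re

# Linux naming conventions (assumes the standard kernel drivers):
#   UFS LUNs go through the SCSI layer       -> sda, sdb, ...
#   eMMC/SD go through the MMC block driver  -> mmcblk0, mmcblk0boot0, ...
SCSI_DISK = re.compile(r"sd[a-z]+")
MMC_DISK = re.compile(r"mmcblk\d+(boot\d+)?")


def classify_block_device(name: str) -> str:
    """Return a coarse storage class for a /sys/block entry name."""
    if SCSI_DISK.fullmatch(name):
        # UFS on this board, but USB and SATA disks are also named sdX.
        return "scsi (UFS candidate)"
    if MMC_DISK.fullmatch(name):
        return "mmc (eMMC or SD card)"
    return "other"


def list_block_devices(sys_block: str = "/sys/block") -> dict:
    """Map each whole-disk entry under sys_block to its coarse class."""
    if not os.path.isdir(sys_block):
        return {}
    return {name: classify_block_device(name)
            for name in sorted(os.listdir(sys_block))}


if __name__ == "__main__":
    for dev, kind in list_block_devices().items():
        print(f"/dev/{dev}: {kind}")
```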



The FET3576-C SoM also reserves a UFS interface.



Following Rockchip’s official document “Rockchip_Developer_Guide_UFS_CN_V1.3.0”, we ran read-write tests on the UFS flash memory of the OK3576-C.
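The runs below invoke fio with its default synchronous psync engine, and the "note" lines in their logs show that fio therefore capped -iodepth 32 at 1 per job. Exercising the UFS command queue needs an asynchronous engine instead. The Python sketch below (build_fio_cmd is a hypothetical helper) assembles the same test parameters with --ioengine=libaio, assuming the board's fio build includes libaio support. A write run overwrites the raw device, so only target a disk whose contents you do not need.

```python
import shlex


def build_fio_cmd(device: str, rw: str, name: str, ioengine: str = "libaio",
                  iodepth: int = 32, bs: str = "1024k", size: str = "1G",
                  numjobs: int = 8, runtime: int = 180) -> list:
    """Return an fio argument list mirroring the sequential tests.

    Hypothetical helper: ioengine defaults to the asynchronous libaio engine
    so that iodepth actually takes effect (psync caps it at 1).
    """
    return [
        "fio",
        f"--filename={device}",
        "--direct=1",            # O_DIRECT: bypass the page cache
        f"--ioengine={ioengine}",
        f"--iodepth={iodepth}",  # outstanding I/Os per job
        "--thread",              # run jobs as threads, as in the original tests
        f"--rw={rw}",            # "read" or "write" for sequential access
        f"--bs={bs}",
        f"--size={size}",
        f"--numjobs={numjobs}",
        f"--runtime={runtime}",  # upper bound in seconds
        "--group_reporting",     # aggregate all jobs into one report
        f"--name={name}",
    ]


if __name__ == "__main__":
    cmd = build_fio_cmd("/dev/sda", "read", "seq_100read_1024k")
    # Print a copy-pasteable shell command; passing `cmd` to subprocess.run()
    # would execute it. A "write" run destroys the data on the target device.
    print(shlex.join(cmd))
```

Run as a script, it prints `fio --filename=/dev/sda --direct=1 --ioengine=libaio --iodepth=32 --thread --rw=read --bs=1024k --size=1G --numjobs=8 --runtime=180 --group_reporting --name=seq_100read_1024k`. Building an argument list rather than a single string avoids shell-quoting issues when the command is later passed to subprocess.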

Code:
Sequential write test

root@ok3576-buildroot:/# fio -filename=/dev/sda -direct=1 -iodepth 32 -thread -rw=write -bs=1024k -size=1G -numjobs=8 -runtime=180 -group_reporting -name=seq_100write_1024k
seq_100write_1024k: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=32
...
fio-3.34
Starting 8 threads
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
Jobs: 8 (f=8): [W(8)][96.0%][w=359MiB/s][w=359 IOPS][eta 00m:01s]
seq_100write_1024k: (groupid=0, jobs=8): err= 0: pid=1296: Thu Jan 1 00:01:32 1970
write: IOPS=332, BW=333MiB/s (349MB/s)(8192MiB/24631msec); 0 zone resets
clat (msec): min=2, max=103, avg=23.55, stdev= 9.15
lat (msec): min=2, max=104, avg=23.77, stdev= 9.15
clat percentiles (msec):
| 1.00th=[ 12], 5.00th=[ 14], 10.00th=[ 15], 20.00th=[ 16],
| 30.00th=[ 18], 40.00th=[ 20], 50.00th=[ 22], 60.00th=[ 25],
| 70.00th=[ 27], 80.00th=[ 31], 90.00th=[ 36], 95.00th=[ 41],
| 99.00th=[ 53], 99.50th=[ 59], 99.90th=[ 68], 99.95th=[ 73],
| 99.99th=[ 105]
bw ( KiB/s): min=206590, max=432470, per=100.00%, avg=342387.68, stdev=7157.63, samples=385
iops : min= 196, max= 421, avg=331.98, stdev= 7.14, samples=385
lat (msec) : 4=0.11%, 10=0.49%, 20=42.49%, 50=55.44%, 100=1.45%
lat (msec) : 250=0.01%
cpu : usr=1.12%, sys=1.83%, ctx=18228, majf=0, minf=0
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,8192,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
WRITE: bw=333MiB/s (349MB/s), 333MiB/s-333MiB/s (349MB/s-349MB/s), io=8192MiB (8590MB), run=24631-24631msec

Disk stats (read/write):
sda: ios=165/65464, merge=0/0, ticks=178/1074993, in_queue=1075171, util=99.64%

From the output above, the sequential write speed is 349 MB/s (333 MiB/s).
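fio reports bandwidth in both binary (MiB/s) and decimal (MB/s) units, which is why the same result reads 333MiB/s and 349MB/s. The figure can be checked from the io and run fields of the summary line. The Python sketch below uses a hypothetical helper, bandwidth, that divides the transferred volume by the runtime and converts between the two units.

```python
MIB = 1024 * 1024  # bytes per mebibyte (fio's "MiB")
MB = 1000 * 1000   # bytes per megabyte (fio's "MB")


def bandwidth(io_mib: float, runtime_ms: float) -> tuple:
    """Return (MiB/s, MB/s) for io_mib mebibytes transferred in runtime_ms."""
    mib_per_s = io_mib / (runtime_ms / 1000.0)
    mb_per_s = mib_per_s * MIB / MB
    return mib_per_s, mb_per_s


# io=8192MiB and run=24631msec, copied from the sequential-write summary line
write_mib, write_mb = bandwidth(8192, 24631)
print(f"write: {write_mib:.1f} MiB/s = {write_mb:.1f} MB/s")
```

This prints `write: 332.6 MiB/s = 348.7 MB/s`, which fio rounds to 333MiB/s (349MB/s). Applied to the read summary (io=8192MiB, run=10857msec), the same function gives about 754.5 MiB/s, or 791 MB/s.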

Sequential read test

root@ok3576-buildroot:/# fio -filename=/dev/sda -direct=1 -iodepth 32 -thread -rw=read -bs=1024k -size=1G -numjobs=8 -runtime=180 -group_reporting -name=seq_100read_1024k
seq_100read_1024k: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=32
...
fio-3.34
Starting 8 threads
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
Jobs: 8 (f=8): [R(8)][100.0%][r=756MiB/s][r=755 IOPS][eta 00m:00s]
seq_100read_1024k: (groupid=0, jobs=8): err= 0: pid=1329: Thu Jan 1 00:08:54 1970
read: IOPS=754, BW=755MiB/s (791MB/s)(8192MiB/10857msec)
clat (usec): min=2331, max=16444, avg=10573.01, stdev=646.85
lat (usec): min=2335, max=16447, avg=10575.10, stdev=646.84
clat percentiles (usec):
| 1.00th=[ 9896], 5.00th=[10159], 10.00th=[10159], 20.00th=[10290],
| 30.00th=[10290], 40.00th=[10421], 50.00th=[10421], 60.00th=[10421],
| 70.00th=[10552], 80.00th=[10683], 90.00th=[10945], 95.00th=[12518],
| 99.00th=[13042], 99.50th=[13173], 99.90th=[13960], 99.95th=[15139],
| 99.99th=[16450]
bw ( KiB/s): min=762938, max=786629, per=100.00%, avg=772720.14, stdev=979.45, samples=168
iops : min= 740, max= 767, avg=749.19, stdev= 1.02, samples=168
lat (msec) : 4=0.01%, 10=1.65%, 20=98.34%
cpu : usr=0.37%, sys=3.81%, ctx=24750, majf=0, minf=2048
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=8192,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
READ: bw=755MiB/s (791MB/s), 755MiB/s-755MiB/s (791MB/s-791MB/s), io=8192MiB (8590MB), run=10857-10857msec

Disk stats (read/write):
sda: ios=64132/0, merge=0/0, ticks=544319/0, in_queue=544320, util=99.26%

From the output above, the sequential read speed is 791 MB/s (755 MiB/s).
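The repeated "note" lines in both logs show that the psync engine capped -iodepth 32 at 1, so the UFS device had at most numjobs = 8 commands in flight (the "IO depths : 1=100.0%" line agrees). Little's law, which relates throughput to concurrency and latency (IOPS ≈ outstanding I/Os ÷ average latency), offers a quick consistency check on that reading. The Python sketch below uses the average total latencies and IOPS copied from the two logs; littles_law_iops is a hypothetical helper name.

```python
def littles_law_iops(outstanding_ios: float, avg_latency_s: float) -> float:
    """Little's law: throughput (IOPS) = concurrency / average latency."""
    return outstanding_ios / avg_latency_s


# With the psync engine each of the 8 jobs has one I/O in flight at a time.
NUMJOBS = 8

# Average total latency ("lat avg") and reported IOPS, copied from the logs.
runs = {
    "sequential write": (23.77e-3, 332),     # lat (msec): avg=23.77
    "sequential read": (10575.10e-6, 754),   # lat (usec): avg=10575.10
}

for name, (lat_s, reported) in runs.items():
    predicted = littles_law_iops(NUMJOBS, lat_s)
    error = (predicted - reported) / reported * 100
    print(f"{name}: predicted {predicted:.0f} IOPS, reported {reported} ({error:+.1f}%)")
```

Both predictions land within about 1.5% of the reported IOPS, consistent with an effective concurrency of 8. Because the device never had more than 8 commands outstanding, these figures may understate what the UFS interface can deliver with a deeper queue; confirming that would take a run with an asynchronous engine.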

As embedded storage technology continues to develop and application scenarios grow more diverse, embedded storage has become indispensable in fields such as smart homes, in-vehicle infotainment systems, and mobile devices. Going forward, eMMC and UFS will each play an irreplaceable role in different application areas thanks to their respective characteristics.
 