Memory copy without cache involved

icoming

Distinguished
Jan 25, 2011
7
0
18,510
0
Hello,

I'm looking for a way to copy memory without involving the cache. I want to copy data from one location in physical memory to another, i.e., I want a real memory copy rather than a virtual memory mapping. Do Intel processors provide instructions for that? In some cases a program just needs to copy memory and the data won't be used again soon, so there is no reason to bring it into the cache and pollute it.

Thanks,
Da
 


icoming

I agree that this is architecture-specific. That's why I'm asking whether Intel processors provide instructions for it. Quite likely only some Intel processors can do it.
 

Ijack

Yes. Just set up entries in the Page Tables pointing to the locations in question and then use any of the MOVS instructions to do the move. Of course, you'll have to be running the processor at the highest privilege level - i.e. kernel code.
 

icoming

Do I need to set up the entries in the page table myself? Isn't that done automatically when the memory is allocated and accessed?
Why does the processor need to be at the highest privilege level?
 

Ijack

Because you are asking to move memory between physical memory locations rather than virtual ones. Only the kernel of the Operating System, running at the highest level, can access physical memory directly. User programs can only access it indirectly via the paging mechanism - that's what virtual memory mapping is.

To access the physical memory directly you need the appropriate entries in the Page Table pointing to that memory. Without manipulating the Page Tables you don't know what physical memory a virtual memory address is pointing at. Obviously only privileged code is allowed to manipulate those tables. If user programs could manipulate physical memory directly it would be possible to bypass security restrictions and it would make the Operating System unstable.
 

icoming

Oh, sorry, I didn't make my question clear. What I meant is that I need a real memory copy rather than mapping two virtual addresses to the same physical memory. I don't need to address any specific physical memory locations. The memory is still addressed through virtual addresses.
 

icoming

What I really care about is how to copy data without loading it into the cache. I don't want the cache to be polluted.

MOVS instructions will load data into the cache, right? I'm currently using memcpy to copy memory, and it seems memcpy in the GNU C library for x64 does use MOVS to copy data. VTune shows me that data is loaded into the cache when memcpy is used.
 

Ijack

Ah, that's a different problem. You can use the MOVNTI instruction in conjunction with the SFENCE instruction to avoid polluting the data cache. But you have to balance that with the fact that these instructions are not as efficient as the MOVS ones. It would be an interesting experiment but I would guess that this inefficiency would outweigh any savings made by not having to reload data into the data cache. Bear in mind that the various pipelines in the processor operate in parallel, so the penalty of having to reload the cache may not be as significant as you think.

I'm fairly sure that the guys who wrote the GNU Standard C library have thought of these things. But only benchmarking tests could tell. Try writing a routine using these instructions to see if you can significantly improve on the library routine.
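
For anyone who wants to run that experiment, here is a minimal sketch in C of a copy routine built on MOVNTI and SFENCE, using the compiler intrinsics `_mm_stream_si64` and `_mm_sfence`. The name `copy_nontemporal` is a hypothetical helper made up for illustration, not a library function, and the byte-wise head and memcpy tail are just one simple way to handle unaligned ends. Note that the non-temporal store only keeps the destination out of the cache; the loads from the source still go through the cache as usual. On anything other than x86-64 the sketch simply falls back to memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(_M_X64)
#include <emmintrin.h>   /* _mm_stream_si64 (MOVNTI), _mm_sfence (SFENCE) */
#define HAVE_MOVNTI 1
#else
#define HAVE_MOVNTI 0
#endif

/* Hypothetical helper: copy n bytes from src to dst (regions must not
 * overlap), writing the bulk of dst with non-temporal stores. */
void *copy_nontemporal(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));

#if HAVE_MOVNTI
    /* Copy leading bytes normally until dst is 8-byte aligned. */
    while (n > 0 && ((uintptr_t)d & 7u) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Stream 8 bytes at a time; each call compiles to one MOVNTI. */
    while (n >= 8) {
        long long word;
        memcpy(&word, s, sizeof word);          /* src may be unaligned */
        _mm_stream_si64((long long *)d, word);  /* non-temporal store */
        d += 8;
        s += 8;
        n -= 8;
    }

    /* Non-temporal stores are weakly ordered; SFENCE makes them
     * globally visible before any later stores. */
    _mm_sfence();
#endif

    /* Remaining tail bytes (or the whole copy on other architectures). */
    memcpy(d, s, n);
    return dst;
}
```

To compare it with the library routine, time both on buffers much larger than the last-level cache and also measure the code that runs right after the copy, since that is where any saving from not evicting useful data would show up.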
 

icoming

I'm working on Atom processors, which use an in-order architecture. They run at a fairly high frequency, but the memory bus is slow, so I'm thinking I may be able to improve performance by not polluting the cache.
Thanks, I'll try.
 