Memory Hierarchy and Access Time - Sand, Software and Sound
This web page takes a closer look at the Raspberry Pi memory hierarchy. Every level of the memory hierarchy has a capacity and a speed. Capacities are comparatively easy to find by querying the operating system or reading the ARM1176 technical reference manual. Speed, however, is not as easy to find and usually must be measured. I use a simple pointer chasing technique (sketched in C below) to characterize the behavior of each level in the hierarchy. The technique also reveals the behavior of memory-related performance counter events at each level.

The Raspberry Pi implements five levels in its memory hierarchy. The levels are summarized in the table below. The highest level consists of virtual memory pages that are maintained in secondary storage. Raspbian Wheezy keeps its swap space in the file /var/swap on the SDHC card. This is enough space for 25,600 4KB pages. You're allowed as many pages as will fit into the preallocated swap space.
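Here is a minimal sketch of the pointer chasing idea, written in C. The names, buffer size and stride are illustrative rather than the exact benchmark program. Each element of the buffer holds the address of the next element to visit, so every load depends on the result of the previous load and the chase exposes the access time of whichever level of the hierarchy holds the working set.

#include <stdlib.h>

/* One chain element per cache line visited; the only payload is the
 * pointer to the next element. */
typedef struct chain {
    struct chain *next;
} chain_t;

/* Build a circular chain through a buffer of 'bytes' bytes, linking
 * elements 'stride' bytes apart. The stride should be a multiple of
 * the 32-byte cache line size so each element touches a new line.
 * Returns NULL on failure. */
static chain_t *build_chain(size_t bytes, size_t stride)
{
    if (stride < sizeof(chain_t) || bytes < stride)
        return NULL;

    size_t count = bytes / stride;
    char *buffer = malloc(bytes);
    if (buffer == NULL)
        return NULL;

    for (size_t i = 0; i < count; i++) {
        chain_t *current = (chain_t *)(buffer + i * stride);
        chain_t *next    = (chain_t *)(buffer + ((i + 1) % count) * stride);
        current->next = next;
    }
    return (chain_t *)buffer;
}

Chasing chains built over progressively larger buffers shows where each level of the hierarchy runs out of capacity, because the average time per load jumps whenever the working set spills into the next, slower level.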
The Raspberry Pi has either 256MB (Model A) or 512MB (Model B) of primary memory. This is enough space for 65,536 or 131,072 physical pages, respectively, if all of primary memory were available for paging. It isn't all available to user-space applications because the Linux kernel needs space for its own code and data. Linux also supports huge pages, but that's a separate topic for now. The vmstat command displays information about virtual memory usage; please refer to the man page for details. vmstat is a good tool for troubleshooting paging-related performance issues since it shows page in and page out statistics.
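The capacities quoted above can also be confirmed by asking the operating system. The short program below is my own illustration, not part of the original measurements; _SC_PHYS_PAGES and _SC_AVPHYS_PAGES are glibc extensions to sysconf(), but they are available on Raspbian.

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long page_size   = sysconf(_SC_PAGESIZE);     /* 4096 bytes on Raspbian        */
    long phys_pages  = sysconf(_SC_PHYS_PAGES);   /* physical pages seen by Linux  */
    long avail_pages = sysconf(_SC_AVPHYS_PAGES); /* pages currently available     */

    printf("page size       : %ld bytes\n", page_size);
    printf("physical pages  : %ld (%ld MB)\n",
           phys_pages, (phys_pages * page_size) / (1024 * 1024));
    printf("available pages : %ld\n", avail_pages);
    return 0;
}

On a Model B the reported page count comes out somewhat below 131,072 because the GPU reserves part of the 512MB at boot.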
The processor in the Raspberry Pi is the Broadcom BCM2835. The BCM2835 does have a unified level 2 (L2) cache, but the L2 cache is dedicated to the VideoCore GPU and memory references from the CPU side are routed around it. The BCM2835 has two level 1 (L1) caches: a 16KB instruction cache and a 16KB data cache. The analysis below concentrates on the data cache.

The data cache is 4-way set associative. Each way in an associative set stores a 32-byte cache line. The cache can handle up to four active references to the same set without conflict. If all four ways in a set are valid and a fifth reference is made to the set, a conflict occurs and one of the four ways is victimized to make room for the new reference. The data cache is virtually indexed and physically tagged. Cache lines and tags are stored separately in DATARAM and TAGRAM, respectively. Virtual address bits 11:5 index the TAGRAM and DATARAM, and virtual address bits 4:0 are the offset into the cache line. Given a 16KB capacity, 32-byte lines and 4 ways, there must be 128 sets.
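To make the indexing concrete, here is a small illustration (mine, derived from the geometry just described) that splits a virtual address into its set index and line offset. Two addresses that differ only above bit 11, for example addresses exactly 4KB apart, select the same set and must share its four ways.

#include <stdint.h>
#include <stdio.h>

/* Split a virtual address the way the 16KB, 4-way, 32-byte-line data
 * cache does: bits 4:0 select the byte within the line and bits 11:5
 * select one of the 128 sets. */
static void decompose(uintptr_t vaddr)
{
    uintptr_t offset = vaddr & 0x1F;         /* bits 4:0  */
    uintptr_t set    = (vaddr >> 5) & 0x7F;  /* bits 11:5 */

    printf("address 0x%08lx -> set %3lu, offset %2lu\n",
           (unsigned long)vaddr, (unsigned long)set, (unsigned long)offset);
}

int main(void)
{
    decompose(0x00012345);  /* set 26, offset 5               */
    decompose(0x00013345);  /* 4KB higher: same set, offset 5 */
    return 0;
}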
The data MicroTLB translates a virtual address to a physical address and sends the physical address to the L1 data cache. The L1 data cache compares the physical address with the tag and determines hit/miss status and the correct way. The load-to-use latency is three (3) cycles for an L1 data cache hit.

The BCM2835 implements a two-level translation lookaside buffer (TLB) structure for virtual to physical address translation. There are two MicroTLBs: a 10-entry data MicroTLB and a 10-entry instruction MicroTLB. The MicroTLBs are backed by the main TLB, the second-level TLB. The MicroTLBs are fully associative. Each MicroTLB translates a virtual address to a physical address in a single cycle when the page mapping information is resident in the MicroTLB (that is, a hit in the MicroTLB). The main TLB is a unified TLB that handles misses from the instruction and data MicroTLBs. It is a 64-entry, 2-way set associative structure. Main TLB misses are handled by a hardware page table walker. A page table walk requires at least one additional memory access to find the page mapping information in main memory.
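Putting the pieces together, the access time at each level can be estimated by walking a chain like the one sketched earlier and dividing the elapsed time by the number of loads. The helper below is again an illustration; it assumes the chain_t type and build_chain() sketch from above and uses the standard clock_gettime() interface. When the working set fits in the MicroTLB and the L1 data cache, the average cost approaches the 3-cycle hit latency; once it grows past the TLB reach, each miss adds a page table walk and at least one extra memory access.

#include <stdio.h>
#include <time.h>

/* Walk the circular chain 'loads' times and return the average time per
 * dependent load in nanoseconds. The volatile qualifier keeps the
 * compiler from optimizing the dependent loads away. */
static double chase(chain_t *head, long loads)
{
    struct timespec start, end;
    volatile chain_t *p = head;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < loads; i++)
        p = p->next;
    clock_gettime(CLOCK_MONOTONIC, &end);

    double ns = (double)(end.tv_sec - start.tv_sec) * 1e9
              + (double)(end.tv_nsec - start.tv_nsec);
    return ns / (double)loads;
}

At the Raspberry Pi's default 700MHz CPU clock, one cycle is roughly 1.4ns, which makes it straightforward to convert the measured nanoseconds per load into cycles.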