Design an edge accelerator for GenAI, based on https://dl.acm.org/doi/full/10.1145/3768168.

- Evaluate PPA (power, performance, area) from RTL at N7 for all necessary building blocks.
- Evaluate and benchmark different memory options, including various eDRAM and eNVM types.
- For eDRAM, quantify the refresh overhead of each option.
- Evaluate 3D array design options, considering the integration scheme (bonding, monolithic, hybrid) and the granularity of the 3D connections.
- Evaluate the benefits of 3D interconnect pitch scaling.
- Analyze the viability of the low-power memory access schemes at extreme bandwidth proposed in the reference, and offer solutions tailored to specific memory devices and arrays.
- Quantify the differences between the new design and the analytical estimates in the reference.
- Include workload-level energy and latency benchmarks for the technology options considered.
- Show the pathway for scaling up to larger workloads, either through larger monolithic dies or through 2.5D chiplet integration.
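
One possible starting point for the eDRAM refresh-overhead requirement is a first-order model in which every bit is rewritten once per retention period. The sketch below is only an illustration under that assumption: the `EdramOption` fields, the function names, and the numbers in the usage example are hypothetical placeholders, not values from the reference or from any measured device, and the model ignores temperature dependence, retention-time distributions, and refresh-skipping schemes.

```python
from dataclasses import dataclass


@dataclass
class EdramOption:
    """Hypothetical eDRAM option; every value supplied here is a placeholder."""
    name: str
    capacity_bits: int               # total array capacity in bits
    retention_s: float               # worst-case retention time in seconds
    row_bits: int                    # bits refreshed by one row operation
    refresh_energy_per_bit_j: float  # energy to refresh one bit, in joules
    row_refresh_time_s: float        # time a bank is busy per row refresh
    banks: int                       # banks that can refresh independently


def refresh_power_w(opt: EdramOption) -> float:
    """Average refresh power, assuming each bit is rewritten once per retention period."""
    return opt.capacity_bits * opt.refresh_energy_per_bit_j / opt.retention_s


def refresh_busy_fraction(opt: EdramOption) -> float:
    """Fraction of time each bank is unavailable to normal accesses due to refresh."""
    rows_per_bank = opt.capacity_bits / (opt.row_bits * opt.banks)
    return min(1.0, rows_per_bank * opt.row_refresh_time_s / opt.retention_s)


if __name__ == "__main__":
    # Placeholder values for illustration only; replace with characterized data.
    options = [
        EdramOption("option_a", 64 * 2**20, 100e-6, 1024, 5e-15, 2e-9, 16),
        EdramOption("option_b", 64 * 2**20, 10e-3, 1024, 20e-15, 5e-9, 16),
    ]
    for opt in options:
        print(opt.name, refresh_power_w(opt), refresh_busy_fraction(opt))
```

Comparing options then reduces to sweeping retention time, bank count, and per-bit refresh energy, with the RTL and circuit-level results supplying the actual inputs.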