The Illusion of Security: How Data Duplication Skews Secret Detection

A new analysis reveals that commonly used datasets for evaluating secret detection models are riddled with duplicated data, leading to inflated performance scores and a false sense of security.
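
A small check makes the duplication problem concrete. The sketch below is a minimal illustration, not the method used in the analysis. It assumes each dataset sample is a string and treats two samples as duplicates when they are identical after whitespace normalization. The function names (`fingerprint`, `dedupe`, `duplicate_fraction`, `leakage_rate`) are hypothetical. It reports two things: how much of a split repeats itself, and how much of a test split also appears in the training split. The second kind of overlap lets a model score well by memorizing samples instead of generalizing.

```python
import hashlib
import re


def normalize(sample: str) -> str:
    # Assumption: copies that differ only in whitespace count as duplicates.
    return re.sub(r"\s+", " ", sample.strip())


def fingerprint(sample: str) -> str:
    # Hash the normalized text so samples can be compared cheaply via sets.
    return hashlib.sha256(normalize(sample).encode("utf-8")).hexdigest()


def dedupe(samples: list[str]) -> list[str]:
    # Keep the first occurrence of each fingerprint, preserving order.
    seen: set[str] = set()
    unique: list[str] = []
    for s in samples:
        fp = fingerprint(s)
        if fp not in seen:
            seen.add(fp)
            unique.append(s)
    return unique


def duplicate_fraction(samples: list[str]) -> float:
    # Share of samples that repeat an earlier sample within the same split.
    if not samples:
        return 0.0
    return 1 - len(dedupe(samples)) / len(samples)


def leakage_rate(train: list[str], test: list[str]) -> float:
    # Share of test samples whose fingerprint also occurs in the training split.
    if not test:
        return 0.0
    train_fps = {fingerprint(s) for s in train}
    leaked = sum(1 for s in test if fingerprint(s) in train_fps)
    return leaked / len(test)


if __name__ == "__main__":
    train = ['API_KEY = "abc123"', "token: xyz"]
    test = ['API_KEY  =  "abc123"', "password = hunter2"]
    print(leakage_rate(train, test))  # 0.5: one of two test samples is a near-copy
```

Deduplicating before splitting, or dropping leaked test samples afterwards, yields a score that better reflects performance on unseen secrets. Exact matching after normalization is the simplest choice. Near-duplicate detection would need a fuzzier comparison.
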
![A causal Wiener filter, implemented via a spectral transformation with parameters [latex]\alpha = 0[/latex], [latex]\beta = 0.9[/latex], and [latex]\omega_0 = 5[/latex] rad/s, estimates a scale-free signal [latex]S_{xx} = A\gamma^{2}/((|\omega|-\omega_{c})^{2}+\gamma^{2})[/latex], where [latex]\gamma = 2\pi[/latex] rad/s, [latex]A = 0.9[/latex], and [latex]\omega_c = 10 \cdot 2\pi[/latex] rad/s, from noisy measurements whose noise power spectral density is [latex]S_{nn} = 5/\omega^{1.8} + 0.01[/latex]. It achieves performance comparable to a non-causal Wiener filter; the relative-error power spectral densities are estimated with Welch’s method and averaged into approximately 250 logarithmically spaced bins.](https://arxiv.org/html/2601.22294v1/x1.png)
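
The caption's spectra can be evaluated directly. Assuming signal and noise are uncorrelated, the non-causal Wiener filter has frequency response [latex]H(\omega) = S_{xx}/(S_{xx} + S_{nn})[/latex]. The minimal sketch below plugs the caption's parameters into that formula. It does not implement the causal filter or the spectral transformation with [latex]\alpha[/latex], [latex]\beta[/latex], and [latex]\omega_0[/latex], because the caption does not give that transformation's form. The function names are illustrative.

```python
import math

# Parameters taken from the figure caption.
GAMMA = 2 * math.pi          # rad/s
A = 0.9
OMEGA_C = 10 * 2 * math.pi   # rad/s


def s_xx(omega: float) -> float:
    # Signal PSD: a Lorentzian peak of height A centred at |omega| = OMEGA_C.
    return A * GAMMA**2 / ((abs(omega) - OMEGA_C) ** 2 + GAMMA**2)


def s_nn(omega: float) -> float:
    # Noise PSD: power-law term plus a white floor; diverges as omega -> 0.
    return 5 / abs(omega) ** 1.8 + 0.01


def noncausal_wiener_gain(omega: float) -> float:
    # H(omega) = S_xx / (S_xx + S_nn), assuming uncorrelated signal and noise.
    sx = s_xx(omega)
    return sx / (sx + s_nn(omega))


if __name__ == "__main__":
    for w in (1.0, 10.0, OMEGA_C, 200.0):
        print(f"omega = {w:8.2f} rad/s  H = {noncausal_wiener_gain(w):.4f}")
```

The gain approaches 1 near the signal peak at [latex]\omega_c[/latex], where the signal dominates the noise. It falls toward 0 at low frequencies, where the [latex]1/\omega^{1.8}[/latex] noise term dominates.
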