DeepSeek-affiliated Hangzhou DeepSeek AI Fundamental Technology Research Co., Ltd. today filed a patent for a new web data collection system designed to improve efficiency and data quality. The patent outlines a method for discovering more webpage links while minimizing the traffic load placed on websites. It assesses downloaded content to predict the quality of links that have not yet been discovered, prioritizing high-value data and reducing redundant downloads. Efficient web data collection is crucial for training large language models (LLMs), which power AI systems like ChatGPT. Existing techniques struggle with incomplete link retrieval, excessive downloads that can crash websites, and poor filtering of low-quality data. DeepSeek’s proposed system aims to solve these issues by optimizing data allocation and maintaining metadata accuracy. [iThome, in Chinese]