Birdie's Blog | Birdie Blog

TODO Enhancing LLM Alignment with Ternary Preferences

ICLR2025 Preference optimization, ties

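Theoretical Foundations of Preference Optimization. BT Model. The BT (Bradley-Terry) model is a probabilistic model for representing the outcomes of pairwise comparisons between instances, teams, or objects. It estimates the probability that the ordering relation $i \succ j$ holds, where the symbol $\succ$ denotes a preference or ordering relation, e.g., instance $i$ is preferred over $j$. The BT model is computed as follows, where the positive strengths of the two competitors are denoted $\lambda_1$ and $\lambda_2$, and $r_{1,2}$ denotes the first competitor in the compar...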
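The excerpt is cut off before the formula, so here is a minimal sketch of the standard Bradley-Terry form, assuming $P(i \succ j) = \lambda_i / (\lambda_i + \lambda_j)$ for positive strengths. It does not reproduce the post's own notation (the role of $r_{1,2}$ is not shown), and the function names `bt_prob` and `bt_prob_from_scores` are hypothetical.

```python
import math


def bt_prob(lambda_i: float, lambda_j: float) -> float:
    """Bradley-Terry probability that i is preferred over j, given positive strengths."""
    if lambda_i <= 0 or lambda_j <= 0:
        raise ValueError("strengths must be positive")
    return lambda_i / (lambda_i + lambda_j)


def bt_prob_from_scores(s_i: float, s_j: float) -> float:
    """Same probability with log-strengths s = log(lambda), i.e. sigmoid(s_i - s_j).

    Branching on the sign of the difference keeps math.exp from overflowing.
    """
    d = s_i - s_j
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    z = math.exp(d)
    return z / (1.0 + z)


if __name__ == "__main__":
    # Strength 3 vs. strength 1: i wins 3 out of 4 times under BT.
    print(bt_prob(3.0, 1.0))
    print(bt_prob_from_scores(math.log(3.0), 0.0))
```

Setting $\lambda = e^{s}$ turns $\lambda_i / (\lambda_i + \lambda_j)$ into the logistic function of the score difference $s_i - s_j$, which is the form that DPO-style preference losses build on.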

Posted by Birdie on May 30, 2025

GLOP Learning Global Partition and Local Construction for Solving Large-scale Routing Problems in Real-time

AAAI2024 Divide and conquer: turning large-scale/multi-task problems into multiple open TSPs

GLOP: Learning Global Partition and Local Construction for Solving Large-scale Routing Problems in Real-time AAAI2024 Code: https://github.com/henry-yeh/GLOP Abstract: Recent end-to-end neural solvers have shown promise on small-scale routing problems, but in real...

Posted by Birdie on April 30, 2025

PolyNet Learning Diverse Solution Strategies for Neural Combinatorial Optimization

ICLR2025 Expanding the search space

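PolyNet: Learning Diverse Solution Strategies for Neural Combinatorial Optimization ICLR2025 Bielefeld University, Germany Abstract: Reinforcement learning-based methods for constructing solutions to combinatorial optimization problems are rapidly approaching the performance of human-designed algorithms. To further narrow this gap, learning-based methods must, during the search pro...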

Posted by Birdie on April 29, 2025

Rl4co an extensive reinforcement learning for combinatorial optimization benchmark

ICLR2024 A general, comprehensive RL4CO library

Rl4co: an extensive reinforcement learning for combinatorial optimization benchmark. A library of algorithms for solving CO problems with RL. Code: ai4co/rl4co: A PyTorch library for all things Reinforcement Learning (RL) for Combinatoria...

Posted by Birdie on April 28, 2025

INViT A Generalizable Routing Problem Solver with Invariant Nested View Transformer

ICML2024 Large-scale + multiple distributions

INViT: A Generalizable Routing Problem Solver with Invariant Nested View Transformer. University of Michigan-Shanghai Jiao Tong University Joint Institute, Duke Kunshan University. ICML2024 Code: Kasumigaoka-Utaha/INViT: Official Implementation of the paper: INViT: A Gen...

Posted by Birdie on April 28, 2025

Distilling Autoregressive Models to Obtain High-Performance Non-Autoregressive Solvers for Vehicle Routing Problems with Faster Inference Speed

AAAI2024 Guided knowledge distillation, non-autoregressive models

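Distilling Autoregressive Models to Obtain High-Performance Non-Autoregressive Solvers for Vehicle Routing Problems with Faster Inference Speed AAAI2024 TL;DR: just look at the figure in the methodology section. The paper performs knowledge distillation guided from an autoregressive model to a non-autoregressive one. Abs...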

Posted by Birdie on April 28, 2025

Learning What to Defer for Maximum Independent Sets

ICML2020 Deferred decisions, encouraging solution diversity

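Learning What to Defer for Maximum Independent Sets ICML2020 KAIST Abstract: Designing efficient algorithms for combinatorial optimization is ubiquitous across scientific fields. Recently, deep reinforcement learning (DRL) frameworks have drawn considerable attention as a new approach: they can automate solver design while relying less on expert knowledge of the target problem. However, existing DRL solvers determine the solution using a number of stages proportional to the number of elements in the solution,...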

Posted by Birdie on April 27, 2025

Memory-Enhanced Neural Solvers for Efficient Adaptation in Combinatorial Optimization

Arxiv2024.10 Experience pool

Memory-Enhanced Neural Solvers for Efficient Adaptation in Combinatorial Optimization Arxiv 2024.10.7 InstaDeep Code: instadeepai/memento: Official Implementation of Memento Abstract: Combinatorial optimization is crucial for many real-world applications...

Posted by Birdie on April 25, 2025

Enhancing LLM Safety via Constrained Direct Preference Optimization

ICLR2024 Constrained direct preference optimization

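Enhancing LLM Safety via Constrained Direct Preference Optimization. Tulane University, ICLR 2024 Abstract: As the capabilities of large language models (LLMs) rapidly improve, there is an urgent need to align AI systems with diverse human preferences so as to enhance both their helpfulness and their safety. Although these goals often conflict, achieving alignment is crucial. To address this challenge, one promising approach is, during the fine-tuning stage...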

Posted by Birdie on April 9, 2025

A Machine Learning Approach to Solve the E-commerce Box-Sizing Problem

Bin packing: designing optimal box sizes

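A Machine Learning Approach to Solve the E-commerce Box-Sizing Problem. From: India. Published in: Production and Operations Management. Submitted: 2023.11; accepted: 2024.8. Research background and problem: how to design a set of optimal box sizes for the large number of products (SKUs) on an e-commerce platform so as to maximize space utilization...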

Posted by Birdie on March 9, 2025
