William Overman

Improved Regret Bound for Safe Reinforcement Learning via Tighter Cost Pessimism and Reward Optimism

Oct 4, 2024 · Kihyun Yu, Duksang Lee, William Overman, Dabeen Lee
Type: Manuscript
Last updated on Oct 4, 2024


© 2025 Me. This work is licensed under CC BY NC ND 4.0
