linux

mirror of https://github.com/torvalds/linux.git synced 2026-05-30 00:29:35 +08:00

Author	SHA1	Message	Date
Jakub Kicinski	67cfdd9210	ethtool: eeprom: add more safeties to EEPROM Netlink fallback The Netlink fallback path for reading module EEPROM (fallback_set_params()) validates that offset < eeprom_len, but does not check that offset + length stays within eeprom_len. The ioctl equivalent (ethtool_get_any_eeprom() in ioctl.c) has always enforced both bounds: if (eeprom.offset + eeprom.len > total_len) return -EINVAL; This could lead to surprises in both drivers and device FW. Add the missing offset + length validation to fallback_set_params(), mirroring the ioctl. Similarly - ethtool core in general, and ethtool_get_any_eeprom() in particular tries to zero-init all buffers passed to the drivers to avoid any extra work of zeroing things out. eeprom_fallback() uses a plain kmalloc(), change it to zalloc. Fixes: `96d971e307` ("ethtool: Add fallback to get_module_eeprom from netlink command") Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20260526153533.2779187-11-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-27 17:42:09 -07:00
Jakub Kicinski	2376586f85	ethtool: eeprom: add missing ethnl_ops_begin() / _complete() during fallback All ethtool driver op calls should be sandwiched between ethnl_ops_begin() / ethnl_ops_complete(). In Netlink eeprom code, if the paged access failed we fall back to old API, but we first call _complete() and the fallback never does its own ethnl_ops_begin(). Move the fallback into the _begin() / _complete() section. Fixes: `96d971e307` ("ethtool: Add fallback to get_module_eeprom from netlink command") Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20260526153533.2779187-10-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-27 17:42:09 -07:00
Jakub Kicinski	a8d8bef6b4	ethtool: strset: fix header attribute index in ethnl_req_get_phydev() strset_prepare_data() passes ETHTOOL_A_HEADER_FLAGS (3) as the header attribute to ethnl_req_get_phydev(). This is incorrect, in the main attr space 3 is ETHTOOL_A_STRSET_COUNTS_ONLY, not the request header attr. The correct constant is ETHTOOL_A_STRSET_HEADER (1). ethnl_req_get_phydev() only uses this value for the extack, so this is not a "functionally visible"(?) bug. Fixes: `e96c93aa4b` ("net: ethtool: strset: Allow querying phy stats by index") Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20260526153533.2779187-9-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-27 17:42:09 -07:00
Jakub Kicinski	c3fc9976f6	ethtool: tsinfo: don't pass ERR_PTR to genlmsg_cancel on prepare failure The goto err label leads to: genlmsg_cancel(skb, ehdr); return ret; If ethnl_tsinfo_prepare_dump() failed, it has not started a genlmsg. There's nothing to cancel, and passing an error pointer to genlmsg_cancel() would cause a crash. Fixes: `b9e3f7dc9e` ("net: ethtool: tsinfo: Enhance tsinfo to support several hwtstamp by net topology") Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://patch.msgid.link/20260526153533.2779187-8-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-27 17:42:08 -07:00
Jakub Kicinski	1de405699c	ethtool: tsinfo: fix uninitialized stats on the by-PHC path tsinfo_prepare_data() has two code paths: a "by-PHC" path for user-specified hardware timestamping providers, and the old path. Commit `89e281ebff` ("ethtool: init tsinfo stats if requested") added ethtool_stats_init() to mark stat slots as ETHTOOL_STAT_NOT_SET before the driver callback populates them, but placed the call inside the old-path block. When commit `b9e3f7dc9e` ("net: ethtool: tsinfo: Enhance tsinfo to support several hwtstamp by net topology") added the by-PHC early return, it landed above the stats initialization. On that path the stats array retains the zero-fill from ethnl_init_reply_data()'s zalloc. This leads to the reply including a stats nest with four zero-valued attributes that should have been absent. Reject GET requests for stats with HWTSTAMP_PROVIDER or dump. Fixes: `b9e3f7dc9e` ("net: ethtool: tsinfo: Enhance tsinfo to support several hwtstamp by net topology") Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20260526153533.2779187-7-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-27 17:42:08 -07:00
Jakub Kicinski	6386bd772d	ethtool: tsconfig: fix missing ethnl_ops_complete() tsconfig_prepare_data() calls ethnl_ops_begin(), we need to call ethnl_ops_complete() before returning the error. Fixes: `6e9e2eed4f` ("net: ethtool: Add support for tsconfig command to get/set hwtstamp config") Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://patch.msgid.link/20260526153533.2779187-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-27 17:42:08 -07:00
Jakub Kicinski	ab5bf428fb	ethtool: pse-pd: fix missing ethnl_ops_complete() pse_prepare_data() is missing ethnl_ops_complete() if ethnl_req_get_phydev() returned an error. Move getting phydev up so that we don't have to worry about this (similar order to linkstate_prepare_data()). Note that phydev may still be NULL (this is checked in pse_get_pse_attributes()), the goal isn't really to avoid the _begin() / _complete() calls, only to simplify the error handling. While at it propagate the original error. Why this code overrides the error with -ENODEV but !phydev generates -EOPNOTSUPP is unclear to me... Fixes: `31748765be` ("net: ethtool: pse-pd: Target the command to the requested PHY") Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20260526153533.2779187-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-27 17:42:08 -07:00
Jakub Kicinski	596c51ed9e	ethtool: linkstate: fix unbalanced ethnl_ops_complete() on PHY lookup error linkstate_prepare_data() calls ethnl_req_get_phydev() before ethnl_ops_begin(), but routes its error path through "goto out" which calls ethnl_ops_complete(). Fixes: `fe55b1d401` ("ethtool: linkstate: migrate linkstate functions to support multi-PHY setups") Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20260526153533.2779187-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-27 17:42:08 -07:00
Jakub Kicinski	a888bbd439	ethtool: tsconfig: fix reply error handling A couple of trivial bugs in error handling in tsconfig_send_reply(). If we failed to allocate rskb we need to set the error. If we did allocate it but failed to send it - we need to remember to free it. Fixes: `6e9e2eed4f` ("net: ethtool: Add support for tsconfig command to get/set hwtstamp config") Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://patch.msgid.link/20260526153533.2779187-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-27 17:42:08 -07:00
Jakub Kicinski	7281b096b0	ethtool: coalesce: cap profile updates at NET_DIM_PARAMS_NUM_PROFILES ethnl_update_profile() walks the ETHTOOL_A_PROFILE_IRQ_MODERATION nest list with an index 'i' and writes new_profile[i++] without bounding i. The destination is kmemdup()'d at NET_DIM_PARAMS_NUM_PROFILES entries (5), but the Netlink nest count is entirely user-controlled. Netlink policies do not have support for constraining the number of nested entries (or number of multi-attr entries). Fixes: `f750dfe825` ("ethtool: provide customized dim profile management") Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20260526153533.2779187-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-27 17:42:07 -07:00
Jakub Kicinski	d5551f4c18	ethtool: cmis: validate fw->size against start_cmd_payload_size cmis_fw_update_start_download() copies start_cmd_payload_size bytes from the firmware blob into the CDB LPL vendor_data[] payload without validating that the FW has enough data. Since the start_cmd_payload_size can only be ~120B an image too short is most likely corrupted, so reject it. Fixes: `c4f78134d4` ("ethtool: cmis_fw_update: add a layer for supporting firmware update using CDB") Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://patch.msgid.link/20260522231312.1710836-10-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-26 08:19:33 -07:00
Jakub Kicinski	12c2496a71	ethtool: cmis: validate start_cmd_payload_size from module The CMIS firmware update code reads start_cmd_payload_size from the module's FW Management Features CDB reply and uses it directly as the byte count for memcpy. The destination buffer is 112 bytes (ETHTOOL_CMIS_CDB_LPL_MAX_PL_LENGTH - 8). So a malicious module (or corrupted response) can cause a OOB write later on in cmis_fw_update_start_download(). Let's error out. If modules that expect longer LPL writes actually exist we should revisit. struct cmis_cdb_start_fw_download_pl's definition has to move, no change there. Fixes: `c4f78134d4` ("ethtool: cmis_fw_update: add a layer for supporting firmware update using CDB") Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://patch.msgid.link/20260522231312.1710836-9-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-26 08:19:33 -07:00
Jakub Kicinski	3e8c3d464c	ethtool: cmis: fix u16-to-u8 truncation of msleep_pre_rpl ethtool_cmis_cdb_compose_args() accepts msleep_pre_rpl as u16 but stores it into the u8 field ethtool_cmis_cdb_cmd_args::msleep_pre_rpl, silently truncating values >= 256. Seven of the nine call sites pass 1000 ms (it's the third argument from the end). Fixes: `a39c84d796` ("ethtool: cmis_cdb: Add a layer for supporting CDB commands") Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://patch.msgid.link/20260522231312.1710836-8-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-26 08:19:33 -07:00
Jakub Kicinski	6c3f999a9d	ethtool: cmis: require exact CDB reply length Malicious SFP module could respond with rpl_len longer than what cmis_cdb_process_reply() expected, leading to OOB writes. Malicious HW is a bit theoretical but some modules may just be buggy and/or the reads may occasionally get corrupted, so let's protect the kernel. The existing check protects from short replies. We need to protect from long ones, too. All callers that pass a non-zero rpl_exp_len cast the reply payload to a fixed-layout struct and read fields at fixed offsets, with no version negotiation or short-reply handling: - cmis_cdb_validate_password() - cmis_cdb_module_features_get() - cmis_fw_update_fw_mng_features_get() so let's assume that responses longer than expected do not have to be handled gracefully here. Add a warning message to make the debug easier in case my understanding is wrong... Note that page_data->length (argument of kmalloc) comes from last arg to ethtool_cmis_page_init() which is rpl_exp_len. Note2 that AIs also like to point out overflows in args->req.payload itself (which is a fixed-size 120 B buffer, on the stack), but callers should be reading structs defined by the standard, so protecting from requests for more data than max seem like defensive programming. Fixes: `a39c84d796` ("ethtool: cmis_cdb: Add a layer for supporting CDB commands") Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://patch.msgid.link/20260522231312.1710836-7-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-26 08:19:33 -07:00
Jakub Kicinski	760d04ebad	ethtool: module: fix cleanup if socket used for flashing multiple devices When a single Netlink socket issues MODULE_FW_FLASH_ACT against multiple devices, ethnl_sock_priv_set() overwrites sk_priv->dev on each call, retaining only the last one. The socket priv is used on socket close, to walk the global work list and mark the uncompleted flashing work as "orphaned". Otherwise if another socket reuses the PID it will unexpectedly receive the flashing notifications. Don't record the device, record net pointer instead. The purpose of the dev is to scope the work to a netns, anyway. If we store netns the overrides are safe/a nop since all flashed devices must be in the same netns as the socket. Fixes: `32b4c8b53e` ("ethtool: Add ability to flash transceiver modules' firmware") Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://patch.msgid.link/20260522231312.1710836-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-26 08:19:32 -07:00
Jakub Kicinski	504eaefa44	ethtool: module: check fw_flash_in_progress under rtnl_lock ethnl_set_module_validate() inspects module_fw_flash_in_progress but validate is meant for _input_ validation, not state validation. rtnl_lock is not held, yet. Move the check into ethnl_set_module(). Fixes: `32b4c8b53e` ("ethtool: Add ability to flash transceiver modules' firmware") Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://patch.msgid.link/20260522231312.1710836-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-26 08:19:32 -07:00
Jakub Kicinski	7a84b965ff	ethtool: module: avoid racy updates to dev->ethtool bitfield When reviewing other changes Gemini points out that we currently update module_fw_flash_in_progress without holding any locks. Since module_fw_flash_in_progress is part of a bitfield this is not great, updates to other fields may be lost. We could use a bool and sprinkle some READ_ONCE/WRITE_ONCE here but seems like the issue is rather than the work is an unusual writer. The other writers already hold the right locks. So just very briefly take these locks when the work completes. Note that nothing ever cancels the FW update work, so there's no concern with deadlocks vs cancel. Fixes: `32b4c8b53e` ("ethtool: Add ability to flash transceiver modules' firmware") Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://patch.msgid.link/20260522231312.1710836-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-26 08:19:32 -07:00
Jakub Kicinski	fb7f511d62	ethtool: module: avoid leaking a netdev ref on module flash errors module_flash_fw_schedule() is missing undo for setting the "in_progress" flag and taking the netdev reference. Delay taking these, the device can't disappear while we are holding rtnl_lock. Fixes: `32b4c8b53e` ("ethtool: Add ability to flash transceiver modules' firmware") Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://patch.msgid.link/20260522231312.1710836-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-26 08:19:32 -07:00
Jakub Kicinski	84371fb584	ethtool: module: call ethnl_ops_complete() on module flash errors When validate() fails we are skipping over ethnl_ops_complete() even tho we already called ethnl_ops_begin(). Fixes: `32b4c8b53e` ("ethtool: Add ability to flash transceiver modules' firmware") Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://patch.msgid.link/20260522231312.1710836-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-26 08:19:32 -07:00
Jakub Kicinski	32a9ecde62	ethtool: rss: avoid device context leak on reply-build failure We wait with filling the reply for new RSS context creation until after the driver ->create_rxfh_context call. The driver needs to fill some of the defaults in the context. The failure of rss_fill_reply() is somewhat theoretical, but doesn't take much effort to handle it properly. Call ->remove_rxfh_context(). If the driver's remove callback fails (some implementations like sfc can return real command errors from firmware RPCs) - skip the xa_erase and kfree, leaving the context in the xarray. This matches how ethnl_rss_delete_doit() behaves. Fixes: `a166ab7816` ("ethtool: rss: support creating contexts via Netlink") Link: https://patch.msgid.link/20260522230647.1705600-7-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-26 08:17:57 -07:00
Jakub Kicinski	78ccf1a70c	ethtool: rss: fix hkey leak when indir_size is 0 rss_get_data_alloc() allocates a single buffer that backs both the indirection table and the hash key, but only assigned data->indir_table when indir_size was nonzero. The expectation was that no driver implements RSS without supporting indirection table but apparently enic does just that (it's the only such in-tree driver). enic has get_rxfh_key_size but no get_rxfh_indir_size. data->indir_table stays as NULL, hkey gets set but rss_get_data_free() kfree(data->indir_table) is a nop and the allocation leaks. Always store the allocation base in data->indir_table so the free path is unambiguous. No caller treats indir_table as a sentinel; everything keys off indir_size. Fixes: `7112a04664` ("ethtool: add netlink based get rss support") Link: https://patch.msgid.link/20260522230647.1705600-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-26 08:17:57 -07:00
Jakub Kicinski	266297692f	ethtool: rss: fix indir_table and hkey leak on get_rxfh failure rss_prepare_get() allocates the indirection table and hash key buffer via rss_get_data_alloc(), then calls ops->get_rxfh() to populate them. If get_rxfh() fails, the function returns an error without freeing the allocation. Fixes: `4f038a6a02` ("net: ethtool: Don't call .cleanup_data when prepare_data fails") Link: https://patch.msgid.link/20260522230647.1705600-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-26 08:17:56 -07:00
Jakub Kicinski	8d60141a32	ethtool: rss: fix falsely ignoring indir table updates rss_set_prep_indir() compares the new indirection table against the current one to determine whether any update is needed. The memcmp call passes data->indir_size as the length argument, but indir_size is the number of u32 entries, not the byte count. Fixes: `c0ae03588b` ("ethtool: rss: initial RSS_SET (indirection table handling)") Link: https://patch.msgid.link/20260522230647.1705600-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-26 08:17:56 -07:00
Jakub Kicinski	3e6c6e9782	ethtool: rss: add missing errno on RSS context delete Remember to set ret before jumping out if someone tries to delete a context on a device which doesn't support contexts. Fixes: `fbe09277fa` ("ethtool: rss: support removing contexts via Netlink") Link: https://patch.msgid.link/20260522230647.1705600-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-26 08:17:56 -07:00
Jakub Kicinski	c75b6f6eaa	ethtool: rss: avoid modifying the RSS context response Gemini says that we're modifying the RSS_CREATE response skb. I think it's right, the comment says that unicast() should unshare the skb but I'm not entirely sure what I meant there. netlink_trim() does a copy but only if skb is not well sized (it's at least 2x larger than necessary for the payload). Fixes: `a166ab7816` ("ethtool: rss: support creating contexts via Netlink") Link: https://patch.msgid.link/20260522230647.1705600-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-26 08:17:56 -07:00
Chenguang Zhao	3d042592eb	ethtool: fix ethnl_bitmap32_not_zero() bit interval semantics ethnl_bitmap32_not_zero() should return true if some bit in [start, end) is set: - Fix inverted memchr_inv() sense: return true when the scan finds a non-zero byte, not when the middle words are all zero. - Return false for an empty interval (end <= start). - When end is 32-bit aligned, indices in [start, end) do not include any bits from map[end_word]; return false after earlier checks found no non-zero data. Fixes: `10b518d4e6` ("ethtool: netlink bitset handling") Signed-off-by: Chenguang Zhao <zhaochenguang@kylinos.cn> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-12 18:45:13 -07:00
David Carlier	e3adf69f8e	net: ethtool: phy: avoid NULL deref when PHY driver is unbound phydev->drv can become NULL while the phy_device is still attached to its net_device, namely after the PHY driver is unbound via sysfs: echo <mdio_id> > /sys/bus/mdio_bus/drivers/<phy_drv>/unbind phy_remove() clears phydev->drv but doesn't call phy_detach(), so the phy_device stays in the link topology xarray and ethnl_req_get_phydev() still hands it back. ETHTOOL_MSG_PHY_GET then oopses on: rep_data->drvname = kstrdup(phydev->drv->name, GFP_KERNEL); drvname is already treated as optional by phy_reply_size(), phy_fill_reply() and phy_cleanup_data(), so just skip the allocation when there is no driver bound. Fixes: `9dd2ad5e92` ("net: ethtool: phy: Convert the PHY_GET command to generic phy dump") Cc: stable@vger.kernel.org # 6.13.x Signed-off-by: David Carlier <devnexen@gmail.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20260509215046.107157-1-devnexen@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-12 17:45:12 -07:00
Quan Sun	4908f1395f	net: ethtool: fix NULL pointer dereference in phy_reply_size In phy_prepare_data(), several strings such as 'name', 'drvname', 'upstream_sfp_name', and 'downstream_sfp_name' are allocated using kstrdup(). However, these allocations were not checked for failure. If kstrdup() fails for 'name', it returns NULL while the function continues. This leads to a kernel NULL pointer dereference and panic later in phy_reply_size() when it unconditionally calls strlen() on the NULL pointer. While other strings like 'upstream_sfp_name' might be checked before access in certain code paths, failing to handle these allocations consistently can lead to incomplete data reporting or hidden bugs. Fix this by adding proper NULL checks for all kstrdup() calls in phy_prepare_data() and implement a centralized error handling path using goto labels to ensure all previously allocated resources are freed on failure. Fixes: `9dd2ad5e92` ("net: ethtool: phy: Convert the PHY_GET command to generic phy dump") Signed-off-by: Quan Sun <2022090917019@std.uestc.edu.cn> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20260507131738.1173835-1-2022090917019@std.uestc.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-08 15:30:10 -07:00
Hangbin Liu	b2fb1a3363	ethtool: strset: check nla_len overflow The netlink attribute length field nla_len is a __u16, which can only represent values up to 65535 bytes. NICs with a large number of statistics strings (e.g. mlx5_core with thousands of ETH_SS_STATS entries) can produce a ETHTOOL_A_STRINGSET_STRINGS nest that exceeds this limit. When nla_nest_end() writes the actual nest size back to nla_len, the value is silently truncated. This results in a corrupted netlink message being sent to userspace: the parser reads a wrong (truncated) attribute length and misaligns all subsequent attribute boundaries, causing decode errors. Fix this by using the new helper nla_nest_end_safe and error out if the size exceeds U16_MAX. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20260408-b4-ynl_ethtool-v2-5-7623a5e8f70b@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 11:23:50 -07:00
Daniel Borkmann	22fdf28f7c	net, ethtool: Disallow leased real rxqs to be resized Similar to AF_XDP, do not allow queues in a physical netdev to be resized by ethtool -L when they are leased. Cover channel resize paths (both netlink and ioctl) to reject resizing when the queues would be affected. Given we need to have different checks for RX vs TX, detangle the code into a two-loop version rather than the range of new_combined + min(new_rx, new_tx) to old_combined + max(old_rx, old_tx). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Co-developed-by: David Wei <dw@davidwei.uk> Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20260402231031.447597-5-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-09 18:21:46 -07:00
Maxime Chevallier	10171b9383	net: ethtool: pass genl_info to the ethnl parse_request operation The .parse_request() ethnl operation extracts the relevant attributes from the netlink request to populate the private req_info. By passing genl_info as a parameter to this callback, we can use the GENL_REQ_ATTR_CHECK() macro to check for missing mandatory parameters. This macro has the advantage of returning a better error explanation through the netlink_ext_ack struct. Convert the eeprom ethnl code to this macro, as it's the only command yet that has mandatory request parameters. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20260323095833.136266-1-maxime.chevallier@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-24 20:14:29 -07:00
Björn Töpel	02bcb20083	ethtool: Add RSS indirection table resize helpers The core locks ctx->indir_size when an RSS context is created. Some NICs (e.g. bnxt) change their indirection table size based on the channel count, because the hardware table is a shared resource. This forces drivers to reject channel changes when RSS contexts exist. Add driver helpers to resize indirection tables: ethtool_rxfh_indir_can_resize() checks whether the default context indirection table can be resized. ethtool_rxfh_indir_resize() resizes the default context table in place. Folding (shrink) requires the table to be periodic at the new size; non-periodic tables are rejected. Unfolding (grow) replicates the existing pattern. Sizes must be multiples of each other. ethtool_rxfh_ctxs_can_resize() validates all non-default RSS contexts can be resized. ethtool_rxfh_ctxs_resize() applies the resize. Signed-off-by: Björn Töpel <bjorn@kernel.org> Link: https://patch.msgid.link/20260320085826.1957255-3-bjorn@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-23 17:59:54 -07:00
Björn Töpel	0475f9e779	ethtool: Track user-provided RSS indirection table size Track the number of indirection table entries the user originally provided (context 0/default as well!). Replace IFF_RXFH_CONFIGURED with rss_indir_user_size: the flag is redundant now that user_size captures the same information. Add ethtool_rxfh_indir_lost() for drivers that must reset the indirection table. Convert bnxt and mlx5 to use it. Signed-off-by: Björn Töpel <bjorn@kernel.org> Link: https://patch.msgid.link/20260320085826.1957255-2-bjorn@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-23 17:59:53 -07:00
Maxime Chevallier	82a5852595	net: ethtool: re-order local includes Most local #include in the ethtool command handling is out of order, with either : #include "netlink.h" #include "common.h" or even : #include "netlink.h" #include "common.h" #include "bitset.h" One of the reasons is because bitset.h s lacking definitions for nlattr, netlink_ext_ack, ETH_GSTRING_LEN, and types such as u32, bool, etc. Make bitset.h standalone by including <linux/ethtool.h> for ETH_GSTRING_LEN, and <linux/netlink.h> for nlattr, netlink_ext_ack and the rest. While at it, take a pass on ethnl sources to re-order the local includes : - put them after the global includes - add a newline between global and local includes - alpha-sort the local includes One notable exception is the cmis.h include, that needs definitions from module_fw.h. Keep them in this order for now. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20260319180555.1531386-1-maxime.chevallier@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-20 19:10:18 -07:00
Haiyang Zhang	dc3d720e12	net: ethtool: add ethtool COALESCE_RX_CQE_FRAMES/NSECS Add two parameters for drivers supporting Rx CQE coalescing / descriptor writeback. ETHTOOL_A_COALESCE_RX_CQE_FRAMES: Maximum number of frames that can be coalesced into a CQE or writeback. ETHTOOL_A_COALESCE_RX_CQE_NSECS: Max time in nanoseconds after the first packet arrival in a coalesced CQE or writeback to be sent. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://patch.msgid.link/20260317191826.1346111-2-haiyangz@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-18 20:01:10 -07:00
Mohsin Bashir	cc39325f92	net: ethtool: Track pause storm events With TX pause enabled, if a device is unable to pass packets up to the stack (e.g., CPU is hanged), the device can cause pause storm. Given that devices can have native support to protect the neighbor from such flooding, such events need some tracking. This support is to track TX pause storm events for better observability. Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Link: https://patch.msgid.link/20260302230149.1580195-2-mohsin.bashr@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-03-05 16:26:52 +01:00
Linus Torvalds	32a92f8c89	Convert more 'alloc_obj' cases to default GFP_KERNEL arguments This converts some of the visually simpler cases that have been split over multiple lines. I only did the ones that are easy to verify the resulting diff by having just that final GFP_KERNEL argument on the next line. Somebody should probably do a proper coccinelle script for this, but for me the trivial script actually resulted in an assertion failure in the middle of the script. I probably had made it a bit _too_ trivial. So after fighting that far a while I decided to just do some of the syntactically simpler cases with variations of the previous 'sed' scripts. The more syntactically complex multi-line cases would mostly really want whitespace cleanup anyway. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 20:03:00 -08:00
Linus Torvalds	323bbfcf1e	Convert 'alloc_flex' family to use the new default GFP_KERNEL argument This is the exact same thing as the 'alloc_obj()' version, only much smaller because there are a lot fewer users of the alloc_flex() interface. As with alloc_obj() version, this was done entirely with mindless brute force, using the same script, except using 'flex' in the pattern rather than 'objs'. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Linus Torvalds	bf4afc53b7	Convert 'alloc_obj' family to use the new default GFP_KERNEL argument This was done entirely with mindless brute force, using git grep -l '\<k[vmz]alloc_objs(., GFP_KERNEL)' \| xargs sed -i 's/\(alloc_objs(.*\), GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Kees Cook	69050f8d6d	treewide: Replace kmalloc with kmalloc_obj for non-scalar types This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(PTR, FAM, COUNT, ...) (where TYPE may also be VAR) The resulting allocations no longer return "void ", instead returning "TYPE ". Signed-off-by: Kees Cook <kees@kernel.org>	2026-02-21 01:02:28 -08:00
Jakub Kicinski	a182a62ff7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.19-rc9). No adjacent changes, conflicts: drivers/net/ethernet/spacemit/k1_emac.c `3125fc1701` ("net: spacemit: k1-emac: fix jumbo frame support") `f66086798f` ("net: spacemit: Remove broken flow control support") https://lore.kernel.org/aYIysFIE9ooavWia@sirena.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-05 09:54:08 -08:00
Jakub Kicinski	1c172febdf	net: rss: fix reporting RXH_XFRM_NO_CHANGE as input_xfrm for contexts Initializing input_xfrm to RXH_XFRM_NO_CHANGE in RSS contexts is problematic. I think I did this to make it clear that the context does not have its own settings applied. But unlike ETH_RSS_HASH_NO_CHANGE which is zero, RXH_XFRM_NO_CHANGE is 0xff. We need to be careful when reading the value back, and remember to treat 0xff as 0. Remove the initialization and switch to storing 0. This lets us also remove the workaround in ethnl_rss_set(). Get side does not need any adjustments and context get no longer reports: RSS input transformation: symmetric-xor: on symmetric-or-xor: on Unknown bits in RSS input transformation: 0xfc for NICs which don't support input_xfrm. Remove the init of hfunc to ETH_RSS_HASH_NO_CHANGE while at it. As already mentioned this is a noop since ETH_RSS_HASH_NO_CHANGE is 0 and struct is zalloc'd. But as this fix exemplifies storing NO_CHANGE as state is fragile. This issue is implicitly caught by running our selftests because YNL in selftests errors out on unknown bits. Fixes: `d3e2c7bab1` ("ethtool: rss: support setting input-xfrm via Netlink") Link: https://patch.msgid.link/20260130190311.811129-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-02 17:09:48 -08:00
Breno Leitao	1f6b527baf	ethtool: remove ETHTOOL_GRXRINGS fallback through get_rxnfc All drivers that need to report the RX ring count now implement the get_rx_ring_count callback directly. Remove the legacy fallback path that obtained this information by calling get_rxnfc with ETHTOOL_GRXRINGS. This simplifies the code and makes get_rx_ring_count the only way to retrieve the RX ring count. Note: ethtool_get_rx_ring_count() returns int to allow returning -EOPNOTSUPP, while the callback returns u32. The implicit conversion is safe since RX ring counts will not exceed INT_MAX while we are still alive. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20260126-grxring_final-v1-1-0981cb24512e@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-27 17:32:56 -08:00
Jakub Kicinski	8766d61a1d	Revert "Merge branch 'netkit-support-for-io_uring-zero-copy-and-af_xdp'" This reverts commit `77b9c4a438`, reversing changes made to `4515ec4ad5`: `931420a2fc` ("selftests/net: Add netkit container tests") `ab771c938d` ("selftests/net: Make NetDrvContEnv support queue leasing") `6be87fbb27` ("selftests/net: Add env for container based tests") `61d99ce3df` ("selftests/net: Add bpf skb forwarding program") `920da36341` ("netkit: Add xsk support for af_xdp applications") `eef51113f8` ("netkit: Add netkit notifier to check for unregistering devices") `b5ef109d22` ("netkit: Implement rtnl_link_ops->alloc and ndo_queue_create") `b5c3fa4a0b` ("netkit: Add single device mode for netkit") `0073d2fd67` ("xsk: Proxy pool management for leased queues") `1ecea95dd3` ("xsk: Extend xsk_rcv_check validation") `804bf334d0` ("net: Proxy netdev_queue_get_dma_dev for leased queues") `0caa9a8dde` ("net: Proxy net_mp_{open,close}_rxq for leased queues") `ff8889ff91` ("net, ethtool: Disallow leased real rxqs to be resized") `9e2103f361` ("net: Add lease info to queue-get response") `31127dedde` ("net: Implement netdev_nl_queue_create_doit") `a5546e18f7` ("net: Add queue-create operation") The series will conflict with io_uring work, and the code needs more polish. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-20 18:06:01 -08:00
Daniel Borkmann	ff8889ff91	net, ethtool: Disallow leased real rxqs to be resized Similar to AF_XDP, do not allow queues in a physical netdev to be resized by ethtool -L when they are leased. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Co-developed-by: David Wei <dw@davidwei.uk> Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20260115082603.219152-5-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-01-20 11:58:49 +01:00
Maxime Chevallier	589e934d27	net: phy: Introduce PHY ports representation Ethernet provides a wide variety of layer 1 protocols and standards for data transmission. The front-facing ports of an interface have their own complexity and configurability. Introduce a representation of these front-facing ports. The current code is minimalistic and only support ports controlled by PHY devices, but the plan is to extend that to SFP as well as raw Ethernet MACs that don't use PHY devices. This minimal port representation allows describing the media and number of pairs of a BaseT port. From that information, we can derive the linkmodes usable on the port, which can be used to limit the capabilities of an interface. For now, the port pairs and medium is derived from devicetree, defined by the PHY driver, or populated with default values (as we assume that all PHYs expose at least one port). The typical example is 100M ethernet. 100BaseTX works using only 2 pairs on a Cat 5 cables. However, in the situation where a 10/100/1000 capable PHY is wired to its RJ45 port through 2 pairs only, we have no way of detecting that. The "max-speed" DT property can be used, but a more accurate representation can be used : mdi { connector-0 { media = "BaseT"; pairs = <2>; }; }; From that information, we can derive the max speed reachable on the port. Another benefit of having that is to avoid vendor-specific DT properties (micrel,fiber-mode or ti,fiber-mode). This basic representation is meant to be expanded, by the introduction of port ops, userspace listing of ports, and support for multi-port devices. Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260108080041.553250-4-maxime.chevallier@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-13 18:52:34 -08:00
Maxime Chevallier	3f25ff7409	net: ethtool: Introduce ETHTOOL_LINK_MEDIUM_* values In an effort to have a better representation of Ethernet ports, introduce enumeration values representing the various ethernet Mediums. This is part of the 802.3 naming convention, for example : 1000 Base T 4 \| \| \| \| \| \| \| \_ pairs (4) \| \| \___ Medium (T == Twisted Copper Pairs) \| \_______ Baseband transmission \____________ Speed Other example : 10000 Base K X 4 \| \| \_ lanes (4) \| \___ encoding (BaseX is 8b/10b while BaseR is 66b/64b) \_____ Medium (K is backplane ethernet) In the case of representing a physical port, only the medium and number of pairs should be relevant. One exception would be 1000BaseX, which is currently also used as a medium in what appears to be any of 1000BaseSX, 1000BaseCX, 1000BaseLX, 1000BaseEX, 1000BaseBX10 and some other. This was reflected in the mediums associated with the 1000BaseX linkmode. These mediums are set in the net/ethtool/common.c lookup table that maintains a list of all linkmodes with their number of pairs, medium, encoding, speed and duplex. One notable exception to this is 100BaseT Ethernet. It emcompasses 100BaseTX, which is a 2-pairs protocol but also 100BaseT4, that will also work on 4-pairs cables. As we don't make a disctinction between these, the lookup table contains 2 sets of pair numbers, indicating the min number of pairs for a protocol to work and the "nominal" number of pairs as well. Another set of exceptions are linkmodes such 100000baseLR4_ER4, where the same link mode seems to represent 100GBaseLR4 and 100GBaseER4. The macro __DEFINE_LINK_MODE_PARAMS_MEDIUMS is here used to populate the .mediums bitfield with all appropriate mediums. Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260108080041.553250-3-maxime.chevallier@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-13 18:52:34 -08:00
Gal Pressman	7b07be1ff1	ethtool: Avoid overflowing userspace buffer on stats query The ethtool -S command operates across three ioctl calls: ETHTOOL_GSSET_INFO for the size, ETHTOOL_GSTRINGS for the names, and ETHTOOL_GSTATS for the values. If the number of stats changes between these calls (e.g., due to device reconfiguration), userspace's buffer allocation will be incorrect, potentially leading to buffer overflow. Drivers are generally expected to maintain stable stat counts, but some drivers (e.g., mlx5, bnx2x, bna, ksz884x) use dynamic counters, making this scenario possible. Some drivers try to handle this internally: - bnad_get_ethtool_stats() returns early in case stats.n_stats is not equal to the driver's stats count. - micrel/ksz884x also makes sure not to write anything beyond stats.n_stats and overflow the buffer. However, both use stats.n_stats which is already assigned with the value returned from get_sset_count(), hence won't solve the issue described here. Change ethtool_get_strings(), ethtool_get_stats(), ethtool_get_phy_stats() to not return anything in case of a mismatch between userspace's size and get_sset_size(), to prevent buffer overflow. The returned n_stats value will be equal to zero, to reflect that nothing has been returned. This could result in one of two cases when using upstream ethtool, depending on when the size change is detected: 1. When detected in ethtool_get_strings(): # ethtool -S eth2 no stats available 2. When detected in get stats, all stats will be reported as zero. Both cases are presumably transient, and a subsequent ethtool call should succeed. Other than the overflow avoidance, these two cases are very evident (no output/cleared stats), which is arguably better than presenting incorrect/shifted stats. I also considered returning an error instead of a "silent" response, but that seems more destructive towards userspace apps. Notes: - This patch does not claim to fix the inherent race, it only makes sure that we do not overflow the userspace buffer, and makes for a more predictable behavior. - RTNL lock is held during each ioctl, the race window exists between the separate ioctl calls when the lock is released. - Userspace ethtool always fills stats.n_stats, but it is likely that these stats ioctls are implemented in other userspace applications which might not fill it. The added code checks that it's not zero, to prevent any regressions. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20251208121901.3203692-1-gal@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-12-18 12:24:25 +01:00
Yael Chemla	491c5dc98b	net: ethtool: Add support for 1600Gbps speed Add support for 1600Gbps link modes based on 200Gbps per lane [1]. This includes the adopted IEEE 802.3dj copper and optical PMDs that use 200G/lane signaling [2]. Add the following PMD types: - KR8 (backplane) - CR8 (copper cable) - DR8 (SMF 500m) - DR8-2 (SMF 2km) These modes are defined in the 802.3dj specifications. References: [1] https://www.ieee802.org/3/dj/public/23_03/opsasnick_3dj_01a_2303.pdf [2] https://www.ieee802.org/3/dj/projdoc/objectives_P802d3dj_240314.pdf Signed-off-by: Yael Chemla <ychemla@nvidia.com> Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/1763585297-1243980-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-20 18:21:29 -08:00
Oleksij Rempel	e6e93fb013	ethtool: netlink: add ETHTOOL_MSG_MSE_GET and wire up PHY MSE access Introduce the userspace entry point for PHY MSE diagnostics via ethtool netlink. This exposes the core API added previously and returns both capability information and one or more snapshots. Userspace sends ETHTOOL_MSG_MSE_GET. The reply carries: - ETHTOOL_A_MSE_CAPABILITIES: scale limits and timing information - ETHTOOL_A_MSE_CHANNEL_* nests: one or more snapshots (per-channel if available, otherwise WORST, otherwise LINK) Link down returns -ENETDOWN. Changes: - YAML: add attribute sets (mse, mse-capabilities, mse-snapshot) and the mse-get operation - UAPI (generated): add ETHTOOL_A_MSE_* enums and message IDs, ETHTOOL_MSG_MSE_GET/REPLY - ethtool core: add net/ethtool/mse.c implementing the request, register genl op, and hook into ethnl dispatch - docs: document MSE_GET in ethtool-netlink.rst The include/uapi/linux/ethtool_netlink_generated.h is generated from Documentation/netlink/specs/ethtool.yaml. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20251027122801.982364-3-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-03 18:32:27 -08:00

1 2 3 4 5 ...

586 Commits